A performance comparison of the Cray-2 and the Cray X-MP
NASA Technical Reports Server (NTRS)
Schmickley, Ronald; Bailey, David H.
1986-01-01
A suite of thirteen large Fortran benchmark codes was run on Cray-2 and Cray X-MP supercomputers. These codes were a mix of compute-intensive scientific application programs (mostly Computational Fluid Dynamics) and some special vectorized computation exercise programs. For the general class of programs tested on the Cray-2, most of which were not specially tuned for speed, the floating point operation rates varied under a variety of system load configurations from 40 percent up to 125 percent of X-MP performance rates. It is concluded that the Cray-2, in the original system configuration studied (without memory pseudo-banking), will run untuned Fortran code, on average, at about 70 percent of X-MP speeds.
Optimization of large matrix calculations for execution on the Cray X-MP vector supercomputer
NASA Technical Reports Server (NTRS)
Hornfeck, William A.
1988-01-01
A considerable volume of large computational computer codes was developed for NASA over the past twenty-five years. This code represents algorithms developed for machines of an earlier generation. With the emergence of the vector supercomputer as a viable, commercially available machine, an opportunity exists to evaluate optimization strategies to improve the efficiency of existing software. This opportunity arises primarily from architectural differences between the latest generation of large-scale machines and the earlier, mostly uniprocessor, machines. A software package being used by NASA to perform computations on large matrices is described, and a strategy for conversion to the Cray X-MP vector supercomputer is also described.
FFTs in external or hierarchical memory
NASA Technical Reports Server (NTRS)
Bailey, David H.
1989-01-01
A description is given of advanced techniques for computing an ordered FFT on a computer with external or hierarchical memory. These algorithms (1) require as few as two passes through the external data set, (2) use strictly unit stride, long vector transfers between main memory and external storage, (3) require only a modest amount of scratch space in main memory, and (4) are well suited for vector and parallel computation. Performance figures are included for implementations of some of these algorithms on Cray supercomputers. Of interest is the fact that a main memory version outperforms the current Cray library FFT routines on the Cray-2, the Cray X-MP, and the Cray Y-MP systems. Using all eight processors on the Cray Y-MP, this main memory routine runs at nearly 2 Gflops.
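As background for the approach described above, the sketch below shows the four-step factorization (n = n1*n2) that underlies such two-pass, unit-stride external FFTs. It is a minimal illustration: naive O(m**2) DFTs stand in for the library sub-transforms, the data fit in memory, and all names are illustrative rather than taken from the paper.

```fortran
! Four-step FFT factorization (n = n1*n2): two passes of unit-stride
! sub-transforms separated by a twiddle multiply and a transpose.  Naive
! DFTs stand in for library FFTs; names are illustrative only.
program four_step_fft_sketch
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: n1 = 8, n2 = 4, n = n1*n2
  real(dp), parameter :: pi = 3.141592653589793_dp
  complex(dp) :: x(n1, n2), y(n2, n1)
  integer :: j1, j2
  do j2 = 1, n2                       ! x(j1,j2) holds input element j1-1 + n1*(j2-1)
     do j1 = 1, n1
        x(j1, j2) = cmplx(j1 + n1*(j2 - 1), 0, kind=dp)
     end do
  end do
  do j1 = 1, n1                       ! step 1: length-n2 DFTs (first pass over the data)
     x(j1, :) = dft(x(j1, :), n2)
  end do
  do j2 = 1, n2                       ! step 2: twiddle factors exp(-2*pi*i*j1*k2/n)
     do j1 = 1, n1
        x(j1, j2) = x(j1, j2)*exp(cmplx(0.0_dp, -2.0_dp*pi*(j1 - 1)*(j2 - 1)/n, kind=dp))
     end do
  end do
  y = transpose(x)                    ! step 3: the only non-unit-stride step (blocked out of core)
  do j2 = 1, n2                       ! step 4: length-n1 DFTs (second pass)
     y(j2, :) = dft(y(j2, :), n1)
  end do
  ! y(k2+1, k1+1) now holds X(k2 + n2*k1); X(0) = sum of inputs = n*(n+1)/2 = 528
  print *, 'X(0) =', y(1, 1)
contains
  function dft(a, m) result(b)        ! naive DFT, stand-in for a fast sub-transform
    integer, intent(in) :: m
    complex(dp), intent(in) :: a(m)
    complex(dp) :: b(m)
    integer :: j, k
    b = (0.0_dp, 0.0_dp)
    do k = 1, m
       do j = 1, m
          b(k) = b(k) + a(j)*exp(cmplx(0.0_dp, -2.0_dp*pi*(j - 1)*(k - 1)/m, kind=dp))
       end do
    end do
  end function dft
end program four_step_fft_sketch
```

In the external-memory setting, steps 1 and 4 are the two passes of long, unit-stride transfers, and only the transpose requires non-contiguous access, which is what gets blocked through scratch space.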
NAS technical summaries: Numerical aerodynamic simulation program, March 1991 - February 1992
NASA Technical Reports Server (NTRS)
1992-01-01
NASA created the Numerical Aerodynamic Simulation (NAS) Program in 1987 to focus resources on solving critical problems in aeroscience and related disciplines by utilizing the power of the most advanced supercomputers available. The NAS Program provides scientists with the necessary computing power to solve today's most demanding computational fluid dynamics problems and serves as a pathfinder in integrating leading-edge supercomputing technologies, thus benefiting other supercomputer centers in Government and industry. This report contains selected scientific results from the 1991-92 NAS Operational Year, March 4, 1991 to March 3, 1992, which is the fifth year of operation. During this year, the scientific community was given access to a Cray-2 and a Cray Y-MP. The Cray-2, the first generation supercomputer, has four processors, 256 megawords of central memory, and a total sustained speed of 250 million floating point operations per second. The Cray Y-MP, the second generation supercomputer, has eight processors and a total sustained speed of one billion floating point operations per second. Additional memory was installed this year, doubling capacity from 128 to 256 megawords of solid-state storage-device memory. Because of its higher performance, the Cray Y-MP delivered approximately 77 percent of the total number of supercomputer hours used during this year.
NASA Technical Reports Server (NTRS)
Tennille, Geoffrey M.; Howser, Lona M.
1993-01-01
This document briefly describes the use of the CRAY supercomputers that are an integral part of the Supercomputing Network Subsystem of the Central Scientific Computing Complex at LaRC. Features of the CRAY supercomputers are covered, including: FORTRAN, C, PASCAL, architectures of the CRAY-2 and CRAY Y-MP, the CRAY UNICOS environment, batch job submittal, debugging, performance analysis, parallel processing, utilities unique to CRAY, and documentation. The document is intended for all CRAY users as a ready reference to frequently asked questions and to more detailed information contained in the vendor manuals. It is appropriate for both the novice and the experienced user.
Using Strassen's algorithm to accelerate the solution of linear systems
NASA Technical Reports Server (NTRS)
Bailey, David H.; Lee, King; Simon, Horst D.
1990-01-01
Strassen's algorithm for fast matrix-matrix multiplication has been implemented for matrices of arbitrary shapes on the CRAY-2 and CRAY Y-MP supercomputers. Several techniques have been used to reduce the scratch space requirement for this algorithm while simultaneously preserving a high level of performance. When the resulting Strassen-based matrix multiply routine is combined with some routines from the new LAPACK library, LU decomposition can be performed with rates significantly higher than those achieved by conventional means. We succeeded in factoring a 2048 x 2048 matrix on the CRAY Y-MP at a rate equivalent to 325 MFLOPS.
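To make the operation concrete, here is one level of Strassen's recursion for an even-order matrix, with Fortran's matmul standing in for the recursive half-size block products. It is only a sketch; the paper's arbitrary-shape, low-scratch-space implementation is not reproduced.

```fortran
! One level of Strassen's recursion for C = A*B with even order n:
! seven half-size block products instead of eight.  Block products use
! matmul here; a full implementation would recurse on them.
subroutine strassen_one_level(n, a, b, c)
  implicit none
  integer, intent(in) :: n                      ! assumed even
  real(kind=8), intent(in)  :: a(n, n), b(n, n)
  real(kind=8), intent(out) :: c(n, n)
  integer :: h
  real(kind=8), dimension(n/2, n/2) :: p1, p2, p3, p4, p5, p6, p7
  h = n/2
  associate (a11 => a(1:h, 1:h), a12 => a(1:h, h+1:n), &
             a21 => a(h+1:n, 1:h), a22 => a(h+1:n, h+1:n), &
             b11 => b(1:h, 1:h), b12 => b(1:h, h+1:n), &
             b21 => b(h+1:n, 1:h), b22 => b(h+1:n, h+1:n))
    p1 = matmul(a11 + a22, b11 + b22)
    p2 = matmul(a21 + a22, b11)
    p3 = matmul(a11, b12 - b22)
    p4 = matmul(a22, b21 - b11)
    p5 = matmul(a11 + a12, b22)
    p6 = matmul(a21 - a11, b11 + b12)
    p7 = matmul(a12 - a22, b21 + b22)
  end associate
  c(1:h,   1:h)   = p1 + p4 - p5 + p7
  c(1:h,   h+1:n) = p3 + p5
  c(h+1:n, 1:h)   = p2 + p4
  c(h+1:n, h+1:n) = p1 - p2 + p3 + p6
end subroutine strassen_one_level
```

Each recursion level trades one matrix multiplication for roughly 18 matrix additions, which is why the scratch-space management mentioned above matters in practice.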
Optimization strategies for molecular dynamics programs on Cray computers and scalar work stations
NASA Astrophysics Data System (ADS)
Unekis, Michael J.; Rice, Betsy M.
1994-12-01
We present results of timing runs and different optimization strategies for a prototype molecular dynamics program that simulates shock waves in a two-dimensional (2-D) model of a reactive energetic solid. The performance of the program may be improved substantially by simple changes to the Fortran or by employing various vendor-supplied compiler optimizations. The optimum strategy varies among the machines used and will vary depending upon the details of the program. The effect of various compiler options and vendor-supplied subroutine calls is demonstrated. Comparison is made between two scalar workstations (IBM RS/6000 Model 370 and Model 530) and several Cray supercomputers (X-MP/48, Y-MP8/128, and C-90/16256). We find that for a scientific application program dominated by sequential, scalar statements, a relatively inexpensive high-end workstation such as the IBM RS/6000 RISC series will outperform single processor performance of the Cray X-MP/48 and perform competitively with single processor performance of the Y-MP8/128 and C-90/16256.
Parallel computation in a three-dimensional elastic-plastic finite-element analysis
NASA Technical Reports Server (NTRS)
Shivakumar, K. N.; Bigelow, C. A.; Newman, J. C., Jr.
1992-01-01
A CRAY parallel processing technique called autotasking was implemented in a three-dimensional elasto-plastic finite-element code. The technique was evaluated on two CRAY supercomputers, a CRAY 2 and a CRAY Y-MP. Autotasking was implemented in all major portions of the code, except the matrix equations solver. Compiler directives alone were not able to properly multitask the code; user-inserted directives were required to achieve better performance. It was noted that the connect time, rather than wall-clock time, was more appropriate to determine speedup in multiuser environments. For a typical example problem, a speedup of 2.1 (1.8 when the solution time was included) was achieved in a dedicated environment and 1.7 (1.6 with solution time) in a multiuser environment on a four-processor CRAY 2 supercomputer. The speedup on a three-processor CRAY Y-MP was about 2.4 (2.0 with solution time) in a multiuser environment.
NASA Astrophysics Data System (ADS)
Tripathi, Vijay S.; Yeh, G. T.
1993-06-01
Sophisticated and highly computation-intensive models of transport of reactive contaminants in groundwater have been developed in recent years. Application of such models to real-world contaminant transport problems, e.g., simulation of groundwater transport of 10-15 chemically reactive elements (e.g., toxic metals) and relevant complexes and minerals in two and three dimensions over a distance of several hundred meters, requires high-performance computers including supercomputers. Although not widely recognized as such, the computational complexity and demand of these models compare with well-known computation-intensive applications including weather forecasting and quantum chemical calculations. A survey of the performance of a variety of available hardware, as measured by the run times for a reactive transport model HYDROGEOCHEM, showed that while supercomputers provide the fastest execution times for such problems, relatively low-cost reduced instruction set computer (RISC) based scalar computers provide the best performance-to-price ratio. Because supercomputers like the Cray X-MP are inherently multiuser resources, often the RISC computers also provide much better turnaround times. Furthermore, RISC-based workstations provide the best platforms for "visualization" of groundwater flow and contaminant plumes. The most notable result, however, is that current workstations costing less than $10,000 provide performance within a factor of 5 of a Cray X-MP.
A parallel finite-difference method for computational aerodynamics
NASA Technical Reports Server (NTRS)
Swisshelm, Julie M.
1989-01-01
A finite-difference scheme for solving complex three-dimensional aerodynamic flow on parallel-processing supercomputers is presented. The method consists of a basic flow solver with multigrid convergence acceleration, embedded grid refinements, and a zonal equation scheme. Multitasking and vectorization have been incorporated into the algorithm. Results obtained include multiprocessed flow simulations from the Cray X-MP and Cray-2. Speedups as high as 3.3 for the two-dimensional case and 3.5 for segments of the three-dimensional case have been achieved on the Cray-2. The entire solver attained a factor of 2.7 improvement over its unitasked version on the Cray-2. The performance of the parallel algorithm on each machine is analyzed.
Multitasking domain decomposition fast Poisson solvers on the Cray Y-MP
NASA Technical Reports Server (NTRS)
Chan, Tony F.; Fatoohi, Rod A.
1990-01-01
The results of multitasking implementation of a domain decomposition fast Poisson solver on eight processors of the Cray Y-MP are presented. The object of this research is to study the performance of domain decomposition methods on a Cray supercomputer and to analyze the performance of different multitasking techniques using highly parallel algorithms. Two implementations of multitasking are considered: macrotasking (parallelism at the subroutine level) and microtasking (parallelism at the do-loop level). A conventional FFT-based fast Poisson solver is also multitasked. The results of different implementations are compared and analyzed. A speedup of over 7.4 on the Cray Y-MP running in a dedicated environment is achieved for all cases.
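The two granularities compared above can be illustrated with modern OpenMP standing in for the Cray-specific macrotasking and microtasking libraries; the subdomain solve below is a placeholder, not the paper's Poisson solver, and none of the names come from the paper.

```fortran
! The two granularities of multitasking, expressed with OpenMP as a stand-in
! for Cray macrotasking (subroutine level) and microtasking (do-loop level).
program tasking_granularity_sketch
  implicit none
  integer, parameter :: nsub = 8, npts = 1000
  real(kind=8) :: u(npts, nsub)
  integer :: i, k
  u = 0.0d0
  ! Coarse grain (cf. macrotasking): one whole subdomain solve per task.
  !$omp parallel do
  do k = 1, nsub
     call solve_subdomain(u(:, k), npts, k)
  end do
  !$omp end parallel do
  ! Fine grain (cf. microtasking): iterations of a single do loop are split
  ! among the processors inside one routine.
  !$omp parallel do
  do i = 1, npts
     u(i, 1) = u(i, 1) + 1.0d0
  end do
  !$omp end parallel do
  print *, 'u(1,1) =', u(1, 1)
contains
  subroutine solve_subdomain(v, m, id)
    integer, intent(in) :: m, id
    real(kind=8), intent(inout) :: v(m)
    v = real(id, kind=8)              ! placeholder for a per-subdomain Poisson solve
  end subroutine solve_subdomain
end program tasking_granularity_sketch
```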
Distributed Finite Element Analysis Using a Transputer Network
NASA Technical Reports Server (NTRS)
Watson, James; Favenesi, James; Danial, Albert; Tombrello, Joseph; Yang, Dabby; Reynolds, Brian; Turrentine, Ronald; Shephard, Mark; Baehmann, Peggy
1989-01-01
The principal objective of this research effort was to demonstrate the extraordinarily cost effective acceleration of finite element structural analysis problems using a transputer-based parallel processing network. This objective was accomplished in the form of a commercially viable parallel processing workstation. The workstation is a desktop size, low-maintenance computing unit capable of supercomputer performance yet costs two orders of magnitude less. To achieve the principal research objective, a transputer based structural analysis workstation termed XPFEM was implemented with linear static structural analysis capabilities resembling commercially available NASTRAN. Finite element model files, generated using the on-line preprocessing module or external preprocessing packages, are downloaded to a network of 32 transputers for accelerated solution. The system currently executes at about one third Cray X-MP24 speed but additional acceleration appears likely. For the NASA selected demonstration problem of a Space Shuttle main engine turbine blade model with about 1500 nodes and 4500 independent degrees of freedom, the Cray X-MP24 required 23.9 seconds to obtain a solution while the transputer network, operated from an IBM PC-AT compatible host computer, required 71.7 seconds. Consequently, the $80,000 transputer network demonstrated a cost-performance ratio about 60 times better than the $15,000,000 Cray X-MP24 system.
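As a quick consistency check using only the figures quoted above, the cost-performance comparison works out to roughly the cited factor of 60:

$$\frac{\$15{,}000{,}000 / \$80{,}000}{71.7\ \mathrm{s} / 23.9\ \mathrm{s}} = \frac{187.5}{3.0} \approx 62.$$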
Multitasking and microtasking experience on the NAS Cray-2 and ACF Cray X-MP
NASA Technical Reports Server (NTRS)
Raiszadeh, Farhad
1987-01-01
The fast Fourier transform (FFT) kernel of the NAS benchmark program has been utilized to experiment with the multitasking library on the Cray-2 and Cray X-MP/48, and microtasking directives on the Cray X-MP. Some performance figures are shown, and the state of multitasking software is described.
Solving large sparse eigenvalue problems on supercomputers
NASA Technical Reports Server (NTRS)
Philippe, Bernard; Saad, Youcef
1988-01-01
An important problem in scientific computing consists of finding a few eigenvalues and corresponding eigenvectors of a very large and sparse matrix. The most popular methods to solve these problems are based on projection techniques on appropriate subspaces. The main attraction of these methods is that they only require the use of the matrix in the form of matrix by vector multiplications. The implementations on supercomputers of two such methods for symmetric matrices, namely Lanczos' method and Davidson's method, are compared. Since one of the most important operations in these two methods is the multiplication of vectors by the sparse matrix, methods of performing this operation efficiently are discussed. The advantages and the disadvantages of each method are compared and implementation aspects are discussed. Numerical experiments on a one-processor CRAY 2 and CRAY X-MP are reported. Possible parallel implementations are also discussed.
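Since the sparse matrix-vector product is the dominant kernel in both methods, a generic compressed-sparse-row version is sketched below. It is shown only as an illustration; storage schemes tuned for vector hardware (for example, diagonal or jagged-diagonal formats) differ in detail from this layout.

```fortran
! Sparse matrix-vector product y = A*x in compressed sparse row (CSR) form,
! the kernel that dominates both Lanczos' and Davidson's methods.
subroutine csr_matvec(n, row_ptr, col_ind, val, x, y)
  implicit none
  integer, intent(in) :: n
  integer, intent(in) :: row_ptr(n+1)        ! start of each row in val/col_ind
  integer, intent(in) :: col_ind(*)          ! column index of each stored entry
  real(kind=8), intent(in) :: val(*)         ! nonzero values, stored row by row
  real(kind=8), intent(in) :: x(n)
  real(kind=8), intent(out) :: y(n)
  integer :: i, k
  do i = 1, n
     y(i) = 0.0d0
     do k = row_ptr(i), row_ptr(i+1) - 1     ! short inner loop over the nonzeros of row i
        y(i) = y(i) + val(k)*x(col_ind(k))
     end do
  end do
end subroutine csr_matvec
```

The short, irregular inner loop and the gather through col_ind are exactly what makes this operation awkward on vector machines, which is why alternative storage schemes are discussed in the paper.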
Supercomputers for engineering analysis
DOE Office of Scientific and Technical Information (OSTI.GOV)
Goudreau, G.L.; Benson, D.J.; Hallquist, J.O.
1986-07-01
The Cray-1 and Cray X-MP/48 experience in engineering computations at the Lawrence Livermore National Laboratory is surveyed. The fully vectorized explicit DYNA and implicit NIKE finite element codes are discussed with respect to solid and structural mechanics. The main efficiencies for production analyses are currently obtained by simple CFT compiler exploitation of pipeline architecture for inner do-loop optimization. Current development of outer-loop multitasking is also discussed. Applications emphasis will be on 3D examples spanning earth penetrator loads analysis, target lethality assessment, and crashworthiness. The use of a vectorized large deformation shell element in both DYNA and NIKE has substantially expanded 3D nonlinear capability. 25 refs., 7 figs.
Comparison of the MPP with other supercomputers for LANDSAT data processing
NASA Technical Reports Server (NTRS)
Ozga, Martin
1987-01-01
The Massively Parallel Processor (MPP) is compared to the CRAY X-MP and the CYBER-205 for LANDSAT data processing. The maximum likelihood classification algorithm is the basis for comparison since this algorithm is simple to implement and vectorizes very well. The algorithm was implemented on all three machines and tested by classifying the same full scene of LANDSAT multispectral scanner data. Timings are compared, as well as features of the machines and available software.
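For reference, Gaussian maximum likelihood classification of a multispectral pixel x typically uses the per-class discriminant below (a standard textbook form with equal priors assumed, not an equation quoted from the report); the quadratic form evaluated independently at every pixel is what vectorizes so well:

$$g_i(x) = -\ln\lvert\Sigma_i\rvert - (x - \mu_i)^{\mathsf{T}}\,\Sigma_i^{-1}\,(x - \mu_i),$$

where $\mu_i$ and $\Sigma_i$ are the mean and covariance of class $i$ estimated from training data, and the pixel is assigned to the class with the largest $g_i(x)$.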
Transferring ecosystem simulation codes to supercomputers
NASA Technical Reports Server (NTRS)
Skiles, J. W.; Schulbach, C. H.
1995-01-01
Many ecosystem simulation computer codes have been developed in the last twenty-five years. This development took place initially on main-frame computers, then mini-computers, and more recently, on micro-computers and workstations. Supercomputing platforms (both parallel and distributed systems) have been largely unused, however, because of the perceived difficulty in accessing and using the machines. Also, significant differences in the system architectures of sequential, scalar computers and parallel and/or vector supercomputers must be considered. We have transferred a grassland simulation model (developed on a VAX) to a Cray Y-MP/C90. We describe porting the model to the Cray and the changes we made to exploit the parallelism in the application and improve code execution. The Cray executed the model 30 times faster than the VAX and 10 times faster than a Unix workstation. We achieved an additional speedup of 30 percent by using the compiler's vectorizing and 'in-line' capabilities. The code runs at only about 5 percent of the Cray's peak speed because it ineffectively uses the vector and parallel processing capabilities of the Cray. We expect that by restructuring the code, it could execute an additional six to ten times faster.
Jungheim, L N; Boyd, D B; Indelicato, J M; Pasini, C E; Preston, D A; Alborn, W E
1991-05-01
Bicyclic tetrahydropyridazinones, such as 13, where X are strongly electron-withdrawing groups, were synthesized to investigate their antibacterial activity. These delta-lactams are homologues of bicyclic pyrazolidinones 15, which were the first non-beta-lactam containing compounds reported to bind to penicillin-binding proteins (PBPs). The delta-lactam compounds exhibit poor antibacterial activity despite having reactivity comparable to the gamma-lactams. Molecular modeling, based on semiempirical molecular orbital calculations on a Cray X-MP supercomputer, predicted that the reason for the inactivity is steric bulk hindering high affinity of the compounds to PBPs, as well as high conformational flexibility of the tetrahydropyridazinone ring hampering effective alignment of the molecule in the active site. Subsequent PBP binding experiments confirmed that this class of compound does not bind to PBPs.
A parallel algorithm for generation and assembly of finite element stiffness and mass matrices
NASA Technical Reports Server (NTRS)
Storaasli, O. O.; Carmona, E. A.; Nguyen, D. T.; Baddourah, M. A.
1991-01-01
A new algorithm is proposed for parallel generation and assembly of the finite element stiffness and mass matrices. The proposed assembly algorithm is based on a node-by-node approach rather than the more conventional element-by-element approach. The new algorithm's generality and computation speed-up when using multiple processors are demonstrated for several practical applications on multi-processor Cray Y-MP and Cray 2 supercomputers.
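The node-by-node idea can be sketched for a chain of two-node bar elements: each iteration of the node loop fills only its own row of the global matrix, so the loop can be divided among processors without write conflicts, unlike element-by-element scattering. The element type and all names below are illustrative, not taken from the paper.

```fortran
! Node-by-node assembly of a global stiffness matrix for a 1D chain of
! two-node bar elements (element e joins nodes e and e+1).  Iteration i of
! the outer loop writes only row i, so the node loop parallelizes safely.
program node_by_node_assembly
  implicit none
  integer, parameter :: nelem = 4, nnode = nelem + 1
  real(kind=8) :: k(nnode, nnode), ke(2, 2), stiff(nelem)
  integer :: i, e, a, b, conn(2, nelem)
  stiff = 1.0d0                               ! element stiffness EA/L, all equal here
  do e = 1, nelem
     conn(:, e) = [e, e + 1]                  ! element connectivity
  end do
  k = 0.0d0
  !$omp parallel do private(e, a, b, ke)      ! safe: row i is touched only by iteration i
  do i = 1, nnode
     do e = 1, nelem                          ! scan for elements attached to node i
        if (conn(1, e) /= i .and. conn(2, e) /= i) cycle
        ke = stiff(e)*reshape([1.0d0, -1.0d0, -1.0d0, 1.0d0], [2, 2])
        a = merge(1, 2, conn(1, e) == i)      ! local row of node i within this element
        do b = 1, 2
           k(i, conn(b, e)) = k(i, conn(b, e)) + ke(a, b)
        end do
     end do
  end do
  !$omp end parallel do
  print '(5f6.1)', (k(i, :), i = 1, nnode)
end program node_by_node_assembly
```

Element-by-element assembly would instead scatter each 2x2 element matrix into several rows at once, so two processors could update the same entry concurrently; the node-by-node ordering removes that hazard at the cost of revisiting each element once per attached node.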
SNS programming environment user's guide
NASA Technical Reports Server (NTRS)
Tennille, Geoffrey M.; Howser, Lona M.; Humes, D. Creig; Cronin, Catherine K.; Bowen, John T.; Drozdowski, Joseph M.; Utley, Judith A.; Flynn, Theresa M.; Austin, Brenda A.
1992-01-01
The computing environment is briefly described for the Supercomputing Network Subsystem (SNS) of the Central Scientific Computing Complex of NASA Langley. The major SNS computers are a CRAY-2, a CRAY Y-MP, a CONVEX C-210, and a CONVEX C-220. The software is described that is common to all of these computers, including: the UNIX operating system, computer graphics, networking utilities, mass storage, and mathematical libraries. Also described is file management, validation, SNS configuration, documentation, and customer services.
Some Problems and Solutions in Transferring Ecosystem Simulation Codes to Supercomputers
NASA Technical Reports Server (NTRS)
Skiles, J. W.; Schulbach, C. H.
1994-01-01
Many computer codes for the simulation of ecological systems have been developed in the last twenty-five years. This development took place initially on main-frame computers, then mini-computers, and more recently, on micro-computers and workstations. Recent recognition of ecosystem science as a High Performance Computing and Communications Program Grand Challenge area emphasizes supercomputers (both parallel and distributed systems) as the next set of tools for ecological simulation. Transferring ecosystem simulation codes to such systems is not a matter of simply compiling and executing existing code on the supercomputer since there are significant differences in the system architectures of sequential, scalar computers and parallel and/or vector supercomputers. To more appropriately match the application to the architecture (necessary to achieve reasonable performance), the parallelism (if it exists) of the original application must be exploited. We discuss our work in transferring a general grassland simulation model (developed on a VAX in the FORTRAN computer programming language) to a Cray Y-MP. We show the Cray shared-memory vector-architecture, and discuss our rationale for selecting the Cray. We describe porting the model to the Cray and executing and verifying a baseline version, and we discuss the changes we made to exploit the parallelism in the application and to improve code execution. As a result, the Cray executed the model 30 times faster than the VAX 11/785 and 10 times faster than a Sun 4 workstation. We achieved an additional speed-up of approximately 30 percent over the original Cray run by using the compiler's vectorizing capabilities and the machine's ability to put subroutines and functions "in-line" in the code. With the modifications, the code still runs at only about 5% of the Cray's peak speed because it makes ineffective use of the vector processing capabilities of the Cray. We conclude with a discussion and future plans.
Vectorized program architectures for supercomputer-aided circuit design
DOE Office of Scientific and Technical Information (OSTI.GOV)
Rizzoli, V.; Ferlito, M.; Neri, A.
1986-01-01
Vector processors (supercomputers) can be effectively employed in MIC or MMIC applications to solve problems of large numerical size such as broad-band nonlinear design or statistical design (yield optimization). In order to fully exploit the capabilities of a vector hardware, any program architecture must be structured accordingly. This paper presents a possible approach to the "semantic" vectorization of microwave circuit design software. Speed-up factors of the order of 50 can be obtained on a typical vector processor (Cray X-MP), with respect to the most powerful scalar computers (CDC 7600), with cost reductions of more than one order of magnitude. This could broaden the horizon of microwave CAD techniques to include problems that are practically out of the reach of conventional systems.
Input/output behavior of supercomputing applications
NASA Technical Reports Server (NTRS)
Miller, Ethan L.
1991-01-01
The collection and analysis of supercomputer I/O traces and their use in a collection of buffering and caching simulations are described. This serves two purposes. First, it gives a model of how individual applications running on supercomputers request file system I/O, allowing system designers to optimize I/O hardware and file system algorithms to that model. Second, the buffering simulations show what resources are needed to maximize the CPU utilization of a supercomputer given a very bursty I/O request rate. By using read-ahead and write-behind in a large solid-state disk, one or two applications were sufficient to fully utilize a Cray Y-MP CPU.
NAS (Numerical Aerodynamic Simulation Program) technical summaries, March 1989 - February 1990
NASA Technical Reports Server (NTRS)
1990-01-01
Given here are selected scientific results from the Numerical Aerodynamic Simulation (NAS) Program's third year of operation. During this year, the scientific community was given access to a Cray-2 and a Cray Y-MP supercomputer. Topics covered include flow field analysis of fighter wing configurations, large-scale ocean modeling, the Space Shuttle flow field, advanced computational fluid dynamics (CFD) codes for rotary-wing airloads and performance prediction, turbulence modeling of separated flows, airloads and acoustics of rotorcraft, vortex-induced nonlinearities on submarines, and standing oblique detonation waves.
Using a multifrontal sparse solver in a high performance, finite element code
NASA Technical Reports Server (NTRS)
King, Scott D.; Lucas, Robert; Raefsky, Arthur
1990-01-01
We consider the performance of the finite element method on a vector supercomputer. The computationally intensive parts of the finite element method are typically the individual element forms and the solution of the global stiffness matrix, both of which are vectorized in high performance codes. To further increase throughput, new algorithms are needed. We compare a multifrontal sparse solver to a traditional skyline solver in a finite element code on a vector supercomputer. The multifrontal solver uses the Multiple-Minimum Degree reordering heuristic to reduce the number of operations required to factor a sparse matrix and full matrix computational kernels (e.g., BLAS3) to enhance vector performance. The net result is an order-of-magnitude reduction in run time for a finite element application on one processor of a Cray X-MP.
Attaching IBM-compatible 3380 disks to Cray X-MP
DOE Office of Scientific and Technical Information (OSTI.GOV)
Engert, D.E.; Midlock, J.L.
1989-01-01
A method of attaching IBM-compatible 3380 disks directly to a Cray X-MP via the XIOP with a BMC is described. The IBM 3380 disks appear to the UNICOS operating system as DD-29 disks with UNICOS file systems. IBM 3380 disks provide cheap, reliable large capacity disk storage. Combined with a small number of high-speed Cray disks, the IBM disks provide for the bulk of the storage for small files and infrequently used files. Cray Research designed the BMC and its supporting software in the XIOP to allow IBM tapes and other devices to be attached to the X-MP. No hardware changes were necessary, and we added less than 2000 lines of code to the XIOP to accomplish this project. This system has been in operation for over eight months. Future enhancements such as the use of a cache controller and attachment to a Y-MP are also described. 1 tab.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Dongarra, J.J.; Hewitt, T.
1985-08-01
This note describes some experiments on simple, dense linear algebra algorithms. These experiments show that the CRAY X-MP is capable of small-grain multitasking arising from standard implementations of LU and Cholesky decomposition. The implementation described here provides the "fastest" execution rate for LU decomposition, 718 MFLOPS for a matrix of order 1000.
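For orientation, the textbook operation count for LU decomposition (a standard figure, not one taken from the note itself) translates the quoted rate into time:

$$t \approx \frac{\tfrac{2}{3}\,n^{3}}{718 \times 10^{6}\ \mathrm{flop/s}} = \frac{6.7 \times 10^{8}}{7.18 \times 10^{8}} \approx 0.93\ \mathrm{s} \qquad (n = 1000).$$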
Supercomputer analysis of purine and pyrimidine metabolism leading to DNA synthesis.
Heinmets, F
1989-06-01
A model-system is established to analyze purine and pyrimidine metabolism leading to DNA synthesis. The principal aim is to explore the flow and regulation of terminal deoxynucleoside triphosphates (dNTPs) in various input and parametric conditions. A series of flow equations are established, which are subsequently converted to differential equations. These are programmed (Fortran) and analyzed on a Cray X-MP/48 supercomputer. The pool concentrations are presented as a function of time in conditions in which various pertinent parameters of the system are modified. The system is formulated by 100 differential equations.
NASA Technical Reports Server (NTRS)
Swisshelm, Julie M.
1989-01-01
An explicit flow solver, applicable to the hierarchy of model equations ranging from Euler to full Navier-Stokes, is combined with several techniques designed to reduce computational expense. The computational domain consists of local grid refinements embedded in a global coarse mesh, where the locations of these refinements are defined by the physics of the flow. Flow characteristics are also used to determine which set of model equations is appropriate for solution in each region, thereby reducing not only the number of grid points at which the solution must be obtained, but also the computational effort required to get that solution. Acceleration to steady-state is achieved by applying multigrid on each of the subgrids, regardless of the particular model equations being solved. Since each of these components is explicit, advantage can readily be taken of the vector- and parallel-processing capabilities of machines such as the Cray X-MP and Cray-2.
NASA Astrophysics Data System (ADS)
Clay, M. P.; Buaria, D.; Yeung, P. K.; Gotoh, T.
2018-07-01
This paper reports on the successful implementation of a massively parallel GPU-accelerated algorithm for the direct numerical simulation of turbulent mixing at high Schmidt number. The work stems from a recent development (Comput. Phys. Commun., vol. 219, 2017, 313-328), in which a low-communication algorithm was shown to attain high degrees of scalability on the Cray XE6 architecture when overlapping communication and computation via dedicated communication threads. An even higher level of performance has now been achieved using OpenMP 4.5 on the Cray XK7 architecture, where on each node the 16 integer cores of an AMD Interlagos processor share a single Nvidia K20X GPU accelerator. In the new algorithm, data movements are minimized by performing virtually all of the intensive scalar field computations in the form of combined compact finite difference (CCD) operations on the GPUs. A memory layout in departure from usual practices is found to provide much better performance for a specific kernel required to apply the CCD scheme. Asynchronous execution enabled by adding the OpenMP 4.5 NOWAIT clause to TARGET constructs improves scalability when used to overlap computation on the GPUs with computation and communication on the CPUs. On the 27-petaflops supercomputer Titan at Oak Ridge National Laboratory, USA, a GPU-to-CPU speedup factor of approximately 5 is consistently observed at the largest problem size of 8192^3 grid points for the scalar field computed with 8192 XK7 nodes.
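The overlap mechanism named above can be illustrated with a minimal OpenMP 4.5 target region carrying the NOWAIT clause; the arrays and arithmetic below are placeholders, not the paper's CCD kernels or data layout.

```fortran
! Minimal illustration of overlapping device and host work with OpenMP 4.5:
! an asynchronous TARGET region (NOWAIT) runs on the accelerator while the
! host proceeds, and TASKWAIT synchronizes before the results are combined.
program target_nowait_sketch
  implicit none
  integer, parameter :: n = 1000000
  real(kind=8), allocatable :: a(:), b(:)
  integer :: i
  allocate (a(n), b(n))
  a = 1.0d0
  b = 1.0d0
  !$omp target teams distribute parallel do map(tofrom: a) nowait
  do i = 1, n
     a(i) = 2.0d0*a(i)               ! "device" part of the update (placeholder)
  end do
  !$omp end target teams distribute parallel do
  do i = 1, n
     b(i) = b(i) + 1.0d0             ! host work (and, in the real code, communication)
  end do
  !$omp taskwait                     ! wait for the asynchronous target region
  print *, 'a(1)+b(1) =', a(1) + b(1)
  deallocate (a, b)
end program target_nowait_sketch
```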
Efficient multitasking of Choleski matrix factorization on CRAY supercomputers
NASA Technical Reports Server (NTRS)
Overman, Andrea L.; Poole, Eugene L.
1991-01-01
A Choleski method is described and used to solve linear systems of equations that arise in large scale structural analysis. The method uses a novel variable-band storage scheme and is structured to exploit fast local memory caches while minimizing data access delays between main memory and vector registers. Several parallel implementations of this method are described for the CRAY-2 and CRAY Y-MP computers demonstrating the use of microtasking and autotasking directives. A portable parallel language, FORCE, is used for comparison with the microtasked and autotasked implementations. Results are presented comparing the matrix factorization times for three representative structural analysis problems from runs made in both dedicated and multi-user modes on both computers. CPU and wall clock timings are given for the parallel implementations and are compared to single processor timings of the same algorithm.
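For reference, the basic operation being multitasked is a column-oriented Cholesky factorization; the sketch below shows only the dense, in-place form, without the paper's variable-band storage scheme or cache blocking.

```fortran
! Column-oriented Cholesky factorization A = L*L**T, overwriting the lower
! triangle of a dense symmetric positive definite matrix.
subroutine cholesky_lower(n, a)
  implicit none
  integer, intent(in) :: n
  real(kind=8), intent(inout) :: a(n, n)
  integer :: j, k
  do j = 1, n
     do k = 1, j - 1                     ! update column j with previously computed columns
        a(j:n, j) = a(j:n, j) - a(j, k)*a(j:n, k)
     end do
     a(j, j) = sqrt(a(j, j))
     a(j+1:n, j) = a(j+1:n, j)/a(j, j)   ! scale the subdiagonal of column j
  end do
end subroutine cholesky_lower
```

The update loop over k carries almost all of the arithmetic, which is why parallel variants concentrate on distributing that work while respecting the column dependencies.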
Particle simulation on heterogeneous distributed supercomputers
NASA Technical Reports Server (NTRS)
Becker, Jeffrey C.; Dagum, Leonardo
1993-01-01
We describe the implementation and performance of a three dimensional particle simulation distributed between a Thinking Machines CM-2 and a Cray Y-MP. These are connected by a combination of two high-speed networks: a high-performance parallel interface (HIPPI) and an optical network (UltraNet). This is the first application to use this configuration at NASA Ames Research Center. We describe our experience implementing and using the application and report the results of several timing measurements. We show that the distribution of applications across disparate supercomputing platforms is feasible and has reasonable performance. In addition, several practical aspects of the computing environment are discussed.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Doerfler, Douglas; Austin, Brian; Cook, Brandon
There are many potential issues associated with deploying the Intel Xeon Phi™ (code named Knights Landing [KNL]) manycore processor in a large-scale supercomputer. One in particular is the ability to fully utilize the high-speed communications network, given that the serial performance of a Xeon Phi™ core is a fraction of a Xeon® core. In this paper, we take a look at the trade-offs associated with allocating enough cores to fully utilize the Aries high-speed network versus cores dedicated to computation, e.g., the trade-off between MPI and OpenMP. In addition, we evaluate new features of Cray MPI in support of KNL, such as internode optimizations. We also evaluate one-sided programming models such as Unified Parallel C. We quantify the impact of the above trade-offs and features using a suite of National Energy Research Scientific Computing Center applications.
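A minimal hybrid MPI/OpenMP skeleton of the kind whose rank-versus-thread balance is being tuned is sketched below; it is generic, not code from the study, and the work inside the loop is a placeholder.

```fortran
! Skeleton of a hybrid MPI + OpenMP decomposition: OpenMP threads keep the
! cores of a rank busy, MPI handles the internode communication.  The balance
! of ranks per node versus threads per rank is the trade-off discussed above.
program hybrid_mpi_openmp_sketch
  use mpi
  use omp_lib
  implicit none
  integer :: ierr, provided, rank, nranks, nthreads, i
  real(kind=8) :: local_sum, global_sum
  call mpi_init_thread(mpi_thread_funneled, provided, ierr)
  call mpi_comm_rank(mpi_comm_world, rank, ierr)
  call mpi_comm_size(mpi_comm_world, nranks, ierr)
  nthreads = omp_get_max_threads()
  local_sum = 0.0d0
  !$omp parallel do reduction(+:local_sum)   ! threaded compute phase within a rank
  do i = 1, 1000000
     local_sum = local_sum + 1.0d0           ! placeholder for real work
  end do
  !$omp end parallel do
  call mpi_allreduce(local_sum, global_sum, 1, mpi_double_precision, mpi_sum, &
                     mpi_comm_world, ierr)   ! internode part; fewer ranks, larger messages
  if (rank == 0) print *, nranks, 'ranks x', nthreads, 'threads, sum =', global_sum
  call mpi_finalize(ierr)
end program hybrid_mpi_openmp_sketch
```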
A vectorized Lanczos eigensolver for high-performance computers
NASA Technical Reports Server (NTRS)
Bostic, Susan W.
1990-01-01
The computational strategies used to implement a Lanczos-based-method eigensolver on the latest generation of supercomputers are described. Several examples of structural vibration and buckling problems are presented that show the effects of using optimization techniques to increase the vectorization of the computational steps. The data storage and access schemes and the tools and strategies that best exploit the computer resources are presented. The method is implemented on the Convex C220, the Cray 2, and the Cray Y-MP computers. Results show that very good computation rates are achieved for the most computationally intensive steps of the Lanczos algorithm and that the Lanczos algorithm is many times faster than other methods extensively used in the past.
Early MIMD experience on the CRAY X-MP
NASA Astrophysics Data System (ADS)
Rhoades, Clifford E.; Stevens, K. G.
1985-07-01
This paper describes some early experience with converting four physics simulation programs to the CRAY X-MP, a current Multiple Instruction, Multiple Data (MIMD) computer consisting of two processors, each with an architecture similar to that of the CRAY-1. As a multi-processor, the CRAY X-MP together with the high-speed Solid-state Storage Device (SSD) is an ideal machine upon which to study MIMD algorithms for solving the equations of mathematical physics because it is fast enough to run real problems. The computer programs used in this study are all FORTRAN versions of original production codes. They range in sophistication from a one-dimensional numerical simulation of collisionless plasma to a two-dimensional hydrodynamics code with heat flow to a couple of three-dimensional fluid dynamics codes with varying degrees of viscous modeling. Early research with a dual-processor configuration has shown speed-ups ranging from 1.55 to 1.98. It has been observed that a few simple extensions to FORTRAN allow a typical programmer to achieve a remarkable level of efficiency. These extensions involve the concept of memory local to a concurrent subprogram and memory common to all concurrent subprograms.
NASA Astrophysics Data System (ADS)
Kjærgaard, Thomas; Baudin, Pablo; Bykov, Dmytro; Eriksen, Janus Juul; Ettenhuber, Patrick; Kristensen, Kasper; Larkin, Jeff; Liakh, Dmitry; Pawłowski, Filip; Vose, Aaron; Wang, Yang Min; Jørgensen, Poul
2017-03-01
We present a scalable cross-platform hybrid MPI/OpenMP/OpenACC implementation of the Divide-Expand-Consolidate (DEC) formalism with portable performance on heterogeneous HPC architectures. The Divide-Expand-Consolidate formalism is designed to reduce the steep computational scaling of conventional many-body methods employed in electronic structure theory to linear scaling, while providing a simple mechanism for controlling the error introduced by this approximation. Our massively parallel implementation of this general scheme has three levels of parallelism, being a hybrid of the loosely coupled task-based parallelization approach and the conventional MPI +X programming model, where X is either OpenMP or OpenACC. We demonstrate strong and weak scalability of this implementation on heterogeneous HPC systems, namely on the GPU-based Cray XK7 Titan supercomputer at the Oak Ridge National Laboratory. Using the "resolution of the identity second-order Møller-Plesset perturbation theory" (RI-MP2) as the physical model for simulating correlated electron motion, the linear-scaling DEC implementation is applied to 1-aza-adamantane-trione (AAT) supramolecular wires containing up to 40 monomers (2440 atoms, 6800 correlated electrons, 24 440 basis functions and 91 280 auxiliary functions). This represents the largest molecular system treated at the MP2 level of theory, demonstrating an efficient removal of the scaling wall pertinent to conventional quantum many-body methods.
High performance computing applications in neurobiological research
NASA Technical Reports Server (NTRS)
Ross, Muriel D.; Cheng, Rei; Doshay, David G.; Linton, Samuel W.; Montgomery, Kevin; Parnas, Bruce R.
1994-01-01
The human nervous system is a massively parallel processor of information. The vast numbers of neurons, synapses, and circuits are daunting to those seeking to understand the neural basis of consciousness and intellect. Pervading obstacles are the lack of knowledge of the detailed, three-dimensional (3-D) organization of even a simple neural system and the paucity of large-scale, biologically relevant computer simulations. We use high-performance graphics workstations and supercomputers to study the 3-D organization of gravity sensors as a prototype architecture foreshadowing more complex systems. Scaled-down simulations run on a Silicon Graphics workstation and scaled-up, three-dimensional versions run on the Cray Y-MP and CM5 supercomputers.
NASA Technical Reports Server (NTRS)
Gentzsch, W.
1982-01-01
Problems which can arise with vector and parallel computers are discussed in a user-oriented context. Emphasis is placed on the algorithms used and the programming techniques adopted. Three recently developed supercomputers are examined and typical application examples are given in CRAY FORTRAN, CYBER 205 FORTRAN, and DAP (distributed array processor) FORTRAN. The systems' performance is compared. The addition of parts of two N x N arrays is considered. The influence of the architecture on the algorithms and programming language is demonstrated. Numerical analysis of magnetohydrodynamic differential equations by an explicit difference method is illustrated, showing very good results for all three systems. The prognosis for supercomputer development is assessed.
Understanding the Cray X1 System
NASA Technical Reports Server (NTRS)
Cheung, Samson
2004-01-01
This paper helps the reader understand the characteristics of the Cray X1 vector supercomputer system, and provides hints and information to enable the reader to port codes to the system. It provides a comparison between the basic performance of the X1 platform and other platforms that are available at NASA Ames Research Center. A set of codes, solving the Laplacian equation with different parallel paradigms, is used to understand some features of the X1 compiler. An example code from the NAS Parallel Benchmarks is used to demonstrate performance optimization on the X1 platform.
Edison - A New Cray Supercomputer Advances Discovery at NERSC
DOE Office of Scientific and Technical Information (OSTI.GOV)
Dosanjh, Sudip; Parkinson, Dula; Yelick, Kathy
2014-02-06
When a supercomputing center installs a new system, users are invited to make heavy use of the computer as part of the rigorous testing. In this video, find out what top scientists have discovered using Edison, a Cray XC30 supercomputer, and how NERSC's newest supercomputer will accelerate their future research.
Edison - A New Cray Supercomputer Advances Discovery at NERSC
Dosanjh, Sudip; Parkinson, Dula; Yelick, Kathy; Trebotich, David; Broughton, Jeff; Antypas, Katie; Lukic, Zarija; Borrill, Julian; Draney, Brent; Chen, Jackie
2018-01-16
When a supercomputing center installs a new system, users are invited to make heavy use of the computer as part of the rigorous testing. In this video, find out what top scientists have discovered using Edison, a Cray XC30 supercomputer, and how NERSC's newest supercomputer will accelerate their future research.
Parallel-vector out-of-core equation solver for computational mechanics
NASA Technical Reports Server (NTRS)
Qin, J.; Agarwal, T. K.; Storaasli, O. O.; Nguyen, D. T.; Baddourah, M. A.
1993-01-01
A parallel/vector out-of-core equation solver is developed for shared-memory computers, such as the Cray Y-MP machine. The input/output (I/O) time is reduced by using the asynchronous BUFFER IN and BUFFER OUT statements, which can be executed simultaneously with the CPU instructions. The parallel and vector capability provided by the supercomputers is also exploited to enhance the performance. Numerical applications in large-scale structural analysis are given to demonstrate the efficiency of the present out-of-core solver.
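Cray's BUFFER IN and BUFFER OUT statements are nonstandard; the sketch below expresses the same overlap of I/O and computation with standard Fortran asynchronous I/O (the ASYNCHRONOUS= specifier plus WAIT). The file name, block sizes, and the work routine are placeholders, not the solver's actual layout.

```fortran
! Overlapping I/O with computation in the spirit of BUFFER IN / BUFFER OUT:
! the next block is prefetched asynchronously while the CPU works on the
! block already in memory.
program async_io_sketch
  implicit none
  integer, parameter :: nblk = 4, blksz = 1024
  real(kind=8), asynchronous :: buf_in(blksz)
  real(kind=8) :: work(blksz)
  integer :: iounit, iblk, req
  open (newunit=iounit, file='stiffness_blocks.dat', access='stream', &
        form='unformatted', asynchronous='yes', status='old', action='read')
  read (iounit, asynchronous='yes', id=req) buf_in        ! start reading block 1
  do iblk = 1, nblk
     wait (unit=iounit, id=req)                           ! block until the read completes
     work = buf_in
     if (iblk < nblk) then
        read (iounit, asynchronous='yes', id=req) buf_in  ! prefetch the next block ...
     end if
     call factor_block(work, blksz)                       ! ... while the CPU works on this one
  end do
  close (iounit)
contains
  subroutine factor_block(x, m)
    integer, intent(in) :: m
    real(kind=8), intent(inout) :: x(m)
    x = 2.0d0*x                                           ! placeholder for the real work
  end subroutine factor_block
end program async_io_sketch
```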
Performance of the fusion code GYRO on four generations of Cray computers
DOE Office of Scientific and Technical Information (OSTI.GOV)
Fahey, Mark R
2014-01-01
GYRO is a code used for the direct numerical simulation of plasma microturbulence. It has been ported to a variety of modern MPP platforms including several modern commodity clusters, IBM SPs, and Cray XC, XT, and XE series machines. We briefly describe the mathematical structure of the equations, the data layout, and the redistribution scheme. Also, while the performance and scaling of GYRO on many of these systems has been shown before, here we show the comparative performance and scaling on four generations of Cray supercomputers including the newest addition - the Cray XC30. The more recently added hybrid OpenMP/MPI implementation also shows a great deal of promise on custom HPC systems that utilize fast CPUs and proprietary interconnects. Four machines of varying sizes were used in the experiment, all of which are located at the National Institute for Computational Sciences at the University of Tennessee at Knoxville and Oak Ridge National Laboratory. The advantages, limitations, and performance of using each system are discussed.
New computing systems and their impact on structural analysis and design
NASA Technical Reports Server (NTRS)
Noor, Ahmed K.
1989-01-01
A review is given of the recent advances in computer technology that are likely to impact structural analysis and design. The computational needs for future structures technology are described. The characteristics of new and projected computing systems are summarized. Advances in programming environments, numerical algorithms, and computational strategies for new computing systems are reviewed, and a novel partitioning strategy is outlined for maximizing the degree of parallelism. The strategy is designed for computers with a shared memory and a small number of powerful processors (or a small number of clusters of medium-range processors). It is based on approximating the response of the structure by a combination of symmetric and antisymmetric response vectors, each obtained using a fraction of the degrees of freedom of the original finite element model. The strategy was implemented on the CRAY X-MP/4 and the Alliant FX/8 computers. For nonlinear dynamic problems on the CRAY X-MP with four CPUs, it resulted in an order of magnitude reduction in total analysis time, compared with the direct analysis on a single-CPU CRAY X-MP machine.
Kjaergaard, Thomas; Baudin, Pablo; Bykov, Dmytro; ...
2016-11-16
Here, we present a scalable cross-platform hybrid MPI/OpenMP/OpenACC implementation of the Divide–Expand–Consolidate (DEC) formalism with portable performance on heterogeneous HPC architectures. The Divide–Expand–Consolidate formalism is designed to reduce the steep computational scaling of conventional many-body methods employed in electronic structure theory to linear scaling, while providing a simple mechanism for controlling the error introduced by this approximation. Our massively parallel implementation of this general scheme has three levels of parallelism, being a hybrid of the loosely coupled task-based parallelization approach and the conventional MPI +X programming model, where X is either OpenMP or OpenACC. We demonstrate strong and weak scalability of this implementation on heterogeneous HPC systems, namely on the GPU-based Cray XK7 Titan supercomputer at the Oak Ridge National Laboratory. Using the “resolution of the identity second-order Møller–Plesset perturbation theory” (RI-MP2) as the physical model for simulating correlated electron motion, the linear-scaling DEC implementation is applied to 1-aza-adamantane-trione (AAT) supramolecular wires containing up to 40 monomers (2440 atoms, 6800 correlated electrons, 24 440 basis functions and 91 280 auxiliary functions). This represents the largest molecular system treated at the MP2 level of theory, demonstrating an efficient removal of the scaling wall pertinent to conventional quantum many-body methods.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Wolfe, A.
1986-03-10
Supercomputing software is moving into high gear, spurred by the rapid spread of supercomputers into new applications. The critical challenge is how to develop tools that will make it easier for programmers to write applications that take advantage of vectorizing in the classical supercomputer and the parallelism that is emerging in supercomputers and minisupercomputers. Writing parallel software is a challenge that every programmer must face because parallel architectures are springing up across the range of computing. Cray is developing a host of tools for programmers. Tools to support multitasking (in supercomputer parlance, multitasking means dividing up a single program to run on multiple processors) are high on Cray's agenda. On tap for multitasking is Premult, dubbed a microtasking tool. As a preprocessor for Cray's CFT77 FORTRAN compiler, Premult will provide fine-grain multitasking.
Reorientation of rotating fluid in microgravity environment with and without gravity jitters
NASA Technical Reports Server (NTRS)
Hung, R. J.; Lee, C. C.; Shyu, K. L.
1990-01-01
In a spacecraft design, the requirements for settled propellant are different for tank pressurization, engine restart, venting, or propellant transfer. The requirement to settle or to position liquid fuel over the outlet end of the spacecraft propellant tank prior to main engine restart poses a microgravity fluid behavior problem. In this paper, the dynamical behavior of liquid propellant, fluid reorientation, and propellant resettling have been studied through simulations of fluid management in a microgravity environment executed on a CRAY X-MP supercomputer. Results show that the resettlement of fluid can be accomplished more efficiently for fluid in a rotating tank than in a nonrotating tank, and that settlement performance is better with gravity jitters imposed than without, based on the amount of time needed to carry out resettlement between the initiation and termination of geysering.
2012-02-10
Then and Now: These images illustrate the dramatic improvement in NASA computing power over the last 23 years, and its effect on the number of grid points used for flow simulations. At left, an image from the first full-body Navier-Stokes simulation (1988) of an F-16 fighter jet showing pressure on the aircraft body, and fore-body streamlines at Mach 0.90. This steady-state solution took 25 hours using a single Cray X-MP processor to solve the 500,000 grid-point problem. Investigator: Neal Chaderjian, NASA Ames Research Center. At right, a 2011 snapshot from a Navier-Stokes simulation of a V-22 Osprey rotorcraft in hover. The blade vortices interact with the smaller turbulent structures. This very detailed simulation used 660 million grid points, and ran on 1536 processors on the Pleiades supercomputer for 180 hours. Investigator: Neal Chaderjian, NASA Ames Research Center; Image: Tim Sandstrom, NASA Ames Research Center
NASA Technical Reports Server (NTRS)
Kramer, Williams T. C.; Simon, Horst D.
1994-01-01
This tutorial proposes to be a practical guide for the uninitiated to the main topics and themes of high-performance computing (HPC), with particular emphasis on distributed computing. The intent is first to provide some guidance and directions in the rapidly increasing field of scientific computing using both massively parallel and traditional supercomputers. Because of their considerable potential computational power, loosely or tightly coupled clusters of workstations are increasingly considered as a third alternative to both the more conventional supercomputers based on a small number of powerful vector processors and the highly parallel massively parallel processors. Even though many research issues concerning the effective use of workstation clusters and their integration into a large-scale production facility are still unresolved, such clusters are already used for production computing. In this tutorial we will utilize the unique experience gained at the NAS facility at NASA Ames Research Center. Over the last five years at NAS, massively parallel supercomputers such as the Connection Machines CM-2 and CM-5 from Thinking Machines Corporation and the iPSC/860 (Touchstone Gamma Machine) and Paragon machines from Intel were used in a production supercomputer center alongside traditional vector supercomputers such as the Cray Y-MP and C90.
User's and test case manual for FEMATS
NASA Technical Reports Server (NTRS)
Chatterjee, Arindam; Volakis, John; Nurnberger, Mike; Natzke, John
1995-01-01
The FEMATS program incorporates first-order edge-based finite elements and vector absorbing boundary conditions into the scattered field formulation for computation of the scattering from three-dimensional geometries. The code has been validated extensively for a large class of geometries containing inhomogeneities and satisfying transition conditions. For geometries that are too large for the workstation environment, the FEMATS code has been optimized to run on various supercomputers. Currently, FEMATS has been configured to run on the HP 9000 workstation, vectorized for the Cray Y-MP, and parallelized to run on the Kendall Square Research (KSR) architecture and the Intel Paragon.
Multitasking the three-dimensional shock wave code CTH on the Cray X-MP/416
DOE Office of Scientific and Technical Information (OSTI.GOV)
McGlaun, J.M.; Thompson, S.L.
1988-01-01
CTH is a software system under development at Sandia National Laboratories Albuquerque that models multidimensional, multi-material, large-deformation, strong shock wave physics. CTH was carefully designed to both vectorize and multitask on the Cray X-MP/416. All of the physics routines are vectorized except the thermodynamics and the interface tracer. All of the physics routines are multitasked except the boundary conditions. The Los Alamos National Laboratory multitasking library was used for the multitasking. The resulting code is easy to maintain, easy to understand, gives the same answers as the unitasked code, and achieves a measured speedup of approximately 3.5 on the four-CPU Cray. This document discusses the design, prototyping, development, and debugging of CTH. It also covers the architecture features of CTH that enhance multitasking, granularity of the tasks, and synchronization of tasks. The utility of system software and utilities such as simulators and interactive debuggers is also discussed. 5 refs., 7 tabs.
Using a Cray Y-MP as an array processor for a RISC Workstation
NASA Technical Reports Server (NTRS)
Lamaster, Hugh; Rogallo, Sarah J.
1992-01-01
As microprocessors increase in power, the economics of centralized computing has changed dramatically. At the beginning of the 1980's, mainframes and supercomputers were often considered to be cost-effective machines for scalar computing. Today, microprocessor-based RISC (reduced-instruction-set computer) systems have displaced many uses of mainframes and supercomputers. Supercomputers are still cost competitive when processing jobs that require both large memory size and high memory bandwidth. One such application is array processing. Certain numerical operations are appropriate to use in a Remote Procedure Call (RPC)-based environment. Matrix multiplication is an example of an operation that can have a sufficient number of arithmetic operations to amortize the cost of an RPC call. An experiment which demonstrates that matrix multiplication can be executed remotely on a large system to speed the execution over that experienced on a workstation is described.
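The amortization argument rests on the usual flops-to-data ratio of matrix multiplication (a standard count, not a figure from the study): multiplying two n x n matrices transfers about 3n^2 words (two operands and the result) while performing about 2n^3 arithmetic operations, so

$$\frac{\text{operations}}{\text{words transferred}} \approx \frac{2 n^{3}}{3 n^{2}} = \frac{2n}{3},$$

roughly 667 operations per word for n = 1000, which is ample work to hide a fixed per-call RPC overhead.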
1990-11-12
This feature prevents any significant unexpected and undesired size overhead introduced by the automatic inlining of a called subprogram. Any... PRESERVELAYOUT forces the 5.5.1 compiler to maintain the Ada source order of a given record type, thereby preventing the compiler from performing this... Environment, Volume 2: Programming Guide... assignments to the copied array in Ada do not affect the Fortran version of the array. The dimensions and order of...
Y-MP floating point and Cholesky factorization
NASA Technical Reports Server (NTRS)
Carter, Russell
1991-01-01
The floating point arithmetics implemented in the Cray 2 and Cray Y-MP computer systems are nearly identical, but large scale computations performed on the two systems have exhibited significant differences in accuracy. The difference in accuracy is analyzed for Cholesky factorization algorithm, and it is found that the source of the difference is the subtract magnitude operation of the Cray Y-MP. The results from numerical experiments for a range of problem sizes are presented, and an efficient method for improving the accuracy of the factorization obtained on the Y-MP is presented.
Application of high-performance computing to numerical simulation of human movement
NASA Technical Reports Server (NTRS)
Anderson, F. C.; Ziegler, J. M.; Pandy, M. G.; Whalen, R. T.
1995-01-01
We have examined the feasibility of using massively-parallel and vector-processing supercomputers to solve large-scale optimization problems for human movement. Specifically, we compared the computational expense of determining the optimal controls for the single support phase of gait using a conventional serial machine (SGI Iris 4D25), a MIMD parallel machine (Intel iPSC/860), and a parallel-vector-processing machine (Cray Y-MP 8/864). With the human body modeled as a 14 degree-of-freedom linkage actuated by 46 musculotendinous units, computation of the optimal controls for gait could take up to 3 months of CPU time on the Iris. Both the Cray and the Intel are able to reduce this time to practical levels. The optimal solution for gait can be found with about 77 hours of CPU on the Cray and with about 88 hours of CPU on the Intel. Although the overall speeds of the Cray and the Intel were found to be similar, the unique capabilities of each machine are better suited to different portions of the computational algorithm used. The Intel was best suited to computing the derivatives of the performance criterion and the constraints whereas the Cray was best suited to parameter optimization of the controls. These results suggest that the ideal computer architecture for solving very large-scale optimal control problems is a hybrid system in which a vector-processing machine is integrated into the communication network of a MIMD parallel machine.
A Performance Evaluation of the Cray X1 for Scientific Applications
NASA Technical Reports Server (NTRS)
Oliker, Leonid; Biswas, Rupak; Borrill, Julian; Canning, Andrew; Carter, Jonathan; Djomehri, M. Jahed; Shan, Hongzhang; Skinner, David
2004-01-01
The last decade has witnessed a rapid proliferation of superscalar cache-based microprocessors used to build high-end computing systems, largely because of their cost effectiveness. However, the recent development of massively parallel vector systems is having a significant effect on the supercomputing landscape. In this paper, we compare the performance of the recently released Cray X1 vector system with that of the cacheless NEC SX-6 vector machine, and the superscalar cache-based IBM Power3 and Power4 architectures, for scientific applications. Overall results demonstrate that the X1 is quite promising, but performance improvements are expected as the hardware, systems software, and numerical libraries mature. Code reengineering to effectively utilize the complex architecture may also lead to significant efficiency enhancements.
The International Conference on Vector and Parallel Computing (2nd)
1989-01-17
Computation of the SVD of Bidiagonal Matrices ... Lattice QCD as a Large Scale Scientific Computation ... vectorized for the IBM 3090 Vector Facility. In addition, elapsed times have been reduced by using 3090 ... benchmarked Lattice QCD on a large number of computers: Cray X-MP and Cray 2 (vector
Exploiting Thread Parallelism for Ocean Modeling on Cray XC Supercomputers
DOE Office of Scientific and Technical Information (OSTI.GOV)
Sarje, Abhinav; Jacobsen, Douglas W.; Williams, Samuel W.
The incorporation of increasing core counts in modern processors used to build state-of-the-art supercomputers is driving application development towards the exploitation of thread parallelism, in addition to distributed memory parallelism, with the goal of delivering efficient high-performance codes. In this work we describe the exploitation of thread parallelism and our experiences applying it to a real-world ocean modeling application code, MPAS-Ocean. We present detailed performance analysis and comparisons of various approaches and configurations for threading on the Cray XC series supercomputers.
Performance Evaluation of Supercomputers using HPCC and IMB Benchmarks
NASA Technical Reports Server (NTRS)
Saini, Subhash; Ciotti, Robert; Gunney, Brian T. N.; Spelce, Thomas E.; Koniges, Alice; Dossa, Don; Adamidis, Panagiotis; Rabenseifner, Rolf; Tiyyagura, Sunil R.; Mueller, Matthias;
2006-01-01
The HPC Challenge (HPCC) benchmark suite and the Intel MPI Benchmark (IMB) are used to compare and evaluate the combined performance of processor, memory subsystem and interconnect fabric of five leading supercomputers - SGI Altix BX2, Cray X1, Cray Opteron Cluster, Dell Xeon cluster, and NEC SX-8. These five systems use five different networks (SGI NUMALINK4, Cray network, Myrinet, InfiniBand, and NEC IXS). The complete set of HPCC benchmarks is run on each of these systems. Additionally, we present Intel MPI Benchmark (IMB) results to study the performance of 11 MPI communication functions on these systems.
Scalability of Parallel Spatial Direct Numerical Simulations on Intel Hypercube and IBM SP1 and SP2
NASA Technical Reports Server (NTRS)
Joslin, Ronald D.; Hanebutte, Ulf R.; Zubair, Mohammad
1995-01-01
The implementation and performance of a parallel spatial direct numerical simulation (PSDNS) approach on the Intel iPSC/860 hypercube and IBM SP1 and SP2 parallel computers is documented. Spatially evolving disturbances associated with the laminar-to-turbulent transition in boundary-layer flows are computed with the PSDNS code. The feasibility of using the PSDNS to perform transition studies on these computers is examined. The results indicate that the PSDNS approach can effectively be parallelized on a distributed-memory parallel machine by remapping the distributed data structure during the course of the calculation. Scalability information is provided to estimate computational costs and to match the actual costs relative to changes in the number of grid points. As the number of processors increases, slower-than-linear speedups are achieved with optimized (machine-dependent library) routines, because the computational cost is dominated by the FFT routine, which yields less than ideal speedups. By using appropriate compile options and optimized library routines on the SP1, the serial code achieves 52-56 Mflops on a single node of the SP1 (45 percent of theoretical peak performance). The actual performance of the PSDNS code on the SP1 is evaluated with a "real world" simulation that consists of 1.7 million grid points. One time step of this simulation is calculated on eight nodes of the SP1 in the same time as required by a Cray Y-MP supercomputer. For the same simulation, 32 nodes of the SP1 and SP2 are required to reach the performance of a Cray C-90. A 32-node SP1 (SP2) configuration is 2.9 (4.6) times faster than a Cray Y-MP for this simulation, while the hypercube is roughly 2 times slower than the Y-MP for this application. KEY WORDS: Spatial direct numerical simulations; incompressible viscous flows; spectral methods; finite differences; parallel computing.
Improved Access to Supercomputers Boosts Chemical Applications.
ERIC Educational Resources Information Center
Borman, Stu
1989-01-01
Supercomputing is described in terms of computing power and abilities. The increase in availability of supercomputers for use in chemical calculations and modeling is reported. Efforts of the National Science Foundation and Cray Research are highlighted. (CW)
A Performance Evaluation of the Cray X1 for Scientific Applications
NASA Technical Reports Server (NTRS)
Oliker, Leonid; Biswas, Rupak; Borrill, Julian; Canning, Andrew; Carter, Jonathan; Djomehri, M. Jahed; Shan, Hongzhang; Skinner, David
2003-01-01
The last decade has witnessed a rapid proliferation of superscalar cache-based microprocessors to build high-end capability and capacity computers because of their generality, scalability, and cost effectiveness. However, the recent development of massively parallel vector systems is having a significant effect on the supercomputing landscape. In this paper, we compare the performance of the recently-released Cray X1 vector system with that of the cacheless NEC SX-6 vector machine, and the superscalar cache-based IBM Power3 and Power4 architectures for scientific applications. Overall results demonstrate that the X1 is quite promising, but performance improvements are expected as the hardware, systems software, and numerical libraries mature. Code reengineering to effectively utilize the complex architecture may also lead to significant efficiency enhancements.
OpenMP Parallelization and Optimization of Graph-Based Machine Learning Algorithms
Meng, Zhaoyi; Koniges, Alice; He, Yun Helen; ...
2016-09-21
In this paper, we investigate the OpenMP parallelization and optimization of two novel data classification algorithms. The new algorithms are based on graph and PDE solution techniques and provide significant accuracy and performance advantages over traditional data classification algorithms in serial mode. The methods leverage the Nystrom extension to calculate eigenvalues/eigenvectors of the graph Laplacian, and this is a self-contained module that can be used in conjunction with other graph-Laplacian-based methods such as spectral clustering. We use performance tools to collect the hotspots and memory access patterns of the serial codes and use OpenMP as the parallelization language to parallelize the most time-consuming parts. Where possible, we also use library routines. We then optimize the OpenMP implementations and detail the performance on traditional supercomputer nodes (in our case a Cray XC30), and test the optimization steps on emerging testbed systems based on Intel’s Knights Corner and Landing processors. We show both performance improvement and strong scaling behavior. Finally, a large number of optimization techniques and analyses are necessary before the algorithm reaches almost ideal scaling.
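As a rough illustration of the graph-Laplacian machinery mentioned above (not the authors' Nystrom-based implementation), the sketch below builds a symmetric normalized graph Laplacian from Gaussian affinities and extracts its leading nontrivial eigenvectors with NumPy; the function name, parameters, and test data are illustrative only.

```python
import numpy as np

def laplacian_embedding(X, sigma=1.0, k=1):
    """Embed points using the k smallest nontrivial eigenvectors of the
    symmetric normalized graph Laplacian built from Gaussian affinities."""
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    W = np.exp(-sq / (2.0 * sigma ** 2))           # affinity (similarity) matrix
    np.fill_diagonal(W, 0.0)
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    # Symmetric normalized Laplacian: L = I - D^{-1/2} W D^{-1/2}
    L = np.eye(len(X)) - (d_inv_sqrt[:, None] * W) * d_inv_sqrt[None, :]
    vals, vecs = np.linalg.eigh(L)                 # eigenvalues in ascending order
    return vecs[:, 1:k + 1]                        # drop the trivial first eigenvector

# Two well-separated point clouds: the embedding coordinate tends to separate them.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.1, (20, 2)), rng.normal(3.0, 0.1, (20, 2))])
emb = laplacian_embedding(X, sigma=0.5)
print(emb[:3, 0], emb[-3:, 0])   # typically opposite signs for the two clouds
```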
NASA Technical Reports Server (NTRS)
Babrauckas, Theresa
2000-01-01
The Affordable High Performance Computing (AHPC) project demonstrated that high-performance computing based on a distributed network of computer workstations is a cost-effective alternative to vector supercomputers for running CPU- and memory-intensive design and analysis tools. The AHPC project created an integrated system called a Network Supercomputer. By connecting computer workstations through a network and utilizing the workstations when they are idle, the resulting distributed-workstation environment has the same performance and reliability levels as the Cray C90 vector supercomputer at less than 25 percent of the C90 cost. In fact, the cost comparison between a Cray C90 supercomputer and Sun workstations showed that the number of networked workstations equivalent to a C90 costs approximately 8 percent of the C90.
Introducing Argonne’s Theta Supercomputer
DOE Office of Scientific and Technical Information (OSTI.GOV)
None
Theta, the Argonne Leadership Computing Facility’s (ALCF) new Intel-Cray supercomputer, is officially open to the research community. Theta’s massively parallel, many-core architecture puts the ALCF on the path to Aurora, the facility’s future Intel-Cray system. Capable of nearly 10 quadrillion calculations per second, Theta enables researchers to break new ground in scientific investigations that range from modeling the inner workings of the brain to developing new materials for renewable energy applications.
Early Experiences Writing Performance Portable OpenMP 4 Codes
DOE Office of Scientific and Technical Information (OSTI.GOV)
Joubert, Wayne; Hernandez, Oscar R
In this paper, we evaluate the recently available directives in OpenMP 4 to parallelize a computational kernel using both the traditional shared-memory approach and the newer accelerator-targeting capabilities. In addition, we explore various transformations that attempt to increase application performance portability, and examine the expressiveness and performance implications of using these approaches. For example, we want to understand whether the target map directives in OpenMP 4 improve data locality when mapped to a shared-memory system, as opposed to the first-touch policy used in traditional OpenMP. To that end, we use recent Cray and Intel compilers to measure the performance variations of a simple application kernel when executed on the OLCF's Titan supercomputer with NVIDIA GPUs and the Beacon system with Intel Xeon Phi accelerators attached. To better understand these trade-offs, we compare our results from traditional OpenMP shared-memory implementations to the newer accelerator programming model when it is used to target both the CPU and an attached heterogeneous device. We believe the results and lessons learned as presented in this paper will be useful to the larger user community by providing guidelines that can assist programmers in the development of performance portable code.
A comparison of five benchmarks
NASA Technical Reports Server (NTRS)
Huss, Janice E.; Pennline, James A.
1987-01-01
Five benchmark programs were obtained and run on the NASA Lewis CRAY X-MP/24. A comparison was made between the program codes and between the methods for calculating performance figures. Several multitasking jobs were run to gain experience in how parallel performance is measured.
WOMBAT: A Scalable and High-performance Astrophysical Magnetohydrodynamics Code
NASA Astrophysics Data System (ADS)
Mendygral, P. J.; Radcliffe, N.; Kandalla, K.; Porter, D.; O'Neill, B. J.; Nolting, C.; Edmon, P.; Donnert, J. M. F.; Jones, T. W.
2017-02-01
We present a new code for astrophysical magnetohydrodynamics specifically designed and optimized for high performance and scaling on modern and future supercomputers. We describe a novel hybrid OpenMP/MPI programming model that emerged from a collaboration between Cray, Inc. and the University of Minnesota. This design utilizes MPI-RMA optimized for thread scaling, which allows the code to run extremely efficiently at very high thread counts ideal for the latest generation of multi-core and many-core architectures. Such performance characteristics are needed in the era of “exascale” computing. We describe and demonstrate our high-performance design in detail with the intent that it may be used as a model for other, future astrophysical codes intended for applications demanding exceptional performance.
Data communication requirements for the advanced NAS network
NASA Technical Reports Server (NTRS)
Levin, Eugene; Eaton, C. K.; Young, Bruce
1986-01-01
The goal of the Numerical Aerodynamic Simulation (NAS) Program is to provide a powerful computational environment for advanced research and development in aeronautics and related disciplines. The present NAS system consists of a Cray 2 supercomputer connected by a data network to a large mass storage system, to sophisticated local graphics workstations, and by remote communications to researchers throughout the United States. The program plan is to continue acquiring the most powerful supercomputers as they become available. In the 1987/1988 time period it is anticipated that a computer with 4 times the processing speed of a Cray 2 will be obtained and by 1990 an additional supercomputer with 16 times the speed of the Cray 2. The implications of this 20-fold increase in processing power on the data communications requirements are described. The analysis was based on models of the projected workload and system architecture. The results are presented together with the estimates of their sensitivity to assumptions inherent in the models.
Optimization of Supercomputer Use on EADS II System
NASA Technical Reports Server (NTRS)
Ahmed, Ardsher
1998-01-01
The main objective of this research was to optimize supercomputer use to achieve better throughput and utilization of supercomputers and to help facilitate the movement of non-supercomputing (inappropriate for supercomputer) codes to mid-range systems for better use of Government resources at Marshall Space Flight Center (MSFC). This work involved the survey of architectures available on EADS II and monitoring customer (user) applications running on a CRAY T90 system.
User's Guide for TOUGH2-MP - A Massively Parallel Version of the TOUGH2 Code
DOE Office of Scientific and Technical Information (OSTI.GOV)
Earth Sciences Division; Zhang, Keni; Zhang, Keni
TOUGH2-MP is a massively parallel (MP) version of the TOUGH2 code, designed for computationally efficient parallel simulation of isothermal and nonisothermal flows of multicomponent, multiphase fluids in one-, two-, and three-dimensional porous and fractured media. In recent years, computational requirements have become increasingly intensive in large or highly nonlinear problems for applications in areas such as radioactive waste disposal, CO2 geological sequestration, environmental assessment and remediation, reservoir engineering, and groundwater hydrology. The primary objective of developing the parallel-simulation capability is to significantly improve the computational performance of the TOUGH2 family of codes. The particular goal for the parallel simulator is to achieve orders-of-magnitude improvement in computational time for models with ever-increasing complexity. TOUGH2-MP is designed to perform parallel simulation on multi-CPU computational platforms. An earlier version of TOUGH2-MP (V1.0) was based on TOUGH2 Version 1.4 with EOS3, EOS9, and T2R3D modules, software previously qualified for applications in the Yucca Mountain project, and was designed for execution on CRAY T3E and IBM SP supercomputers. The current version of TOUGH2-MP (V2.0) includes all fluid property modules of the standard version TOUGH2 V2.0. It provides computationally efficient capabilities using supercomputers, Linux clusters, or multi-core PCs, and also offers many user-friendly features. The parallel simulator inherits all process capabilities from V2.0 together with additional capabilities for handling fractured media from V1.4. This report provides a quick starting guide on how to set up and run the TOUGH2-MP program for users with a basic knowledge of running the standard version of the TOUGH2 code. The report also gives a brief technical description of the code, including a discussion of the parallel methodology, code structure, and the mathematical and numerical methods used. To familiarize users with the parallel code, illustrative sample problems are presented.
LASL benchmark performance 1978. [CDC STAR-100, 6600, 7600, Cyber 73, and CRAY-1]
DOE Office of Scientific and Technical Information (OSTI.GOV)
McKnight, A.L.
1979-08-01
This report presents the results of running several benchmark programs on a CDC STAR-100, a Cray Research CRAY-1, a CDC 6600, a CDC 7600, and a CDC Cyber 73. The benchmark effort included CRAY-1's at several installations running different operating systems and compilers. This benchmark is part of an ongoing program at Los Alamos Scientific Laboratory to collect performance data and monitor the development trend of supercomputers. 3 tables.
Multitasking 3-D forward modeling using high-order finite difference methods on the Cray X-MP/416
DOE Office of Scientific and Technical Information (OSTI.GOV)
Terki-Hassaine, O.; Leiss, E.L.
1988-01-01
The CRAY X-MP/416 was used to multitask 3-D forward modeling by the high-order finite difference method. Flowtrace analysis reveals that the most expensive operation in the unitasked program is a matrix-vector multiplication. The in-core and out-of-core versions of a reentrant subroutine can perform any fraction of the matrix-vector multiplication independently, a pattern compatible with multitasking. The matrix-vector multiplication routine can be distributed over two to four processors. The rest of the program utilizes the microtasking feature that lets the system treat independent iterations of DO-loops as subtasks to be performed by any available processor. The availability of the Solid-State Storage Device (SSD) meant the I/O wait time was virtually zero. A performance study determined a theoretical speedup, taking into account the multitasking overhead. Multitasking programs utilizing both macrotasking and microtasking features obtained actual speedups that were approximately 80% of the ideal speedup.
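The distribution of a matrix-vector product over two to four processors, described above for the Cray X-MP/416, can be mimicked in a very rough way by partitioning the rows into blocks and computing the partial products concurrently. The NumPy/threads sketch below is illustrative only and does not reflect the original Fortran macrotasking code; the block count and names are assumptions.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def matvec_blocked(A, x, n_tasks=4):
    """Row-blocked matrix-vector product; each block is an independent task."""
    n = A.shape[0]
    bounds = np.linspace(0, n, n_tasks + 1, dtype=int)
    y = np.empty(n)

    def work(lo, hi):
        # Each task writes a disjoint slice of y, so no locking is needed.
        y[lo:hi] = A[lo:hi, :] @ x

    with ThreadPoolExecutor(max_workers=n_tasks) as pool:
        futures = [pool.submit(work, lo, hi)
                   for lo, hi in zip(bounds[:-1], bounds[1:])]
        for f in futures:
            f.result()   # propagate any exception from a worker
    return y

A = np.random.default_rng(2).standard_normal((2000, 2000))
x = np.ones(2000)
print(np.allclose(matvec_blocked(A, x), A @ x))
```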
Experiences From NASA/Langley's DMSS Project
NASA Technical Reports Server (NTRS)
1996-01-01
There is a trend in institutions with high performance computing and data management requirements to explore mass storage systems with peripherals directly attached to a high speed network. The Distributed Mass Storage System (DMSS) Project at the NASA Langley Research Center (LaRC) has placed such a system into production use. This paper will present the experiences, both good and bad, we have had with this system since putting it into production use. The system is comprised of: 1) National Storage Laboratory (NSL)/UniTree 2.1, 2) IBM 9570 HIPPI-attached disk arrays (both RAID 3 and RAID 5), 3) an IBM RS6000 server, 4) HIPPI/IPI3 third-party transfers between the disk array systems and the supercomputer clients, a CRAY Y-MP and a CRAY 2, 5) a "warm spare" file server, 6) transition software to convert from CRAY's Data Migration Facility (DMF) based system to DMSS, 7) an NSC PS32 HIPPI switch, and 8) an STK 4490 robotic library accessed from the IBM RS6000 block mux interface. This paper will cover: the performance of the DMSS in the areas of file transfer rates, migration and recall, and file manipulation (listing, deleting, etc.); the appropriateness of a workstation class of file server for NSL/UniTree with LaRC's present storage requirements in mind; the role of the third-party transfers between the supercomputers and the DMSS disk array systems; a detailed comparison (both in performance and functionality) between the DMF and DMSS systems; LaRC's enhancements to the NSL/UniTree system administration environment; the mechanism for DMSS to provide file server redundancy; statistics on the availability of DMSS; and the design of, and experiences with, the locally developed transparent transition software, which allowed us to make over 1.5 million DMF files available to NSL/UniTree with minimal system outage.
WOMBAT: A Scalable and High-performance Astrophysical Magnetohydrodynamics Code
DOE Office of Scientific and Technical Information (OSTI.GOV)
Mendygral, P. J.; Radcliffe, N.; Kandalla, K.
2017-02-01
We present a new code for astrophysical magnetohydrodynamics specifically designed and optimized for high performance and scaling on modern and future supercomputers. We describe a novel hybrid OpenMP/MPI programming model that emerged from a collaboration between Cray, Inc. and the University of Minnesota. This design utilizes MPI-RMA optimized for thread scaling, which allows the code to run extremely efficiently at very high thread counts ideal for the latest generation of multi-core and many-core architectures. Such performance characteristics are needed in the era of “exascale” computing. We describe and demonstrate our high-performance design in detail with the intent that it may be used as a model for other, future astrophysical codes intended for applications demanding exceptional performance.
NASA Technical Reports Server (NTRS)
Tennille, Geoffrey M.; Howser, Lona M.
1993-01-01
The use of the CONVEX computers that are an integral part of the Supercomputing Network Subsystems (SNS) of the Central Scientific Computing Complex of LaRC is briefly described. Features of the CONVEX computers that differ significantly from the CRAY supercomputers are covered, including: FORTRAN, C, the architecture of the CONVEX computers, the CONVEX environment, batch job submittal, debugging, performance analysis, utilities unique to CONVEX, and documentation. This revision reflects the addition of the Applications Compiler and the X-based debugger, CXdb. The document is intended for all CONVEX users as a ready reference to frequently asked questions and to more detailed information contained in the vendor manuals. It is appropriate for both the novice and the experienced user.
Experiences and results multitasking a hydrodynamics code on global and local memory machines
DOE Office of Scientific and Technical Information (OSTI.GOV)
Mandell, D.
1987-01-01
A one-dimensional, time-dependent Lagrangian hydrodynamics code using a Godunov solution method has been multitasked for the Cray X-MP/48, the Intel iPSC hypercube, the Alliant FX series, and the IBM RP3 computers. Actual multitasking results have been obtained for the Cray, Intel, and Alliant computers, and simulated results were obtained for the Cray and RP3 machines. The differences in the methods required to multitask on each of the machines are discussed. Results are presented for a sample problem involving a shock wave moving down a channel. Comparisons are made between theoretical speedups, predicted by Amdahl's law, and the actual speedups obtained. The problems of debugging on the different machines are also described.
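Amdahl's law, used above to bound the theoretical speedups, is easy to state: if a fraction p of the work parallelizes perfectly over n processors, the speedup is 1 / ((1 - p) + p / n). The sketch below uses illustrative numbers only, not figures from the study.

```python
def amdahl_speedup(p, n):
    """Ideal speedup for parallel fraction p on n processors (Amdahl's law)."""
    return 1.0 / ((1.0 - p) + p / n)

# Illustrative: a code that is 90% parallel tops out well below n,
# even on a 4-processor machine of the X-MP/48 class.
for n in (2, 4, 8):
    print(n, round(amdahl_speedup(0.90, n), 2))
```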
Scalable Vector Media-processors for Embedded Systems
2002-05-01
Set Architecture for Multimedia ... "When you do the common things in life in an uncommon way, you will command the attention of the world." George ... Bibliography: [ABHS89] M. August, G. Brost, C. Hsiung, and C. Schiffleger. Cray X-MP: The Birth of a Supercomputer. IEEE Computer, 22(1):45–52, January
Shooting and bouncing rays - Calculating the RCS of an arbitrarily shaped cavity
NASA Technical Reports Server (NTRS)
Ling, Hao; Chou, Ri-Chee; Lee, Shung-Wu
1989-01-01
A ray-shooting approach is presented for calculating the interior radar cross section (RCS) of a partially open cavity. In the problem considered, a dense grid of rays is launched into the cavity through the opening. The rays bounce from the cavity walls based on the laws of geometrical optics and eventually exit the cavity via the aperture. The ray-bouncing method is based on tracking a large number of rays launched into the cavity through the opening and determining the geometrical optics field associated with each ray by taking into consideration (1) the geometrical divergence factor, (2) polarization, and (3) material loading of the cavity walls. A physical optics scheme is then applied to compute the backscattered field from the exit rays. This method is so simple in concept that there is virtually no restriction on the shape or material loading of the cavity. Numerical results obtained by this method are compared with those from a modal analysis of a circular cylinder terminated by a PEC plate. RCS results for an S-bend circular cylinder generated on the Cray X-MP supercomputer show significant RCS reduction. Some of the limitations and possible extensions of this technique are discussed.
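The geometrical-optics bounce at a cavity wall reduces to specular reflection of the ray direction about the local surface normal, r = d - 2(d.n)n. The sketch below is a generic illustration of that single step, not the authors' code, and it ignores polarization, divergence factors, and wall loading.

```python
import numpy as np

def reflect(d, n):
    """Specular reflection of unit direction d about unit surface normal n."""
    return d - 2.0 * np.dot(d, n) * n

# A ray travelling down at 45 degrees hits a wall whose normal points along +y.
d = np.array([1.0, -1.0, 0.0]) / np.sqrt(2.0)
n = np.array([0.0, 1.0, 0.0])
print(reflect(d, n))   # reflected direction, now travelling upward at 45 degrees
```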
Computer aided design of monolithic microwave and millimeter wave integrated circuits and subsystems
NASA Astrophysics Data System (ADS)
Ku, Walter H.
1987-08-01
This interim technical report presents results of research on the computer aided design of monolithic microwave and millimeter wave integrated circuits and subsystems. A specific objective is to extend the state-of-the-art of the Computer Aided Design (CAD) of the monolithic microwave and millimeter wave integrated circuits (MIMIC). In this reporting period, we have derived a new model for the high electron mobility transistor (HEMT) based on a nonlinear charge control formulation which takes into consideration the variation of the 2DEG distance offset from the heterointerface as a function of bias. Pseudomorphic InGaAs/GaAs HEMT devices have been successfully fabricated at UCSD. For a 1 micron gate length, a maximum transconductance of 320 mS/mm was obtained. In cooperation with TRW, devices with 0.15 micron and 0.25 micron gate lengths have been successfully fabricated and tested. New results on the design of ultra-wideband distributed amplifiers using 0.15 micron pseudomorphic InGaAs/GaAs HEMT's have also been obtained. In addition, two-dimensional models of the submicron MESFET's, HEMT's and HBT's are currently being developed for the CRAY X-MP/48 supercomputer. Preliminary results obtained are also presented in this report.
Chemical calculations on Cray computers
NASA Technical Reports Server (NTRS)
Taylor, Peter R.; Bauschlicher, Charles W., Jr.; Schwenke, David W.
1989-01-01
The influence of recent developments in supercomputing on computational chemistry is discussed, with particular reference to Cray computers and their pipelined vector/limited parallel architectures. After reviewing Cray hardware and software, the performance of different elementary program structures is examined, and effective methods for improving program performance are outlined. The computational strategies appropriate for obtaining optimum performance in applications to quantum chemistry and dynamics are discussed. Finally, some discussion is given of new developments and future hardware and software improvements.
Optimal Full Information Synthesis for Flexible Structures Implemented on Cray Supercomputers
NASA Technical Reports Server (NTRS)
Lind, Rick; Balas, Gary J.
1995-01-01
This paper considers an algorithm for the synthesis of optimal controllers for full-information feedback. The synthesis procedure reduces to a single linear matrix inequality, which may be solved via established convex optimization algorithms. The computational cost of the optimization is investigated. It is demonstrated that the problem dimension and corresponding matrices can become large for practical engineering problems, making the algorithm impractical on standard workstations for large-order systems. A flexible structure is presented as a design example. Control synthesis requires several days on a workstation but may be completed in a reasonable amount of time using a Cray supercomputer.
Theoretical research program to study chemical reactions in AOTV bow shock tubes
NASA Technical Reports Server (NTRS)
Taylor, P.
1986-01-01
Progress in the development of computational methods for the characterization of chemical reactions in aerobraking orbit transfer vehicle (AOTV) propulsive flows is reported. Two main areas of code development were undertaken: (1) the implementation of CASSCF (complete active space self-consistent field) and SCF (self-consistent field) analytical first derivatives on the CRAY X-MP; and (2) the installation of the complete set of electronic structure codes on the CRAY 2. In the area of application calculations the main effort was devoted to performing full configuration-interaction calculations and using these results to benchmark other methods. Preprints describing some of the systems studied are included.
NASA Technical Reports Server (NTRS)
Hanebutte, Ulf R.; Joslin, Ronald D.; Zubair, Mohammad
1994-01-01
The implementation and the performance of a parallel spatial direct numerical simulation (PSDNS) code are reported for the IBM SP1 supercomputer. The spatially evolving disturbances that are associated with laminar-to-turbulent transition in three-dimensional boundary-layer flows are computed with the PSDNS code. By remapping the distributed data structure during the course of the calculation, optimized serial library routines can be utilized that substantially increase the computational performance. Although the remapping incurs a high communication penalty, the parallel efficiency of the code remains above 40% for all performed calculations. By using appropriate compile options and optimized library routines, the serial code achieves 52-56 Mflops on a single node of the SP1 (45% of theoretical peak performance). The actual performance of the PSDNS code on the SP1 is evaluated with a 'real world' simulation that consists of 1.7 million grid points. One time step of this simulation is calculated on eight nodes of the SP1 in the same time as required by a Cray Y-MP for the same simulation. The scalability information provides estimated computational costs that match the actual costs relative to changes in the number of grid points.
ORNL Cray X1 evaluation status report
DOE Office of Scientific and Technical Information (OSTI.GOV)
Agarwal, P.K.; Alexander, R.A.; Apra, E.
2004-05-01
On August 15, 2002 the Department of Energy (DOE) selected the Center for Computational Sciences (CCS) at Oak Ridge National Laboratory (ORNL) to deploy a new scalable vector supercomputer architecture for solving important scientific problems in climate, fusion, biology, nanoscale materials and astrophysics. ''This program is one of the first steps in an initiative designed to provide U.S. scientists with the computational power that is essential to 21st century scientific leadership,'' said Dr. Raymond L. Orbach, director of the department's Office of Science. In FY03, CCS procured a 256-processor Cray X1 to evaluate the processors, memory subsystem, scalability of the architecture, and software environment, and to predict the expected sustained performance on key DOE application codes. The results of the micro-benchmarks and kernel benchmarks show the architecture of the Cray X1 to be exceptionally fast for most operations. The best results are shown on large problems, where it is not possible to fit the entire problem into the cache of the processors. These large problems are exactly the types of problems that are important for the DOE and ultra-scale simulation. Application performance is found to be markedly improved by this architecture: - Large-scale simulations of high-temperature superconductors run 25 times faster than on an IBM Power4 cluster using the same number of processors. - Best performance of the parallel ocean program (POP v1.4.3) is 50 percent higher than on Japan's Earth Simulator and 5 times higher than on an IBM Power4 cluster. - A fusion application, global GYRO transport, was found to be 16 times faster on the X1 than on an IBM Power3. The increased performance allowed simulations to fully resolve questions raised by a prior study. - The transport kernel in the AGILE-BOLTZTRAN astrophysics code runs 15 times faster than on an IBM Power4 cluster using the same number of processors. - Molecular dynamics simulations related to the phenomenon of photon echo run 8 times faster than previously achieved. Even at 256 processors, the Cray X1 system is already outperforming other supercomputers with thousands of processors for a certain class of applications such as climate modeling and some fusion applications. This evaluation is the outcome of a number of meetings with both high-performance computing (HPC) system vendors and application experts over the past 9 months and has received broad-based support from the scientific community and other agencies.
Internal computational fluid mechanics on supercomputers for aerospace propulsion systems
NASA Technical Reports Server (NTRS)
Andersen, Bernhard H.; Benson, Thomas J.
1987-01-01
The accurate calculation of three-dimensional internal flowfields for application to aerospace propulsion systems requires computational resources available only on supercomputers. A survey is presented of three-dimensional calculations of hypersonic, transonic, and subsonic internal flowfields conducted at the Lewis Research Center. A steady-state Parabolized Navier-Stokes (PNS) solution of flow in a Mach 5.0 mixed-compression inlet, a Navier-Stokes solution of flow in the vicinity of a terminal shock, and a PNS solution of flow in a diffusing S-bend with vortex generators are presented and discussed. All of these calculations were performed on either the NAS Cray-2 or the Lewis Research Center Cray X-MP.
New tools using the hardware performance monitor to help users tune programs on the Cray X-MP
DOE Office of Scientific and Technical Information (OSTI.GOV)
Engert, D.E.; Rudsinski, L.; Doak, J.
1991-09-25
The performance of a Cray system is highly dependent on the tuning techniques used by individuals on their codes. Many of our users were not taking advantage of the tuning tools that allow them to monitor their own programs by using the Hardware Performance Monitor (HPM). We therefore modified UNICOS to collect HPM data for all processes and to report Mflop ratings based on users, programs, and time used. Our tuning efforts are now being focused on the users and programs that have the best potential for performance improvements. These modifications and some of the more striking performance improvements are described.
Antenna pattern control using impedance surfaces
NASA Technical Reports Server (NTRS)
Balanis, Constantine A.; Liu, Kefeng
1992-01-01
During this research period, we have effectively transferred existing computer codes from the CRAY supercomputer to workstation-based systems. The workstation-based version of our code preserved the accuracy of the numerical computations while giving a much better turn-around time than the CRAY supercomputer. Such a task relieved us of the heavy dependence on the supercomputer account budget and made codes developed in this research project more feasible for applications. The analysis of pyramidal horns with impedance surfaces was our major focus during this research period. Three different modeling algorithms for analyzing lossy impedance surfaces were investigated and compared with measured data. Through this investigation, we discovered that a hybrid Fourier transform technique, which uses the eigenmodes in the stepped waveguide section and the Fourier-transformed field distributions across the stepped discontinuities for lossy impedance coatings, gives better accuracy in analyzing lossy coatings. After a further refinement of the present technique, we will perform an accurate radiation pattern synthesis in the coming reporting period.
FAST: A multi-processed environment for visualization of computational fluid dynamics
NASA Technical Reports Server (NTRS)
Bancroft, Gordon V.; Merritt, Fergus J.; Plessel, Todd C.; Kelaita, Paul G.; Mccabe, R. Kevin
1991-01-01
Three-dimensional, unsteady, multi-zoned fluid dynamics simulations over full-scale aircraft are typical of the problems being investigated at NASA Ames' Numerical Aerodynamic Simulation (NAS) facility on Cray-2 and Cray Y-MP supercomputers. With multiple-processor workstations available in the 10-30 Mflop range, we feel that these new developments in scientific computing warrant a new approach to the design and implementation of analysis tools. These larger, more complex problems create a need for new visualization techniques not possible with the existing software or systems available as of this writing. The visualization techniques will change as the supercomputing environment, and hence the scientific methods employed, evolves even further. The Flow Analysis Software Toolkit (FAST), an implementation of a software system for fluid mechanics analysis, is discussed.
Parallel processing a three-dimensional free-lagrange code
DOE Office of Scientific and Technical Information (OSTI.GOV)
Mandell, D.A.; Trease, H.E.
1989-01-01
A three-dimensional, time-dependent free-Lagrange hydrodynamics code has been multitasked and autotasked on a CRAY X-MP/416. The multitasking was done by using the Los Alamos Multitasking Control Library, which is a superset of the CRAY multitasking library. Autotasking is done by using constructs which are only comment cards if the source code is not run through a preprocessor. The three-dimensional algorithm has presented a number of problems that simpler algorithms, such as those for one-dimensional hydrodynamics, did not exhibit. Problems in converting the serial code, originally written for a CRAY-1, to a multitasking code are discussed. Autotasking of a rewritten version of the code is discussed. Timing results for subroutines and hot spots in the serial code are presented and suggestions for additional tools and debugging aids are given. Theoretical speedup results obtained from Amdahl's law and actual speedup results obtained on a dedicated machine are presented. Suggestions for designing large parallel codes are given.
Parallel processing a real code: A case history
DOE Office of Scientific and Technical Information (OSTI.GOV)
Mandell, D.A.; Trease, H.E.
1988-01-01
A three-dimensional, time-dependent Free-Lagrange hydrodynamics code has been multitasked and autotasked on a Cray X-MP/416. The multitasking was done by using the Los Alamos Multitasking Control Library, which is a superset of the Cray multitasking library. Autotasking is done by using constructs which are only comment cards if the source code is not run through a preprocessor. The 3-D algorithm has presented a number of problems that simpler algorithms, such as 1-D hydrodynamics, did not exhibit. Problems in converting the serial code, originally written for a Cray 1, to a multitasking code are discussed. Autotasking of a rewritten version of the code is discussed. Timing results for subroutines and hot spots in the serial code are presented and suggestions for additional tools and debugging aids are given. Theoretical speedup results obtained from Amdahl's law and actual speedup results obtained on a dedicated machine are presented. Suggestions for designing large parallel codes are given. 8 refs., 13 figs.
Supercomputer algorithms for efficient linear octree encoding of three-dimensional brain images.
Berger, S B; Reis, D J
1995-02-01
We designed and implemented algorithms for three-dimensional (3-D) reconstruction of brain images from serial sections using two important supercomputer architectures, vector and parallel. These architectures were represented by the Cray YMP and Connection Machine CM-2, respectively. The programs operated on linear octree representations of the brain data sets, and achieved 500-800 times acceleration when compared with a conventional laboratory workstation. As the need for higher resolution data sets increases, supercomputer algorithms may offer a means of performing 3-D reconstruction well above current experimental limits.
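A linear octree stores each occupied voxel as a single interleaved-bit key rather than as a pointer tree. The sketch below shows one common way to build such a key (a Morton code) from integer voxel coordinates; it illustrates the general encoding idea and is not the authors' algorithm.

```python
def morton_key(x, y, z, bits=10):
    """Interleave the low `bits` bits of x, y, z into one linear-octree key."""
    key = 0
    for i in range(bits):
        key |= ((x >> i) & 1) << (3 * i)
        key |= ((y >> i) & 1) << (3 * i + 1)
        key |= ((z >> i) & 1) << (3 * i + 2)
    return key

# Voxels that are close in space tend to get nearby keys, so a sorted list of
# keys serves as a compact, traversal-friendly representation of the octree.
print(morton_key(3, 5, 1))
print(sorted(morton_key(x, y, 0) for x in range(2) for y in range(2)))
```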
A Block-LU Update for Large-Scale Linear Programming
1990-01-01
linear programming problems. Results are given from runs on the Cray Y-MP. 1. Introduction. We wish to use the simplex method [Dan63] to solve the ... standard linear program: minimize c^T x subject to Ax = b, l <= x <= u, where A is an m by n matrix and c, x, l, u, and b are of appropriate dimension. The simplex ... the identity matrix. The basis is used to solve for the search direction y and the dual variables pi in the following linear systems: B_k y = a_q (1.2) and
Vectorized and multitasked solution of the few-group neutron diffusion equations
DOE Office of Scientific and Technical Information (OSTI.GOV)
Zee, S.K.; Turinsky, P.J.; Shayer, Z.
1989-03-01
A numerical algorithm with parallelism was used to solve the two-group, multidimensional neutron diffusion equations on computers characterized by shared memory, vector pipeline, and multi-CPU architecture features. Specifically, solutions were obtained on the Cray X-MP/48, the IBM 3090 with vector facilities, and the FPS-164. The material-centered mesh finite difference method approximation and outer-inner iteration method were employed. Parallelism was introduced in the inner iterations using the cyclic line successive overrelaxation iterative method and solving in parallel across lines. The outer iterations were completed using the Chebyshev semi-iterative method, which allows parallelism to be introduced in both space and energy groups. For the three-dimensional model, power, soluble boron, and transient fission product feedbacks were included. Concentrating on the pressurized water reactor (PWR), the thermal-hydraulic calculation of moderator density assumed single-phase flow and a closed flow channel, allowing parallelism to be introduced in the solution across the radial plane. Using a pinwise-detail, quarter-core model of a typical PWR in cycle 1, for the two-dimensional model without feedback the measured million floating point operations per second (MFLOPS)/vector speedups were 83/11.7, 18/2.2, and 2.4/5.6 on the Cray, IBM, and FPS without multitasking, respectively. Lower performance was observed with a coarser mesh, i.e., shorter vector length, due to vector pipeline start-up. For an 18 x 18 x 30 (x-y-z) three-dimensional model with feedback of the same core, MFLOPS/vector speedups of approximately 61/6.7 and an execution time of 0.8 CPU seconds on the Cray without multitasking were measured. Finally, using two CPUs and the vector pipelines of the Cray, a multitasking efficiency of 81% was noted for the three-dimensional model.
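The line-SOR and Chebyshev machinery above is specific to the multigroup diffusion solver, but the successive overrelaxation update it builds on is compact. The sketch below applies plain point SOR to a 2-D Poisson model problem as a stand-in; it is not the cyclic line SOR of the paper, and the mesh size, relaxation factor, and tolerance are illustrative.

```python
import numpy as np

def sor_poisson(f, h, omega=1.7, tol=1e-8, max_iter=10_000):
    """Point SOR for -laplace(u) = f on the unit square, u = 0 on the boundary."""
    u = np.zeros_like(f)
    n = f.shape[0]
    for it in range(max_iter):
        max_change = 0.0
        for i in range(1, n - 1):
            for j in range(1, n - 1):
                gs = 0.25 * (u[i - 1, j] + u[i + 1, j] + u[i, j - 1] + u[i, j + 1]
                             + h * h * f[i, j])        # Gauss-Seidel value
                new = (1.0 - omega) * u[i, j] + omega * gs
                max_change = max(max_change, abs(new - u[i, j]))
                u[i, j] = new
        if max_change < tol:
            break
    return u, it

n = 33
h = 1.0 / (n - 1)
f = np.ones((n, n))
u, sweeps = sor_poisson(f, h)
print("converged after", sweeps, "sweeps; max u =", u.max())
```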
Scaling of data communications for an advanced supercomputer network
NASA Technical Reports Server (NTRS)
Levin, E.; Eaton, C. K.; Young, Bruce
1986-01-01
The goal of NASA's Numerical Aerodynamic Simulation (NAS) Program is to provide a powerful computational environment for advanced research and development in aeronautics and related disciplines. The present NAS system consists of a Cray 2 supercomputer connected by a data network to a large mass storage system, to sophisticated local graphics workstations and by remote communication to researchers throughout the United States. The program plan is to continue acquiring the most powerful supercomputers as they become available. The implications of a projected 20-fold increase in processing power on the data communications requirements are described.
Engineering PFLOTRAN for Scalable Performance on Cray XT and IBM BlueGene Architectures
DOE Office of Scientific and Technical Information (OSTI.GOV)
Mills, Richard T; Sripathi, Vamsi K; Mahinthakumar, Gnanamanika
We describe PFLOTRAN - a code for simulation of coupled hydro-thermal-chemical processes in variably saturated, non-isothermal, porous media - and the approaches we have employed to obtain scalable performance on some of the largest scale supercomputers in the world. We present detailed analyses of I/O and solver performance on Jaguar, the Cray XT5 at Oak Ridge National Laboratory, and Intrepid, the IBM BlueGene/P at Argonne National Laboratory, that have guided our choice of algorithms.
Dynamic overset grid communication on distributed memory parallel processors
NASA Technical Reports Server (NTRS)
Barszcz, Eric; Weeratunga, Sisira K.; Meakin, Robert L.
1993-01-01
A parallel distributed memory implementation of intergrid communication for dynamic overset grids is presented. Included are discussions of various options considered during development. Results are presented comparing an Intel iPSC/860 to a single processor Cray Y-MP. Results for grids in relative motion show the iPSC/860 implementation to be faster than the Cray implementation.
Vectorization of a particle code used in the simulation of rarefied hypersonic flow
NASA Technical Reports Server (NTRS)
Baganoff, D.
1990-01-01
A limitation of the direct simulation Monte Carlo (DSMC) method is that it does not allow efficient use of the vector architectures that predominate in current supercomputers. Consequently, the problems that can be handled are limited to those of one- and two-dimensional flows. This work focuses on a reformulation of the DSMC method with the objective of designing a procedure that is optimized for the vector architectures found on machines such as the Cray-2. In addition, it seeks a better balance between algorithmic complexity and the total number of particles employed in a simulation, so that the overall performance of a particle simulation scheme can be greatly improved. Simulations of the flow about a 3D blunt body are performed with 10 to the 7th particles and 4 x 10 to the 5th mesh cells. Good statistics are obtained with time averaging over 800 time steps using 4.5 h of Cray-2 single-processor CPU time.
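The reformulation described above amounts to expressing particle operations as long vector operations instead of per-particle loops. The fragment below shows that style for a trivial free-flight advection step with specular reflection off the walls of a unit box, using NumPy array operations as a stand-in for Cray vector hardware; it is purely illustrative and is not the DSMC algorithm itself (no collisions, no cell sorting), and all parameters are assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 10**6                                   # a large particle population
pos = rng.uniform(0.0, 1.0, size=(n, 3))    # positions inside a unit box
vel = rng.normal(0.0, 1.0, size=(n, 3))     # thermal velocities
dt = 1e-3

def advect(pos, vel, dt):
    """One free-flight step applied to all particles at once (vectorized)."""
    pos = pos + vel * dt
    # Specular reflection at the box walls, handled component by component.
    below = pos < 0.0
    above = pos > 1.0
    pos = np.where(below, -pos, pos)
    pos = np.where(above, 2.0 - pos, pos)
    vel = np.where(below | above, -vel, vel)
    return pos, vel

pos, vel = advect(pos, vel, dt)
print(pos.min(), pos.max())                 # all particles remain inside [0, 1]
```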
1993 Gordon Bell Prize Winners
NASA Technical Reports Server (NTRS)
Karp, Alan H.; Simon, Horst; Heller, Don; Cooper, D. M. (Technical Monitor)
1994-01-01
The Gordon Bell Prize recognizes significant achievements in the application of supercomputers to scientific and engineering problems. In 1993, finalists were named for work in three categories: (1) Performance, which recognizes those who solved a real problem in the quickest elapsed time. (2) Price/performance, which encourages the development of cost-effective supercomputing. (3) Compiler-generated speedup, which measures how well compiler writers are facilitating the programming of parallel processors. The winners were announced November 17 at the Supercomputing 93 conference in Portland, Oregon. Gordon Bell, an independent consultant in Los Altos, California, is sponsoring $2,000 in prizes each year for 10 years to promote practical parallel processing research. This is the sixth year of the prize, which Computer administers. Something unprecedented in Gordon Bell Prize competition occurred this year: A computer manufacturer was singled out for recognition. Nine entries reporting results obtained on the Cray C90 were received, seven of the submissions orchestrated by Cray Research. Although none of these entries showed sufficiently high performance to win outright, the judges were impressed by the breadth of applications that ran well on this machine, all nine running at more than a third of the peak performance of the machine.
1990-08-01
corneal structure for both normal and swollen corneas. Other problems of future interest are the understanding of the structure of scarred and dystrophied ... METHOD AND RESULTS The system of equations is solved numerically on a Cray X-MP by a finite element method with 9-node Lagrange quadrilaterals (Becker ... Appl. Math., 42, 430. Becker, E. B., G. F. Carey, and J. T. Oden, 1981. Finite Elements: An Introduction (Vol. 1), Prentice-Hall, Englewood Cliffs, New
Heart Fibrillation and Parallel Supercomputers
NASA Technical Reports Server (NTRS)
Kogan, B. Y.; Karplus, W. J.; Chudin, E. E.
1997-01-01
The Luo and Rudy 3 cardiac cell mathematical model is implemented on the parallel supercomputer Cray T3D. The splitting algorithm, combined with a variable time step and an explicit method of integration, provides reasonable solution times and almost perfect scaling for rectilinear wave propagation. The computer simulation makes it possible to observe new phenomena: the break-up of spiral waves caused by intracellular calcium dynamics, and the non-uniformity of the calcium distribution in space during the onset of the spiral wave.
The computation of pi to 29,360,000 decimal digits using Borweins' quartically convergent algorithm
NASA Technical Reports Server (NTRS)
Bailey, David H.
1988-01-01
The quartically convergent numerical algorithm developed by Borwein and Borwein (1987) for 1/pi is implemented via a prime-modulus-transform multiprecision technique on the NASA Ames Cray-2 supercomputer to compute the first 2.936 x 10 to the 7th digits of the decimal expansion of pi. The history of pi computations is briefly recalled; the most recent algorithms are characterized; the implementation procedures are described; and samples of the output listing are presented. Statistical analyses show that the present decimal expansion is completely random, with only acceptable numbers of long repeating strings and single-digit runs.
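For reference, Borweins' quartically convergent iteration is short enough to state in full. The sketch below evaluates it with Python's decimal module to a few dozen digits; this is nothing like the prime-modulus-transform multiprecision arithmetic used in the report, which is what makes tens of millions of digits feasible, and the digit count and loop bound are illustrative choices.

```python
from decimal import Decimal, getcontext

def pi_borwein_quartic(digits=50):
    """Borweins' quartic iteration: a_k converges to 1/pi, roughly
    quadrupling the number of correct digits at every step."""
    getcontext().prec = digits + 10          # working precision with guard digits
    one = Decimal(1)
    sqrt2 = Decimal(2).sqrt()
    y = sqrt2 - 1                            # y_0
    a = 6 - 4 * sqrt2                        # a_0
    k = 0
    while 4 ** k < digits:                   # enough quartic steps for `digits`
        r = (one - y ** 4).sqrt().sqrt()     # (1 - y^4)^(1/4) via two square roots
        y = (one - r) / (one + r)
        a = a * (one + y) ** 4 - Decimal(2) ** (2 * k + 3) * y * (one + y + y * y)
        k += 1
    getcontext().prec = digits
    return +(one / a)                        # unary + rounds to `digits`

print(pi_borwein_quartic(50))
```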
EFFECTS OF TUMORS ON INHALED PHARMACOLOGIC DRUGS: II. PARTICLE MOTION
ABSTRACT
Computer simulations were conducted to describe drug particle motion in human lung bifurcations with tumors. The computations used FIDAP with a Cray T90 supercomputer. The objective was to better understand particle behavior as affected by particle characteristics...
A Programming Model Performance Study Using the NAS Parallel Benchmarks
Shan, Hongzhang; Blagojević, Filip; Min, Seung-Jai; ...
2010-01-01
Harnessing the power of multicore platforms is challenging due to the additional levels of parallelism present. In this paper we use the NAS Parallel Benchmarks to study three programming models, MPI, OpenMP and PGAS, to understand their performance and memory usage characteristics on current multicore architectures. To understand these characteristics we use the Integrated Performance Monitoring tool and other ways to measure communication versus computation time, as well as the fraction of the run time spent in OpenMP. The benchmarks are run on two different Cray XT5 systems and an InfiniBand cluster. Our results show that in general the three programming models exhibit very similar performance characteristics. In a few cases, OpenMP is significantly faster because it explicitly avoids communication. For these particular cases, we were able to re-write the UPC versions and achieve equal performance to OpenMP. Using OpenMP was also the most advantageous in terms of memory usage. We also compare performance differences between the two Cray systems, which have quad-core and hex-core processors. We show that at scale the performance is almost always slower on the hex-core system because of increased contention for network resources.
NASA Astrophysics Data System (ADS)
Clay, M. P.; Yeung, P. K.; Buaria, D.; Gotoh, T.
2017-11-01
Turbulent mixing at high Schmidt number is a multiscale problem which places demanding requirements on direct numerical simulations to resolve fluctuations down to the Batchelor scale. We use a dual-grid, dual-scheme and dual-communicator approach where velocity and scalar fields are computed by separate groups of parallel processes, the latter using a combined compact finite difference (CCD) scheme on a finer grid with a static 3-D domain decomposition free of the communication overhead of memory transposes. A high degree of scalability is achieved for an 8192^3 scalar field at Schmidt number 512 in turbulence with a modest inertial range, by overlapping communication with computation whenever possible. On the Cray XE6 partition of Blue Waters, use of a dedicated thread for communication combined with OpenMP locks and nested parallelism reduces CCD timings by 34% compared to an MPI baseline. The code has been further optimized for the 27-petaflops Cray XK7 machine Titan using GPUs as accelerators with the latest OpenMP 4.5 directives, giving a 2.7X speedup compared to CPU-only execution at the largest problem size. Supported by NSF Grant ACI-1036170, the NCSA Blue Waters Project with subaward via UIUC, and a DOE INCITE allocation at ORNL.
Hot Chips and Hot Interconnects for High End Computing Systems
NASA Technical Reports Server (NTRS)
Saini, Subhash
2005-01-01
I will discuss several processors: 1. The Cray proprietary processor used in the Cray X1; 2. The IBM Power3 and Power4, used in the IBM SP3 and IBM SP4 systems; 3. The Intel Itanium and Xeon, used in the SGI Altix systems and clusters, respectively; 4. The IBM System-on-a-Chip used in the IBM BlueGene/L; 5. The HP Alpha EV68 processor used in the DOE ASCI Q cluster; 6. The SPARC64 V processor, which is used in the Fujitsu PRIMEPOWER HPC2500; 7. An NEC proprietary processor, which is used in the NEC SX-6/7; 8. The Power4+ processor, which is used in the Hitachi SR11000; 9. An NEC proprietary processor, which is used in the Earth Simulator. The IBM POWER5 and Red Storm computing systems will also be discussed. The architectures of these processors will first be presented, followed by the interconnection networks and a description of high-end computer systems based on these processors and networks. The performance of various hardware/programming model combinations will then be compared, based on the latest NAS Parallel Benchmark results (MPI, OpenMP/HPF, and hybrid MPI + OpenMP). The tutorial will conclude with a discussion of general trends in the field of high performance computing (quantum computing, DNA computing, cellular engineering, and neural networks).
Enabling Diverse Software Stacks on Supercomputers using High Performance Virtual Clusters.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Younge, Andrew J.; Pedretti, Kevin; Grant, Ryan
While large-scale simulations have been the hallmark of the High Performance Computing (HPC) community for decades, Large Scale Data Analytics (LSDA) workloads are gaining attention within the scientific community not only as a processing component of large HPC simulations, but also as standalone scientific tools for knowledge discovery. With the path towards Exascale, new HPC runtime systems are also emerging in a way that differs from classical distributed computing models. However, system software for such capabilities on the latest extreme-scale DOE supercomputers needs to be enhanced to more appropriately support these types of emerging software ecosystems. In this paper, we propose the use of Virtual Clusters on advanced supercomputing resources to enable systems to support not only HPC workloads, but also emerging big data stacks. Specifically, we have deployed the KVM hypervisor within Cray's Compute Node Linux on an XC-series supercomputer testbed. We also use libvirt and QEMU to manage and provision VMs directly on compute nodes, leveraging Ethernet-over-Aries network emulation. To our knowledge, this is the first known use of KVM on a true MPP supercomputer. We investigate the overhead of our solution using HPC benchmarks, both evaluating single-node performance and weak scaling of a 32-node virtual cluster. Overall, we find single-node performance of our solution using KVM on a Cray is very efficient, with near-native performance. However, overhead increases by up to 20% as virtual cluster size increases, due to limitations of the Ethernet-over-Aries bridged network. Furthermore, we deploy Apache Spark with large data analysis workloads in a Virtual Cluster, effectively demonstrating how diverse software ecosystems can be supported by High Performance Virtual Clusters.
New Computer Simulations of Macular Neural Functioning
NASA Technical Reports Server (NTRS)
Ross, Muriel D.; Doshay, D.; Linton, S.; Parnas, B.; Montgomery, K.; Chimento, T.
1994-01-01
We use high performance graphics workstations and supercomputers to study the functional significance of the three-dimensional (3-D) organization of gravity sensors. These sensors have a prototypic architecture foreshadowing more complex systems. Scaled-down simulations run on a Silicon Graphics workstation and scaled-up, 3-D versions run on a Cray Y-MP supercomputer. A semi-automated method of reconstruction of neural tissue from serial sections studied in a transmission electron microscope has been developed to eliminate tedious conventional photography. The reconstructions use a mesh as a step in generating a neural surface for visualization. Two meshes are required to model calyx surfaces. The meshes are connected and the resulting prisms represent the cytoplasm and the bounding membranes. A finite volume analysis method is employed to simulate voltage changes along the calyx in response to synapse activation on the calyx or on calyceal processes. The finite volume method insures that charge is conserved at the calyx-process junction. These and other models indicate that efferent processes act as voltage followers, and that the morphology of some afferent processes affects their functioning. In a final application, morphological information is symbolically represented in three dimensions in a computer. The possible functioning of the connectivities is tested using mathematical interpretations of physiological parameters taken from the literature. Symbolic, 3-D simulations are in progress to probe the functional significance of the connectivities. This research is expected to advance computer-based studies of macular functioning and of synaptic plasticity.
NASA Technical Reports Server (NTRS)
Gillian, Ronnie E.; Lotts, Christine G.
1988-01-01
The Computational Structural Mechanics (CSM) Activity at Langley Research Center is developing methods for structural analysis on modern computers. To facilitate that research effort, an applications development environment has been constructed to insulate the researcher from the many computer operating systems of a widely distributed computer network. The CSM Testbed development system was ported to the Numerical Aerodynamic Simulator (NAS) Cray-2, at the Ames Research Center, to provide a high end computational capability. This paper describes the implementation experiences, the resulting capability, and the future directions for the Testbed on supercomputers.
Multitasking the three-dimensional transport code TORT on CRAY platforms
DOE Office of Scientific and Technical Information (OSTI.GOV)
Azmy, Y.Y.; Barnett, D.A.; Burre, C.A.
1996-04-01
The multitasking options in the three-dimensional neutral particle transport code TORT, originally implemented for Cray's CTSS operating system, are revived and extended to run on Cray Y-MP and C90 computers using the UNICOS operating system. These include two coarse-grained domain decompositions: across octants and across directions within an octant, termed Octant Parallel (OP) and Direction Parallel (DP), respectively. Parallel performance of the DP is significantly enhanced by increasing the task grain size and reducing load imbalance via dynamic scheduling of the discrete angles among the participating tasks. Substantial wall-clock speedup factors, approaching 4.5 using 8 tasks, have been measured in a time-sharing environment, and generally depend on the test problem specifications, number of tasks, and machine loading during execution.
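The Direction Parallel decomposition above relies on dynamically scheduling discrete angles among tasks to reduce load imbalance. A minimal sketch of that idea in Python (assuming a shared work pool; the sweep body is a placeholder, not TORT's algorithm):

```python
# Sketch: dynamic scheduling of discrete-angle "sweeps" among worker tasks.
# The sweep body is a stand-in; only the scheduling pattern is illustrated.
import time
import random
from concurrent.futures import ThreadPoolExecutor

def sweep_one_angle(angle_id):
    # Placeholder for a transport sweep over the spatial mesh for one angle.
    work = random.uniform(0.01, 0.05)   # uneven cost -> load imbalance
    time.sleep(work)
    return angle_id, work

angles = list(range(48))                # an illustrative quadrature set
with ThreadPoolExecutor(max_workers=8) as pool:
    # map() hands out angles to whichever task finishes first (dynamic scheduling),
    # so expensive angles do not leave other tasks idle.
    results = list(pool.map(sweep_one_angle, angles))

total = sum(w for _, w in results)
print("swept %d angles, total simulated work %.2f s on 8 tasks" % (len(results), total))
```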
RISC Processors and High Performance Computing
NASA Technical Reports Server (NTRS)
Saini, Subhash; Bailey, David H.; Lasinski, T. A. (Technical Monitor)
1995-01-01
In this tutorial, we will discuss the top five current RISC microprocessors: the IBM Power2, which is used in the IBM RS6000/590 workstation and in the IBM SP2 parallel supercomputer; the DEC Alpha, which is used in the DEC Alpha workstation and in the Cray T3D; the MIPS R8000, which is used in the SGI Power Challenge; the HP PA-RISC 7100, which is used in the HP 700 series workstations and in the Convex Exemplar; and the Cray proprietary processor, which is used in the new Cray J916. The architecture of these microprocessors will first be presented. The effective performance of these processors will then be compared, both by citing standard benchmarks and also in the context of implementing real applications. In the process, different programming models such as data parallel (CM Fortran and HPF) and message passing (PVM and MPI) will be introduced and compared. The latest NAS Parallel Benchmark (NPB) absolute performance and performance per dollar figures will be presented. The next generation of the NPB will also be described. The tutorial will conclude with a discussion of general trends in the field of high performance computing, including likely future developments in hardware and software technology, and the relative roles of vector supercomputers, tightly coupled parallel computers, and clusters of workstations. This tutorial will provide a unique cross-machine comparison not available elsewhere.
MILC Code Performance on High End CPU and GPU Supercomputer Clusters
NASA Astrophysics Data System (ADS)
DeTar, Carleton; Gottlieb, Steven; Li, Ruizi; Toussaint, Doug
2018-03-01
With recent developments in parallel supercomputing architecture, many-core, multi-core, and GPU processors are now commonplace, resulting in more levels of parallelism, memory hierarchy, and programming complexity. It has been necessary to adapt the MILC code to these new processors, starting with NVIDIA GPUs and, more recently, the Intel Xeon Phi processors. We report on our efforts to port and optimize our code for the Intel Knights Landing architecture. We consider performance of the MILC code with MPI and OpenMP, and optimizations with QOPQDP and QPhiX. For the latter approach, we concentrate on the staggered conjugate gradient and gauge force. We also consider performance on recent NVIDIA GPUs using the QUDA library.
Performance Analysis of the NAS Y-MP Workload
NASA Technical Reports Server (NTRS)
Bergeron, Robert J.; Kutler, Paul (Technical Monitor)
1997-01-01
This paper describes the performance characteristics of the computational workloads on the NAS Cray Y-MP machines, a Y-MP 832 and later a Y-MP 8128. Hardware measurements indicated that the Y-MP workload performance matured over time, ultimately sustaining an average throughput of 0.8 GFLOPS and a vector operation fraction of 87%. The measurements also revealed an operation rate exceeding 1 per clock period, a well-balanced architecture featuring strong utilization of the vector functional units, and an efficient memory organization. Introduction of the larger-memory 8128 increased throughput by allowing a more efficient utilization of CPUs. Throughput also depended on the metering of the batch queues; low-idle Saturday workloads required a buffer of small jobs to prevent memory starvation of the CPU. UNICOS required about 7% of total CPU time to service the 832 workloads; this overhead decreased to 5% for the 8128 workloads. While most of the system time went to service I/O requests, efficient scheduling prevented excessive idle due to I/O wait. System measurements disclosed no obvious bottlenecks in the response of the machine and UNICOS to the workloads. In most cases, Cray-provided software tools were quite sufficient for measuring the performance of both the machine and the operating system.
Molecular orbital studies of the bonding in heavy element organometallics: Progress report
NASA Astrophysics Data System (ADS)
Bursten, B. E.
1988-03-01
Over the past two years we have made considerable progress in the understanding of the bonding in heavy element mononuclear and binuclear complexes. For mononuclear complexes, our strategy has been to study the orbital interactions between the actinide metal center and the surrounding ligands. One particular system which has been studied extensively is X3AnL (where X = Cp, Cl, NH2; An = actinide; and L = a neutral or anionic ligand). We are interested not only in the mechanics of the An-X orbital interactions, but also how the relative donor characteristics of X may influence coordination of the fourth ligand L to the actinide. For binuclear systems, we are interested not only in homobimetallic complexes, but also in heterobimetallic complexes containing actinides and transition metals. In order to make the calculations of such large systems tractable, we have transferred the X-alpha-SW codes to the newly acquired Cray X-MP/24 at the Ohio Supercomputer Center. This has resulted in significant savings of money and time.
TOP500 Sublist for November 2001
DOE Office of Scientific and Technical Information (OSTI.GOV)
Strohmaier, Erich; Meuer, Hans W.; Dongarra, Jack J.
2001-11-09
18th Edition of TOP500 List of World's Fastest Supercomputers Released MANNHEIM, GERMANY; KNOXVILLE, TENN.; BERKELEY, CALIF. In what has become a much-anticipated event in the world of high-performance computing, the 18th edition of the TOP500 list of the world's fastest supercomputers was released today (November 9, 2001). The latest edition of the twice-yearly ranking finds IBM as the leader in the field, with 32 percent in terms of installed systems and 37 percent in terms of total performance of all the installed systems. In a surprise move Hewlett-Packard captured the second place with 30 percent of the systems. Most of these systems are smaller in size and as a consequence HP's share of installed performance is smaller with 15 percent. This is still enough for second place in this category. SGI, Cray and Sun follow in the number of TOP500 systems with 41 (8 percent), 39 (8 percent), and 31 (6 percent) respectively. In the category of installed performance Cray Inc. keeps the third position with 11 percent ahead of SGI (8 percent) and Compaq (8 percent).
Barrier-breaking performance for industrial problems on the CRAY C916
DOE Office of Scientific and Technical Information (OSTI.GOV)
Graffunder, S.K.
1993-12-31
Nine applications, including third-party codes, were submitted to the Gordon Bell Prize committee showing the CRAY C916 supercomputer providing record-breaking time to solution for industrial problems in several disciplines. Performance was obtained by balancing raw hardware speed; effective use of large, real, shared memory; compiler vectorization and autotasking; hand optimization; asynchronous I/O techniques; and new algorithms. The highest GFLOPS performance for the submissions was 11.1 GFLOPS out of a peak advertised performance of 16 GFLOPS for the CRAY C916 system. One program achieved a 15.45 speedup from the compiler with just two hand-inserted directives to scope variables properly for the mathematical library. New I/O techniques hide tens of gigabytes of I/O behind parallel computations. Finally, new iterative solver algorithms have demonstrated times to solution on 1 CPU as high as 70 times faster than the best direct solvers.
Fluid behavior in microgravity environment
NASA Technical Reports Server (NTRS)
Hung, R. J.; Lee, C. C.; Tsao, Y. D.
1990-01-01
The instability of the liquid-gas interface can be induced by the presence of longitudinal and lateral accelerations, vehicle vibration, and rotational fields of spacecraft in a microgravity environment. In a spacecraft design, the requirements for settled propellant are different for tank pressurization, engine restart, venting, or propellant transfer. In this paper, simulations of the dynamical behavior of liquid propellant, fluid reorientation, and propellant resettling have been carried out on a CRAY X-MP supercomputer to study fluid management in a microgravity environment. Characteristics of slosh waves excited by the restoring force field of gravity jitters have also been investigated.
TOSCA calculations and measurements for the SLAC SLC damping ring dipole magnet
NASA Astrophysics Data System (ADS)
Early, R. A.; Cobb, J. K.
1985-04-01
The SLAC damping ring dipole magnet was originally designed with removable nose pieces at the ends. Recently, a set of magnetic measurements was taken of the vertical component of induction along the center of the magnet for four different pole-end configurations and several current settings. The three dimensional computer code TOSCA, which is currently installed on the National Magnetic Fusion Energy Computer Center's Cray X-MP, was used to compute field values for the four configurations at current settings near saturation. Comparisons were made for magnetic induction as well as effective magnetic lengths for the different configurations.
NASA Technical Reports Server (NTRS)
Mulac, Richard A.; Celestina, Mark L.; Adamczyk, John J.; Misegades, Kent P.; Dawson, Jef M.
1987-01-01
A procedure is outlined which utilizes parallel processing to solve the inviscid form of the average-passage equation system for multistage turbomachinery along with a description of its implementation in a FORTRAN computer code, MSTAGE. A scheme to reduce the central memory requirements of the program is also detailed. Both the multitasking and I/O routines referred to are specific to the Cray X-MP line of computers and its associated SSD (Solid-State Disk). Results are presented for a simulation of a two-stage rocket engine fuel pump turbine.
Research on Spectroscopy, Opacity, and Atmospheres
NASA Technical Reports Server (NTRS)
Kurucz, Robert L.
1999-01-01
To make my calculations more readily accessible I have set up a web site, cfaku5.harvard.edu, that can also be accessed by FTP. It has 5 9-GB disks that hold all of my atomic and diatomic molecular data, my tables of distribution function opacities, my grids of model atmospheres, colors, fluxes, etc., my programs that are ready for distribution, and most of my recent papers. Atlases and computed spectra will be added as they are completed. New atomic and molecular calculations will be added as they are completed. I got my atomic programs that had been running on a Cray at the San Diego Supercomputer Center to run on my Vaxes and Alpha. I started with Ni and Co because there were new laboratory analyses that included isotopic and hyperfine splitting. Those calculations are described in the appended abstract for the 6th Atomic Spectroscopy and Oscillator Strengths meeting in Victoria last summer. A surprising finding is that quadrupole transitions have been grossly in error because mixing with higher levels has not been included. I now have enough memory in my Alpha to treat 3000 x 3000 matrices. I now include all levels up through n=9 for Fe I and II, the spectra for which the most information is available. I am finishing those calculations right now. After Fe I and Fe II, all other spectra are "easy", and I will be in mass production. ATLAS12, my opacity sampling program for computing models with arbitrary abundances, has been put on the web server. I wrote a new distribution function opacity program for workstations that replaces the one I used on the Cray at the San Diego Supercomputer Center. Each set of abundances would take 100 Cray hours, costing $100,000. I ran 25 cases. Each of my opacity CDs contains three abundances. I have a new program running on the Alpha that takes about a week. I am going to have to get a faster processor or I will have to dedicate a whole workstation just to opacities.
Gigaflop performance on a CRAY-2: Multitasking a computational fluid dynamics application
NASA Technical Reports Server (NTRS)
Tennille, Geoffrey M.; Overman, Andrea L.; Lambiotte, Jules J.; Streett, Craig L.
1991-01-01
The methodology is described for converting a large, long-running applications code that executed on a single processor of a CRAY-2 supercomputer to a version that executed efficiently on multiple processors. Although the conversion of every application is different, a discussion of the types of modification used to achieve gigaflop performance is included to assist others in the parallelization of applications for CRAY computers, especially those that were developed for other computers. An existing application, from the discipline of computational fluid dynamics, that had utilized over 2000 hours of CPU time on the CRAY-2 during the previous year was chosen as a test case to study the effectiveness of multitasking on a CRAY-2. The nature of the dominant calculations within the application indicated that a sustained computational rate of 1 billion floating-point operations per second, or 1 gigaflop, might be achieved. The code was first analyzed and modified for optimal performance on a single processor in a batch environment. After optimal performance on a single CPU was achieved, the code was modified to use multiple processors in a dedicated environment. The results of these two efforts were merged into a single code that had a sustained computational rate of over 1 gigaflop on a CRAY-2. Timings and analysis of performance are given for both single- and multiple-processor runs.
OpenMP Performance on the Columbia Supercomputer
NASA Technical Reports Server (NTRS)
Haoqiang, Jin; Hood, Robert
2005-01-01
This presentation discusses the Columbia supercomputer, one of the world's fastest supercomputers, providing 61 TFLOPS (as of 10/20/04). Columbia was conceived, designed, built, and deployed in just 120 days. It is a 20-node supercomputer built on proven 512-processor nodes and is the largest SGI system in the world, with over 10,000 Intel Itanium 2 processors. It provides the largest node size incorporating commodity parts (512 processors) and the largest shared-memory environment (2048 processors), and with 88% efficiency it tops the scalar systems on the Top500 list.
Azad, Ariful; Buluç, Aydın
2016-05-16
We describe parallel algorithms for computing maximal cardinality matching in a bipartite graph on distributed-memory systems. Unlike traditional algorithms that match one vertex at a time, our algorithms process many unmatched vertices simultaneously using a matrix-algebraic formulation of maximal matching. This generic matrix-algebraic framework is used to develop three efficient maximal matching algorithms with minimal changes. The newly developed algorithms have two benefits over existing graph-based algorithms. First, unlike existing parallel algorithms, the cardinality of matching obtained by the new algorithms stays constant with increasing processor counts, which is important for predictable and reproducible performance. Second, relying on bulk-synchronous matrix operations, these algorithms expose a higher degree of parallelism on distributed-memory platforms than existing graph-based algorithms. We report high-performance implementations of three maximal matching algorithms using hybrid OpenMP-MPI and evaluate the performance of these algorithms using more than 35 real and randomly generated graphs. On real instances, our algorithms achieve up to 200× speedup on 2048 cores of a Cray XC30 supercomputer. Even higher speedups are obtained on larger synthetically generated graphs where our algorithms show good scaling on up to 16,384 cores.
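As a rough illustration of processing many unmatched vertices per round (not the authors' matrix-algebraic formulation), the sketch below computes a maximal bipartite matching by letting every unmatched left vertex propose to an unmatched neighbor each round and resolving conflicts; it is a serial toy, but the round structure mirrors the bulk-synchronous style described above.

```python
# Toy round-based maximal matching on a bipartite graph (serial sketch).
def maximal_matching(adj):
    """adj: dict mapping left vertex -> list of right neighbors."""
    match_l, match_r = {}, {}
    while True:
        # Every unmatched left vertex proposes to one unmatched neighbor.
        proposals = {}
        for u, nbrs in adj.items():
            if u in match_l:
                continue
            for v in nbrs:
                if v not in match_r:
                    proposals.setdefault(v, u)   # keep the first proposer
                    break
        if not proposals:
            break                                 # no free neighbor left -> maximal
        for v, u in proposals.items():
            match_l[u], match_r[v] = v, u
    return match_l

graph = {0: ["a", "b"], 1: ["a"], 2: ["b", "c"], 3: ["c"]}
print(maximal_matching(graph))   # e.g. {0: 'a', 2: 'b', 3: 'c'}; vertex 1 stays unmatched
```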
Ellingson, Sally R; Dakshanamurthy, Sivanesan; Brown, Milton; Smith, Jeremy C; Baudry, Jerome
2014-04-25
In this paper we give the current state of high-throughput virtual screening. We describe a case study of using a task-parallel MPI (Message Passing Interface) version of Autodock4 [1], [2] to run a virtual high-throughput screen of one-million compounds on the Jaguar Cray XK6 Supercomputer at Oak Ridge National Laboratory. We include a description of scripts developed to increase the efficiency of the predocking file preparation and postdocking analysis. A detailed tutorial, scripts, and source code for this MPI version of Autodock4 are available online at http://www.bio.utk.edu/baudrylab/autodockmpi.htm.
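Task-parallel screening of this kind is commonly organized as a manager/worker farm in which idle ranks pull the next ligand to dock. A minimal mpi4py sketch of that pattern (illustrative only; the docking call is a placeholder, not the AutoDock MPI code, and the ligand IDs and script name are hypothetical):

```python
# Sketch: manager/worker task farm for docking jobs with mpi4py.
# Run with e.g.: mpirun -n 8 python dock_farm.py  (script name is hypothetical)
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()
TAG_WORK, TAG_DONE = 1, 2

def dock(ligand):
    # Placeholder for preparing input files and running one docking job.
    return (ligand, len(ligand))            # fake "score"

if rank == 0:                               # manager
    ligands = ["lig%05d" % i for i in range(100)]   # hypothetical ligand IDs
    results, next_task = [], 0
    status = MPI.Status()
    active = size - 1
    while active > 0:
        msg = comm.recv(source=MPI.ANY_SOURCE, tag=MPI.ANY_TAG, status=status)
        if msg is not None:
            results.append(msg)             # a finished docking result
        if next_task < len(ligands):
            comm.send(ligands[next_task], dest=status.Get_source(), tag=TAG_WORK)
            next_task += 1
        else:
            comm.send(None, dest=status.Get_source(), tag=TAG_DONE)
            active -= 1
    print("collected", len(results), "results")
else:                                       # worker
    comm.send(None, dest=0)                 # announce readiness
    while True:
        task = comm.recv(source=0, tag=MPI.ANY_TAG)
        if task is None:
            break
        comm.send(dock(task), dest=0)
```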
Discrete sensitivity derivatives of the Navier-Stokes equations with a parallel Krylov solver
NASA Technical Reports Server (NTRS)
Ajmani, Kumud; Taylor, Arthur C., III
1994-01-01
This paper solves an 'incremental' form of the sensitivity equations derived by differentiating the discretized thin-layer Navier Stokes equations with respect to certain design variables of interest. The equations are solved with a parallel, preconditioned Generalized Minimal RESidual (GMRES) solver on a distributed-memory architecture. The 'serial' sensitivity analysis code is parallelized by using the Single Program Multiple Data (SPMD) programming model, domain decomposition techniques, and message-passing tools. Sensitivity derivatives are computed for low and high Reynolds number flows over a NACA 1406 airfoil on a 32-processor Intel Hypercube, and found to be identical to those computed on a single-processor Cray Y-MP. It is estimated that the parallel sensitivity analysis code has to be run on 40-50 processors of the Intel Hypercube in order to match the single-processor processing time of a Cray Y-MP.
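For readers unfamiliar with the solver, a preconditioned GMRES solve of a sparse linear system can be set up in a few lines; the sketch below uses SciPy on a small synthetic system purely as an illustration of the method named above, not of the paper's parallel implementation.

```python
# Sketch: preconditioned GMRES on a small synthetic sparse system (SciPy).
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

n = 200
# Nonsymmetric convection-diffusion-like test matrix (illustrative only).
A = sp.diags([-1.0, 2.5, -1.3], [-1, 0, 1], shape=(n, n), format="csc")
b = np.ones(n)

# Incomplete-LU preconditioner wrapped as a LinearOperator.
ilu = spla.spilu(A)
M = spla.LinearOperator((n, n), matvec=ilu.solve)

x, info = spla.gmres(A, b, M=M, restart=30, maxiter=200)
print("converged" if info == 0 else "info=%d" % info,
      "residual norm:", np.linalg.norm(b - A @ x))
```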
Time-partitioning simulation models for calculation on parallel computers
NASA Technical Reports Server (NTRS)
Milner, Edward J.; Blech, Richard A.; Chima, Rodrick V.
1987-01-01
A technique allowing time-staggered solution of partial differential equations is presented in this report. Using this technique, called time-partitioning, simulation execution speedup is proportional to the number of processors used because all processors operate simultaneously, with each updating the solution grid at a different time point. The technique is limited by neither the number of processors available nor by the dimension of the solution grid. Time-partitioning was used to obtain the flow pattern through a cascade of airfoils, modeled by the Euler partial differential equations. An execution speedup factor of 1.77 was achieved using a two-processor Cray X-MP/24 computer.
NASA Technical Reports Server (NTRS)
McGuire, Tim
1998-01-01
In this paper, we report the results of our recent research on the application of a multiprocessor Cray T916 supercomputer in modeling super-thermal electron transport in the earth's magnetic field. In general, this mathematical model requires numerical solution of a system of partial differential equations. The code we use for this model is moderately vectorized. By using Amdahl's Law for vector processors, it can be verified that the code is about 60% vectorized on a Cray computer. Speedup factors on the order of 2.5 were obtained compared to the unvectorized code. In the following sections, we discuss the methodology of improving the code. In addition to our goal of optimizing the code for solution on the Cray computer, we had the goal of scalability in mind. Scalability combines the concepts of portability with near-linear speedup. Specifically, a scalable program is one whose performance is portable across many different architectures with differing numbers of processors for many different problem sizes. Though we have access to a Cray at this time, the goal was to also have code which would run well on a variety of architectures.
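The 60% vectorization figure and the observed speedup of about 2.5 are consistent with Amdahl's Law for a vector processor: with vectorized fraction f = 0.6 and vector-to-scalar speed ratio r, the speedup is 1 / ((1 - f) + f/r), which approaches 1/0.4 = 2.5 as r grows large. A small worked check:

```python
# Amdahl's Law for a partially vectorized code.
def amdahl_speedup(f, r):
    """f: vectorized fraction of the work, r: vector/scalar speed ratio."""
    return 1.0 / ((1.0 - f) + f / r)

f = 0.60
for r in (5, 10, 50, 1000):
    print("r = %4d  speedup = %.2f" % (r, amdahl_speedup(f, r)))
# As r -> infinity the speedup approaches 1 / (1 - f) = 2.5,
# matching the factor of ~2.5 reported for the 60%-vectorized code.
```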
Fast and Accurate Simulation of the Cray XMT Multithreaded Supercomputer
DOE Office of Scientific and Technical Information (OSTI.GOV)
Villa, Oreste; Tumeo, Antonino; Secchi, Simone
Irregular applications, such as data mining and analysis or graph-based computations, show unpredictable memory/network access patterns and control structures. Highly multithreaded architectures with large processor counts, like the Cray MTA-1, MTA-2 and XMT, appear to address their requirements better than commodity clusters. However, the research on highly multithreaded systems is currently limited by the lack of adequate architectural simulation infrastructures due to issues such as size of the machines, memory footprint, simulation speed, accuracy and customization. At the same time, Shared-memory MultiProcessors (SMPs) with multi-core processors have become an attractive platform to simulate large scale machines. In this paper, we introduce a cycle-level simulator of the highly multithreaded Cray XMT supercomputer. The simulator runs unmodified XMT applications. We discuss how we tackled the challenges posed by its development, detailing the techniques introduced to make the simulation as fast as possible while maintaining a high accuracy. By mapping XMT processors (ThreadStorm with 128 hardware threads) to host computing cores, the simulation speed remains constant as the number of simulated processors increases, up to the number of available host cores. The simulator supports zero-overhead switching among different accuracy levels at run-time and includes a network model that takes into account contention. On a modern 48-core SMP host, our infrastructure simulates a large set of irregular applications 500 to 2000 times slower than real time when compared to a 128-processor XMT, while remaining within 10% of accuracy. Emulation is only from 25 to 200 times slower than real time.
Research on Spectroscopy, Opacity, and Atmospheres
NASA Technical Reports Server (NTRS)
Kurucz, Robert L.
1999-01-01
A web site, cfaku5.harvard.edu, has been set up to make the calculations accessible; the data can also be accessed by FTP. It has all of the atomic and diatomic molecular data, tables of distribution function opacities, grids of model atmospheres, colors, fluxes, etc., programs that are ready for distribution, and most of the recent papers developed during this grant. Atlases and computed spectra will be added as they are completed. New atomic and molecular calculations will be added as they are completed. The atomic programs that had been running on a Cray at the San Diego Supercomputer Center can now run on the Vaxes and Alpha. The work started with Ni and Co because there were new laboratory analyses that included isotopic and hyperfine splitting. Those calculations are described in the appended abstract for the 6th Atomic Spectroscopy and Oscillator Strengths meeting in Victoria last summer. A surprising finding is that quadrupole transitions have been grossly in error because mixing with higher levels has not been included. All levels up through n=9 for Fe I and II, the spectra for which the most information is available, are now included. After Fe I and Fe II, all other spectra are "easy". ATLAS12, the opacity sampling program for computing models with arbitrary abundances, has been put on the web server. A new distribution function opacity program for workstations that replaces the one used on the Cray at the San Diego Supercomputer Center has been written. Each set of abundances would take 100 Cray hours, costing $100,000.
NASA Technical Reports Server (NTRS)
Logan, Terry G.
1994-01-01
The purpose of this study is to investigate the performance of integral equation computations using a numerical source field-panel method in a massively parallel processing (MPP) environment. A comparative study of the computational performance of the MPP CM-5 computer and a conventional Cray Y-MP supercomputer for a three-dimensional flow problem is made. A serial FORTRAN code is converted into a parallel CM-FORTRAN code. Some performance results are obtained on the CM-5 with 32, 64, and 128 nodes, along with those on a Cray Y-MP with a single processor. The comparison of the performance indicates that the parallel CM-FORTRAN code nearly matches or outperforms the equivalent serial FORTRAN code for some cases.
ATLAS and LHC computing on CRAY
NASA Astrophysics Data System (ADS)
Sciacca, F. G.; Haug, S.; ATLAS Collaboration
2017-10-01
Access and exploitation of large scale computing resources, such as those offered by general purpose HPC centres, is one important measure for ATLAS and the other Large Hadron Collider experiments in order to meet the challenge posed by the full exploitation of the future data within the constraints of flat budgets. We report on the effort of moving the Swiss WLCG T2 computing, serving ATLAS, CMS and LHCb, from a dedicated cluster to the large Cray systems at the Swiss National Supercomputing Centre CSCS. These systems not only offer very efficient hardware, cooling and highly competent operators, but also have large backfill potential due to their size and multidisciplinary usage, and potential gains due to economy of scale. Technical solutions, performance, expected return and future plans are discussed.
Climate Data Assimilation on a Massively Parallel Supercomputer
NASA Technical Reports Server (NTRS)
Ding, Hong Q.; Ferraro, Robert D.
1996-01-01
We have designed and implemented a set of highly efficient and highly scalable algorithms for an unstructured computational package, the PSAS data assimilation package, as demonstrated by detailed performance analysis of systematic runs on up to 512 nodes of an Intel Paragon. The preconditioned Conjugate Gradient solver achieves a sustained 18 Gflops performance. Consequently, we achieve an unprecedented 100-fold reduction in time to solution on the Intel Paragon over a single head of a Cray C90. This not only exceeds the daily performance requirement of the Data Assimilation Office at NASA's Goddard Space Flight Center, but also makes it possible to explore much larger and more challenging data assimilation problems which are unthinkable on a traditional computer platform such as the Cray C90.
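The solver at the core of this package is a preconditioned Conjugate Gradient iteration. For reference, a compact serial version of the standard algorithm is sketched below (generic textbook PCG with a Jacobi preconditioner, not the PSAS implementation):

```python
# Textbook preconditioned Conjugate Gradient (Jacobi preconditioner), serial sketch.
import numpy as np

def pcg(A, b, tol=1e-8, maxiter=1000):
    M_inv = 1.0 / np.diag(A)            # Jacobi preconditioner
    x = np.zeros_like(b)
    r = b - A @ x
    z = M_inv * r
    p = z.copy()
    rz = r @ z
    for k in range(maxiter):
        Ap = A @ p
        alpha = rz / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        if np.linalg.norm(r) < tol * np.linalg.norm(b):
            return x, k + 1
        z = M_inv * r
        rz_new = r @ z
        p = z + (rz_new / rz) * p
        rz = rz_new
    return x, maxiter

# Small SPD test problem.
rng = np.random.default_rng(0)
Q = rng.standard_normal((100, 100))
A = Q @ Q.T + 100 * np.eye(100)         # well-conditioned SPD matrix
b = rng.standard_normal(100)
x, iters = pcg(A, b)
print("iterations:", iters, "residual:", np.linalg.norm(b - A @ x))
```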
Close to real life. [solving for transonic flow about lifting airfoils using supercomputers
NASA Technical Reports Server (NTRS)
Peterson, Victor L.; Bailey, F. Ron
1988-01-01
NASA's Numerical Aerodynamic Simulation (NAS) facility for CFD modeling of highly complex aerodynamic flows employs as its basic hardware two Cray-2s, an ETA-10 Model Q, an Amdahl 5880 mainframe computer that furnishes both support processing and access to 300 Gbytes of disk storage, several minicomputers and superminicomputers, and a Thinking Machines 16,000-device 'connection machine' processor. NAS, which was the first supercomputer facility to standardize operating-system and communication software on all processors, has done important Space Shuttle aerodynamics simulations and will be critical to the configurational refinement of the National Aerospace Plane and its integrated powerplant, which will involve complex, high temperature reactive gasdynamic computations.
Experiences Using OpenMP Based on Compiler Directed Software DSM on a PC Cluster
NASA Technical Reports Server (NTRS)
Hess, Matthias; Jost, Gabriele; Mueller, Matthias; Ruehle, Roland; Biegel, Bryan (Technical Monitor)
2002-01-01
In this work we report on our experiences running OpenMP (shared memory) programs on a commodity cluster of PCs (personal computers) running a software distributed shared memory (DSM) system. We describe our test environment and report on the performance of a subset of the NAS (NASA Advanced Supercomputing) Parallel Benchmarks that have been automatically parallelized for OpenMP. We compare the performance of the OpenMP implementations with that of their message passing counterparts and discuss performance differences.
A Tensor Product Formulation of Strassen's Matrix Multiplication Algorithm with Memory Reduction
Kumar, B.; Huang, C. -H.; Sadayappan, P.; ...
1995-01-01
In this article, we present a program generation strategy of Strassen's matrix multiplication algorithm using a programming methodology based on tensor product formulas. In this methodology, block recursive programs such as the fast Fourier Transforms and Strassen's matrix multiplication algorithm are expressed as algebraic formulas involving tensor products and other matrix operations. Such formulas can be systematically translated to high-performance parallel/vector codes for various architectures. In this article, we present a nonrecursive implementation of Strassen's algorithm for shared memory vector processors such as the Cray Y-MP. A previous implementation of Strassen's algorithm synthesized from tensor product formulas required working storage of size O(7^n) for multiplying 2^n × 2^n matrices. We present a modified formulation in which the working storage requirement is reduced to O(4^n). The modified formulation exhibits sufficient parallelism for efficient implementation on a shared memory multiprocessor. Performance results on a Cray Y-MP8/64 are presented.
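As background for readers unfamiliar with the algorithm itself, a plain recursive Strassen multiplication for 2^n × 2^n matrices is sketched below (a direct recursive form in Python/NumPy, not the tensor-product or vectorized Cray formulation discussed in the article):

```python
# Recursive Strassen multiplication for 2^n x 2^n matrices (illustrative sketch).
import numpy as np

def strassen(A, B, cutoff=64):
    n = A.shape[0]
    if n <= cutoff:                      # fall back to ordinary multiplication
        return A @ B
    h = n // 2
    A11, A12, A21, A22 = A[:h, :h], A[:h, h:], A[h:, :h], A[h:, h:]
    B11, B12, B21, B22 = B[:h, :h], B[:h, h:], B[h:, :h], B[h:, h:]
    M1 = strassen(A11 + A22, B11 + B22, cutoff)
    M2 = strassen(A21 + A22, B11, cutoff)
    M3 = strassen(A11, B12 - B22, cutoff)
    M4 = strassen(A22, B21 - B11, cutoff)
    M5 = strassen(A11 + A12, B22, cutoff)
    M6 = strassen(A21 - A11, B11 + B12, cutoff)
    M7 = strassen(A12 - A22, B21 + B22, cutoff)
    C = np.empty_like(A)
    C[:h, :h] = M1 + M4 - M5 + M7
    C[:h, h:] = M3 + M5
    C[h:, :h] = M2 + M4
    C[h:, h:] = M1 - M2 + M3 + M6
    return C

n = 256                                   # 2^8
A = np.random.rand(n, n)
B = np.random.rand(n, n)
print("max error vs. numpy:", np.abs(strassen(A, B) - A @ B).max())
```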
A computational/experimental study of the flow around a body of revolution at angle of attack
NASA Technical Reports Server (NTRS)
Zilliac, Gregory G.
1986-01-01
The incompressible Navier-Stokes equations are numerically solved for steady flow around an ogive-cylinder (fineness ratio 4.5) at angle of attack. The three-dimensional vortical flow is investigated with emphasis on the tip and the near-wake region. The implicit, finite-difference computation is performed on the CRAY X-MP computer using the method of pseudo-compressibility. Comparisons of computational results with results of a companion towing tank experiment are presented for two symmetric leeside flow cases at moderate angles of attack. The topology of the flow is discussed and conclusions are drawn concerning the growth and stability of the primary vortices.
Multitasking for flows about multiple body configurations using the chimera grid scheme
NASA Technical Reports Server (NTRS)
Dougherty, F. C.; Morgan, R. L.
1987-01-01
The multitasking of a finite-difference scheme using multiple overset meshes is described. In this chimera, or multiple overset mesh approach, a multiple body configuration is mapped using a major grid about the main component of the configuration, with minor overset meshes used to map each additional component. This type of code is well suited to multitasking. Both steady and unsteady two-dimensional computations are run on parallel processors of a CRAY X-MP/48, usually with one mesh per processor. Flow field results are compared with single processor results to demonstrate the feasibility of running multiple mesh codes on parallel processors and to show the increase in efficiency.
Numerical simulation of three dimensional transonic flows
NASA Technical Reports Server (NTRS)
Sahu, Jubaraj; Steger, Joseph L.
1987-01-01
The three-dimensional flow over a projectile has been computed using an implicit, approximately factored, partially flux-split algorithm. A simple composite grid scheme has been developed in which a single grid is partitioned into a series of smaller grids for applications which require an external large memory device such as the SSD of the CRAY X-MP/48, or multitasking. The accuracy and stability of the composite grid scheme has been tested by numerically simulating the flow over an ellipsoid at angle of attack and comparing the solution with a single grid solution. The flowfield over a projectile at M = 0.96 and 4 deg angle-of-attack has been computed using a fine grid, and compared with experiment.
Extensions and improvements on XTRAN3S
NASA Technical Reports Server (NTRS)
Borland, C. J.
1989-01-01
Improvements to the XTRAN3S computer program are summarized. Work on this code, for steady and unsteady aerodynamic and aeroelastic analysis in the transonic flow regime has concentrated on the following areas: (1) Maintenance of the XTRAN3S code, including correction of errors, enhancement of operational capability, and installation on the Cray X-MP system; (2) Extension of the vectorization concepts in XTRAN3S to include additional areas of the code for improved execution speed; (3) Modification of the XTRAN3S algorithm for improved numerical stability for swept, tapered wing cases and improved computational efficiency; and (4) Extension of the wing-only version of XTRAN3S to include pylon and nacelle or external store capability.
An analysis of file migration in a UNIX supercomputing environment
NASA Technical Reports Server (NTRS)
Miller, Ethan L.; Katz, Randy H.
1992-01-01
The supercomputer center at the National Center for Atmospheric Research (NCAR) migrates large numbers of files to and from its mass storage system (MSS) because there is insufficient space to store them on the Cray supercomputer's local disks. This paper presents an analysis of file migration data collected over two years. The analysis shows that requests to the MSS are periodic, with one-day and one-week periods. Read requests to the MSS account for the majority of the periodicity, while write requests are relatively constant over the course of a week. Additionally, reads show a far greater fluctuation than writes over a day and week, since reads are driven by human users while writes are machine-driven.
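The daily and weekly periodicity described above is the kind of pattern that shows up directly in a Fourier analysis of request counts. A small illustrative sketch (synthetic data, not the NCAR trace) that recovers a 24-hour and a 168-hour period from an hourly request series:

```python
# Sketch: detect daily/weekly periodicity in an hourly request-count series.
import numpy as np

hours = np.arange(24 * 7 * 8)                       # 8 weeks of hourly samples
# Synthetic trace: daily cycle + weekly cycle + noise (illustrative only).
rng = np.random.default_rng(1)
counts = (100
          + 40 * np.sin(2 * np.pi * hours / 24)
          + 20 * np.sin(2 * np.pi * hours / 168)
          + rng.normal(0, 5, hours.size))

spectrum = np.abs(np.fft.rfft(counts - counts.mean()))
freqs = np.fft.rfftfreq(hours.size, d=1.0)          # cycles per hour

top = np.argsort(spectrum)[-2:]                     # two strongest components
for k in sorted(top, key=lambda i: freqs[i]):
    print("period = %.1f hours" % (1.0 / freqs[k]))
# Expected output: periods of about 168 and 24 hours.
```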
High Performance Programming Using Explicit Shared Memory Model on Cray T3D
NASA Technical Reports Server (NTRS)
Simon, Horst D.; Saini, Subhash; Grassi, Charles
1994-01-01
The Cray T3D system is the first-phase system in Cray Research, Inc.'s (CRI) three-phase massively parallel processing (MPP) program. This system features a heterogeneous architecture that closely couples DEC's Alpha microprocessors and CRI's parallel-vector technology, i.e., the Cray Y-MP and Cray C90. An overview of the Cray T3D hardware and available programming models is presented. Under the Cray Research adaptive Fortran (CRAFT) model, four programming methods (data parallel, work sharing, message passing using PVM, and the explicit shared memory model) are available to users. However, at this time the data parallel and work sharing programming models are not available to the user community. The differences between standard PVM and CRI's PVM are highlighted with performance measurements such as latencies and communication bandwidths. We have found that the performance of neither standard PVM nor CRI's PVM exploits the hardware capabilities of the T3D. The reasons for the poor performance of PVM as a native message-passing library are presented. This is illustrated by the performance of the NAS Parallel Benchmarks (NPB) programmed in the explicit shared memory model on the Cray T3D. In general, the performance of standard PVM is about 4 to 5 times less than that obtained by using the explicit shared memory model. This degradation in performance is also seen on the CM-5, where the performance of applications using the native message-passing library CMMD is about 4 to 5 times less than that using data parallel methods. The issues involved in programming with the explicit shared memory model (such as barriers, synchronization, invalidating the data cache, aligning the data cache, etc.) are discussed. Comparative performance of the NPB using the explicit shared memory programming model on the Cray T3D and other highly parallel systems such as the TMC CM-5, Intel Paragon, Cray C90, IBM SP1, etc. is presented.
Adaptation of MSC/NASTRAN to a supercomputer
DOE Office of Scientific and Technical Information (OSTI.GOV)
Gloudeman, J.F.; Hodge, J.C.
1982-01-01
MSC/NASTRAN is a large-scale general purpose digital computer program which solves a wide variety of engineering analysis problems by the finite element method. The program capabilities include static and dynamic structural analysis (linear and nonlinear), heat transfer, acoustics, electromagnetism and other types of field problems. It is used worldwide by large and small companies in such diverse fields as automotive, aerospace, civil engineering, shipbuilding, offshore oil, industrial equipment, chemical engineering, biomedical research, optics and government research. The paper presents the significant aspects of the adaptation of MSC/NASTRAN to the Cray-1. First, the general architecture and predominant functional use of MSC/NASTRAN are discussed to help explain the imperatives and the challenges of this undertaking. The key characteristics of the Cray-1 which influenced the decision to undertake this effort are then reviewed to help identify performance targets. An overview of the MSC/NASTRAN adaptation effort is then given to help define the scope of the project. Finally, some measures of MSC/NASTRAN's operational performance on the Cray-1 are given, along with a few guidelines to help avoid improper interpretation. 17 references.
NASA Langley Research Center's distributed mass storage system
NASA Technical Reports Server (NTRS)
Pao, Juliet Z.; Humes, D. Creig
1993-01-01
There is a trend in institutions with high performance computing and data management requirements to explore mass storage systems with peripherals directly attached to a high speed network. The Distributed Mass Storage System (DMSS) Project at NASA LaRC is building such a system and expects to put it into production use by the end of 1993. This paper presents the design of the DMSS, some experiences in its development and use, and a performance analysis of its capabilities. The special features of this system are: (1) workstation class file servers running UniTree software; (2) third party I/O; (3) HIPPI network; (4) HIPPI/IPI3 disk array systems; (5) Storage Technology Corporation (STK) ACS 4400 automatic cartridge system; (6) CRAY Research Incorporated (CRI) CRAY Y-MP and CRAY-2 clients; (7) file server redundancy provision; and (8) a transition mechanism from the existent mass storage system to the DMSS.
LANZ: Software solving the large sparse symmetric generalized eigenproblem
NASA Technical Reports Server (NTRS)
Jones, Mark T.; Patrick, Merrell L.
1990-01-01
A package, LANZ, for solving the large symmetric generalized eigenproblem is described. The package was tested on four different architectures: Convex 200, CRAY Y-MP, Sun-3, and Sun-4. The package uses Lanczos' method and is based on recent research into solving the generalized eigenproblem.
Scheduling for Parallel Supercomputing: A Historical Perspective of Achievable Utilization
NASA Technical Reports Server (NTRS)
Jones, James Patton; Nitzberg, Bill
1999-01-01
The NAS facility has operated parallel supercomputers for the past 11 years, including the Intel iPSC/860, Intel Paragon, Thinking Machines CM-5, IBM SP-2, and Cray Origin 2000. Across this wide variety of machine architectures, across a span of 10 years, across a large number of different users, and through thousands of minor configuration and policy changes, the utilization of these machines shows three general trends: (1) scheduling using a naive FIFO first-fit policy results in 40-60% utilization, (2) switching to the more sophisticated dynamic backfilling scheduling algorithm improves utilization by about 15 percentage points (yielding about 70% utilization), and (3) reducing the maximum allowable job size further increases utilization. Most surprising is the consistency of these trends. Over the lifetime of the NAS parallel systems, we made hundreds, perhaps thousands, of small changes to hardware, software, and policy, yet, utilization was affected little. In particular these results show that the goal of achieving near 100% utilization while supporting a real parallel supercomputing workload is unrealistic.
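Dynamic backfilling, the scheduling change credited above with roughly a 15-point utilization gain, lets short jobs jump ahead in the queue as long as they cannot delay the first waiting job. A simplified (conservative) sketch of the core decision, assuming user-supplied runtime estimates; this is an illustration of the idea, not the NAS schedulers:

```python
# Sketch: conservative backfill check against the first queued job's reservation.
def shadow_time(running, total_nodes, nodes_needed, now):
    """Earliest time enough nodes free up for the first queued job.
    running: list of (end_time, nodes) for currently running jobs."""
    free = total_nodes - sum(n for _, n in running)
    if free >= nodes_needed:
        return now
    for end, n in sorted(running):
        free += n
        if free >= nodes_needed:
            return end
    raise ValueError("first queued job can never fit")

def can_backfill(job, first_queued, running, total_nodes, now):
    """job, first_queued: (nodes, est_runtime). True if `job` can start now
    without delaying the reservation of the first queued job."""
    free = total_nodes - sum(n for _, n in running)
    if job[0] > free:
        return False                      # does not fit right now
    start_of_first = shadow_time(running, total_nodes, first_queued[0], now)
    return now + job[1] <= start_of_first  # finishes before the reservation

# Example: 64-node machine, two running jobs, a wide job at the queue head.
running = [(100.0, 32), (250.0, 16)]      # (end_time, nodes) of running jobs
first_queued = (48, 300.0)                # first waiting job needs 48 nodes
small_job = (8, 50.0)                     # candidate: 8 nodes for 50 time units
print(can_backfill(small_job, first_queued, running, total_nodes=64, now=0.0))  # True
```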
Solving large-scale dynamic systems using band Lanczos method in Rockwell NASTRAN on CRAY X-MP
NASA Technical Reports Server (NTRS)
Gupta, V. K.; Zillmer, S. D.; Allison, R. E.
1986-01-01
Improved cost effectiveness through better models, more accurate and faster algorithms, and large-scale computing offers more representative dynamic analyses. The band Lanczos eigensolution method was implemented in Rockwell's version of the 1984 COSMIC-released NASTRAN finite element structural analysis computer program to effectively solve for structural vibration modes, including those of large complex systems exceeding 10,000 degrees of freedom. The Lanczos vectors were re-orthogonalized locally using the Lanczos method and globally using the modified Gram-Schmidt method, sweeping out rigid-body modes and previously generated modes and Lanczos vectors. The truncated band matrix was solved for vibration frequencies and mode shapes using Givens rotations. Numerical examples are included to demonstrate the cost effectiveness and accuracy of the method as implemented in ROCKWELL NASTRAN. The CRAY version is based on RPK's COSMIC/NASTRAN. The band Lanczos method was more reliable and accurate and converged faster than the single-vector Lanczos method. The band Lanczos method was comparable to the subspace iteration method, which is a block version of the inverse power method. However, the subspace matrix tended to be fully populated in the case of subspace iteration and not as sparse as a band matrix.
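For orientation, the basic (single-vector) Lanczos iteration that the band method generalizes builds a small tridiagonal matrix whose eigenvalues approximate those of the original symmetric operator; a dense toy sketch with full re-orthogonalization is shown below (standard textbook Lanczos, not the ROCKWELL NASTRAN implementation, which targets the generalized problem K x = lambda M x):

```python
# Textbook symmetric Lanczos with full re-orthogonalization (illustrative sketch).
import numpy as np

def lanczos(A, k, seed=0):
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    Q = np.zeros((n, k))
    alpha, beta = np.zeros(k), np.zeros(k - 1)
    q = rng.standard_normal(n)
    q /= np.linalg.norm(q)
    for j in range(k):
        Q[:, j] = q
        w = A @ q
        alpha[j] = q @ w
        w -= alpha[j] * q
        if j > 0:
            w -= beta[j - 1] * Q[:, j - 1]
        w -= Q[:, :j + 1] @ (Q[:, :j + 1].T @ w)   # full re-orthogonalization
        if j < k - 1:
            beta[j] = np.linalg.norm(w)
            q = w / beta[j]
    T = np.diag(alpha) + np.diag(beta, 1) + np.diag(beta, -1)
    return np.linalg.eigvalsh(T)                   # Ritz values

A = np.diag(np.arange(1.0, 201.0))                 # known spectrum 1..200
ritz = lanczos(A, 30)
print("largest Ritz values:", ritz[-3:])           # should approach 198, 199, 200
```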
Deploying Darter A Cray XC30 System
DOE Office of Scientific and Technical Information (OSTI.GOV)
Fahey, Mark R; Budiardja, Reuben D; Crosby, Lonnie D
The University of Tennessee, Knoxville acquired a Cray XC30 supercomputer, called Darter, with a peak performance of 248.9 Teraflops. Darter was deployed in late March of 2013 with a very aggressive production timeline - the system was deployed, accepted, and placed into production in only 2 weeks. The Spring Experiment for the Center for Analysis and Prediction of Storms (CAPS) largely drove the accelerated timeline, as the experiment was scheduled to start in mid-April. The Consortium for Advanced Simulation of Light Water Reactors (CASL) project also needed access and was able to meet their tight deadlines on the newly acquired XC30. Darter's accelerated deployment and operations schedule resulted in substantial scientific impacts within the research community as well as immediate real-world impacts such as early severe tornado warnings.
RATFOR user's guide version 2.0
NASA Technical Reports Server (NTRS)
Helmle, L. C.
1985-01-01
This document is a user's guide for RATFOR at Ames Research Center. The main part of the document is a general description of RATFOR, and the appendix is devoted to a machine-specific implementation for the Cray X-MP. The general stylistic features of RATFOR are discussed, including the block structure, keywords, source code format, and the notion of tokens. There is a section on the basic control structures (IF-ELSE, ELSE IF, WHILE, FOR, DO, REPEAT-UNTIL, BREAK, NEXT), and there is a section on the statements that extend FORTRAN's capabilities (DEFINE, MACRO, INCLUDE, STRING). The appendix discusses everything needed to compile and run a basic job, the preprocessor options, the supported character sets, the generated listings, fatal errors, program limitations, and the differences from standard FORTRAN.
NASA Technical Reports Server (NTRS)
Purdon, David J.; Baruah, Pranab K.; Bussoletti, John E.; Epton, Michael A.; Massena, William A.; Nelson, Franklin D.; Tsurusaki, Kiyoharu
1990-01-01
The Maintenance Document Version 3.0 is a guide to the PAN AIR software system, a system which computes the subsonic or supersonic linear potential flow about a body of nearly arbitrary shape, using a higher order panel method. The document describes the overall system and each program module of the system. Sufficient detail is given for program maintenance, updating, and modification. It is assumed that the reader is familiar with programming and CRAY computer systems. The PAN AIR system was written in FORTRAN 4 language except for a few CAL language subroutines which exist in the PAN AIR library. Structured programming techniques were used to provide code documentation and maintainability. The operating systems accommodated are COS 1.11, COS 1.12, COS 1.13, and COS 1.14 on the CRAY 1S, 1M, and X-MP computing systems. The system is comprised of a data base management system, a program library, an execution control module, and nine separate FORTRAN technical modules. Each module calculates part of the posed PAN AIR problem. The data base manager is used to communicate between modules and within modules. The technical modules must be run in a prescribed fashion for each PAN AIR problem. In order to ease the problem of supplying the many JCL cards required to execute the modules, a set of CRAY procedures (PAPROCS) was created to automatically supply most of the JCL cards. Most of this document has not changed for Version 3.0. It now, however, strictly applies only to PAN AIR version 3.0. The major changes are: (1) additional sections covering the new FDP module (which calculates streamlines and offbody points); (2) a complete rewrite of the section on the MAG module; and (3) strict applicability to CRAY computing systems.
Reduction and analysis of VLA maps for 281 radio-loud quasars using the UNLV Cray Y-MP supercomputer
NASA Technical Reports Server (NTRS)
Ding, Ailian; Hintzen, Paul; Weistrop, Donna; Owen, Frazer
1993-01-01
The identification of distorted radio-loud quasars provides a potentially very powerful tool for basic cosmological studies. If large morphological distortions are correlated with membership of the quasars in rich clusters of galaxies, optical observations can be used to identify rich clusters of galaxies at large redshifts. Hintzen, Ulvestad, and Owen (1983, HUO) undertook a VLA A array snapshot survey at 20 cm of 123 radio-loud quasars, and they found that among triple sources in their sample, 17 percent had radio axes which were bent more than 20 deg and 5 percent were bent more than 40 deg. Their subsequent optical observations showed that excess galaxy densities within 30 arcsec of 6 low-redshift distorted quasars were on average 3 times as great as those around undistorted quasars (Hintzen 1984). At least one of the distorted quasars observed, 3C275.1, apparently lies in the first-ranked galaxy at the center of a rich cluster of galaxies (Hintzen and Romanishin, 1986). Although their sample was small, these results indicated that observations of distorted quasars could be used to identify clusters of galaxies at large redshifts. The purpose of this project is to increase the available sample of distorted quasars to allow optical detection of a significant sample of quasar-associated clusters of galaxies at large redshifts.
NASA Technical Reports Server (NTRS)
Mavriplis, D. J.; Das, Raja; Saltz, Joel; Vermeland, R. E.
1992-01-01
An efficient three dimensional unstructured Euler solver is parallelized on a Cray Y-MP C90 shared memory computer and on an Intel Touchstone Delta distributed memory computer. This paper relates the experiences gained and describes the software tools and hardware used in this study. Performance comparisons between two differing architectures are made.
Katouda, Michio; Naruse, Akira; Hirano, Yukihiko; Nakajima, Takahito
2016-11-15
A new parallel algorithm and its implementation for the RI-MP2 energy calculation utilizing peta-flop-class many-core supercomputers are presented. Some improvements from the previous algorithm (J. Chem. Theory Comput. 2013, 9, 5373) have been performed: (1) a dual-level hierarchical parallelization scheme that enables the use of more than 10,000 Message Passing Interface (MPI) processes and (2) a new data communication scheme that reduces network communication overhead. A multi-node and multi-GPU implementation of the present algorithm is presented for calculations on a central processing unit (CPU)/graphics processing unit (GPU) hybrid supercomputer. Benchmark results of the new algorithm and its implementation using the K computer (CPU clustering system) and TSUBAME 2.5 (CPU/GPU hybrid system) demonstrate high efficiency. The peak performance of 3.1 PFLOPS is attained using 80,199 nodes of the K computer. The peak performance of the multi-node and multi-GPU implementation is 514 TFLOPS using 1349 nodes and 4047 GPUs of TSUBAME 2.5. © 2016 Wiley Periodicals, Inc.
Nuclear shell model code CRUNCHER
DOE Office of Scientific and Technical Information (OSTI.GOV)
Resler, D.A.; Grimes, S.M.
1988-05-01
A new nuclear shell model code CRUNCHER, patterned after the code VLADIMIR, has been developed. While CRUNCHER and VLADIMIR employ the techniques of an uncoupled basis and the Lanczos process, improvements in the new code allow it to handle much larger problems than the previous code and to perform them more efficiently. Tests involving a moderately sized calculation indicate that CRUNCHER running on a SUN 3/260 workstation requires approximately one-half the central processing unit (CPU) time required by VLADIMIR running on a CRAY-1 supercomputer.
Researchers Mine Information from Next-Generation Subsurface Flow Simulations
Gedenk, Eric D.
2015-12-01
A research team based at Virginia Tech University leveraged computing resources at the US Department of Energy's (DOE's) Oak Ridge National Laboratory to explore subsurface multiphase flow phenomena that can't be experimentally observed. Using the Cray XK7 Titan supercomputer at the Oak Ridge Leadership Computing Facility, the team took Micro-CT images of subsurface geologic systems and created two-phase flow simulations. The team's model development has implications for computational research pertaining to carbon sequestration, oil recovery, and contaminant transport.
NASA Technical Reports Server (NTRS)
Shannon, Robert V., Jr.
1989-01-01
The model generation and structural analysis performed for the High Pressure Oxidizer Turbopump (HPOTP) preburner pump volute housing located on the main pump end of the HPOTP in the space shuttle main engine are summarized. An ANSYS finite element model of the volute housing was built and executed. A static structural analysis was performed on the Engineering Analysis and Data System (EADS) Cray X-MP supercomputer.
Parallel FEM Simulation of Electromechanics in the Heart
NASA Astrophysics Data System (ADS)
Xia, Henian; Wong, Kwai; Zhao, Xiaopeng
2011-11-01
Cardiovascular disease is the leading cause of death in America. Computer simulation of the complicated dynamics of the heart could provide valuable quantitative guidance for diagnosis and treatment of heart problems. In this paper, we present an integrated numerical model which encompasses the interaction of cardiac electrophysiology, electromechanics, and mechanoelectrical feedback. The model is solved by the finite element method on a Linux cluster and on the Cray XT5 supercomputer, Kraken. Dynamical influences between the effects of electromechanical coupling and mechanoelectrical feedback are shown.
Computation of transonic flow about helicopter rotor blades
NASA Technical Reports Server (NTRS)
Arieli, R.; Tauber, M. E.; Saunders, D. A.; Caughey, D. A.
1986-01-01
An inviscid, nonconservative, three-dimensional full-potential flow code, ROT22, has been developed for computing the quasi-steady flow about a lifting rotor blade. The code is valid throughout the subsonic and transonic regime. Calculations from the code are compared with detailed laser velocimeter measurements made in the tip region of a nonlifting rotor at a tip Mach number of 0.95 and zero advance ratio. In addition, comparisons are made with chordwise surface pressure measurements obtained in a wind tunnel for a nonlifting rotor blade at transonic tip speeds at advance ratios from 0.40 to 0.50. The overall agreement between theoretical calculations and experiment is very good. A typical run on a CRAY X-MP computer requires about 30 CPU seconds for one rotor position at transonic tip speed.
Optimizing the Performance of Reactive Molecular Dynamics Simulations for Multi-core Architectures
DOE Office of Scientific and Technical Information (OSTI.GOV)
Aktulga, Hasan Metin; Coffman, Paul; Shan, Tzu-Ray
2015-12-01
Hybrid parallelism allows high performance computing applications to better leverage the increasing on-node parallelism of modern supercomputers. In this paper, we present a hybrid parallel implementation of the widely used LAMMPS/ReaxC package, where the construction of bonded and nonbonded lists and evaluation of complex ReaxFF interactions are implemented efficiently using OpenMP parallelism. Additionally, the performance of the QEq charge equilibration scheme is examined and a dual-solver is implemented. We present the performance of the resulting ReaxC-OMP package on a state-of-the-art multi-core architecture Mira, an IBM BlueGene/Q supercomputer. For system sizes ranging from 32 thousand to 16.6 million particles, speedups in the range of 1.5-4.5x are observed using the new ReaxC-OMP software. Sustained performance improvements have been observed for up to 262,144 cores (1,048,576 processes) of Mira with a weak scaling efficiency of 91.5% in larger simulations containing 16.6 million particles.
Optical clock distribution in supercomputers using polyimide-based waveguides
NASA Astrophysics Data System (ADS)
Bihari, Bipin; Gan, Jianhua; Wu, Linghui; Liu, Yujie; Tang, Suning; Chen, Ray T.
1999-04-01
Guided-wave optics is a promising way to deliver high-speed clock signals in a supercomputer with minimized clock skew. Si-CMOS-compatible polymer-based waveguides for optoelectronic interconnects and packaging have been fabricated and characterized. A 1-to-48 fanout optoelectronic interconnection layer (OIL) structure based on Ultradel 9120/9020 for high-speed massive clock signal distribution on a Cray T-90 supercomputer board has been constructed. The OIL employs multimode polymeric channel waveguides in conjunction with surface-normal waveguide output couplers and 1-to-2 splitters. The surface-normal couplers can couple the optical clock signals into and out of the H-tree polyimide waveguides surface-normally, which facilitates the integration of photodetectors to convert the optical signal to an electrical signal. A 45-degree surface-normal coupler has been integrated at each output end. The measured output coupling efficiency is nearly 100 percent. The output profile from the 45-degree surface-normal coupler was calculated using the Fresnel approximation; the theoretical result is in good agreement with the experimental result. A total insertion loss of 7.98 dB at 850 nm was measured experimentally.
NASA Technical Reports Server (NTRS)
Berger, Marsha J.; Saltzman, Jeff S.
1992-01-01
We describe the development of a structured adaptive mesh refinement (AMR) algorithm for the Connection Machine-2 (CM-2). We develop a data layout scheme that preserves locality even for communication between fine and coarse grids. On 8K processors of a 32K machine we achieve performance slightly less than that of 1 CPU of the Cray Y-MP. We apply our algorithm to an inviscid compressible flow problem.
Factoring symmetric indefinite matrices on high-performance architectures
NASA Technical Reports Server (NTRS)
Jones, Mark T.; Patrick, Merrell L.
1990-01-01
The Bunch-Kaufman algorithm is the method of choice for factoring symmetric indefinite matrices in many applications. However, the Bunch-Kaufman algorithm does not take advantage of high-performance architectures such as the Cray Y-MP. Three new algorithms, based on Bunch-Kaufman factorization, that take advantage of such architectures are described. Results from an implementation of the third algorithm are presented.
Parallel Calculation of Sensitivity Derivatives for Aircraft Design using Automatic Differentiation
NASA Technical Reports Server (NTRS)
Bischof, C. H.; Green, L. L.; Haigler, K. J.; Knauff, T. L., Jr.
1994-01-01
Sensitivity derivative (SD) calculation via automatic differentiation (AD) typical of that required for the aerodynamic design of a transport-type aircraft is considered. Two ways of computing SD via code generated by the ADIFOR automatic differentiation tool are compared for efficiency and applicability to problems involving large numbers of design variables. A vector implementation on a Cray Y-MP computer is compared with a coarse-grained parallel implementation on an IBM SP1 computer, employing a Fortran M wrapper. The SD are computed for a swept transport wing in turbulent, transonic flow; the number of geometric design variables varies from 1 to 60 with coupling between a wing grid generation program and a state-of-the-art, 3-D computational fluid dynamics program, both augmented for derivative computation via AD. For a small number of design variables, the Cray Y-MP implementation is much faster. As the number of design variables grows, however, the IBM SP1 becomes an attractive alternative in terms of compute speed, job turnaround time, and total memory available for solutions with large numbers of design variables. The coarse-grained parallel implementation also can be moved easily to a network of workstations.
NASA Technical Reports Server (NTRS)
Hull, Gary; Ranade, Sanjay
1993-01-01
With over 5000 units sold, the StorageTek Automated Cartridge System (ACS) 4400 tape library is currently the most popular large automated tape library. Based on 3480/90 tape technology, the library is used as the migration device ('nearline' storage) in high-performance mass storage systems. In its maximum configuration, one ACS 4400 tape library houses sixteen 3480/3490 tape drives and is capable of holding approximately 6000 cartridge tapes. The maximum storage capacity of one library using 3480 tapes is 1.2 TB and the advertised aggregate I/O rate is about 24 MB/s. This paper reports on an extensive set of tests designed to accurately assess the performance capabilities and operational characteristics of one STK ACS 4400 tape library holding approximately 5200 cartridge tapes and configured with eight 3480 tape drives. A Cray Y-MP EL2-256 was configured as its host machine. More than 40,000 tape jobs were run in a variety of conditions to gather data in the areas of channel speed characteristics, robotics motion, timed tape mounts, and timed tape reads and writes.
NASA Astrophysics Data System (ADS)
Buaria, D.; Yeung, P. K.
2017-12-01
A new parallel algorithm utilizing a partitioned global address space (PGAS) programming model to achieve high scalability is reported for particle tracking in direct numerical simulations of turbulent fluid flow. The work is motivated by the desire to obtain Lagrangian information necessary for the study of turbulent dispersion at the largest problem sizes feasible on current and next-generation multi-petaflop supercomputers. A large population of fluid particles is distributed among parallel processes dynamically, based on instantaneous particle positions such that all of the interpolation information needed for each particle is available either locally on its host process or on neighboring processes holding adjacent sub-domains of the velocity field. With cubic splines as the preferred interpolation method, the new algorithm is designed to minimize the need for communication, by transferring between adjacent processes only those spline coefficients determined to be necessary for specific particles. This transfer is implemented very efficiently as a one-sided communication, using Co-Array Fortran (CAF) features which facilitate small data movements between different local partitions of a large global array. The cost of monitoring the transfer of particle properties between adjacent processes for particles migrating across sub-domain boundaries is found to be small. Detailed benchmarks are obtained on the Cray petascale supercomputer Blue Waters at the University of Illinois, Urbana-Champaign. For operations on the particles in an 8192^3 simulation (0.55 trillion grid points) on 262,144 Cray XE6 cores, the new algorithm is found to be orders of magnitude faster relative to a prior algorithm in which each particle is tracked by the same parallel process at all times. This large speedup reduces the additional cost of tracking of order 300 million particles to just over 50% of the cost of computing the Eulerian velocity field at this scale. Improving support for PGAS models in major compilers suggests that this algorithm will be of wider applicability on most upcoming supercomputers.
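The Co-Array Fortran one-sided transfers described above correspond roughly to MPI one-sided Get operations. The sketch below shows that analogue under the assumption that each rank exposes its local block of spline coefficients in an RMA window; it is not the authors' CAF code, and the sizes and displacement are illustrative only.

```c
/* Hedged MPI analogue of a one-sided spline-coefficient fetch: each rank
 * exposes its coefficients in a window, and a rank tracking a particle near a
 * sub-domain boundary pulls only the few values it needs from a neighbour. */
#include <mpi.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank, nranks;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nranks);

    const int ncoef_local = 1 << 20;           /* coefficients per rank (illustrative) */
    double *coef = malloc(ncoef_local * sizeof *coef);
    for (int i = 0; i < ncoef_local; ++i) coef[i] = rank + 1e-6 * i;

    MPI_Win win;
    MPI_Win_create(coef, ncoef_local * sizeof *coef, sizeof(double),
                   MPI_INFO_NULL, MPI_COMM_WORLD, &win);

    /* Pull a small stencil of coefficients from the "next" rank, standing in
     * for the spline data needed by one particle near a boundary. */
    double stencil[4];
    int neighbour = (rank + 1) % nranks;
    MPI_Win_lock(MPI_LOCK_SHARED, neighbour, 0, win);
    MPI_Get(stencil, 4, MPI_DOUBLE, neighbour, 128, 4, MPI_DOUBLE, win);
    MPI_Win_unlock(neighbour, win);            /* completes the one-sided transfer */

    MPI_Win_free(&win);
    free(coef);
    MPI_Finalize();
    return 0;
}
```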
A secure file manager for UNIX
DOE Office of Scientific and Technical Information (OSTI.GOV)
DeVries, R.G.
1990-12-31
The development of a secure file management system for a UNIX-based computer facility with supercomputers and workstations is described. Specifically, UNIX in its usual form does not address: (1) Operation which would satisfy rigorous security requirements. (2) Online space management in an environment where total data demands would be many times the actual online capacity. (3) Making the file management system part of a computer network in which users of any computer in the local network could retrieve data generated on any other computer in the network. The characteristics of UNIX can be exploited to develop a portable, secure file manager which would operate on computer systems ranging from workstations to supercomputers. Implementation considerations making unusual use of UNIX features, rather than requiring extensive internal system changes, are described, and implementation using the Cray Research Inc. UNICOS operating system is outlined.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Not Available
1983-09-09
This Validation Summary Report (VSR) for the Cray Research, Inc., CRAY FORTRAN Translator (CFT) Version 1.11 Bugfix 1 running under the CRAY Operating System (COS) Version 1.12 provides a consolidated summary of the results obtained from the validation of the subject compiler against the 1978 FORTRAN Standard (X3.9-1978/FIPS PUB 69). The compiler was validated against the Full Level FORTRAN level of FIPS PUB 69. The VSR is made up of several sections showing all the discrepancies found, if any. These include an overview of the validation which lists all categories of discrepancies together with the tests which failed.
Revealing topographic lineaments through IHS enhancement of DEM data. [Digital Elevation Model
NASA Technical Reports Server (NTRS)
Murdock, Gary
1990-01-01
Intensity-hue-saturation (IHS) processing of slope (dip), aspect (dip direction), and elevation is used to enhance digital elevation model (DEM) data from northwestern Nevada, revealing subtle topographic lineaments that may not be obvious in the unprocessed data. This IHS method of lineament identification was applied to a mosaic of 12 square degrees using a Cray Y-MP8/864. Square arrays from 3 x 3 to 31 x 31 points were tested, as well as several different slope enhancements. When relatively few points are used to fit the plane, lineaments of various lengths are observed, and a mechanism for lineament classification is described. An area encompassing the gold deposits of the Carlin trend, extending from Rain in the southeast to Midas in the northwest, is investigated in greater detail. The orientation and density of lineaments may be determined on the gently sloping pediment surface as well as in the more steeply sloping ranges.
NASA Astrophysics Data System (ADS)
Wu, Linghui; Bihari, Bipin; Gan, Jianhua; Chen, Ray T.; Tang, Suning
1998-08-01
Si-CMOS compatible polymer-based waveguides for optoelectronic interconnects and packaging have been fabricated and characterized. A 1-to-48 fanout optoelectronic interconnection layer (OIL) structure based on Ultradel 9120/9020 for the high-speed massive clock signal distribution for a Cray T-90 supercomputer board has been constructed. The OIL employs multimode polymeric channel waveguides in conjunction with surface-normal waveguide output coupler and 1-to-2 splitter. A total insertion loss of 7.98 dB at 850 nm was measured experimentally.
Simulation and analysis of a geopotential research mission
NASA Technical Reports Server (NTRS)
Schutz, B. E.
1986-01-01
A computer simulation was performed for a Geopotential Research Mission (GRM) to enable study of the gravitational sensitivity of the range/rate measurement between two satellites and to provide a set of simulated measurements to assist in the evaluation of techniques developed for the determination of the gravity field. The simulation, identified as SGRM 8511, was conducted with two satellites in near circular, frozen orbits at 160 km altitude and separated by 300 km. High precision numerical integration of the polar orbits was used with a gravitational field complete to degree and order 180 coefficients and to degree 300 in orders 0 to 10. The set of simulated data for a mission duration of about 32 days was generated on a Cray X-MP computer. The characteristics of the simulation and the nature of the results are described.
A portable platform for accelerated PIC codes and its application to GPUs using OpenACC
NASA Astrophysics Data System (ADS)
Hariri, F.; Tran, T. M.; Jocksch, A.; Lanti, E.; Progsch, J.; Messmer, P.; Brunner, S.; Gheller, C.; Villard, L.
2016-10-01
We present a portable platform, called PIC_ENGINE, for accelerating Particle-In-Cell (PIC) codes on heterogeneous many-core architectures such as Graphics Processing Units (GPUs). The aim of this development is efficient simulations on future exascale systems by allowing different parallelization strategies depending on the application problem and the specific architecture. To this end, this platform contains the basic steps of the PIC algorithm and has been designed as a test bed for different algorithmic options and data structures. Among the architectures that this engine can explore, particular attention is given here to systems equipped with GPUs. The study demonstrates that our portable PIC implementation based on the OpenACC programming model can achieve performance closely matching theoretical predictions. Using the Cray XC30 system, Piz Daint, at the Swiss National Supercomputing Centre (CSCS), we show that PIC_ENGINE running on an NVIDIA Kepler K20X GPU can outperform the same code on an Intel Sandy Bridge 8-core CPU by a factor of 3.4.
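As a hedged illustration of the OpenACC programming model the abstract refers to, a one-dimensional particle push might be offloaded as below; the data layout and names are assumptions, not the PIC_ENGINE data structures.

```c
/* Hedged sketch of an OpenACC-accelerated 1-D particle push; the real
 * PIC_ENGINE kernels, fields and layouts differ. */
void push_particles(int np, double dt, double qm,
                    double *restrict px, double *restrict pv,
                    const double *restrict efield, int ncells, double dx)
{
    #pragma acc parallel loop present(px[0:np], pv[0:np], efield[0:ncells])
    for (int i = 0; i < np; ++i) {
        int cell = (int)(px[i] / dx);          /* nearest-grid-point gather */
        if (cell < 0) cell = 0;
        if (cell >= ncells) cell = ncells - 1;
        pv[i] += qm * efield[cell] * dt;       /* accelerate */
        px[i] += pv[i] * dt;                   /* drift */
    }
}
```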
Parallel Navier-Stokes computations on shared and distributed memory architectures
NASA Technical Reports Server (NTRS)
Hayder, M. Ehtesham; Jayasimha, D. N.; Pillay, Sasi Kumar
1995-01-01
We study a high order finite difference scheme to solve the time accurate flow field of a jet using the compressible Navier-Stokes equations. As part of our ongoing efforts, we have implemented our numerical model on three parallel computing platforms to study the computational, communication, and scalability characteristics. The platforms chosen for this study are a cluster of workstations connected through fast networks (the LACE experimental testbed at NASA Lewis), a shared memory multiprocessor (the Cray Y-MP), and a distributed memory multiprocessor (the IBM SP1). Our focus in this study is on the LACE testbed. We present some results for the Cray Y-MP and the IBM SP1 mainly for comparison purposes. On the LACE testbed, we study: (1) the communication characteristics of Ethernet, FDDI, and the ALLNODE networks and (2) the overheads induced by the PVM message passing library used for parallelizing the application. We demonstrate that clustering of workstations is effective and has the potential to be computationally competitive with supercomputers at a fraction of the cost.
Compute Server Performance Results
NASA Technical Reports Server (NTRS)
Stockdale, I. E.; Barton, John; Woodrow, Thomas (Technical Monitor)
1994-01-01
Parallel-vector supercomputers have been the workhorses of high performance computing. As expectations of future computing needs have risen faster than projected vector supercomputer performance, much work has been done investigating the feasibility of using Massively Parallel Processor systems as supercomputers. An even more recent development is the availability of high performance workstations which have the potential, when clustered together, to replace parallel-vector systems. We present a systematic comparison of floating point performance and price-performance for various compute server systems. A suite of highly vectorized programs was run on systems including traditional vector systems such as the Cray C90, and RISC workstations such as the IBM RS/6000 590 and the SGI R8000. The C90 system delivers 460 million floating point operations per second (FLOPS), the highest single-processor rate of any vendor. However, if the price-performance ratio (PPR) is considered to be most important, then the IBM and SGI processors are superior to the C90 processors. Even without code tuning, the IBM and SGI PPRs of 260 and 220 FLOPS per dollar exceed the C90 PPR of 160 FLOPS per dollar when running our highly vectorized suite.
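Reading the quoted figures back through the stated definition of the price-performance ratio (assuming PPR is simply sustained FLOPS divided by system price; the paper's exact definition is not reproduced here), the C90 numbers imply a rough per-processor price:

\[
\mathrm{PPR} = \frac{\text{sustained FLOPS}}{\text{price}}
\;\Rightarrow\;
\text{implied C90 processor price} \approx \frac{460\times 10^{6}\ \text{FLOPS}}{160\ \text{FLOPS}/\$} \approx \$2.9\ \text{million},
\]

a back-of-the-envelope reading that is ours, not the authors'.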
Sookhak Lari, Kaveh; Johnston, Colin D; Rayner, John L; Davis, Greg B
2018-03-05
Remediation of subsurface systems, including groundwater, soil and soil gas, contaminated with light non-aqueous phase liquids (LNAPLs) is challenging. Field-scale pilot trials of multi-phase remediation were undertaken at a site to determine the effectiveness of recovery options. Sequential LNAPL skimming and vacuum-enhanced skimming, with and without water table drawdown, were trialled over 78 days, in total extracting over 5 m^3 of LNAPL. For the first time, a multi-component simulation framework (including the multi-phase multi-component code TMVOC-MP and processing codes) was developed and applied to simulate the broad range of multi-phase remediation and recovery methods used in the field trials. This framework was validated against the sequential pilot trials by comparing predicted and measured LNAPL mass removal rates and compositional changes. The framework was tested on both a Cray supercomputer and a cluster. Simulations mimicked trends in LNAPL recovery rates (from 0.14 to 3 mL/s) across all remediation techniques, each operating over periods of 4-14 days over the 78-day trial. The code also approximated order-of-magnitude compositional changes of hazardous chemical concentrations in extracted gas during vacuum-enhanced recovery. The verified framework enables longer term prediction of the effectiveness of remediation approaches, allowing better determination of remediation endpoints and long-term risks. Copyright © 2017 Commonwealth Scientific and Industrial Research Organisation. Published by Elsevier B.V. All rights reserved.
NASA Astrophysics Data System (ADS)
Huhn, William Paul; Lange, Björn; Yu, Victor; Blum, Volker; Lee, Seyong; Yoon, Mina
Density-functional theory has been well established as the dominant quantum-mechanical computational method in the materials community. Large, accurate simulations become very challenging on small to mid-scale computers and require high-performance compute platforms to succeed. GPU acceleration is one promising approach. In this talk, we present a first implementation of all-electron density-functional theory in the FHI-aims code for massively parallel GPU-based platforms. Special attention is paid to the update of the density and to the integration of the Hamiltonian and overlap matrices, realized in a domain decomposition scheme on non-uniform grids. The initial implementation scales well across nodes on ORNL's Titan Cray XK7 supercomputer (8 to 64 nodes, 16 MPI ranks/node) and shows an overall runtime speedup of 1.4x from utilizing the K20X Tesla GPUs on each Titan node, with the charge density update showing a speedup of 2x. Further acceleration opportunities will be discussed. Work supported by the LDRD Program of ORNL managed by UT-Battelle, LLC, for the U.S. DOE and by the Oak Ridge Leadership Computing Facility, which is a DOE Office of Science User Facility supported under Contract DE-AC05-00OR22725.
Applications of CFD and visualization techniques
NASA Technical Reports Server (NTRS)
Saunders, James H.; Brown, Susan T.; Crisafulli, Jeffrey J.; Southern, Leslie A.
1992-01-01
In this paper, three applications are presented to illustrate current techniques for flow calculation and visualization. The first two applications use a commercial computational fluid dynamics (CFD) code, FLUENT, performed on a Cray Y-MP. The results are animated with the aid of data visualization software, apE. The third application simulates a particulate deposition pattern using techniques inspired by developments in nonlinear dynamical systems. These computations were performed on personal computers.
NASA Technical Reports Server (NTRS)
Rogers, S. E.
1994-01-01
INS3D computes steady-state solutions to the incompressible Navier-Stokes equations. The INS3D approach utilizes pseudo-compressibility combined with an approximate factorization scheme. This computational fluid dynamics (CFD) code has been verified on problems such as flow through a channel, flow over a backward-facing step and flow over a circular cylinder. Three dimensional cases include flow over an ogive cylinder, flow through a rectangular duct, wind tunnel inlet flow, cylinder-wall juncture flow and flow through multiple posts mounted between two plates. INS3D uses a pseudo-compressibility approach in which a time derivative of pressure is added to the continuity equation, which together with the momentum equations form a set of four equations with pressure and velocity as the dependent variables. The equations' coordinates are transformed for general three dimensional applications. The equations are advanced in time by the implicit, non-iterative, approximately-factored, finite-difference scheme of Beam and Warming. The numerical stability of the scheme depends on the use of higher-order smoothing terms to damp out higher-frequency oscillations caused by second-order central differencing. The artificial compressibility introduces pressure (sound) waves of finite speed (whereas the speed of sound would be infinite in an incompressible fluid). As the solution converges, these pressure waves die out, causing the derivative of pressure with respect to time to approach zero. Thus, continuity is satisfied for the incompressible fluid in the steady state. Computational efficiency is achieved using a diagonal algorithm. A block tri-diagonal option is also available. When a steady-state solution is reached, the modified continuity equation will satisfy the divergence-free velocity field condition. INS3D is capable of handling several different types of boundaries encountered in numerical simulations, including solid-surface, inflow and outflow, and far-field boundaries. Three machine versions of INS3D are available. INS3D for the CRAY is written in CRAY FORTRAN for execution on a CRAY X-MP under COS, INS3D for the IBM is written in FORTRAN 77 for execution on an IBM 3090 under the VM or MVS operating system, and INS3D for DEC RISC-based systems is written in RISC FORTRAN for execution on a DEC workstation running RISC ULTRIX 3.1 or later. The CRAY version has a central memory requirement of 730279 words. The central memory requirement for the IBM is 150Mb. The memory requirement for the DEC RISC ULTRIX version is 3Mb of main memory. INS3D was developed in 1987. The port to the IBM was done in 1990. The port to the DECstation 3100 was done in 1991. CRAY is a registered trademark of Cray Research Inc. IBM is a registered trademark of International Business Machines. DEC, DECstation, and ULTRIX are trademarks of the Digital Equipment Corporation.
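A generic statement of the pseudo-compressibility (artificial compressibility) formulation described above, written here in incompressible nondimensional form as a textbook sketch rather than the exact transformed equations solved by INS3D:

\[
\frac{\partial p}{\partial \tau} + \beta\,\frac{\partial u_j}{\partial x_j} = 0,
\qquad
\frac{\partial u_i}{\partial \tau} + \frac{\partial (u_i u_j)}{\partial x_j}
 = -\frac{\partial p}{\partial x_i} + \nu\,\frac{\partial^2 u_i}{\partial x_j\,\partial x_j},
\]

where \(\beta\) is the artificial compressibility parameter and \(\tau\) the pseudo-time; as \(\partial p/\partial\tau \to 0\) at convergence, the divergence-free condition \(\partial u_j/\partial x_j = 0\) is recovered, as stated in the abstract.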
NASA Technical Reports Server (NTRS)
Knupp, Kevin R.
1988-01-01
Described is work performed under NASA Grant NAG8-654 for the period 15 March to 15 September 1988. This work entails primarily data analysis and numerical modeling efforts related to the 1986 Satellite Precipitation and Cloud Experiment (SPACE). In the following, the SPACE acronym is used along with the acronym COHMEX, which represents the encompassing Cooperative Huntsville Meteorological Experiment. Progress made during the second half of the first year of the study included: (1) installation and testing of the RAMS numerical modeling system on the Alabama CRAY X-MP/24; (2) a start on the analysis of the mesoscale convective system (MCS) of the 13 July 1986 COHMEX case; and (3) a cursory examination of a small MCS that formed over the COHMEX region on 15 July 1986. Details of each of these individual tasks are given.
Optimal spacecraft attitude control using collocation and nonlinear programming
NASA Astrophysics Data System (ADS)
Herman, A. L.; Conway, B. A.
1992-10-01
Direct collocation with nonlinear programming (DCNLP) is employed to find the optimal open-loop control histories for detumbling a disabled satellite. The controls are torques and forces applied to the docking arm and joint and torques applied about the body axes of the OMV. Solutions are obtained for cases in which various constraints are placed on the controls and in which the number of controls is reduced or increased from that considered in Conway and Widhalm (1986). DCNLP works well when applied to the optimal control problem of satellite attitude control. The formulation is straightforward and produces good results in a relatively small amount of time on a Cray X-MP with no a priori information about the optimal solution. The addition of joint acceleration to the controls significantly reduces the control magnitudes and optimal cost. In all cases, the torques and accelerations are modest and the optimal cost is very modest.
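In direct collocation the state and control histories are discretized at nodes and the dynamics are imposed as algebraic defect constraints handed to the nonlinear programming solver; a generic trapezoidal form (a sketch only; the abstract does not state which collocation rule the authors used) is

\[
\boldsymbol{\zeta}_k \;=\; \mathbf{x}_{k+1} - \mathbf{x}_k
 - \frac{h_k}{2}\Bigl[\mathbf{f}(\mathbf{x}_k,\mathbf{u}_k) + \mathbf{f}(\mathbf{x}_{k+1},\mathbf{u}_{k+1})\Bigr] = \mathbf{0},
\qquad k = 1,\dots,N-1,
\]

with the NLP minimizing the chosen cost subject to these defects and any bounds on the torques and forces.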
A static data flow simulation study at Ames Research Center
NASA Technical Reports Server (NTRS)
Barszcz, Eric; Howard, Lauri S.
1987-01-01
Demands in computational power, particularly in the area of computational fluid dynamics (CFD), led NASA Ames Research Center to study advanced computer architectures. One architecture being studied is the static data flow architecture based on research done by Jack B. Dennis at MIT. To improve understanding of this architecture, a static data flow simulator, written in Pascal, has been implemented for use on a Cray X-MP/48. A matrix multiply and a two-dimensional fast Fourier transform (FFT), two algorithms used in CFD work at Ames, have been run on the simulator. Execution times can vary by a factor of more than 2 depending on the partitioning method used to assign instructions to processing elements. Service time for matching tokens has proved to be a major bottleneck. Loop control and array address calculation overhead can double the execution time. The best sustained MFLOPS rates were less than 50% of the maximum capability of the machine.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Reed, D.A.; Grunwald, D.C.
The spectrum of parallel processor designs can be divided into three sections according to the number and complexity of the processors. At one end there are simple, bit-serial processors. Any one of these processors is of little value, but when it is coupled with many others, the aggregate computing power can be large. This approach to parallel processing can be likened to a colony of termites devouring a log. The most notable examples of this approach are the NASA/Goodyear Massively Parallel Processor, which has 16K one-bit processors, and the Thinking Machines Connection Machine, which has 64K one-bit processors. At the other end of the spectrum, a small number of processors, each built using the fastest available technology and the most sophisticated architecture, are combined. An example of this approach is the Cray X-MP. This type of parallel processing is akin to four woodmen attacking the log with chainsaws.
NASA Technical Reports Server (NTRS)
Noor, Ahmed K.; Peters, Jeanne M.
1989-01-01
A computational procedure is presented for the nonlinear dynamic analysis of unsymmetric structures on vector multiprocessor systems. The procedure is based on a novel hierarchical partitioning strategy in which the response of the unsymmetric structure is approximated by a combination of symmetric and antisymmetric response vectors (modes), each obtained by using only a fraction of the degrees of freedom of the original finite element model. The three key elements of the procedure which result in a high degree of concurrency throughout the solution process are: (1) mixed (or primitive variable) formulation with independent shape functions for the different fields; (2) operator splitting or restructuring of the discrete equations at each time step to delineate the symmetric and antisymmetric vectors constituting the response; and (3) a two-level iterative process for generating the response of the structure. An assessment is made of the effectiveness of the procedure on the CRAY X-MP/4 computers.
MODA A Framework for Memory Centric Performance Characterization
DOE Office of Scientific and Technical Information (OSTI.GOV)
Shrestha, Sunil; Su, Chun-Yi; White, Amanda M.
2012-06-29
In the age of massive parallelism, the focus of performance analysis has switched from the processor and related structures to the memory and I/O resources. Adapting to this new reality, a performance analysis tool has to provide a way to analyze resource usage to pinpoint existing and potential problems in a given application. This paper provides an overview of the Memory Observant Data Analysis (MODA) tool, a memory-centric tool first implemented on the Cray XMT supercomputer. Throughout the paper, MODA's capabilities are showcased with experiments on matrix multiply and Graph-500 application codes.
NASA Astrophysics Data System (ADS)
Filipcic, A.; Haug, S.; Hostettler, M.; Walker, R.; Weber, M.
2015-12-01
The Piz Daint Cray XC30 HPC system at CSCS, the Swiss National Supercomputing Centre, was the highest-ranked European system on the TOP500 list in 2014 and also features GPU accelerators. Event generation and detector simulation for the ATLAS experiment have been enabled for this machine. We report on the technical solutions, performance, HPC policy challenges and possible future opportunities for HEP on extreme HPC systems. In particular, a custom-made integration with the ATLAS job submission system has been developed via the Advanced Resource Connector (ARC) middleware. Furthermore, a partial GPU acceleration of the Geant4 detector simulations has been implemented.
Modeling Subsurface Reactive Flows Using Leadership-Class Computing
DOE Office of Scientific and Technical Information (OSTI.GOV)
Mills, Richard T; Hammond, Glenn; Lichtner, Peter
2009-01-01
We describe our experiences running PFLOTRAN - a code for simulation of coupled hydro-thermal-chemical processes in variably saturated, non-isothermal, porous media - on leadership-class supercomputers, including initial experiences running on the petaflop incarnation of Jaguar, the Cray XT5 at the National Center for Computational Sciences at Oak Ridge National Laboratory. PFLOTRAN utilizes fully implicit time-stepping and is built on top of the Portable, Extensible Toolkit for Scientific Computation (PETSc). We discuss some of the hurdles to 'at scale' performance with PFLOTRAN and the progress we have made in overcoming them on leadership-class computer architectures.
Network issues for large mass storage requirements
NASA Technical Reports Server (NTRS)
Perdue, James
1992-01-01
File servers and supercomputing environments need high performance networks to balance the I/O requirements seen in today's demanding computing scenarios. UltraNet is one solution which permits both high aggregate transfer rates and high task-to-task transfer rates, as demonstrated in actual tests. UltraNet provides this capability as both a Server-to-Server and Server-to-Client access network, giving the supercomputing center the following advantages: highest performance Transport Level connections (to 40 MBytes/sec effective rates); matches the throughput of the emerging high performance disk technologies, such as RAID, parallel head transfer devices and software striping; supports standard network and file system applications using SOCKET-based application program interfaces such as FTP, rcp, rdump, etc.; supports access to the Network File System (NFS) and LARGE aggregate bandwidth for large NFS usage; provides access to a distributed, hierarchical data server capability using the DISCOS UniTree product; supports file server solutions available from multiple vendors, including Cray, Convex, Alliant, FPS, IBM, and others.
Supercomputer description of human lung morphology for imaging analysis.
Martonen, T B; Hwang, D; Guan, X; Fleming, J S
1998-04-01
A supercomputer code that describes the three-dimensional branching structure of the human lung has been developed. The algorithm was written for the Cray C94. In our simulations, the human lung was divided into a matrix containing discrete volumes (voxels) so as to be compatible with analyses of SPECT images. The matrix has 3840 voxels. The matrix can be segmented into transverse, sagittal and coronal layers analogous to human subject examinations. The compositions of individual voxels were identified by the type and respective number of airways present. The code provides a mapping of the spatial positions of the almost 17 million airways in human lungs and unambiguously assigns each airway to a voxel. Thus, the clinician and research scientist in the medical arena have a powerful new tool to be used in imaging analyses. The code was designed to be integrated into diverse applications, including the interpretation of SPECT images, the design of inhalation exposure experiments and the targeted delivery of inhaled pharmacologic drugs.
Integrated risk/cost planning models for the US Air Traffic system
NASA Technical Reports Server (NTRS)
Mulvey, J. M.; Zenios, S. A.
1985-01-01
A prototype network planning model for the U.S. Air Traffic control system is described. The model encompasses the dual objectives of managing collision risks and transportation costs where traffic flows can be related to these objectives. The underlying structure is a network graph with nonseparable convex costs; the model is solved efficiently by capitalizing on its intrinsic characteristics. Two specialized algorithms for solving the resulting problems are described: (1) truncated Newton, and (2) simplicial decomposition. The feasibility of the approach is demonstrated using data collected from a control center in the Midwest. Computational results with different computer systems are presented, including a vector supercomputer (CRAY-XMP). The risk/cost model has two primary uses: (1) as a strategic planning tool using aggregate flight information, and (2) as an integrated operational system for forecasting congestion and monitoring (controlling) flow throughout the U.S. In the latter case, access to a supercomputer is required due to the model's enormous size.
Thread-Level Parallelization and Optimization of NWChem for the Intel MIC Architecture
DOE Office of Scientific and Technical Information (OSTI.GOV)
Shan, Hongzhang; Williams, Samuel; Jong, Wibe de
In the multicore era it was possible to exploit the increase in on-chip parallelism by simply running multiple MPI processes per chip. Unfortunately, manycore processors' greatly increased thread- and data-level parallelism coupled with a reduced memory capacity demand an altogether different approach. In this paper we explore augmenting two NWChem modules, the triples correction of CCSD(T) and Fock matrix construction, with OpenMP in order that they might run efficiently on future manycore architectures. As the next NERSC machine will be a self-hosted Intel MIC (Xeon Phi) based supercomputer, we leverage an existing MIC testbed at NERSC to evaluate our experiments. In order to proxy the fact that future MIC machines will not have a host processor, we run all of our experiments in native mode. We found that while straightforward application of OpenMP to the deep loop nests associated with the tensor contractions of CCSD(T) was sufficient in attaining high performance, significant effort was required to safely and efficiently thread the TEXAS integral package when constructing the Fock matrix. Ultimately, our new MPI+OpenMP hybrid implementations attain up to 65x better performance for the triples part of CCSD(T), due in large part to the fact that the limited on-card memory limits the existing MPI implementation to a single process per card. Additionally, we obtain up to 1.6x better performance on Fock matrix construction when compared with the best MPI implementations running multiple processes per card.
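A hedged sketch of the "straightforward application of OpenMP to the deep loop nests" mentioned above, using a generic four-index contraction as a stand-in for the CCSD(T) triples kernels; the array names, index ordering and tiling are assumptions, not the actual NWChem tensors.

```c
/* t3(a,b,c) += sum_d t2(a,b,d) * v(d,c): a generic contraction standing in
 * for the CCSD(T) triples kernels; the real NWChem code and tilings differ. */
void contract(int nv,
              const double *t2,   /* nv*nv*nv, indexed [a][b][d] */
              const double *v,    /* nv*nv,    indexed [d][c]    */
              double *t3)         /* nv*nv*nv, indexed [a][b][c] */
{
    #pragma omp parallel for collapse(2) schedule(static)
    for (int a = 0; a < nv; ++a)
        for (int b = 0; b < nv; ++b)
            for (int c = 0; c < nv; ++c) {
                double sum = 0.0;
                for (int d = 0; d < nv; ++d)
                    sum += t2[(a * nv + b) * nv + d] * v[d * nv + c];
                t3[(a * nv + b) * nv + c] += sum;  /* disjoint writes per thread */
            }
}
```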
DOE Office of Scientific and Technical Information (OSTI.GOV)
Pope, G.A.; Lake, L.W.; Sepehrnoori, K.
1987-07-01
This report consists of three parts. Part A describes the development of our chemical flood simulator UTCHEM during the past year, simulation studies, and physical property modelling and experiments. Part B is a report on the optimization and vectorization of UTCHEM on our Cray supercomputer to speed it up. Part C describes our use of UTCHEM to investigate the use of tracers for interwell reservoir tests. Part A of this Annual Report consists of five sections. In the first section, we give a general description of the simulator and recent changes in it along with a test case for a slightly compressible fluid. In the second section, we describe the major changes which were needed to add gel and alkaline reactions and give preliminary simulation results for these processes. In the third section, comparisons with a surfactant pilot field test are given. In the fourth section, process scaleup and design simulations are given and also our recent mesh refinement results. In the fifth section, experimental results and associated physical property modelling studies are reported. Part B gives our results on the speedup of UTCHEM on a Cray supercomputer. Depending on the size of the problem, this speedup factor was at least tenfold and resulted from a combination of a faster solver, vectorization, and code optimization. Part C describes our use of UTCHEM for field tracer studies and gives the results of a comparison with field tracer data on the same field (Big Muddy) as was simulated and compared with the surfactant pilot reported in section 3 of Part A. 120 figs., 37 tabs.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Haynes, R.A.
The Network File System (NFS) is used in UNIX-based networks to provide transparent file sharing between heterogeneous systems. Although NFS is well-known for being weak in security, it is widely used and has become a de facto standard. This paper examines the user authentication shortcomings of NFS and the approach Sandia National Laboratories has taken to strengthen it with Kerberos. The implementation on a Cray Y-MP8/864 running UNICOS is described and resource/performance issues are discussed. 4 refs., 4 figs.
An implementation of a tree code on a SIMD, parallel computer
NASA Technical Reports Server (NTRS)
Olson, Kevin M.; Dorband, John E.
1994-01-01
We describe a fast tree algorithm for gravitational N-body simulation on SIMD parallel computers. The tree construction uses fast, parallel sorts. The sorted lists are recursively divided along their x, y and z coordinates. This data structure is a completely balanced tree (i.e., each particle is paired with exactly one other particle) and maintains good spatial locality. An implementation of this tree-building algorithm on a 16k processor Maspar MP-1 performs well and constitutes only a small fraction (approximately 15%) of the entire cycle of finding the accelerations. Each node in the tree is treated as a monopole. The tree search and the summation of accelerations also perform well. During the tree search, node data that is needed from another processor is simply fetched. Roughly 55% of the tree search time is spent in communications between processors. We apply the code to two problems of astrophysical interest. The first is a simulation of the close passage of two gravitationally interacting disk galaxies using 65,536 particles. We also simulate the formation of structure in an expanding model universe using 1,048,576 particles. Our code attains speeds comparable to one head of a Cray Y-MP, so single instruction, multiple data (SIMD) type computers can be used for these simulations. The cost/performance ratio for SIMD machines like the Maspar MP-1 makes them an extremely attractive alternative to either vector processors or large multiple instruction, multiple data (MIMD) type parallel computers. With further optimizations (e.g., more careful load balancing), speeds in excess of today's vector processing computers should be possible.
The SGI/Cray T3E: Experiences and Insights
NASA Technical Reports Server (NTRS)
Bernard, Lisa Hamet
1998-01-01
The NASA Goddard Space Flight Center is home to the fifth most powerful supercomputer in the world, a 1024 processor SGI/Cray T3E-600. The original 512 processor system was placed at Goddard in March, 1997 as part of a cooperative agreement between the High Performance Computing and Communications Program's Earth and Space Sciences Project (ESS) and SGI/Cray Research. The goal of this system is to facilitate achievement of the Project milestones of 10, 50 and 100 GFLOPS sustained performance on selected Earth and space science application codes. The additional 512 processors were purchased in March, 1998 by the NASA Earth Science Enterprise for the NASA Seasonal to Interannual Prediction Project (NSIPP). These two "halves" still operate as a single system, and must satisfy the unique requirements of both aforementioned groups, as well as guest researchers from the Earth, space, microgravity, manned space flight and aeronautics communities. Few large scalable parallel systems are configured for capability computing, so models are hard to find. This unique environment has created a challenging system administration task, and has yielded some insights into the supercomputing needs of the various NASA Enterprises, as well as insights into the strengths and weaknesses of the T3E architecture and software. The T3E is a distributed memory system in which the processing elements (PE's) are connected by a low latency, high bandwidth bidirectional 3-D torus. Due to the focus on high speed communication between PE's, the T3E requires PE's to be allocated contiguously per job. Further, jobs will only execute on the user-specified number of PE's and PE timesharing is possible but impractical. With a highly varied job mix in both size and runtime of jobs, the resulting scenario is PE fragmentation and an inability to achieve near 100% utilization. SGI/Cray has provided several scheduling and configuration tools to minimize the impact of fragmentation. These tools include PScheD (the political scheduler), GRM (the global resource manager) and NQE (the Network Queuing Environment). Features and impact of these tools will be discussed, as will resulting performance and utilization data. As a distributed memory system, the T3E is designed to be programmed through explicit message passing. Consequently, certain assumptions related to code design are made by the operating system (UNICOS/mk) and its scheduling tools. With the exception of HPF, which does run on the T3E, however poorly, alternative programming styles have the potential to impact the T3E in unexpected and undesirable ways. Several examples will be presented (preceded with the disclaimer, "Don't try this at home! Violators will be prosecuted!")
TOUGH2_MP: A parallel version of TOUGH2
DOE Office of Scientific and Technical Information (OSTI.GOV)
Zhang, Keni; Wu, Yu-Shu; Ding, Chris
2003-04-09
TOUGH2_MP is a massively parallel version of TOUGH2. It was developed for running on distributed-memory parallel computers to simulate large simulation problems that may not be solved by the standard, single-CPU TOUGH2 code. The new code implements an efficient massively parallel scheme, while preserving the full capacity and flexibility of the original TOUGH2 code. The new software uses the METIS software package for grid partitioning and the AZTEC software package for linear-equation solving. The standard message-passing interface is adopted for communication among processors. Numerical performance of the current version code has been tested on CRAY-T3E and IBM RS/6000 SP platforms. In addition, the parallel code has been successfully applied to real field problems of multi-million-cell simulations for three-dimensional multiphase and multicomponent fluid and heat flow, as well as solute transport. In this paper, we will review the development of TOUGH2_MP, and discuss the basic features, modules, and their applications.
Scalable nuclear density functional theory with Sky3D
NASA Astrophysics Data System (ADS)
Afibuzzaman, Md; Schuetrumpf, Bastian; Aktulga, Hasan Metin
2018-02-01
In nuclear astrophysics, quantum simulations of large inhomogeneous dense systems as they appear in the crusts of neutron stars present big challenges. The number of particles in a simulation with periodic boundary conditions is strongly limited due to the immense computational cost of the quantum methods. In this paper, we describe techniques for an efficient and scalable parallel implementation of Sky3D, a nuclear density functional theory solver that operates on an equidistant grid. Presented techniques allow Sky3D to achieve good scaling and high performance on a large number of cores, as demonstrated through detailed performance analysis on a Cray XC40 supercomputer.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Reynolds, William; Weber, Marta S.; Farber, Robert M.
Social Media provide an exciting and novel view into social phenomena. The vast amounts of data that can be gathered from the Internet coupled with massively parallel supercomputers such as the Cray XMT open new vistas for research. Conclusions drawn from such analysis must recognize that social media are distinct from the underlying social reality. Rigorous validation is essential. This paper briefly presents results obtained from computational analysis of social media - utilizing both blog and twitter data. Validation of these results is discussed in the context of a framework of established methodologies from the social sciences. Finally, an outline for a set of supporting studies is proposed.
NASA Technical Reports Server (NTRS)
Biyabani, S. R.
1994-01-01
INS3D computes steady-state solutions to the incompressible Navier-Stokes equations. The INS3D approach utilizes pseudo-compressibility combined with an approximate factorization scheme. This computational fluid dynamics (CFD) code has been verified on problems such as flow through a channel, flow over a backward-facing step and flow over a circular cylinder. Three dimensional cases include flow over an ogive cylinder, flow through a rectangular duct, wind tunnel inlet flow, cylinder-wall juncture flow and flow through multiple posts mounted between two plates. INS3D uses a pseudo-compressibility approach in which a time derivative of pressure is added to the continuity equation, which together with the momentum equations form a set of four equations with pressure and velocity as the dependent variables. The equations' coordinates are transformed for general three dimensional applications. The equations are advanced in time by the implicit, non-iterative, approximately-factored, finite-difference scheme of Beam and Warming. The numerical stability of the scheme depends on the use of higher-order smoothing terms to damp out higher-frequency oscillations caused by second-order central differencing. The artificial compressibility introduces pressure (sound) waves of finite speed (whereas the speed of sound would be infinite in an incompressible fluid). As the solution converges, these pressure waves die out, causing the derivative of pressure with respect to time to approach zero. Thus, continuity is satisfied for the incompressible fluid in the steady state. Computational efficiency is achieved using a diagonal algorithm. A block tri-diagonal option is also available. When a steady-state solution is reached, the modified continuity equation will satisfy the divergence-free velocity field condition. INS3D is capable of handling several different types of boundaries encountered in numerical simulations, including solid-surface, inflow and outflow, and far-field boundaries. Three machine versions of INS3D are available. INS3D for the CRAY is written in CRAY FORTRAN for execution on a CRAY X-MP under COS, INS3D for the IBM is written in FORTRAN 77 for execution on an IBM 3090 under the VM or MVS operating system, and INS3D for DEC RISC-based systems is written in RISC FORTRAN for execution on a DEC workstation running RISC ULTRIX 3.1 or later. The CRAY version has a central memory requirement of 730279 words. The central memory requirement for the IBM is 150Mb. The memory requirement for the DEC RISC ULTRIX version is 3Mb of main memory. INS3D was developed in 1987. The port to the IBM was done in 1990. The port to the DECstation 3100 was done in 1991. CRAY is a registered trademark of Cray Research Inc. IBM is a registered trademark of International Business Machines. DEC, DECstation, and ULTRIX are trademarks of the Digital Equipment Corporation.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Meneses, Esteban; Ni, Xiang; Jones, Terry R
The unprecedented computational power of current supercomputers now makes possible the exploration of complex problems in many scientific fields, from genomic analysis to computational fluid dynamics. Modern machines are powerful because they are massive: they assemble millions of cores and a huge quantity of disks, cards, routers, and other components. But it is precisely the size of these machines that clouds the future of supercomputing. A system that comprises many components has a high chance to fail, and fail often. In order to make the next generation of supercomputers usable, it is imperative to use some type of fault tolerance platform to run applications on large machines. Most fault tolerance strategies can be optimized for the peculiarities of each system and boost efficacy by keeping the system productive. In this paper, we aim to understand how failure characterization can improve resilience in several layers of the software stack: applications, runtime systems, and job schedulers. We examine the Titan supercomputer, one of the fastest systems in the world. We analyze a full year of Titan in production and distill the failure patterns of the machine. By looking into Titan's log files and using the criteria of experts, we provide a detailed description of the types of failures. In addition, we inspect the job submission files and describe how the system is used. Using those two sources, we cross-correlate failures in the machine to executing jobs and provide a picture of how failures affect the user experience. We believe such characterization is fundamental in developing appropriate fault tolerance solutions for Cray systems similar to Titan.
DICE/ColDICE: 6D collisionless phase space hydrodynamics using a Lagrangian tessellation
NASA Astrophysics Data System (ADS)
Sousbie, Thierry
2018-01-01
DICE is a C++ template library designed to solve collisionless fluid dynamics in 6D phase space using massively parallel supercomputers via a hybrid OpenMP/MPI parallelization. ColDICE, based on DICE, implements a cosmological and physical Vlasov-Poisson solver for cold systems such as cold dark matter (CDM) dynamics.
High Resolution Aerospace Applications using the NASA Columbia Supercomputer
NASA Technical Reports Server (NTRS)
Mavriplis, Dimitri J.; Aftosmis, Michael J.; Berger, Marsha
2005-01-01
This paper focuses on the parallel performance of two high-performance aerodynamic simulation packages on the newly installed NASA Columbia supercomputer. These packages include both a high-fidelity, unstructured, Reynolds-averaged Navier-Stokes solver, and a fully-automated inviscid flow package for cut-cell Cartesian grids. The complementary combination of these two simulation codes enables high-fidelity characterization of aerospace vehicle design performance over the entire flight envelope through extensive parametric analysis and detailed simulation of critical regions of the flight envelope. Both packages are industrial-level codes designed for complex geometry and incorporate customized multigrid solution algorithms. The performance of these codes on Columbia is examined using both MPI and OpenMP and using both the NUMAlink and InfiniBand interconnect fabrics. Numerical results demonstrate good scalability on up to 2016 CPUs using the NUMAlink4 interconnect, with measured computational rates in the vicinity of 3 TFLOP/s, while InfiniBand showed some performance degradation at high CPU counts, particularly with multigrid. Nonetheless, the results are encouraging enough to indicate that larger test cases using combined MPI/OpenMP communication should scale well on even more processors.
NASA Astrophysics Data System (ADS)
Wang, Z. P.; Hayhurst, D. R.
1994-07-01
The creep deformation and damage evolution in a pipe weldment has been modeled by using the finite-element continuum damage mechanics (CDM) method. The finite-element CDM computer program DAMAGE XX has been adapted to run with increased speed on a Cray XMP/416 supercomputer. Run times are sufficiently short (20 min) to permit many parametric studies to be carried out on vessel lifetimes for different weld and heat-affected zone (HAZ) materials. Finite-element mesh sensitivity was studied first in order to select a mesh capable of correctly predicting experimentally observed results using the least possible computer time. A study was then made of the effect on the lifetime of a butt-welded vessel of each of the commonly measured material parameters for the weld and HAZ materials. Forty different ferritic steel welded vessels were analyzed for a constant internal pressure of 45.5 MPa at a temperature of 565 C; each vessel having the same parent pipe material but different weld and HAZ materials. A lifetime improvement has been demonstrated of 30% over that obtained for the initial materials property data. A methodology for weldment design has been established which uses supercomputer-based CDM analysis techniques; it is quick to use, provides accurate results, and is a viable design tool.
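The abstract does not give the constitutive equations used in DAMAGE XX, but CDM creep analyses of this type conventionally couple a creep strain rate to a scalar damage variable in a Kachanov-Rabotnov form, shown here only as a representative model:

\[
\dot{\varepsilon}^{c} = A\left(\frac{\sigma}{1-\omega}\right)^{n},
\qquad
\dot{\omega} = \frac{B\,\sigma_r^{\chi}}{(1-\omega)^{\phi}},
\]

where \(\omega \in [0,1)\) is the damage variable, \(\sigma_r\) a rupture-controlling stress measure, and local failure corresponds to \(\omega \to 1\); the weld, HAZ and parent materials then differ in the constants \(A, n, B, \chi, \phi\).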
COOP 3D ARPA Experiment 109 National Center for Atmospheric Research
NASA Technical Reports Server (NTRS)
1998-01-01
Coupled atmospheric and hydrodynamic forecast models were executed on the supercomputing resources of the National Center for Atmospheric Research (NCAR) in Boulder, Colorado and the Ohio Supercomputing Center (OSC) in Columbus, Ohio, respectively. The interoperation of the forecast models on these geographically diverse, high performance Cray platforms required the transfer of large three-dimensional data sets at very high information rates. High capacity, terrestrial fiber optic transmission system technologies were integrated with those of an experimental high speed communications satellite in Geosynchronous Earth Orbit (GEO) to test the integration of the two systems. Operation over a spacecraft in GEO orbit required modification of the standard configuration of legacy data communications protocols to facilitate their ability to perform efficiently in the changing environment characteristic of a hybrid network. The success of this performance tuning enabled the use of such an architecture to facilitate high data rate, fiber optic quality data communications between high performance systems not accessible to standard terrestrial fiber transmission systems, thus obviating the performance degradation often found in contemporary earth/satellite hybrids.
Magnetosphere simulations with a high-performance 3D AMR MHD Code
NASA Astrophysics Data System (ADS)
Gombosi, Tamas; Dezeeuw, Darren; Groth, Clinton; Powell, Kenneth; Song, Paul
1998-11-01
BATS-R-US is a high-performance 3D AMR MHD code for space physics applications running on massively parallel supercomputers. In BATS-R-US the electromagnetic and fluid equations are solved with a high-resolution upwind numerical scheme in a tightly coupled manner. The code is very robust and it is capable of spanning a wide range of plasma parameters (such as β, acoustic and Alfvénic Mach numbers). Our code is highly scalable: it achieved a sustained performance of 233 GFLOPS on a Cray T3E-1200 supercomputer with 1024 PEs. This talk reports results from the BATS-R-US code for the GGCM (Geospace General Circulation Model) Phase 1 Standard Model Suite. This model suite contains 10 different steady-state configurations: 5 IMF clock angles (north, south, and three equally spaced angles in between) with 2 IMF field strengths for each angle (5 nT and 10 nT). The other parameters are: solar wind speed = 400 km/sec; solar wind number density = 5 protons/cc; Hall conductance = 0; Pedersen conductance = 5 S; parallel conductivity = ∞.
Multitasking the Davidson algorithm for the large, sparse eigenvalue problem
DOE Office of Scientific and Technical Information (OSTI.GOV)
Umar, V.M.; Fischer, C.F.
1989-01-01
The authors report how the Davidson algorithm, developed for handling the eigenvalue problem for large and sparse matrices arising in quantum chemistry, was modified for use in atomic structure calculations. To date these calculations have used traditional eigenvalue methods, which limit the range of feasible calculations because of their excessive memory requirements and unsatisfactory performance attributed to time-consuming and costly processing of zero-valued elements. The replacement of a traditional matrix eigenvalue method by the Davidson algorithm reduced these limitations. Significant speedup was found, which varied with the size of the underlying problem and its sparsity. Furthermore, the range of matrix sizes that can be manipulated efficiently was expanded by more than one order of magnitude. On the CRAY X-MP the code was vectorized and the importance of gather/scatter was analyzed. A parallelized version of the algorithm obtained an additional 35% reduction in execution time. Speedup due to vectorization and concurrency was also measured on the Alliant FX/8.
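The step that makes Davidson's method attractive for such sparse matrices is the diagonally preconditioned residual correction; for the current Ritz pair \((\theta, \mathbf{y})\) of the projected problem (a standard statement of the method, not a detail taken from the paper above):

\[
\mathbf{r} = (A - \theta I)\,\mathbf{y},
\qquad
t_i = \frac{-\,r_i}{A_{ii} - \theta},
\]

after which \(\mathbf{t}\) is orthogonalized against the current subspace and appended, so the only operation on the full matrix is a sparse matrix-vector product, the part most amenable to the vectorization and gather/scatter analysis described in the abstract.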
An Automated Parallel Image Registration Technique Based on the Correlation of Wavelet Features
NASA Technical Reports Server (NTRS)
LeMoigne, Jacqueline; Campbell, William J.; Cromp, Robert F.; Zukor, Dorothy (Technical Monitor)
2001-01-01
With the increasing importance of multiple platform/multiple remote sensing missions, fast and automatic integration of digital data from disparate sources has become critical to the success of these endeavors. Our work utilizes maxima of wavelet coefficients to form the basic features of a correlation-based automatic registration algorithm. Our wavelet-based registration algorithm is tested successfully with data from the National Oceanic and Atmospheric Administration (NOAA) Advanced Very High Resolution Radiometer (AVHRR) and the Landsat/Thematic Mapper (TM), which differ by translation and/or rotation. By the choice of high-frequency wavelet features, this method is similar to an edge-based correlation method, but by exploiting the multi-resolution nature of a wavelet decomposition, our method achieves higher computational speeds for comparable accuracies. This algorithm has been implemented on a Single Instruction Multiple Data (SIMD) massively parallel computer, the MasPar MP-2, as well as on the Cray T3D, the Cray T3E and a Beowulf cluster of Pentium workstations.
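A hedged sketch of the correlation step described above: given maps of wavelet-coefficient maxima for the reference and input images at the coarsest decomposition level, the best integer shift maximizes their cross-correlation (feature extraction, the rotation search and the coarse-to-fine refinement are omitted, and the function is illustrative rather than the authors' implementation).

```c
/* Exhaustive integer-shift correlation of two feature maps (e.g. 0/1 maps of
 * wavelet-coefficient maxima) at the coarsest decomposition level. */
void best_shift(int w, int h, const float *ref, const float *inp,
                int max_shift, int *best_dx, int *best_dy)
{
    double best = -1.0;
    for (int dy = -max_shift; dy <= max_shift; ++dy)
        for (int dx = -max_shift; dx <= max_shift; ++dx) {
            double score = 0.0;
            for (int y = 0; y < h; ++y)
                for (int x = 0; x < w; ++x) {
                    int xs = x + dx, ys = y + dy;
                    if (xs >= 0 && xs < w && ys >= 0 && ys < h)
                        score += ref[y * w + x] * inp[ys * w + xs];
                }
            if (score > best) { best = score; *best_dx = dx; *best_dy = dy; }
        }
}
```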
Massively Multithreaded Maxflow for Image Segmentation on the Cray XMT-2
Bokhari, Shahid H.; Çatalyürek, Ümit V.; Gurcan, Metin N.
2014-01-01
Image segmentation is a very important step in the computerized analysis of digital images. The maxflow mincut approach has been successfully used to obtain minimum energy segmentations of images in many fields. Classical algorithms for maxflow in networks do not directly lend themselves to efficient parallel implementations on contemporary parallel processors. We present the results of an implementation of the Goldberg-Tarjan preflow-push algorithm on the Cray XMT-2 massively multithreaded supercomputer. This machine has hardware support for 128 threads in each physical processor, a uniformly accessible shared memory of up to 4 TB and hardware synchronization for each 64-bit word. It is thus well-suited to the parallelization of graph theoretic algorithms, such as preflow-push. We describe the implementation of the preflow-push code on the XMT-2 and present the results of timing experiments on a series of synthetically generated as well as real images. Our results indicate very good performance on large images and pave the way for practical applications of this machine architecture for image analysis in a production setting. The largest images we have run are 32000^2 pixels in size, which are well beyond the largest previously reported in the literature. PMID:25598745
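The Goldberg-Tarjan method maintains a preflow, pushing excess along admissible residual edges and relabelling (raising) nodes that cannot push. A minimal serial sketch on a tiny dense graph is given below; the massively multithreaded XMT-2 implementation in the paper operates on image grid graphs and differs substantially, and the capacities here are illustrative only.

```c
/* Minimal serial preflow-push (push-relabel) max-flow sketch.
 * Node 0 is the source, node N-1 the sink. */
#include <stdio.h>

#define N 6

static int cap[N][N];     /* residual capacities */
static int excess[N];
static int height[N];

static void push(int u, int v) {
    int d = excess[u] < cap[u][v] ? excess[u] : cap[u][v];
    cap[u][v] -= d;  cap[v][u] += d;
    excess[u] -= d;  excess[v] += d;
}

static void relabel(int u) {
    int minh = 2 * N;
    for (int v = 0; v < N; ++v)
        if (cap[u][v] > 0 && height[v] < minh) minh = height[v];
    height[u] = minh + 1;
}

static int maxflow(int s, int t) {
    height[s] = N;
    for (int v = 0; v < N; ++v)             /* saturate edges out of the source */
        if (cap[s][v] > 0) {
            excess[v] += cap[s][v];
            cap[v][s] += cap[s][v];
            cap[s][v] = 0;
        }
    for (;;) {                              /* discharge active nodes */
        int u = -1;
        for (int i = 0; i < N; ++i)
            if (i != s && i != t && excess[i] > 0) { u = i; break; }
        if (u < 0) break;
        int pushed = 0;
        for (int v = 0; v < N && excess[u] > 0; ++v)
            if (cap[u][v] > 0 && height[u] == height[v] + 1) { push(u, v); pushed = 1; }
        if (!pushed) relabel(u);
    }
    return excess[t];                       /* value of the maximum flow */
}

int main(void) {
    cap[0][1] = 10; cap[0][2] = 10;
    cap[1][3] = 4;  cap[1][4] = 8;  cap[2][4] = 9;
    cap[3][5] = 10; cap[4][3] = 6;  cap[4][5] = 10;
    printf("max flow = %d\n", maxflow(0, N - 1));   /* expected: 19 */
    return 0;
}
```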
IBM PC enhances the world's future
NASA Technical Reports Server (NTRS)
Cox, Jozelle
1988-01-01
Although the purpose of this research is to illustrate the importance of computers to the public, particularly the IBM PC, present examinations will include computers developed before the IBM PC was brought into use. IBM, as well as other computing facilities, began serving the public years ago, and is continuing to find ways to enhance the existence of man. With new developments in supercomputers like the Cray-2, and the recent advances in artificial intelligence programming, the human race is gaining knowledge at a rapid pace. All have benefited from the development of computers in the world; not only have they brought new assets to life, but have made life more and more of a challenge everyday.
Climate Ocean Modeling on a Beowulf Class System
NASA Technical Reports Server (NTRS)
Cheng, B. N.; Chao, Y.; Wang, P.; Bondarenko, M.
2000-01-01
With the growing power and shrinking cost of personal computers, the availability of fast Ethernet interconnections, and public domain software packages, it is now possible to combine them to build desktop parallel computers (named Beowulf or PC clusters) at a fraction of what it would cost to buy systems of comparable power from supercomputer companies. This led us to build and assemble our own system, specifically for climate ocean modeling. In this article, we present our experience with such a system, discuss its network performance, and provide some performance comparison data with both the HP SPP2000 and the Cray T3E for an ocean model used in present-day oceanographic research.
Full potential methods for analysis/design of complex aerospace configurations
NASA Technical Reports Server (NTRS)
Shankar, Vijaya; Szema, Kuo-Yen; Bonner, Ellwood
1986-01-01
The steady form of the full potential equation, in conservative form, is employed to analyze and design a wide variety of complex aerodynamic shapes. The nonlinear method is based on the theory of characteristic signal propagation coupled with novel flux biasing concepts and body-fitted mapping procedures. The resulting codes are vectorized for the CRAY XMP and the VPS-32 supercomputers. Use of the full potential nonlinear theory is demonstrated for a single-point supersonic wing design and a multipoint design for transonic maneuver/supersonic cruise/maneuver conditions. Achievement of high aerodynamic efficiency through numerical design is verified by wind tunnel tests. Other studies reported include analyses of a canard/wing/nacelle fighter geometry.
Mironov, Vladimir; Moskovsky, Alexander; D’Mello, Michael; ...
2017-10-04
The Hartree-Fock (HF) method in the quantum chemistry package GAMESS represents one of the most irregular algorithms in computation today. Major steps in the calculation are the irregular computation of electron repulsion integrals (ERIs) and the building of the Fock matrix. These are the central components of the main Self Consistent Field (SCF) loop, the key hotspot in Electronic Structure (ES) codes. By threading the MPI ranks in the official release of the GAMESS code, we not only speed up the main SCF loop (4x to 6x for large systems), but also achieve a significant (>2x) reduction in the overall memory footprint. These improvements are a direct consequence of memory access optimizations within the MPI ranks. We benchmark our implementation against the official release of the GAMESS code on the Intel Xeon Phi supercomputer. Here, scaling numbers are reported on up to 7,680 cores on Intel Xeon Phi coprocessors.
Space Radar Image of Mammoth Mountain, California
1999-05-01
This false-color composite radar image of the Mammoth Mountain area in the Sierra Nevada Mountains, California, was acquired by the Spaceborne Imaging Radar-C and X-band Synthetic Aperture Radar aboard the space shuttle Endeavour on its 67th orbit on October 3, 1994. The image is centered at 37.6 degrees north latitude and 119.0 degrees west longitude. The area is about 39 kilometers by 51 kilometers (24 miles by 31 miles). North is toward the bottom, about 45 degrees to the right. In this image, red was created using L-band (horizontally transmitted/vertically received) polarization data; green was created using C-band (horizontally transmitted/vertically received) polarization data; and blue was created using C-band (horizontally transmitted and received) polarization data. Crawley Lake appears dark at the center left of the image, just above or south of Long Valley. The Mammoth Mountain ski area is visible at the top right of the scene. The red areas correspond to forests, the dark blue areas are bare surfaces and the green areas are short vegetation, mainly brush. The purple areas at the higher elevations in the upper part of the scene are discontinuous patches of snow cover from a September 28 storm. New, very thin snow was falling before and during the second space shuttle pass. In parallel with the operational SIR-C data processing, an experimental effort is being conducted to test SAR data processing using the Jet Propulsion Laboratory's massively parallel supercomputing facility, centered around the Cray Research T3D. These experiments will assess the abilities of large supercomputers to produce high throughput Synthetic Aperture Radar processing in preparation for upcoming data-intensive SAR missions. The image released here was produced as part of this experimental effort. http://photojournal.jpl.nasa.gov/catalog/PIA01746
Multitasking the INS3D-LU code on the Cray Y-MP
NASA Technical Reports Server (NTRS)
Fatoohi, Rod; Yoon, Seokkwan
1991-01-01
This paper presents the results of multitasking the INS3D-LU code on eight processors. The code is a full Navier-Stokes solver for incompressible fluid in three dimensional generalized coordinates using a lower-upper symmetric-Gauss-Seidel implicit scheme. This code has been fully vectorized on oblique planes of sweep and parallelized using autotasking with some directives and minor modifications. The timing results for five grid sizes are presented and analyzed. The code has achieved a processing rate of over one Gflops.
TRASYS - THERMAL RADIATION ANALYZER SYSTEM (CRAY VERSION WITH NASADIG)
NASA Technical Reports Server (NTRS)
Anderson, G. E.
1994-01-01
The Thermal Radiation Analyzer System, TRASYS, is a computer software system with generalized capability to solve the radiation related aspects of thermal analysis problems. TRASYS computes the total thermal radiation environment for a spacecraft in orbit. The software calculates internode radiation interchange data as well as incident and absorbed heat rate data originating from environmental radiant heat sources. TRASYS provides data of both types in a format directly usable by such thermal analyzer programs as SINDA/FLUINT (available from COSMIC, program number MSC-21528). One primary feature of TRASYS is that it allows users to write their own driver programs to organize and direct the preprocessor and processor library routines in solving specific thermal radiation problems. The preprocessor first reads and converts the user's geometry input data into the form used by the processor library routines. Then, the preprocessor accepts the user's driving logic, written in the TRASYS modified FORTRAN language. In many cases, the user has a choice of routines to solve a given problem. Users may also provide their own routines where desirable. In particular, the user may write output routines to provide for an interface between TRASYS and any thermal analyzer program using the R-C network concept. Input to the TRASYS program consists of Options and Edit data, Model data, and Logic Flow and Operations data. Options and Edit data provide for basic program control and user edit capability. The Model data describe the problem in terms of geometry and other properties. This information includes surface geometry data, documentation data, nodal data, block coordinate system data, form factor data, and flux data. Logic Flow and Operations data house the user's driver logic, including the sequence of subroutine calls and the subroutine library. Output from TRASYS consists of two basic types of data: internode radiation interchange data, and incident and absorbed heat rate data. The flexible structure of TRASYS allows considerable freedom in the definition and choice of solution method for a thermal radiation problem. The program's flexible structure has also allowed TRASYS to retain the same basic input structure as the authors update it in order to keep up with changing requirements. Among its other important features are the following: 1) up to 3200 node problem size capability with shadowing by intervening opaque or semi-transparent surfaces; 2) choice of diffuse, specular, or diffuse/specular radiant interchange solutions; 3) a restart capability that minimizes recomputing; 4) macroinstructions that automatically provide the executive logic for orbit generation that optimizes the use of previously completed computations; 5) a time variable geometry package that provides automatic pointing of the various parts of an articulated spacecraft and an automatic look-back feature that eliminates redundant form factor calculations; 6) capability to specify submodel names to identify sets of surfaces or components as an entity; and 7) subroutines to perform functions which save and recall the internodal and/or space form factors in subsequent steps for nodes with fixed geometry during a variable geometry run. There are two machine versions of TRASYS v27: a DEC VAX version and a Cray UNICOS version. Both versions require installation of the NASADIG library (MSC-21801 for DEC VAX or COS-10049 for CRAY), which is available from COSMIC either separately or bundled with TRASYS. 
The NASADIG (NASA Device Independent Graphics Library) plot package provides a pictorial representation of input geometry, orbital/orientation parameters, and heating rate output as a function of time. NASADIG supports Tektronix terminals. The CRAY version of TRASYS v27 is written in FORTRAN 77 for batch or interactive execution and has been implemented on CRAY X-MP and CRAY Y-MP series computers running UNICOS. The standard distribution medium for MSC-21959 (CRAY version without NASADIG) is a 1600 BPI 9-track magnetic tape in UNIX tar format. The standard distribution medium for COS-10040 (CRAY version with NASADIG) is a set of two 6250 BPI 9-track magnetic tapes in UNIX tar format. Alternate distribution media and formats are available upon request. The DEC VAX version of TRASYS v27 is written in FORTRAN 77 for batch execution (only the plotting driver program is interactive) and has been implemented on a DEC VAX 8650 computer under VMS. Since the source codes for MSC-21030 and COS-10026 are in VAX/VMS text library files and DEC Command Language files, COSMIC will only provide these programs in the following formats: MSC-21030, TRASYS (DEC VAX version without NASADIG) is available on a 1600 BPI 9-track magnetic tape in VAX BACKUP format (standard distribution medium) or in VAX BACKUP format on a TK50 tape cartridge; COS-10026, TRASYS (DEC VAX version with NASADIG), is available in VAX BACKUP format on a set of three 6250 BPI 9-track magnetic tapes (standard distribution medium) or a set of three TK50 tape cartridges in VAX BACKUP format. TRASYS was last updated in 1993.
Data Movement Dominates: Final Report
DOE Office of Scientific and Technical Information (OSTI.GOV)
Jacob, Bruce L.
Over the past three years in this project, what we have observed is that the primary reason for data movement in large-scale systems is that the per-node capacity is not large enough—i.e., one of the solutions to the data-movement problem (certainly not the only solution that is required, but a significant one nonetheless) is to increase per-node capacity so that inter-node traffic is reduced. This unfortunately is not as simple as it sounds. Today’s main memory systems for datacenters, enterprise computing systems, and supercomputers fail to provide high per-socket capacity [Dirik & Jacob 2009; Cooper-Balis et al. 2012], except at extremely high price points (factors of 10–100x the cost/bit of consumer main-memory systems) [Stokes 2008]. The reason is that our choice of technology for today’s main memory systems—i.e., DRAM, which we have used as a main-memory technology since the 1970s [Jacob et al. 2007]—can no longer keep up with our needs for density and price per bit. Main memory systems have always been built from the cheapest, densest, lowest-power memory technology available, and DRAM is no longer the cheapest, the densest, nor the lowest-power storage technology out there. It is now time for DRAM to go the way that SRAM went: move out of the way for a cheaper, slower, denser storage technology, and become a cache instead. This inflection point has happened before, in the context of SRAM yielding to DRAM. There was once a time that SRAM was the storage technology of choice for all main memories [Tomasulo 1967; Thornton 1970; Kidder 1981]. However, once DRAM hit volume production in the 1970s and 80s, it supplanted SRAM as a main memory technology because it was cheaper, and it was denser. It also happened to be lower power, but that was not the primary consideration of the day. At the time, it was recognized that DRAM was much slower than SRAM, but it was only at the supercomputer level (for instance, the Cray X-MP in the 1980s and its follow-on, the Cray Y-MP, in the 1990s) that one could afford to build ever-larger main memories out of SRAM—the reasoning for moving to DRAM was that an appropriately designed memory hierarchy, built of DRAM as main memory and SRAM as a cache, would approach the performance of SRAM, at the price-per-bit of DRAM [Mashey 1999]. Today it is quite clear that, were one to build an entire multi-gigabyte main memory out of SRAM instead of DRAM, one could improve the performance of almost any computer system by up to an order of magnitude—but this option is not even considered, because to build that system would be prohibitively expensive. It is now time to revisit the same design choice in the context of modern technologies and modern systems. For reasons both technical and economic, we can no longer afford to build ever-larger main memory systems out of DRAM. Flash memory, on the other hand, is significantly cheaper and denser than DRAM and therefore should take its place. While it is true that flash is significantly slower than DRAM, one can afford to build much larger main memories out of flash than out of DRAM, and we show that an appropriately designed memory hierarchy, built of flash as main memory and DRAM as a cache, will approach the performance of DRAM, at the price-per-bit of flash. In our studies as part of this project, we have investigated Non-Volatile Main Memory (NVMM), a new main-memory architecture for large-scale computing systems, one that is specifically designed to address the weaknesses described previously.
In particular, it provides the following features: 1) non-volatility: the bulk of the storage is NAND flash, and in this organization DRAM is used only as a cache, not as main memory; furthermore, the flash is journaled, which means that operations such as checkpoint/restore are already built into the system; 2) 1+ terabytes of storage per socket: SSDs and DRAM DIMMs have roughly the same form factor (several square inches of PCB surface area), and terabyte SSDs are now commonplace; 3) performance approaching that of DRAM: DRAM is used as a cache to the flash system; and 4) price-per-bit approaching that of NAND: flash is currently well under $0.50 per gigabyte, while DDR3 SDRAM is currently just over $10 per gigabyte [Newegg 2014]. Even today, one can build an easily affordable main memory system with a terabyte or more of NAND storage per CPU socket (which would be extremely expensive were one to use DRAM), and our cycle-accurate, full-system experiments show that this can be done at a performance point that lies within a factor of two of DRAM.
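The hierarchy argument lends itself to a back-of-the-envelope model; the sketch below computes the average access time of a DRAM-cache-over-flash design for a few assumed hit rates, with all latency numbers chosen for illustration rather than taken from the report.

```python
# Back-of-the-envelope model of the DRAM-as-cache-over-flash hierarchy argued
# for above. All latency and hit-rate numbers are illustrative assumptions,
# not figures from the report.
def effective_latency(hit_rate, cache_latency_ns, backing_latency_ns):
    """Average access time for a simple two-level hierarchy."""
    return hit_rate * cache_latency_ns + (1.0 - hit_rate) * backing_latency_ns

if __name__ == "__main__":
    dram_ns = 100.0          # assumed DRAM access latency
    flash_ns = 25_000.0      # assumed NAND flash read latency
    for hit in (0.95, 0.99, 0.999):
        t = effective_latency(hit, dram_ns, flash_ns)
        print(f"hit rate {hit:.3f}: {t:8.1f} ns  "
              f"({t / dram_ns:.1f}x all-DRAM latency)")
```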
NASA Astrophysics Data System (ADS)
Kollet, S. J.; Goergen, K.; Gasper, F.; Shresta, P.; Sulis, M.; Rihani, J.; Simmer, C.; Vereecken, H.
2013-12-01
In studies of the terrestrial hydrologic, energy and biogeochemical cycles, integrated multi-physics simulation platforms take a central role in characterizing non-linear interactions, variances and uncertainties of system states and fluxes in reciprocity with observations. Recently developed integrated simulation platforms attempt to honor the complexity of the terrestrial system across multiple time and space scales from the deeper subsurface including groundwater dynamics into the atmosphere. Technically, this requires the coupling of atmospheric, land surface, and subsurface-surface flow models in supercomputing environments, while ensuring a high-degree of efficiency in the utilization of e.g., standard Linux clusters and massively parallel resources. A systematic performance analysis including profiling and tracing in such an application is crucial in the understanding of the runtime behavior, to identify optimum model settings, and is an efficient way to distinguish potential parallel deficiencies. On sophisticated leadership-class supercomputers, such as the 28-rack 5.9 petaFLOP IBM Blue Gene/Q 'JUQUEEN' of the Jülich Supercomputing Centre (JSC), this is a challenging task, but even more so important, when complex coupled component models are to be analysed. Here we want to present our experience from coupling, application tuning (e.g. 5-times speedup through compiler optimizations), parallel scaling and performance monitoring of the parallel Terrestrial Systems Modeling Platform TerrSysMP. The modeling platform consists of the weather prediction system COSMO of the German Weather Service; the Community Land Model, CLM of NCAR; and the variably saturated surface-subsurface flow code ParFlow. The model system relies on the Multiple Program Multiple Data (MPMD) execution model where the external Ocean-Atmosphere-Sea-Ice-Soil coupler (OASIS3) links the component models. TerrSysMP has been instrumented with the performance analysis tool Scalasca and analyzed on JUQUEEN with processor counts on the order of 10,000. The instrumentation is used in weak and strong scaling studies with real data cases and hypothetical idealized numerical experiments for detailed profiling and tracing analysis. The profiling is not only useful in identifying wait states that are due to the MPMD execution model, but also in fine-tuning resource allocation to the component models in search of the most suitable load balancing. This is especially necessary, as with numerical experiments that cover multiple (high resolution) spatial scales, the time stepping, coupling frequencies, and communication overheads are constantly shifting, which makes it necessary to re-determine the model setup with each new experimental design.
Study of the TRAC Airfoil Table Computational System
NASA Technical Reports Server (NTRS)
Hu, Hong
1999-01-01
The report documents the study of the application of the TRAC airfoil table computational package (TRACFOIL) to the prediction of 2D airfoil force and moment data over a wide range of angle of attack and Mach number. The TRACFOIL generates the standard C-81 airfoil table for input into rotorcraft comprehensive codes such as CAMRAD. The existing TRACFOIL computer package is successfully modified to run on Digital Alpha workstations and on Cray-C90 supercomputers. A step-by-step instruction for using the package on both computer platforms is provided. Application of the newer version of TRACFOIL is made for two airfoil sections. The C-81 data obtained using the TRACFOIL method are compared with those of wind-tunnel data and results are presented.
Numerical results on the transcendence of constants involving pi, e, and Euler's constant
NASA Technical Reports Server (NTRS)
Bailey, David H.
1988-01-01
The existence of simple polynomial equations (integer relations) for the constants e/pi, e + pi, log pi, gamma (Euler's constant), e exp gamma, gamma/e, gamma/pi, and log gamma is investigated by means of numerical computations. The recursive form of the Ferguson-Fourcade algorithm (Ferguson and Fourcade, 1979; Ferguson, 1986 and 1987) is implemented on the Cray-2 supercomputer at NASA Ames, applying multiprecision techniques similar to those described by Bailey (1988) except that FFTs are used instead of dual-prime-modulus transforms for multiplication. It is shown that none of the constants has an integer relation of degree eight or less with coefficients of Euclidean norm 10 to the 9th or less.
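A present-day reader can repeat this style of experiment with an off-the-shelf integer-relation routine; the sketch below uses mpmath's PSLQ implementation rather than the recursive Ferguson-Fourcade algorithm of the paper, and the working precision, degree, and coefficient bound are illustrative choices in the spirit of the study.

```python
# Search for a polynomial relation c0 + c1*gamma + ... + c8*gamma^8 = 0 with
# bounded integer coefficients, using PSLQ in place of Ferguson-Fourcade.
# Finding no relation is only evidence up to the stated precision and bound,
# exactly as in the original study.
from mpmath import mp, euler, pslq

mp.dps = 120                       # working precision (decimal digits)
gamma = +euler                     # Euler's constant evaluated at this precision

powers = [gamma ** k for k in range(9)]
relation = pslq(powers, maxcoeff=10**9, maxsteps=10000)

if relation is None:
    print("no integer relation of degree <= 8 with coefficients <= 10^9 found")
else:
    print("candidate relation:", relation)
```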
Developing software to use parallel processing effectively. Final report, June-December 1987
DOE Office of Scientific and Technical Information (OSTI.GOV)
Center, J.
1988-10-01
This report describes the difficulties involved in writing efficient parallel programs and describes the hardware and software support currently available for generating software that utilizes parallel processing effectively. Historically, the processing rate of single-processor computers has increased by one order of magnitude every five years. However, this pace is slowing since electronic circuitry is coming up against physical barriers. Unfortunately, the complexity of engineering and research problems continues to require ever more processing power (far in excess of the maximum estimated 3 Gflops achievable by single-processor computers). For this reason, parallel-processing architectures are receiving considerable interest, since they offer high performance more cheaply than a single-processor supercomputer, such as the Cray.
Tools for 3D scientific visualization in computational aerodynamics
NASA Technical Reports Server (NTRS)
Bancroft, Gordon; Plessel, Todd; Merritt, Fergus; Watson, Val
1989-01-01
The purpose is to describe the tools and techniques in use at the NASA Ames Research Center for performing visualization of computational aerodynamics, for example, visualization of flow fields from computer simulations of fluid dynamics about vehicles such as the Space Shuttle. The hardware used for visualization is a high-performance graphics workstation connected to a supercomputer with a high speed channel. At present, the workstation is a Silicon Graphics IRIS 3130, the supercomputer is a Cray-2, and the high speed channel is a hyperchannel. The three techniques used for visualization are post-processing, tracking, and steering. Post-processing analysis is done after the simulation. Tracking analysis is done during a simulation but is not interactive, whereas steering analysis involves modifying the simulation interactively during the simulation. Using post-processing methods, a flow simulation is executed on a supercomputer and, after the simulation is complete, the results of the simulation are processed for viewing. The software in use and under development at NASA Ames Research Center for performing these types of tasks in computational aerodynamics is described. Workstation performance issues, benchmarking, and high-performance networks for this purpose are also discussed as well as descriptions of other hardware for digital video and film recording.
NASA Technical Reports Server (NTRS)
Fatoohi, Rod; Saini, Subbash; Ciotti, Robert
2006-01-01
We study the performance of inter-process communication on four high-speed multiprocessor systems using a set of communication benchmarks. The goal is to identify certain limiting factors and bottlenecks with the interconnect of these systems as well as to compare these interconnects. We measured network bandwidth using different numbers of communicating processors and communication patterns, such as point-to-point communication, collective communication, and dense communication patterns. The four platforms are: a 512-processor SGI Altix 3700 BX2 shared-memory machine with 3.2 GB/s links; a 64-processor (single-streaming) Cray X1 shared-memory machine with 32 1.6 GB/s links; a 128-processor Cray Opteron cluster using a Myrinet network; and a 1280-node Dell PowerEdge cluster with an InfiniBand network. Our results show the impact of the network bandwidth and topology on the overall performance of each interconnect.
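As a concrete example of the simplest such measurement, the sketch below implements a two-rank ping-pong bandwidth test with mpi4py; it is not the benchmark suite used in the study, and the message size and repetition count are arbitrary choices.

```python
# Minimal point-to-point "ping-pong" bandwidth test in the spirit of the
# benchmarks above. Run with: mpiexec -n 2 python pingpong.py
import time
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
assert comm.Get_size() >= 2, "needs at least two ranks"

nbytes = 8 * 1024 * 1024                    # 8 MB message
reps = 50
buf = np.zeros(nbytes, dtype=np.uint8)

comm.Barrier()
t0 = time.perf_counter()
for _ in range(reps):
    if rank == 0:
        comm.Send(buf, dest=1, tag=0)
        comm.Recv(buf, source=1, tag=1)
    elif rank == 1:
        comm.Recv(buf, source=0, tag=0)
        comm.Send(buf, dest=0, tag=1)
t1 = time.perf_counter()

if rank == 0:
    # Each repetition moves the message twice (there and back)
    gbps = 2 * reps * nbytes / (t1 - t0) / 1e9
    print(f"ping-pong bandwidth: {gbps:.2f} GB/s")
```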
NAS Parallel Benchmark Results 11-96. 1.0
NASA Technical Reports Server (NTRS)
Bailey, David H.; Bailey, David; Chancellor, Marisa K. (Technical Monitor)
1997-01-01
The NAS Parallel Benchmarks have been developed at NASA Ames Research Center to study the performance of parallel supercomputers. The eight benchmark problems are specified in a "pencil and paper" fashion. In other words, the complete details of the problem to be solved are given in a technical document, and except for a few restrictions, benchmarkers are free to select the language constructs and implementation techniques best suited for a particular system. These results represent the best results that have been reported to us by the vendors for the specific systems listed. In this report, we present new NPB (Version 1.0) performance results for the following systems: DEC Alpha Server 8400 5/440, Fujitsu VPP Series (VX, VPP300, and VPP700), HP/Convex Exemplar SPP2000, IBM RS/6000 SP P2SC node (120 MHz), NEC SX-4/32, SGI/CRAY T3E, SGI Origin200, and SGI Origin2000. We also report High Performance Fortran (HPF) based NPB results for IBM SP2 Wide Nodes, HP/Convex Exemplar SPP2000, and SGI/CRAY T3D. These results have been submitted by Applied Parallel Research (APR) and Portland Group Inc. (PGI). We also present sustained performance per dollar for Class B LU, SP and BT benchmarks.
VizieR Online Data Catalog: ChaMP X-ray point source catalog (Kim+, 2007)
NASA Astrophysics Data System (ADS)
Kim, M.; Kim, D.-W.; Wilkes, B. J.; Green, P. J.; Kim, E.; Anderson, C. S.; Barkhouse, W. A.; Evans, N. R.; Ivezic, Z.; Karovska, M.; Kashyap, V. L.; Lee, M. G.; Maksym, P.; Mossman, A. E.; Silverman, J. D.; Tananbaum, H. D.
2009-01-01
We present the Chandra Multiwavelength Project (ChaMP) X-ray point source catalog with ~6800 X-ray sources detected in 149 Chandra observations covering ~10deg2. The full ChaMP catalog sample is 7 times larger than the initial published ChaMP catalog. The exposure time of the fields in our sample ranges from 0.9 to 124ks, corresponding to a deepest X-ray flux limit of f0.5-8.0=9x10-16ergs/cm2/s. The ChaMP X-ray data have been uniformly reduced and analyzed with ChaMP-specific pipelines and then carefully validated by visual inspection. The ChaMP catalog includes X-ray photometric data in eight different energy bands as well as X-ray spectral hardness ratios and colors. To best utilize the ChaMP catalog, we also present the source reliability, detection probability, and positional uncertainty. (10 data files).
Parallelization of Rocket Engine System Software (Press)
NASA Technical Reports Server (NTRS)
Cezzar, Ruknet
1996-01-01
The main goal is to assess parallelization requirements for the Rocket Engine Numeric Simulator (RENS) project which, aside from gathering information on liquid-propelled rocket engines and setting forth requirements, involves a large FORTRAN based package at NASA Lewis Research Center and TDK software developed by SUBR/UWF. The ultimate aim is to develop, test, integrate, and suitably deploy a family of software packages on various aspects and facets of rocket engines using liquid propellants. At present, all project efforts by the funding agency, NASA Lewis Research Center, and the HBCU participants are disseminated over the internet using world wide web home pages. Considering the obviously expensive methods of actual field trials, the benefits of software simulators are potentially enormous. When realized, these benefits will be analogous to those provided by numerous CAD/CAM packages and flight-training simulators. According to the overall task assignments, Hampton University's role is to collect all available software, place them in a common format, assess and evaluate, define interfaces, and provide integration. Most importantly, HU's mission is to see to it that real-time performance is assured. This involves source code translations, porting, and distribution. The porting will be done in two phases: First, place all software on the Cray X-MP platform using FORTRAN. After testing and evaluation on the Cray X-MP, the code will be translated to C++ and ported to the parallel nCUBE platform. At present, we are evaluating another option of distributed processing over local area networks using Sun NFS, Ethernet, and TCP/IP. Considering the heterogeneous nature of the present software (e.g., first started as an expert system using LISP machines), which now involves FORTRAN code, the effort is expected to be quite challenging.
On the parallel solution of parabolic equations
NASA Technical Reports Server (NTRS)
Gallopoulos, E.; Saad, Youcef
1989-01-01
Parallel algorithms for the solution of linear parabolic problems are proposed. The first of these methods is based on using polynomial approximation to the exponential. It does not require solving any linear systems and is highly parallelizable. The two other methods proposed are based on Pade and Chebyshev approximations to the matrix exponential. The parallelization of these methods is achieved by using partial fraction decomposition techniques to solve the resulting systems and thus offers the potential for increased time parallelism in time dependent problems. Experimental results from the Alliant FX/8 and the Cray Y-MP/832 vector multiprocessors are also presented.
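The parallel structure described here can be illustrated with a small serial sketch: a diagonal Pade approximant of the exponential is split into partial fractions, so the action of the matrix exponential becomes a sum of independent shifted linear solves. The code below uses SciPy's Pade helper and a (4,4) approximant valid for modest norms; it shows the structure only and is not the authors' Cray or Alliant implementation.

```python
# exp(t*A) @ v via the partial-fraction form of a (4,4) Pade approximant:
# each shifted solve is independent of the others, which is the parallelism
# exploited by the methods described above.
import numpy as np
from math import factorial
from scipy.interpolate import pade
from scipy.linalg import expm, solve

# (4,4) Pade approximant of exp from its Taylor coefficients 1/k!
taylor = [1.0 / factorial(k) for k in range(9)]
p, q = pade(taylor, 4)                       # p(z)/q(z) ~ exp(z), degree 4 over degree 4

poles = np.roots(q.coeffs)                   # simple complex poles of the denominator
residues = np.array([p(z) / q.deriv()(z) for z in poles])
c_inf = p.coeffs[0] / q.coeffs[0]            # limit of p/q at infinity

def expm_action(A, v, t=1.0):
    """Approximate exp(t*A) @ v as c_inf*v plus a sum of shifted solves."""
    result = c_inf * v.astype(complex)
    for z_j, r_j in zip(poles, residues):
        # Independent solves: in a parallel code each would go to its own task
        result += r_j * solve(t * A - z_j * np.eye(A.shape[0]), v)
    return result.real

if __name__ == "__main__":
    rng = np.random.default_rng(2)
    A = rng.standard_normal((60, 60))
    A = 0.5 * (A + A.T)
    A /= np.linalg.norm(A, 2)                # keep ||A|| ~ 1 so the Pade is accurate
    v = rng.standard_normal(60)
    approx = expm_action(A, v)
    exact = expm(A) @ v
    print("relative error:", np.linalg.norm(approx - exact) / np.linalg.norm(exact))
```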
DOE Office of Scientific and Technical Information (OSTI.GOV)
D'Azevedo, Eduardo; Abbott, Stephen; Koskela, Tuomas
The XGC fusion gyrokinetic code combines state-of-the-art, portable computational and algorithmic technologies to enable complicated multiscale simulations of turbulence and transport dynamics in ITER edge plasma on the largest US open-science computer, the CRAY XK7 Titan, at its maximal heterogeneous capability. Such simulations had not been possible before because the time-to-solution was more than a factor of 10 too long to complete one physics case in less than 5 days of wall-clock time. Frontier techniques employed include nested OpenMP parallelism, adaptive parallel I/O, staging I/O and data reduction using dynamic and asynchronous application interactions, and dynamic repartitioning.
Implementations of BLAST for parallel computers.
Jülich, A
1995-02-01
The BLAST sequence comparison programs have been ported to a variety of parallel computers: the shared-memory machine Cray Y-MP 8/864 and the distributed-memory architectures Intel iPSC/860 and nCUBE. Additionally, the programs were ported to run on workstation clusters. We explain the parallelization techniques and consider the pros and cons of these methods. The BLAST programs are very well suited for parallelization for a moderate number of processors. We illustrate our results using the program blastp as an example. As input data for blastp, a 799 residue protein query sequence and the protein database PIR were used.
Simulation and analysis of a geopotential research mission
NASA Technical Reports Server (NTRS)
Schutz, B. E.
1987-01-01
Computer simulations were performed for a Geopotential Research Mission (GRM) to enable the study of the gravitational sensitivity of the range rate measurements between the two satellites and to provide a set of simulated measurements to assist in the evaluation of techniques developed for the determination of the gravity field. The simulations were conducted with two satellites in near-circular, frozen orbits at 160 km altitude separated by 300 km. High precision numerical integration of the polar orbits was used with a gravitational field complete to degree and order 360. The set of simulated data for a mission duration of about 32 days was generated on a Cray X-MP computer. The results presented cover the most recent simulation, S8703, and include a summary of the numerical integration of the simulated trajectories, a summary of the requirements to compute nominal reference trajectories to meet the initial orbit determination requirements for the recovery of the geopotential, an analysis of the nature of the one-way integrated Doppler measurements associated with the simulation, and a discussion of the data set to be made available.
RIP-REMOTE INTERACTIVE PARTICLE-TRACER
NASA Technical Reports Server (NTRS)
Rogers, S. E.
1994-01-01
Remote Interactive Particle-tracing (RIP) is a distributed-graphics program which computes particle traces for computational fluid dynamics (CFD) solution data sets. A particle trace is a line which shows the path a massless particle in a fluid will take; it is a visual image of where the fluid is going. The program is able to compute and display particle traces at a speed of about one trace per second because it runs on two machines concurrently. The data used by the program is contained in two files. The solution file contains data on density, momentum and energy quantities of a flow field at discrete points in three-dimensional space, while the grid file contains the physical coordinates of each of the discrete points. RIP requires two computers. A local graphics workstation interfaces with the user for program control and graphics manipulation, and a remote machine interfaces with the solution data set and performs time-intensive computations. The program utilizes two machines in a distributed mode for two reasons. First, the data to be used by the program is usually generated on the supercomputer. RIP avoids having to convert and transfer the data, eliminating any memory limitations of the local machine. Second, as computing the particle traces can be computationally expensive, RIP utilizes the power of the supercomputer for this task. Although the remote site code was developed on a CRAY, it is possible to port this to any supercomputer class machine with a UNIX-like operating system. Integration of a velocity field from a starting physical location produces the particle trace. The remote machine computes the particle traces using the particle-tracing subroutines from PLOT3D/AMES, a CFD post-processing graphics program available from COSMIC (ARC-12779). These routines use a second-order predictor-corrector method to integrate the velocity field. Then the remote program sends graphics tokens to the local machine via a remote-graphics library. The local machine interprets the graphics tokens and draws the particle traces. The program is menu driven. RIP is implemented on the Silicon Graphics IRIS 3000 (local workstation) with an IRIX operating system and on the CRAY2 (remote station) with a UNICOS 1.0 or 2.0 operating system. The IRIS 4D can be used in place of the IRIS 3000. The program is written in C (67%) and FORTRAN 77 (43%) and has an IRIS memory requirement of 4 MB. The remote and local stations must use the same user ID. PLOT3D/AMES unformatted data sets are required for the remote machine. The program was developed in 1988.
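The integration step RIP borrows from PLOT3D can be sketched in a few lines; the example below applies a second-order predictor-corrector (Heun) step to an analytic velocity field, standing in for the interpolated CFD solution the real program uses.

```python
# Second-order predictor-corrector (Heun) particle trace through a toy
# velocity field; the analytic field stands in for gridded CFD data.
import numpy as np

def velocity(p):
    """Toy 3-D velocity field: solid-body rotation about z plus axial drift."""
    x, y, z = p
    return np.array([-y, x, 0.1])

def trace(p0, dt=0.02, steps=300):
    """Integrate a particle path with a predictor-corrector scheme."""
    path = [np.asarray(p0, dtype=float)]
    for _ in range(steps):
        p = path[-1]
        v1 = velocity(p)
        p_pred = p + dt * v1                    # predictor: explicit Euler step
        v2 = velocity(p_pred)
        path.append(p + 0.5 * dt * (v1 + v2))   # corrector: average the slopes
    return np.array(path)

if __name__ == "__main__":
    path = trace([1.0, 0.0, 0.0])
    # The trace should stay close to the unit circle (a helix in 3-D)
    radii = np.linalg.norm(path[:, :2], axis=1)
    print("radius drift:", radii.min(), radii.max())
```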
A transient FETI methodology for large-scale parallel implicit computations in structural mechanics
NASA Technical Reports Server (NTRS)
Farhat, Charbel; Crivelli, Luis; Roux, Francois-Xavier
1992-01-01
Explicit codes are often used to simulate the nonlinear dynamics of large-scale structural systems, even for low frequency response, because the storage and CPU requirements entailed by the repeated factorizations traditionally found in implicit codes rapidly overwhelm the available computing resources. With the advent of parallel processing, this trend is accelerating because explicit schemes are also easier to parallelize than implicit ones. However, the time step restriction imposed by the Courant stability condition on all explicit schemes cannot yet -- and perhaps will never -- be offset by the speed of parallel hardware. Therefore, it is essential to develop efficient and robust alternatives to direct methods that are also amenable to massively parallel processing because implicit codes using unconditionally stable time-integration algorithms are computationally more efficient when simulating low-frequency dynamics. Here we present a domain decomposition method for implicit schemes that requires significantly less storage than factorization algorithms, that is several times faster than other popular direct and iterative methods, that can be easily implemented on both shared and local memory parallel processors, and that is both computationally and communication-wise efficient. The proposed transient domain decomposition method is an extension of the method of Finite Element Tearing and Interconnecting (FETI) developed by Farhat and Roux for the solution of static problems. Serial and parallel performance results on the CRAY Y-MP/8 and the iPSC-860/128 systems are reported and analyzed for realistic structural dynamics problems. These results establish the superiority of the FETI method over both the serial/parallel conjugate gradient algorithm with diagonal scaling and the serial/parallel direct method, and contrast the computational power of the iPSC-860/128 parallel processor with that of the CRAY Y-MP/8 system.
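FETI itself is a dual method that enforces interface continuity with Lagrange multipliers; as a much simpler illustration of the substructuring idea behind it, the sketch below reduces a 1-D Poisson problem to a small interface (Schur complement) system, with the subdomain solves independent and hence parallelizable. This is primal substructuring, not FETI, and the problem and splitting are invented for the example.

```python
# Primal Schur-complement substructuring on a 1-D Poisson problem, shown only
# to illustrate how domain decomposition reduces a large system to a small
# interface problem plus independent (parallelizable) subdomain solves.
import numpy as np

n = 99                                   # interior grid points, u(0)=u(1)=0
h = 1.0 / (n + 1)
main = 2.0 / h**2 * np.ones(n)
off = -1.0 / h**2 * np.ones(n - 1)
A = np.diag(main) + np.diag(off, 1) + np.diag(off, -1)
f = np.ones(n)                           # right-hand side of -u'' = 1

# Split: interior unknowns of subdomain 1, of subdomain 2, and one interface node
mid = n // 2
idx1 = np.arange(0, mid)                 # subdomain 1 interior
idx2 = np.arange(mid + 1, n)             # subdomain 2 interior
idxG = np.array([mid])                   # shared interface node
idxI = np.r_[idx1, idx2]

AII = A[np.ix_(idxI, idxI)]              # block-diagonal: the subdomains decouple
AIG = A[np.ix_(idxI, idxG)]
AGI = A[np.ix_(idxG, idxI)]
AGG = A[np.ix_(idxG, idxG)]
fI, fG = f[idxI], f[idxG]

# Interface (Schur complement) system; the AII solves split by subdomain
S = AGG - AGI @ np.linalg.solve(AII, AIG)
g = fG - AGI @ np.linalg.solve(AII, fI)
uG = np.linalg.solve(S, g)
uI = np.linalg.solve(AII, fI - AIG @ uG)

# Reassemble and compare with a direct solve of the global system
u = np.empty(n)
u[idxI] = uI
u[idxG] = uG
print("max difference vs direct solve:", np.max(np.abs(u - np.linalg.solve(A, f))))
```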
Scaling up ATLAS Event Service to production levels on opportunistic computing platforms
NASA Astrophysics Data System (ADS)
Benjamin, D.; Caballero, J.; Ernst, M.; Guan, W.; Hover, J.; Lesny, D.; Maeno, T.; Nilsson, P.; Tsulaia, V.; van Gemmeren, P.; Vaniachine, A.; Wang, F.; Wenaus, T.; ATLAS Collaboration
2016-10-01
Continued growth in public cloud and HPC resources is on track to exceed the dedicated resources available for ATLAS on the WLCG. Examples of such platforms are Amazon AWS EC2 Spot Instances, Edison Cray XC30 supercomputer, backfill at Tier 2 and Tier 3 sites, opportunistic resources at the Open Science Grid (OSG), and ATLAS High Level Trigger farm between the data taking periods. Because of specific aspects of opportunistic resources such as preemptive job scheduling and data I/O, their efficient usage requires workflow innovations provided by the ATLAS Event Service. Thanks to the finer granularity of the Event Service data processing workflow, the opportunistic resources are used more efficiently. We report on our progress in scaling opportunistic resource usage to double-digit levels in ATLAS production.
High Performance Computing Software Applications for Space Situational Awareness
NASA Astrophysics Data System (ADS)
Giuliano, C.; Schumacher, P.; Matson, C.; Chun, F.; Duncan, B.; Borelli, K.; Desonia, R.; Gusciora, G.; Roe, K.
The High Performance Computing Software Applications Institute for Space Situational Awareness (HSAI-SSA) has completed its first full year of applications development. The emphasis of our work in this first year was on improving space surveillance sensor models and image enhancement software. These applications are the Space Surveillance Network Analysis Model (SSNAM), the Air Force Space Fence simulation (SimFence), and the physically constrained iterative deconvolution (PCID) image enhancement software tool. Specifically, we have demonstrated an order-of-magnitude speed-up in those codes running on the latest Cray XD-1 Linux supercomputer (Hoku) at the Maui High Performance Computing Center. The software application improvements that HSAI-SSA has made have had a significant impact on the warfighter and have fundamentally changed the role of high performance computing in SSA.
Relativistic Collisions of Highly-Charged Ions
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ionescu, Dorin; Belkacem, Ali
1998-11-19
The physics of elementary atomic processes in relativistic collisions between highly-charged ions and atoms or other ions is briefly discussed, and some recent theoretical and experimental results in this field are summarized. They include excitation, capture, ionization, and electron-positron pair creation. The numerical solution of the two-center Dirac equation in momentum space is shown to be a powerful nonperturbative method for describing atomic processes in relativistic collisions involving heavy and highly-charged ions. By propagating negative-energy wave packets in time, the evolution of the QED vacuum around heavy ions in relativistic motion is investigated. Recent results obtained from numerical calculations using massively parallel processing on the Cray-T3E supercomputer of the National Energy Research Scientific Computing Center (NERSC) at Berkeley National Laboratory are presented.
NASA Technical Reports Server (NTRS)
Holzmann, Gerard J.; Joshi, Rajeev; Groce, Alex
2008-01-01
Reportedly, supercomputer designer Seymour Cray once said that he would sooner use two strong oxen to plow a field than a thousand chickens. Although this is undoubtedly wise when it comes to plowing a field, it is not so clear for other types of tasks. Model checking problems are of the proverbial "search for a needle in a haystack" type. Such problems can often be parallelized easily. Alas, none of the usual divide-and-conquer methods can be used to parallelize the working of a model checker. Given that it has become easier than ever to gain access to large numbers of computers to perform even routine tasks, it is becoming more and more attractive to find alternate ways to use these resources to speed up model checking tasks. This paper describes one such method, called swarm verification.
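The swarm idea can be caricatured in a few lines: launch many small, independently diversified searches over the same state space and stop as soon as any of them finds a violating state. The sketch below uses bounded random walks with different seeds as the diversification; the state space, property, and parameters are made up for the example, and the real tool diversifies model-checker search strategies rather than random walks.

```python
# Toy "swarm" search: many independently seeded, bounded random walks over an
# implicit state space, stopping as soon as any member finds a violating state.
import random
from multiprocessing import Pool

M = 2**32

def successors(state):
    """Implicit bounded state space: a few mixing moves on a 32-bit state."""
    return [(state * 3 + 1) % M, (state + 7919) % M, (state ^ 0x5BF03635) % M]

def violates(state):
    """The 'bug' we are hunting for: a nonzero state divisible by a large prime."""
    return state % 104729 == 0 and state != 0

def diversified_search(seed, start=1, max_steps=200_000):
    """One swarm member: a bounded random walk with its own seed."""
    rng = random.Random(seed)
    state = start
    for step in range(max_steps):
        if violates(state):
            return seed, step, state
        state = rng.choice(successors(state))
    return None

if __name__ == "__main__":
    with Pool(processes=8) as pool:
        for result in pool.imap_unordered(diversified_search, range(32)):
            if result is not None:
                seed, step, state = result
                print(f"swarm member {seed} found violating state {state} "
                      f"after {step} steps")
                pool.terminate()
                break
```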
An interactive adaptive remeshing algorithm for the two-dimensional Euler equations
NASA Technical Reports Server (NTRS)
Slack, David C.; Walters, Robert W.; Lohner, R.
1990-01-01
An interactive adaptive remeshing algorithm utilizing a frontal grid generator and a variety of time integration schemes for the two-dimensional Euler equations on unstructured meshes is presented. Several device dependent interactive graphics interfaces have been developed along with a device independent DI-3000 interface which can be employed on any computer that has the supporting software including the Cray-2 supercomputers Voyager and Navier. The time integration methods available include: an explicit four stage Runge-Kutta and a fully implicit LU decomposition. A cell-centered finite volume upwind scheme utilizing Roe's approximate Riemann solver is developed. To obtain higher order accurate results a monotone linear reconstruction procedure proposed by Barth is utilized. Results for flow over a transonic circular arc and flow through a supersonic nozzle are examined.
TRASYS - THERMAL RADIATION ANALYZER SYSTEM (DEC VAX VERSION WITH NASADIG)
NASA Technical Reports Server (NTRS)
Anderson, G. E.
1994-01-01
The Thermal Radiation Analyzer System, TRASYS, is a computer software system with generalized capability to solve the radiation related aspects of thermal analysis problems. TRASYS computes the total thermal radiation environment for a spacecraft in orbit. The software calculates internode radiation interchange data as well as incident and absorbed heat rate data originating from environmental radiant heat sources. TRASYS provides data of both types in a format directly usable by such thermal analyzer programs as SINDA/FLUINT (available from COSMIC, program number MSC-21528). One primary feature of TRASYS is that it allows users to write their own driver programs to organize and direct the preprocessor and processor library routines in solving specific thermal radiation problems. The preprocessor first reads and converts the user's geometry input data into the form used by the processor library routines. Then, the preprocessor accepts the user's driving logic, written in the TRASYS modified FORTRAN language. In many cases, the user has a choice of routines to solve a given problem. Users may also provide their own routines where desirable. In particular, the user may write output routines to provide for an interface between TRASYS and any thermal analyzer program using the R-C network concept. Input to the TRASYS program consists of Options and Edit data, Model data, and Logic Flow and Operations data. Options and Edit data provide for basic program control and user edit capability. The Model data describe the problem in terms of geometry and other properties. This information includes surface geometry data, documentation data, nodal data, block coordinate system data, form factor data, and flux data. Logic Flow and Operations data house the user's driver logic, including the sequence of subroutine calls and the subroutine library. Output from TRASYS consists of two basic types of data: internode radiation interchange data, and incident and absorbed heat rate data. The flexible structure of TRASYS allows considerable freedom in the definition and choice of solution method for a thermal radiation problem. The program's flexible structure has also allowed TRASYS to retain the same basic input structure as the authors update it in order to keep up with changing requirements. Among its other important features are the following: 1) up to 3200 node problem size capability with shadowing by intervening opaque or semi-transparent surfaces; 2) choice of diffuse, specular, or diffuse/specular radiant interchange solutions; 3) a restart capability that minimizes recomputing; 4) macroinstructions that automatically provide the executive logic for orbit generation that optimizes the use of previously completed computations; 5) a time variable geometry package that provides automatic pointing of the various parts of an articulated spacecraft and an automatic look-back feature that eliminates redundant form factor calculations; 6) capability to specify submodel names to identify sets of surfaces or components as an entity; and 7) subroutines to perform functions which save and recall the internodal and/or space form factors in subsequent steps for nodes with fixed geometry during a variable geometry run. There are two machine versions of TRASYS v27: a DEC VAX version and a Cray UNICOS version. Both versions require installation of the NASADIG library (MSC-21801 for DEC VAX or COS-10049 for CRAY), which is available from COSMIC either separately or bundled with TRASYS. 
The NASADIG (NASA Device Independent Graphics Library) plot package provides a pictorial representation of input geometry, orbital/orientation parameters, and heating rate output as a function of time. NASADIG supports Tektronix terminals. The CRAY version of TRASYS v27 is written in FORTRAN 77 for batch or interactive execution and has been implemented on CRAY X-MP and CRAY Y-MP series computers running UNICOS. The standard distribution medium for MSC-21959 (CRAY version without NASADIG) is a 1600 BPI 9-track magnetic tape in UNIX tar format. The standard distribution medium for COS-10040 (CRAY version with NASADIG) is a set of two 6250 BPI 9-track magnetic tapes in UNIX tar format. Alternate distribution media and formats are available upon request. The DEC VAX version of TRASYS v27 is written in FORTRAN 77 for batch execution (only the plotting driver program is interactive) and has been implemented on a DEC VAX 8650 computer under VMS. Since the source codes for MSC-21030 and COS-10026 are in VAX/VMS text library files and DEC Command Language files, COSMIC will only provide these programs in the following formats: MSC-21030, TRASYS (DEC VAX version without NASADIG) is available on a 1600 BPI 9-track magnetic tape in VAX BACKUP format (standard distribution medium) or in VAX BACKUP format on a TK50 tape cartridge; COS-10026, TRASYS (DEC VAX version with NASADIG), is available in VAX BACKUP format on a set of three 6250 BPI 9-track magnetic tapes (standard distribution medium) or a set of three TK50 tape cartridges in VAX BACKUP format. TRASYS was last updated in 1993.
TRASYS - THERMAL RADIATION ANALYZER SYSTEM (DEC VAX VERSION WITHOUT NASADIG)
NASA Technical Reports Server (NTRS)
Vogt, R. A.
1994-01-01
The Thermal Radiation Analyzer System, TRASYS, is a computer software system with generalized capability to solve the radiation related aspects of thermal analysis problems. TRASYS computes the total thermal radiation environment for a spacecraft in orbit. The software calculates internode radiation interchange data as well as incident and absorbed heat rate data originating from environmental radiant heat sources. TRASYS provides data of both types in a format directly usable by such thermal analyzer programs as SINDA/FLUINT (available from COSMIC, program number MSC-21528). One primary feature of TRASYS is that it allows users to write their own driver programs to organize and direct the preprocessor and processor library routines in solving specific thermal radiation problems. The preprocessor first reads and converts the user's geometry input data into the form used by the processor library routines. Then, the preprocessor accepts the user's driving logic, written in the TRASYS modified FORTRAN language. In many cases, the user has a choice of routines to solve a given problem. Users may also provide their own routines where desirable. In particular, the user may write output routines to provide for an interface between TRASYS and any thermal analyzer program using the R-C network concept. Input to the TRASYS program consists of Options and Edit data, Model data, and Logic Flow and Operations data. Options and Edit data provide for basic program control and user edit capability. The Model data describe the problem in terms of geometry and other properties. This information includes surface geometry data, documentation data, nodal data, block coordinate system data, form factor data, and flux data. Logic Flow and Operations data house the user's driver logic, including the sequence of subroutine calls and the subroutine library. Output from TRASYS consists of two basic types of data: internode radiation interchange data, and incident and absorbed heat rate data. The flexible structure of TRASYS allows considerable freedom in the definition and choice of solution method for a thermal radiation problem. The program's flexible structure has also allowed TRASYS to retain the same basic input structure as the authors update it in order to keep up with changing requirements. Among its other important features are the following: 1) up to 3200 node problem size capability with shadowing by intervening opaque or semi-transparent surfaces; 2) choice of diffuse, specular, or diffuse/specular radiant interchange solutions; 3) a restart capability that minimizes recomputing; 4) macroinstructions that automatically provide the executive logic for orbit generation that optimizes the use of previously completed computations; 5) a time variable geometry package that provides automatic pointing of the various parts of an articulated spacecraft and an automatic look-back feature that eliminates redundant form factor calculations; 6) capability to specify submodel names to identify sets of surfaces or components as an entity; and 7) subroutines to perform functions which save and recall the internodal and/or space form factors in subsequent steps for nodes with fixed geometry during a variable geometry run. There are two machine versions of TRASYS v27: a DEC VAX version and a Cray UNICOS version. Both versions require installation of the NASADIG library (MSC-21801 for DEC VAX or COS-10049 for CRAY), which is available from COSMIC either separately or bundled with TRASYS. 
The NASADIG (NASA Device Independent Graphics Library) plot package provides a pictorial representation of input geometry, orbital/orientation parameters, and heating rate output as a function of time. NASADIG supports Tektronix terminals. The CRAY version of TRASYS v27 is written in FORTRAN 77 for batch or interactive execution and has been implemented on CRAY X-MP and CRAY Y-MP series computers running UNICOS. The standard distribution medium for MSC-21959 (CRAY version without NASADIG) is a 1600 BPI 9-track magnetic tape in UNIX tar format. The standard distribution medium for COS-10040 (CRAY version with NASADIG) is a set of two 6250 BPI 9-track magnetic tapes in UNIX tar format. Alternate distribution media and formats are available upon request. The DEC VAX version of TRASYS v27 is written in FORTRAN 77 for batch execution (only the plotting driver program is interactive) and has been implemented on a DEC VAX 8650 computer under VMS. Since the source codes for MSC-21030 and COS-10026 are in VAX/VMS text library files and DEC Command Language files, COSMIC will only provide these programs in the following formats: MSC-21030, TRASYS (DEC VAX version without NASADIG) is available on a 1600 BPI 9-track magnetic tape in VAX BACKUP format (standard distribution medium) or in VAX BACKUP format on a TK50 tape cartridge; COS-10026, TRASYS (DEC VAX version with NASADIG), is available in VAX BACKUP format on a set of three 6250 BPI 9-track magnetic tapes (standard distribution medium) or a set of three TK50 tape cartridges in VAX BACKUP format. TRASYS was last updated in 1993.
NASA Astrophysics Data System (ADS)
Leutwyler, David; Fuhrer, Oliver; Cumming, Benjamin; Lapillonne, Xavier; Gysi, Tobias; Lüthi, Daniel; Osuna, Carlos; Schär, Christoph
2014-05-01
The representation of moist convection is a major shortcoming of current global and regional climate models. State-of-the-art global models usually operate at grid spacings of 10-300 km, and therefore cannot fully resolve the relevant upscale and downscale energy cascades. Therefore parametrization of the relevant sub-grid scale processes is required. Several studies have shown that this approach entails major uncertainties for precipitation processes, which raises concerns about the model's ability to represent precipitation statistics and associated feedback processes, as well as their sensitivities to large-scale conditions. Further refining the model resolution to the kilometer scale allows representing these processes much closer to first principles and thus should yield an improved representation of the water cycle including the drivers of extreme events. Although cloud-resolving simulations are very useful tools for climate simulations and numerical weather prediction, their high horizontal resolution, and consequently the small time steps needed, challenge current supercomputers to model large domains and long time scales. The recent innovations in the domain of hybrid supercomputers have led to mixed node designs with a conventional CPU and an accelerator such as a graphics processing unit (GPU). GPUs relax the necessity for cache coherency and complex memory hierarchies, but have a larger system memory-bandwidth. This is highly beneficial for low compute intensity codes such as atmospheric stencil-based models. However, to efficiently exploit these hybrid architectures, climate models need to be ported and/or redesigned. Within the framework of the Swiss High Performance High Productivity Computing initiative (HP2C), a project to port the COSMO model to hybrid architectures has recently come to an end. The product of these efforts is a version of COSMO with an improved performance on traditional x86-based clusters as well as hybrid architectures with GPUs. We present our redesign and porting approach as well as our experience and lessons learned. Furthermore, we discuss relevant performance benchmarks obtained on the new hybrid Cray XC30 system "Piz Daint" installed at the Swiss National Supercomputing Centre (CSCS), both in terms of time-to-solution and energy consumption. We will demonstrate a first set of short cloud-resolving climate simulations at the European scale using the GPU-enabled COSMO prototype and elaborate our future plans on how to exploit this new model capability.
ALCF Data Science Program: Productive Data-centric Supercomputing
NASA Astrophysics Data System (ADS)
Romero, Nichols; Vishwanath, Venkatram
The ALCF Data Science Program (ADSP) is targeted at big data science problems that require leadership computing resources. The goal of the program is to explore and improve a variety of computational methods that will enable data-driven discoveries across all scientific disciplines. The projects will focus on data science techniques covering a wide area of discovery including but not limited to uncertainty quantification, statistics, machine learning, deep learning, databases, pattern recognition, image processing, graph analytics, data mining, real-time data analysis, and complex and interactive workflows. Project teams will be among the first to access Theta, ALCF's forthcoming 8.5 petaflops Intel/Cray system. The program will transition to the 200 petaflop/s Aurora supercomputing system when it becomes available. In 2016, four projects were selected to kick off the ADSP. The selected projects span experimental and computational sciences and range from modeling the brain to discovering new materials for solar-powered windows to simulating collision events at the Large Hadron Collider (LHC). The program will have a regular call for proposals, with the next call expected in Spring 2017. http://www.alcf.anl.gov/alcf-data-science-program This research used resources of the ALCF, which is a DOE Office of Science User Facility supported under Contract DE-AC02-06CH11357.
: A Scalable and Transparent System for Simulating MPI Programs
DOE Office of Scientific and Technical Information (OSTI.GOV)
Perumalla, Kalyan S
2010-01-01
is a scalable, transparent system for experimenting with the execution of parallel programs on simulated computing platforms. The level of simulated detail can be varied for application behavior as well as for machine characteristics. Unique features of are repeatability of execution, scalability to millions of simulated (virtual) MPI ranks, scalability to hundreds of thousands of host (real) MPI ranks, portability of the system to a variety of host supercomputing platforms, and the ability to experiment with scientific applications whose source-code is available. The set of source-code interfaces supported by is being expanded to support a wider set of applications, and MPI-based scientific computing benchmarks are being ported. In proof-of-concept experiments, has been successfully exercised to spawn and sustain very large-scale executions of an MPI test program given in source code form. Low slowdowns are observed, due to its use of purely discrete event style of execution, and due to the scalability and efficiency of the underlying parallel discrete event simulation engine, sik. In the largest runs, has been executed on up to 216,000 cores of a Cray XT5 supercomputer, successfully simulating over 27 million virtual MPI ranks, each virtual rank containing its own thread context, and all ranks fully synchronized by virtual time.
Liwo, Adam; Ołdziej, Stanisław; Czaplewski, Cezary; Kleinerman, Dana S.; Blood, Philip; Scheraga, Harold A.
2010-01-01
We report the implementation of our united-residue UNRES force field for simulations of protein structure and dynamics with massively parallel architectures. In addition to coarse-grained parallelism already implemented in our previous work, in which each conformation was treated by a different task, we introduce a fine-grained level in which energy and gradient evaluation are split between several tasks. The Message Passing Interface (MPI) libraries have been utilized to construct the parallel code. The parallel performance of the code has been tested on a professional Beowulf cluster (Xeon Quad Core), a Cray XT3 supercomputer, and two IBM BlueGene/P supercomputers with canonical and replica-exchange molecular dynamics. With IBM BlueGene/P, about 50 % efficiency and 120-fold speed-up of the fine-grained part was achieved for a single trajectory of a 767-residue protein with use of 256 processors/trajectory. Because of averaging over the fast degrees of freedom, UNRES provides an effective 1000-fold speed-up compared to the experimental time scale and, therefore, enables us to effectively carry out millisecond-scale simulations of proteins with 500 and more amino-acid residues in days of wall-clock time. PMID:20305729
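A hedged sketch of the fine-grained idea described above, in which one conformation's energy and gradient evaluation is split over several MPI tasks: each task evaluates a block of pairwise terms and the partial results are combined with MPI_Allreduce. This illustrates the general pattern only, not the actual UNRES decomposition; pair_term and all other names are hypothetical stand-ins.

```c
/* Sketch: fine-grained MPI parallelization of an energy/gradient evaluation.
   Each rank handles a block of residue pairs; partial sums are combined with
   MPI_Allreduce. Hypothetical pair_term(); not UNRES source code. */
#include <mpi.h>

/* Stand-in for a real pairwise energy/gradient term (hypothetical). */
static void pair_term(int i, int j, const double *x, double *e, double *g)
{
    double dx = x[3 * i] - x[3 * j];
    *e += 0.5 * dx * dx;
    g[3 * i] += dx;
    g[3 * j] -= dx;
}

void energy_gradient(const double *x, int nres, double *energy, double *grad,
                     MPI_Comm comm)
{
    int rank, size;
    MPI_Comm_rank(comm, &rank);
    MPI_Comm_size(comm, &size);

    long npairs = (long)nres * (nres - 1) / 2;
    long lo = rank * npairs / size, hi = (rank + 1) * npairs / size;

    double e_local = 0.0;
    for (int k = 0; k < 3 * nres; ++k) grad[k] = 0.0;

    /* Walk the flattened upper-triangular pair list and keep only our block. */
    long p = 0;
    for (int i = 0; i < nres - 1; ++i)
        for (int j = i + 1; j < nres; ++j, ++p)
            if (p >= lo && p < hi)
                pair_term(i, j, x, &e_local, grad);

    /* Sum partial energy and gradient contributions over the fine-grained tasks. */
    MPI_Allreduce(&e_local, energy, 1, MPI_DOUBLE, MPI_SUM, comm);
    MPI_Allreduce(MPI_IN_PLACE, grad, 3 * nres, MPI_DOUBLE, MPI_SUM, comm);
}
```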
First Detection of the Hatchett-McCray Effect in the High-Mass X-ray Binary
NASA Technical Reports Server (NTRS)
Sonneborn, G.; Iping, R. C.; Kaper, L.; Hammerschiag-Hensberge, G.; Hutchings, J. B.
2004-01-01
The orbital modulation of stellar wind UV resonance line profiles as a result of ionization of the wind by the X-ray source has been observed in the high-mass X-ray binary 4U1700-37/HD 153919 for the first time. Far-UV observations (905-1180 Angstroms, resolution 0.05 Angstroms) were made at the four quadrature points of the binary orbit with the Far Ultraviolet Spectroscopic Explorer (FUSE) in 2003 April and August. The O6.5 Iaf primary eclipses the X-ray source (neutron star or black hole) with a 3.41-day period. Orbital modulation of the UV resonance lines, resulting from X-ray photoionization of the dense stellar wind, the so-called Hatchett-McCray (HM) effect, was predicted for 4U1700-37/HD 153919 (Hatchett & McCray 1977, ApJ, 211, 522) but was not seen in N V 1240, Si IV 1400, or C IV 1550 in IUE and HST spectra. The FUSE spectra show that the P V 1118-1128 and S IV 1063-1073 P-Cygni lines appear to vary as expected for the HM effect, weakest at phase 0.5 (X-ray source conjunction) and strongest at phase 0.0 (X-ray source eclipse). The phase modulation of the O VI 1032-1037 lines, however, is opposite to P V and S IV, implying that O VI may be a byproduct of the wind's ionization by the X-ray source. Such variations were not observed in N V, Si IV, and C IV because of their high optical depth. Due to their lower cosmic abundance, the P V and S IV wind lines are unsaturated, making them excellent tracers of the ionization conditions in the O star's wind.
NASA Technical Reports Server (NTRS)
Korzennik, Sylvain
1997-01-01
Under the direction of Dr. Rhodes, and the technical supervision of Dr. Korzennik, the data assimilation of high spatial resolution solar dopplergrams has been carried out throughout the program on the Intel Touchstone Delta supercomputer. With the help of a research assistant, partially supported by this grant, and under the supervision of Dr. Korzennik, code development was carried out at SAO using various available resources. To ensure cross-platform portability, PVM was selected as the message passing library. A parallel implementation of power spectra computation for helioseismology data reduction, using PVM, was successfully completed. It was successfully ported to SMP architectures (e.g., Sun) and to some MPP architectures (e.g., the CM5). Due to limitations of the PVM implementation on the Cray T3D, the port to that architecture was not completed at the time.
Zhang, Rong; Xu, Xingjian; Chen, Wenli; Huang, Qiaoyun
2016-02-01
A multifunctional Pseudomonas putida X3 strain was successfully engineered by introducing methyl parathion (MP)-degrading gene and enhanced green fluorescent protein (EGFP) gene in P. putida X4 (CCTCC: 209319). In liquid cultures, the engineered X3 strain utilized MP as sole carbon source for growth and degraded 100 mg L^-1 of MP within 24 h; however, this strain did not further metabolize p-nitrophenol (PNP), an intermediate metabolite of MP. No discrepancy in minimum inhibitory concentrations (MICs) to cadmium (Cd), copper (Cu), zinc (Zn), and cobalt (Co) was observed between the engineered X3 strain and its host strain. The inoculated X3 strain accelerated MP degradation in different polluted soil microcosms with 100 mg MP kg^-1 dry soil and/or 5 mg Cd kg^-1 dry soil; MP was completely eliminated within 40 h. However, the presence of Cd in the early stage of remediation slightly delayed MP degradation. The application of X3 strain in Cd-contaminated soil strongly affected the distribution of Cd fractions and immobilized Cd by reducing bioavailable Cd concentrations with lower soluble/exchangeable Cd and organic-bound Cd. The inoculated X3 strain also colonized and proliferated in various contaminated microcosms. Our results suggested that the engineered X3 strain is a potential bioremediation agent showing competitive advantage in complex contaminated environments.
The application of CFD to rotary wing flow problems
NASA Technical Reports Server (NTRS)
Caradonna, F. X.
1990-01-01
Rotorcraft aerodynamics is especially rich in unsolved problems, and for this reason the need for independent computational and experimental studies is great. Three-dimensional unsteady, nonlinear potential methods are becoming fast enough to enable their use in parametric design studies. At present, combined CAMRAD/FPR analyses for a complete trimmed rotor solution can be performed in about an hour on a CRAY Y-MP (or ten minutes, with multiple processors). These computational speeds indicate that in the near future many of the large CFD problems will no longer require a supercomputer. The ability to convect circulation is routine for integral methods, but only recently was it discovered how to do the same with differential methods. It is clear that the differential CFD rotor analyses are poised to enter the engineering workplace. Integral methods already constitute a mainstay. Ultimately, it is the users who will integrate CFD into the entire engineering process and provide a new measure of confidence in design and analysis. It should be recognized that the above classes of analyses do not include several major limiting phenomena which will continue to require empirical treatment because of computational time constraints and limited physical understanding. Such empirical treatment should, however, be included in the developing engineering-level CFD analyses. It is likely that properly constructed flow models containing corrections from physical testing will be able to fill in unavoidable gaps in the experimental data base, both for basic studies and for specific configuration testing. For these kinds of applications, computational cost is not an issue. Finally, it should be recognized that although rotorcraft are probably the most complex of aircraft, the rotorcraft engineering community is very small compared to the fixed-wing community. Likewise, rotorcraft CFD resources can never achieve fixed-wing proportions and must be used wisely. Therefore the fixed-wing work must be gleaned for many of the basic methods.
The Q continuum simulation: Harnessing the power of GPU accelerated supercomputers
DOE Office of Scientific and Technical Information (OSTI.GOV)
Heitmann, Katrin; Frontiere, Nicholas; Sewell, Chris
2015-08-01
Modeling large-scale sky survey observations is a key driver for the continuing development of high-resolution, large-volume, cosmological simulations. We report the first results from the "Q Continuum" cosmological N-body simulation run carried out on the GPU-accelerated supercomputer Titan. The simulation encompasses a volume of (1300 Mpc)^3 and evolves more than half a trillion particles, leading to a particle mass resolution of m_p ≈ 1.5 × 10^8 M_sun. At this mass resolution, the Q Continuum run is currently the largest cosmology simulation available. It enables the construction of detailed synthetic sky catalogs, encompassing different modeling methodologies, including semi-analytic modeling and sub-halo abundance matching in a large, cosmological volume. Here we describe the simulation and outputs in detail and present first results for a range of cosmological statistics, such as mass power spectra, halo mass functions, and halo mass-concentration relations for different epochs. We also provide details on challenges connected to running a simulation on almost 90% of Titan, one of the fastest supercomputers in the world, including our usage of Titan's GPU accelerators.
Machine characterization and benchmark performance prediction
NASA Technical Reports Server (NTRS)
Saavedra-Barrera, Rafael H.
1988-01-01
From runs of standard benchmarks or benchmark suites, it is not possible to characterize the machine nor to predict the run time of other benchmarks which have not been run. A new approach to benchmarking and machine characterization is reported. The creation and use of a machine analyzer is described, which measures the performance of a given machine on FORTRAN source language constructs. The machine analyzer yields a set of parameters which characterize the machine and spotlight its strong and weak points. Also described is a program analyzer, which analyzes FORTRAN programs and determines the frequency of execution of each of the same set of source language operations. It is then shown that by combining a machine characterization and a program characterization, we are able to predict with good accuracy the run time of a given benchmark on a given machine. Characterizations are provided for the Cray X-MP/48, Cyber 205, IBM 3090/200, Amdahl 5840, Convex C-1, VAX 8600, VAX 11/785, VAX 11/780, SUN 3/50, and IBM RT-PC/125, and for the following benchmark programs or suites: Los Alamos (BMK8A1), Baskett, Linpack, Livermore Loops, Mandelbrot Set, NAS Kernels, Shell Sort, Smith, Whetstone, and Sieve of Eratosthenes.
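The prediction step described above reduces to a weighted sum: the machine analyzer supplies a time per abstract operation and the program analyzer supplies the execution count of that operation, so the predicted run time is the sum of count times time over all operations. The sketch below illustrates this combination; the operation names and all numerical values are hypothetical, not measured characterizations.

```c
/* Sketch of run-time prediction by combining a machine characterization
   (seconds per abstract operation) with a program characterization
   (how often each operation is executed). Values are illustrative only. */
#include <stdio.h>

#define NOPS 4

int main(void)
{
    const char  *op[NOPS]    = {"fp_add", "fp_mul", "mem_load", "branch"};
    const double t_op[NOPS]  = {20e-9, 35e-9, 50e-9, 15e-9};  /* machine: s/op  */
    const double count[NOPS] = {4.0e8, 3.2e8, 9.1e8, 1.0e8};  /* program: counts */

    double t_pred = 0.0;
    for (int i = 0; i < NOPS; ++i)
        t_pred += count[i] * t_op[i];   /* accumulate count x time per construct */

    for (int i = 0; i < NOPS; ++i)
        printf("%-8s  %.2e executions  %.1e s/op\n", op[i], count[i], t_op[i]);
    printf("predicted run time: %.2f s\n", t_pred);
    return 0;
}
```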
DOT National Transportation Integrated Search
2008-12-01
The appendix includes various flow-occupancy plots, including: I-205 NB, Gladstone MP 11.05; I-205 NB, Gladstone Hway MP 12.94; I-205 NB, Lawnfield MP 13.58; I-205 NB, Sunnybrook MP 14.32; I-205 NB, Sunnyside MP 14.7; I-205 NB, Johnson Creek MP 16.2;...
DOT National Transportation Integrated Search
2008-12-01
The appendix includes various speed-occupancy plots, including: I-205 NB, Gladstone MP 11.05; I-205 NB, Gladstone Hway MP 12.94; I-205 NB, Lawnfield MP 13.58; I-205 NB, Sunnybrook MP 14.32; I-205 NB, Sunnyside MP 14.7; I-205 NB, Johnson Creek MP 16.2...
Performance Analysis of a Hybrid Overset Multi-Block Application on Multiple Architectures
NASA Technical Reports Server (NTRS)
Djomehri, M. Jahed; Biswas, Rupak
2003-01-01
This paper presents a detailed performance analysis of a multi-block overset grid computational fluid dynamics application on multiple state-of-the-art computer architectures. The application is implemented using a hybrid MPI+OpenMP programming paradigm that exploits both coarse and fine-grain parallelism; the former via MPI message passing and the latter via OpenMP directives. The hybrid model also extends the applicability of multi-block programs to large clusters of SMP nodes by overcoming the restriction that the number of processors be less than the number of grid blocks. A key kernel of the application, namely the LU-SGS linear solver, had to be modified to enhance the performance of the hybrid approach on the target machines. Investigations were conducted on cacheless Cray SX6 vector processors, cache-based IBM Power3 and Power4 architectures, and single system image SGI Origin3000 platforms. Overall results for complex vortex dynamics simulations demonstrate that the SX6 achieves the highest performance and outperforms the RISC-based architectures; however, the best scaling performance was achieved on the Power3.
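A minimal sketch of the hybrid pattern described above: MPI ranks own whole grid blocks (coarse grain) while OpenMP threads split the loops inside each block (fine grain). This shows the generic paradigm only, not the overset-grid application itself; the array, sizes, and reduction are hypothetical.

```c
/* Generic hybrid MPI+OpenMP skeleton: one MPI rank per grid block,
   OpenMP threads across the loop nest inside the block. Illustrative only. */
#include <mpi.h>
#include <omp.h>
#include <stdio.h>
#include <stdlib.h>

#define NPTS 1000000   /* points per block (hypothetical) */

int main(int argc, char **argv)
{
    int provided, rank, nranks;
    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nranks);

    double *q = malloc(NPTS * sizeof *q);
    for (int i = 0; i < NPTS; ++i) q[i] = (double)(rank + i % 7);

    /* Fine-grained parallelism: threads share the work within one block. */
    double local = 0.0;
    #pragma omp parallel for reduction(+:local)
    for (int i = 0; i < NPTS; ++i)
        local += q[i] * q[i];

    /* Coarse-grained parallelism: blocks combine results across ranks. */
    double global = 0.0;
    MPI_Reduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("ranks=%d threads/rank=%d norm^2=%.3e\n",
               nranks, omp_get_max_threads(), global);

    free(q);
    MPI_Finalize();
    return 0;
}
```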
DOT National Transportation Integrated Search
2008-12-01
The appendix includes various ML speed-occupancy plots: OR-217 NB, 72nd MP 6.61; OR-217 NB, 99W-EB MP 5.9; OR-217 NB, 99W-WB MP 5.85; OR-217 NB, Greenburg MP 4.65; OR-217 NB, Scholls MP 3.85; OR-217 NB, Denney MP 2.68; OR-217 NB, Allen MP 2.16; OR-21...
DOT National Transportation Integrated Search
2008-12-01
The appendix includes various ML speed-flow plots: OR-217 NB, 72nd MP 6.61; OR-217 NB, 99W-EB MP 5.9; OR-217 NB, 99W-WB MP 5.85; OR-217 NB, Greenburg MP 4.65; OR-217 NB, Scholls MP 3.85; OR-217 NB, Denney MP 2.68; OR-217 NB, Allen MP 2.16; OR-217 NB,...
DOT National Transportation Integrated Search
2008-12-01
The appendix includes various ML flow occupancy plots: OR-217 NB, 72nd MP 6.61; OR-217 NB, 99W-EB MP 5.9; OR-217 NB, 99W-WB MP 5.85; OR-217 NB, Greenburg MP 4.65; OR-217 NB, Scholls MP 3.85; OR-217 NB, Denney MP 2.68; OR-217 NB, Allen MP 2.16; OR-217...
Parallelization of the FLAPW method
NASA Astrophysics Data System (ADS)
Canning, A.; Mannstadt, W.; Freeman, A. J.
2000-08-01
The FLAPW (full-potential linearized-augmented plane-wave) method is one of the most accurate first-principles methods for determining structural, electronic and magnetic properties of crystals and surfaces. Until the present work, the FLAPW method has been limited to systems of less than about a hundred atoms due to the lack of an efficient parallel implementation to exploit the power and memory of parallel computers. In this work, we present an efficient parallelization of the method by division among the processors of the plane-wave components for each state. The code is also optimized for RISC (reduced instruction set computer) architectures, such as those found on most parallel computers, making full use of BLAS (basic linear algebra subprograms) wherever possible. Scaling results are presented for systems of up to 686 silicon atoms and 343 palladium atoms per unit cell, running on up to 512 processors on a CRAY T3E parallel supercomputer.
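The data distribution described above, dividing the plane-wave components of each state among processors, means that any quantity summed over plane waves (a norm, overlap, or matrix element) becomes a local partial sum followed by a global reduction. The sketch below shows that pattern with real-valued coefficients for brevity (actual electronic-structure codes use complex coefficients); it is an illustration of the distribution, not FLAPW source code.

```c
/* Sketch: each rank stores a contiguous slice of the plane-wave coefficients
   of two states; the overlap <psi_a|psi_b> is a local dot product followed by
   an MPI_Allreduce. Real coefficients used for brevity; illustrative only. */
#include <mpi.h>

double overlap(const double *ca, const double *cb, int n_local, MPI_Comm comm)
{
    double part = 0.0, full = 0.0;
    for (int g = 0; g < n_local; ++g)   /* locally stored components only */
        part += ca[g] * cb[g];
    MPI_Allreduce(&part, &full, 1, MPI_DOUBLE, MPI_SUM, comm);
    return full;
}
```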
Parallel performance optimizations on unstructured mesh-based simulations
Sarje, Abhinav; Song, Sukhyun; Jacobsen, Douglas; ...
2015-06-01
This paper addresses two key parallelization challenges in the unstructured mesh-based ocean modeling code MPAS-Ocean, which uses a mesh based on Voronoi tessellations: (1) load imbalance across processes, and (2) unstructured data access patterns that inhibit intra- and inter-node performance. Our work analyzes the load imbalance due to naive partitioning of the mesh, and develops methods to generate mesh partitioning with better load balance and reduced communication. Furthermore, we present methods that minimize both inter- and intra-node data movement and maximize data reuse. Our techniques include predictive ordering of data elements for higher cache efficiency, as well as communication reduction approaches. We present detailed performance data when running on thousands of cores using the Cray XC30 supercomputer and show that our optimization strategies can exceed the original performance by over 2×. Additionally, many of these solutions can be broadly applied to a wide variety of unstructured grid-based computations.
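As an illustration of "predictive ordering of data elements for higher cache efficiency", one common approach is to renumber mesh cells so that cells accessed together are stored together, for example by sorting them along a space-filling-curve key. The sketch below uses a simple Morton (Z-order) key on quantized cell coordinates; it shows the general idea only and is not the ordering actually used in MPAS-Ocean.

```c
/* Sketch: reorder cells by a Morton (Z-order) key built from quantized cell
   coordinates so that nearby cells end up close in memory. Illustrative only. */
#include <stdint.h>
#include <stdlib.h>

static uint64_t spread_bits(uint32_t v)   /* insert two zero bits between bits */
{
    uint64_t x = v & 0x1fffff;            /* keep 21 bits per coordinate */
    x = (x | x << 32) & 0x1f00000000ffffULL;
    x = (x | x << 16) & 0x1f0000ff0000ffULL;
    x = (x | x <<  8) & 0x100f00f00f00f00fULL;
    x = (x | x <<  4) & 0x10c30c30c30c30c3ULL;
    x = (x | x <<  2) & 0x1249249249249249ULL;
    return x;
}

typedef struct { uint64_t key; int cell; } keyed_t;

static int cmp_key(const void *a, const void *b)
{
    uint64_t ka = ((const keyed_t *)a)->key, kb = ((const keyed_t *)b)->key;
    return (ka > kb) - (ka < kb);
}

/* Fill perm[] with a cache-friendlier visiting order of ncells cells whose
   quantized integer coordinates are (ix, iy, iz). */
void morton_order(const uint32_t *ix, const uint32_t *iy, const uint32_t *iz,
                  int ncells, int *perm)
{
    keyed_t *k = malloc(ncells * sizeof *k);
    for (int c = 0; c < ncells; ++c) {
        k[c].key  = spread_bits(ix[c]) | spread_bits(iy[c]) << 1
                                       | spread_bits(iz[c]) << 2;
        k[c].cell = c;
    }
    qsort(k, ncells, sizeof *k, cmp_key);
    for (int c = 0; c < ncells; ++c) perm[c] = k[c].cell;
    free(k);
}
```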
Non-linear wave phenomena in Josephson elements for superconducting electronics
NASA Astrophysics Data System (ADS)
Christiansen, P. L.; Parmentier, R. D.; Skovgaard, O.
1985-07-01
The long and intermediate length Josephson tunnel junction oscillator with overlap geometry, in both linear and circular configurations, is investigated by computational solution of the perturbed sine-Gordon equation model and by experimental measurements. The model predicts the experimental results very well. Line oscillators as well as ring oscillators are treated. For long junctions, soliton perturbation methods are developed and turn out to be efficient prediction tools, also providing physical understanding of the dynamics of the oscillator. For intermediate length junctions, expansions in terms of linear cavity modes reduce computational costs. The narrow linewidth of the electromagnetic radiation (typically 1 kHz for a line at 10 GHz) is demonstrated experimentally. Corresponding computer simulations, requiring a relative accuracy better than 10^-7, are performed on the CRAY-1-S supercomputer. The broadening of the linewidth due to external microwave radiation and internal thermal noise is determined.
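For orientation, the perturbed sine-Gordon model for a long Josephson junction is commonly written as phi_tt = phi_xx - sin(phi) - alpha*phi_t + gamma, with dissipation alpha and normalized bias current gamma. The sketch below is a generic explicit finite-difference time step under those assumptions; it is a textbook discretization for illustration, not the numerical scheme used in the cited study.

```c
/* Explicit finite-difference step for a perturbed sine-Gordon equation
       phi_tt = phi_xx - sin(phi) - alpha*phi_t + gamma
   (alpha: dissipation, gamma: normalized bias current). Generic textbook
   discretization for illustration only. */
#include <math.h>

void sine_gordon_step(const double *phi_old, const double *phi, double *phi_new,
                      int n, double dx, double dt, double alpha, double gamma)
{
    for (int i = 1; i < n - 1; ++i) {
        double phi_xx = (phi[i + 1] - 2.0 * phi[i] + phi[i - 1]) / (dx * dx);
        double phi_t  = (phi[i] - phi_old[i]) / dt;   /* backward difference */
        phi_new[i] = 2.0 * phi[i] - phi_old[i]
                   + dt * dt * (phi_xx - sin(phi[i]) - alpha * phi_t + gamma);
    }
    /* Boundary points (i = 0 and i = n-1) depend on the junction geometry:
       open-ended line oscillator or periodic ring oscillator. */
}
```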
Mapping to Irregular Torus Topologies and Other Techniques for Petascale Biomolecular Simulation
Phillips, James C.; Sun, Yanhua; Jain, Nikhil; Bohm, Eric J.; Kalé, Laxmikant V.
2014-01-01
Currently deployed petascale supercomputers typically use toroidal network topologies in three or more dimensions. While these networks perform well for topology-agnostic codes on a few thousand nodes, leadership machines with 20,000 nodes require topology awareness to avoid network contention for communication-intensive codes. Topology adaptation is complicated by irregular node allocation shapes and holes due to dedicated input/output nodes or hardware failure. In the context of the popular molecular dynamics program NAMD, we present methods for mapping a periodic 3-D grid of fixed-size spatial decomposition domains to 3-D Cray Gemini and 5-D IBM Blue Gene/Q toroidal networks to enable hundred-million atom full machine simulations, and to similarly partition node allocations into compact domains for smaller simulations using multiple-copy algorithms. Additional enabling techniques are discussed and performance is reported for NCSA Blue Waters, ORNL Titan, ANL Mira, TACC Stampede, and NERSC Edison. PMID:25594075
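As a simplified illustration of placing a periodic 3-D grid of spatial domains onto a 3-D torus, one can scale each grid dimension onto the corresponding torus dimension so that neighbouring domains land on nearby nodes. The sketch below shows only this basic proportional mapping; the paper's mapping additionally copes with irregular allocation shapes and holes, which this sketch does not.

```c
/* Sketch: proportional mapping of a Dx x Dy x Dz grid of spatial domains onto
   a Tx x Ty x Tz torus of nodes, keeping grid neighbours on nearby torus nodes.
   Simplified illustration; does not handle irregular allocations or holes. */
typedef struct { int x, y, z; } coord3;

coord3 map_domain(coord3 d, coord3 grid, coord3 torus)
{
    coord3 node;
    node.x = (int)((long)d.x * torus.x / grid.x) % torus.x;
    node.y = (int)((long)d.y * torus.y / grid.y) % torus.y;
    node.z = (int)((long)d.z * torus.z / grid.z) % torus.z;
    return node;
}
```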
DOE Office of Scientific and Technical Information (OSTI.GOV)
Gorentla Venkata, Manjunath; Shamis, Pavel; Graham, Richard L
2013-01-01
Many scientific simulations, using the Message Passing Interface (MPI) programming model, are sensitive to the performance and scalability of reduction collective operations such as MPI Allreduce and MPI Reduce. These operations are the most widely used abstractions to perform mathematical operations over all processes that are part of the simulation. In this work, we propose a hierarchical design to implement the reduction operations on multicore systems. This design aims to improve the efficiency of reductions by 1) tailoring the algorithms and customizing the implementations for various communication mechanisms in the system, 2) providing the ability to configure the depth of the hierarchy to match the system architecture, and 3) providing the ability to independently progress each level of this hierarchy. Using this design, we implement MPI Allreduce and MPI Reduce operations (and their nonblocking variants MPI Iallreduce and MPI Ireduce) for all message sizes, and evaluate them on multiple architectures including InfiniBand and Cray XT5. We leverage and enhance our existing infrastructure, Cheetah, which is a framework for implementing hierarchical collective operations, to implement these reductions. The experimental results show that the Cheetah reduction operations outperform production-grade MPI implementations such as Open MPI default, Cray MPI, and MVAPICH2, demonstrating its efficiency, flexibility and portability. On InfiniBand systems, with a microbenchmark, a 512-process Cheetah nonblocking Allreduce and Reduce achieves a speedup of 23x and 10x, respectively, compared to the default Open MPI reductions. The blocking variants of the reduction operations also show similar performance benefits. A 512-process nonblocking Cheetah Allreduce achieves a speedup of 3x, compared to the default MVAPICH2 Allreduce implementation. On a Cray XT5 system, a 6144-process Cheetah Allreduce outperforms the Cray MPI by 145%. The evaluation with an application kernel, a Conjugate Gradient solver, shows that the Cheetah reductions speed up the total time to solution by 195%, demonstrating the potential benefits for scientific simulations.
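A minimal sketch of the hierarchical reduction idea described above: ranks first reduce within a node over a shared-memory communicator, node leaders then allreduce among themselves, and the result is broadcast back within each node. This shows the generic two-level pattern only; the Cheetah framework's actual algorithms, configurable depth, and independent progression are considerably more elaborate.

```c
/* Two-level (intra-node + inter-node) allreduce sketch using MPI-3
   communicator splitting. Generic illustration of hierarchical reductions,
   not the Cheetah implementation. */
#include <mpi.h>

void hier_allreduce(double *val, MPI_Comm comm)
{
    MPI_Comm node, leaders;
    int node_rank;

    /* Group ranks that share a node. */
    MPI_Comm_split_type(comm, MPI_COMM_TYPE_SHARED, 0, MPI_INFO_NULL, &node);
    MPI_Comm_rank(node, &node_rank);

    /* One leader per node (node_rank == 0) joins the upper-level communicator. */
    MPI_Comm_split(comm, node_rank == 0 ? 0 : MPI_UNDEFINED, 0, &leaders);

    double node_sum = 0.0;
    MPI_Reduce(val, &node_sum, 1, MPI_DOUBLE, MPI_SUM, 0, node);       /* level 1 */

    if (node_rank == 0)
        MPI_Allreduce(MPI_IN_PLACE, &node_sum, 1, MPI_DOUBLE, MPI_SUM,
                      leaders);                                        /* level 2 */

    MPI_Bcast(&node_sum, 1, MPI_DOUBLE, 0, node);   /* back down within the node */
    *val = node_sum;

    if (leaders != MPI_COMM_NULL) MPI_Comm_free(&leaders);
    MPI_Comm_free(&node);
}
```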
Non-preconditioned conjugate gradient on cell and FPGA based hybrid supercomputer nodes
DOE Office of Scientific and Technical Information (OSTI.GOV)
Dubois, David H; Dubois, Andrew J; Boorman, Thomas M
2009-01-01
This work presents a detailed implementation of a double precision, non-preconditioned, Conjugate Gradient algorithm on a Roadrunner heterogeneous supercomputer node. These nodes utilize the Cell Broadband Engine Architecture(TM) in conjunction with x86 Opteron(TM) processors from AMD. We implement a common Conjugate Gradient algorithm, on a variety of systems, to compare and contrast performance. Implementation results are presented for the Roadrunner hybrid supercomputer, SRC Computers, Inc. MAPStation SRC-6 FPGA enhanced hybrid supercomputer, and AMD Opteron only. In all hybrid implementations wall clock time is measured, including all transfer overhead and compute timings.
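For reference, the non-preconditioned Conjugate Gradient iteration benchmarked here follows the standard textbook form; a compact double-precision sketch for a dense symmetric positive definite system is given below. This plain-C version is only the reference algorithm and does not reflect the Cell, FPGA, or Opteron-specific implementations compared in the study.

```c
/* Standard non-preconditioned Conjugate Gradient for a symmetric positive
   definite system A x = b (dense storage for brevity). Textbook form only. */
#include <math.h>
#include <stdlib.h>
#include <string.h>

void cg(const double *A, const double *b, double *x, int n, double tol, int maxit)
{
    double *r = malloc(n * sizeof *r);
    double *p = malloc(n * sizeof *p);
    double *Ap = malloc(n * sizeof *Ap);

    memset(x, 0, n * sizeof *x);
    memcpy(r, b, n * sizeof *r);        /* r = b - A*x with x = 0 */
    memcpy(p, r, n * sizeof *p);

    double rr = 0.0;
    for (int i = 0; i < n; ++i) rr += r[i] * r[i];

    for (int it = 0; it < maxit && sqrt(rr) > tol; ++it) {
        double pAp = 0.0;
        for (int i = 0; i < n; ++i) {   /* Ap = A * p (dense matrix-vector) */
            double s = 0.0;
            for (int j = 0; j < n; ++j) s += A[(size_t)i * n + j] * p[j];
            Ap[i] = s;
            pAp += p[i] * s;
        }
        double alpha = rr / pAp;        /* step length along search direction */
        for (int i = 0; i < n; ++i) { x[i] += alpha * p[i]; r[i] -= alpha * Ap[i]; }

        double rr_new = 0.0;
        for (int i = 0; i < n; ++i) rr_new += r[i] * r[i];
        for (int i = 0; i < n; ++i)     /* new conjugate search direction */
            p[i] = r[i] + (rr_new / rr) * p[i];
        rr = rr_new;
    }
    free(r); free(p); free(Ap);
}
```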
Non-preconditioned conjugate gradient on cell and FPGA-based hybrid supercomputer nodes
DOE Office of Scientific and Technical Information (OSTI.GOV)
Dubois, David H; Dubois, Andrew J; Boorman, Thomas M
2009-03-10
This work presents a detailed implementation of a double precision, Non-Preconditioned, Conjugate Gradient algorithm on a Roadrunner heterogeneous supercomputer node. These nodes utilize the Cell Broadband Engine Architecture(TM) in conjunction with x86 Opteron(TM) processors from AMD. We implement a common Conjugate Gradient algorithm, on a variety of systems, to compare and contrast performance. Implementation results are presented for the Roadrunner hybrid supercomputer, SRC Computers, Inc. MAPStation SRC-6 FPGA enhanced hybrid supercomputer, and AMD Opteron only. In all hybrid implementations wall clock time is measured, including all transfer overhead and compute timings.
Application of a distributed network in computational fluid dynamic simulations
NASA Technical Reports Server (NTRS)
Deshpande, Manish; Feng, Jinzhang; Merkle, Charles L.; Deshpande, Ashish
1994-01-01
A general-purpose 3-D, incompressible Navier-Stokes algorithm is implemented on a network of concurrently operating workstations using parallel virtual machine (PVM) and compared with its performance on a CRAY Y-MP and on an Intel iPSC/860. The problem is relatively computationally intensive, and has a communication structure based primarily on nearest-neighbor communication, making it ideally suited to message passing. Such problems are frequently encountered in computational fluid dynamics (CFD), and their solution is increasingly in demand. The communication structure is explicitly coded in the implementation to fully exploit the regularity in message passing in order to produce a near-optimal solution. Results are presented for various grid sizes using up to eight processors.
PLOT3D/AMES, UNIX SUPERCOMPUTER AND SGI IRIS VERSION (WITHOUT TURB3D)
NASA Technical Reports Server (NTRS)
Buning, P.
1994-01-01
PLOT3D is an interactive graphics program designed to help scientists visualize computational fluid dynamics (CFD) grids and solutions. Today, supercomputers and CFD algorithms can provide scientists with simulations of such highly complex phenomena that obtaining an understanding of the simulations has become a major problem. Tools which help the scientist visualize the simulations can be of tremendous aid. PLOT3D/AMES offers more functions and features, and has been adapted for more types of computers than any other CFD graphics program. Version 3.6b+ is supported for five computers and graphic libraries. Using PLOT3D, CFD physicists can view their computational models from any angle, observing the physics of problems and the quality of solutions. As an aid in designing aircraft, for example, PLOT3D's interactive computer graphics can show vortices, temperature, reverse flow, pressure, and dozens of other characteristics of air flow during flight. As critical areas become obvious, they can easily be studied more closely using a finer grid. PLOT3D is part of a computational fluid dynamics software cycle. First, a program such as 3DGRAPE (ARC-12620) helps the scientist generate computational grids to model an object and its surrounding space. Once the grids have been designed and parameters such as the angle of attack, Mach number, and Reynolds number have been specified, a "flow-solver" program such as INS3D (ARC-11794 or COS-10019) solves the system of equations governing fluid flow, usually on a supercomputer. Grids sometimes have as many as two million points, and the "flow-solver" produces a solution file which contains density, x- y- and z-momentum, and stagnation energy for each grid point. With such a solution file and a grid file containing up to 50 grids as input, PLOT3D can calculate and graphically display any one of 74 functions, including shock waves, surface pressure, velocity vectors, and particle traces. PLOT3D's 74 functions are organized into five groups: 1) Grid Functions for grids, grid-checking, etc.; 2) Scalar Functions for contour or carpet plots of density, pressure, temperature, Mach number, vorticity magnitude, helicity, etc.; 3) Vector Functions for vector plots of velocity, vorticity, momentum, and density gradient, etc.; 4) Particle Trace Functions for rake-like plots of particle flow or vortex lines; and 5) Shock locations based on pressure gradient. TURB3D is a modification of PLOT3D which is used for viewing CFD simulations of incompressible turbulent flow. Input flow data consists of pressure, velocity and vorticity. Typical quantities to plot include local fluctuations in flow quantities and turbulent production terms, plotted in physical or wall units. PLOT3D/TURB3D includes both TURB3D and PLOT3D because the operation of TURB3D is identical to PLOT3D, and there is no additional sample data or printed documentation for TURB3D. Graphical capabilities of PLOT3D version 3.6b+ vary among the implementations available through COSMIC. Customers are encouraged to purchase and carefully review the PLOT3D manual before ordering the program for a specific computer and graphics library. There is only one manual for use with all implementations of PLOT3D, and although this manual generally assumes that the Silicon Graphics Iris implementation is being used, informative comments concerning other implementations appear throughout the text. 
With all implementations, the visual representation of the object and flow field created by PLOT3D consists of points, lines, and polygons. Points can be represented with dots or symbols, color can be used to denote data values, and perspective is used to show depth. Differences among implementations impact the program's ability to use graphical features that are based on 3D polygons, the user's ability to manipulate the graphical displays, and the user's ability to obtain alternate forms of output. In addition to providing the advantages of performing complex calculations on a supercomputer, the Supercomputer/IRIS implementation of PLOT3D offers advanced 3-D, view manipulation, and animation capabilities. Shading and hidden line/surface removal can be used to enhance depth perception and other aspects of the graphical displays. A mouse can be used to translate, rotate, or zoom in on views. Files for several types of output can be produced. Two animation options are available. Simple animation sequences can be created on the IRIS, or, if an appropriately modified version of ARCGRAPH (ARC-12350) is accessible on the supercomputer, files can be created for use in GAS (Graphics Animation System, ARC-12379), an IRIS program which offers more complex rendering and animation capabilities and options for recording images to digital disk, video tape, or 16-mm film. The version 3.6b+ Supercomputer/IRIS implementations of PLOT3D (ARC-12779) and PLOT3D/TURB3D (ARC-12784) are suitable for use on CRAY 2/UNICOS, CONVEX, and ALLIANT computers with a remote Silicon Graphics IRIS 2xxx/3xxx or IRIS 4D workstation. These programs are distributed on .25 inch magnetic tape cartridges in IRIS TAR format. Customers purchasing one implementation version of PLOT3D or PLOT3D/TURB3D will be given a $200 discount on each additional implementation version ordered at the same time. Version 3.6b+ of PLOT3D and PLOT3D/TURB3D are also supported for the following computers and graphics libraries: (1) Silicon Graphics IRIS 2xxx/3xxx or IRIS 4D workstations (ARC-12783, ARC-12782); (2) VAX computers running VMS Version 5.0 and DISSPLA Version 11.0 (ARC-12777, ARC-12781); (3) generic UNIX and DISSPLA Version 11.0 (ARC-12788, ARC-12778); and (4) Apollo computers running UNIX and GMR3D Version 2.0 (ARC-12789, ARC-12785 - which have no capabilities to put text on plots). Silicon Graphics Iris, IRIS 4D, and IRIS 2xxx/3xxx are trademarks of Silicon Graphics Incorporated. VAX and VMS are trademarks of Digital Equipment Corporation. DISSPLA is a trademark of Computer Associates. CRAY 2 and UNICOS are trademarks of CRAY Research, Incorporated. CONVEX is a trademark of Convex Computer Corporation. Alliant is a trademark of Alliant. Apollo, DN10000, and GMR3D are trademarks of Hewlett-Packard, Incorporated. System V is a trademark of Bell Labs, Incorporated. BSD4.3 is a trademark of the University of California at Berkeley. UNIX is a registered trademark of AT&T.
PLOT3D/AMES, UNIX SUPERCOMPUTER AND SGI IRIS VERSION (WITH TURB3D)
NASA Technical Reports Server (NTRS)
Buning, P.
1994-01-01
PLOT3D is an interactive graphics program designed to help scientists visualize computational fluid dynamics (CFD) grids and solutions. Today, supercomputers and CFD algorithms can provide scientists with simulations of such highly complex phenomena that obtaining an understanding of the simulations has become a major problem. Tools which help the scientist visualize the simulations can be of tremendous aid. PLOT3D/AMES offers more functions and features, and has been adapted for more types of computers than any other CFD graphics program. Version 3.6b+ is supported for five computers and graphic libraries. Using PLOT3D, CFD physicists can view their computational models from any angle, observing the physics of problems and the quality of solutions. As an aid in designing aircraft, for example, PLOT3D's interactive computer graphics can show vortices, temperature, reverse flow, pressure, and dozens of other characteristics of air flow during flight. As critical areas become obvious, they can easily be studied more closely using a finer grid. PLOT3D is part of a computational fluid dynamics software cycle. First, a program such as 3DGRAPE (ARC-12620) helps the scientist generate computational grids to model an object and its surrounding space. Once the grids have been designed and parameters such as the angle of attack, Mach number, and Reynolds number have been specified, a "flow-solver" program such as INS3D (ARC-11794 or COS-10019) solves the system of equations governing fluid flow, usually on a supercomputer. Grids sometimes have as many as two million points, and the "flow-solver" produces a solution file which contains density, x- y- and z-momentum, and stagnation energy for each grid point. With such a solution file and a grid file containing up to 50 grids as input, PLOT3D can calculate and graphically display any one of 74 functions, including shock waves, surface pressure, velocity vectors, and particle traces. PLOT3D's 74 functions are organized into five groups: 1) Grid Functions for grids, grid-checking, etc.; 2) Scalar Functions for contour or carpet plots of density, pressure, temperature, Mach number, vorticity magnitude, helicity, etc.; 3) Vector Functions for vector plots of velocity, vorticity, momentum, and density gradient, etc.; 4) Particle Trace Functions for rake-like plots of particle flow or vortex lines; and 5) Shock locations based on pressure gradient. TURB3D is a modification of PLOT3D which is used for viewing CFD simulations of incompressible turbulent flow. Input flow data consists of pressure, velocity and vorticity. Typical quantities to plot include local fluctuations in flow quantities and turbulent production terms, plotted in physical or wall units. PLOT3D/TURB3D includes both TURB3D and PLOT3D because the operation of TURB3D is identical to PLOT3D, and there is no additional sample data or printed documentation for TURB3D. Graphical capabilities of PLOT3D version 3.6b+ vary among the implementations available through COSMIC. Customers are encouraged to purchase and carefully review the PLOT3D manual before ordering the program for a specific computer and graphics library. There is only one manual for use with all implementations of PLOT3D, and although this manual generally assumes that the Silicon Graphics Iris implementation is being used, informative comments concerning other implementations appear throughout the text. 
With all implementations, the visual representation of the object and flow field created by PLOT3D consists of points, lines, and polygons. Points can be represented with dots or symbols, color can be used to denote data values, and perspective is used to show depth. Differences among implementations impact the program's ability to use graphical features that are based on 3D polygons, the user's ability to manipulate the graphical displays, and the user's ability to obtain alternate forms of output. In addition to providing the advantages of performing complex calculations on a supercomputer, the Supercomputer/IRIS implementation of PLOT3D offers advanced 3-D, view manipulation, and animation capabilities. Shading and hidden line/surface removal can be used to enhance depth perception and other aspects of the graphical displays. A mouse can be used to translate, rotate, or zoom in on views. Files for several types of output can be produced. Two animation options are available. Simple animation sequences can be created on the IRIS, or, if an appropriately modified version of ARCGRAPH (ARC-12350) is accessible on the supercomputer, files can be created for use in GAS (Graphics Animation System, ARC-12379), an IRIS program which offers more complex rendering and animation capabilities and options for recording images to digital disk, video tape, or 16-mm film. The version 3.6b+ Supercomputer/IRIS implementations of PLOT3D (ARC-12779) and PLOT3D/TURB3D (ARC-12784) are suitable for use on CRAY 2/UNICOS, CONVEX, and ALLIANT computers with a remote Silicon Graphics IRIS 2xxx/3xxx or IRIS 4D workstation. These programs are distributed on .25 inch magnetic tape cartridges in IRIS TAR format. Customers purchasing one implementation version of PLOT3D or PLOT3D/TURB3D will be given a $200 discount on each additional implementation version ordered at the same time. Version 3.6b+ of PLOT3D and PLOT3D/TURB3D are also supported for the following computers and graphics libraries: (1) Silicon Graphics IRIS 2xxx/3xxx or IRIS 4D workstations (ARC-12783, ARC-12782); (2) VAX computers running VMS Version 5.0 and DISSPLA Version 11.0 (ARC-12777, ARC-12781); (3) generic UNIX and DISSPLA Version 11.0 (ARC-12788, ARC-12778); and (4) Apollo computers running UNIX and GMR3D Version 2.0 (ARC-12789, ARC-12785 - which have no capabilities to put text on plots). Silicon Graphics Iris, IRIS 4D, and IRIS 2xxx/3xxx are trademarks of Silicon Graphics Incorporated. VAX and VMS are trademarks of Digital Equipment Corporation. DISSPLA is a trademark of Computer Associates. CRAY 2 and UNICOS are trademarks of CRAY Research, Incorporated. CONVEX is a trademark of Convex Computer Corporation. Alliant is a trademark of Alliant. Apollo, DN10000, and GMR3D are trademarks of Hewlett-Packard, Incorporated. System V is a trademark of Bell Labs, Incorporated. BSD4.3 is a trademark of the University of California at Berkeley. UNIX is a registered trademark of AT&T.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Tang, Guoping; D'Azevedo, Ed F; Zhang, Fan
2010-01-01
Calibration of groundwater models involves hundreds to thousands of forward solutions, each of which may solve many transient coupled nonlinear partial differential equations, resulting in a computationally intensive problem. We describe a hybrid MPI/OpenMP approach to exploit two levels of parallelism in software and hardware to reduce calibration time on multi-core computers. HydroGeoChem 5.0 (HGC5) is parallelized using OpenMP for direct solutions for a reactive transport model application and a field-scale coupled flow and transport model application. In the reactive transport model, a single parallelizable loop is identified that accounts for over 97% of the total computational time using GPROF. Addition of a few lines of OpenMP compiler directives to the loop yields a speedup of about 10 on a 16-core compute node. For the field-scale model, parallelizable loops in 14 of 174 HGC5 subroutines that require 99% of the execution time are identified. As these loops are parallelized incrementally, the scalability is found to be limited by a loop where Cray PAT detects over 90% cache miss rates. With this loop rewritten, a similar speedup as in the first application is achieved. The OpenMP-parallelized code can be run efficiently on multiple workstations in a network or multiple compute nodes on a cluster as slaves using parallel PEST to speed up model calibration. To run calibration on clusters as a single task, the Levenberg-Marquardt algorithm is added to HGC5 with the Jacobian calculation and lambda search parallelized using MPI. With this hybrid approach, 100-200 compute cores are used to reduce the calibration time from weeks to a few hours for these two applications. This approach is applicable to most of the existing groundwater model codes for many applications.
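The "few lines of OpenMP compiler directives" applied to the dominant loop amount to the pattern sketched below: profile to find the hot loop, then add a parallel-for directive with the appropriate reduction clause so its iterations are spread over the cores of a node. This is a generic example with a hypothetical per-cell update kernel, not HydroGeoChem 5.0 source code.

```c
/* Generic example of the incremental OpenMP strategy described above:
   parallelize the single dominant loop with a directive. Hypothetical
   cell-update kernel, not HGC5 code. */
#include <omp.h>

void update_cells(double *conc, const double *rate, int ncell, double dt,
                  double *total_mass)
{
    double mass = 0.0;

    /* One directive turns the hot loop into a multithreaded loop; the
       reduction clause keeps the accumulated mass correct across threads. */
    #pragma omp parallel for reduction(+:mass)
    for (int i = 0; i < ncell; ++i) {
        conc[i] += dt * rate[i];        /* independent per-cell update */
        mass += conc[i];
    }
    *total_mass = mass;
}
```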
Eigensolution of finite element problems in a completely connected parallel architecture
NASA Technical Reports Server (NTRS)
Akl, Fred A.; Morel, Michael R.
1989-01-01
A parallel algorithm for the solution of the generalized eigenproblem in linear elastic finite element analysis, (K)(phi) = (M)(phi)(omega), where (K) and (M) are of order N, and (omega) is of order q, is presented. The parallel algorithm is based on a completely connected parallel architecture in which each processor is allowed to communicate with all other processors. The algorithm has been successfully implemented on a tightly coupled multiple-instruction-multiple-data (MIMD) parallel processing computer, Cray X-MP. A finite element model is divided into m domains each of which is assumed to process n elements. Each domain is then assigned to a processor, or to a logical processor (task) if the number of domains exceeds the number of physical processors. The macro-tasking library routines are used in mapping each domain to a user task. Computational speed-up and efficiency are used to determine the effectiveness of the algorithm. The effect of the number of domains, the number of degrees-of-freedom located along the global fronts and the dimension of the subspace on the performance of the algorithm are investigated. For a 64-element rectangular plate, speed-ups of 1.86, 3.13, 3.18 and 3.61 are achieved on two, four, six and eight processors, respectively.
Parallel eigenanalysis of finite element models in a completely connected architecture
NASA Technical Reports Server (NTRS)
Akl, F. A.; Morel, M. R.
1989-01-01
A parallel algorithm is presented for the solution of the generalized eigenproblem in linear elastic finite element analysis, (K)(phi) = (M)(phi)(omega), where (K) and (M) are of order N, and (omega) is of order q. The concurrent solution of the eigenproblem is based on the multifrontal/modified subspace method and is achieved in a completely connected parallel architecture in which each processor is allowed to communicate with all other processors. The algorithm was successfully implemented on a tightly coupled multiple-instruction multiple-data parallel processing machine, Cray X-MP. A finite element model is divided into m domains each of which is assumed to process n elements. Each domain is then assigned to a processor or to a logical processor (task) if the number of domains exceeds the number of physical processors. The macrotasking library routines are used in mapping each domain to a user task. Computational speed-up and efficiency are used to determine the effectiveness of the algorithm. The effect of the number of domains, the number of degrees-of-freedom located along the global fronts and the dimension of the subspace on the performance of the algorithm are investigated. A parallel finite element dynamic analysis program, p-feda, is documented and the performance of its subroutines in a parallel environment is analyzed.
Contention Modeling for Multithreaded Distributed Shared Memory Machines: The Cray XMT
DOE Office of Scientific and Technical Information (OSTI.GOV)
Secchi, Simone; Tumeo, Antonino; Villa, Oreste
Distributed Shared Memory (DSM) machines are a wide class of multi-processor computing systems where a large virtually-shared address space is mapped on a network of physically distributed memories. High memory latency and network contention are two of the main factors that limit performance scaling of such architectures. Modern high-performance computing DSM systems have evolved toward exploitation of massive hardware multi-threading and fine-grained memory hashing to tolerate irregular latencies, avoid network hot-spots and enable high scaling. In order to model the performance of such large-scale machines, parallel simulation has been proved to be a promising approach to achieve good accuracy in reasonable times. One of the most critical factors in solving the simulation speed-accuracy trade-off is network modeling. The Cray XMT is a massively multi-threaded supercomputing architecture that belongs to the DSM class, since it implements a globally-shared address space abstraction on top of a physically distributed memory substrate. In this paper, we discuss the development of a contention-aware network model intended to be integrated in a full-system XMT simulator. We start by measuring the effects of network contention in a 128-processor XMT machine and then investigate the trade-off that exists between simulation accuracy and speed, by comparing three network models which operate at different levels of accuracy. The comparison and model validation is performed by executing a string-matching algorithm on the full-system simulator and on the XMT, using three datasets that generate noticeably different contention patterns.
Multiprocessing on supercomputers for computational aerodynamics
NASA Technical Reports Server (NTRS)
Yarrow, Maurice; Mehta, Unmeel B.
1990-01-01
Very little use is made of multiple processors available on current supercomputers (computers with a theoretical peak performance capability equal to 100 MFLOPS or more) in computational aerodynamics to significantly improve turnaround time. The productivity of a computer user is directly related to this turnaround time. In a time-sharing environment, the improvement in this speed is achieved when multiple processors are used efficiently to execute an algorithm. The concept of multiple instructions and multiple data (MIMD) through multi-tasking is applied via a strategy which requires relatively minor modifications to an existing code for a single processor. Essentially, this approach maps the available memory to multiple processors, exploiting the C-FORTRAN-Unix interface. The existing single processor code is mapped without the need for developing a new algorithm. The procedure for building a code utilizing this approach is automated with the Unix stream editor. As a demonstration of this approach, a Multiple Processor Multiple Grid (MPMG) code is developed. It is capable of using nine processors, and can be easily extended to a larger number of processors. This code solves the three-dimensional, Reynolds averaged, thin-layer and slender-layer Navier-Stokes equations with an implicit, approximately factored and diagonalized method. The solver is applied to a generic oblique-wing aircraft problem on a four-processor Cray-2 computer. A tricubic interpolation scheme is developed to increase the accuracy of coupling of overlapped grids. For the oblique-wing aircraft problem, a speedup of two in elapsed (turnaround) time is observed in a saturated time-sharing environment.
The accuracy of quantum chemical methods for large noncovalent complexes
Pitoňák, Michal; Řezáč, Jan; Pulay, Peter
2013-01-01
We evaluate the performance of the most widely used wavefunction, density functional theory, and semiempirical methods for the description of noncovalent interactions in a set of larger, mostly dispersion-stabilized noncovalent complexes (the L7 data set). The methods tested include MP2, MP3, SCS-MP2, SCS(MI)-MP2, MP2.5, MP2.X, MP2C, DFT-D, DFT-D3 (B3-LYP-D3, B-LYP-D3, TPSS-D3, PW6B95-D3, M06-2X-D3) and M06-2X, and semiempirical methods augmented with dispersion and hydrogen bonding corrections: SCC-DFTB-D, PM6-D, PM6-DH2 and PM6-D3H4. The test complexes are the octadecane dimer, the guanine trimer, the circumcoronene…adenine dimer, the coronene dimer, the guanine-cytosine dimer, the circumcoronene…guanine-cytosine dimer, and an amyloid fragment trimer containing phenylalanine residues. The best performing method is MP2.5 with relative root mean square deviation (rRMSD) of 4 %. It can thus be recommended as an alternative to the CCSD(T)/CBS (alternatively QCISD(T)/CBS) benchmark for molecular systems which exceed current computational capacity. The second best non-DFT method is MP2C with rRMSD of 8 %. A method with the most favorable “accuracy/cost” ratio belongs to the DFT family: BLYP-D3, with an rRMSD of 8 %. Semiempirical methods deliver less accurate results (the rRMSD exceeds 25 %). Nevertheless, their absolute errors are close to some much more expensive methods such as M06-2X, MP2 or SCS(MI)-MP2, and thus their price/performance ratio is excellent. PMID:24098094
NASA Technical Reports Server (NTRS)
Raju, I. S.; Newman, J. C., Jr.
1993-01-01
A computer program, surf3d, that uses the 3D finite-element method to calculate the stress-intensity factors for surface, corner, and embedded cracks in finite-thickness plates with and without circular holes, was developed. The cracks are assumed to be either elliptic or part elliptic in shape. The computer program uses eight-noded hexahedral elements to model the solid. The program uses a skyline storage and solver. The stress-intensity factors are evaluated using the force method, the crack-opening displacement method, and the 3-D virtual crack closure methods. In the manual the input to and the output of the surf3d program are described. This manual also demonstrates the use of the program and describes the calculation of the stress-intensity factors. Several examples with sample data files are included with the manual. To facilitate modeling of the user's crack configuration and loading, a companion preprocessor program called gensurf, which generates the data for surf3d, was also developed. The gensurf program is a three-dimensional mesh generator that requires minimal input and builds a complete data file for surf3d. The program surf3d is operational on Unix machines such as the CRAY Y-MP, CRAY-2, and Convex C-220.
A Comparison Between the PLM and the MC68020 as Prolog Processors
1988-01-01
[Garbled OCR fragment of the report: a table mapping PLM abstract machine registers (continuation pointer CP, argument registers A6 and A7, temporary register 6) to MC68020 registers and MP-relative memory offsets, followed by part of the MC68020 translation of the Prolog get_variable instruction (permanent variable Yi, argument register Xj, move.l).]
NASA Astrophysics Data System (ADS)
Clay, M. P.; Buaria, D.; Gotoh, T.; Yeung, P. K.
2017-10-01
A new dual-communicator algorithm with very favorable performance characteristics has been developed for direct numerical simulation (DNS) of turbulent mixing of a passive scalar governed by an advection-diffusion equation. We focus on the regime of high Schmidt number (Sc), where because of low molecular diffusivity the grid-resolution requirements for the scalar field are stricter than those for the velocity field by a factor of √Sc. Computational throughput is improved by simulating the velocity field on a coarse grid of Nv^3 points with a Fourier pseudo-spectral (FPS) method, while the passive scalar is simulated on a fine grid of Nθ^3 points with a combined compact finite difference (CCD) scheme which computes first and second derivatives at eighth-order accuracy. A static three-dimensional domain decomposition and a parallel solution algorithm for the CCD scheme are used to avoid the heavy communication cost of memory transposes. A kernel is used to evaluate several approaches to optimize the performance of the CCD routines, which account for 60% of the overall simulation cost. On the petascale supercomputer Blue Waters at the University of Illinois, Urbana-Champaign, scalability is improved substantially with a hybrid MPI-OpenMP approach in which a dedicated thread per NUMA domain overlaps communication calls with computational tasks performed by a separate team of threads spawned using OpenMP nested parallelism. At a target production problem size of 8192^3 (0.5 trillion) grid points on 262,144 cores, CCD timings are reduced by 34% compared to a pure-MPI implementation. Timings for 16384^3 (4 trillion) grid points on 524,288 cores encouragingly maintain scalability greater than 90%, although the wall clock time is too high for production runs at this size. Performance monitoring with CrayPat for problem sizes up to 4096^3 shows that the CCD routines can achieve nearly 6% of the peak flop rate. The new DNS code is built upon two existing FPS and CCD codes. With the grid ratio Nθ/Nv = 8, the disparity in the computational requirements for the velocity and scalar problems is addressed by splitting the global communicator MPI_COMM_WORLD into disjoint communicators for the velocity and scalar fields, respectively. Inter-communicator transfer of the velocity field from the velocity communicator to the scalar communicator is handled with discrete send and non-blocking receive calls, which are overlapped with other operations on the scalar communicator. For production simulations at Nθ = 8192 and Nv = 1024 on 262,144 cores for the scalar field, the DNS code achieves 94% strong scaling relative to 65,536 cores and 92% weak scaling relative to Nθ = 1024 and Nv = 128 on 512 cores.
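A hedged sketch of the dual-communicator setup described above: MPI_COMM_WORLD is split into a small velocity communicator and a large scalar communicator, with the grid ratio determining how many ranks each side receives. The split rule and rank counts below are illustrative only, not the production configuration.

```c
/* Sketch: split MPI_COMM_WORLD into disjoint velocity and scalar
   communicators, as in the dual-communicator DNS algorithm described above.
   The 1:255 split below is illustrative only. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int world_rank, world_size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);
    MPI_Comm_size(MPI_COMM_WORLD, &world_size);

    /* Example rule: the first 1/256 of the ranks solve the coarse-grid
       velocity field, the rest solve the fine-grid scalar field. */
    int n_vel = world_size / 256 > 0 ? world_size / 256 : 1;
    int color = (world_rank < n_vel) ? 0 : 1;   /* 0 = velocity, 1 = scalar */

    MPI_Comm field_comm;
    MPI_Comm_split(MPI_COMM_WORLD, color, world_rank, &field_comm);

    int field_rank, field_size;
    MPI_Comm_rank(field_comm, &field_rank);
    MPI_Comm_size(field_comm, &field_size);
    if (field_rank == 0)
        printf("%s communicator: %d ranks\n",
               color ? "scalar" : "velocity", field_size);

    /* The velocity field would then be transferred to the scalar communicator
       with sends matched by non-blocking receives (e.g. MPI_Irecv), overlapped
       with other work on the scalar side. */
    MPI_Comm_free(&field_comm);
    MPI_Finalize();
    return 0;
}
```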
The Performance of the NAS HSPs in 1st Half of 1994
NASA Technical Reports Server (NTRS)
Bergeron, Robert J.; Walter, Howard (Technical Monitor)
1995-01-01
During the first six months of 1994, the NAS (Numerical Aerodynamic Simulation) 16-CPU Y-MP C90 Von Neumann (VN) delivered an average throughput of 4.045 GFLOPS while the ACSF (Aeronautics Consolidated Supercomputer Facility) 8-CPU Y-MP C90 Eagle averaged 1.658 GFLOPS. The VN rate represents a machine efficiency of 26.3% whereas the Eagle rate corresponds to a machine efficiency of 21.6%. VN displayed a greater efficiency than Eagle primarily because the stronger workload demand for its CPU cycles allowed it to devote more time to user programs and less time to idle. An additional factor increasing VN efficiency was the ability of the UNICOS 8.0 Operating System to deliver a larger fraction of CPU time to user programs. Although measurements indicate increasing vector length for both workloads, insufficient vector lengths continue to hinder HSP (High Speed Processor) performance. To improve HSP performance, NAS should continue to encourage the HSP users to modify their codes to increase program vector length.
Parallel Performance Optimizations on Unstructured Mesh-based Simulations
DOE Office of Scientific and Technical Information (OSTI.GOV)
Sarje, Abhinav; Song, Sukhyun; Jacobsen, Douglas
2015-01-01
This paper addresses two key parallelization challenges in the unstructured mesh-based ocean modeling code MPAS-Ocean, which uses a mesh based on Voronoi tessellations: (1) load imbalance across processes, and (2) unstructured data access patterns that inhibit intra- and inter-node performance. Our work analyzes the load imbalance due to naive partitioning of the mesh, and develops methods to generate mesh partitioning with better load balance and reduced communication. Furthermore, we present methods that minimize both inter- and intra-node data movement and maximize data reuse. Our techniques include predictive ordering of data elements for higher cache efficiency, as well as communication reduction approaches. We present detailed performance data when running on thousands of cores using the Cray XC30 supercomputer and show that our optimization strategies can exceed the original performance by over 2×. Additionally, many of these solutions can be broadly applied to a wide variety of unstructured grid-based computations.
Thought Leaders during Crises in Massive Social Networks
DOE Office of Scientific and Technical Information (OSTI.GOV)
Corley, Courtney D.; Farber, Robert M.; Reynolds, William
The vast amount of social media data that can be gathered from the internet, coupled with workflows that utilize both commodity systems and massively parallel supercomputers such as the Cray XMT, opens new vistas for research to support health, defense, and national security. Computer technology now enables the analysis of graph structures containing more than 4 billion vertices joined by 34 billion edges, along with metrics and massively parallel algorithms that exhibit near-linear scalability according to the number of processors. The challenge lies in making this massive data and analysis comprehensible to analysts and end-users that require actionable knowledge to carry out their duties. Simply stated, we have developed language and content agnostic techniques to reduce large graphs built from vast media corpora into forms people can understand. Specifically, our tools and metrics act as a survey tool to identify 'thought leaders' -- those members that lead or reflect the thoughts and opinions of an online community, independent of the source language.
Exciting Quantized Vortex Rings in a Superfluid Unitary Fermi Gas
NASA Astrophysics Data System (ADS)
Bulgac, Aurel
2014-03-01
In a recent article, Yefsah et al., Nature 499, 426 (2013) report the observation of an unusual quantum excitation mode in an elongated harmonically trapped unitary Fermi gas. After phase imprinting a domain wall, they observe collective oscillations of the superfluid atomic cloud with a period almost an order of magnitude larger than that predicted by any theory of domain walls, which they interpret as a possible new quantum phenomenon dubbed ``a heavy soliton'' with an inertial mass some 50 times larger than expected for a domain wall. We present compelling evidence that this ``heavy soliton'' is instead a quantized vortex ring by showing that the main aspects of the experiment can be naturally explained within an extension of the time-dependent density functional theory (TDDFT) to superfluid systems. The numerical simulations required the solution of some 260,000 nonlinear coupled time-dependent 3-dimensional partial differential equations and were implemented on 2048 GPUs of the Cray XK7 supercomputer Titan at the Oak Ridge Leadership Computing Facility.
Efficient Parallelization of a Dynamic Unstructured Application on the Tera MTA
NASA Technical Reports Server (NTRS)
Oliker, Leonid; Biswas, Rupak
1999-01-01
The success of parallel computing in solving real-life computationally-intensive problems relies on their efficient mapping and execution on large-scale multiprocessor architectures. Many important applications are both unstructured and dynamic in nature, making their efficient parallel implementation a daunting task. This paper presents the parallelization of a dynamic unstructured mesh adaptation algorithm using three popular programming paradigms on three leading supercomputers. We examine an MPI message-passing implementation on the Cray T3E and the SGI Origin2000, a shared-memory implementation using cache coherent nonuniform memory access (CC-NUMA) of the Origin2000, and a multi-threaded version on the newly-released Tera Multi-threaded Architecture (MTA). We compare several critical factors of this parallel code development, including runtime, scalability, programmability, and memory overhead. Our overall results demonstrate that multi-threaded systems offer tremendous potential for quickly and efficiently solving some of the most challenging real-life problems on parallel computers.
Parallelization of the FLAPW method and comparison with the PPW method
NASA Astrophysics Data System (ADS)
Canning, Andrew; Mannstadt, Wolfgang; Freeman, Arthur
2000-03-01
The FLAPW (full-potential linearized-augmented plane-wave) method is one of the most accurate first-principles methods for determining electronic and magnetic properties of crystals and surfaces. In the past the FLAPW method has been limited to systems of about a hundred atoms due to the lack of an efficient parallel implementation to exploit the power and memory of parallel computers. In this work we present an efficient parallelization of the method by division among the processors of the plane-wave components for each state. The code is also optimized for RISC (reduced instruction set computer) architectures, such as those found on most parallel computers, making full use of BLAS (basic linear algebra subprograms) wherever possible. Scaling results are presented for systems of up to 686 silicon atoms and 343 palladium atoms per unit cell running on up to 512 processors on a Cray T3E parallel supercomputer. Some results will also be presented on a comparison of the plane-wave pseudopotential method and the FLAPW method on large systems.
Query optimization for graph analytics on linked data using SPARQL
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hong, Seokyong; Lee, Sangkeun; Lim, Seung-Hwan
2015-07-01
Triplestores that support query languages such as SPARQL are emerging as the preferred and scalable solution to represent data and meta-data as massive heterogeneous graphs using Semantic Web standards. With increasing adoption, the desire to conduct graph-theoretic mining and exploratory analysis has also increased. Addressing that desire, this paper presents a solution that is the marriage of Graph Theory and the Semantic Web. We present software that can analyze Linked Data using graph operations such as counting triangles, finding eccentricity, testing connectedness, and computing PageRank directly on triplestores via the SPARQL interface. We describe the process of optimizing performance of the SPARQL-based implementation of such popular graph algorithms by reducing the space-overhead, simplifying iterative complexity and removing redundant computations by understanding query plans. Our optimized approach shows significant performance gains on triplestores hosted on stand-alone workstations as well as hardware-optimized scalable supercomputers such as the Cray XMT.
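As a small illustration of running a graph operation directly on a triplestore through SPARQL (a sketch of the general approach, not the paper's optimized queries), the snippet below counts triangles; the endpoint URL and the ex:connectedTo predicate are placeholder assumptions.

```python
# Minimal sketch of graph analytics over SPARQL (not the paper's optimized
# code): count triangles among resources linked by a single predicate.
# The endpoint URL and the ex:connectedTo predicate are placeholder assumptions.
from SPARQLWrapper import SPARQLWrapper, JSON

endpoint = SPARQLWrapper("http://localhost:8890/sparql")  # hypothetical endpoint
endpoint.setReturnFormat(JSON)
endpoint.setQuery("""
PREFIX ex: <http://example.org/>
SELECT (COUNT(*) AS ?triangles) WHERE {
  ?a ex:connectedTo ?b .
  ?b ex:connectedTo ?c .
  ?c ex:connectedTo ?a .
  FILTER (STR(?a) < STR(?b) && STR(?b) < STR(?c))  # count each triangle once
}
""")
result = endpoint.query().convert()
print(result["results"]["bindings"][0]["triangles"]["value"])
```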
Parallelized reliability estimation of reconfigurable computer networks
NASA Technical Reports Server (NTRS)
Nicol, David M.; Das, Subhendu; Palumbo, Dan
1990-01-01
A parallelized system, ASSURE, for computing the reliability of embedded avionics flight control systems which are able to reconfigure themselves in the event of failure is described. ASSURE accepts a grammar that describes a reliability semi-Markov state-space. From this it creates a parallel program that simultaneously generates and analyzes the state-space, placing upper and lower bounds on the probability of system failure. ASSURE is implemented on a 32-node Intel iPSC/860, and has achieved high processor efficiencies on real problems. Through a combination of improved algorithms, exploitation of parallelism, and use of an advanced microprocessor architecture, ASSURE has reduced the execution time on substantial problems by a factor of one thousand over previous workstation implementations. Furthermore, ASSURE's parallel execution rate on the iPSC/860 is an order of magnitude faster than its serial execution rate on a Cray-2 supercomputer. While dynamic load balancing is necessary for ASSURE's good performance, it is needed only infrequently; the particular method of load balancing used does not substantially affect performance.
The design and implementation of a parallel unstructured Euler solver using software primitives
NASA Technical Reports Server (NTRS)
Das, R.; Mavriplis, D. J.; Saltz, J.; Gupta, S.; Ponnusamy, R.
1992-01-01
This paper is concerned with the implementation of a three-dimensional unstructured grid Euler-solver on massively parallel distributed-memory computer architectures. The goal is to minimize solution time by achieving high computational rates with a numerically efficient algorithm. An unstructured multigrid algorithm with an edge-based data structure has been adopted, and a number of optimizations have been devised and implemented in order to accelerate the parallel communication rates. The implementation is carried out by creating a set of software tools, which provide an interface between the parallelization issues and the sequential code, while providing a basis for future automatic run-time compilation support. Large practical unstructured grid problems are solved on the Intel iPSC/860 hypercube and Intel Touchstone Delta machine. The quantitative effects of the various optimizations are demonstrated, and we show that the combined effect of these optimizations leads to roughly a factor of three performance improvement. The overall solution efficiency is compared with that obtained on the CRAY-YMP vector supercomputer.
Seismic signal processing on heterogeneous supercomputers
NASA Astrophysics Data System (ADS)
Gokhberg, Alexey; Ermert, Laura; Fichtner, Andreas
2015-04-01
The processing of seismic signals - including the correlation of massive ambient noise data sets - represents an important part of a wide range of seismological applications. It is characterized by large data volumes as well as high computational input/output intensity. Development of efficient approaches towards seismic signal processing on emerging high performance computing systems is therefore essential. Heterogeneous supercomputing systems introduced in recent years provide numerous computing nodes interconnected via high throughput networks, every node containing a mix of processing elements of different architectures, like several sequential processor cores and one or a few graphical processing units (GPU) serving as accelerators. A typical representative of such computing systems is "Piz Daint", a supercomputer of the Cray XC 30 family operated by the Swiss National Supercomputing Centre (CSCS), which we used in this research. Heterogeneous supercomputers offer the opportunity for manifold increases in application performance and better energy efficiency; however, they have much higher hardware complexity and are therefore much more difficult to program. The programming effort may be substantially reduced by the introduction of modular libraries of software components that can be reused for a wide class of seismology applications. The ultimate goal of this research is the design of a prototype of such a library suitable for implementing various seismic signal processing applications on heterogeneous systems. As a representative use case we have chosen an ambient noise correlation application. Ambient noise interferometry has developed into one of the most powerful tools to image and monitor the Earth's interior. Future applications will require the extraction of increasingly small details from noise recordings. To meet this demand, more advanced correlation techniques combined with very large data volumes are needed. This poses new computational problems that require dedicated HPC solutions. The chosen application uses a wide range of common signal processing methods, which include various IIR filter designs, amplitude and phase correlation, computing the analytic signal, and discrete Fourier transforms. Furthermore, various processing methods specific to seismology, like rotation of seismic traces, are used. Efficient implementation of all these methods on GPU-accelerated systems presents several challenges. In particular, it requires a careful distribution of work between the sequential processors and accelerators. Furthermore, since the application is designed to process very large volumes of data, special attention had to be paid to the efficient use of the available memory and networking hardware resources in order to reduce the intensity of data input and output. In our contribution we will explain the software architecture as well as the principal engineering decisions used to address these challenges. We will also describe the programming model based on C++ and CUDA that we used to develop the software. Finally, we will demonstrate performance improvements achieved by using the heterogeneous computing architecture. This work was supported by a grant from the Swiss National Supercomputing Centre (CSCS) under project ID d26.
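The following is a minimal CPU-side sketch of the common processing steps named in the abstract (IIR band-pass filtering, the analytic signal, and frequency-domain correlation) using standard SciPy routines; it is not the GPU-accelerated library prototype described, and the sampling rate, filter order, and band edges are assumed values.

```python
# Minimal CPU sketch of IIR band-pass filtering, analytic-signal computation,
# and frequency-domain cross-correlation; not the GPU library prototype itself.
# Sampling rate, filter order, and band edges are illustrative assumptions.
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

fs = 20.0                                        # samples per second (assumed)
sos = butter(4, [0.05, 1.0], btype="bandpass", fs=fs, output="sos")

def preprocess(trace):
    """Band-pass filter a trace and normalize it by its running envelope."""
    filtered = sosfiltfilt(sos, trace)
    envelope = np.abs(hilbert(filtered)) + 1e-12  # analytic-signal amplitude
    return filtered / envelope

def cross_correlate(a, b):
    """Frequency-domain cross-correlation of two equal-length traces."""
    n = 2 * len(a)
    spec = np.fft.rfft(a, n) * np.conj(np.fft.rfft(b, n))
    return np.fft.irfft(spec, n)

rng = np.random.default_rng(1)
tr1, tr2 = rng.standard_normal((2, 7200))        # two synthetic test traces
ccf = cross_correlate(preprocess(tr1), preprocess(tr2))
```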
Shrimankar, D. D.; Sathe, S. R.
2016-01-01
Sequence alignment is an important tool for describing the relationships between DNA sequences. Many sequence alignment algorithms exist, differing in efficiency, in their models of the sequences, and in the relationship between sequences. The focus of this study is to obtain an optimal alignment between two sequences of biological data, particularly DNA sequences. The algorithm is discussed with particular emphasis on time, speedup, and efficiency optimizations. Parallel programming presents a number of critical challenges to application developers. Today's supercomputers often consist of clusters of SMP nodes. Programming paradigms such as OpenMP and MPI are used to write parallel codes for such architectures. OpenMP programs, however, cannot scale beyond a single SMP node, whereas programs written in MPI can span multiple SMP nodes, at the cost of internode communication overhead. In this work, we explore the tradeoffs between using OpenMP and MPI. We demonstrate that communication overhead is significant even in OpenMP loop execution and increases with the number of cores participating. We also present a communication model to approximate the overhead from communication in OpenMP loops. Our results are striking and hold for a wide variety of input data files. We have developed our own load balancing and cache optimization techniques for the message-passing model. Our experimental results show that these techniques give optimum performance of our parallel algorithm for various input parameters, such as sequence size and tile size, on a wide variety of multicore architectures.
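For readers unfamiliar with the underlying kernel, here is a serial reference sketch of global (Needleman-Wunsch) alignment scoring, the kind of dynamic-programming computation the study parallelizes; the match/mismatch/gap scores are illustrative assumptions, not the study's parameters.

```python
# Serial reference sketch of global (Needleman-Wunsch) DNA alignment scoring,
# the kind of dynamic-programming kernel parallelized with OpenMP/MPI tiles;
# match/mismatch/gap scores are illustrative assumptions.
import numpy as np

def nw_score(seq_a, seq_b, match=2, mismatch=-1, gap=-2):
    """Fill the DP score matrix and return the optimal alignment score."""
    m, n = len(seq_a), len(seq_b)
    score = np.zeros((m + 1, n + 1), dtype=np.int64)
    score[:, 0] = gap * np.arange(m + 1)
    score[0, :] = gap * np.arange(n + 1)
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            diag = score[i - 1, j - 1] + (match if seq_a[i - 1] == seq_b[j - 1] else mismatch)
            score[i, j] = max(diag, score[i - 1, j] + gap, score[i, j - 1] + gap)
    return score[m, n]

print(nw_score("GATTACA", "GCATGCU"))  # small example pair
```

Parallel versions typically evaluate tiles or anti-diagonals of this matrix concurrently, which is where the intra-node (OpenMP) and inter-node (MPI) communication costs discussed above arise.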
GASNet-EX Performance Improvements Due to Specialization for the Cray Aries Network
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hargrove, Paul H.; Bonachea, Dan
This document is a deliverable for milestone STPM17-6 of the Exascale Computing Project, delivered by WBS 2.3.1.14. It reports on the improvements in performance observed on Cray XC-series systems due to enhancements made to the GASNet-EX software. These enhancements, known as “specializations”, primarily consist of replacing network-independent implementations of several recently added features with implementations tailored to the Cray Aries network. Performance gains from specialization include (1) Negotiated-Payload Active Messages improve bandwidth of a ping-pong test by up to 14%, (2) Immediate Operations reduce running time of a synthetic benchmark by up to 93%, (3) non-bulk RMA Put bandwidth is increased by up to 32%, (4) Remote Atomic performance is 70% faster than the reference on a point-to-point test and allows a hot-spot test to scale robustly, and (5) non-contiguous RMA interfaces see up to 8.6x speedups for an intra-node benchmark and 26% for inter-node. These improvements are available in the GASNet-EX 2018.3.0 release.
A multi-platform evaluation of the randomized CX low-rank matrix factorization in Spark
DOE Office of Scientific and Technical Information (OSTI.GOV)
Gittens, Alex; Kottalam, Jey; Yang, Jiyan
We investigate the performance and scalability of the randomized CX low-rank matrix factorization and demonstrate its applicability through the analysis of a 1TB mass spectrometry imaging (MSI) dataset, using Apache Spark on an Amazon EC2 cluster, a Cray XC40 system, and an experimental Cray cluster. We implemented this factorization both as a parallelized C implementation with hand-tuned optimizations and in Scala using the Apache Spark high-level cluster computing framework. We obtained consistent performance across the three platforms: using Spark we were able to process the 1TB size dataset in under 30 minutes with 960 cores on all systems, with the fastest times obtained on the experimental Cray cluster. In comparison, the C implementation was 21X faster on the Amazon EC2 system, due to careful cache optimizations, bandwidth-friendly access of matrices and vector computation using SIMD units. We report these results and their implications on the hardware and software issues arising in supporting data-centric workloads in parallel and distributed environments.
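As a rough sketch of the CX idea itself (column selection by approximate leverage scores), rather than the Spark or C implementations that were benchmarked, a small numpy version might look like the following; the matrix sizes, rank k, and number of selected columns c are illustrative assumptions.

```python
# Small numpy sketch of the randomized CX idea (column selection by approximate
# leverage scores); not the benchmarked Spark/C implementations. Matrix sizes,
# rank k, and number of selected columns c are illustrative assumptions.
import numpy as np

def randomized_cx(A, k=10, c=20, seed=0):
    """Return selected column indices, C = A[:, cols], and X with A ~= C @ X."""
    rng = np.random.default_rng(seed)
    # Randomized range finder for the top-k right singular subspace.
    Y = A @ rng.standard_normal((A.shape[1], k + 5))
    Q, _ = np.linalg.qr(Y)
    _, _, Vt = np.linalg.svd(Q.T @ A, full_matrices=False)
    lev = np.sum(Vt[:k, :] ** 2, axis=0) / k        # approximate leverage scores
    probs = lev / lev.sum()
    cols = rng.choice(A.shape[1], size=c, replace=False, p=probs)
    C = A[:, cols]
    X = np.linalg.lstsq(C, A, rcond=None)[0]
    return cols, C, X

A = np.random.default_rng(1).standard_normal((500, 200))
cols, C, X = randomized_cx(A)
print(np.linalg.norm(A - C @ X) / np.linalg.norm(A))  # relative reconstruction error
```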
DOT National Transportation Integrated Search
2008-12-01
The appendix includes various ramp flow and ML speed-flow plots: OR-217 NB, 72nd MP 6.61; OR-217 NB, 99W-EB MP 5.9; OR-217 NB, 99W-WB MP 5.85; OR-217 NB, Greenburg MP 4.65; OR-217 NB, Scholls MP 3.85; OR-217 NB, Denney MP 2.68; OR-217 NB, Allen MP 2....
LARCRIM user's guide, version 1.0
NASA Technical Reports Server (NTRS)
Davis, John S.; Heaphy, William J.
1993-01-01
LARCRIM is a relational database management system (RDBMS) which performs the conventional duties of an RDBMS with the added feature that it can store attributes which consist of arrays or matrices. This makes it particularly valuable for scientific data management. It is accessible as a stand-alone system and through an application program interface. The stand-alone system may be executed in two modes: menu or command. The menu mode prompts the user for the input required to create, update, and/or query the database. The command mode requires the direct input of LARCRIM commands. Although LARCRIM is an update of an old database family, its performance on modern computers is quite satisfactory. LARCRIM is written in FORTRAN 77 and runs under the UNIX operating system. Versions have been released for the following computers: SUN (3 & 4), Convex, IRIS, Hewlett-Packard, CRAY 2 & Y-MP.
DOT National Transportation Integrated Search
2008-12-01
The appendix includes various speed flow plots, including: I-205 NB, Gladstone MP 11.05; I-205 NB, Gladstone Hway MP 12.94; I-205 NB, Lawnfield MP 13.58; I-205 NB, Sunnybrook MP 14.32; I-205 NB, Sunnyside MP 14.7; I-205 NB, Johnson Creek MP 16.2; I-2...
Cao, Xu-Ni; Lin, Li; Zhou, Yu-Yan; Shi, Guo-Yue; Zhang, Wen; Yamamoto, Katsunobu; Jin, Li-Tong
2003-07-27
In this paper, an electrode modified with multi-wall carbon nanotubes functionalized with carboxylic groups (MWNT-COOH CME) was fabricated. This chemically modified electrode (CME) can be used as the working electrode in liquid chromatography for the determination of 6-mercaptopurine (6-MP). The results indicate that the CME exhibits efficient electrocatalytic oxidation of 6-MP with relatively high sensitivity, stability and long lifetime. The peak currents of 6-MP are linear with its concentration over the range 4.0 x 10(-7) to 1.0 x 10(-4) mol l(-1), with a calculated detection limit (S/N=3) of 2.0 x 10(-7) mol l(-1). Coupled with microdialysis, the method has been successfully applied to the pharmacokinetic study of 6-MP in rabbit blood. This method provides a fast, sensitive and simple technique for the pharmacokinetic study of 6-MP in vivo.
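The calibration arithmetic behind figures like these can be illustrated with synthetic numbers (the data below are invented, not the paper's measurements): fit a linear peak-current vs. concentration curve and estimate an S/N = 3 detection limit from the baseline noise and the slope.

```python
# Illustrative sketch (synthetic data) of fitting a linear calibration curve of
# peak current vs. 6-MP concentration and estimating an S/N = 3 detection
# limit; it does not reproduce the paper's measurements.
import numpy as np

conc = np.array([4e-7, 1e-6, 1e-5, 5e-5, 1e-4])        # mol/L (assumed points)
current = np.array([0.021, 0.052, 0.51, 2.6, 5.1])     # uA (synthetic values)
slope, intercept = np.polyfit(conc, current, 1)

noise_sd = 0.003                                        # baseline noise, uA (assumed)
lod = 3 * noise_sd / slope                              # S/N = 3 detection limit
print(f"slope = {slope:.3g} uA L/mol, LOD = {lod:.2g} mol/L")
```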
NASA Technical Reports Server (NTRS)
Edwards, Jack R.; Mcrae, D. S.
1993-01-01
An efficient implicit method for the computation of steady, three-dimensional, compressible Navier-Stokes flowfields is presented. A nonlinear iteration strategy based on planar Gauss-Seidel sweeps is used to drive the solution toward a steady state, with approximate factorization errors within a crossflow plane reduced by the application of a quasi-Newton technique. A hybrid discretization approach is employed, with flux-vector splitting utilized in the streamwise direction and central differences with artificial dissipation used for the transverse fluxes. Convergence histories and comparisons with experimental data are presented for several 3-D shock-boundary layer interactions. Both laminar and turbulent cases are considered, with turbulent closure provided by a modification of the Baldwin-Barth one-equation model. For the problems considered (175,000-325,000 mesh points), the algorithm provides steady-state convergence in 900-2000 CPU seconds on a single processor of a Cray Y-MP.
A new procedure for dynamic adaption of three-dimensional unstructured grids
NASA Technical Reports Server (NTRS)
Biswas, Rupak; Strawn, Roger
1993-01-01
A new procedure is presented for the simultaneous coarsening and refinement of three-dimensional unstructured tetrahedral meshes. This algorithm allows for localized grid adaption that is used to capture aerodynamic flow features such as vortices and shock waves in helicopter flowfield simulations. The mesh-adaption algorithm is implemented in the C programming language and uses a data structure consisting of a series of dynamically-allocated linked lists. These lists allow the mesh connectivity to be rapidly reconstructed when individual mesh points are added and/or deleted. The algorithm allows the mesh to change in an anisotropic manner in order to efficiently resolve directional flow features. The procedure has been successfully implemented on a single processor of a Cray Y-MP computer. Two sample cases are presented involving three-dimensional transonic flow. Computed results show good agreement with conventional structured-grid solutions for the Euler equations.
Development of a CRAY 1 version of the SINDA program. [thermo-structural analyzer program
NASA Technical Reports Server (NTRS)
Juba, S. M.; Fogerson, P. E.
1982-01-01
The SINDA thermal analyzer program was transferred from the UNIVAC 1110 computer to a CYBER and then to a CRAY 1. Significant changes to the code of the program were required in order to execute efficiently on the CYBER and CRAY. The program was tested on the CRAY using a thermal math model of the shuttle which was too large to run on either the UNIVAC or CYBER. An effort was then begun to further modify the code of SINDA in order to make effective use of the vector capabilities of the CRAY.
Highly parallel implementation of non-adiabatic Ehrenfest molecular dynamics
NASA Astrophysics Data System (ADS)
Kanai, Yosuke; Schleife, Andre; Draeger, Erik; Anisimov, Victor; Correa, Alfredo
2014-03-01
While the adiabatic Born-Oppenheimer approximation tremendously lowers computational effort, many questions in modern physics, chemistry, and materials science require an explicit description of coupled non-adiabatic electron-ion dynamics. Electronic stopping, i.e. the energy transfer of a fast projectile atom to the electronic system of the target material, is a notorious example. We recently implemented real-time time-dependent density functional theory based on the plane-wave pseudopotential formalism in the Qbox/qb@ll codes. We demonstrate that explicit integration using a fourth-order Runge-Kutta scheme is very suitable for modern highly parallelized supercomputers. Applying the new implementation to systems with hundreds of atoms and thousands of electrons, we achieved excellent performance and scalability on a large number of nodes both on the BlueGene-based "Sequoia" system at LLNL as well as on the Cray architecture of "Blue Waters" at NCSA. As an example, we discuss our work on computing the electronic stopping power of aluminum and gold for hydrogen projectiles, showing excellent agreement with experiment. These first-principles calculations allow us to gain important insight into the fundamental physics of electronic stopping.
NASA Technical Reports Server (NTRS)
Konopliv, Alexander S.; Sjogren, William L.
1996-01-01
This report documents the Venus gravity methods and results to date (model MGNP90LSAAP). It is called a handbook in that it contains many useful plots (such as geometry and orbit behavior) that are useful in evaluating the tracking data. We discuss the models that are used in processing the Doppler data and the estimation method for determining the gravity field. With Pioneer Venus Orbiter and Magellan tracking data, the Venus gravity field was determined complete to degree and order 90 with the use of the JPL Cray T3D Supercomputer. The gravity field shows unprecedentedly high correlation with topography and resolution of features down to 200 km. In the procedure for solving the gravity field, other information is gained as well, and, for example, we discuss results for the Venus ephemeris, Love number, pole orientation of Venus, and atmospheric densities. Of significance is the Love number solution, which indicates a liquid core for Venus. The ephemeris of Venus is determined to an accuracy of 0.02 mm/s (tens of meters in position), and the rotation period to 243.0194 +/- 0.0002 days.
Data Transfer Study HPSS Archiving
DOE Office of Scientific and Technical Information (OSTI.GOV)
Wynne, James; Parete-Koon, Suzanne T; Mitchell, Quinn
2015-01-01
The movement of the large amounts of data produced by codes run in a High Performance Computing (HPC) environment can be a bottleneck for project workflows. To balance filesystem capacity and performance requirements, HPC centers enforce data management policies to purge old files to make room for new computation and analysis results. Users at Oak Ridge Leadership Computing Facility (OLCF) and many other HPC user facilities must archive data to avoid data loss during purges, therefore the time associated with data movement for archiving is something that all users must consider. This study observed the difference in transfer speed from the originating location on the Lustre filesystem to the more permanent High Performance Storage System (HPSS). The tests were done with a number of different transfer methods for files that spanned a variety of sizes and compositions that reflect OLCF user data. This data will be used to help users of Titan and other Cray supercomputers plan their workflow and data transfers so that they are most efficient for their project. We will also discuss best practice for maintaining data at shared user facilities.
Accelerating Science with the NERSC Burst Buffer Early User Program
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bhimji, Wahid; Bard, Debbie; Romanus, Melissa
NVRAM-based Burst Buffers are an important part of the emerging HPC storage landscape. The National Energy Research Scientific Computing Center (NERSC) at Lawrence Berkeley National Laboratory recently installed one of the first Burst Buffer systems as part of its new Cori supercomputer, collaborating with Cray on the development of the DataWarp software. NERSC has a diverse user base comprised of over 6500 users in 700 different projects spanning a wide variety of scientific computing applications. The use-cases of the Burst Buffer at NERSC are therefore also considerable and diverse. We describe here performance measurements and lessons learned from the Burst Buffer Early User Program at NERSC, which selected a number of research projects to gain early access to the Burst Buffer and exercise its capability to enable new scientific advancements. To the best of our knowledge this is the first time a Burst Buffer has been stressed at scale by diverse, real user workloads and therefore these lessons will be of considerable benefit to shaping the developing use of Burst Buffers at HPC centers.
NASA Astrophysics Data System (ADS)
Shao, Meiyue; Aktulga, H. Metin; Yang, Chao; Ng, Esmond G.; Maris, Pieter; Vary, James P.
2018-01-01
We describe a number of recently developed techniques for improving the performance of large-scale nuclear configuration interaction calculations on high performance parallel computers. We show the benefit of using a preconditioned block iterative method to replace the Lanczos algorithm that has traditionally been used to perform this type of computation. The rapid convergence of the block iterative method is achieved by a proper choice of starting guesses of the eigenvectors and the construction of an effective preconditioner. These acceleration techniques take advantage of special structure of the nuclear configuration interaction problem which we discuss in detail. The use of a block method also allows us to improve the concurrency of the computation, and take advantage of the memory hierarchy of modern microprocessors to increase the arithmetic intensity of the computation relative to data movement. We also discuss the implementation details that are critical to achieving high performance on massively parallel multi-core supercomputers, and demonstrate that the new block iterative solver is two to three times faster than the Lanczos based algorithm for problems of moderate sizes on a Cray XC30 system.
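A small SciPy illustration of the general idea (not the authors' solver): compute the lowest eigenpairs of a sparse symmetric test matrix with a Lanczos-type solver (eigsh) and with a preconditioned block iterative method (LOBPCG); the test matrix, the Jacobi preconditioner, and the random starting block are assumptions standing in for the nuclear configuration interaction Hamiltonian, the physics-informed preconditioner, and the carefully chosen starting guesses described above.

```python
# SciPy illustration of replacing a Lanczos-type eigensolver with a
# preconditioned block iterative method for a sparse symmetric eigenproblem.
# The test matrix, Jacobi preconditioner, and random starting block are
# stand-in assumptions, not the authors' CI Hamiltonian or solver.
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import eigsh, lobpcg

n, nev = 2000, 8
rng = np.random.default_rng(0)
# Sparse symmetric, diagonally dominant test matrix.
B = sp.random(n, n, density=1e-3, random_state=0)
A = (B + B.T) * 0.5 + sp.diags(np.arange(1.0, n + 1))

# Lanczos-type solver (ARPACK) for the lowest nev eigenvalues.
vals_lanczos = eigsh(A, k=nev, which="SA", return_eigenvectors=False)

# Preconditioned block iterative solver (LOBPCG) with a Jacobi preconditioner.
M = sp.diags(1.0 / A.diagonal())
X0 = rng.standard_normal((n, nev))          # starting block (random here)
vals_block, _ = lobpcg(A, X0, M=M, largest=False, tol=1e-8, maxiter=500)

print(np.sort(vals_lanczos))
print(np.sort(vals_block))
```

In the setting described above, the speedup comes precisely from supplying better-than-random starting vectors and an effective preconditioner, which the random block and diagonal preconditioner here only gesture at.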
A leap forward with UTK's Cray XC30
DOE Office of Scientific and Technical Information (OSTI.GOV)
Fahey, Mark R
2014-01-01
This paper shows a significant productivity leap for several science groups and the accomplishments they have made to date on Darter - a Cray XC30 at the University of Tennessee, Knoxville. The increased productivity is due to faster processors and a faster interconnect combined in a new generation from Cray, which nevertheless retains a programming environment very similar to that of previous generations of Cray machines, making porting easy.
Late evolution of very low mass X-ray binaries sustained by radiation from their primaries
NASA Technical Reports Server (NTRS)
Ruderman, M.; Shaham, J.; Tavani, M.; Eichler, D.
1989-01-01
The accretion-powered radiation from the X-ray pulsar system Her X-1 (McCray et al. 1982) is studied. The changes in the soft X-ray and gamma-ray flux and in the accompanying electron-positron wind are discussed. These are believed to be associated with the inward movement of the inner edge of the accretion disk corresponding to the boundary with the neutron star's corotating magnetosphere (Alfven radius). LMXB evolution which is self-sustained by secondary winds intercepting the radiation emitted near an LMXB neutron star is investigated as well.
NASA Technical Reports Server (NTRS)
Yan, Jerry C.; Jespersen, Dennis; Buning, Peter; Bailey, David (Technical Monitor)
1996-01-01
The Gordon Bell Prizes given out at Supercomputing every year include at least two categories: performance (highest GFLOP count) and price-performance (GFLOPs per million dollars) for real applications. In the past five years, the winners of the price-performance category all came from networks of workstations. This reflects three important facts: 1. supercomputers are still too expensive for the masses; 2. achieving high performance for real applications takes real work; and, most importantly, 3. it is possible to obtain acceptable performance for certain real applications on networks of workstations. With the continued advance of network technology as well as increased performance of "desktop" workstations, the "Swarm of Ants vs. Herd of Elephants" debate, which began with vector multiprocessors (VPPs) against SIMD type multiprocessors (e.g. CM2), is now recast as VPPs against Symmetric Multiprocessors (SMPs, e.g. SGI PowerChallenge). This paper reports on performance studies we performed solving a large-scale (2-million grid point) CFD problem involving a Boeing 747, based on a parallel version of OVERFLOW that utilizes message passing on PVM. A performance monitoring tool developed under NASA HPCC, called AIMS, was used to instrument and analyze the performance data thus obtained. We plan to compare performance data obtained across a wide spectrum of architectures, from the Cray C90, IBM SP2, and SGI Power Challenge cluster to a group of workstations connected over a simple network. The metrics of comparison include speedup, price-performance, throughput, and turnaround time. We also plan to present a plan of attack for various issues that will make the execution of Grand Challenge Applications across the Global Information Infrastructure a reality.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bland, Arthur S Buddy; Hack, James J; Baker, Ann E
Oak Ridge National Laboratory's (ORNL's) Cray XT5 supercomputer, Jaguar, kicked off the era of petascale scientific computing in 2008 with applications that sustained more than a thousand trillion floating point calculations per second - or 1 petaflop. Jaguar continues to grow even more powerful as it helps researchers broaden the boundaries of knowledge in virtually every domain of computational science, including weather and climate, nuclear energy, geosciences, combustion, bioenergy, fusion, and materials science. Their insights promise to broaden our knowledge in areas that are vitally important to the Department of Energy (DOE) and the nation as a whole, particularly energy assurance and climate change. The science of the 21st century, however, will demand further revolutions in computing, supercomputers capable of a million trillion calculations a second - 1 exaflop - and beyond. These systems will allow investigators to continue attacking global challenges through modeling and simulation and to unravel longstanding scientific questions. Creating such systems will also require new approaches to daunting challenges. High-performance systems of the future will need to be codesigned for scientific and engineering applications with best-in-class communications networks and data-management infrastructures and teams of skilled researchers able to take full advantage of these new resources. The Oak Ridge Leadership Computing Facility (OLCF) provides the nation's most powerful open resource for capability computing, with a sustainable path that will maintain and extend national leadership for DOE's Office of Science (SC). The OLCF has engaged a world-class team to support petascale science and to take a dramatic step forward, fielding new capabilities for high-end science. This report highlights the successful delivery and operation of a petascale system and shows how the OLCF fosters application development teams, developing cutting-edge tools and resources for next-generation systems.
NASA Technical Reports Server (NTRS)
Lin, Yuh-Lang; Kaplan, Michael L.
1992-01-01
Work performed during the report period is summarized. The first numerical experiment, which was performed on the North Carolina Supercomputer Center's CRAY-YMP machine during the second half of FY92, involved a 36-hour simulation of the CCOPE case study. This first coarse-mesh simulation employed the GMASS model with a 178 x 108 x 32 matrix of grid points spaced approximately 24 km apart. The initial data was comprised of the global 2.5 x 2.5 degree analyses as well as all available North American rawinsonde data valid at 0000 UTC 11 July 1981. Highly-smoothed LFM-derived terrain data were utilized so as to determine the mesoscale response of the three-dimensional atmosphere to weak terrain forcing prior to including the observed highly complex terrain of the northern Rocky Mountain region. It was felt that the model should be run with a spectrum of terrain geometries, ranging from observed complex terrain to no terrain at all, to determine how crucial the terrain was in forcing the mesoscale phenomena. Both convection and stratiform (stable) precipitation were not allowed in this simulation so that their relative importance could be determined by inclusion in forthcoming simulations. A full suite of planetary boundary layer forcing was allowed in the simulation, including surface sensible and latent heat fluxes employing the Blackadar PBL formulation. The details of this simulation, which in many ways could be considered the control simulation, including the important synoptic-scale, meso-alpha scale, and meso-beta scale circulations, are described. These results are compared to the observations diagnosed by Koch and his colleagues as well as hypotheses set forth in the project proposal for terrain influences upon the jet stream and their role in the generation of mesoscale wave phenomena. The fundamental goal of the analyses is to discriminate among background geostrophic adjustment, terrain influences, and shearing instability in the initiation and maintenance of mesoscale internal wave phenomena. Based upon these findings, FY93 plans are discussed. A review of linear theory and theoretical modeling of a geostrophic zonal wind anomaly is included.
NASA Technical Reports Server (NTRS)
Wilkes, Belinda; Lavoie, Anthony R. (Technical Monitor)
2000-01-01
The launch of the Chandra X-ray Observatory in July 1999 opened a new era in X-ray astronomy. Its unprecedented, < 1" spatial resolution and low background are providing views of the X-ray sky 10-100 times fainter than previously possible. We have begun to carry out a serendipitous survey of the X-ray sky using Chandra archival data to flux limits covering the range between those reached by current satellites and those of the small-area Chandra deep surveys. We estimate the survey will cover about 8 sq. deg. per year to X-ray fluxes (2-10 keV) in the range 10^-13 to 6 x 10^-16 erg cm^-2 s^-1 and include about 3000 sources per year, roughly two thirds of which are expected to be active galactic nuclei (AGN). Optical imaging of the ChaMP fields is underway at NOAO and SAO telescopes using g', r', z' colors with which we will be able to classify the X-ray sources into object types and, in some cases, estimate their redshifts. We are also planning to obtain optical spectroscopy of a well-defined subset to allow confirmation of classification and redshift determination. All X-ray and optical results and supporting optical data will be placed in the ChaMP archive within a year of the completion of our data analysis. Over the five years of Chandra operations, ChaMP will provide both a major resource for Chandra observers and a key research tool for the study of the cosmic X-ray background and the individual source populations which comprise it. ChaMP promises profoundly new science return on a number of key questions at the current frontier of many areas of astronomy, including solving the spectral paradox by resolving the CXRB, locating and studying high-redshift clusters and so constraining cosmological parameters, defining the true, possibly absorbed, population of quasars, and studying coronal emission from late-type stars as their cores become fully convective. The current status and initial results from the ChaMP will be presented.
Exploring Accelerating Science Applications with FPGAs
DOE Office of Scientific and Technical Information (OSTI.GOV)
Storaasli, Olaf O; Strenski, Dave
2007-01-01
FPGA hardware and tools (VHDL, Viva, MitrionC and CHiMPS) are described. FPGA performance is evaluated on two Cray XD1 systems (Virtex-II Pro 50 and Virtex-4 LX160) for human genome (DNA and protein) sequence comparisons for a computational biology code (FASTA). Scalable FPGA speedups of 50X (Virtex-II) and 100X (Virtex-4) over a 2.2 GHz Opteron were achieved. Coding and IO issues faced for human genome data are described.
NASA Technical Reports Server (NTRS)
Tarshish, Adina; Salmon, Ellen
1994-01-01
In October 1992, the NASA Center for Computational Sciences made its Convex-based UniTree system generally available to users. The ensuing months saw growth in every area. Within 26 months, data under UniTree control grew from nil to over 12 terabytes, nearly all of it stored on robotically mounted tape. HiPPI/UltraNet was added to enhance connectivity, and later HiPPI/TCP was added as well. Disks and robotic tape silos were added to those already under UniTree's control, and 18-track tapes were upgraded to 36-track. The primary data source for UniTree, the facility's Cray Y-MP/4-128, first doubled its processing power and then was replaced altogether by a C98/6-256 with nearly two-and-a-half times the Y-MP's combined peak gigaflops. The Convex/UniTree software was upgraded from version 1.5 to 1.7.5, and then to 1.7.6. Finally, the server itself, a Convex C3240, was upgraded to a C3830 with a second I/O bay, doubling the C3240's memory and capacity for I/O. This paper describes insights gained and reinforced with the burgeoning demands on the UniTree storage system and the significant increases in performance gained from the many upgrades.
Space Radar Image of Mammoth Mountain, California
1999-05-01
These two false-color composite images of the Mammoth Mountain area in the Sierra Nevada Mountains, Calif., show significant seasonal changes in snow cover. The image at left was acquired by the Spaceborne Imaging Radar-C and X-band Synthetic Aperture Radar aboard the space shuttle Endeavour on its 67th orbit on April 13, 1994. The image is centered at 37.6 degrees north latitude and 119 degrees west longitude. The area is about 36 kilometers by 48 kilometers (22 miles by 29 miles). In this image, red is L-band (horizontally transmitted and vertically received) polarization data; green is C-band (horizontally transmitted and vertically received) polarization data; and blue is C-band (horizontally transmitted and received) polarization data. The image at right was acquired on October 3, 1994, on the space shuttle Endeavour's 67th orbit of the second radar mission. Crowley Lake appears dark at the center left of the image, just above or south of Long Valley. The Mammoth Mountain ski area is visible at the top right of the scene. The red areas correspond to forests, the dark blue areas are bare surfaces and the green areas are short vegetation, mainly brush. The changes in color tone at the higher elevations (e.g. the Mammoth Mountain ski area) from green-blue in April to purple in September reflect changes in snow cover between the two missions. The April mission occurred immediately following a moderate snow storm. During the mission the snow evolved from a dry, fine-grained snowpack with few distinct layers to a wet, coarse-grained pack with multiple ice inclusions. Since that mission, all snow in the area has melted except for small glaciers and permanent snowfields on the Silver Divide and near the headwaters of Rock Creek. On October 3, 1994, only discontinuous patches of snow cover were present at very high elevations following the first snow storm of the season on September 28, 1994. For investigations in hydrology and land-surface climatology, seasonal snow cover and alpine glaciers are critical to the radiation and water balances. SIR-C/X-SAR is a powerful tool because it is sensitive to most snowpack conditions and is less influenced by weather conditions than other remote sensing instruments, such as Landsat. In parallel with the operational SIR-C data processing, an experimental effort is being conducted to test SAR data processing using the Jet Propulsion Laboratory's massively parallel supercomputing facility, centered around the Cray Research T3D. These experiments will assess the abilities of large supercomputers to produce high throughput SAR processing in preparation for upcoming data-intensive SAR missions. The images released here were produced as part of this experimental effort. http://photojournal.jpl.nasa.gov/catalog/PIA01753
Strategies for vectorizing the sparse matrix vector product on the CRAY XMP, CRAY 2, and CYBER 205
NASA Technical Reports Server (NTRS)
Bauschlicher, Charles W., Jr.; Partridge, Harry
1987-01-01
Large, randomly sparse matrix vector products are important in a number of applications in computational chemistry, such as matrix diagonalization and the solution of simultaneous equations. Vectorization of this process is considered for the CRAY XMP, CRAY 2, and CYBER 205, using a matrix of dimension 20,000 with from 1 percent to 6 percent nonzeros. Efficient scatter/gather capabilities add coding flexibility and yield significant improvements in performance. For the CYBER 205, it is shown that minor changes in the I/O can reduce the CPU time by a factor of 50. Similar changes in the CRAY codes make a far smaller improvement.
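The gather step at the heart of this kernel is easy to see in a plain CSR sparse matrix-vector product; the sketch below is a generic numpy illustration, not the CRAY or CYBER code, and the tiny matrix is an assumption for the example.

```python
# Plain numpy sketch of a CSR sparse matrix-vector product; the x[col_idx]
# step is the gather operation whose hardware support the abstract discusses.
# The tiny matrix is illustrative, not the 20,000-dimension test case.
import numpy as np

def csr_matvec(indptr, col_idx, values, x):
    """y = A @ x for a CSR matrix given by (indptr, col_idx, values)."""
    y = np.zeros(len(indptr) - 1)
    gathered = values * x[col_idx]          # gather x at the nonzero columns
    for row in range(len(y)):               # per-row reduction of gathered products
        y[row] = gathered[indptr[row]:indptr[row + 1]].sum()
    return y

# Tiny example: a 3x3 matrix with 4 nonzeros.
indptr  = np.array([0, 2, 3, 4])
col_idx = np.array([0, 2, 1, 0])
values  = np.array([1.0, 2.0, 3.0, 4.0])
x = np.array([1.0, 1.0, 1.0])
print(csr_matvec(indptr, col_idx, values, x))   # -> [3. 3. 4.]
```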
The reliability of dental x-ray film in assessment of MP3 stages of the pubertal growth spurt.
Abdel-Kader, H M
1998-10-01
The main objective of this clinical study is to provide a simple and practical method to assess the pubertal growth spurt stage of a subject by recording MP3 stages with a dental periapical radiograph and a standard dental x-ray machine.
ARCGRAPH SYSTEM - AMES RESEARCH GRAPHICS SYSTEM
NASA Technical Reports Server (NTRS)
Hibbard, E. A.
1994-01-01
Ames Research Graphics System, ARCGRAPH, is a collection of libraries and utilities which assist researchers in generating, manipulating, and visualizing graphical data. In addition, ARCGRAPH defines a metafile format that contains device independent graphical data. This file format is used with various computer graphics manipulation and animation packages at Ames, including SURF (COSMIC Program ARC-12381) and GAS (COSMIC Program ARC-12379). In its full configuration, the ARCGRAPH system consists of a two stage pipeline which may be used to output graphical primitives. Stage one is associated with the graphical primitives (i.e. moves, draws, color, etc.) along with the creation and manipulation of the metafiles. Five distinct data filters make up stage one. They are: 1) PLO which handles all 2D vector primitives, 2) POL which handles all 3D polygonal primitives, 3) RAS which handles all 2D raster primitives, 4) VEC which handles all 3D raster primitives, and 5) PO2 which handles all 2D polygonal primitives. Stage two is associated with the process of displaying graphical primitives on a device. To generate the various graphical primitives, create and reprocess ARCGRAPH metafiles, and access the device drivers in the VDI (Video Device Interface) library, users link their applications to ARCGRAPH's GRAFIX library routines. Both FORTRAN and C language versions of the GRAFIX and VDI libraries exist for enhanced portability within these respective programming environments. The ARCGRAPH libraries were developed on a VAX running VMS. Minor documented modification of various routines, however, allows the system to run on the following computers: Cray X-MP running COS (no C version); Cray 2 running UNICOS; DEC VAX running BSD 4.3 UNIX, or Ultrix; SGI IRIS Turbo running GL2-W3.5 and GL2-W3.6; Convex C1 running UNIX; Amdahl 5840 running UTS; Alliant FX8 running UNIX; Sun 3/160 running UNIX (no native device driver); Stellar GS1000 running Stellex (no native device driver); and an SGI IRIS 4D running IRIX (no native device driver). Currently with version 7.0 of ARCGRAPH, the VDI library supports the following output devices: A VT100 terminal with a RETRO-GRAPHICS board installed, a VT240 using the Tektronix 4010 emulation capability, an SGI IRIS turbo using the native GL2 library, a Tektronix 4010, a Tektronix 4105, and the Tektronix 4014. ARCGRAPH version 7.0 was developed in 1988.
Castro-Alvarez, Alejandro; Carneros, Héctor; Sánchez, Dani; Vilarrasa, Jaume
2015-12-18
While B3LYP, M06-2X, and MP2 calculations predict the ΔG° values for exchange equilibria between enamines and ketones with similar acceptable accuracy, the M06-2X/6-311+G(d,p) and MP2/6-311+G(d,p) methods are required for enamine formation reactions (for example, for enamine 5a, arising from 3-methylbutanal and pyrrolidine). Stronger disagreement was observed when calculated energies of hemiaminals (N,O-acetals) and aminals (N,N-acetals) were compared with experimental equilibrium constants, which are reported here for the first time. Although it is known that the B3LYP method does not provide a good description of the London dispersion forces, while M06-2X and MP2 may overestimate them, it is shown here how large the gaps are and that at least single-point calculations at the CCSD(T)/6-31+G(d) level should be used for these reaction intermediates; CCSD(T)/6-31+G(d) and CCSD(T)/6-311+G(d,p) calculations afford ΔG° values in some cases quite close to MP2/6-311+G(d,p) while in others closer to M06-2X/6-311+G(d,p). The effect of solvents is similarly predicted by the SMD, CPCM, and IEFPCM approaches (with energy differences below 1 kcal/mol).
Geophysics of Small Planetary Bodies
NASA Technical Reports Server (NTRS)
Asphaug, Erik I.
1998-01-01
As a SETI Institute PI from 1996-1998, Erik Asphaug studied impact and tidal physics and other geophysical processes associated with small (low-gravity) planetary bodies. This work included: a numerical impact simulation linking basaltic achondrite meteorites to asteroid 4 Vesta (Asphaug 1997), which laid the groundwork for an ongoing study of Martian meteorite ejection; cratering and catastrophic evolution of small bodies (with implications for their internal structure; Asphaug et al. 1996); genesis of grooved and degraded terrains in response to impact; maturation of regolith (Asphaug et al. 1997a); and the variation of crater outcome with impact angle, speed, and target structure. Research of impacts into porous, layered and prefractured targets (Asphaug et al. 1997b, 1998a) showed how shape, rheology and structure dramatically affect the sizes and velocities of ejecta, and the survivability and impact-modification of comets and asteroids (Asphaug et al. 1998a). As an affiliate of the Galileo SSI Team, the PI studied problems related to cratering, tectonics, and regolith evolution, including an estimate of the impactor flux around Jupiter and the effect of impact on local and regional tectonics (Asphaug et al. 1998b). Other research included tidal breakup modeling (Asphaug and Benz 1996; Schenk et al. 1996), which is leading to a general understanding of the role of tides in planetesimal evolution. As a Guest Computational Investigator for NASA's BPCC/ESS supercomputer testbed, the PI helped graft SPH3D onto an existing tree code tuned for the massively parallel Cray T3E (Olson and Asphaug, in preparation), obtaining a factor of 1000 speedup in code execution time (on 512 CPUs). Runs which once took months are now completed in hours.
Evaluation of advanced materials through experimental mechanics and modelling
NASA Technical Reports Server (NTRS)
Yang, Yii-Ching
1993-01-01
Composite materials have been frequently used in aerospace vehicles. Very often defects are introduced during manufacture and damage is incurred during construction and service. It becomes critical to understand the mechanical behavior of such a composite structure before it can be further used. One good example of these composite structures is the cylindrical bottle of a solid rocket motor case with accidental impact damage. Since the replacement of this cylindrical bottle is expensive, it is valuable to know how the damage affects the material, and how it can be repaired. To reach this goal, the damage must be characterized and the stress/strain field must be carefully analyzed. First the damage area, due to impact, is surveyed and identified with a shearography technique which uses the principle of speckle shearing interferometry to measure displacement gradient. Within the damage area of a composite laminate, such as the bottle of a solid rocket motor case, all layers are considered to be degraded. Once a lamina is degraded, its stiffness as well as strength is drastically decreased. It becomes a critical failure area for the whole bottle. Hence the stress/strain field within and around a damage should be accurately evaluated for failure prediction. To investigate the stress/strain field around damage, a Hybrid-Numerical method which combines experimental measurement and finite element analysis is used. It is known that the stress or strain at a singular point cannot be accurately measured by an experimental technique. Nevertheless, if the location is far away from the singular spot, the displacement can be found accurately. Since it reflects the true displacement field locally regardless of the boundary conditions, it provides excellent input data for a finite element analysis to replace the usually assumed boundary conditions. Therefore, the Hybrid-Numerical method is chosen to avoid the difficulty and to take advantage of both the experimental technique and finite element analysis. Experimentally, the digital image correlation technique is employed to measure the displacement field. It is done by comparing two digitized images, before and after loading. Numerically, the finite element program ABAQUS (version 5.2) is used to analyze the stress and strain field. It takes advantage of the high speed and huge memory size of a modern supercomputer, the CRAY Y-MP, at NASA Marshall Space Flight Center.
Logistic model analysis of neurological findings in Minamata disease and the predicting index.
Nakagawa, Masanori; Kodama, Tomoko; Akiba, Suminori; Arimura, Kimiyoshi; Wakamiya, Junji; Futatsuka, Makoto; Kitano, Takao; Osame, Mitsuhiro
2002-01-01
To establish a statistical diagnostic method to identify patients with Minamata disease (MD) considering factors of aging and sex, we analyzed the neurological findings in MD patients, inhabitants in a methylmercury polluted (MP) area, and inhabitants in a non-MP area. We compared the neurological findings in MD patients and inhabitants aged more than 40 years in the non-MP area. Based on the different frequencies of the neurological signs in the two groups, we devised the following formula to calculate the predicting index for MD: predicting index = 100 / (1 + e^(-x)) (The value of x was calculated using the regression coefficients of each neurological finding obtained from logistic analysis. The index 100 indicated MD, and 0, non-MD). Using this method, we found that 100% of male and 98% of female patients with MD (95 cases) gave predicting indices higher than 95. Five percent of the aged inhabitants in the MP area (598 inhabitants) and 0.2% of those in the non-MP area (558 inhabitants) gave predicting indices of 50 or higher. Our statistical diagnostic method for MD was useful in distinguishing MD patients from healthy elders based on their neurological findings.
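Evaluating the reported index is straightforward; the sketch below applies the formula with hypothetical coefficient values and finding names (the study's fitted regression coefficients are not reproduced here).

```python
# Sketch of evaluating the reported predicting index = 100 / (1 + e^(-x)),
# where x is a weighted sum of neurological findings. The coefficient values
# and finding names below are hypothetical placeholders, not the study's
# fitted regression coefficients.
import math

def predicting_index(findings, coefficients, intercept):
    """findings: dict of 0/1 signs; coefficients: dict of logistic weights."""
    x = intercept + sum(coefficients[name] * present
                        for name, present in findings.items())
    return 100.0 / (1.0 + math.exp(-x))

coefficients = {"sensory_disturbance": 2.1, "ataxia": 1.4,
                "visual_field_constriction": 1.8}      # hypothetical weights
intercept = -3.0                                       # hypothetical
patient = {"sensory_disturbance": 1, "ataxia": 1, "visual_field_constriction": 0}
print(round(predicting_index(patient, coefficients, intercept), 1))
```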
DOE Office of Scientific and Technical Information (OSTI.GOV)
Rodríguez Guilbe, María M.; Protein Research and Development Center, University of Puerto Rico; Alfaro Malavé, Elisa C.
The genetically encoded fluorescent calcium-indicator protein GCaMP2 was crystallized in the calcium-saturated form. X-ray diffraction data were collected to 2.0 Å resolution and the structure was solved by molecular replacement. Fluorescent proteins and their engineered variants have played an important role in the study of biology. The genetically encoded calcium-indicator protein GCaMP2 comprises a circularly permuted fluorescent protein coupled to the calcium-binding protein calmodulin and a calmodulin target peptide, M13, derived from the intracellular calmodulin target myosin light-chain kinase and has been used to image calcium transients in vivo. To aid rational efforts to engineer improved variants of GCaMP2, this protein was crystallized in the calcium-saturated form. X-ray diffraction data were collected to 2.0 Å resolution. The crystals belong to space group C2, with unit-cell parameters a = 126.1, b = 47.1, c = 68.8 Å, β = 100.5° and one GCaMP2 molecule in the asymmetric unit. The structure was phased by molecular replacement and refinement is currently under way.
Marianski, Mateusz; Oliva, Antoni; Dannenberg, J J
2012-08-02
We reevaluate the interaction of pyridine and p-benzoquinone using functionals designed to treat dispersion. We compare the relative energies of four different structures: stacked, T-shaped (identified for the first time), and two planar H-bonded geometries using these functionals (B97-D, ωB97x-D, M05, M05-2X, M06, M06L, and M06-2X), other functionals (PBE1PBE, B3LYP, X3LYP), MP2, and CCSD(T) using basis sets as large as cc-pVTZ. The functionals designed to treat dispersion behave erratically as the predictions of the most stable structure vary considerably. MP2 predicts the experimentally observed structure (H-bonded) to be the least stable, while single-point CCSD(T) at the MP2 optimized geometry correctly predicts the observed structure to be the most stable. We have confirmed the assignment of the experimental structure using new calculations of the vibrational frequency shifts previously used to identify the structure. The MP2/cc-pVTZ vibrational calculations are in excellent agreement with the observations. All methods used to calculate the energies provide vibrational shifts that agree with the observed structure even though most do not predict this structure to be most stable. The implications for evaluating possible π-stacking in biologically important systems are discussed.
Improved analysis of SP and CoSaMP under total perturbations
NASA Astrophysics Data System (ADS)
Li, Haifeng
2016-12-01
Practically, in the underdetermined model y = Ax, where x is a K-sparse vector (i.e., it has no more than K nonzero entries), both y and A can be totally perturbed. A more relaxed condition means that fewer measurements are needed to ensure sparse recovery from a theoretical standpoint. In this paper, based on the restricted isometry property (RIP), two relaxed sufficient conditions are presented for subspace pursuit (SP) and compressed sampling matching pursuit (CoSaMP) under total perturbations to guarantee that the sparse vector x is recovered. Taking a random matrix as the measurement matrix, we also discuss the advantage of our condition. Numerical experiments validate that SP and CoSaMP can provide oracle-order recovery performance.
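For reference, the restricted isometry property invoked above is conventionally stated as follows; this is the textbook definition, not the paper's specific relaxed condition under total perturbation.

    % K-th order restricted isometry property: A has RIP constant \delta_K \in (0,1)
    % if, for every K-sparse vector x,
    \[
    (1-\delta_K)\,\lVert x\rVert_2^2 \;\le\; \lVert Ax\rVert_2^2 \;\le\; (1+\delta_K)\,\lVert x\rVert_2^2 .
    \]
    % Recovery guarantees for SP and CoSaMP are sufficient conditions of the form
    % \delta_{cK} < \delta^{*} (c a small integer); the total-perturbation analysis
    % relaxes the admissible threshold \delta^{*}.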
NASADIG - NASA DEVICE INDEPENDENT GRAPHICS LIBRARY (AMDAHL VERSION)
NASA Technical Reports Server (NTRS)
Rogers, J. E.
1994-01-01
The NASA Device Independent Graphics Library, NASADIG, can be used with many computer-based engineering and management applications. The library gives the user the opportunity to translate data into effective graphic displays for presentation. The software offers many features which allow the user flexibility in creating graphics. These include two-dimensional plots, subplot projections in 3D-space, surface contour line plots, and surface contour color-shaded plots. Routines for three-dimensional plotting, wireframe surface plots, surface plots with hidden line removal, and surface contour line plots are provided. Other features include polar and spherical coordinate plotting, world map plotting utilizing either cylindrical equidistant or Lambert equal area projection, plot translation, plot rotation, plot blowup, splines and polynomial interpolation, area blanking control, multiple log/linear axes, legends and text control, curve thickness control, and multiple text fonts (18 regular, 4 bold). NASADIG contains several groups of subroutines. Included are subroutines for plot area and axis definition; text set-up and display; area blanking; line style set-up, interpolation, and plotting; color shading and pattern control; legend, text block, and character control; device initialization; mixed alphabets setting; and other useful functions. The usefulness of many routines is dependent on the prior definition of basic parameters. The program's control structure uses a serial-level construct with each routine restricted for activation at some prescribed level(s) of problem definition. NASADIG provides the following output device drivers: Selanar 100XL, VECTOR Move/Draw ASCII and PostScript files, Tektronix 40xx, 41xx, and 4510 Rasterizer, DEC VT-240 (4014 mode), IBM AT/PC compatible with SmartTerm 240 emulator, HP Lasergrafix Film Recorder, QMS 800/1200, DEC LN03+ Laserprinters, and HP LaserJet (Series III). NASADIG is written in FORTRAN and is available for several platforms. NASADIG 5.7 is available for DEC VAX series computers running VMS 5.0 or later (MSC-21801), Cray X-MP and Y-MP series computers running UNICOS (COS-10049), and Amdahl 5990 mainframe computers running UTS (COS-10050). NASADIG 5.1 is available for UNIX-based operating systems (MSC-22001). The UNIX version has been successfully implemented on Sun4 series computers running SunOS, SGI IRIS computers running IRIX, Hewlett Packard 9000 computers running HP-UX, and Convex computers running Convex OS (MSC-22001). The standard distribution medium for MSC-21801 is a set of two 6250 BPI 9-track magnetic tapes in DEC VAX BACKUP format. It is also available on a set of two TK50 tape cartridges in DEC VAX BACKUP format. The standard distribution medium for COS-10049 and COS-10050 is a 6250 BPI 9-track magnetic tape in UNIX tar format. Other distribution media and formats may be available upon request. The standard distribution medium for MSC-22001 is a .25 inch streaming magnetic tape cartridge (Sun QIC-24) in UNIX tar format. Alternate distribution media and formats are available upon request. With minor modification, the UNIX source code can be ported to other platforms including IBM PC/AT series computers and compatibles. NASADIG is also available bundled with TRASYS, the Thermal Radiation Analysis System (COS-10026, DEC VAX version; COS-10040, CRAY version).
NASADIG - NASA DEVICE INDEPENDENT GRAPHICS LIBRARY (UNIX VERSION)
NASA Technical Reports Server (NTRS)
Rogers, J. E.
1994-01-01
The NASA Device Independent Graphics Library, NASADIG, can be used with many computer-based engineering and management applications. The library gives the user the opportunity to translate data into effective graphic displays for presentation. The software offers many features which allow the user flexibility in creating graphics. These include two-dimensional plots, subplot projections in 3D-space, surface contour line plots, and surface contour color-shaded plots. Routines for three-dimensional plotting, wireframe surface plots, surface plots with hidden line removal, and surface contour line plots are provided. Other features include polar and spherical coordinate plotting, world map plotting utilizing either cylindrical equidistant or Lambert equal area projection, plot translation, plot rotation, plot blowup, splines and polynomial interpolation, area blanking control, multiple log/linear axes, legends and text control, curve thickness control, and multiple text fonts (18 regular, 4 bold). NASADIG contains several groups of subroutines. Included are subroutines for plot area and axis definition; text set-up and display; area blanking; line style set-up, interpolation, and plotting; color shading and pattern control; legend, text block, and character control; device initialization; mixed alphabets setting; and other useful functions. The usefulness of many routines is dependent on the prior definition of basic parameters. The program's control structure uses a serial-level construct with each routine restricted for activation at some prescribed level(s) of problem definition. NASADIG provides the following output device drivers: Selanar 100XL, VECTOR Move/Draw ASCII and PostScript files, Tektronix 40xx, 41xx, and 4510 Rasterizer, DEC VT-240 (4014 mode), IBM AT/PC compatible with SmartTerm 240 emulator, HP Lasergrafix Film Recorder, QMS 800/1200, DEC LN03+ Laserprinters, and HP LaserJet (Series III). NASADIG is written in FORTRAN and is available for several platforms. NASADIG 5.7 is available for DEC VAX series computers running VMS 5.0 or later (MSC-21801), Cray X-MP and Y-MP series computers running UNICOS (COS-10049), and Amdahl 5990 mainframe computers running UTS (COS-10050). NASADIG 5.1 is available for UNIX-based operating systems (MSC-22001). The UNIX version has been successfully implemented on Sun4 series computers running SunOS, SGI IRIS computers running IRIX, Hewlett Packard 9000 computers running HP-UX, and Convex computers running Convex OS (MSC-22001). The standard distribution medium for MSC-21801 is a set of two 6250 BPI 9-track magnetic tapes in DEC VAX BACKUP format. It is also available on a set of two TK50 tape cartridges in DEC VAX BACKUP format. The standard distribution medium for COS-10049 and COS-10050 is a 6250 BPI 9-track magnetic tape in UNIX tar format. Other distribution media and formats may be available upon request. The standard distribution medium for MSC-22001 is a .25 inch streaming magnetic tape cartridge (Sun QIC-24) in UNIX tar format. Alternate distribution media and formats are available upon request. With minor modification, the UNIX source code can be ported to other platforms including IBM PC/AT series computers and compatibles. NASADIG is also available bundled with TRASYS, the Thermal Radiation Analysis System (COS-10026, DEC VAX version; COS-10040, CRAY version).
Ultraviolet, X-ray, and infrared observations of HDE 226868 = Cygnus X-1
NASA Technical Reports Server (NTRS)
Treves, A.; Chiappetti, L.; Tanzi, E. G.; Tarenghi, M.; Gursky, H.; Dupree, A. K.; Hartmann, L. W.; Raymond, J.; Davis, R. J.; Black, J.
1980-01-01
During April, May, and July of 1978, HDE 226868, the optical counterpart of Cygnus X-1, was repeatedly observed in the ultraviolet with the IUE satellite. Some X-ray and infrared observations have been made during the same period. The general shape of the spectrum is that expected from a late O supergiant. Strong absorption features are apparent in the ultraviolet, some of which have been identified. The equivalent widths of the most prominent lines appear to be modulated with the orbital phase. This modulation is discussed in terms of the ionization contours calculated by Hatchett and McCray, for a binary X-ray source in the stellar wind of the companion.
High Performance Programming Using Explicit Shared Memory Model on the Cray T3D
NASA Technical Reports Server (NTRS)
Saini, Subhash; Simon, Horst D.; Lasinski, T. A. (Technical Monitor)
1994-01-01
The Cray T3D is the first-phase system in Cray Research Inc.'s (CRI) three-phase massively parallel processing program. In this report we describe the architecture of the T3D, as well as the CRAFT (Cray Research Adaptive Fortran) programming model, and contrast it with PVM, which is also supported on the T3D. We present some performance data based on the NAS Parallel Benchmarks to illustrate both architectural and software features of the T3D.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Wasserman, H.J.
1996-02-01
The second generation of the Digital Equipment Corp. (DEC) DECchip Alpha AXP microprocessor is referred to as the 21164. From the viewpoint of numerically-intensive computing, the primary difference between it and its predecessor, the 21064, is that the 21164 has twice the multiply/add throughput per clock period (CP), a maximum of two floating point operations (FLOPS) per CP vs. one for the 21064. The AlphaServer 8400 is a shared-memory multiprocessor server system that can accommodate up to 12 CPUs and up to 14 GB of memory. In this report we will compare single processor performance of the 8400 system with that of the International Business Machines Corp. (IBM) RISC System/6000 POWER-2 microprocessor running at 66 MHz, the Silicon Graphics, Inc. (SGI) MIPS R8000 microprocessor running at 75 MHz, and the Cray Research, Inc. CRAY J90. The performance comparison is based on a set of Fortran benchmark codes that represent a portion of the Los Alamos National Laboratory supercomputer workload. The advantage of using these codes is that they span a wide range of computational characteristics, such as vectorizability, problem size, and memory access pattern. The primary disadvantage of using them is that detailed, quantitative analysis of performance behavior of all codes on all machines is difficult. One important addition to the benchmark set appears for the first time in this report. Whereas the older version was written for a vector processor, the newer version is more optimized for microprocessor architectures. Therefore, we have for the first time an opportunity to measure performance on a single application using implementations that expose the respective strengths of vector and superscalar architectures. All results in this report are from single processors. A subsequent article will explore shared-memory multiprocessing performance of the 8400 system.
Optics Program Modified for Multithreaded Parallel Computing
NASA Technical Reports Server (NTRS)
Lou, John; Bedding, Dave; Basinger, Scott
2006-01-01
A powerful high-performance computer program for simulating and analyzing adaptive and controlled optical systems has been developed by modifying the serial version of the Modeling and Analysis for Controlled Optical Systems (MACOS) program to impart capabilities for multithreaded parallel processing on computing systems ranging from supercomputers down to Symmetric Multiprocessing (SMP) personal computers. The modifications included the incorporation of OpenMP, a portable and widely supported application programming interface that can be used to explicitly add multithreaded parallelism to an application program under a shared-memory programming model. OpenMP was applied to parallelize ray-tracing calculations, one of the major computing components in MACOS. Multithreading is also used in the diffraction propagation of light in MACOS, based on pthreads (POSIX threads, where "POSIX" signifies a portable operating system for UNIX). In tests of the parallelized version of MACOS, the speedup in ray-tracing calculations was found to be linear, or proportional to the number of processors, while the speedup in diffraction calculations ranged from 50 to 60 percent, depending on the type and number of processors. The parallelized version of MACOS is portable, and, to the user, its interface is basically the same as that of the original serial version of MACOS.
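The ray-tracing parallelization described above follows the standard OpenMP work-sharing pattern of distributing independent rays across threads. The following is a minimal, self-contained sketch of that pattern with a toy one-dimensional "trace"; it shares no code or interfaces with MACOS.

    // Illustrative sketch, not MACOS source: OpenMP work-sharing over independent rays.
    #include <cstdio>
    #include <limits>
    #include <vector>

    struct Ray { double oz, dz; };   // toy 1-D ray: origin z, direction z

    // Toy per-ray kernel: distance to the plane z = 0 (stands in for a real trace).
    static double trace(const Ray& r)
    {
        return (r.dz != 0.0) ? -r.oz / r.dz : std::numeric_limits<double>::infinity();
    }

    int main()
    {
        std::vector<Ray> rays(1 << 20, Ray{1.0, -1.0});
        std::vector<double> path(rays.size());

        // Rays are independent, so a single parallel-for suffices.
        #pragma omp parallel for schedule(static)
        for (long i = 0; i < static_cast<long>(rays.size()); ++i)
            path[i] = trace(rays[i]);

        std::printf("path[0] = %g\n", path[0]);
        return 0;
    }

Because each ray touches only its own output slot, no synchronization is needed and the speedup is expected to scale with thread count, which is consistent with the linear ray-tracing speedup reported above.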
BioFVM: an efficient, parallelized diffusive transport solver for 3-D biological simulations
Ghaffarizadeh, Ahmadreza; Friedman, Samuel H.; Macklin, Paul
2016-01-01
Motivation: Computational models of multicellular systems require solving systems of PDEs for release, uptake, decay and diffusion of multiple substrates in 3D, particularly when incorporating the impact of drugs, growth substrates and signaling factors on cell receptors and subcellular systems biology. Results: We introduce BioFVM, a diffusive transport solver tailored to biological problems. BioFVM can simulate release and uptake of many substrates by cell and bulk sources, diffusion and decay in large 3D domains. It has been parallelized with OpenMP, allowing efficient simulations on desktop workstations or single supercomputer nodes. The code is stable even for large time steps, with linear computational cost scalings. Solutions are first-order accurate in time and second-order accurate in space. The code can be run by itself or as part of a larger simulator. Availability and implementation: BioFVM is written in C ++ with parallelization in OpenMP. It is maintained and available for download at http://BioFVM.MathCancer.org and http://BioFVM.sf.net under the Apache License (v2.0). Contact: paul.macklin@usc.edu. Supplementary information: Supplementary data are available at Bioinformatics online. PMID:26656933
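As a rough illustration of why such per-voxel substrate updates thread well with OpenMP, the sketch below applies an explicit finite-volume step to du/dt = D d2u/dx2 - lambda*u + S on a 1-D grid. It is only a schematic stand-in: BioFVM's own solvers are implicit, operate on 3-D meshes, and expose their own C++ API.

    // Minimal sketch, not BioFVM code: explicit diffusion-decay-source step, OpenMP-threaded.
    #include <cstdio>
    #include <vector>

    int main()
    {
        const int    N = 1 << 20;
        const double D = 1e-5, lambda = 0.01, S = 0.1, dx = 1.0, dt = 0.01;
        std::vector<double> u(N, 0.0), unew(N, 0.0);

        for (int step = 0; step < 100; ++step) {
            // Every interior voxel is updated independently from the old field.
            #pragma omp parallel for schedule(static)
            for (int i = 1; i < N - 1; ++i) {
                const double lap = (u[i - 1] - 2.0 * u[i] + u[i + 1]) / (dx * dx);
                unew[i] = u[i] + dt * (D * lap - lambda * u[i] + S);
            }
            u.swap(unew);   // boundaries stay at zero (Dirichlet-like)
        }
        std::printf("u[N/2] = %g\n", u[N / 2]);
        return 0;
    }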
Cao, Xu-Ni; Lin, Li; Zhou, Yu-Yan; Zhang, Wen; Shi, Guo-Yue; Yamamoto, Katsunobu; Jin, Li-Tong
2003-07-14
Microdialysis sampling coupled with liquid chromatography and electrochemical detection (LC-ECD) was developed and applied to study the interaction of 6-mercaptopurine (6-MP) with bovine serum albumin (BSA). In the LC-ECD, a multi-wall carbon nanotube electrode functionalized with carboxylic groups (MWNT-COOH CME) was used as the working electrode for the determination of 6-MP. The results indicated that this chemically modified electrode (CME) exhibited efficient electrocatalytic oxidation of 6-MP with relatively high sensitivity, stability and long lifetime. The peak currents of 6-MP were linear with its concentration over the range from 4.0 x 10(-7) to 1.0 x 10(-4) mol l(-1), with a calculated detection limit (S/N = 3) of 2.0 x 10(-7) mol l(-1). The method was successfully applied to assess the association constant (K) and the number of binding sites (n) on a BSA molecule, which, calculated by the Scatchard equation, were 3.97 x 10(3) mol(-1) l and 1.51, respectively. This method provides a fast, sensitive and simple technique for the study of drug-protein interactions.
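For reference, the Scatchard analysis mentioned above is the standard linearization of single-class binding data (generic notation, not taken from the paper):

    % Scatchard linearization for a single class of n equivalent sites
    % (r = moles of 6-MP bound per mole of BSA, [D_f] = free drug concentration):
    \[
    \frac{r}{[D_f]} \;=\; nK \;-\; rK ,
    \]
    % so plotting r/[D_f] against r gives a line of slope -K and x-intercept n.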
Improvements to the Unstructured Mesh Generator MESH3D
NASA Technical Reports Server (NTRS)
Thomas, Scott D.; Baker, Timothy J.; Cliff, Susan E.
1999-01-01
The AIRPLANE process starts with an aircraft geometry stored in a CAD system. The surface is modeled with a mesh of triangles and then the flow solver produces pressures at surface points which may be integrated to find forces and moments. The biggest advantage is that the grid generation bottleneck of the CFD process is eliminated when an unstructured tetrahedral mesh is used. MESH3D is the key to turning around the first analysis of a CAD geometry in days instead of weeks. The flow solver part of AIRPLANE has proven to be robust and accurate over a decade of use at NASA. It has been extensively validated with experimental data and compares well with other Euler flow solvers. AIRPLANE has been applied to all the HSR geometries treated at Ames over the course of the HSR program in order to verify the accuracy of other flow solvers. The unstructured approach makes handling complete and complex geometries very simple because only the surface of the aircraft needs to be discretized, i.e. covered with triangles. The volume mesh is created automatically by MESH3D. AIRPLANE runs well on multiple platforms. Vectorization on the Cray Y-MP is reasonable for a code that uses indirect addressing. Massively parallel computers such as the IBM SP2, SGI Origin 2000, and the Cray T3E have been used with an MPI version of the flow solver and the code scales very well on these systems. AIRPLANE can run on a desktop computer as well. AIRPLANE has a future. The unstructured technologies developed as part of the HSR program are now targeting high Reynolds number viscous flow simulation. The pacing item in this effort is Navier-Stokes mesh generation.
Flux Pinning Enhancement in YBa2Cu3O7-x Films for Coated Conductor Applications (Postprint)
2010-01-01
Maiorov, B.; Civale, L.; Lin, Y.; Hawley, M.E.; Maley, M.P.; Peterson, D.E.
Kim, Dae-Yeon; Park, Hyun; Lee, Sang-Hwan; Koo, Namin; Kim, Jeong-Gyu
2009-04-01
We investigated the arsenate tolerance mechanisms of Oenothera odorata by comparing two populations [one from a mine site (MP) and the other from an uncontaminated site (UP)] via exposure to hydroponic solutions containing arsenate (0-50 microM). The MP plants were significantly more tolerant to arsenate than UP plants. The UP plants accumulated more As in their shoots and roots than did the MP plants. The UP plants translocated up to 21 microg g(-1) of As into shoots, whereas MP plants translocated less As (up to 4.5 microg g(-1)) to shoots over all treatments. The lipid peroxidation results indicated that MP plants were less damaged by oxidative stress than were UP plants. Phytochelatin (PC) content correlated linearly with root As concentration in the MP (i.e., [PCs](root)=1.69x[As](root), r(2)=0.945) and UP (i.e., [PCs](root)=0.89x[As](root), r(2)=0.979) plants. This relationship suggests that an increased PC-to-As ratio may be associated with increased tolerance. Our results suggest that PC induction in roots plays a critical role in As tolerance of O. odorata.
Model potentials for main group elements Li through Rn
NASA Astrophysics Data System (ADS)
Sakai, Yoshiko; Miyoshi, Eisaku; Klobukowski, Mariusz; Huzinaga, Sigeru
1997-05-01
Model potential (MP) parameters and valence basis sets were systematically determined for the main group elements Li through Rn. For alkali and alkaline-earth metal atoms, the outermost core (n-1)p electrons were treated explicitly together with the ns valence electrons. For the remaining atoms, only the valence ns and np electrons were treated explicitly. The major relativistic effects at the level of Cowan and Griffin's quasi-relativistic Hartree-Fock method (QRHF) were incorporated in the MPs for all atoms heavier than Kr. The valence orbitals thus obtained have inner nodal structure. The reliability of the MP method was tested in calculations for X-, X, and X+ (X=Br, I, and At) at the SCF level and the results were compared with the corresponding values given by the numerical HF (or QRHF) calculations. Calculations that include electron correlation were done for X-, X, and X+ (X=Cl and Br) at the SDCI level and for As2 at the CASSCF and MRSDCI levels. These results were compared with those of all-electron (AE) calculations using the well-tempered basis sets. Close agreement between the MP and AE results was obtained at all levels of the treatment.
DOE Office of Scientific and Technical Information (OSTI.GOV)
A parallelization of the k-means++ seed selection algorithm on three distinct hardware platforms: GPU, multicore CPU, and multithreaded architecture. K-means++ was developed by David Arthur and Sergei Vassilvitskii in 2007 as an extension of the k-means data clustering technique. These algorithms allow people to cluster multidimensional data by attempting to minimize the mean distance of data points within a cluster. K-means++ improved upon traditional k-means by using a more intelligent approach to selecting the initial seeds for the clustering process. While k-means++ has become a popular alternative to traditional k-means clustering, little work has been done to parallelize this technique. We have developed original C++ code for parallelizing the algorithm on three unique hardware architectures: GPU using NVidia's CUDA/Thrust framework, multicore CPU using OpenMP, and the Cray XMT multithreaded architecture. By parallelizing the process for these platforms, we are able to perform k-means++ clustering much more quickly than it could be done before.
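A compact sketch of the seeding step being parallelized is given below; it follows the published k-means++ procedure (distance-squared weighted seed selection) with the per-point distance update threaded via OpenMP. Points are 1-D doubles for brevity, and this is not the project's actual GPU/CPU/XMT code.

    // Sketch under stated assumptions: k-means++ seeding with an OpenMP-parallel
    // distance update. Real data would be multidimensional vectors.
    #include <cstdio>
    #include <limits>
    #include <random>
    #include <vector>

    std::vector<double> kmeanspp_seeds(const std::vector<double>& pts, int k, unsigned rng_seed)
    {
        std::mt19937 rng(rng_seed);
        std::vector<double> d2(pts.size(), std::numeric_limits<double>::max());
        std::vector<double> centers;

        // First seed: uniform over the data set.
        centers.push_back(pts[std::uniform_int_distribution<size_t>(0, pts.size() - 1)(rng)]);

        while (static_cast<int>(centers.size()) < k) {
            const double c = centers.back();
            double total = 0.0;
            // Parallel update of each point's squared distance to its nearest chosen center.
            #pragma omp parallel for reduction(+ : total) schedule(static)
            for (long i = 0; i < static_cast<long>(pts.size()); ++i) {
                const double d = (pts[i] - c) * (pts[i] - c);
                if (d < d2[i]) d2[i] = d;
                total += d2[i];
            }
            // Sample the next seed with probability proportional to d2 (serial scan).
            double r = std::uniform_real_distribution<double>(0.0, total)(rng), acc = 0.0;
            size_t pick = pts.size() - 1;
            for (size_t i = 0; i < pts.size(); ++i) { acc += d2[i]; if (acc >= r) { pick = i; break; } }
            centers.push_back(pts[pick]);
        }
        return centers;
    }

    int main()
    {
        std::vector<double> pts = {0.0, 0.1, 0.2, 5.0, 5.1, 5.2, 10.0, 10.1};
        for (double c : kmeanspp_seeds(pts, 3, 42)) std::printf("%g ", c);
        std::printf("\n");
        return 0;
    }

The distance update dominates the cost and is embarrassingly parallel, which is why the seeding step maps naturally onto CUDA, OpenMP, or the XMT's many hardware threads; the weighted sampling itself is cheap and is left serial here.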
Preconditioned implicit solvers for the Navier-Stokes equations on distributed-memory machines
NASA Technical Reports Server (NTRS)
Ajmani, Kumud; Liou, Meng-Sing; Dyson, Rodger W.
1994-01-01
The GMRES method is parallelized, and combined with local preconditioning to construct an implicit parallel solver to obtain steady-state solutions for the Navier-Stokes equations of fluid flow on distributed-memory machines. The new implicit parallel solver is designed to preserve the convergence rate of the equivalent 'serial' solver. A static domain-decomposition is used to partition the computational domain amongst the available processing nodes of the parallel machine. The SPMD (Single-Program Multiple-Data) programming model is combined with message-passing tools to develop the parallel code on a 32-node Intel Hypercube and a 512-node Intel Delta machine. The implicit parallel solver is validated for internal and external flow problems, and is found to compare identically with flow solutions obtained on a Cray Y-MP/8. A peak computational speed of 2300 MFlops/sec has been achieved on 512 nodes of the Intel Delta machine, for a problem size of 1024 K equations (256 K grid points).
The fully relativistic implementation of the convergent close-coupling method
NASA Astrophysics Data System (ADS)
Bostock, Christopher James
2011-04-01
The calculation of accurate excitation and ionization cross sections for electron collisions with atoms and ions plays a fundamental role in atomic and molecular physics, laser physics, x-ray spectroscopy, plasma physics and chemistry. Within the veil of plasma physics lie important research areas affiliated with the lighting industry, nuclear fusion and astrophysics. For high energy projectiles or targets with a large atomic number it is presently understood that a scattering formalism based on the Dirac equation is required to incorporate relativistic effects. This tutorial outlines the development of the relativistic convergent close-coupling (RCCC) method and highlights the following three main accomplishments. (i) The inclusion of the Breit interaction, a relativistic correction to the Coulomb potential, in the RCCC method. This led to calculations that resolved a discrepancy between theory and experiment for the polarization of x-rays emitted by highly charged hydrogen-like ions excited by electron impact (Bostock et al 2009 Phys. Rev. A 80 052708). (ii) The extension of the RCCC method to accommodate two-electron and quasi-two-electron targets. The method was applied to electron scattering from mercury. Accurate plasma physics modelling of mercury-based fluorescent lamps requires detailed information on a large number of electron impact excitation cross sections involving transitions between various states (Bostock et al 2010 Phys. Rev. A 82 022713). (iii) The third accomplishment outlined in this tutorial is the restructuring of the RCCC computer code to utilize a hybrid OpenMP-MPI parallelization scheme which now enables the RCCC code to run on the latest high performance supercomputer architectures.
PARVMEC: An Efficient, Scalable Implementation of the Variational Moments Equilibrium Code
DOE Office of Scientific and Technical Information (OSTI.GOV)
Seal, Sudip K; Hirshman, Steven Paul; Wingen, Andreas
The ability to sustain magnetically confined plasma in a state of stable equilibrium is crucial for optimal and cost-effective operations of fusion devices like tokamaks and stellarators. The Variational Moments Equilibrium Code (VMEC) is the de-facto serial application used by fusion scientists to compute magnetohydrodynamics (MHD) equilibria and study the physics of three dimensional plasmas in confined configurations. Modern fusion energy experiments have larger system scales with more interactive experimental workflows, both demanding faster analysis turnaround times on computational workloads that are stressing the capabilities of sequential VMEC. In this paper, we present PARVMEC, an efficient, parallel version of its sequential counterpart, capable of scaling to thousands of processors on distributed memory machines. PARVMEC is a non-linear code, with multiple numerical physics modules, each with its own computational complexity. A detailed speedup analysis supported by scaling results on 1,024 cores of a Cray XC30 supercomputer is presented. Depending on the mode of PARVMEC execution, speedup improvements of one to two orders of magnitude are reported. PARVMEC equips fusion scientists for the first time with a state-of-the-art capability for rapid, high fidelity analyses of magnetically confined plasmas at unprecedented scales.
Shao, Meiyue; Aktulga, H. Metin; Yang, Chao; ...
2017-09-14
In this paper, we describe a number of recently developed techniques for improving the performance of large-scale nuclear configuration interaction calculations on high performance parallel computers. We show the benefit of using a preconditioned block iterative method to replace the Lanczos algorithm that has traditionally been used to perform this type of computation. The rapid convergence of the block iterative method is achieved by a proper choice of starting guesses of the eigenvectors and the construction of an effective preconditioner. These acceleration techniques take advantage of special structure of the nuclear configuration interaction problem which we discuss in detail. The use of a block method also allows us to improve the concurrency of the computation, and take advantage of the memory hierarchy of modern microprocessors to increase the arithmetic intensity of the computation relative to data movement. Finally, we also discuss the implementation details that are critical to achieving high performance on massively parallel multi-core supercomputers, and demonstrate that the new block iterative solver is two to three times faster than the Lanczos based algorithm for problems of moderate sizes on a Cray XC30 system.
Dynamic load balancing for petascale quantum Monte Carlo applications: The Alias method
DOE Office of Scientific and Technical Information (OSTI.GOV)
Sudheer, C. D.; Krishnan, S.; Srinivasan, A.
Diffusion Monte Carlo is the most accurate widely used Quantum Monte Carlo method for the electronic structure of materials, but it requires frequent load balancing or population redistribution steps to maintain efficiency and avoid accumulation of systematic errors on parallel machines. The load balancing step can be a significant factor affecting performance, and will become more important as the number of processing elements increases. We propose a new dynamic load balancing algorithm, the Alias Method, and evaluate it theoretically and empirically. An important feature of the new algorithm is that the load can be perfectly balanced with each process receiving at most one message. It is also optimal in the maximum size of messages received by any process. We also optimize its implementation to reduce network contention, a process facilitated by the low messaging requirement of the algorithm. Empirical results on the petaflop Cray XT Jaguar supercomputer at ORNL show up to 30% improvement in performance on 120,000 cores. The load balancing algorithm may be straightforwardly implemented in existing codes. The algorithm may also be employed by any method with many near identical computational tasks that requires load balancing.
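The name refers to the classical alias (Walker/Vose) table, in which every bucket needs at most one "alias" partner. The sketch below shows that textbook construction and O(1) sampling as background for the load-balancing analogue; it is not the authors' QMC population-redistribution code.

    // Classical Walker/Vose alias table: build in O(n), sample in O(1).
    #include <cstdio>
    #include <random>
    #include <vector>

    struct AliasTable {
        std::vector<double> prob;   // acceptance probability per bucket
        std::vector<int>    alias;  // fallback bucket
    };

    AliasTable build_alias(const std::vector<double>& w)
    {
        const int n = static_cast<int>(w.size());
        double sum = 0.0;
        for (double x : w) sum += x;

        AliasTable t{std::vector<double>(n), std::vector<int>(n)};
        std::vector<double> scaled(n);
        std::vector<int> small, large;
        for (int i = 0; i < n; ++i) {
            scaled[i] = w[i] * n / sum;
            (scaled[i] < 1.0 ? small : large).push_back(i);
        }
        while (!small.empty() && !large.empty()) {
            int s = small.back(); small.pop_back();
            int l = large.back(); large.pop_back();
            t.prob[s]  = scaled[s];
            t.alias[s] = l;
            scaled[l] -= 1.0 - scaled[s];            // donate excess mass from l to s
            (scaled[l] < 1.0 ? small : large).push_back(l);
        }
        for (int i : small) { t.prob[i] = 1.0; t.alias[i] = i; }
        for (int i : large) { t.prob[i] = 1.0; t.alias[i] = i; }
        return t;
    }

    // O(1) draw: pick a bucket uniformly, then accept it or take its alias.
    int sample(const AliasTable& t, std::mt19937& rng)
    {
        std::uniform_int_distribution<int> bucket(0, static_cast<int>(t.prob.size()) - 1);
        std::uniform_real_distribution<double> coin(0.0, 1.0);
        const int i = bucket(rng);
        return coin(rng) < t.prob[i] ? i : t.alias[i];
    }

    int main()
    {
        std::mt19937 rng(7);
        AliasTable t = build_alias({0.1, 0.2, 0.3, 0.4});
        std::vector<int> counts(4, 0);
        for (int k = 0; k < 100000; ++k) ++counts[sample(t, rng)];
        for (int c : counts) std::printf("%d ", c);
        std::printf("\n");
        return 0;
    }

The "at most one alias per bucket" property is the analogue of the paper's guarantee that each process receives at most one message when excess walker population is redistributed.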
High-performance finite-difference time-domain simulations of C-Mod and ITER RF antennas
NASA Astrophysics Data System (ADS)
Jenkins, Thomas G.; Smithe, David N.
2015-12-01
Finite-difference time-domain methods have, in recent years, developed powerful capabilities for modeling realistic ICRF behavior in fusion plasmas [1, 2, 3, 4]. When coupled with the power of modern high-performance computing platforms, such techniques allow the behavior of antenna near and far fields, and the flow of RF power, to be studied in realistic experimental scenarios at previously inaccessible levels of resolution. In this talk, we present results and 3D animations from high-performance FDTD simulations on the Titan Cray XK7 supercomputer, modeling both Alcator C-Mod's field-aligned ICRF antenna and the ITER antenna module. Much of this work focuses on scans over edge density, and tailored edge density profiles, to study dispersion and the physics of slow wave excitation in the immediate vicinity of the antenna hardware and SOL. An understanding of the role of the lower-hybrid resonance in low-density scenarios is emerging, and possible implications of this for the NSTX launcher and power balance are also discussed. In addition, we discuss ongoing work centered on using these simulations to estimate sputtering and impurity production, as driven by the self-consistent sheath potentials at antenna surfaces.
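For orientation, the core update that any FDTD scheme advances is the leapfrogged Yee discretization of Maxwell's equations; the 1-D vacuum form is shown below (the simulations above add cold-plasma and sheath models and full 3-D antenna geometry):

    % Leapfrogged 1-D Yee updates (vacuum), with E_x at integer nodes i and
    % H_y staggered at half nodes i+1/2:
    \[
    H_y^{\,n+1/2}(i+\tfrac12) = H_y^{\,n-1/2}(i+\tfrac12)
      - \frac{\Delta t}{\mu_0\,\Delta z}\bigl[E_x^{\,n}(i+1) - E_x^{\,n}(i)\bigr],
    \]
    \[
    E_x^{\,n+1}(i) = E_x^{\,n}(i)
      - \frac{\Delta t}{\varepsilon_0\,\Delta z}\bigl[H_y^{\,n+1/2}(i+\tfrac12) - H_y^{\,n+1/2}(i-\tfrac12)\bigr].
    \]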
High-performance finite-difference time-domain simulations of C-Mod and ITER RF antennas
DOE Office of Scientific and Technical Information (OSTI.GOV)
Jenkins, Thomas G., E-mail: tgjenkins@txcorp.com; Smithe, David N., E-mail: smithe@txcorp.com
Finite-difference time-domain methods have, in recent years, developed powerful capabilities for modeling realistic ICRF behavior in fusion plasmas [1, 2, 3, 4]. When coupled with the power of modern high-performance computing platforms, such techniques allow the behavior of antenna near and far fields, and the flow of RF power, to be studied in realistic experimental scenarios at previously inaccessible levels of resolution. In this talk, we present results and 3D animations from high-performance FDTD simulations on the Titan Cray XK7 supercomputer, modeling both Alcator C-Mod’s field-aligned ICRF antenna and the ITER antenna module. Much of this work focuses on scans over edge density, and tailored edge density profiles, to study dispersion and the physics of slow wave excitation in the immediate vicinity of the antenna hardware and SOL. An understanding of the role of the lower-hybrid resonance in low-density scenarios is emerging, and possible implications of this for the NSTX launcher and power balance are also discussed. In addition, we discuss ongoing work centered on using these simulations to estimate sputtering and impurity production, as driven by the self-consistent sheath potentials at antenna surfaces.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Shao, Meiyue; Aktulga, H. Metin; Yang, Chao
In this paper, we describe a number of recently developed techniques for improving the performance of large-scale nuclear configuration interaction calculations on high performance parallel computers. We show the benefit of using a preconditioned block iterative method to replace the Lanczos algorithm that has traditionally been used to perform this type of computation. The rapid convergence of the block iterative method is achieved by a proper choice of starting guesses of the eigenvectors and the construction of an effective preconditioner. These acceleration techniques take advantage of special structure of the nuclear configuration interaction problem which we discuss in detail. The use of a block method also allows us to improve the concurrency of the computation, and take advantage of the memory hierarchy of modern microprocessors to increase the arithmetic intensity of the computation relative to data movement. Finally, we also discuss the implementation details that are critical to achieving high performance on massively parallel multi-core supercomputers, and demonstrate that the new block iterative solver is two to three times faster than the Lanczos based algorithm for problems of moderate sizes on a Cray XC30 system.
A distributed-memory approximation algorithm for maximum weight perfect bipartite matching
DOE Office of Scientific and Technical Information (OSTI.GOV)
Azad, Ariful; Buluc, Aydin; Li, Xiaoye S.
We design and implement an efficient parallel approximation algorithm for the problem of maximum weight perfect matching in bipartite graphs, i.e. the problem of finding a set of non-adjacent edges that covers all vertices and has maximum weight. This problem differs from the maximum weight matching problem, for which scalable approximation algorithms are known. It is primarily motivated by finding good pivots in scalable sparse direct solvers before factorization where sequential implementations of maximum weight perfect matching algorithms, such as those available in MC64, are widely used due to the lack of scalable alternatives. To overcome this limitation, we propose a fully parallel distributed memory algorithm that first generates a perfect matching and then searches for weight-augmenting cycles of length four in parallel and iteratively augments the matching with a vertex disjoint set of such cycles. For most practical problems the weights of the perfect matchings generated by our algorithm are very close to the optimum. An efficient implementation of the algorithm scales up to 256 nodes (17,408 cores) on a Cray XC40 supercomputer and can solve instances that are too large to be handled by a single node using the sequential algorithm.
Thread-level parallelization and optimization of NWChem for the Intel MIC architecture
DOE Office of Scientific and Technical Information (OSTI.GOV)
Shan, Hongzhang; Williams, Samuel; de Jong, Wibe
In the multicore era it was possible to exploit the increase in on-chip parallelism by simply running multiple MPI processes per chip. Unfortunately, manycore processors' greatly increased thread- and data-level parallelism coupled with a reduced memory capacity demand an altogether different approach. In this paper we explore augmenting two NWChem modules, triples correction of the CCSD(T) and Fock matrix construction, with OpenMP in order that they might run efficiently on future manycore architectures. As the next NERSC machine will be a self-hosted Intel MIC (Xeon Phi) based supercomputer, we leverage an existing MIC testbed at NERSC to evaluate our experiments. In order to proxy the fact that future MIC machines will not have a host processor, we run all of our experiments in native mode. We found that while straightforward application of OpenMP to the deep loop nests associated with the tensor contractions of CCSD(T) was sufficient in attaining high performance, significant effort was required to safely and efficiently thread the TEXAS integral package when constructing the Fock matrix. Ultimately, our new MPI+OpenMP hybrid implementations attain up to 65× better performance for the triples part of the CCSD(T) due in large part to the fact that the limited on-card memory limits the existing MPI implementation to a single process per card. Additionally, we obtain up to 1.6× better performance on Fock matrix constructions when compared with the best MPI implementations running multiple processes per card.
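The hybrid structure referred to above, one MPI rank per card or node with OpenMP threads sharing its memory, reduces to a small skeleton; the sketch below is illustrative only and contains no NWChem code.

    // Minimal MPI+OpenMP hybrid skeleton: each rank owns a slab of work,
    // threads share it, and a collective combines the rank-local results.
    #include <mpi.h>
    #include <omp.h>
    #include <cstdio>
    #include <vector>

    int main(int argc, char** argv)
    {
        int provided = 0;
        MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);

        int rank = 0, nranks = 1;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &nranks);

        const long n_local = 1 << 22;
        std::vector<double> x(n_local, 1.0);

        double local_sum = 0.0;
        #pragma omp parallel for reduction(+ : local_sum) schedule(static)
        for (long i = 0; i < n_local; ++i)
            local_sum += x[i] * x[i];

        double global_sum = 0.0;
        MPI_Allreduce(&local_sum, &global_sum, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);

        if (rank == 0)
            std::printf("ranks=%d threads/rank=%d global_sum=%g\n",
                        nranks, omp_get_max_threads(), global_sum);

        MPI_Finalize();
        return 0;
    }

With one rank per card and many threads inside it, the per-rank memory footprint is paid once rather than per process, which is the constraint driving the 65x improvement quoted above.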
Chang, C Y; Yuan, F G
2018-05-16
Guided wave dispersion curves in isotropic and anisotropic materials are extracted automatically from measured data by the Matrix Pencil (MP) method, working in either the k-t or the x-ω domain with a broadband signal. A piezoelectric wafer emits a broadband excitation, a linear chirp signal, to generate guided waves in the plate. The propagating waves are measured at discrete locations along lines using a one-dimensional laser Doppler vibrometer (1-D LDV). Measurements are first Fourier transformed into either the wavenumber-time (k-t) domain or the space-frequency (x-ω) domain. The MP method is then employed to extract the dispersion curves explicitly associated with different wave modes. In addition, the phase and group velocities are deduced from the relations between wavenumbers and frequencies. In this research, extraction of the dispersion relations of an aluminum plate by the MP method in the k-t and x-ω domains is demonstrated and compared with the two-dimensional Fourier transform (2-D FFT). Other experiments on a thicker aluminum plate for higher modes and on a composite plate are analyzed by the MP method. Extracted relations for the composite plate are confirmed by three-dimensional (3-D) theoretical curves computed numerically. The results show that the MP method not only distinguishes the dispersion curves of the isotropic material more accurately, but also agrees well with the theoretical curves for the anisotropic, laminated material. Copyright © 2018 Elsevier B.V. All rights reserved.
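In generic notation (not the paper's), the matrix pencil step at a fixed frequency can be summarized as follows:

    % At a fixed angular frequency \omega, the field sampled at uniform positions
    % x_m = m\,\Delta x is modeled as a sum of exponentials:
    \[
    s(x_m) \;\approx\; \sum_{i=1}^{M} a_i\, z_i^{\,m}, \qquad z_i = e^{-\mathrm{j}\,k_i(\omega)\,\Delta x}.
    \]
    % Stacking the samples into shifted Hankel matrices Y_0 and Y_1, the poles z_i
    % are the values of z for which the pencil
    \[
    Y_1 - z\,Y_0
    \]
    % loses rank (its nonzero generalized eigenvalues). Each pole gives one
    % dispersion point k_i(\omega) = \mathrm{j}\,\ln(z_i)/\Delta x, and sweeping
    % \omega traces out the curves mode by mode.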
OPAL: An Open-Source MPI-IO Library over Cray XT
DOE Office of Scientific and Technical Information (OSTI.GOV)
Yu, Weikuan; Vetter, Jeffrey S; Canon, Richard Shane
Parallel IO over Cray XT is supported by a vendor-supplied MPI-IO package. This package contains a proprietary ADIO implementation built on top of the sysio library. While it is reasonable to maintain a stable code base for application scientists' convenience, it is also very important to the system developers and researchers to analyze and assess the effectiveness of parallel IO software, and accordingly, tune and optimize the MPI-IO implementation. A proprietary parallel IO code base relinquishes such flexibilities. On the other hand, a generic UFS-based MPI-IO implementation is typically used on many Linux-based platforms. We have developed an open-source MPI-IO package over Lustre, referred to as OPAL (OPportunistic and Adaptive MPI-IO Library over Lustre). OPAL provides a single source-code base for MPI-IO over Lustre on Cray XT and Linux platforms. Compared to the Cray implementation, OPAL provides a number of good features, including arbitrary specification of striping patterns and Lustre-stripe aligned file domain partitioning. This paper presents the performance comparisons between OPAL and Cray's proprietary implementation. Our evaluation demonstrates that OPAL achieves performance comparable to the Cray implementation. We also exemplify the benefits of an open source package in revealing the underpinning of the parallel IO performance.
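From the application side, the interface such ADIO layers sit beneath is the standard MPI-IO API; a minimal collective-write example is sketched below (generic MPI-IO usage, not OPAL or Cray internals):

    // Each rank writes its contiguous block of doubles with a collective call,
    // letting the MPI-IO layer aggregate and align requests (e.g., to Lustre stripes).
    #include <mpi.h>
    #include <vector>

    int main(int argc, char** argv)
    {
        MPI_Init(&argc, &argv);
        int rank = 0;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        const int n_local = 1 << 20;                       // doubles per rank
        std::vector<double> buf(n_local, static_cast<double>(rank));

        MPI_File fh;
        MPI_File_open(MPI_COMM_WORLD, "output.dat",
                      MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);

        const MPI_Offset offset = static_cast<MPI_Offset>(rank) * n_local * sizeof(double);
        MPI_File_write_at_all(fh, offset, buf.data(), n_local, MPI_DOUBLE, MPI_STATUS_IGNORE);

        MPI_File_close(&fh);
        MPI_Finalize();
        return 0;
    }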
Numerical Methods for 2-Dimensional Modeling
1980-12-01
... high-order finite element methods, and a multidimensional version of the method of lines, both utilizing an optimized stiff integrator for the time ... integration. The finite element methods have proved disappointing, but the method of lines has provided an unexpectedly large gain in speed. Two ... diffusion problems with the same number of unknowns (a 21 x 41 grid), solved by second-order finite element methods, took over seven minutes on the Cray-1 ...
Three-Dimensional Analysis and Modeling of a Wankel Engine
NASA Technical Reports Server (NTRS)
Raju, M. S.; Willis, E. A.
1991-01-01
A new computer code, AGNI-3D, has been developed for the modeling of combustion, spray, and flow properties in a stratified-charge rotary engine (SCRE). The mathematical and numerical details of the new code are described by the first author in a separate NASA publication. The solution procedure is based on an Eulerian-Lagrangian approach where the unsteady, three-dimensional Navier-Stokes equations for a perfect gas-mixture with variable properties are solved in generalized, Eulerian coordinates on a moving grid by making use of an implicit finite-volume, Steger-Warming flux vector splitting scheme. The liquid-phase equations are solved in Lagrangian coordinates. The engine configuration studied was similar to existing rotary engine flow-visualization and hot-firing test rigs. The results of limited test cases indicate a good degree of qualitative agreement between the predicted and measured pressures. It is conjectured that the impulsive nature of the torque generated by the observed pressure nonuniformity may be one of the mechanisms responsible for the excessive wear of the timing gears observed during the early stages of the rotary combustion engine (RCE) development. It was identified that the turbulence intensities near top-dead-center were dominated by the compression process and only slightly influenced by the intake and exhaust processes. Slow mixing resulting from small turbulence intensities within the rotor pocket and also from a lack of formation of any significant recirculation regions within the rotor pocket were identified as the major factors leading to incomplete combustion. Detailed flowfield results during exhaust and intake, fuel injection, fuel vaporization, combustion, mixing and expansion processes are also presented. The numerical procedure is very efficient as it takes 7 to 10 CPU hours on a CRAY Y-MP for one entire engine cycle when the computations are performed over a 31 x16 x 20 grid.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Computational Research Division, Lawrence Berkeley National Laboratory; NERSC, Lawrence Berkeley National Laboratory; Computer Science Department, University of California, Berkeley
2009-05-04
We apply auto-tuning to a hybrid MPI-pthreads lattice Boltzmann computation running on the Cray XT4 at National Energy Research Scientific Computing Center (NERSC). Previous work showed that multicore-specific auto-tuning can improve the performance of lattice Boltzmann magnetohydrodynamics (LBMHD) by a factor of 4x when running on dual- and quad-core Opteron dual-socket SMPs. We extend these studies to the distributed memory arena via a hybrid MPI/pthreads implementation. In addition to conventional auto-tuning at the local SMP node, we tune at the message-passing level to determine the optimal aspect ratio as well as the correct balance between MPI tasks and threads per MPI task. Our study presents a detailed performance analysis when moving along an isocurve of constant hardware usage: fixed total memory, total cores, and total nodes. Overall, our work points to approaches for improving intra- and inter-node efficiency on large-scale multicore systems for demanding scientific applications.
ALEGRA -- A massively parallel h-adaptive code for solid dynamics
DOE Office of Scientific and Technical Information (OSTI.GOV)
Summers, R.M.; Wong, M.K.; Boucheron, E.A.
1997-12-31
ALEGRA is a multi-material, arbitrary-Lagrangian-Eulerian (ALE) code for solid dynamics designed to run on massively parallel (MP) computers. It combines the features of modern Eulerian shock codes, such as CTH, with modern Lagrangian structural analysis codes using an unstructured grid. ALEGRA is being developed for use on the teraflop supercomputers to conduct advanced three-dimensional (3D) simulations of shock phenomena important to a variety of systems. ALEGRA was designed with the Single Program Multiple Data (SPMD) paradigm, in which the mesh is decomposed into sub-meshes so that each processor gets a single sub-mesh with approximately the same number of elements. Using this approach the authors have been able to produce a single code that can scale from one processor to thousands of processors. A current major effort is to develop efficient, high precision simulation capabilities for ALEGRA, without the computational cost of using a global highly resolved mesh, through flexible, robust h-adaptivity of finite elements. H-adaptivity is the dynamic refinement of the mesh by subdividing elements, thus changing the characteristic element size and reducing numerical error. The authors are working on several major technical challenges that must be met to make effective use of HAMMER on MP computers.
Comparison of Origin 2000 and Origin 3000 Using NAS Parallel Benchmarks
NASA Technical Reports Server (NTRS)
Turney, Raymond D.
2001-01-01
This report describes results of benchmark tests on the Origin 3000 system currently being installed at the NASA Ames National Advanced Supercomputing facility. This machine will ultimately contain 1024 R14K processors. The first part of the system, installed in November 2000 and named mendel, is an Origin 3000 with 128 R12K processors. For comparison purposes, the tests were also run on lomax, an Origin 2000 with R12K processors. The BT, LU, and SP application benchmarks in the NAS Parallel Benchmark Suite and the kernel benchmark FT were chosen to determine system performance and measure the impact of changes on the machine as it evolves. Having been written to measure performance on Computational Fluid Dynamics applications, these benchmarks are assumed appropriate to represent the NAS workload. Since the NAS runs both message passing (MPI) and shared-memory, compiler directive type codes, both MPI and OpenMP versions of the benchmarks were used. The MPI versions used were the latest official release of the NAS Parallel Benchmarks, version 2.3. The OpenMP versions used were PBN3b2, a beta version that is in the process of being released. NPB 2.3 and PBN 3b2 are technically different benchmarks, and NPB results are not directly comparable to PBN results.
LFRic: Building a new Unified Model
NASA Astrophysics Data System (ADS)
Melvin, Thomas; Mullerworth, Steve; Ford, Rupert; Maynard, Chris; Hobson, Mike
2017-04-01
The LFRic project, named for Lewis Fry Richardson, aims to develop a replacement for the Met Office Unified Model in order to meet the challenges which will be presented by the next generation of exascale supercomputers. This project, a collaboration between the Met Office, STFC Daresbury and the University of Manchester, builds on the earlier GungHo project to redesign the dynamical core, in partnership with NERC. The new atmospheric model aims to retain the performance of the current ENDGame dynamical core and associated subgrid physics, while also enabling a far greater scalability and flexibility to accommodate future supercomputer architectures. Design of the model revolves around a principle of a 'separation of concerns', whereby the natural science aspects of the code can be developed without worrying about the underlying architecture, while machine dependent optimisations can be carried out at a high level. These principles are put into practice through the development of an autogenerated Parallel Systems software layer (known as the PSy layer) using a domain-specific compiler called PSyclone. The prototype model includes a re-write of the dynamical core using a mixed finite element method, in which different function spaces are used to represent the various fields. It is able to run in parallel with MPI and OpenMP and has been tested on over 200,000 cores. In this talk an overview of both the natural science and computational science implementations of the model will be presented.
Do Some X-ray Stars Have White Dwarf Companions?
NASA Technical Reports Server (NTRS)
McCollum, Bruce
1995-01-01
Some Be stars which are intermittent X-ray sources may have white dwarf companions rather than neutron stars. It is not possible to prove or rule out the existence of Be+WD systems using X-ray or optical data. However, the presence of a white dwarf could be established by the detection of its EUV continuum shortward of the Be star's continuum turnover at 1000 Å. Either the detection or the nondetection of Be+WD systems would have implications for models of Be star variability, models of Be binary system formation and evolution, and models of wind-fed accretion.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Maxwell, Don E; Ezell, Matthew A; Becklehimer, Jeff
While sites generally have systems in place to monitor the health of Cray computers themselves, often the cooling systems are ignored until a computer failure requires investigation into the source of the failure. The Liebert XDP units used to cool the Cray XE/XK models as well as the Cray proprietary cooling system used for the Cray XC30 models provide data useful for health monitoring. Unfortunately, this valuable information is often available only to custom solutions not accessible by a center-wide monitoring system or is simply ignored entirely. In this paper, methods and tools used to harvest the monitoring data available are discussed, and the implementation needed to integrate the data into a center-wide monitoring system at the Oak Ridge National Laboratory is provided.
Mixed polyanion glass cathodes: Glass-state conversion reactions
Kercher, Andrew K.; Kolopus, James A.; Carroll, Kyler; ...
2015-11-10
Mixed polyanion (MP) glasses can undergo glass-state conversion (GSC) reactions to provide an alternate class of high-capacity cathode materials. GSC reactions have been demonstrated in phosphate/vanadate glasses with Ag, Co, Cu, Fe, and Ni cations. These MP glasses provided high capacity and good high power performance, but suffer from moderate voltages, large voltage hysteresis, and significant capacity fade with cycling. Details of the GSC reaction have been revealed by x-ray absorption spectroscopy, electron microscopy, and energy dispersive x-ray spectroscopy of ex situ cathodes at key states of charge. Using the Open Quantum Materials Database (OQMD), a computational thermodynamic model has been developed to predict the near-equilibrium voltages of glass-state conversion reactions in MP glasses.
NASA Astrophysics Data System (ADS)
Xu, Chuanfu; Deng, Xiaogang; Zhang, Lilun; Fang, Jianbin; Wang, Guangxue; Jiang, Yi; Cao, Wei; Che, Yonggang; Wang, Yongxian; Wang, Zhenghua; Liu, Wei; Cheng, Xinghua
2014-12-01
Programming and optimizing complex, real-world CFD codes on current many-core accelerated HPC systems is very challenging, especially when collaborating CPUs and accelerators to fully tap the potential of heterogeneous systems. In this paper, with a tri-level hybrid and heterogeneous programming model using MPI + OpenMP + CUDA, we port and optimize our high-order multi-block structured CFD software HOSTA on the GPU-accelerated TianHe-1A supercomputer. HOSTA adopts two self-developed high-order compact definite difference schemes WCNS and HDCS that can simulate flows with complex geometries. We present a dual-level parallelization scheme for efficient multi-block computation on GPUs and perform particular kernel optimizations for high-order CFD schemes. The GPU-only approach achieves a speedup of about 1.3 when comparing one Tesla M2050 GPU with two Xeon X5670 CPUs. To achieve a greater speedup, we collaborate CPU and GPU for HOSTA instead of using a naive GPU-only approach. We present a novel scheme to balance the loads between the store-poor GPU and the store-rich CPU. Taking CPU and GPU load balance into account, we improve the maximum simulation problem size per TianHe-1A node for HOSTA by 2.3×, meanwhile the collaborative approach can improve the performance by around 45% compared to the GPU-only approach. Further, to scale HOSTA on TianHe-1A, we propose a gather/scatter optimization to minimize PCI-e data transfer times for ghost and singularity data of 3D grid blocks, and overlap the collaborative computation and communication as far as possible using some advanced CUDA and MPI features. Scalability tests show that HOSTA can achieve a parallel efficiency of above 60% on 1024 TianHe-1A nodes. With our method, we have successfully simulated an EET high-lift airfoil configuration containing 800M cells and China's large civil airplane configuration containing 150M cells. To our best knowledge, those are the largest-scale CPU-GPU collaborative simulations that solve realistic CFD problems with both complex configurations and high-order schemes.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Xu, Chuanfu, E-mail: xuchuanfu@nudt.edu.cn; Deng, Xiaogang; Zhang, Lilun
Programming and optimizing complex, real-world CFD codes on current many-core accelerated HPC systems is very challenging, especially when collaborating CPUs and accelerators to fully tap the potential of heterogeneous systems. In this paper, with a tri-level hybrid and heterogeneous programming model using MPI + OpenMP + CUDA, we port and optimize our high-order multi-block structured CFD software HOSTA on the GPU-accelerated TianHe-1A supercomputer. HOSTA adopts two self-developed high-order compact definite difference schemes WCNS and HDCS that can simulate flows with complex geometries. We present a dual-level parallelization scheme for efficient multi-block computation on GPUs and perform particular kernel optimizations for high-order CFD schemes. The GPU-only approach achieves a speedup of about 1.3 when comparing one Tesla M2050 GPU with two Xeon X5670 CPUs. To achieve a greater speedup, we collaborate CPU and GPU for HOSTA instead of using a naive GPU-only approach. We present a novel scheme to balance the loads between the store-poor GPU and the store-rich CPU. Taking CPU and GPU load balance into account, we improve the maximum simulation problem size per TianHe-1A node for HOSTA by 2.3×, meanwhile the collaborative approach can improve the performance by around 45% compared to the GPU-only approach. Further, to scale HOSTA on TianHe-1A, we propose a gather/scatter optimization to minimize PCI-e data transfer times for ghost and singularity data of 3D grid blocks, and overlap the collaborative computation and communication as far as possible using some advanced CUDA and MPI features. Scalability tests show that HOSTA can achieve a parallel efficiency of above 60% on 1024 TianHe-1A nodes. With our method, we have successfully simulated an EET high-lift airfoil configuration containing 800M cells and China's large civil airplane configuration containing 150M cells. To our best knowledge, those are the largest-scale CPU–GPU collaborative simulations that solve realistic CFD problems with both complex configurations and high-order schemes.
Chakraborty, Somnath; Ghosh, Upasana; Balasubramanian, Thangavel; Das, Punyabrata
2014-01-01
Objective: To screen, isolate and optimize an anti-white spot syndrome virus (WSSV) drug derived from various marine floral ecosystems and to evaluate its efficacy in a host–pathogen interaction model. Methods: Thirty species of marine plants were subjected to Soxhlet extraction using water, ethanol, methanol and hexane as solvents. The 120 plant isolates thus obtained were screened for their in vivo anti-WSSV property in Litopenaeus vannamei. By means of chemical processes, the purified anti-WSSV plant isolate MP07X was derived. The drug was optimized at various concentrations. Viral and immune genes were analysed using reverse transcriptase PCR to confirm the potency of the drug. Results: Nine plant isolates yielded significant survivability in the host. The formulated drug MP07X gave 85% survivability in the host. The surviving shrimps were nested-PCR negative at the end of the 15 d experiment. The lowest concentration of MP07X required intramuscularly for virucidal activity was 10 mg/mL. Shrimp given an oral dosage of 1 000 mg/kg body weight/day survived at a rate of 85%. Neither VP28 nor ie1 was expressed in the test samples at the 42nd and 84th hour post viral infection. Conclusions: The drug MP07X derived from Rhizophora mucronata is a potent anti-WSSV drug. PMID:25183065
Rodríguez Guilbe, María M.; Alfaro Malavé, Elisa C.; Akerboom, Jasper; Marvin, Jonathan S.; Looger, Loren L.; Schreiter, Eric R.
2008-01-01
Fluorescent proteins and their engineered variants have played an important role in the study of biology. The genetically encoded calcium-indicator protein GCaMP2 comprises a circularly permuted fluorescent protein coupled to the calcium-binding protein calmodulin and a calmodulin target peptide, M13, derived from the intracellular calmodulin target myosin light-chain kinase and has been used to image calcium transients in vivo. To aid rational efforts to engineer improved variants of GCaMP2, this protein was crystallized in the calcium-saturated form. X-ray diffraction data were collected to 2.0 Å resolution. The crystals belong to space group C2, with unit-cell parameters a = 126.1, b = 47.1, c = 68.8 Å, β = 100.5° and one GCaMP2 molecule in the asymmetric unit. The structure was phased by molecular replacement and refinement is currently under way. PMID:18607093
The ASC Sequoia Programming Model
DOE Office of Scientific and Technical Information (OSTI.GOV)
Seager, M
2008-08-06
In the late 1980s and early 1990s, Lawrence Livermore National Laboratory was deeply engrossed in determining the next-generation programming model for the Integrated Design Codes (IDC) beyond vectorization for the Cray 1s series of computers. The vector model, developed in the mid-1970s first for the CDC 7600 and later extended from stack-based vector operation to memory-to-memory operations for the Cray 1s, lasted approximately 20 years (see Slide 5). The Cray vector era was deemed an extremely long-lived era as it allowed vector codes to be developed over time (the Cray 1s were faster in scalar mode than the CDC 7600) with vector unit utilization increasing incrementally over time. The other attributes of the Cray vector era at LLNL were that we developed, supported and maintained the Operating System (LTSS and later NLTSS), communications protocols (LINCS), Compilers (Civic Fortran77 and Model), operating system tools (e.g., batch system, job control scripting, loaders, debuggers, editors, graphics utilities, you name it) and math and highly machine-optimized libraries (e.g., SLATEC and STACKLIB). Although LTSS was adopted by Cray for early system generations, they later developed the COS and UNICOS operating systems and environments on their own. In the late 1970s and early 1980s, two trends appeared that made the Cray vector programming model (described above, including both the hardware and system software aspects) seem potentially dated and slated for major revision. These trends were the appearance of low-cost CMOS microprocessors and their attendant departmental and mini-computers and later workstations and personal computers. With the widespread adoption of Unix in the early 1980s, it appeared that LLNL (and the other DOE Labs) would be left out of the mainstream of computing without a rapid transition to these 'Killer Micros' and modern OS and tools environments. The other interesting advance in the period was that systems were being developed with multiple 'cores' in them, called Symmetric Multi-Processor or Shared Memory Processor (SMP) systems. The parallel revolution had begun. The Laboratory started a small 'parallel processing project' in 1983 to study the new technology and its application to scientific computing with four people: Tim Axelrod, Pete Eltgroth, Paul Dubois and Mark Seager. Two years later, Eugene Brooks joined the team. This team focused on Unix and 'killer micro' SMPs. Indeed, Eugene Brooks was credited with coming up with the 'Killer Micro' term. After several generations of SMP platforms (e.g., the Sequent Balance 8000 with 8 33 MHz NS32032s, the Alliant FX/8 with 8 MC68020s and FPGA-based vector units, and finally the BBN Butterfly with 128 cores), it became apparent to us that the killer micro revolution would indeed overtake Crays and that we definitely needed a new programming and systems model. The model developed by Mark Seager and Dale Nielsen focused on both the system aspects (Slide 3) and the code development aspects (Slide 4). Although now succinctly captured in two attached slides, at the time there was tremendous ferment in the research community as to what parallel programming model would emerge, dominate and survive. In addition, we wanted a model that would provide portability between platforms of a single generation but also longevity over multiple--and hopefully--many generations. Only after we developed the 'Livermore Model' and worked it out in considerable detail did it become obvious that what we came up with was the right approach.
In a nutshell, the applications programming model of the Livermore Model posited that SMP parallelism would ultimately not scale indefinitely and one would have to bite the bullet and implement MPI parallelism within the Integrated Design Code (IDC). We also had a major emphasis on doing everything in a completely standards-based, portable methodology with POSIX/Unix as the target environment. We decided against specialized libraries like STACKLIB for performance, but kept as many general-purpose, portable math libraries as were needed by the codes. Third, we assumed that the SMPs in clusters would evolve in time to become more powerful, feature-rich and, in particular, offer more cores. Thus, we focused on OpenMP and POSIX PThreads for programming SMP parallelism. These code porting efforts were led by Dale Nielsen, A-Division code group leader, and Randy Christensen, B-Division code group leader. Most of the porting effort revolved around removing 'Crayisms' in the codes: artifacts of LTSS/NLTSS, Civic compiler extensions beyond Fortran77, IO libraries and dealing with new code control languages (we switched to Perl and later to Python). Adding MPI to the codes was initially problematic and error-prone because the programmers used MPI directly and sprinkled the calls throughout the code.
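The layering described above, MPI between nodes with OpenMP (or PThreads) inside each SMP node, is still the canonical hybrid pattern. A minimal, generic sketch in C follows; the loop body is a placeholder, not a fragment of any IDC.

    #include <mpi.h>
    #include <omp.h>
    #include <stdio.h>

    /* Generic MPI-across-nodes / OpenMP-within-node skeleton. */
    int main(int argc, char **argv)
    {
        int rank, nranks;
        double local = 0.0, global = 0.0;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &nranks);

        /* SMP parallelism inside the rank via OpenMP threads */
        #pragma omp parallel for reduction(+:local)
        for (int i = 0; i < 1000000; i++)
            local += 1.0e-6;              /* stand-in for per-zone work */

        /* domain-level parallelism across ranks via MPI */
        MPI_Reduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
        if (rank == 0)
            printf("sum over %d ranks = %g\n", nranks, global);

        MPI_Finalize();
        return 0;
    }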
Belin, C; Schmitt, C; Demangeat, G; Komar, V; Pinck, L; Fuchs, M
2001-12-05
The nepovirus Grapevine fanleaf virus (GFLV) is specifically transmitted by the nematode Xiphinema index. To identify the RNA2-encoded proteins involved in X. index-mediated spread of GFLV, chimeric RNA2 constructs were engineered by replacing the 2A, 2B(MP), and/or 2C(CP) sequences of GFLV with their counterparts in Arabis mosaic virus (ArMV), a closely related nepovirus which is transmitted by Xiphinema diversicaudatum but not by X. index. Among the recombinant viruses obtained from transcripts of GFLV RNA1 and chimeric RNA2, only those which contained the 2C(CP) gene (504 aa) and the 9 contiguous C-terminal residues of 2B(MP) of GFLV were transmitted by X. index as efficiently as natural and synthetic wild-type GFLV, regardless of the origin of the 2A and 2B(MP) genes. As expected, ArMV was not transmitted, probably because it is not retained by X. index. These results indicate that the determinants responsible for the specific spread of GFLV by X. index are located within the 513 C-terminal residues of the polyprotein encoded by RNA2. Copyright 2001 Elsevier Science.
Ab initio SCF study of the barrier to internal rotation in simple amides. Part 3. Thioamides
NASA Astrophysics Data System (ADS)
Vassilev, Nikolay G.; Dimitrov, Valentin S.
2003-06-01
The free energies of activation for rotation about the thiocarbonyl C-N bond in X-C(S)N(CH3)2 (X=H, F, Cl, CH3, CF3) were calculated at the MP2(fc)/6-31+G*//6-31G* and MP2(fc)/6-311++G**//6-311++G** levels and compared with literature NMR gas-phase data. The results of the calculations indicate that the nonbonded interactions in the ground state (GS) are mainly responsible for the differences in the rotational barriers. For X=H, CH3 and CF3, the anti transition state (TS) is more stable; for the case X=Cl, the syn TS is more stable, while for X=F, the two TS are energetically almost equivalent.
Effects of Ordering Strategies and Programming Paradigms on Sparse Matrix Computations
NASA Technical Reports Server (NTRS)
Oliker, Leonid; Li, Xiaoye; Husbands, Parry; Biswas, Rupak; Biegel, Bryan (Technical Monitor)
2002-01-01
The Conjugate Gradient (CG) algorithm is perhaps the best-known iterative technique to solve sparse linear systems that are symmetric and positive definite. For systems that are ill-conditioned, it is often necessary to use a preconditioning technique. In this paper, we investigate the effects of various ordering and partitioning strategies on the performance of parallel CG and ILU(0) preconditioned CG (PCG) using different programming paradigms and architectures. Results show that, for this class of applications, ordering significantly improves overall performance on both distributed and distributed shared-memory systems, that cache reuse may be more important than reducing communication, that it is possible to achieve message-passing performance using shared-memory constructs through careful data ordering and distribution, and that a hybrid MPI+OpenMP paradigm increases programming complexity with little performance gain. An implementation of CG on the Cray MTA does not require special ordering or partitioning to obtain high efficiency and scalability, giving it a distinct advantage for adaptive applications; however, it shows limited scalability for PCG due to a lack of thread-level parallelism.
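For readers unfamiliar with the kernel being tuned, a compact serial version of unpreconditioned CG is sketched below; the dense 4x4 test matrix merely stands in for the large sparse systems the paper targets, and the sketch ignores ordering and partitioning entirely.

    #include <stdio.h>
    #include <math.h>

    #define N 4

    /* y = A*x; in the paper's setting this would be a sparse mat-vec. */
    static void matvec(const double A[N][N], const double *x, double *y)
    {
        for (int i = 0; i < N; i++) {
            y[i] = 0.0;
            for (int j = 0; j < N; j++)
                y[i] += A[i][j] * x[j];
        }
    }

    static double dot(const double *a, const double *b)
    {
        double s = 0.0;
        for (int i = 0; i < N; i++) s += a[i] * b[i];
        return s;
    }

    int main(void)
    {
        /* small SPD test system (illustrative values only) */
        double A[N][N] = {{4,1,0,0},{1,4,1,0},{0,1,4,1},{0,0,1,4}};
        double b[N] = {1,2,3,4}, x[N] = {0,0,0,0};
        double r[N], p[N], Ap[N];

        matvec(A, x, Ap);
        for (int i = 0; i < N; i++) { r[i] = b[i] - Ap[i]; p[i] = r[i]; }
        double rr = dot(r, r);

        for (int k = 0; k < N && sqrt(rr) > 1e-12; k++) {
            matvec(A, p, Ap);
            double alpha = rr / dot(p, Ap);
            for (int i = 0; i < N; i++) { x[i] += alpha * p[i]; r[i] -= alpha * Ap[i]; }
            double rr_new = dot(r, r);
            for (int i = 0; i < N; i++) p[i] = r[i] + (rr_new / rr) * p[i];
            rr = rr_new;
        }
        for (int i = 0; i < N; i++) printf("x[%d] = %f\n", i, x[i]);
        return 0;
    }

Ordering and partitioning enter once the matrix is distributed: they determine which entries of p and r must be communicated for the mat-vec, which is exactly where the paper's paradigms diverge.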
MLP: A Parallel Programming Alternative to MPI for New Shared Memory Parallel Systems
NASA Technical Reports Server (NTRS)
Taft, James R.
1999-01-01
Recent developments at the NASA AMES Research Center's NAS Division have demonstrated that the new generation of NUMA based Symmetric Multi-Processing systems (SMPs), such as the Silicon Graphics Origin 2000, can successfully execute legacy vector oriented CFD production codes at sustained rates far exceeding processing rates possible on dedicated 16 CPU Cray C90 systems. This high level of performance is achieved via shared memory based Multi-Level Parallelism (MLP). This programming approach, developed at NAS and outlined below, is distinct from the message passing paradigm of MPI. It offers parallelism at both the fine and coarse grained level, with communication latencies that are approximately 50-100 times lower than typical MPI implementations on the same platform. Such latency reductions offer the promise of performance scaling to very large CPU counts. The method draws on, but is also distinct from, the newly defined OpenMP specification, which uses compiler directives to support a limited subset of multi-level parallel operations. The NAS MLP method is general, and applicable to a large class of NASA CFD codes.
DOE Office of Scientific and Technical Information (OSTI.GOV)
D'Azevedo, Eduardo; Abbott, Stephen; Koskela, Tuomas
The XGC fusion gyrokinetic code combines state-of-the-art, portable computational and algorithmic technologies to enable complicated multiscale simulations of turbulence and transport dynamics in ITER edge plasma on the largest US open-science computer, the CRAY XK7 Titan, at its maximal heterogeneous capability. Such simulations had not been possible before due to a shortage of over a factor of 10 in time-to-solution for completing one physics case in less than 5 days of wall-clock time. Frontier techniques such as nested OpenMP parallelism, adaptive parallel I/O, staging I/O and data reduction using dynamic and asynchronous applications interactions, dynamic repartitioning for balancing computational work in pushing particles and in grid-related work, scalable and accurate discretization algorithms for non-linear Coulomb collisions, and communication-avoiding subcycling technology for pushing particles on both CPUs and GPUs are also utilized to dramatically improve the scalability and time-to-solution, hence enabling the difficult kinetic ITER edge simulation on a present-day leadership-class computer.
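Nested OpenMP parallelism, one of the techniques listed above, simply places a parallel region inside another so that each thread of an outer team can spawn an inner team. The sketch below is generic; the team sizes and work are illustrative and not drawn from XGC.

    #include <omp.h>
    #include <stdio.h>

    int main(void)
    {
        omp_set_max_active_levels(2);               /* enable two nesting levels */

        #pragma omp parallel num_threads(4)         /* outer team, e.g. per subdomain */
        {
            int outer = omp_get_thread_num();
            #pragma omp parallel num_threads(2)     /* inner team, e.g. per particle chunk */
            {
                printf("outer %d / inner %d\n", outer, omp_get_thread_num());
            }
        }
        return 0;
    }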
NASA Astrophysics Data System (ADS)
Vassilev, Nikolay G.; Dimitrov, Valentin S.
1999-06-01
Free energies of activation for rotation about the amide C-N bond in X-C(O)N(CH3)2 (X=H, F, Cl and Br) were calculated at the MP2(fc)/6-31+G*//6-31G* and MP2(fc)/6-311++G**//6-311++G** levels and compared with NMR gas-phase data. The results of the calculations indicate that the repulsion between X and the methyl group in the ground state and the repulsion between X or oxygen and the nitrogen lone pair in the transition states (TS) are largely responsible for the difference in the free energies of the studied amides. For X=H (DMF), the anti TS is more stable; for the cases X=Cl, Br, the syn TS is more stable, while for the case X=F the two transition states are energetically almost equivalent.
NASA Technical Reports Server (NTRS)
2002-01-01
The life of the very small, whether in something as complicated as a human cell or as simple as a drop of water, is of fundamental scientific interest: By knowing how a tiny amount of material reacts to changes in its environment, scientists may be able to answer questions about how a bulk of material would react to comparable changes. NASA is at the forefront of computational research into a broad range of basic scientific questions about fluid dynamics and the nature of liquid boundary instability. For example, one important issue for the space program is how drops of water and other materials will behave in the low-gravity environment of space and how the low gravity will affect the transport and containment of these materials. Accurate prediction of this behavior is among the aims of a set of molecular dynamics experiments carried out on the NCCS's Cray supercomputers. In conventional computational studies of materials, matter is treated as continuous - a macroscopic whole without regard to its molecular parts - and the behavior patterns of the matter in various physical environments are studied using well-established differential equations and mathematical parameters based on physical properties such as compressibility, density, heat capacity, and vapor pressure of the bulk material.
DNA dynamics in aqueous solution: opening the double helix
NASA Technical Reports Server (NTRS)
Pohorille, A.; Ross, W. S.; Tinoco, I. Jr; MacElroy, R. D. (Principal Investigator)
1990-01-01
The opening of a DNA base pair is a simple reaction that is a prerequisite for replication, transcription, and other vital biological functions. Understanding the molecular mechanisms of biological reactions is crucial for predicting and, ultimately, controlling them. Realistic computer simulations of the reactions can provide the needed understanding. To model even the simplest reaction in aqueous solution requires hundreds of hours of supercomputing time. We have used molecular dynamics techniques to simulate fraying of the ends of a six base pair double strand of DNA, [TCGCGA]2, where the four bases of DNA are denoted by T (thymine), C (cytosine), G (guanine), and A (adenine), and to estimate the free energy barrier to this process. The calculations, in which the DNA was surrounded by 2,594 water molecules, required 50 hours of CRAY-2 CPU time for every simulated 100 picoseconds. A free energy barrier to fraying, which is mainly characterized by the movement of adenine away from thymine into aqueous environment, was estimated to be 4 kcal/mol. Another fraying pathway, which leads to stacking between terminal adenine and thymine, was also observed. These detailed pictures of the motions and energetics of DNA base pair opening in water are a first step toward understanding how DNA will interact with any molecule.
A practical approach to portability and performance problems on massively parallel supercomputers
DOE Office of Scientific and Technical Information (OSTI.GOV)
Beazley, D.M.; Lomdahl, P.S.
1994-12-08
We present an overview of the tactics we have used to achieve a high level of performance while improving portability for the large-scale molecular dynamics code SPaSM. SPaSM was originally implemented in ANSI C with message passing for the Connection Machine 5 (CM-5). In 1993, SPaSM was selected as one of the winners in the IEEE Gordon Bell Prize competition for sustaining 50 Gflops on the 1024-node CM-5 at Los Alamos National Laboratory. Achieving this performance on the CM-5 required rewriting critical sections of code in CDPEAC assembler language. In addition, the code made extensive use of CM-5 parallel I/O and the CMMD message passing library. Given this highly specialized implementation, we describe how we have ported the code to the Cray T3D and high-performance workstations. In addition, we will describe how it has been possible to do this using a single version of source code that runs on all three platforms without sacrificing any performance. Sound too good to be true? We hope to demonstrate that one can realize both code performance and portability without relying on the latest and greatest prepackaged tool or parallelizing compiler.
A compositional reservoir simulator on distributed memory parallel computers
DOE Office of Scientific and Technical Information (OSTI.GOV)
Rame, M.; Delshad, M.
1995-12-31
This paper presents the application of distributed memory parallel computers to field-scale reservoir simulations using a parallel version of UTCHEM, The University of Texas Chemical Flooding Simulator. The model is a general-purpose, highly vectorized chemical compositional simulator that can simulate a wide range of displacement processes at both field and laboratory scales. The original simulator was modified to run on both distributed memory parallel machines (Intel iPSC/860 and Delta, Connection Machine 5, Kendall Square 1 and 2, and CRAY T3D) and a cluster of workstations. A domain decomposition approach has been taken towards parallelization of the code. A portion of the discrete reservoir model is assigned to each processor by a set-up routine that attempts a data layout as even as possible from the load-balance standpoint. Each of these subdomains is extended so that data can be shared between adjacent processors for stencil computation. The added routines that make parallel execution possible are written in a modular fashion that makes the porting to new parallel platforms straightforward. Results of the distributed memory computing performance of the parallel simulator are presented for field-scale applications such as tracer flood and polymer flood. A comparison of the wall-clock times for the same problems on a vector supercomputer is also presented.
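The phrase "each of these subdomains is extended so that data can be shared between adjacent processors" refers to ghost (halo) cells. A minimal one-dimensional version of that exchange is sketched below; array sizes and values are placeholders, not UTCHEM data structures.

    #include <mpi.h>

    #define NLOC 8   /* interior cells per rank (illustrative) */

    int main(int argc, char **argv)
    {
        int rank, nranks;
        double u[NLOC + 2];                 /* one ghost cell on each side */

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &nranks);

        for (int i = 1; i <= NLOC; i++) u[i] = rank;   /* fill interior */

        int left  = (rank == 0) ? MPI_PROC_NULL : rank - 1;
        int right = (rank == nranks - 1) ? MPI_PROC_NULL : rank + 1;

        /* send rightmost interior cell right, receive left ghost from the left */
        MPI_Sendrecv(&u[NLOC], 1, MPI_DOUBLE, right, 0,
                     &u[0],    1, MPI_DOUBLE, left,  0,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        /* send leftmost interior cell left, receive right ghost from the right */
        MPI_Sendrecv(&u[1],        1, MPI_DOUBLE, left,  1,
                     &u[NLOC + 1], 1, MPI_DOUBLE, right, 1,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);

        /* stencil updates on cells 1..NLOC may now read u[0] and u[NLOC+1] */
        MPI_Finalize();
        return 0;
    }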
NASA Astrophysics Data System (ADS)
Sim, Jae-Hoon; Kim, Heung-Sik; Han, Myung Joon
2015-03-01
Using first-principles density functional theory (DFT) calculations, we investigated the electronic structure of the Rh-doped iridate Sr2Ir1-xRhxO4, for which a doping (x) dependent metal-insulator transition (MIT) has been reported experimentally and a controversial discussion has developed regarding the origin of this transition. Our DFT+U calculation shows that the value of <L·S> remains largely intact over the entire doping range considered here (x = 0.0, 0.125, 0.25, 0.50, 0.75, and 1.0), in good agreement with the branching ratio measured by x-ray absorption spectroscopy. Also, contrary to a previous picture explaining the MIT based on charge transfer between the transition-metal sites, our calculation clearly shows that those sites remain basically isoelectronic while impurity bands of predominantly rhodium character are introduced near the Fermi level. As the doping increases, this impurity band overlaps with the lower Hubbard band of iridium, leading to the metal-insulator transition. The results will be discussed in comparison to the case of Ru doping. Computational resources were supported by The National Institute of Supercomputing and Networking/Korea Institute of Science and Technology Information with supercomputing resources including technical support (Grant No. KSC-2013-C2-23).
Hutz, Janna E; Nelson, Thomas; Wu, Hua; McAllister, Gregory; Moutsatsos, Ioannis; Jaeger, Savina A; Bandyopadhyay, Somnath; Nigsch, Florian; Cornett, Ben; Jenkins, Jeremy L; Selinger, Douglas W
2013-04-01
Screens using high-throughput, information-rich technologies such as microarrays, high-content screening (HCS), and next-generation sequencing (NGS) have become increasingly widespread. Compared with single-readout assays, these methods produce a more comprehensive picture of the effects of screened treatments. However, interpreting such multidimensional readouts is challenging. Univariate statistics such as t-tests and Z-factors cannot easily be applied to multidimensional profiles, leaving no obvious way to answer common screening questions such as "Is treatment X active in this assay?" and "Is treatment X different from (or equivalent to) treatment Y?" We have developed a simple, straightforward metric, the multidimensional perturbation value (mp-value), which can be used to answer these questions. Here, we demonstrate application of the mp-value to three data sets: a multiplexed gene expression screen of compounds and genomic reagents, a microarray-based gene expression screen of compounds, and an HCS compound screen. In all data sets, active treatments were successfully identified using the mp-value, and simulations and follow-up analyses supported the mp-value's statistical and biological validity. We believe the mp-value represents a promising way to simplify the analysis of multidimensional data while taking full advantage of its richness.
Tuning HDF5 subfiling performance on parallel file systems
DOE Office of Scientific and Technical Information (OSTI.GOV)
Byna, Suren; Chaarawi, Mohamad; Koziol, Quincey
Subfiling is a technique used on parallel file systems to reduce locking and contention issues when multiple compute nodes interact with the same storage target node. Subfiling provides a compromise between the single shared file approach that instigates the lock contention problems on parallel file systems and having one file per process, which results in generating a massive and unmanageable number of files. In this paper, we evaluate and tune the performance of the recently implemented subfiling feature in HDF5. Specifically, we explain the implementation strategy of the subfiling feature in HDF5, provide examples of using the feature, and evaluate and tune parallel I/O performance of this feature with the parallel file systems of the Cray XC40 system at NERSC (Cori), which include burst buffer storage and Lustre disk-based storage. We also evaluate I/O performance on the Cray XC30 system, Edison, at NERSC. Our results show a performance advantage of 1.2X to 6X with subfiling compared to writing a single shared HDF5 file. We present our exploration of configurations, such as the number of subfiles and the number of Lustre storage targets used to store files, as optimization parameters to obtain superior I/O performance. Based on this exploration, we discuss recommendations for achieving good I/O performance as well as limitations of using the subfiling feature.
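For orientation, the single-shared-file baseline that subfiling is measured against looks roughly like the sketch below, in which every rank writes one row of a two-dimensional dataset collectively through HDF5's MPI-IO driver. It assumes an HDF5 library built with parallel support; the file name, dataset name, and sizes are illustrative, and the subfiling-specific API itself is not shown.

    #include <mpi.h>
    #include <hdf5.h>

    int main(int argc, char **argv)
    {
        int rank, nranks;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &nranks);

        /* open one shared file through the MPI-IO virtual file driver */
        hid_t fapl = H5Pcreate(H5P_FILE_ACCESS);
        H5Pset_fapl_mpio(fapl, MPI_COMM_WORLD, MPI_INFO_NULL);
        hid_t file = H5Fcreate("shared.h5", H5F_ACC_TRUNC, H5P_DEFAULT, fapl);

        hsize_t dims[2] = {(hsize_t)nranks, 1024};
        hid_t fspace = H5Screate_simple(2, dims, NULL);
        hid_t dset = H5Dcreate2(file, "data", H5T_NATIVE_DOUBLE, fspace,
                                H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT);

        double buf[1024];
        for (int i = 0; i < 1024; i++) buf[i] = (double)rank;

        /* each rank selects its own row and writes it collectively */
        hsize_t start[2] = {(hsize_t)rank, 0}, count[2] = {1, 1024};
        H5Sselect_hyperslab(fspace, H5S_SELECT_SET, start, NULL, count, NULL);
        hid_t mspace = H5Screate_simple(2, count, NULL);
        hid_t dxpl = H5Pcreate(H5P_DATASET_XFER);
        H5Pset_dxpl_mpio(dxpl, H5FD_MPIO_COLLECTIVE);
        H5Dwrite(dset, H5T_NATIVE_DOUBLE, mspace, fspace, dxpl, buf);

        H5Pclose(dxpl); H5Sclose(mspace); H5Sclose(fspace);
        H5Dclose(dset); H5Fclose(file); H5Pclose(fapl);
        MPI_Finalize();
        return 0;
    }

All ranks contend for the same file (and potentially the same Lustre stripes), which is precisely the pressure that splitting the output into subfiles relieves.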
de O Barsottini, Mario R; de Oliveira, Juliana F; Adamoski, Douglas; Teixeira, Paulo J P L; do Prado, Paula F V; Tiezzi, Henrique O; Sforça, Mauricio L; Cassago, Alexandre; Portugal, Rodrigo V; de Oliveira, Paulo S L; de M Zeri, Ana C; Dias, Sandra M G; Pereira, Gonçalo A G; Ambrosio, Andre L B
2013-11-01
Cerato-platanins (CP) are small, cysteine-rich fungal-secreted proteins involved in the various stages of the host-fungus interaction process, acting as phytotoxins, elicitors, and allergens. We identified 12 CP genes (MpCP1 to MpCP12) in the genome of Moniliophthora perniciosa, the causal agent of witches' broom disease in cacao, and showed that they present distinct expression profiles throughout fungal development and infection. We determined the X-ray crystal structures of MpCP1, MpCP2, MpCP3, and MpCP5, representative of different branches of a phylogenetic tree and expressed at different stages of the disease. Structure-based biochemistry, in combination with nuclear magnetic resonance and mass spectrometry, allowed us to define specialized capabilities regarding self-assembling and the direct binding to chitin and N-acetylglucosamine (NAG) tetramers, a fungal cell wall building block, and to map a previously unknown binding region in MpCP5. Moreover, fibers of MpCP2 were shown to act as expansin and facilitate basidiospore germination whereas soluble MpCP5 blocked NAG6-induced defense response. The correlation between these roles, the fungus life cycle, and its tug-of-war interaction with cacao plants is discussed.
Synthesis and characterization of a series of Group 4 phenoxy-thiol derivatives
Boyle, Timothy J.; Neville, Michael L.; Parkes, Marie V.
2016-02-11
In this study, a series of Group 4 phenoxy-thiols were developed from the reaction products of a series of metal tert-butoxides ([M(OBut)4]) with four equivalents of 4-mercaptophenol (H-4MP). The products were found by single crystal X-ray diffraction to adopt the general structure [(HOBut)(4MP)3M(μ-4MP)]2 [where M = Ti (1), Zr (2), Hf (3)] from toluene and [(py)2M(4MP)] where M = Ti (4), Zr (5) and [(py)(4MP)3Hf(μ-4MP)]2 (6) from pyridine (py). Varying the [Ti(OR)4] precursors (OR = iso-propoxide (OPri) or neo-pentoxide (ONep)) in toluene led to [(HOR)(4MP)3Ti(μ-4MP)]2 (OR = OPri (7), ONep (8)), which were structurally similar to 1. Lower stoichiometric reactions in toluene led to partial substitution by the 4MP ligands yielding [H][Ti(μ-4MP)(4MP)(ONep)3]2 (9). Independent of the stoichiometry, all of the Ti derivatives were found to be red in color, whereas the heavier congeners were colorless. Attempts to understand this phenomenon led to investigation with a series of varied –SH substituted phenols. From the reaction of H-2MP and H-3MP (2-mercaptophenol and 3-mercaptophenol, respectively), the isolated products had identical arrangements: [(ONep)2(2MP)Ti(μ,η2-2MP)]2 (10) and [(HOR)(3MP)M(μ-3MP)]2 (M/OR = Ti/ONep (11); Zr/OBut (12)) with a similar red color. Based on the simulated and observed UV–Vis spectra, it was reasoned that the color was generated due to a ligand-to-metal charge transfer for Ti that was not available for the larger congeners.
Wang, Linda; Bim, Odair; Lopes, Adolfo Coelho de Oliveira; Francisconi-Dos-Rios, Luciana Fávaro; Maenosono, Rafael Massunari; D'Alpino, Paulo Henrique Perlatti; Honório, Heitor Marques; Atta, Maria Teresa
2016-01-01
This study investigated the effect of the fluorescent dye rhodamine B (RB), used for interfacial micromorphology analysis of dental composite restorations, on water sorption/solubility (WS/WSL) and microtensile bond strength to dentin (µTBS) of a 3-step total etch and a 2-step self-etch adhesive system. The adhesives Adper Scotchbond Multi-Purpose (MP) and Clearfil SE Bond (SE) were mixed with 0.1 mg/mL of RB. For the WS/WSL tests, cured resin disks (5.0 mm in diameter x 0.8 mm thick) were prepared and assigned into four groups (n=10): MP, MP-RB, SE, and SE-RB. For µTBS assessment, extracted human third molars (n=40) had the flat occlusal dentin prepared and assigned into the same experimental groups (n=10). After the bonding and restoration procedures, specimens were sectioned into rectangular beams, stored in water and tested after seven days or after 12 months. The failure mode of fractured specimens was qualitatively evaluated under an optical microscope (x40). Data from WS/WSL and µTBS were assessed by one-way and three-way ANOVA, respectively, and Tukey's test (α=5%). RB increased the WSL of MP and SE. On the other hand, WS of both MP and SE was not affected by the addition of RB. No significant difference in µTBS between MP and MP-RB was observed at seven days or one year, whereas for SE a decrease in the µTBS means occurred at both storage times. RB should be incorporated into non-simplified DBSs with caution, as it can interfere with their physical-mechanical properties, leading to a possible misinterpretation of the bonded interface.
An Evaluation of One-Sided and Two-Sided Communication Paradigms on Relaxed-Ordering Interconnect
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ibrahim, Khaled Z.; Hargrove, Paul H.; Iancu, Costin
The Cray Gemini interconnect hardware provides multiple transfer mechanisms and out-of-order message delivery to improve communication throughput. In this paper we quantify the performance of one-sided and two-sided communication paradigms with respect to: 1) the optimal available hardware transfer mechanism, 2) message ordering constraints, 3) per-node and per-core message concurrency. In addition to using Cray native communication APIs, we use UPC and MPI micro-benchmarks to capture one- and two-sided semantics respectively. Our results indicate that relaxing the message delivery order can improve performance up to 4.6x when compared with strict ordering. When hardware allows it, high-level one-sided programming models can already take advantage of message reordering. Enforcing the ordering semantics of two-sided communication comes with a performance penalty. Furthermore, we argue that exposing out-of-order delivery at the application level is required for the next-generation programming models. Any ordering constraints in the language specifications reduce communication performance for small messages and increase the number of active cores required for peak throughput.
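The semantic difference being benchmarked can be seen in a few lines of MPI: a one-sided MPI_Put deposits data into a remote window without any matching receive on the target, whereas two-sided MPI_Send/MPI_Recv pairs impose matching and ordering. The sketch below shows only the one-sided side and is generic, not the paper's benchmark code; run it with at least two ranks.

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int rank, nranks;
        double local = 0.0, winbuf = 0.0;
        MPI_Win win;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &nranks);
        if (nranks < 2) { MPI_Finalize(); return 0; }   /* needs two ranks */

        MPI_Win_create(&winbuf, sizeof(double), sizeof(double),
                       MPI_INFO_NULL, MPI_COMM_WORLD, &win);

        MPI_Win_fence(0, win);
        if (rank == 0) {
            local = 42.0;
            /* deposit into rank 1's window; no receive is posted there */
            MPI_Put(&local, 1, MPI_DOUBLE, 1, 0, 1, MPI_DOUBLE, win);
        }
        MPI_Win_fence(0, win);                          /* completes the transfer */

        if (rank == 1)
            printf("rank 1 holds %g without posting a receive\n", winbuf);

        MPI_Win_free(&win);
        MPI_Finalize();
        return 0;
    }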
Gigabit ATM: another technical mistake?
NASA Astrophysics Data System (ADS)
Christ, Paul
1998-09-01
Once upon a time, or more precisely during February 1988 at the CCITT Seoul plenary, and definitely arriving as a revolution, ATM hit the hard-core B-ISDN circuit-switching gang. Initiated by the Telecoms' camp, but, surprisingly, soon to be pushed by computer-minded people, ATM's generic technological history is somewhat richer than single-sided stories. Here are two classical elements of that history: Firstly, together with X.25, ATM suffers from the connection versus datagram dichotomy, well known for more than twenty years. Secondly, and lesser known, ATM's use of cells in support of the 'I' of B-ISDN was questioned from the very beginning by the packet switching camp. Furthermore, in this context, there are two other essential elements to be considered: Firstly, the exponential growth of the Internet and later intranets, using Internet technology, sparked by the success of the Web and the WINTEL alliance, resulted in a corresponding demand for both aggregate and end-system network bandwidth. Secondly, servers, historically restricted to the exclusive club of HIPPI-equipped supercomputers, suddenly became ordinary high-end PCs with 64-bit wide PCI busses -- definitely aiming at the Gigabit. Here, if your aim is for Gigabit ATM with classical supercomputers doing 5,000 transactions per second, a 65K ATM MTU -- as implemented by Cray -- might be okay. Following Clark and others, another part of the story is the adoption and redefinition, by the IETF, of the Telecoms' notion of 'Integrated Services' and QoS mechanisms. The quest for low-delay IP packet forwarding, perhaps possible over ATM cut-throughs, has resulted in the switching versus/or integrated-with-routing movement. However, a blow for ATM may be the recent results concerning fast routing table lookup algorithms. This, by making Gigabit routing possible using ordinary Pentium processors, may eventually render the much prophesied ATM switching performance unnecessary. Recently, with the rise of Gigabit Ethernet, many of the elements mentioned above are now being presented by standard 'Gigabit Ethernet and Gigabit ATM -- friends or foes' conferences. In-depth analyses are given concerning the canonical elements of such a setting: legacy, new use requirements, manageability, security, LAN-WAN, architectures, standards, technologies and products, complexity, evolution-transition strategies, manufacturers, player organizations, etc. Often in such conferences, Fiber Channel, being one of Gigabit Ethernet's physical media, is presented as the only other Gigabit LAN technology. In an attempt to sum up: Given the present state of ATM deployment measured in terms of functionalities and sophistication, after ten years of CCITT/ITU and almost as many years of ATM Forum effort, does the fact that the question is still being asked now represent the answer -- that ATM is, or was, a mistake -- or are there some elements still missing? Here are a technical and a political example:
Yu, Jen-Shiang K; Yu, Chin-Hui
2002-01-01
One of the most frequently used packages for electronic structure research, GAUSSIAN 98, is compiled on Linux systems with various hardware configurations, including AMD Athlon (with the "Thunderbird" core), AthlonMP, and AthlonXP (with the "Palomino" core) systems as well as the Intel Pentium 4 (with the "Willamette" core) machines. The default PGI FORTRAN compiler (pgf77) and the Intel FORTRAN compiler (ifc) are respectively employed with different architectural optimization options to compile GAUSSIAN 98 and test the performance improvement. In addition to the BLAS library included in revision A.11 of this package, the Automatically Tuned Linear Algebra Software (ATLAS) library is linked against the binary executables to improve the performance. Various Hartree-Fock, density-functional theories, and the MP2 calculations are done for benchmarking purposes. It is found that the combination of ifc with ATLAS library gives the best performance for GAUSSIAN 98 on all of these PC-Linux computers, including AMD and Intel CPUs. Even on AMD systems, the Intel FORTRAN compiler invariably produces binaries with better performance than pgf77. The enhancement provided by the ATLAS library is more significant for post-Hartree-Fock calculations. The performance on one single CPU is potentially as good as that on an Alpha 21264A workstation or an SGI supercomputer. The floating-point marks by SpecFP2000 have similar trends to the results of GAUSSIAN 98 package.
Ice-sheet modelling accelerated by graphics cards
NASA Astrophysics Data System (ADS)
Brædstrup, Christian Fredborg; Damsgaard, Anders; Egholm, David Lundbek
2014-11-01
Studies of glaciers and ice sheets have increased the demand for high performance numerical ice flow models over the past decades. When exploring the highly non-linear dynamics of fast flowing glaciers and ice streams, or when coupling multiple flow processes for ice, water, and sediment, researchers are often forced to use super-computing clusters. As an alternative to conventional high-performance computing hardware, the Graphical Processing Unit (GPU) is capable of massively parallel computing while retaining a compact design and low cost. In this study, we present a strategy for accelerating a higher-order ice flow model using a GPU. By applying the newest GPU hardware, we achieve up to 180× speedup compared to a similar but serial CPU implementation. Our results suggest that GPU acceleration is a competitive option for ice-flow modelling when compared to CPU-optimised algorithms parallelised by the OpenMP or Message Passing Interface (MPI) protocols.
BCYCLIC: A parallel block tridiagonal matrix cyclic solver
NASA Astrophysics Data System (ADS)
Hirshman, S. P.; Perumalla, K. S.; Lynch, V. E.; Sanchez, R.
2010-09-01
A block tridiagonal matrix is factored with minimal fill-in using a cyclic reduction algorithm that is easily parallelized. Storage of the factored blocks allows the application of the inverse to multiple right-hand sides, which may not be known at factorization time. Scalability with the number of block rows is achieved with cyclic reduction, while scalability with the block size is achieved using multithreaded routines (OpenMP, GotoBLAS) for block matrix manipulation. This dual scalability is a noteworthy feature of this new solver, as well as its ability to efficiently handle arbitrary (non-powers-of-2) block row and processor numbers. Comparison with a state-of-the-art parallel sparse solver is presented. It is expected that this new solver will allow many physical applications to optimally use the parallel resources on current supercomputers. Example usage of the solver in magneto-hydrodynamic (MHD), three-dimensional equilibrium solvers for high-temperature fusion plasmas is cited.
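For reference, one elimination pass of cyclic reduction in its textbook scalar form (the block variant replaces the divisions with multiplications by block inverses; this is the generic recurrence, not notation taken from BCYCLIC) turns each retained equation a_i x_{i-1} + b_i x_i + c_i x_{i+1} = d_i into one coupling only every other unknown:

    \alpha_i = -\frac{a_i}{b_{i-1}}, \qquad \gamma_i = -\frac{c_i}{b_{i+1}},
    a_i' = \alpha_i a_{i-1}, \qquad b_i' = b_i + \alpha_i c_{i-1} + \gamma_i a_{i+1}, \qquad
    c_i' = \gamma_i c_{i+1}, \qquad d_i' = d_i + \alpha_i d_{i-1} + \gamma_i d_{i+1}.

After roughly \log_2 n such passes a single (block) equation remains and is solved directly; back-substitution then recovers the eliminated unknowns. Each pass is independent across the retained rows, which is what makes the method parallelize so naturally.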
Transitioning NWChem to the Next Generation of Manycore Machines
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bylaska, Eric J.; Apra, Edoardo; Kowalski, Karol
The NorthWest Chemistry (NWChem) modeling software is a popular molecular chemistry simulation package that was designed from the start to work on massively parallel processing supercomputers[6, 28, 49]. It contains an umbrella of modules that today includes Self Consistent Field (SCF), second-order Møller-Plesset perturbation theory (MP2), Coupled Cluster, multi-configuration self-consistent field (MCSCF), selected configuration interaction (CI), tensor contraction engine (TCE) many-body methods, density functional theory (DFT), time-dependent density functional theory (TDDFT), real time time-dependent density functional theory, pseudopotential plane-wave density functional theory (PSPW), band structure (BAND), ab initio molecular dynamics, Car-Parrinello molecular dynamics, classical molecular dynamics (MD), QM/MM, AIMD/MM, GIAO NMR, COSMO, COSMO-SMD, and RISM solvation models, free energy simulations, reaction path optimization, parallel in time, among other capabilities [22]. Moreover, new capabilities continue to be added with each new release.
Automation of Data Traffic Control on DSM Architecture
NASA Technical Reports Server (NTRS)
Frumkin, Michael; Jin, Hao-Qiang; Yan, Jerry
2001-01-01
The design of distributed shared memory (DSM) computers liberates users from the duty of distributing data across processors and allows for the incremental development of parallel programs using, for example, OpenMP or Java threads. DSM architecture greatly simplifies the development of parallel programs having good performance on a few processors. However, achieving good program scalability on DSM computers requires that the user understand data flow in the application and use various techniques to avoid data traffic congestion. In this paper we discuss a number of such techniques, including data blocking, data placement, data transposition, and page size control, and evaluate their efficiency on the NAS (NASA Advanced Supercomputing) Parallel Benchmarks. We also present a tool which automates the detection of constructs causing data congestion in Fortran array-oriented codes and advises the user on code transformations for improving data traffic in the application.
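One concrete form of the data placement mentioned above is first-touch placement on DSM/NUMA systems: the thread that first writes a page determines which memory it is allocated in, so initialization loops should be parallelized with the same schedule as the compute loops. The sketch below is a generic illustration, not output of the tool described in the paper.

    #include <omp.h>
    #include <stdio.h>
    #include <stdlib.h>

    #define N 10000000L

    int main(void)
    {
        double *a = malloc(N * sizeof(double));
        if (!a) return 1;

        /* parallel first touch: pages end up near the threads that use them */
        #pragma omp parallel for schedule(static)
        for (long i = 0; i < N; i++)
            a[i] = 0.0;

        /* compute loop with the same static schedule touches mostly local pages */
        double sum = 0.0;
        #pragma omp parallel for schedule(static) reduction(+:sum)
        for (long i = 0; i < N; i++)
            sum += a[i];

        printf("sum = %g\n", sum);
        free(a);
        return 0;
    }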
VizieR Online Data Catalog: ChaMP. I. First X-ray source catalog (Kim+, 2004)
NASA Astrophysics Data System (ADS)
Kim, D.-W.; Cameron, R. A.; Drake, J. J.; Evans, N. R.; Freeman, P.; Gaetz, T. J.; Ghosh, H.; Green, P. J.; Harnden, F. R. Jr; Karovska, M.; Kashyap, V.; Maksym, P. W.; Ratzlaff, P. W.; Schlegel, E. M.; Silverman, J. D.; Tananbaum, H. D.; Vikhlinin, A. A.; Wilkes, B. J.; Grimes, J. P.
2004-01-01
The Chandra Multiwavelength Project (ChaMP) is a wide-area (~14 deg2) survey of serendipitous Chandra X-ray sources, aiming to establish fair statistical samples covering a wide range of characteristics (such as absorbed active galactic nuclei, high-z clusters of galaxies) at flux levels (fX ~ 10^-15 to 10^-14 erg/s/cm2) intermediate between the Chandra deep surveys and previous missions. We present the first ChaMP catalog, which consists of 991 near on-axis, bright X-ray sources obtained from the initial sample of 62 observations. The data have been uniformly reduced and analyzed with techniques specifically developed for the ChaMP and then validated by visual examination. To assess source reliability and positional uncertainty, we perform a series of simulations and also use Chandra data to complement the simulation study. The false source detection rate is found to be as good as or better than expected for a given limiting threshold. On the other hand, the chance of missing a real source is rather complex, depending on the source counts, off-axis distance (or PSF), and background rate. The positional error (95% confidence level) is usually less than 1" for a bright source, regardless of its off-axis distance, while it can be as large as 4" for a weak source (~20 counts) at a large off-axis distance (D_off-axis > 8'). We have also developed new methods to find spatially extended or temporally variable sources, and those sources are listed in the catalog. (5 data files).
Ali, S Tahir; Antonov, Liudmil; Fabian, Walter M F
2014-01-30
Tautomerization energies of a series of isomeric [(4-R-phenyl)azo]naphthols and the analogous Schiff bases (R = N(CH3)2, OCH3, H, CN, NO2) are calculated by LPNO-CEPA/1-CBS using the def2-TZVPP and def2-QZVPP basis sets for extrapolation. The performance of various density functionals (B3LYP, M06-2X, PW6B95, B2PLYP, mPW2PLYP, PWPB95) as well as MP2 and SCS-MP2 is evaluated against these results. M06-2X and SCS-MP2 yield results close to the LPNO-CEPA/1-CBS values. Solvent effects (CCl4, CHCl3, CH3CN, and CH3OH) are treated by a variety of bulk solvation models (SM8, IEFPCM, COSMO, PBF, and SMD) as well as explicit solvation (Monte Carlo free energy perturbation using the OPLSAA force field).
Schmiegelow, K; Bretton-Meyer, U
2001-01-01
Through inhibition of purine de novo synthesis and enhancement of 6-mercaptopurine (6MP) bioavailability, high-dose methotrexate (HDM) may increase the incorporation into DNA of 6-thioguanine nucleotides (6TGN), the cytotoxic metabolites of 6MP. Thus, coadministration of 6MP could increase myelotoxicity following HDM. Twenty-one children with standard risk (SR) and 25 with intermediate risk (IR) acute lymphoblastic leukemia (ALL) were studied. During consolidation therapy they received either three courses of HDM at 2 week intervals without concurrent oral 6MP (SR-ALL) or four courses of HDM given at 2 week intervals with 25 mg/m2 of oral 6MP daily (IR-ALL). During the first year of maintenance with oral 6MP (75 mg/m2/day) and oral MTX (20 mg/m2/week) they all received five courses of HDM at 8 week intervals. In all cases, HDM consisted of 5,000 mg of MTX/m2 given over 24 h with intraspinal MTX and leucovorin rescue. Erythrocyte levels of 6TGN (E-6TGN) and methotrexate (E-MTX) were, on average, measured every second week during maintenance therapy. When SR consolidation (6MP: 0 mg), IR consolidation (6MP: 25 mg/m2), and SR/IR maintenance therapy (6MP: 75 mg/m2) were compared, white cell and absolute neutrophil count (ANC) nadir, lymphocyte count nadir, thrombocyte count nadir, and hemoglobin nadir after HDM decreased significantly with increasing doses of oral 6MP. Three percent of the HDM courses given without oral 6MP (SR consolidation) were followed by an ANC nadir <0.5 x 10(9)/l compared to 50% of the HDM courses given during SR/IR maintenance therapy. Similarly, only 13% of the HDM courses given as SR-ALL consolidation induced a thrombocyte count nadir <100 x 10(9)/l compared to 58% of the HDM courses given during maintenance therapy. The best-fit model to predict the ANC nadir following HDM during maintenance therapy included the dose of 6MP prior to HDM (beta = -0.017, P= 0.001), the average ANC level during maintenance therapy (beta = 0.82, P = 0.004), and E-6TGN (beta = -0.0029, P= 0.02). The best-fit model to predict the thrombocyte nadir following HDM during maintenance therapy included only mPLATE (beta = 0.0057, P = 0.046). In conclusion, the study indicates that reductions of the dose of concurrently given oral 6MP could be one way of reducing the risk of significant myelotoxicity following HDM during maintenance therapy of childhood ALL.
United Information Services, Inc., CRAY 1-s/2000, FORTRAN CFT 1.10. Validation summary report
DOE Office of Scientific and Technical Information (OSTI.GOV)
Not Available
1983-12-13
This Validation Summary Report (VSR) for the United Information Services, Inc., FORTRAN CFT 1.10 running under the COS Level C12 1.11 provides a consolidated summary of the results obtained from the validation of the subject compiler against the 1978 FORTRAN Standard (X3.9-1978/FIPS PUB 69). The compiler was validated against the Full Level FORTRAN level of FIPS PUB 69. The VSR is made up of several sections showing all the discrepancies found, if any. These include an overview of the validation, which lists all categories of discrepancies within X3.9-1978, and a detailed listing of discrepancies together with the tests which failed.
Multitasking a three-dimensional Navier-Stokes algorithm on the Cray-2
NASA Technical Reports Server (NTRS)
Swisshelm, Julie M.
1989-01-01
A three-dimensional computational aerodynamics algorithm has been multitasked for efficient parallel execution on the Cray-2. It provides a means for examining the multitasking performance of a complete CFD application code. An embedded zonal multigrid scheme is used to solve the Reynolds-averaged Navier-Stokes equations for an internal flow model problem. The explicit nature of each component of the method allows a spatial partitioning of the computational domain to achieve a well-balanced task load for MIMD computers with vector-processing capability. Experiments have been conducted with both two- and three-dimensional multitasked cases. The best speedup attained by an individual task group was 3.54 on four processors of the Cray-2, while the entire solver yielded a speedup of 2.67 on four processors for the three-dimensional case. The multiprocessing efficiency of various types of computational tasks is examined, performance on two Cray-2s with different memory access speeds is compared, and extrapolation to larger problems is discussed.
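For context, those figures translate directly into parallel efficiency under the standard definition E = S/p (speedup divided by processor count): E_task group = 3.54/4 ≈ 0.89 and E_solver = 2.67/4 ≈ 0.67, i.e., the best task group ran at about 89 percent efficiency on four Cray-2 processors, while the full three-dimensional solver ran at about 67 percent.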
Quantum Mechanics Approach to Hydration Energies and Structures of Alanine and Dialanine.
Lanza, Giuseppe; Chiacchio, Maria A
2017-06-20
A systematic approach to the phenomena related to hydration of biomolecules is reported at the state of the art of electronic-structure methods. Large-scale CCSD(T), MP4-SDQ, MP2, and DFT(M06-2X) calculations for some hydrated complexes of alanine and dialanine (Ala⋅13H2O, Ala2H+⋅18H2O, and Ala2⋅18H2O) are compared with experimental data and other elaborate modeling to assess the reliability of a simple bottom-up approach. The inclusion of a minimal number of water molecules for microhydration of the polar groups together with the polarizable continuum model is sufficient to reproduce the relative bulk thermodynamic functions of the considered biomolecules. These quantities depend on the adopted electronic-structure method, which should be chosen with great care. Nevertheless, the computationally feasible MP2 and M06-2X functionals with the aug-cc-pVTZ basis set satisfactorily reproduce values derived by high-level CCSD(T) and MP4-SDQ methods, and thus they are suitable for future developments of more elaborate and hence more biochemically significant peptides. © 2017 Wiley-VCH Verlag GmbH & Co. KGaA, Weinheim.
NASA Astrophysics Data System (ADS)
de Carvalho, E. F. V.; Lopez-Castillo, A.; Roberto-Neto, O.
2018-01-01
Graphene can be viewed as a sheet of benzene rings fused together forming a variety of structures, including the trioxotriangulenes (TOTs), which are a class of organic molecules with electro-active properties. In order to clarify such properties, structures and electronic properties of the graphene fragments phenalenyl, triangulene, 6-oxophenalenoxyl, and X3TOT (X = H, F, Cl) are computed. Validation of the methodologies is carried out using the density functionals B3LYP, M06-2X, B2PLYP-D, and the MP2 theory, giving equilibrium geometries of benzene, naphthalene, and anthracene with mean unsigned errors (MUE) of only 0.003, 0.007, 0.004, and 0.007 Å, respectively, in relation to experiment.
Global Adjoint Tomography: Next-Generation Models
NASA Astrophysics Data System (ADS)
Bozdag, Ebru; Lefebvre, Matthieu; Lei, Wenjie; Orsvuran, Ridvan; Peter, Daniel; Ruan, Youyi; Smith, James; Komatitsch, Dimitri; Tromp, Jeroen
2017-04-01
The first-generation global adjoint tomography model GLAD-M15 (Bozdag et al. 2016) is the result of 15 conjugate-gradient iterations based on GPU-accelerated spectral-element simulations of 3D wave propagation and Fréchet kernels. For simplicity, GLAD-M15 was constructed as an elastic model with transverse isotropy confined to the upper mantle. However, Earth's mantle and crust show significant evidence of anisotropy as a result of their composition and deformation. There may be different sources of seismic anisotropy affecting both body and surface waves. As a first attempt, we initially tackle surface-wave anisotropy and proceed with iterations using the same 253-earthquake data set used in GLAD-M15, with an emphasis on the upper mantle. Furthermore, we explore new misfits, such as double-difference measurements (Yuan et al. 2016), to better deal with the possible artifacts of the uneven distribution of seismic stations globally and minimize source uncertainties in structural inversions. We will present our observations with the initial results of azimuthally anisotropic inversions and also discuss the next-generation global models with various parametrizations. Meanwhile, our goal is to use all available seismic data in imaging. This, however, requires a solid framework to perform iterative adjoint tomography workflows with big data on supercomputers. We will talk about developments in the adjoint tomography workflow, from the need of defining new seismic and computational data formats (e.g., ASDF by Krischer et al. 2016, ADIOS by Liu et al. 2011) to developing new pre- and post-processing tools, together with experimenting with workflow management tools, such as Pegasus (Deelman et al. 2015). All our simulations are performed on Oak Ridge National Laboratory's Cray XK7 "Titan" system. Our ultimate aim is to get ready to harness ORNL's next-generation supercomputer "Summit", an IBM with Power-9 CPUs and NVIDIA Volta GPU accelerators, to be ready by 2018, which will enable us to reduce the shortest period in our global simulations from 17 s to 9 s, and exascale systems will reduce this further to just a few seconds.
Optical clock signal distribution and packaging optimization
NASA Astrophysics Data System (ADS)
Wu, Linghui
Polymer-based waveguides for optoelectronic interconnects and packaging were fabricated by a process that is compatible with the Si CMOS packaging process. An optoelectronic interconnection layer (OIL) for the high-speed massive clock signal distribution for the Cray T-90 supercomputer board employing optical multimode channel waveguides in conjunction with surface-normal waveguide grating couplers and a 1-to-2 3 dB splitter was constructed. Equalized optical paths were realized using an optical H-tree structure having 48 optical fanouts. This device could be increased to 64 without introducing any additional complications. A 1-to-48 fanout H-tree structure using Ultradel 9000D series polyimide was fabricated. The propagation loss and splitting loss have been measured as 0.21 dB/cm and 0.4 dB/splitter at 850 nm. The power budget was discussed, and the H-tree waveguide fully satisfies the power budget requirement. A tapered waveguide coupler was employed to match the mode profile between the single-mode fiber and the multimode channel waveguides of the OIL. A thermo-optical multimode switch was designed, fabricated, and tested. The finite difference method was used to simulate the thermal distribution in the polymer waveguide. Both stable and transient conditions have been calculated. The thermo-optical switch was fabricated and tested. The switching speed of 1 ms was experimentally confirmed, fitting well with the simulation results. Thermo-optic switching for randomly polarized light at wavelengths of 850 nm was experimentally confirmed, as was a stable attenuation of 25 dB. The details of tapered waveguide fabrication were investigated. Compression-molded 3-D tapered waveguides were demonstrated for the first time. Not only the vertical depth variation but also the linear dimensions of the molded waveguides were well beyond the limits of what any other conventional waveguide fabrication method is capable of providing. Molded waveguides with vertical depths of 100 μm at one end and 5 μm at the other end and lengths of 1.0 cm were fabricated using a photolime gel polymer. A propagation loss of 0.5 dB/cm was achieved when light was coupled from the 5 μm x 5 μm end to the 100 μm x 100 μm end, and that of 1.1 dB/cm was observed when light was coupled from the 100 μm x 100 μm end to the 5 μm x 5 μm end. By confining the energy to the fundamental mode when coupling from the large end to the small end, low-loss packaging can be achieved bi-directionally. 3-D compression-molded polymeric waveguides present a promising solution to bridging the huge dynamic range of different optoelectronic device depths varying from a few microns to several hundred microns.
NASA Technical Reports Server (NTRS)
Lawson, Gary; Poteat, Michael; Sosonkina, Masha; Baurle, Robert; Hammond, Dana
2016-01-01
In this work, several mini-apps have been created to enhance the performance of a real-world application, namely the VULCAN code for complex flow analysis developed at the NASA Langley Research Center. These mini-apps explore hybrid parallel programming paradigms with Message Passing Interface (MPI) for distributed memory access and either Shared MPI (SMPI) or OpenMP for shared memory access. Performance testing shows that MPI+SMPI yields the best execution performance, while requiring the largest number of code changes. A maximum speedup of 23X was measured for MPI+SMPI, but only 10X was measured for MPI+OpenMP.
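If "Shared MPI" here denotes MPI-3 shared-memory windows (the usual way to obtain direct load/store sharing among the ranks of one node; the mini-apps' actual SMPI code is not shown in this abstract, so this is an assumption), the core pattern looks like the generic sketch below.

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        MPI_Comm node;
        MPI_Win win;
        MPI_Aint qsize;
        int node_rank, node_size, disp;
        double *mine, *peer_ptr;

        MPI_Init(&argc, &argv);
        /* group the ranks that share a node */
        MPI_Comm_split_type(MPI_COMM_WORLD, MPI_COMM_TYPE_SHARED, 0,
                            MPI_INFO_NULL, &node);
        MPI_Comm_rank(node, &node_rank);
        MPI_Comm_size(node, &node_size);

        /* one double per rank, directly load/store-able by every node rank */
        MPI_Win_allocate_shared(sizeof(double), sizeof(double), MPI_INFO_NULL,
                                node, &mine, &win);

        MPI_Win_lock_all(MPI_MODE_NOCHECK, win);
        *mine = (double)node_rank;          /* write my slot */
        MPI_Win_sync(win);
        MPI_Barrier(node);                  /* all slots written */
        MPI_Win_sync(win);

        /* read a neighbour's slot through its shared-memory address */
        int peer = (node_rank + 1) % node_size;
        MPI_Win_shared_query(win, peer, &qsize, &disp, &peer_ptr);
        printf("node rank %d sees neighbour value %g\n", node_rank, *peer_ptr);

        MPI_Win_unlock_all(win);
        MPI_Win_free(&win);
        MPI_Comm_free(&node);
        MPI_Finalize();
        return 0;
    }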
Two-dimensional Euler and Navier-Stokes Time accurate simulations of fan rotor flows
NASA Technical Reports Server (NTRS)
Boretti, A. A.
1990-01-01
Two numerical methods are presented which describe the unsteady flow field in the blade-to-blade plane of an axial fan rotor. These methods solve the compressible, time-dependent, Euler and the compressible, turbulent, time-dependent, Navier-Stokes conservation equations for mass, momentum, and energy. The Navier-Stokes equations are written in Favre-averaged form and are closed with an approximate two-equation turbulence model with low Reynolds number and compressibility effects included. The unsteady aerodynamic component is obtained by superposing inflow or outflow unsteadiness on the steady conditions through time-dependent boundary conditions. The integration in space is performed by using a finite volume scheme, and the integration in time is performed by using k-stage Runge-Kutta schemes, k = 2,5. The numerical integration algorithm allows the reduction of the computational cost of an unsteady simulation involving high-frequency disturbances in both CPU time and memory requirements. Less than 200 sec of CPU time are required to advance the Euler equations in a computational grid made up of about 2000 grid points during 10,000 time steps on a CRAY Y-MP computer, with a required memory of less than 0.3 megawords.
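The k-stage Runge-Kutta time stepping referred to above is usually written in the low-storage form below, where R denotes the discrete residual of the semi-discretized equations; the coefficients shown are the common Jameson-type choice and are given only as an example, since the coefficients actually used are not listed in this abstract:

    u^{(0)} = u^n, \qquad u^{(j)} = u^{(0)} + \alpha_j \, \Delta t \, R\big(u^{(j-1)}\big), \quad j = 1, \dots, k, \qquad u^{n+1} = u^{(k)},
    \alpha_j = \frac{1}{k - j + 1} \quad \text{(e.g. } \tfrac14, \tfrac13, \tfrac12, 1 \text{ for } k = 4\text{)}.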
Development of iterative techniques for the solution of unsteady compressible viscous flows
NASA Technical Reports Server (NTRS)
Hixon, Duane; Sankar, L. N.
1993-01-01
During the past two decades, there has been significant progress in the field of numerical simulation of unsteady compressible viscous flows. At present, a variety of solution techniques exist, such as the transonic small disturbance analyses (TSD), transonic full potential equation-based methods, unsteady Euler solvers, and unsteady Navier-Stokes solvers. These advances have been made possible by developments in three areas: (1) improved numerical algorithms; (2) automation of body-fitted grid generation schemes; and (3) advanced computer architectures with vector processing and massively parallel processing features. In this work, the GMRES scheme has been considered as a candidate for acceleration of a Newton iteration time marching scheme for unsteady 2-D and 3-D compressible viscous flow calculations; from preliminary calculations, this will provide up to a 65 percent reduction in the computer time requirements over the existing class of explicit and implicit time marching schemes. The proposed method has been tested on structured grids, but is flexible enough for extension to unstructured grids. The described scheme has been tested only on the current generation of vector processor architecture of the Cray Y-MP class, but should be suitable for adaptation to massively parallel machines.
NASA Astrophysics Data System (ADS)
Hoar, T. J.; Anderson, J. L.; Collins, N.; Kershaw, H.; Hendricks, J.; Raeder, K.; Mizzi, A. P.; Barré, J.; Gaubert, B.; Madaus, L. E.; Aydogdu, A.; Raeder, J.; Arango, H.; Moore, A. M.; Edwards, C. A.; Curchitser, E. N.; Escudier, R.; Dussin, R.; Bitz, C. M.; Zhang, Y. F.; Shrestha, P.; Rosolem, R.; Rahman, M.
2016-12-01
Strongly coupled ensemble data assimilation with multiple high-resolution model components requires massive state vectors which need to be efficiently stored and accessed throughout the assimilation process. Supercomputer architectures are trending toward more cores per node with the same or less memory per node. Recent advances in the Data Assimilation Research Testbed (DART), a freely available community ensemble data assimilation facility that works with dozens of large geophysical models, have addressed the need to run with a smaller memory footprint on a higher node count by using MPI-2 one-sided communication for non-blocking, asynchronous access to distributed data. DART runs efficiently on many computational platforms, ranging from laptops to thousands of cores on the newest supercomputers. Benefits of the new DART implementation will be shown. In addition, overviews of the most recently supported models will be presented: CAM-CHEM, WRF-CHEM, CM1, OpenGGCM, FESOM, ROMS, CICE5, TerrSysMP (COSMO, CLM, ParFlow), JULES, and CABLE. DART provides a comprehensive suite of software, documentation, and tutorials that can be used for ensemble data assimilation research, operations, and education. Scientists and software engineers at NCAR are available to support DART users who want to use existing DART products or develop their own applications. Current DART users range from university professors teaching data assimilation, to individual graduate students working with simple models, to national laboratories and state agencies doing operational prediction with large state-of-the-art models.
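The MPI-2 one-sided (RMA) pattern referred to above lets any task read a slice of a distributed array without the owning task posting a matching receive. The sketch below shows that generic pattern (window creation, passive-target lock, MPI_Get); the array size and layout are placeholders and not DART's actual data structures.

```c
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

#define LOCAL_N 1000   /* hypothetical state-vector elements owned per task */

int main(int argc, char **argv) {
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Each task exposes its slice of the distributed state through a window. */
    double *state = malloc(LOCAL_N * sizeof(double));
    for (int i = 0; i < LOCAL_N; i++) state[i] = rank * LOCAL_N + i;

    MPI_Win win;
    MPI_Win_create(state, LOCAL_N * sizeof(double), sizeof(double),
                   MPI_INFO_NULL, MPI_COMM_WORLD, &win);

    /* Passive-target access: read 10 elements from the next rank's slice
       without that rank making any matching call. */
    double buf[10];
    int target = (rank + 1) % size;
    MPI_Win_lock(MPI_LOCK_SHARED, target, 0, win);
    MPI_Get(buf, 10, MPI_DOUBLE, target, /* target_disp = */ 0,
            10, MPI_DOUBLE, win);
    MPI_Win_unlock(target, win);          /* completes the transfer */

    if (rank == 0) printf("first remote element = %f\n", buf[0]);

    MPI_Win_free(&win);
    free(state);
    MPI_Finalize();
    return 0;
}
```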
TOUGH3: A new efficient version of the TOUGH suite of multiphase flow and transport simulators
NASA Astrophysics Data System (ADS)
Jung, Yoojin; Pau, George Shu Heng; Finsterle, Stefan; Pollyea, Ryan M.
2017-11-01
The TOUGH suite of nonisothermal multiphase flow and transport simulators has been updated by various developers over many years to address a vast range of challenging subsurface problems. The increasing complexity of the simulated processes, as well as the growing size of the model domains that need to be handled, calls for an improvement in the simulator's computational robustness and efficiency. Moreover, modifications have frequently been introduced independently, resulting in multiple versions of TOUGH that (1) led to inconsistencies in feature implementation and usage, (2) made code maintenance and development inefficient, and (3) caused confusion among users and developers. TOUGH3, a new base version of TOUGH, addresses these issues. It consolidates the serial (TOUGH2 V2.1) and parallel (TOUGH2-MP V2.0) implementations, enabling simulations to be performed on desktop computers and supercomputers using a single code. New PETSc parallel linear solvers are added to the existing serial solvers of TOUGH2 and the Aztec solver used in TOUGH2-MP. The PETSc solvers generally perform better than the Aztec solvers in parallel and the internal TOUGH3 linear solver in serial. TOUGH3 also incorporates many new features, addresses bugs, and improves the flexibility of data handling. Due to its improved capabilities and usability, TOUGH3 is more robust and efficient for solving tough and computationally demanding problems in diverse scientific and practical applications related to subsurface flow modeling.
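For orientation, the sketch below shows the generic PETSc KSP workflow that parallel linear solvers of this kind build on: assemble a distributed matrix, then solve A x = b with a Krylov solver and preconditioner chosen at run time (e.g. -ksp_type bcgs -pc_type bjacobi). It is a minimal, generic example (a 1-D Laplacian), not TOUGH3's actual solver interface.

```c
#include <petscksp.h>

int main(int argc, char **argv) {
    Mat A; Vec x, b; KSP ksp;
    PetscInt n = 100, Istart, Iend, i;

    PetscInitialize(&argc, &argv, NULL, NULL);

    /* Distributed sparse matrix: a simple tridiagonal (1-D Laplacian) stencil. */
    MatCreate(PETSC_COMM_WORLD, &A);
    MatSetSizes(A, PETSC_DECIDE, PETSC_DECIDE, n, n);
    MatSetFromOptions(A);
    MatSetUp(A);
    MatGetOwnershipRange(A, &Istart, &Iend);
    for (i = Istart; i < Iend; i++) {
        if (i > 0)     MatSetValue(A, i, i - 1, -1.0, INSERT_VALUES);
        if (i < n - 1) MatSetValue(A, i, i + 1, -1.0, INSERT_VALUES);
        MatSetValue(A, i, i, 2.0, INSERT_VALUES);
    }
    MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);
    MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);

    VecCreate(PETSC_COMM_WORLD, &b);
    VecSetSizes(b, PETSC_DECIDE, n);
    VecSetFromOptions(b);
    VecDuplicate(b, &x);
    VecSet(b, 1.0);

    KSPCreate(PETSC_COMM_WORLD, &ksp);
    KSPSetOperators(ksp, A, A);
    KSPSetFromOptions(ksp);      /* solver and preconditioner from the command line */
    KSPSolve(ksp, b, x);

    KSPDestroy(&ksp); VecDestroy(&x); VecDestroy(&b); MatDestroy(&A);
    PetscFinalize();
    return 0;
}
```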
NASA Astrophysics Data System (ADS)
Karton, Amir; Martin, Jan M. L.
2012-10-01
Accurate isomerization energies are obtained for a set of 45 C8H8 isomers by means of the high-level, ab initio W1-F12 thermochemical protocol. The 45 isomers involve a range of hydrocarbon functional groups, including (linear and cyclic) polyacetylene, polyyne, and cumulene moieties, as well as aromatic, anti-aromatic, and highly strained rings. The performance of a variety of DFT functionals for the isomerization energies is evaluated. This proves to be a challenging test: only six of the 56 tested functionals attain root mean square deviations (RMSDs) below 3 kcal mol^-1 (the performance of MP2), namely 2.9 (B972-D), 2.8 (PW6B95), 2.7 (B3PW91-D), 2.2 (PWPB95-D3), 2.1 (ωB97X-D), and 1.2 (DSD-PBEP86) kcal mol^-1. Isomers involving highly strained fused rings or long cumulenic chains provide a 'torture test' for most functionals. Finally, we evaluate the performance of composite procedures (e.g. G4, G4(MP2), CBS-QB3, and CBS-APNO), as well as that of standard ab initio procedures (e.g. MP2, SCS-MP2, MP4, CCSD, and SCS-CCSD). Both connected triples and post-MP4 singles and doubles are important for accurate results. SCS-MP2 actually outperforms MP4(SDQ) for this problem, while SCS-MP3 yields performance similar to CCSD and slightly bests MP4. All the tested empirical composite procedures show excellent performance, with RMSDs below 1 kcal mol^-1.
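The RMSD figure of merit quoted above is simply the root mean square of each method's errors against the W1-F12 reference isomerization energies; a minimal sketch (with hypothetical input arrays) is:

```c
#include <math.h>

/* Root mean square deviation of n computed isomerization energies (e.g. from a
   DFT functional) against reference values (e.g. W1-F12), both in kcal/mol. */
double rmsd(const double *calc, const double *ref, int n) {
    double ss = 0.0;
    for (int i = 0; i < n; i++) {
        double d = calc[i] - ref[i];
        ss += d * d;
    }
    return sqrt(ss / n);
}
```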
Very-large-area CCD image sensors: concept and cost-effective research
NASA Astrophysics Data System (ADS)
Bogaart, E. W.; Peters, I. M.; Kleimann, A. C.; Manoury, E. J. P.; Klaassens, W.; de Laat, W. T. F. M.; Draijer, C.; Frost, R.; Bosiers, J. T.
2009-01-01
A new-generation full-frame 36x48 mm², 48Mp CCD image sensor with vertical anti-blooming for professional digital still camera applications has been developed by means of the so-called building-block concept. The 48Mp devices are formed by stitching 1kx1k building blocks with 6.0 µm pixel pitch in a 6x8 (hxv) format. This concept allows us to design four large-area (48Mp) and sixty-two basic (1Mp) devices per 6" wafer. The basic image sensor is kept relatively small in order to obtain data from many devices; evaluation of basic parameters such as the image pixel and the on-chip amplifier thus provides statistical data from a limited number of wafers. The large-area devices, in turn, are evaluated for aspects typical of large-sensor operation and performance, such as the charge transport efficiency. Combined with the usability of multi-layer reticles, this makes the sensor development cost-effective for prototyping. Optimisation of the sensor design and technology has resulted in a pixel charge capacity of 58 ke- and significantly reduced readout noise (12 electrons at 25 MHz pixel rate, after CDS). Hence, a dynamic range of 73 dB is obtained. Microlens and stack optimisation resulted in an excellent angular response that meets the demands of wide-angle photography.
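The quoted 73 dB dynamic range follows directly from the full-well capacity and the read noise, using the standard definition (shown here for clarity):

```latex
\mathrm{DR} \;=\; 20\log_{10}\!\left(\frac{N_{\mathrm{full\ well}}}{N_{\mathrm{read\ noise}}}\right)
            \;=\; 20\log_{10}\!\left(\frac{58\,000\ e^-}{12\ e^-}\right) \;\approx\; 73.7\ \mathrm{dB}.
```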
Kim, Il-Young; Schutzler, Scott; Schrader, Amy; Spencer, Horace J; Azhar, Gohar; Ferrando, Arny A; Wolfe, Robert R
2016-01-01
We have determined whole body protein kinetics, i.e., protein synthesis (PS), breakdown (PB), and net balance (NB), in human subjects in the fasted state and following ingestion of either ~40 g of protein [moderate protein (MP)], an amount reported to maximize the protein synthetic response, or ~70 g of protein [higher protein (HP)], more representative of the amount of protein in the dinner of an average American diet. Twenty-three healthy young adults who had performed prior resistance exercise (X-MP or X-HP) or time-matched resting (R-MP or R-HP) were studied during a primed continuous infusion of l-[(2)H5]phenylalanine and l-[(2)H2]tyrosine. Subjects were randomly assigned to an exercise (X, n = 12) or resting (R, n = 11) group, and each group was studied at the two levels of dietary protein intake in random order. PS, PB, and NB were expressed as increases above the basal, fasting values (mg·kg lean body mass^-1·min^-1). Exercise did not significantly affect protein kinetics or blood chemistry. Feeding resulted in positive NB at both levels of protein intake: NB was greater in response to the HP meal than to the MP meal (P < 0.00001). The greater NB with HP was achieved primarily through a greater reduction in PB and, to a lesser extent, through stimulation of protein synthesis (for all, P < 0.0001). HP resulted in greater plasma essential amino acid responses (P < 0.01) vs. MP, with no differences in insulin and glucose responses. In conclusion, whole body net protein balance improves with protein intake greater than that previously suggested to maximally stimulate muscle protein synthesis, because of a simultaneous reduction in protein breakdown. Copyright © 2016 the American Physiological Society.
Impact! Chandra Images a Young Supernova Blast Wave
NASA Astrophysics Data System (ADS)
2000-05-01
Two images made by NASA's Chandra X-ray Observatory, one in October 1999, the other in January 2000, show for the first time the full impact of the actual blast wave from Supernova 1987A (SN1987A). The observations are the first time that X-rays from a shock wave have been imaged at such an early stage of a supernova explosion. Recent observations of SN 1987A with the Hubble Space Telescope revealed gradually brightening hot spots from a ring of matter ejected by the star thousands of years before it exploded. Chandra's X-ray images show the cause for this brightening ring. A shock wave is smashing into portions of the ring at a speed of 10 million miles per hour (4,500 kilometers per second). The gas behind the shock wave has a temperature of about ten million degrees Celsius, and is visible only with an X-ray telescope. "With Hubble we heard the whistle from the oncoming train," said David Burrows of Pennsylvania State University, University Park, the leader of the team of scientists involved in analyzing the Chandra data on SN 1987A. "Now, with Chandra, we can see the train." The X-ray observations appear to confirm the general outlines of a model developed by team member Richard McCray of the University of Colorado, Boulder, and others, which holds that a shock wave has been moving out ahead of the debris expelled by the explosion. As this shock wave collides with material outside the ring, it heats it to millions of degrees. "We are witnessing the birth of a supernova remnant for the first time," McCray said. The Chandra images clearly show the previously unseen, shock-heated matter just inside the optical ring. Comparison with observations made with Chandra in October and January, and with Hubble in February 2000, shows that the X-ray emission peaks close to the newly discovered optical hot spots, and indicates that the wave is beginning to hit the ring. In the next few years, the shock wave will light up still more material in the ring, and an inward moving, or reverse, shock wave will heat the material ejected in the explosion itself. "The supernova is digging up its own past," said McCray. The observations were made on October 6, 1999, using the Advanced CCD Imaging Spectrometer (ACIS) and the High Energy Transmission Grating, and again on January 17, 2000, using ACIS. Other members of the team were Eli Michael of the University of Colorado; Dr. Una Hwang, Dr. Steven Holt and Dr. Rob Petre of NASA's Goddard Space Flight Center in Greenbelt, MD; Professor Roger Chevalier of the University of Virginia, Charlottesville; and Professors Gordon Garmire and John Nousek of Pennsylvania State University. The results will be published in an upcoming issue of the Astrophysical Journal. The ACIS instrument was built for NASA by the Massachusetts Institute of Technology, Cambridge, and Pennsylvania State University. The High Energy Transmission Grating was built by the Massachusetts Institute of Technology. NASA's Marshall Space Flight Center in Huntsville, AL, manages the Chandra program. TRW, Inc., Redondo Beach, CA, is the prime contractor for the spacecraft. The Smithsonian's Chandra X-ray Center controls science and flight operations from Cambridge, MA. Images to illustrate this release and more information on Chandra's progress can be found on the Internet at: http://chandra.harvard.edu/photo/2000/sn1987a/index.html and http://chandra.nasa.gov
Pols, Thijs W H; Bonta, Peter I; Pires, Nuno M M; Otermin, Iker; Vos, Mariska; de Vries, Margreet R; van Eijk, Marco; Roelofsen, Jeroen; Havekes, Louis M; Quax, Paul H A; van Kuilenburg, André B P; de Waard, Vivian; Pannekoek, Hans; de Vries, Carlie J M
2010-08-01
6-Mercaptopurine (6-MP), the active metabolite of the immunosuppressive prodrug azathioprine, is commonly used in autoimmune diseases and transplant recipients, who are at high risk for cardiovascular disease. Here, we aimed to gain knowledge on the action of 6-MP in atherosclerosis, with a focus on monocytes and macrophages. We demonstrate that 6-MP induces apoptosis of THP-1 monocytes, involving decreased expression of the intrinsic antiapoptotic factors B-cell CLL/Lymphoma-2 (Bcl-2) and Bcl2-like 1 (Bcl-x(L)). In addition, we show that 6-MP decreases expression of the monocyte adhesion molecules platelet endothelial adhesion molecule-1 (PECAM-1) and very late antigen-4 (VLA-4) and inhibits monocyte adhesion. Screening of a panel of cytokines relevant to atherosclerosis revealed that 6-MP robustly inhibits monocyte chemoattractant chemokine-1 (MCP-1) expression in macrophages stimulated with lipopolysaccharide (LPS). Finally, local delivery of 6-MP to the vessel wall, using a drug-eluting cuff, attenuates atherosclerosis in hypercholesterolemic apolipoprotein E*3-Leiden transgenic mice (P<0.05). In line with our in vitro data, this inhibition of atherosclerosis by 6-MP was accompanied with decreased lesion monocyte chemoattractant chemokine-1 levels, enhanced vascular apoptosis, and reduced macrophage content. We report novel, previously unrecognized atheroprotective actions of 6-MP in cultured monocytes/macrophages and in a mouse model of atherosclerosis, providing further insight into the effect of the immunosuppressive drug azathioprine in atherosclerosis.
Hegde, Gautham; Hegde, Nanditha; Kumar, Anil; Keshavaraj
2014-07-01
Orthodontic diagnosis and treatment planning for growing children must involve growth prediction, especially in the treatment of skeletal problems. Studies have shown that a strong association exists between skeletal maturity and dental calcification stages. The present study was therefore undertaken to provide a simple and practical method for assessing skeletal maturity using a dental periapical film and a standard dental X-ray machine, to compare the developmental stages of the mandibular canine with the developmental stages of the modified MP3 and determine whether any correlation exists, and to determine whether the developmental stages of the mandibular canine alone can be used as a reliable indicator of skeletal maturity. A total of 160 periapical radiographs of the mandibular right canine and the MP3 region were taken and assessed according to Demirjian's stages of dental calcification and the modified MP3 stages. The correlation coefficient between the MP3 stages and the developmental stages of the mandibular canine was found to be significant in both the male and female groups. When the canine calcification stages were compared with the MP3 stages, it was found that, with the exception of the D stage of canine calcification, the remaining stages showed a very high correlation with the modified MP3 stages. The correlation between the mandibular canine calcification stages and the MP3 stages was found to be significant. Canine calcification could therefore be used as a sole indicator for the assessment of skeletal maturity.
DOE Office of Scientific and Technical Information (OSTI.GOV)
None
Inspired by human forgetfulness – how our brains discard unnecessary data to make room for new information – scientists at the U.S. Department of Energy’s (DOE) Argonne National Laboratory, in collaboration with Brookhaven National Laboratory and three universities, conducted a recent study that combined supercomputer simulation and X-ray characterization of a material that gradually “forgets.”
Breter, H J; Zahn, R K
1979-09-01
6-Mercaptopurine (6MP) metabolism was quantitatively determined in L5178Y murine lymphoma cells. Cells grown in time-course incubations with [35S]-6MP were extracted with cold perchloric acid, and the buffered extracts were subjected to high-performance liquid cation-exchange chromatography before and after hydrolysis with alkaline phosphatase. Free sulfate, 6-thiouric acid, 6-thioxanthosine, 6-thioguanosine, 6-thioinosine, free 6MP, and 6-methylthioinosine were separated from each other; identified in the radiochromatograms by elution volume, UV spectroscopic data, and enzymatic peak-shifting analyses with purine nucleoside phosphorylase; and quantitatively determined by means of 35S radioactivity. Gross intracellular 35S concentrations remained constant at 5 x 10^-5 M after 1 hr of incubation. 6MP metabolism in L5178Y cells could be divided into an early phase (up to 1 hr of incubation), in which 6MP was predominantly catabolized to 6-thiouric acid and free sulfate; an intermediate phase (up to 8 hr), in which substantial amounts of free 6MP and of ribonucleotides of 6-thioxanthosine and 6-thioguanosine were present while the concentrations of non-nucleotide oxidation products sharply decreased; and a late phase (up to 24 hr), in which the ribonucleotides of 6MP, of 6-thioguanosine and, in particular, of 6-methylthioinosine were the most abundant metabolites.
Wang, Jian-Rong; Yu, Xueping; Zhou, Chun; Lin, Yunfei; Chen, Chen; Pan, Guoyu; Mei, Xuefeng
2015-03-01
6-Mercaptopurine (6-MP) is a clinically important antitumor drug. The commercially available form is the monohydrate, which belongs to the BCS class II category. Co-crystallization screening by the reaction crystallization method (RCM), monitored by powder X-ray diffraction, led to the discovery of a new co-crystal formed between 6-MP and isonicotinamide (co-crystal 1). Co-crystal 1 was thoroughly characterized by X-ray diffraction, FT-IR and Raman spectroscopy, and thermal analysis. Notably, in vitro and in vivo studies revealed that co-crystal 1 possesses an improved dissolution rate and superior bioavailability in an animal model. Copyright © 2015 Elsevier Ltd. All rights reserved.
NASA Astrophysics Data System (ADS)
Palmer, Michael H.
1997-03-01
The relatively minor deviations from true tetrahedral geometry for molecules of type MX2Y2, where M is tetravalent and X, Y are either H, Me or halogen, are discussed in the light of ab initio calculations of equilibrium geometry with a large (triple zeta valence + polarisation) basis, at both the SCF and MP2 levels. The results are compared with known experimental structural and dipole moment data; in most cases a very close correlation with experiment is found, with slight improvements in the MP2 data. The study is coupled with a localised orbital study of relevance to Bent's Rule.
Bala, M; Pathak, A; Jain, R L
2010-01-01
The purpose of the study was to assess skeletal age using MP3 and hand-wrist radiographs and to find the correlation among the skeletal, dental, and chronological ages. One hundred and sixty healthy North-Indian children in the age group 8-14 years, comprising equal numbers of males and females, were included in the study. Radiographs were taken of the middle phalanx of the third finger (MP3) and the hand-wrist of the right hand, along with an intraoral periapical (IOPA) X-ray of the right permanent maxillary canine. Skeletal age was assessed from the MP3 and hand-wrist radiographs according to the standards of Greulich and Pyle. Dental age was assessed from the IOPA radiographs of the right permanent maxillary canine based on Nolla's calcification stages. Skeletal age from the MP3 and hand-wrist radiographs showed high correlation in all the age groups for both sexes. Females were more advanced in skeletal maturation than males. Skeletal age showed a high correlation with dental age in the 12-14 years age group. Chronological age showed inconsistent correlation with the dental and skeletal ages.
Status and future plans for open source QuickPIC
NASA Astrophysics Data System (ADS)
An, Weiming; Decyk, Viktor; Mori, Warren
2017-10-01
QuickPIC is a three-dimensional (3D) quasi-static particle-in-cell (PIC) code developed on the UPIC framework. It can be used for efficiently modeling plasma-based accelerator (PBA) problems. With the quasi-static approximation, QuickPIC can use different time scales for calculating the beam (or laser) evolution and the plasma response, and a 3D plasma wakefield can be simulated using a two-dimensional (2D) PIC code where the time variable is ξ = ct - z and z is the beam propagation direction. QuickPIC can be a thousand times faster than a conventional PIC code when simulating a PBA. It uses an MPI/OpenMP hybrid parallel algorithm, which can run on anything from a laptop to the largest supercomputers. The open-source QuickPIC is an object-oriented program with high-level classes written in Fortran 2003. It can be found at https://github.com/UCLA-Plasma-Simulation-Group/QuickPIC-OpenSource.git
PLOT3D/AMES, APOLLO UNIX VERSION USING GMR3D (WITHOUT TURB3D)
NASA Technical Reports Server (NTRS)
Buning, P.
1994-01-01
PLOT3D is an interactive graphics program designed to help scientists visualize computational fluid dynamics (CFD) grids and solutions. Today, supercomputers and CFD algorithms can provide scientists with simulations of such highly complex phenomena that obtaining an understanding of the simulations has become a major problem. Tools which help the scientist visualize the simulations can be of tremendous aid. PLOT3D/AMES offers more functions and features, and has been adapted for more types of computers than any other CFD graphics program. Version 3.6b+ is supported for five computers and graphic libraries. Using PLOT3D, CFD physicists can view their computational models from any angle, observing the physics of problems and the quality of solutions. As an aid in designing aircraft, for example, PLOT3D's interactive computer graphics can show vortices, temperature, reverse flow, pressure, and dozens of other characteristics of air flow during flight. As critical areas become obvious, they can easily be studied more closely using a finer grid. PLOT3D is part of a computational fluid dynamics software cycle. First, a program such as 3DGRAPE (ARC-12620) helps the scientist generate computational grids to model an object and its surrounding space. Once the grids have been designed and parameters such as the angle of attack, Mach number, and Reynolds number have been specified, a "flow-solver" program such as INS3D (ARC-11794 or COS-10019) solves the system of equations governing fluid flow, usually on a supercomputer. Grids sometimes have as many as two million points, and the "flow-solver" produces a solution file which contains density, x- y- and z-momentum, and stagnation energy for each grid point. With such a solution file and a grid file containing up to 50 grids as input, PLOT3D can calculate and graphically display any one of 74 functions, including shock waves, surface pressure, velocity vectors, and particle traces. PLOT3D's 74 functions are organized into five groups: 1) Grid Functions for grids, grid-checking, etc.; 2) Scalar Functions for contour or carpet plots of density, pressure, temperature, Mach number, vorticity magnitude, helicity, etc.; 3) Vector Functions for vector plots of velocity, vorticity, momentum, and density gradient, etc.; 4) Particle Trace Functions for rake-like plots of particle flow or vortex lines; and 5) Shock locations based on pressure gradient. TURB3D is a modification of PLOT3D which is used for viewing CFD simulations of incompressible turbulent flow. Input flow data consists of pressure, velocity and vorticity. Typical quantities to plot include local fluctuations in flow quantities and turbulent production terms, plotted in physical or wall units. PLOT3D/TURB3D includes both TURB3D and PLOT3D because the operation of TURB3D is identical to PLOT3D, and there is no additional sample data or printed documentation for TURB3D. Graphical capabilities of PLOT3D version 3.6b+ vary among the implementations available through COSMIC. Customers are encouraged to purchase and carefully review the PLOT3D manual before ordering the program for a specific computer and graphics library. There is only one manual for use with all implementations of PLOT3D, and although this manual generally assumes that the Silicon Graphics Iris implementation is being used, informative comments concerning other implementations appear throughout the text. 
With all implementations, the visual representation of the object and flow field created by PLOT3D consists of points, lines, and polygons. Points can be represented with dots or symbols, color can be used to denote data values, and perspective is used to show depth. Differences among implementations impact the program's ability to use graphical features that are based on 3D polygons, the user's ability to manipulate the graphical displays, and the user's ability to obtain alternate forms of output. The Apollo implementation of PLOT3D uses some of the capabilities of Apollo's 3-dimensional graphics hardware, but does not take advantage of the shading and hidden line/surface removal capabilities of the Apollo DN10000. Although this implementation does not offer a capability for putting text on plots, it does support the use of a mouse to translate, rotate, or zoom in on views. The version 3.6b+ Apollo implementations of PLOT3D (ARC-12789) and PLOT3D/TURB3D (ARC-12785) were developed for use on Apollo computers running UNIX System V with BSD 4.3 extensions and the graphics library GMR3D Version 2.0. The standard distribution media for each of these programs is a 9-track, 6250 bpi magnetic tape in TAR format. Customers purchasing one implementation version of PLOT3D or PLOT3D/TURB3D will be given a $200 discount on each additional implementation version ordered at the same time. Version 3.6b+ of PLOT3D and PLOT3D/TURB3D are also supported for the following computers and graphics libraries: 1) generic UNIX Supercomputer and IRIS, suitable for CRAY 2/UNICOS, CONVEX, and Alliant with remote IRIS 2xxx/3xxx or IRIS 4D (ARC-12779, ARC-12784); 2) VAX computers running VMS Version 5.0 and DISSPLA Version 11.0 (ARC-12777, ARC-12781); 3) generic UNIX and DISSPLA Version 11.0 (ARC-12788, ARC-12778); and (4) Silicon Graphics IRIS 2xxx/3xxx or IRIS 4D workstations (ARC-12783, ARC-12782). Silicon Graphics Iris, IRIS 4D, and IRIS 2xxx/3xxx are trademarks of Silicon Graphics Incorporated. VAX and VMS are trademarks of Digital Electronics Corporation. DISSPLA is a trademark of Computer Associates. CRAY 2 and UNICOS are trademarks of CRAY Research, Incorporated. CONVEX is a trademark of Convex Computer Corporation. Alliant is a trademark of Alliant. Apollo and GMR3D are trademarks of Hewlett-Packard, Incorporated. UNIX is a registered trademark of AT&T.
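As an illustration of how scalar functions such as surface pressure are derived from the variables stored in a PLOT3D solution file (density, momentum components, and stagnation energy at each grid point), the sketch below applies the standard perfect-gas relation at a single grid point. It is a generic formula under an assumed ratio of specific heats, not PLOT3D's source code.

```c
/* Static pressure at one grid point from the conserved variables stored in a
   PLOT3D solution file: density rho, momentum components (rho*u, rho*v, rho*w),
   and stagnation energy per unit volume e0, for a perfect gas with ratio of
   specific heats gamma (1.4 for air):
       p = (gamma - 1) * ( e0 - 0.5 * ((rho*u)^2 + (rho*v)^2 + (rho*w)^2) / rho ) */
double pressure(double rho, double rhou, double rhov, double rhow,
                double e0, double gamma) {
    double ke = 0.5 * (rhou * rhou + rhov * rhov + rhow * rhow) / rho;
    return (gamma - 1.0) * (e0 - ke);
}
```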
PLOT3D/AMES, SGI IRIS VERSION (WITHOUT TURB3D)
NASA Technical Reports Server (NTRS)
Buning, P.
1994-01-01
PLOT3D is an interactive graphics program designed to help scientists visualize computational fluid dynamics (CFD) grids and solutions. Today, supercomputers and CFD algorithms can provide scientists with simulations of such highly complex phenomena that obtaining an understanding of the simulations has become a major problem. Tools which help the scientist visualize the simulations can be of tremendous aid. PLOT3D/AMES offers more functions and features, and has been adapted for more types of computers than any other CFD graphics program. Version 3.6b+ is supported for five computers and graphic libraries. Using PLOT3D, CFD physicists can view their computational models from any angle, observing the physics of problems and the quality of solutions. As an aid in designing aircraft, for example, PLOT3D's interactive computer graphics can show vortices, temperature, reverse flow, pressure, and dozens of other characteristics of air flow during flight. As critical areas become obvious, they can easily be studied more closely using a finer grid. PLOT3D is part of a computational fluid dynamics software cycle. First, a program such as 3DGRAPE (ARC-12620) helps the scientist generate computational grids to model an object and its surrounding space. Once the grids have been designed and parameters such as the angle of attack, Mach number, and Reynolds number have been specified, a "flow-solver" program such as INS3D (ARC-11794 or COS-10019) solves the system of equations governing fluid flow, usually on a supercomputer. Grids sometimes have as many as two million points, and the "flow-solver" produces a solution file which contains density, x- y- and z-momentum, and stagnation energy for each grid point. With such a solution file and a grid file containing up to 50 grids as input, PLOT3D can calculate and graphically display any one of 74 functions, including shock waves, surface pressure, velocity vectors, and particle traces. PLOT3D's 74 functions are organized into five groups: 1) Grid Functions for grids, grid-checking, etc.; 2) Scalar Functions for contour or carpet plots of density, pressure, temperature, Mach number, vorticity magnitude, helicity, etc.; 3) Vector Functions for vector plots of velocity, vorticity, momentum, and density gradient, etc.; 4) Particle Trace Functions for rake-like plots of particle flow or vortex lines; and 5) Shock locations based on pressure gradient. TURB3D is a modification of PLOT3D which is used for viewing CFD simulations of incompressible turbulent flow. Input flow data consists of pressure, velocity and vorticity. Typical quantities to plot include local fluctuations in flow quantities and turbulent production terms, plotted in physical or wall units. PLOT3D/TURB3D includes both TURB3D and PLOT3D because the operation of TURB3D is identical to PLOT3D, and there is no additional sample data or printed documentation for TURB3D. Graphical capabilities of PLOT3D version 3.6b+ vary among the implementations available through COSMIC. Customers are encouraged to purchase and carefully review the PLOT3D manual before ordering the program for a specific computer and graphics library. There is only one manual for use with all implementations of PLOT3D, and although this manual generally assumes that the Silicon Graphics Iris implementation is being used, informative comments concerning other implementations appear throughout the text. 
With all implementations, the visual representation of the object and flow field created by PLOT3D consists of points, lines, and polygons. Points can be represented with dots or symbols, color can be used to denote data values, and perspective is used to show depth. Differences among implementations impact the program's ability to use graphical features that are based on 3D polygons, the user's ability to manipulate the graphical displays, and the user's ability to obtain alternate forms of output. In each of these areas, the IRIS implementation of PLOT3D offers advanced features which aid visualization efforts. Shading and hidden line/surface removal can be used to enhance depth perception and other aspects of the graphical displays. A mouse can be used to translate, rotate, or zoom in on views. Files for several types of output can be produced. Two animation options are even offered: creation of simple animation sequences without the need for other software; and, creation of files for use in GAS (Graphics Animation System, ARC-12379), an IRIS program which offers more complex rendering and animation capabilities and can record images to digital disk, video tape, or 16-mm film. The version 3.6b+ SGI implementations of PLOT3D (ARC-12783) and PLOT3D/TURB3D (ARC-12782) were developed for use on Silicon Graphics IRIS 2xxx/3xxx or IRIS 4D workstations. These programs are each distributed on one .25 inch magnetic tape cartridge in IRIS TAR format. Customers purchasing one implementation version of PLOT3D or PLOT3D/TURB3D will be given a $200 discount on each additional implementation version ordered at the same time. Version 3.6b+ of PLOT3D and PLOT3D/TURB3D are also supported for the following computers and graphics libraries: (1) generic UNIX Supercomputer and IRIS, suitable for CRAY 2/UNICOS, CONVEX, and Alliant with remote IRIS 2xxx/3xxx or IRIS 4D (ARC-12779, ARC-12784); (2) VAX computers running VMS Version 5.0 and DISSPLA Version 11.0 (ARC-12777,ARC-12781); (3) generic UNIX and DISSPLA Version 11.0 (ARC-12788, ARC-12778); and (4) Apollo computers running UNIX and GMR3D Version 2.0 (ARC-12789, ARC-12785 which have no capabilities to put text on plots). Silicon Graphics Iris, IRIS 4D, and IRIS 2xxx/3xxx are trademarks of Silicon Graphics Incorporated. VAX and VMS are trademarks of Digital Electronics Corporation. DISSPLA is a trademark of Computer Associates. CRAY 2 and UNICOS are trademarks of CRAY Research, Incorporated. CONVEX is a trademark of Convex Computer Corporation. Alliant is a trademark of Alliant. Apollo and GMR3D are trademarks of Hewlett-Packard, Incorporated. UNIX is a registered trademark of AT&T.
PLOT3D/AMES, GENERIC UNIX VERSION USING DISSPLA (WITH TURB3D)
NASA Technical Reports Server (NTRS)
Buning, P.
1994-01-01
PLOT3D is an interactive graphics program designed to help scientists visualize computational fluid dynamics (CFD) grids and solutions. Today, supercomputers and CFD algorithms can provide scientists with simulations of such highly complex phenomena that obtaining an understanding of the simulations has become a major problem. Tools which help the scientist visualize the simulations can be of tremendous aid. PLOT3D/AMES offers more functions and features, and has been adapted for more types of computers than any other CFD graphics program. Version 3.6b+ is supported for five computers and graphic libraries. Using PLOT3D, CFD physicists can view their computational models from any angle, observing the physics of problems and the quality of solutions. As an aid in designing aircraft, for example, PLOT3D's interactive computer graphics can show vortices, temperature, reverse flow, pressure, and dozens of other characteristics of air flow during flight. As critical areas become obvious, they can easily be studied more closely using a finer grid. PLOT3D is part of a computational fluid dynamics software cycle. First, a program such as 3DGRAPE (ARC-12620) helps the scientist generate computational grids to model an object and its surrounding space. Once the grids have been designed and parameters such as the angle of attack, Mach number, and Reynolds number have been specified, a "flow-solver" program such as INS3D (ARC-11794 or COS-10019) solves the system of equations governing fluid flow, usually on a supercomputer. Grids sometimes have as many as two million points, and the "flow-solver" produces a solution file which contains density, x- y- and z-momentum, and stagnation energy for each grid point. With such a solution file and a grid file containing up to 50 grids as input, PLOT3D can calculate and graphically display any one of 74 functions, including shock waves, surface pressure, velocity vectors, and particle traces. PLOT3D's 74 functions are organized into five groups: 1) Grid Functions for grids, grid-checking, etc.; 2) Scalar Functions for contour or carpet plots of density, pressure, temperature, Mach number, vorticity magnitude, helicity, etc.; 3) Vector Functions for vector plots of velocity, vorticity, momentum, and density gradient, etc.; 4) Particle Trace Functions for rake-like plots of particle flow or vortex lines; and 5) Shock locations based on pressure gradient. TURB3D is a modification of PLOT3D which is used for viewing CFD simulations of incompressible turbulent flow. Input flow data consists of pressure, velocity and vorticity. Typical quantities to plot include local fluctuations in flow quantities and turbulent production terms, plotted in physical or wall units. PLOT3D/TURB3D includes both TURB3D and PLOT3D because the operation of TURB3D is identical to PLOT3D, and there is no additional sample data or printed documentation for TURB3D. Graphical capabilities of PLOT3D version 3.6b+ vary among the implementations available through COSMIC. Customers are encouraged to purchase and carefully review the PLOT3D manual before ordering the program for a specific computer and graphics library. There is only one manual for use with all implementations of PLOT3D, and although this manual generally assumes that the Silicon Graphics Iris implementation is being used, informative comments concerning other implementations appear throughout the text. 
With all implementations, the visual representation of the object and flow field created by PLOT3D consists of points, lines, and polygons. Points can be represented with dots or symbols, color can be used to denote data values, and perspective is used to show depth. Differences among implementations impact the program's ability to use graphical features that are based on 3D polygons, the user's ability to manipulate the graphical displays, and the user's ability to obtain alternate forms of output. The UNIX/DISSPLA implementation of PLOT3D supports 2-D polygons as well as 2-D and 3-D lines, but does not support graphics features requiring 3-D polygons (shading and hidden line removal, for example). Views can be manipulated using keyboard commands. This version of PLOT3D is potentially able to produce files for a variety of output devices; however, site-specific capabilities will vary depending on the device drivers supplied with the user's DISSPLA library. The version 3.6b+ UNIX/DISSPLA implementations of PLOT3D (ARC-12788) and PLOT3D/TURB3D (ARC-12778) were developed for use on computers running UNIX SYSTEM 5 with BSD 4.3 extensions. The standard distribution media for each of these programs is a 9-track, 6250 bpi magnetic tape in TAR format. Customers purchasing one implementation version of PLOT3D or PLOT3D/TURB3D will be given a $200 discount on each additional implementation version ordered at the same time. Version 3.6b+ of PLOT3D and PLOT3D/TURB3D are also supported for the following computers and graphics libraries: (1) generic UNIX Supercomputer and IRIS, suitable for CRAY 2/UNICOS, CONVEX, Alliant with remote IRIS 2xxx/3xxx or IRIS 4D (ARC-12779, ARC-12784); (2) Silicon Graphics IRIS 2xxx/3xxx or IRIS 4D (ARC-12783, ARC-12782); (3) VAX computers running VMS Version 5.0 and DISSPLA Version 11.0 (ARC-12777, ARC-12781); and (4) Apollo computers running UNIX and GMR3D Version 2.0 (ARC-12789, ARC-12785 which have no capabilities to put text on plots). Silicon Graphics Iris, IRIS 4D, and IRIS 2xxx/3xxx are trademarks of Silicon Graphics Incorporated. VAX and VMS are trademarks of Digital Electronics Corporation. DISSPLA is a trademark of Computer Associates. CRAY 2 and UNICOS are trademarks of CRAY Research, Incorporated. CONVEX is a trademark of Convex Computer Corporation. Alliant is a trademark of Alliant. Apollo and GMR3D are trademarks of Hewlett-Packard, Incorporated. System 5 is a trademark of Bell Labs, Incorporated. BSD4.3 is a trademark of the University of California at Berkeley. UNIX is a registered trademark of AT&T.
PLOT3D/AMES, APOLLO UNIX VERSION USING GMR3D (WITH TURB3D)
NASA Technical Reports Server (NTRS)
Buning, P.
1994-01-01
PLOT3D is an interactive graphics program designed to help scientists visualize computational fluid dynamics (CFD) grids and solutions. Today, supercomputers and CFD algorithms can provide scientists with simulations of such highly complex phenomena that obtaining an understanding of the simulations has become a major problem. Tools which help the scientist visualize the simulations can be of tremendous aid. PLOT3D/AMES offers more functions and features, and has been adapted for more types of computers than any other CFD graphics program. Version 3.6b+ is supported for five computers and graphic libraries. Using PLOT3D, CFD physicists can view their computational models from any angle, observing the physics of problems and the quality of solutions. As an aid in designing aircraft, for example, PLOT3D's interactive computer graphics can show vortices, temperature, reverse flow, pressure, and dozens of other characteristics of air flow during flight. As critical areas become obvious, they can easily be studied more closely using a finer grid. PLOT3D is part of a computational fluid dynamics software cycle. First, a program such as 3DGRAPE (ARC-12620) helps the scientist generate computational grids to model an object and its surrounding space. Once the grids have been designed and parameters such as the angle of attack, Mach number, and Reynolds number have been specified, a "flow-solver" program such as INS3D (ARC-11794 or COS-10019) solves the system of equations governing fluid flow, usually on a supercomputer. Grids sometimes have as many as two million points, and the "flow-solver" produces a solution file which contains density, x- y- and z-momentum, and stagnation energy for each grid point. With such a solution file and a grid file containing up to 50 grids as input, PLOT3D can calculate and graphically display any one of 74 functions, including shock waves, surface pressure, velocity vectors, and particle traces. PLOT3D's 74 functions are organized into five groups: 1) Grid Functions for grids, grid-checking, etc.; 2) Scalar Functions for contour or carpet plots of density, pressure, temperature, Mach number, vorticity magnitude, helicity, etc.; 3) Vector Functions for vector plots of velocity, vorticity, momentum, and density gradient, etc.; 4) Particle Trace Functions for rake-like plots of particle flow or vortex lines; and 5) Shock locations based on pressure gradient. TURB3D is a modification of PLOT3D which is used for viewing CFD simulations of incompressible turbulent flow. Input flow data consists of pressure, velocity and vorticity. Typical quantities to plot include local fluctuations in flow quantities and turbulent production terms, plotted in physical or wall units. PLOT3D/TURB3D includes both TURB3D and PLOT3D because the operation of TURB3D is identical to PLOT3D, and there is no additional sample data or printed documentation for TURB3D. Graphical capabilities of PLOT3D version 3.6b+ vary among the implementations available through COSMIC. Customers are encouraged to purchase and carefully review the PLOT3D manual before ordering the program for a specific computer and graphics library. There is only one manual for use with all implementations of PLOT3D, and although this manual generally assumes that the Silicon Graphics Iris implementation is being used, informative comments concerning other implementations appear throughout the text. 
With all implementations, the visual representation of the object and flow field created by PLOT3D consists of points, lines, and polygons. Points can be represented with dots or symbols, color can be used to denote data values, and perspective is used to show depth. Differences among implementations impact the program's ability to use graphical features that are based on 3D polygons, the user's ability to manipulate the graphical displays, and the user's ability to obtain alternate forms of output. The Apollo implementation of PLOT3D uses some of the capabilities of Apollo's 3-dimensional graphics hardware, but does not take advantage of the shading and hidden line/surface removal capabilities of the Apollo DN10000. Although this implementation does not offer a capability for putting text on plots, it does support the use of a mouse to translate, rotate, or zoom in on views. The version 3.6b+ Apollo implementations of PLOT3D (ARC-12789) and PLOT3D/TURB3D (ARC-12785) were developed for use on Apollo computers running UNIX System V with BSD 4.3 extensions and the graphics library GMR3D Version 2.0. The standard distribution media for each of these programs is a 9-track, 6250 bpi magnetic tape in TAR format. Customers purchasing one implementation version of PLOT3D or PLOT3D/TURB3D will be given a $200 discount on each additional implementation version ordered at the same time. Version 3.6b+ of PLOT3D and PLOT3D/TURB3D are also supported for the following computers and graphics libraries: 1) generic UNIX Supercomputer and IRIS, suitable for CRAY 2/UNICOS, CONVEX, and Alliant with remote IRIS 2xxx/3xxx or IRIS 4D (ARC-12779, ARC-12784); 2) VAX computers running VMS Version 5.0 and DISSPLA Version 11.0 (ARC-12777, ARC-12781); 3) generic UNIX and DISSPLA Version 11.0 (ARC-12788, ARC-12778); and (4) Silicon Graphics IRIS 2xxx/3xxx or IRIS 4D workstations (ARC-12783, ARC-12782). Silicon Graphics Iris, IRIS 4D, and IRIS 2xxx/3xxx are trademarks of Silicon Graphics Incorporated. VAX and VMS are trademarks of Digital Electronics Corporation. DISSPLA is a trademark of Computer Associates. CRAY 2 and UNICOS are trademarks of CRAY Research, Incorporated. CONVEX is a trademark of Convex Computer Corporation. Alliant is a trademark of Alliant. Apollo and GMR3D are trademarks of Hewlett-Packard, Incorporated. UNIX is a registered trademark of AT&T.
PLOT3D/AMES, SGI IRIS VERSION (WITH TURB3D)
NASA Technical Reports Server (NTRS)
Buning, P.
1994-01-01
PLOT3D is an interactive graphics program designed to help scientists visualize computational fluid dynamics (CFD) grids and solutions. Today, supercomputers and CFD algorithms can provide scientists with simulations of such highly complex phenomena that obtaining an understanding of the simulations has become a major problem. Tools which help the scientist visualize the simulations can be of tremendous aid. PLOT3D/AMES offers more functions and features, and has been adapted for more types of computers than any other CFD graphics program. Version 3.6b+ is supported for five computers and graphic libraries. Using PLOT3D, CFD physicists can view their computational models from any angle, observing the physics of problems and the quality of solutions. As an aid in designing aircraft, for example, PLOT3D's interactive computer graphics can show vortices, temperature, reverse flow, pressure, and dozens of other characteristics of air flow during flight. As critical areas become obvious, they can easily be studied more closely using a finer grid. PLOT3D is part of a computational fluid dynamics software cycle. First, a program such as 3DGRAPE (ARC-12620) helps the scientist generate computational grids to model an object and its surrounding space. Once the grids have been designed and parameters such as the angle of attack, Mach number, and Reynolds number have been specified, a "flow-solver" program such as INS3D (ARC-11794 or COS-10019) solves the system of equations governing fluid flow, usually on a supercomputer. Grids sometimes have as many as two million points, and the "flow-solver" produces a solution file which contains density, x- y- and z-momentum, and stagnation energy for each grid point. With such a solution file and a grid file containing up to 50 grids as input, PLOT3D can calculate and graphically display any one of 74 functions, including shock waves, surface pressure, velocity vectors, and particle traces. PLOT3D's 74 functions are organized into five groups: 1) Grid Functions for grids, grid-checking, etc.; 2) Scalar Functions for contour or carpet plots of density, pressure, temperature, Mach number, vorticity magnitude, helicity, etc.; 3) Vector Functions for vector plots of velocity, vorticity, momentum, and density gradient, etc.; 4) Particle Trace Functions for rake-like plots of particle flow or vortex lines; and 5) Shock locations based on pressure gradient. TURB3D is a modification of PLOT3D which is used for viewing CFD simulations of incompressible turbulent flow. Input flow data consists of pressure, velocity and vorticity. Typical quantities to plot include local fluctuations in flow quantities and turbulent production terms, plotted in physical or wall units. PLOT3D/TURB3D includes both TURB3D and PLOT3D because the operation of TURB3D is identical to PLOT3D, and there is no additional sample data or printed documentation for TURB3D. Graphical capabilities of PLOT3D version 3.6b+ vary among the implementations available through COSMIC. Customers are encouraged to purchase and carefully review the PLOT3D manual before ordering the program for a specific computer and graphics library. There is only one manual for use with all implementations of PLOT3D, and although this manual generally assumes that the Silicon Graphics Iris implementation is being used, informative comments concerning other implementations appear throughout the text. 
With all implementations, the visual representation of the object and flow field created by PLOT3D consists of points, lines, and polygons. Points can be represented with dots or symbols, color can be used to denote data values, and perspective is used to show depth. Differences among implementations impact the program's ability to use graphical features that are based on 3D polygons, the user's ability to manipulate the graphical displays, and the user's ability to obtain alternate forms of output. In each of these areas, the IRIS implementation of PLOT3D offers advanced features which aid visualization efforts. Shading and hidden line/surface removal can be used to enhance depth perception and other aspects of the graphical displays. A mouse can be used to translate, rotate, or zoom in on views. Files for several types of output can be produced. Two animation options are even offered: creation of simple animation sequences without the need for other software; and, creation of files for use in GAS (Graphics Animation System, ARC-12379), an IRIS program which offers more complex rendering and animation capabilities and can record images to digital disk, video tape, or 16-mm film. The version 3.6b+ SGI implementations of PLOT3D (ARC-12783) and PLOT3D/TURB3D (ARC-12782) were developed for use on Silicon Graphics IRIS 2xxx/3xxx or IRIS 4D workstations. These programs are each distributed on one .25 inch magnetic tape cartridge in IRIS TAR format. Customers purchasing one implementation version of PLOT3D or PLOT3D/TURB3D will be given a $200 discount on each additional implementation version ordered at the same time. Version 3.6b+ of PLOT3D and PLOT3D/TURB3D are also supported for the following computers and graphics libraries: (1) generic UNIX Supercomputer and IRIS, suitable for CRAY 2/UNICOS, CONVEX, and Alliant with remote IRIS 2xxx/3xxx or IRIS 4D (ARC-12779, ARC-12784); (2) VAX computers running VMS Version 5.0 and DISSPLA Version 11.0 (ARC-12777,ARC-12781); (3) generic UNIX and DISSPLA Version 11.0 (ARC-12788, ARC-12778); and (4) Apollo computers running UNIX and GMR3D Version 2.0 (ARC-12789, ARC-12785 which have no capabilities to put text on plots). Silicon Graphics Iris, IRIS 4D, and IRIS 2xxx/3xxx are trademarks of Silicon Graphics Incorporated. VAX and VMS are trademarks of Digital Electronics Corporation. DISSPLA is a trademark of Computer Associates. CRAY 2 and UNICOS are trademarks of CRAY Research, Incorporated. CONVEX is a trademark of Convex Computer Corporation. Alliant is a trademark of Alliant. Apollo and GMR3D are trademarks of Hewlett-Packard, Incorporated. UNIX is a registered trademark of AT&T.
PLOT3D/AMES, GENERIC UNIX VERSION USING DISSPLA (WITHOUT TURB3D)
NASA Technical Reports Server (NTRS)
Buning, P.
1994-01-01
PLOT3D is an interactive graphics program designed to help scientists visualize computational fluid dynamics (CFD) grids and solutions. Today, supercomputers and CFD algorithms can provide scientists with simulations of such highly complex phenomena that obtaining an understanding of the simulations has become a major problem. Tools which help the scientist visualize the simulations can be of tremendous aid. PLOT3D/AMES offers more functions and features, and has been adapted for more types of computers than any other CFD graphics program. Version 3.6b+ is supported for five computers and graphic libraries. Using PLOT3D, CFD physicists can view their computational models from any angle, observing the physics of problems and the quality of solutions. As an aid in designing aircraft, for example, PLOT3D's interactive computer graphics can show vortices, temperature, reverse flow, pressure, and dozens of other characteristics of air flow during flight. As critical areas become obvious, they can easily be studied more closely using a finer grid. PLOT3D is part of a computational fluid dynamics software cycle. First, a program such as 3DGRAPE (ARC-12620) helps the scientist generate computational grids to model an object and its surrounding space. Once the grids have been designed and parameters such as the angle of attack, Mach number, and Reynolds number have been specified, a "flow-solver" program such as INS3D (ARC-11794 or COS-10019) solves the system of equations governing fluid flow, usually on a supercomputer. Grids sometimes have as many as two million points, and the "flow-solver" produces a solution file which contains density, x- y- and z-momentum, and stagnation energy for each grid point. With such a solution file and a grid file containing up to 50 grids as input, PLOT3D can calculate and graphically display any one of 74 functions, including shock waves, surface pressure, velocity vectors, and particle traces. PLOT3D's 74 functions are organized into five groups: 1) Grid Functions for grids, grid-checking, etc.; 2) Scalar Functions for contour or carpet plots of density, pressure, temperature, Mach number, vorticity magnitude, helicity, etc.; 3) Vector Functions for vector plots of velocity, vorticity, momentum, and density gradient, etc.; 4) Particle Trace Functions for rake-like plots of particle flow or vortex lines; and 5) Shock locations based on pressure gradient. TURB3D is a modification of PLOT3D which is used for viewing CFD simulations of incompressible turbulent flow. Input flow data consists of pressure, velocity and vorticity. Typical quantities to plot include local fluctuations in flow quantities and turbulent production terms, plotted in physical or wall units. PLOT3D/TURB3D includes both TURB3D and PLOT3D because the operation of TURB3D is identical to PLOT3D, and there is no additional sample data or printed documentation for TURB3D. Graphical capabilities of PLOT3D version 3.6b+ vary among the implementations available through COSMIC. Customers are encouraged to purchase and carefully review the PLOT3D manual before ordering the program for a specific computer and graphics library. There is only one manual for use with all implementations of PLOT3D, and although this manual generally assumes that the Silicon Graphics Iris implementation is being used, informative comments concerning other implementations appear throughout the text. 
With all implementations, the visual representation of the object and flow field created by PLOT3D consists of points, lines, and polygons. Points can be represented with dots or symbols, color can be used to denote data values, and perspective is used to show depth. Differences among implementations impact the program's ability to use graphical features that are based on 3D polygons, the user's ability to manipulate the graphical displays, and the user's ability to obtain alternate forms of output. The UNIX/DISSPLA implementation of PLOT3D supports 2-D polygons as well as 2-D and 3-D lines, but does not support graphics features requiring 3-D polygons (shading and hidden line removal, for example). Views can be manipulated using keyboard commands. This version of PLOT3D is potentially able to produce files for a variety of output devices; however, site-specific capabilities will vary depending on the device drivers supplied with the user's DISSPLA library. The version 3.6b+ UNIX/DISSPLA implementations of PLOT3D (ARC-12788) and PLOT3D/TURB3D (ARC-12778) were developed for use on computers running UNIX SYSTEM 5 with BSD 4.3 extensions. The standard distribution media for each of these programs is a 9-track, 6250 bpi magnetic tape in TAR format. Customers purchasing one implementation version of PLOT3D or PLOT3D/TURB3D will be given a $200 discount on each additional implementation version ordered at the same time. Version 3.6b+ of PLOT3D and PLOT3D/TURB3D are also supported for the following computers and graphics libraries: (1) generic UNIX Supercomputer and IRIS, suitable for CRAY 2/UNICOS, CONVEX, Alliant with remote IRIS 2xxx/3xxx or IRIS 4D (ARC-12779, ARC-12784); (2) Silicon Graphics IRIS 2xxx/3xxx or IRIS 4D (ARC-12783, ARC-12782); (3) VAX computers running VMS Version 5.0 and DISSPLA Version 11.0 (ARC-12777, ARC-12781); and (4) Apollo computers running UNIX and GMR3D Version 2.0 (ARC-12789, ARC-12785 which have no capabilities to put text on plots). Silicon Graphics Iris, IRIS 4D, and IRIS 2xxx/3xxx are trademarks of Silicon Graphics Incorporated. VAX and VMS are trademarks of Digital Electronics Corporation. DISSPLA is a trademark of Computer Associates. CRAY 2 and UNICOS are trademarks of CRAY Research, Incorporated. CONVEX is a trademark of Convex Computer Corporation. Alliant is a trademark of Alliant. Apollo and GMR3D are trademarks of Hewlett-Packard, Incorporated. System 5 is a trademark of Bell Labs, Incorporated. BSD4.3 is a trademark of the University of California at Berkeley. UNIX is a registered trademark of AT&T.
NASA Astrophysics Data System (ADS)
He, Yuan; Cremer, Dieter
For 30 molecules and two atoms, MPn correlation energies up to n = 6 are computed and used to analyse higher order correlation effects and the initial convergence behaviour of the MPn series. Particularly useful is the analysis of correlation contributions E(n)(XY...) (n = 4, 5, 6; X, Y, ... = S, D, T, Q, denoting single, double, triple, and quadruple excitations) in the form of correlation energy spectra. Two classes of system are distinguished, namely class A systems possessing well separated electron pairs, and class B systems characterized by electron clustering in certain regions of atomic and molecular space. For class A systems, electron pair correlation effects as described by the D, Q, DD, DQ, QQ, DDD, etc., contributions are most important; these are included stepwise at MPn with n = 2, ..., 6. Class A systems are reasonably described by MPn theory, which is reflected by the fact that convergence of the MPn series is monotonic (but relatively slow) for such systems. The description of class B systems is difficult because three- and four-electron correlation effects, and couplings between two-, three-, and four-electron correlation effects, which are missing at lower orders of perturbation theory, are significant. MPn methods that do not cover these effects simulate higher order correlation effects with lower order ones, thus exaggerating the latter, which has to be corrected with increasing n. Consequently, the MPn series oscillates for class B systems at low orders. A possible divergence of the MPn series is mostly a consequence of an unbalanced basis set. For example, diffuse functions added to an unsaturated sp basis lead to an exaggeration of higher order correlation effects, which can cause enhanced oscillations and divergence of the MPn series.
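For orientation, the order-by-order partition of the correlation energy referred to above can be written schematically as follows; this is the standard textbook decomposition rather than an expression taken from the paper itself.

```latex
% MPn correlation energy through order n, split by excitation class
E_{\mathrm{corr}}(\mathrm{MP}n) = \sum_{k=2}^{n} E^{(k)}, \qquad
E^{(4)} = E^{(4)}_{S} + E^{(4)}_{D} + E^{(4)}_{T} + E^{(4)}_{Q}
```

At fifth and sixth order the contributions carry two or three excitation labels, e.g. E(5)(DD), E(5)(DQ), and E(6)(DDD), which is the notation used in the correlation energy spectra discussed above.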
Lennard, L; Hale, J P; Lilleyman, J S
1993-01-01
1. 6-Mercaptopurine (6-MP) is used in the continuing chemotherapy of childhood acute lymphoblastic leukaemia. The formation of red blood cell (RBC) 6-thioguanine nucleotide (6-TGN) active metabolites, not the dose of 6-MP, is related to cytotoxicity and prognosis. But there is an apparent sex difference in 6-MP metabolism. Boys require more 6-MP than girls to produce the same range of 6-TGN concentrations. Given the same dose, they experience fewer dose reductions because of cytotoxicity, and have a higher relapse rate. 2. The enzyme hypoxanthine phosphoribosyltransferase (HPRT) catalyses the initial activation step in the metabolism of 6-MP to 6-TGNs, a step that requires endogenous phosphoribosyl pyrophosphate (PRPP) as a cosubstrate. Both HPRT and the enzyme responsible for the formation of PRPP are X-linked. 3. RBC HPRT activity was measured in two populations, 86 control children and 63 children with acute lymphoblastic leukaemia. 6-MP was used as the substrate and the formation of the nucleotide product, 6-thioinosinic acid (TIA), was measured. RBC 6-TGN concentrations were measured in the leukaemic children at a standard dose of 6-MP. 4. There was a 1.3- to 1.7-fold range in HPRT activity when measured under optimal conditions. The leukaemic children had significantly higher HPRT activities than the controls (median difference 4.2 micromol TIA ml^-1 RBCs h^-1, 95% C.I. 3.7 to 4.7, P < 0.0001). In the leukaemic children, HPRT activity (range 20.4 to 26.6 micromol TIA ml^-1 RBCs h^-1, median 23.6) was not related to the production of 6-TGNs (range 60 to 1,024 pmol/8 x 10^8 RBCs, median 323). RBC HPRT was present at a high activity even in those children with low 6-TGN concentrations. 5. When HPRT is measured under optimal conditions it does not appear to be the metabolic step responsible for the observed sex difference in 6-MP metabolism. This may be because RBC HPRT activity is not representative of other tissues, but it could equally be because other sex-linked factors are influencing substrate availability. PMID:12959304
Beyond the Face of Race: Emo-Cognitive Explorations of White Neurosis and Racial Cray-Cray
ERIC Educational Resources Information Center
Matias, Cheryl E.; DiAngelo, Robin
2013-01-01
In this article, the authors focus on the emotional and cognitive context that underlies whiteness. They employ interdisciplinary approaches of critical Whiteness studies and critical race theory to entertain how common White responses to racial material stem from the need for Whites to deny race, a traumatizing process that begins in childhood.…
DOE Office of Scientific and Technical Information (OSTI.GOV)
Werner, N.E.; Van Matre, S.W.
1985-05-01
This manual describes the CRI Subroutine Library and Utility Package. The CRI library provides Cray multitasking functionality on the four-processor shared memory VAX 11/780-4. Additional functionality has been added for more flexibility. A discussion of the library, utilities, error messages, and example programs is provided.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Varank, Gamze, E-mail: gvarank@yildiz.edu.tr; Demir, Ahmet, E-mail: ahmetd@yildiz.edu.tr; Yetilmezsoy, Kaan, E-mail: yetilmez@yildiz.edu.tr
2011-11-15
Highlights: > We conduct 1D advection-dispersion modeling to estimate transport parameters. > We examine fourteen phenolic compounds and three inorganic contaminants. > 2-MP, 2,4-DCP, 2,6-DCP, 2,4,5-TCP, 2,3,4,6-TeCP have the highest coefficients. > Dispersion coefficients of Cu are determined to be higher than Zn and Fe. > Transport of phenolics can be prevented by zeolite and bentonite in landfill liners. - Abstract: One-dimensional (1D) advection-dispersion transport modeling was conducted as a conceptual approach for the estimation of the transport parameters of fourteen different phenolic compounds (phenol, 2-CP, 2-MP, 3-MP, 4-MP, 2-NP, 4-NP, 2,4-DNP, 2,4-DCP, 2,6-DCP, 2,4,5-TCP, 2,4,6-TCP, 2,3,4,6-TeCP, PCP) and three different inorganic contaminants (Cu, Zn, Fe) migrating downward through several liner systems. Four identical pilot-scale landfill reactors (0.25 m^3) with different composite liners (R1: 0.10 + 0.10 m of compacted clay liner (CCL), L_e = 0.20 m, k_e = 1 x 10^-8 m/s; R2: 0.002-m-thick damaged high-density polyethylene (HDPE) geomembrane overlying 0.10 + 0.10 m of CCL, L_e = 0.20 m, k_e = 1 x 10^-8 m/s; R3: 0.002-m-thick damaged HDPE geomembrane overlying a 0.02-m-thick bentonite layer encapsulated between 0.10 + 0.10 m CCL, L_e = 0.22 m, k_e = 1 x 10^-8 m/s; R4: 0.002-m-thick damaged HDPE geomembrane overlying a 0.02-m-thick zeolite layer encapsulated between 0.10 + 0.10 m CCL, L_e = 0.22 m, k_e = 4.24 x 10^-7 m/s) were simultaneously run for a period of about 540 days to investigate the nature of diffusive and advective transport of the selected organic and inorganic contaminants. The results of the 1D transport model showed that the highest molecular diffusion coefficients, ranging from 4.77 x 10^-10 to 10.67 x 10^-10 m^2/s, were estimated for phenol (R4), 2-MP (R1), 2,4-DNP (R2), 2,4-DCP (R1), 2,6-DCP (R2), 2,4,5-TCP (R2) and 2,3,4,6-TeCP (R1). For all reactors, dispersion coefficients of Cu, ranging from 3.47 x 10^-6 m^2/s to 5.37 x 10^-2 m^2/s, were determined to be higher than those obtained for Zn and Fe. Average molecular diffusion coefficients of phenolic compounds were estimated to be about 5.64 x 10^-10 m^2/s, 5.37 x 10^-10 m^2/s, 2.69 x 10^-10 m^2/s and 3.29 x 10^-10 m^2/s for the R1, R2, R3 and R4 systems, respectively. The findings of this study clearly indicated that about 35-50% of the transport of phenolic compounds to the groundwater can be prevented with the use of zeolite and bentonite materials in landfill liner systems.
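For reference, the governing equation behind a 1D advection-dispersion model of this kind is normally written as below; the retardation factor R is shown here as an assumption, since the abstract does not state whether sorption was included.

```latex
R\,\frac{\partial C}{\partial t}
  = D\,\frac{\partial^{2} C}{\partial z^{2}}
  - v\,\frac{\partial C}{\partial z}
```

Here C is the contaminant concentration, D the dispersion (or molecular diffusion) coefficient, v the seepage velocity, and z the depth through the liner; fitting breakthrough data to solutions of this equation is how coefficients such as those quoted above are typically estimated.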
NASA Astrophysics Data System (ADS)
Biller, Matthew J.; Mecozzi, Sandro
2012-04-01
The interactions within the methane-methane (CH4/CH4), perfluoromethane-perfluoromethane (CF4/CF4), and methane-perfluoromethane (CH4/CF4) dimers were calculated using the Hartree-Fock (HF) method, multiple orders of Møller-Plesset perturbation theory [MP2, MP3, MP4(DQ), MP4(SDQ), MP4(SDTQ)], and coupled cluster theory [CCSD, CCSD(T)], as well as the PW91, B97D, and M06-2X density functional theory (DFT) functionals. The basis sets of Dunning and coworkers (aug-cc-pVxZ, x = D, T, Q), Krishnan and coworkers [6-311++G(d,p), 6-311++G(2d,2p)], and Tsuzuki and coworkers [aug(df, pd)-6-311G(d,p)] were used. Basis set superposition error (BSSE) was corrected via the counterpoise method in all cases. Interaction energies obtained with the MP2 method do not fit with the experimental finding that the methane-perfluoromethane system phase separates at 94.5 K. It was not until the CCSD(T) method was considered that the interaction energy of the methane-perfluoromethane dimer (-0.69 kcal mol^-1) was found to be intermediate between those of the methane (-0.51 kcal mol^-1) and perfluoromethane (-0.78 kcal mol^-1) dimers. This suggests that a perfluoromethane molecule interacts more favorably with another perfluoromethane molecule (by about 0.09 kcal mol^-1) than with a methane molecule. At temperatures much lower than the CH4/CF4 critical solution temperature of 94.5 K, this energy difference becomes significant and leads perfluoromethane molecules to associate with themselves, forming a phase separation. The DFT functionals yielded erratic results for the three dimers. Further development of DFT is needed in order to model dispersion interactions in hydrocarbon/perfluorocarbon systems.
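The counterpoise method mentioned above follows the usual Boys-Bernardi prescription; in standard notation the BSSE-corrected interaction energy of a dimer AB is

```latex
\Delta E^{\mathrm{CP}}_{\mathrm{int}}
  = E^{AB}_{AB}(AB) - E^{AB}_{A}(AB) - E^{AB}_{B}(AB)
```

where the superscript denotes the basis set (the full dimer basis in every term), the subscript the system being computed, and the argument the geometry, taken from the optimized dimer.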
ARC2D - EFFICIENT SOLUTION METHODS FOR THE NAVIER-STOKES EQUATIONS (CRAY VERSION)
NASA Technical Reports Server (NTRS)
Pulliam, T. H.
1994-01-01
ARC2D is a computational fluid dynamics program developed at the NASA Ames Research Center specifically for airfoil computations. The program uses implicit finite-difference techniques to solve two-dimensional Euler equations and thin layer Navier-Stokes equations. It is based on the Beam and Warming implicit approximate factorization algorithm in generalized coordinates. The methods are either time accurate or accelerated non-time accurate steady state schemes. The evolution of the solution through time is physically realistic; good solution accuracy is dependent on mesh spacing and boundary conditions. The mathematical development of ARC2D begins with the strong conservation law form of the two-dimensional Navier-Stokes equations in Cartesian coordinates, which admits shock capturing. The Navier-Stokes equations can be transformed from Cartesian coordinates to generalized curvilinear coordinates in a manner that permits one computational code to serve a wide variety of physical geometries and grid systems. ARC2D includes an algebraic mixing length model to approximate the effect of turbulence. In cases of high Reynolds number viscous flows, thin layer approximation can be applied. ARC2D allows for a variety of solutions to stability boundaries, such as those encountered in flows with shocks. The user has considerable flexibility in assigning geometry and developing grid patterns, as well as in assigning boundary conditions. However, the ARC2D model is most appropriate for attached and mildly separated boundary layers; no attempt is made to model wake regions and widely separated flows. The techniques have been successfully used for a variety of inviscid and viscous flowfield calculations. The Cray version of ARC2D is written in FORTRAN 77 for use on Cray series computers and requires approximately 5Mb memory. The program is fully vectorized. The tape includes variations for the COS and UNICOS operating systems. Also included is a sample routine for CONVEX computers to emulate Cray system time calls, which should be easy to modify for other machines as well. The standard distribution media for this version is a 9-track 1600 BPI ASCII Card Image format magnetic tape. The Cray version was developed in 1987. The IBM ES/3090 version is an IBM port of the Cray version. It is written in IBM VS FORTRAN and has the capability of executing in both vector and parallel modes on the MVS/XA operating system and in vector mode on the VM/XA operating system. Various options of the IBM VS FORTRAN compiler provide new features for the ES/3090 version, including 64-bit arithmetic and up to 2 GB of virtual addressability. The IBM ES/3090 version is available only as a 9-track, 1600 BPI IBM IEBCOPY format magnetic tape. The IBM ES/3090 version was developed in 1989. The DEC RISC ULTRIX version is a DEC port of the Cray version. It is written in FORTRAN 77 for RISC-based Digital Equipment platforms. The memory requirement is approximately 7Mb of main memory. It is available in UNIX tar format on TK50 tape cartridge. The port to DEC RISC ULTRIX was done in 1990. COS and UNICOS are trademarks and Cray is a registered trademark of Cray Research, Inc. IBM, ES/3090, VS FORTRAN, MVS/XA, and VM/XA are registered trademarks of International Business Machines. DEC and ULTRIX are registered trademarks of Digital Equipment Corporation.
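For orientation, the Beam and Warming implicit approximate factorization on which ARC2D is built is usually quoted in the "delta form" below. This is the schematic textbook version (with the implicit viscous Jacobian omitted for brevity), not an excerpt from the ARC2D source.

```latex
\left( I + h\,\delta_{\xi}\hat{A}^{n} \right)
\left( I + h\,\delta_{\eta}\hat{B}^{n} \right)\Delta\hat{Q}^{n}
 = -\,h\left( \delta_{\xi}\hat{E}^{n} + \delta_{\eta}\hat{F}^{n}
      - \mathrm{Re}^{-1}\,\delta_{\eta}\hat{S}^{n} \right),
\qquad
\hat{Q}^{n+1} = \hat{Q}^{n} + \Delta\hat{Q}^{n}
```

Here \hat{Q} is the vector of conserved variables in generalized coordinates (ξ, η), \hat{A} and \hat{B} are the inviscid flux Jacobians, \hat{S} is the thin-layer viscous flux, h is the time step, and δ denotes a spatial difference operator; factoring the implicit operator into two one-dimensional sweeps is what makes the scheme efficient on vector machines.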
Hegde, Gautham; Hegde, Nanditha; Kumar, Anil; Keshavaraj
2014-01-01
Objective: Orthodontic diagnosis and treatment planning for growing children must involve growth prediction, especially in the treatment of skeletal problems. Studies have shown that a strong association exists between skeletal maturity and dental calcification stages. The present study was therefore taken up to provide a simple and practical method for assessing skeletal maturity using a dental periapical film and a standard dental X-ray machine, to compare the developmental stages of the mandibular canine with the developmental stages of modified MP3 and to find out if any correlation exists, and to determine if the developmental stages of the mandibular canine alone can be used as a reliable indicator for assessment of skeletal maturity. Materials and Methods: A total of 160 periapical radiographs of the mandibular right canine and the MP3 region were taken and assessed according to Demirjian's stages of dental calcification and the modified MP3 stages. Results and Discussion: The correlation coefficient between the MP3 stages and the developmental stages of the mandibular canine was found to be significant in both male and female groups. When the canine calcification stages were compared with the MP3 stages, it was found that, with the exception of the D stage of canine calcification, the remaining stages showed a very high correlation with the modified MP3 stages. Conclusion: The correlation between the mandibular canine calcification stages and the MP3 stages was found to be significant. Canine calcification could be used as a sole indicator for assessment of skeletal maturity. PMID:25210386
Blackman, LM; Boevink, P; Cruz, SS; Palukaitis, P; Oparka, KJ
1998-01-01
The location of the 3a movement protein (MP) of cucumber mosaic virus (CMV) was studied by quantitative immunogold labeling of the wild-type 3a MP in leaves of Nicotiana clevelandii infected by CMV as well as by using a 3a-green fluorescent protein (GFP) fusion expressed from a potato virus X (PVX) vector. Whether expressed from CMV or PVX, the 3a MP targeted plasmodesmata and accumulated in the central cavity of the pore. Within minor veins, the most extensively labeled plasmodesmata were those connecting sieve elements and companion cells. In addition to targeting plasmodesmata, the 3a MP accumulated in the parietal layer of mature sieve elements. Confocal imaging of cells expressing the 3a-GFP fusion protein showed that the 3a MP assembled into elaborate fibrillar formations in the sieve element parietal layer. The ability of 3a-GFP, expressed from PVX rather than CMV, to enter sieve elements demonstrates that neither the CMV RNA nor the CMV coat protein is required for trafficking of the 3a MP into sieve elements. CMV virions were not detected in plasmodesmata from CMV-infected tissue, although large CMV aggregates were often found in the parietal layer of sieve elements and were usually surrounded by 3a MP. These data suggest that CMV traffics into minor vein sieve elements as a ribonucleoprotein complex that contains the viral RNA, coat protein, and 3a MP, with subsequent viral assembly occurring in the sieve element parietal layer. PMID:9548980
Comprehensive efficiency analysis of supercomputer resource usage based on system monitoring data
NASA Astrophysics Data System (ADS)
Mamaeva, A. A.; Shaykhislamov, D. I.; Voevodin, Vad V.; Zhumatiy, S. A.
2018-03-01
One of the main problems of modern supercomputers is the low efficiency of their usage, which leads to significant idle time of computational resources and, in turn, slows the pace of scientific research. This paper presents three approaches to studying the efficiency of supercomputer resource usage based on monitoring data analysis. The first approach performs an analysis of computing resource utilization statistics, which makes it possible to identify typical classes of programs, to explore the structure of the supercomputer job flow, and to track overall trends in supercomputer behavior. The second approach is aimed specifically at analyzing off-the-shelf software packages and libraries installed on the supercomputer, since the efficiency of their usage is becoming an increasingly important factor for the efficient functioning of the entire supercomputer. Within the third approach, abnormal jobs – jobs with abnormally inefficient behavior that differs significantly from the standard behavior of the overall supercomputer job flow – are detected. For each approach, results obtained in practice at the Supercomputer Center of Moscow State University are demonstrated.
2011-09-01
Electromagnetic interference (EMI) may cause some Philips Healthcare IntelliVue MMS, MP2, MP5, and X2 patient monitoring products to incorrectly display a flat electrocardiogram (ECG) waveform and generate a false asystole alarm. This occurs while the devices' pace pulse rejection feature is enabled. Facilities that suspect such behavior in their inventories should contact Philips to discuss whether installation of firmware version D.02.05 will help address the problem.
NASA Astrophysics Data System (ADS)
Filatov, Michael; Cremer, Dieter
2005-01-01
A simple modification of the zeroth-order regular approximation (ZORA) in relativistic theory is suggested to suppress its erroneous gauge dependence to a high level of approximation. The method, coined gauge-independent ZORA (ZORA-GI), can be easily installed in any existing nonrelativistic quantum chemical package by programming simple one-electron matrix elements for the quasirelativistic Hamiltonian. Results of benchmark calculations obtained with ZORA-GI at the Hartree-Fock (HF) and second-order Møller-Plesset perturbation theory (MP2) levels for dihalogens X2 (X=F,Cl,Br,I,At) are in good agreement with the results of four-component relativistic calculations (HF level) and experimental data (MP2 level). ZORA-GI calculations based on MP2 or coupled-cluster theory with single and double excitations and a perturbative treatment of triple excitations [CCSD(T)] lead to accurate atomization energies and molecular geometries for the tetroxides of group VIII elements. With ZORA-GI/CCSD(T), an improved estimate for the atomization energy of hassium (Z=108) tetroxide is obtained.
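For context, the (unmodified) ZORA kinetic energy operator referred to above has the standard form

```latex
\hat{T}^{\mathrm{ZORA}}
  = \boldsymbol{\sigma}\cdot\mathbf{p}\,
    \frac{c^{2}}{2c^{2} - V}\,
    \boldsymbol{\sigma}\cdot\mathbf{p}
```

whose explicit dependence on the potential V (and hence on any constant shift of it) is the source of the gauge dependence that ZORA-GI is designed to suppress.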
Supercomputer applications in molecular modeling.
Gund, T M
1988-01-01
An overview of the functions performed by molecular modeling is given. Molecular modeling techniques benefiting from supercomputing are described, namely conformational search, derivation of bioactive conformations, pharmacophoric pattern searching, receptor mapping, and electrostatic properties. The use of supercomputers for problems that are computationally intensive, such as protein structure prediction, protein dynamics and reactivity, protein conformations, and energetics of binding, is also examined. The current status of supercomputing and supercomputer resources is discussed.
A Physical Parameterization of Snow Albedo for Use in Climate Models.
NASA Astrophysics Data System (ADS)
Marshall, Susan Elaine
The albedo of a natural snowcover is highly variable, ranging from 90 percent for clean, new snow to 30 percent for old, dirty snow. This range in albedo represents a difference in surface energy absorption of 10 to 70 percent of incident solar radiation. Most general circulation models (GCMs) fail to calculate the surface snow albedo accurately, yet the results of these models are sensitive to the assumed value of the snow albedo. This study replaces the current simple empirical parameterizations of snow albedo with a physically based parameterization which is accurate (within +/- 3% of theoretical estimates) yet efficient to compute. The parameterization is designed as a FORTRAN subroutine (called SNOALB) which can be easily implemented into model code. The subroutine requires less than 0.02 seconds of computer time (CRAY X-MP) per call and adds only one new parameter to the model calculations, the snow grain size. The snow grain size can be calculated according to one of the two methods offered in this thesis. All other input variables to the subroutine are available from a climate model. The subroutine calculates a visible, near-infrared and solar (0.2-5 μm) snow albedo and offers a choice of two wavelengths (0.7 and 0.9 μm) at which the solar spectrum is separated into the visible and near-infrared components. The parameterization is incorporated into the National Center for Atmospheric Research (NCAR) Community Climate Model, version 1 (CCM1), and the results of a five-year, seasonal-cycle, fixed-hydrology experiment are compared to the current model snow albedo parameterization. The results show the SNOALB albedos to be comparable to the old CCM1 snow albedos for current climate conditions, with generally higher visible and lower near-infrared snow albedos using the new subroutine. However, this parameterization offers greater predictability for climate change experiments outside the range of current snow conditions because it is physically based and not tuned to current empirical results.
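A minimal sketch of what calling such a routine from a climate model might look like is given below. The name SNOALB, the grain-size input, the selectable 0.7/0.9 μm split, and the visible/near-infrared/solar outputs are taken from the abstract; the argument list and the grain-size expressions inside are placeholders invented for illustration and are not the thesis parameterization.

```fortran
! Illustrative sketch only: mirrors the described interface (grain size in,
! visible/near-IR/broadband albedo out, selectable split wavelength), but the
! albedo expressions are placeholders, not the SNOALB scheme itself.
program demo_snoalb
  implicit none
  real :: alb_vis, alb_nir, alb_sol
  call snoalb_sketch(rgrain=200.0, isplit=1, &
                     alb_vis=alb_vis, alb_nir=alb_nir, alb_sol=alb_sol)
  print '(a,3f8.3)', ' vis/nir/solar albedo: ', alb_vis, alb_nir, alb_sol
contains
  subroutine snoalb_sketch(rgrain, isplit, alb_vis, alb_nir, alb_sol)
    real,    intent(in)  :: rgrain          ! snow grain radius (micrometers)
    integer, intent(in)  :: isplit          ! 1 -> split at 0.7 um, 2 -> 0.9 um
    real,    intent(out) :: alb_vis, alb_nir, alb_sol
    real :: fvis                            ! assumed visible fraction of flux
    ! Placeholder dependence: albedo decreases as grains grow (qualitatively
    ! reasonable, quantitatively arbitrary).
    alb_vis = 0.95 - 0.04 * sqrt(rgrain / 1000.0)
    alb_nir = 0.65 - 0.20 * sqrt(rgrain / 1000.0)
    fvis    = merge(0.5, 0.6, isplit == 1)  ! assumed spectral weighting
    alb_sol = fvis * alb_vis + (1.0 - fvis) * alb_nir
  end subroutine snoalb_sketch
end program demo_snoalb
```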
An efficient parallel algorithm for matrix-vector multiplication
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hendrickson, B.; Leland, R.; Plimpton, S.
The multiplication of a vector by a matrix is the kernel computation of many algorithms in scientific computation. A fast parallel algorithm for this calculation is therefore necessary if one is to make full use of the new generation of parallel supercomputers. This paper presents a high performance, parallel matrix-vector multiplication algorithm that is particularly well suited to hypercube multiprocessors. For an n x n matrix on p processors, the communication cost of this algorithm is O(n/√p + log p), independent of the matrix sparsity pattern. The performance of the algorithm is demonstrated by employing it as the kernel in the well-known NAS conjugate gradient benchmark, where a run time of 6.09 seconds was observed. This is the best published performance on this benchmark achieved to date using a massively parallel supercomputer.
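As a much simpler point of comparison (not the paper's two-dimensional hypercube decomposition, which is what achieves the O(n/√p + log p) communication cost), the sketch below shows the common one-dimensional row-block scheme in Fortran with MPI: each rank gathers the full input vector and multiplies its own block of rows. Matrix size and contents are arbitrary placeholders, and n is assumed divisible by the number of ranks.

```fortran
program matvec_rowblock
  use mpi
  implicit none
  integer, parameter :: n = 1024
  integer :: ierr, rank, nprocs, nlocal, i, j
  real(8), allocatable :: a(:,:), xlocal(:), xfull(:), ylocal(:)

  call MPI_Init(ierr)
  call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)
  call MPI_Comm_size(MPI_COMM_WORLD, nprocs, ierr)

  nlocal = n / nprocs                     ! assume nprocs divides n
  allocate(a(nlocal, n), xlocal(nlocal), xfull(n), ylocal(nlocal))
  a = 1.0d0                               ! placeholder matrix block
  xlocal = real(rank + 1, 8)              ! placeholder local vector slice

  ! Each rank needs the whole input vector: gather all slices everywhere.
  call MPI_Allgather(xlocal, nlocal, MPI_DOUBLE_PRECISION, &
                     xfull,  nlocal, MPI_DOUBLE_PRECISION, &
                     MPI_COMM_WORLD, ierr)

  ! Local dense kernel: y_local = A_local * x
  do i = 1, nlocal
     ylocal(i) = 0.0d0
     do j = 1, n
        ylocal(i) = ylocal(i) + a(i, j) * xfull(j)
     end do
  end do

  if (rank == 0) print *, 'ylocal(1) on rank 0 =', ylocal(1)
  call MPI_Finalize(ierr)
end program matvec_rowblock
```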
CORRELATION OF CHANDRA PHOTONS WITH THE RADIO GIANT PULSES FROM THE CRAB PULSAR
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bilous, A. V.; McLaughlin, M. A.; Kondratiev, V. I.
2012-04-10
No apparent correlation was found between giant pulses (GPs) and X-ray photons from the Crab pulsar during 5.4 hr of simultaneous observations with the Green Bank Telescope at 1.5 GHz and the Chandra X-Ray Observatory, primarily in the energy range of 1.5-4.5 keV. During the Crab pulsar periods with GPs, the X-ray flux in radio emission phase windows does not change by more than ±10% for main pulse (MP) GPs and ±30% for interpulse (IP) GPs. During GPs themselves, the X-ray flux does not change by more than two times for MP GPs and five times for IP GPs. All limits quoted are compatible with 2σ fluctuations of the X-ray flux around the sets of false GPs with random arrival times. The results speak in favor of changes in plasma coherence as the origin of GPs. However, the results do not rule out variations in the rate of particle creation if the particles that emit coherent radio emission are mostly at the lowest Landau level.
FORTRAN multitasking library for use on the ELXSI 6400 and the CRAY XMP
DOE Office of Scientific and Technical Information (OSTI.GOV)
Montry, G.R.
1985-07-16
A library of FORTRAN-based multitasking routines has been written for the ELXSI 6400 and the CRAY XMP. This library is designed to make multitasking codes easily transportable between machines with different hardware configurations. The library provides enhanced error checking and diagnostics over vendor-supplied multitasking intrinsics. The library also contains multitasking control structures not normally supplied by the vendor.
Radhika, R; Shankar, R; Vijayakumar, S; Kolandaivel, P
2018-05-01
Theoretical studies of DNA base pairs interacting with the anticancer drug 6-mercaptopurine (6-MP) are carried out to shed light on drug design. Among the DNA base pairs considered, 6-MP is stacked with GC with the highest interaction energy of -46.19 kcal/mol. Structural parameters revealed that the structure of the DNA base pairs deviates from the planarity of the equilibrium position due to the formation of hydrogen bonds and stacking interactions with 6-MP. These deviations are verified through the systematic comparison between X-H bond contraction and elongation and the associated blue shift and red shift values by both NBO analysis and vibrational analysis. Bent's rule is verified for the C-H bond contraction in the 6-MP-interacting base pairs. The AIM results disclose that the higher values of the electron density (ρ) and the Laplacian of the electron density (∇²ρ) indicate increased overlap between the orbitals, representing strong interaction, and positive values of the total electron density show closed-shell interaction. The relative sensitivity of the chemical shift values for the DNA base pairs with 6-MP is investigated to confirm the hydrogen bond strength. Molecular dynamics simulation studies of G-quadruplex DNA d(TGGGGT)4 with 6-MP revealed that the incorporation of 6-MP appears to cause local distortions and destabilize the G-quadruplex DNA.
Nayak, Reshma; Nayak, Us Krishna; Hegde, Gautam
2010-01-01
Orthodontic diagnosis and treatment planning for growing children must involve growth prediction, especially in the treatment of skeletal problems. Studies have shown that a strong association exists between skeletal maturity and dental calcification stages. The present study was therefore taken up to provide a simple and practical method for assessing skeletal maturity using a dental periapical film and a standard dental X-ray machine, to compare the developmental stages of the mandibular canine with the developmental stages of modified MP3 and to find out if any correlation exists, and to determine if the developmental stages of the mandibular canine alone can be used as a reliable indicator for assessment of skeletal maturity. A total of 160 periapical radiographs (80 males and 80 females) of the mandibular right canine and the MP3 region were taken and assessed according to Demirjian's stages of dental calcification and the modified MP3 stages. The correlation between the developmental stages of MP3 and the mandibular right canine in male and female groups is of high statistical significance (p = 0.001). The correlation coefficient of the MP3 stages and the developmental stages of the mandibular canine with chronological age was found to be not significant in males and females. The correlation between the mandibular canine calcification stages and the MP3 stages was found to be significant. The developmental stages of the mandibular canine could be used very reliably as a sole indicator for assessment of skeletal maturity.
The role of graphics super-workstations in a supercomputing environment
NASA Technical Reports Server (NTRS)
Levin, E.
1989-01-01
A new class of very powerful workstations has recently become available which integrate near-supercomputer computational performance with very powerful and high quality graphics capability. These graphics super-workstations are expected to play an increasingly important role in providing an enhanced environment for supercomputer users. Their potential uses include: off-loading the supercomputer (by serving as stand-alone processors, by post-processing of the output of supercomputer calculations, and by distributed or shared processing), scientific visualization (understanding of results, communication of results), and real-time interaction with the supercomputer (to steer an iterative computation, to abort a bad run, or to explore and develop new algorithms).
48 CFR 252.225-7011 - Restriction on acquisition of supercomputers.
Code of Federal Regulations, 2010 CFR
2010-10-01
... of supercomputers. 252.225-7011 Section 252.225-7011 Federal Acquisition Regulations System DEFENSE... CLAUSES Text of Provisions And Clauses 252.225-7011 Restriction on acquisition of supercomputers. As prescribed in 225.7012-3, use the following clause: Restriction on Acquisition of Supercomputers (JUN 2005...
48 CFR 252.225-7011 - Restriction on acquisition of supercomputers.
Code of Federal Regulations, 2014 CFR
2014-10-01
... of supercomputers. 252.225-7011 Section 252.225-7011 Federal Acquisition Regulations System DEFENSE... CLAUSES Text of Provisions And Clauses 252.225-7011 Restriction on acquisition of supercomputers. As prescribed in 225.7012-3, use the following clause: Restriction on Acquisition of Supercomputers (JUN 2005...
48 CFR 252.225-7011 - Restriction on acquisition of supercomputers.
Code of Federal Regulations, 2012 CFR
2012-10-01
... of supercomputers. 252.225-7011 Section 252.225-7011 Federal Acquisition Regulations System DEFENSE... CLAUSES Text of Provisions And Clauses 252.225-7011 Restriction on acquisition of supercomputers. As prescribed in 225.7012-3, use the following clause: Restriction on Acquisition of Supercomputers (JUN 2005...
48 CFR 252.225-7011 - Restriction on acquisition of supercomputers.
Code of Federal Regulations, 2013 CFR
2013-10-01
... of supercomputers. 252.225-7011 Section 252.225-7011 Federal Acquisition Regulations System DEFENSE... CLAUSES Text of Provisions And Clauses 252.225-7011 Restriction on acquisition of supercomputers. As prescribed in 225.7012-3, use the following clause: Restriction on Acquisition of Supercomputers (JUN 2005...
48 CFR 252.225-7011 - Restriction on acquisition of supercomputers.
Code of Federal Regulations, 2011 CFR
2011-10-01
... of supercomputers. 252.225-7011 Section 252.225-7011 Federal Acquisition Regulations System DEFENSE... CLAUSES Text of Provisions And Clauses 252.225-7011 Restriction on acquisition of supercomputers. As prescribed in 225.7012-3, use the following clause: Restriction on Acquisition of Supercomputers (JUN 2005...
Investigating the impact of the cielo cray XE6 architecture on scientific application codes.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Rajan, Mahesh; Barrett, Richard; Pedretti, Kevin Thomas Tauke
2010-12-01
Cielo, a Cray XE6, is the Department of Energy NNSA Advanced Simulation and Computing (ASC) campaign's newest capability machine. Rated at 1.37 PFLOPS, it consists of 8,944 dual-socket oct-core AMD Magny-Cours compute nodes, linked using Cray's Gemini interconnect. Its primary mission objective is to enable a suite of the ASC applications implemented using MPI to scale to tens of thousands of cores. Cielo is an evolutionary improvement to a successful architecture previously available to many of our codes, thus enabling a basis for understanding the capabilities of this new architecture. Using three codes strategically important to the ASC campaign, and supplemented with some micro-benchmarks that expose the fundamental capabilities of the XE6, we report on the performance characteristics and capabilities of Cielo.
Experiences with Cray multi-tasking
NASA Technical Reports Server (NTRS)
Miya, E. N.
1985-01-01
The issues involved in modifying an existing code for multitasking are explored. They include Cray extensions to FORTRAN, an examination of the application code under study, designing workable modifications, specific code modifications to the VAX and Cray versions, performance, and efficiency results. The finished product is a faster, fully synchronous, parallel version of the original program. A production program is partitioned by hand to run on two CPUs. Loop splitting multitasks three key subroutines. Simply dividing subroutine data and control structure down the middle of a subroutine is not safe. Simple division produces results that are inconsistent with uniprocessor runs. The safest way to partition the code is to transfer one block of loops at a time and check the results of each on a test case. Other issues include debugging and performance. Task startup and maintenance (e.g., synchronization) are potentially expensive.
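The loop-splitting idea predates today's standard directives; a modern OpenMP analogue of the same transformation (giving each processor a contiguous block of a loop's iterations) might look like the sketch below. This is an illustration only, not the original Cray multitasking code, which used vendor multitasking calls rather than directives.

```fortran
program loop_split_demo
  implicit none
  integer, parameter :: n = 1000000
  real(8), allocatable :: a(:), b(:)
  integer :: i

  allocate(a(n), b(n))
  b = 1.0d0

  ! With two threads, static scheduling gives each thread half the iteration
  ! space; this is the same partitioning once coded by hand with Cray
  ! multitasking, but startup and synchronization are handled by the runtime.
  !$omp parallel do schedule(static)
  do i = 1, n
     a(i) = 2.0d0 * b(i) + 1.0d0
  end do
  !$omp end parallel do

  print *, 'a(1), a(n) =', a(1), a(n)
end program loop_split_demo
```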
Enviro-HIRLAM Applicability for Black Carbon Studies in Arctic
NASA Astrophysics Data System (ADS)
Nuterman, Roman; Mahura, Alexander; Baklanov, Alexander; Kurganskiy, Alexander; Amstrup, Bjarne; Kaas, Eigil
2015-04-01
One of the main aims of the Nordic CarboNord project ("Impact of black carbon on air quality and climate in Northern Europe and Arctic") is focused on providing new information on distribution and effects of black carbon in Northern Europe and Arctic. It can be done through assessing robustness of model predictions of long-range black carbon distribution and its relation to climate change and forcing. In our study, the online integrated meteorology-chemistry/aerosols model - Enviro-HIRLAM (Environment - HIgh Resolution Limited Area Model) - is used. This study, at first, is focused on adaptation (model setup, domain for the Northern Hemisphere and Arctic region, emissions, boundary conditions, refining aerosols microphysics and chemistry, cloud-aerosol interaction processes) of Enviro-HIRLAM model and selection of most unfavorable weather and air pollution episodes for the Arctic region. Simulations of interactions between black carbon and meteorological processes in northern conditions for selected episodes will be performed (at DMI's supercomputer HPC CRAY-XT5), and then long-term simulations at regional scale for selected winter vs. summer months. Modelling results will be compared on a diurnal cycle and monthly basis against observations for key meteorological parameters (such as air temperature, wind speed, relative humidity, and precipitation) as well as aerosol concentration. Finally, evaluation of black carbon atmospheric transport, dispersion, and deposition patterns at different spatio-temporal scales; physical-chemical processes and transformations of black carbon containing aerosols; and interactions and effects between black carbon and meteorological processes in Arctic weather conditions will be done.
Modeling and new equipment definition for the vibration isolation box equipment system
NASA Technical Reports Server (NTRS)
Sani, Robert L.
1993-01-01
Our MSAD-funded research project is to provide numerical modeling support for VIBES (Vibration Isolation Box Experiment System), an IML2 flight experiment being built by the Japanese research team of Dr. H. Azuma of the Japanese National Aerospace Laboratory. During this reporting period, the following have been accomplished: A semi-consistent mass finite element projection algorithm for 2D and 3D Boussinesq flows has been implemented on Sun, HP, and Cray platforms. The algorithm has better phase speed accuracy than similar finite difference or lumped mass finite element algorithms, an attribute which is essential for addressing realistic g-jitter effects as well as convectively-dominated transient systems. The projection algorithm has been benchmarked against solutions generated via the commercial code FIDAP. The algorithm appears to be accurate as well as computationally efficient. Optimization and potential parallelization studies are underway. Our implementation to date has focused on execution of the basic algorithm with at most a concern for vectorization. The initial time-varying gravity Boussinesq flow simulation is being set up. The mesh is being designed and the input file is being generated. Some preliminary 'small mesh' cases will be attempted on our HP9000/735 while our request to MSAD for supercomputing resources is being addressed. The Japanese research team for VIBES was visited, the current setup and status of the physical experiment were obtained, and an ongoing e-mail communication link was established.
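As background, a pressure-projection algorithm of the kind named above advances a provisional velocity and then projects it onto a divergence-free field. Written generically (this is the standard projection step, not the specific semi-consistent-mass finite element form used in the project):

```latex
\nabla^{2}\phi^{\,n+1} = \frac{1}{\Delta t}\,\nabla\cdot\mathbf{u}^{*},
\qquad
\mathbf{u}^{\,n+1} = \mathbf{u}^{*} - \Delta t\,\nabla\phi^{\,n+1}
```

Here u* is the intermediate velocity obtained from the momentum and buoyancy terms and φ acts as a pressure correction; the treatment of the finite element mass matrix in these steps is the aspect the "semi-consistent mass" label refers to.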
Implementation of Parallel Computing Technology to Vortex Flow
NASA Technical Reports Server (NTRS)
Dacles-Mariani, Jennifer
1999-01-01
Mainframe supercomputers such as the Cray C90 were invaluable in obtaining large-scale computations using several million grid points to resolve salient features of a tip vortex flow over a lifting wing. However, real flight configurations require tracking not only the flow over several lifting wings but also its growth and decay in the near- and intermediate-wake regions, not to mention the interaction of these vortices with each other. Resolving and tracking the evolution and interaction of these vortices shed from complex bodies is computationally intensive. Parallel computing technology is an attractive option for solving these flows. In planetary science, vortical flows are also important in studying how cosmic dust and gases become gravitationally unstable and eventually form planets or protoplanets. The current paradigm for the formation of planetary systems maintains that the planets accreted from the nebula of gas and dust left over from the formation of the Sun. Traditional theory also indicates that such a preplanetary nebula took the form of a flattened disk. The coagulation of dust led to the settling of aggregates toward the midplane of the disk, where they grew further into asteroid-like planetesimals. Some of the issues still remaining in this process are the onset of gravitational instability, the role of turbulence in the damping of particles, and radial effects. In this study, the focus is on the role of turbulence and on radial effects.
Parallel spatial direct numerical simulations on the Intel iPSC/860 hypercube
NASA Technical Reports Server (NTRS)
Joslin, Ronald D.; Zubair, Mohammad
1993-01-01
The implementation and performance of a parallel spatial direct numerical simulation (PSDNS) approach on the Intel iPSC/860 hypercube is documented. The direct numerical simulation approach is used to compute spatially evolving disturbances associated with the laminar-to-turbulent transition in boundary-layer flows. The feasibility of using the PSDNS on the hypercube to perform transition studies is examined. The results indicate that the direct numerical simulation approach can effectively be parallelized on a distributed-memory parallel machine. By increasing the number of processors, nearly ideal linear speedups are achieved with nonoptimized routines; slower-than-linear speedups are achieved with optimized (machine-dependent library) routines. This slower-than-linear speedup results because the Fast Fourier Transform (FFT) routine dominates the computational cost and itself exhibits less-than-ideal speedups. However, with the machine-dependent routines the total computational cost decreases by a factor of 4 to 5 compared with standard FORTRAN routines. The computational cost increases linearly with spanwise, wall-normal, and streamwise grid refinements. The hypercube with 32 processors was estimated to require approximately twice the amount of Cray supercomputer single-processor time to complete a comparable simulation; however, it is estimated that a subgrid-scale model, which reduces the required number of grid points and turns the approach into a parallel spatial large-eddy simulation (PSLES), would reduce the computational cost and memory requirements by a factor of 10 over the PSDNS. This PSLES implementation would enable transition simulations on the hypercube at a reasonable computational cost.
Portability and Cross-Platform Performance of an MPI-Based Parallel Polygon Renderer
NASA Technical Reports Server (NTRS)
Crockett, Thomas W.
1999-01-01
Visualizing the results of computations performed on large-scale parallel computers is a challenging problem, due to the size of the datasets involved. One approach is to perform the visualization and graphics operations in place, exploiting the available parallelism to obtain the necessary rendering performance. Over the past several years, we have been developing algorithms and software to support visualization applications on NASA's parallel supercomputers. Our results have been incorporated into a parallel polygon rendering system called PGL. PGL was initially developed on tightly-coupled distributed-memory message-passing systems, including Intel's iPSC/860 and Paragon, and IBM's SP2. Over the past year, we have ported it to a variety of additional platforms, including the HP Exemplar, SGI Origin2000, Cray T3E, and clusters of Sun workstations. In implementing PGL, we have had two primary goals: cross-platform portability and high performance. Portability is important because (1) our manpower resources are limited, making it difficult to develop and maintain multiple versions of the code, and (2) NASA's complement of parallel computing platforms is diverse and subject to frequent change. Performance is important in delivering adequate rendering rates for complex scenes and ensuring that parallel computing resources are used effectively. Unfortunately, these two goals are often at odds. In this paper we report on our experiences with portability and performance of the PGL polygon renderer across a range of parallel computing platforms.
Data-intensive computing on numerically-insensitive supercomputers
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ahrens, James P; Fasel, Patricia K; Habib, Salman
2010-12-03
With the advent of the era of petascale supercomputing, via the delivery of the Roadrunner supercomputing platform at Los Alamos National Laboratory, there is a pressing need to address the problem of visualizing massive petascale-sized results. In this presentation, I discuss progress on a number of approaches including in-situ analysis, multi-resolution out-of-core streaming and interactive rendering on the supercomputing platform. These approaches are placed in context by the emerging area of data-intensive supercomputing.
Software for the Parallel Solution of Systems of Ordinary Differential Equations
1991-02-01
      real g(ndim), x(0:nmax*maxnp), vin(1)
      real vout(nout), left, right
      equivalence (n,vin(1)), (ndimc,vin(2)), (ninc,vin(3))
     #, (noutc,vin(4)), (m,vin(5)), (mp,vin(6))
     #, (h,vin(7)), (left,vin(8)), (right,vin(9))
     #, (g(1),vin(10)), (x(0),vin(10+ndim ...
Computer Electromagnetics and Supercomputer Architecture
NASA Technical Reports Server (NTRS)
Cwik, Tom
1993-01-01
The dramatic increase in performance over the last decade for microprocessor computations is compared with that for supercomputer computations. This performance, the projected performance, and a number of other issues such as cost and the inherent physical limitations of current supercomputer technology have naturally led to parallel supercomputers and ensembles of interconnected microprocessors.
A Programmable Calculator Activity, x = 1/x + 1.
ERIC Educational Resources Information Center
Snover, Stephen L.; Spikell, Mark A.
An activity for secondary schools is presented and discussed which may be explored with a programmable calculator. The activity is non-standard and could not be easily explored without the use of a programmable calculator. Related activities are also discussed. Flow charts and programs for different programmable calculators are presented. (MP)
CDC to CRAY FORTRAN conversion manual
NASA Technical Reports Server (NTRS)
Mcgary, C.; Diebert, D.
1983-01-01
Documentation describing software differences between two general purpose computers for scientific applications is presented. Descriptions of the use of the FORTRAN and FORTRAN 77 high level programming language on a CDC 7600 under SCOPE and a CRAY XMP under COS are offered. Itemized differences of the FORTRAN language sets of the two machines are also included. The material is accompanied by numerous examples of preferred programming techniques for the two machines.
Force user's manual: A portable, parallel FORTRAN
NASA Technical Reports Server (NTRS)
Jordan, Harry F.; Benten, Muhammad S.; Arenstorf, Norbert S.; Ramanan, Aruna V.
1990-01-01
The use of Force, a parallel, portable FORTRAN, on shared memory parallel computers is described. Force simplifies writing code for parallel computers and, once the parallel code is written, it is easily ported to computers on which Force is installed. Although Force is nearly the same for all computers, specific details are included for the Cray-2, Cray Y-MP, Convex 220, Flex/32, Encore, Sequent, and Alliant computers on which it is installed.
NASA Astrophysics Data System (ADS)
Molcard, A. J.; Pinardi, N.; Ansaloni, R.
A new numerical model, SEOM (Spectral Element Ocean Model (Iskandarani et al., 1994)), has been implemented in the Mediterranean Sea. Spectral element methods combine the geometric flexibility of finite element techniques with the rapid convergence rate of spectral schemes. The current version solves the shallow water equations with a fifth (or sixth) order accurate spectral scheme and about 50,000 nodes. The domain decomposition philosophy makes it possible to exploit the power of parallel machines. The original MIMD master/slave version of SEOM, written in Fortran 90 and PVM, has been ported to the Cray T3D. When critical for performance, Cray-specific high-performance one-sided communication routines (SHMEM) have been adopted to fully exploit the Cray T3D interprocessor network. Tests performed with highly unstructured and irregular grids, on up to 128 processors, show an almost linear scalability even with unoptimized domain decomposition techniques. Results from various case studies on the Mediterranean Sea are shown, involving realistic coastline geometry and monthly mean 1000 mb winds from the ECMWF atmospheric model operational analysis for the period January 1987 to December 1994. The simulation results show that variability in the wind forcing considerably affects the circulation dynamics of the Mediterranean Sea.
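For reference, the rotating shallow water system solved here can be written schematically (standard notation, not copied from the model) as

```latex
\frac{\partial \mathbf{u}}{\partial t}
  + (\mathbf{u}\cdot\nabla)\mathbf{u}
  + f\,\hat{\mathbf{z}}\times\mathbf{u}
  = -g\,\nabla\eta + \frac{\boldsymbol{\tau}}{\rho\,(H+\eta)},
\qquad
\frac{\partial \eta}{\partial t}
  + \nabla\cdot\bigl[(H+\eta)\,\mathbf{u}\bigr] = 0
```

where u is the depth-averaged velocity, η the free-surface elevation, H the resting depth, f the Coriolis parameter, and τ a wind stress term through which forcing such as the ECMWF winds would enter.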
de Oliveira, Guilherme A P; Pereira, Elen G; Dias, Cristiano V; Souza, Theo L F; Ferretti, Giulia D S; Cordeiro, Yraima; Camillo, Luciana R; Cascardo, Júlio; Almeida, Fabio C; Valente, Ana Paula; Silva, Jerson L
2012-01-01
Understanding how Nep-like proteins (NLPs) behave during the cell cycle and disease progression of plant pathogenic oomycetes, fungi and bacteria is crucial in light of compelling evidence that these proteins play a role in Witches' Broom Disease (WBD) of Theobroma cacao, one of the most important phytopathological problems to afflict the Southern Hemisphere. The crystal structure of MpNep2, a member of the NLP family and the causal agent of WBD, revealed the key elements for its activity. This protein has the ability to refold after heating and was believed to act as a monomer in solution, in contrast to the related homologs MpNep1 and NPP from the oomyceteous fungus Phytophthora parasitica. Here, we identify and characterize a metastable MpNep2 dimer upon over-expression in Escherichia coli using different biochemical and structural approaches. We found using ultra-fast liquid chromatography that the MpNep2 dimer can be dissociated by heating but not by dilution, oxidation or high ionic strength. Small-angle X-ray scattering revealed a possible tail-to-tail interaction between monomers, and nuclear magnetic resonance measurements identified perturbed residues involved in the putative interface of interaction. We also explored the ability of the MpNep2 monomer to refold after heating or chemical denaturation. We observed that MpNep2 has a low stability and cooperative fold that could be an explanation for its structure and activity recovery after stress. These results can provide new insights into the mechanism for MpNep2's action in dicot plants during the progression of WBD and may open new avenues for the involvement of NLP oligomeric species in phytopathological disorders.
Zgheib, Nathalie K; Akika, Reem; Mahfouz, Rami; Aridi, Carol Al; Ghanem, Khaled M; Saab, Raya; Abboud, Miguel R; Tarek, Nidale; El Solh, Hassan; Muwakkit, Samar A
2017-01-01
Interindividual variability in thiopurine-related toxicity could not be completely explained by thiopurine S-methyltransferase (TPMT) polymorphisms, as a number of patients who are homozygous wild type or normal for TPMT still develop toxicity that necessitates 6-mercaptopurine (MP) dose reduction or protocol interruption. Recently, a few studies reported on an inherited nucleoside diphosphate-linked moiety X motif 15 (NUDT15) c.415C>T low-function variant that is associated with decreased thiopurine metabolism and leukopenia in childhood acute lymphoblastic leukemia (ALL) and other diseases. The aim of this study is to measure the frequency of TPMT and NUDT15 polymorphisms and assess whether they are predictors of MP intolerance in children treated for ALL. One hundred thirty-seven patients with ALL, of whom 121 were Lebanese, were evaluated. MP dose intensity was calculated as the ratio of the tolerated MP dose to the planned dose during the continuation phase, with dosing adjusted to maintain an absolute neutrophil count (ANC) above 300 per μl. One patient was NUDT15 heterozygous TC and tolerated only 33.33% of the planned MP dose, which was statistically significantly different from the median tolerated MP dose intensity of the rest of the cohort (76.00%). Three patients had the TPMT*3A haplotype and tolerated 40.00-66.66% of the planned MP dose, which was also statistically significantly different from the rest of the cohort. This is the first report on the association of TPMT and NUDT15 polymorphisms with MP dose intolerance in Arab patients with ALL. Genotyping for additional polymorphisms may be warranted for a potential gene/allele-dose effect. © 2016 Wiley Periodicals, Inc.
NASA Technical Reports Server (NTRS)
Rutishauser, David
2006-01-01
The motivation for this work comes from an observation that amidst the push for Massively Parallel (MP) solutions to high-end computing problems such as numerical physical simulations, large amounts of legacy code exist that are highly optimized for vector supercomputers. Because re-hosting legacy code often requires a complete re-write of the original code, which can be a very long and expensive effort, this work examines the potential to exploit reconfigurable computing machines in place of a vector supercomputer to implement an essentially unmodified legacy source code. Custom and reconfigurable computing resources could be used to emulate an original application's target platform to the extent required to achieve high performance. To arrive at an architecture that delivers the desired performance subject to limited resources involves solving a multi-variable optimization problem with constraints. Prior research in the area of reconfigurable computing has demonstrated that designing an optimum hardware implementation of a given application under hardware resource constraints is an NP-complete problem. The premise of the approach is that the general issue of applying reconfigurable computing resources to the implementation of an application, maximizing the performance of the computation subject to physical resource constraints, can be made a tractable problem by assuming a computational paradigm, such as vector processing. This research contributes a formulation of the problem and a methodology to design a reconfigurable vector processing implementation of a given application that satisfies a performance metric. A generic, parametric, architectural framework for vector processing implemented in reconfigurable logic is developed as a target for a scheduling/mapping algorithm that maps an input computation to a given instance of the architecture. This algorithm is integrated with an optimization framework to arrive at a specification of the architecture parameters that attempts to minimize execution time, while staying within resource constraints. The flexibility of using a custom reconfigurable implementation is exploited in a unique manner to leverage the lessons learned in vector supercomputer development. The vector processing framework is tailored to the application, with variable parameters that are fixed in traditional vector processing. Benchmark data that demonstrates the functionality and utility of the approach is presented. The benchmark data includes an identified bottleneck in a real case study example vector code, the NASA Langley Terminal Area Simulation System (TASS) application.
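Stated compactly, the design problem the methodology addresses has the shape of a constrained optimization; in schematic form (the notation here is assumed for illustration, not taken from the thesis):

```latex
\min_{p \,\in\, \mathcal{P}} \; T_{\mathrm{exec}}(p)
\quad \text{subject to} \quad
R_{j}(p) \le C_{j}, \qquad j = 1,\dots,m
```

Here p ranges over the parameter settings of the vector-processing framework, T_exec(p) is the execution time of the computation once the scheduling/mapping algorithm has mapped it onto that instance, and each constraint R_j(p) ≤ C_j caps a reconfigurable resource such as logic, on-chip memory, or I/O bandwidth.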
Numerical investigation for the impact of CO2 geologic sequestration on regional groundwater flow
DOE Office of Scientific and Technical Information (OSTI.GOV)
Yamamoto, H.; Zhang, K.; Karasaki, K.
Large-scale storage of carbon dioxide in saline aquifers may cause considerable pressure perturbation and brine migration in deep rock formations, which may have a significant influence on the regional groundwater system. With the help of parallel computing techniques, we conducted a comprehensive, large-scale numerical simulation of CO2 geologic storage that predicts not only CO2 migration, but also its impact on regional groundwater flow. As a case study, a hypothetical industrial-scale CO2 injection in Tokyo Bay, which is surrounded by the most heavily industrialized area in Japan, was considered, and the impact of CO2 injection on near-surface aquifers was investigated, assuming relatively high seal-layer permeability (higher than 10 microdarcy). A regional hydrogeological model with an area of about 60 km x 70 km around Tokyo Bay was discretized into about 10 million gridblocks. To solve the high-resolution model efficiently, we used the parallelized multiphase flow simulator TOUGH2-MP/ECO2N on a world-class high performance supercomputer in Japan, the Earth Simulator. In this simulation, CO2 was injected into a storage aquifer at about 1 km depth under Tokyo Bay from 10 wells, at a total rate of 10 million tons/year for 100 years. Through the model, we can examine regional groundwater pressure buildup and groundwater migration to the land surface. The results suggest that even if containment of the CO2 plume is ensured, pressure buildup on the order of a few bars can occur in the shallow confined aquifers over extensive regions, including urban inlands.
NASA Astrophysics Data System (ADS)
Romero, Angel H.
2017-10-01
The influence of the ring puckering angle on the multipole moments of sixteen four-membered heterocycles (1-16) was estimated theoretically using MP2 and several DFT functionals in combination with the 6-31+G(d,p) basis set. To obtain an accurate evaluation, calculations at the CCSD/cc-pVDZ level, and with the MP2 and PBE1PBE methods in combination with the aug-cc-pVDZ and aug-cc-pVTZ basis sets, were performed on the planar geometries of 1-16. In general, the DFT and MP2 approaches yielded the same dependence of the electrical properties on the puckering angle for 1-16. Quantitatively, the level of theory and the basis set significantly affect the predicted multipole moments, in particular for the heterocycles containing C=O and C=S bonds. Basis-set convergence within the MP2 and PBE1PBE approximations is reached for the dipole moments when the aug-cc-pVTZ basis set is used, while the quadrupole and octupole moments require a basis set larger than aug-cc-pVTZ. The multipole moments also show a strong dependence on the molecular geometry and the nature of the carbon-heteroatom bonds. Specifically, the C-X bond determines the behavior of the μ(ϕ), θ(ϕ) and Ω(ϕ) functions, while the C=Y bond plays an important role in the magnitude of the studied properties.
Program optimizations: The interplay between power, performance, and energy
Leon, Edgar A.; Karlin, Ian; Grant, Ryan E.; ...
2016-05-16
Practical considerations for future supercomputer designs will impose limits on both instantaneous power consumption and total energy consumption. Working within these constraints while providing the maximum possible performance, application developers will need to optimize their code for speed alongside power and energy concerns. This paper analyzes the effectiveness of several code optimizations including loop fusion, data structure transformations, and global allocations. A per-component measurement and analysis of different architectures is performed, enabling the examination of code optimizations on different compute subsystems. Using an explicit hydrodynamics proxy application from the U.S. Department of Energy, LULESH, we show how code optimizations impact different computational phases of the simulation. This provides insight for simulation developers into the best optimizations to use during particular simulation compute phases when optimizing code for future supercomputing platforms. Here, we examine and contrast both x86 and Blue Gene architectures with respect to these optimizations.
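The loop-fusion optimization mentioned above can be illustrated with a minimal sketch: two separate passes over the data are replaced by a single fused pass that avoids a temporary array and one trip through memory. LULESH itself is a C++ proxy application; this Python version only demonstrates the idea.

```python
# Minimal sketch of loop fusion.  The arrays and the operation are invented
# for illustration; the point is one memory pass instead of two.
import time

N = 1_000_000
a = [1.0] * N
b = [2.0] * N

def unfused(a, b):
    c = [ai * 2.0 for ai in a]                    # pass 1: temporary array
    return [ci + bi for ci, bi in zip(c, b)]      # pass 2: second sweep

def fused(a, b):
    # one pass computes the same result and skips the temporary
    return [ai * 2.0 + bi for ai, bi in zip(a, b)]

t0 = time.perf_counter(); r1 = unfused(a, b); t1 = time.perf_counter()
r2 = fused(a, b);                                 t2 = time.perf_counter()
assert r1 == r2
print(f"unfused {t1 - t0:.3f}s  fused {t2 - t1:.3f}s")
```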
Achieving High Performance on the i860 Microprocessor
NASA Technical Reports Server (NTRS)
Lee, King; Kutler, Paul (Technical Monitor)
1998-01-01
The i860 is a high performance microprocessor used in the Intel Touchstone project. This paper proposes a paradigm for programming the i860 that is modelled on the vector instructions of the Cray computers. Fortran callable assembler subroutines were written that mimic the concurrent vector instructions of the Cray. Cache takes the place of vector registers. Using this paradigm we have achieved twice the performance of compiled code on a traditional solve.
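A minimal sketch of the "cache as vector registers" idea: arrays are processed in fixed-length strips (stand-ins for 64-element Cray vector registers) so that each strip stays cache-resident while a multiply and an add are chained over it. The original work used Fortran-callable i860 assembler subroutines; this pure-Python version only illustrates the strip-mining pattern.

```python
# Sketch of strip-mined, Cray-style vector processing where cache plays the
# role of the vector registers.  Pure-Python illustration only.
VL = 64                                   # pretend vector-register length

def vaxpy(y, a, x):
    """y <- y + a*x, processed strip by strip (strip-mining)."""
    n = len(x)
    for start in range(0, n, VL):
        end = min(start + VL, n)
        xs = x[start:end]                 # "vector load" of one strip
        ys = y[start:end]
        for i in range(len(xs)):          # chained multiply-add over the strip
            ys[i] += a * xs[i]
        y[start:end] = ys                 # "vector store" back to memory
    return y

x = [float(i) for i in range(1000)]
y = [1.0] * 1000
vaxpy(y, 2.0, x)
print(y[:4])    # [1.0, 3.0, 5.0, 7.0]
```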
Production Experiences with the Cray-Enabled TORQUE Resource Manager
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ezell, Matthew A; Maxwell, Don E; Beer, David
High performance computing resources utilize batch systems to manage the user workload. Cray systems are uniquely different from typical clusters due to Cray's Application Level Placement Scheduler (ALPS). ALPS manages binary transfer, job launch and monitoring, and error handling. Batch systems require special support to integrate with ALPS using an XML protocol called BASIL. Previous versions of Adaptive Computing's TORQUE and Moab batch suite integrated with ALPS from within Moab, using Perl scripts to interface with BASIL. This would occasionally lead to problems when the components became unsynchronized. Version 4.1 of the TORQUE Resource Manager introduced new features that allow it to integrate directly with ALPS using BASIL. This paper describes production experiences at Oak Ridge National Laboratory using the new TORQUE software versions, as well as ongoing and future work to improve TORQUE.
DOE Office of Scientific and Technical Information (OSTI.GOV)
G.A. Pope; K. Sephernoori; D.C. McKinney
1996-03-15
This report describes the application of distributed-memory parallel programming techniques to a compositional simulator called UTCHEM. The University of Texas Chemical Flooding reservoir simulator (UTCHEM) is a general-purpose vectorized chemical flooding simulator that models the transport of chemical species in three-dimensional, multiphase flow through permeable media. The parallel version of UTCHEM addresses the solution of large-scale problems by reducing the time required to obtain a solution, while providing a flexible and portable programming environment. In this work, the original parallel version of UTCHEM was modified and ported to the CRAY T3D and CRAY T3E distributed-memory multiprocessor computers using CRAY-PVM as the interprocessor communication library. The data communication routines were also modified so that portability of the original code across different computer architectures was made possible.
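The communication pattern behind such a distributed-memory port can be sketched as a one-dimensional halo exchange. The UTCHEM port used CRAY-PVM on the T3D/T3E; the sketch below uses MPI via mpi4py purely to illustrate the pattern, and the decomposition, array names, and sizes are invented for the example.

```python
# Generic 1-D halo-exchange sketch of distributed-memory domain decomposition.
# Not UTCHEM code; the layout and sizes below are assumptions for the example.
# Run with, e.g.:  mpiexec -n 4 python halo.py
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

NLOCAL = 8                                    # interior cells per process (assumed)
u = [float(rank)] * (NLOCAL + 2)              # local slab with one ghost cell per side

left = rank - 1 if rank > 0 else None
right = rank + 1 if rank < size - 1 else None

# post non-blocking sends of boundary cells, then receive into the ghost cells
reqs = []
if right is not None:
    reqs.append(comm.isend(u[NLOCAL], dest=right, tag=0))   # my right edge
if left is not None:
    reqs.append(comm.isend(u[1], dest=left, tag=1))         # my left edge
if left is not None:
    u[0] = comm.recv(source=left, tag=0)                    # fill left ghost
if right is not None:
    u[NLOCAL + 1] = comm.recv(source=right, tag=1)          # fill right ghost
for r in reqs:
    r.wait()

print(f"rank {rank}: ghost cells = ({u[0]}, {u[NLOCAL + 1]})")
```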
Nayak, US Krishna; Hegde, Gautam
2010-01-01
Background and objectives Orthodontic diagnosis and treatment planning for growing children must involve growth prediction, especially in the treatment of skeletal problems. Studies have shown that a strong association exists between skeletal maturity and dental calcification stages. The present study was therefore undertaken to provide a simple and practical method for assessing skeletal maturity using a dental periapical film and a standard dental X-ray machine; to compare the developmental stages of the mandibular canine with the developmental stages of the modified MP3 and determine whether any correlation exists; and to determine whether the developmental stages of the mandibular canine alone can be used as a reliable indicator of skeletal maturity. Methods A total of 160 periapical radiographs (80 males and 80 females) of the mandibular right canine and the MP3 region were taken and assessed according to Demirjian's stages of dental calcification and the modified MP3 stages. Results The correlation between the developmental stages of MP3 and the mandibular right canine in the male and female groups is highly statistically significant (p = 0.001). The correlation coefficients between MP3 stages, developmental stages of the mandibular canine, and chronological age in males and females were not significant. Conclusions The correlation between the mandibular canine calcification stages and MP3 stages was found to be significant. The developmental stages of the mandibular canine could be used very reliably as a sole indicator for assessment of skeletal maturity. PMID:27625553
NASA Astrophysics Data System (ADS)
Godfrey-Kittle, Andrew; Cafiero, Mauricio
We present density functional theory (DFT) interaction energies for the sandwich and T-shaped conformers of substituted benzene dimers. The DFT functionals studied include TPSS, HCTH407, B3LYP, and X3LYP. We also include Hartree-Fock (HF) and second-order Møller-Plesset perturbation theory (MP2) calculations, as well as calculations using a new functional, P3LYP, which combines PBE and HF exchange with LYP correlation. Although DFT methods do not explicitly account for the dispersion interactions important in benzene-dimer binding, we find that our new method, P3LYP, as well as HCTH407 and TPSS, matches MP2 and CCSD(T) calculations much better than the hybrid B3LYP and X3LYP methods do.
Electronic response of rare-earth magnetic-refrigeration compounds GdX2 (X = Fe and Co)
NASA Astrophysics Data System (ADS)
Bhatt, Samir; Ahuja, Ushma; Kumar, Kishor; Heda, N. L.
2018-05-01
We present the Compton profiles (CPs) of the rare-earth-transition metal compounds GdX2 (X = Fe and Co) measured using a 740 GBq 137Cs Compton spectrometer. To compare with the experimental momentum densities, we have also computed the CPs, electronic band structure, density of states (DOS) and Mulliken population (MP) using the linear combination of atomic orbitals (LCAO) method. Local density and generalized gradient approximations within density functional theory (DFT), along with hybrid Hartree-Fock/DFT functionals (B3LYP and PBE0), have been considered within the LCAO scheme. It is seen that the LCAO-B3LYP-based momentum densities give better agreement with the experimental data for both compounds. The energy bands and DOS for both the spin-up and spin-down states show the metallic character of the reported intermetallic compounds. The localization of the 3d electrons of Co and Fe is also discussed in terms of equally normalized CPs and MP data. A discussion of magnetization using the LCAO method is also included.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Du, Dan; Liu, Juan; Zhang, Xiao-Yan
2011-04-27
This paper describes the preparation, characterization, and electrochemical properties of a graphene-ZrO2 nanocomposite (GZN) and its application for both the enrichment and detection of methyl parathion (MP). GZN was fabricated using electrochemical deposition and characterized by scanning electron microscopy (SEM), transmission electron microscopy (TEM), X-ray diffraction (XRD) and X-ray photoelectron spectroscopy (XPS), which showed the successful formation of the nanocomposite. Due to the strong affinity of GZN for the phosphoric group and its fast electron-transfer kinetics, both the extraction and the electrochemical detection of organophosphorus (OP) agents at the same GZN-modified electrochemical sensor were possible. The combination of solid-phase extraction and stripping voltammetric analysis allowed fast, sensitive, and selective determination of MP in garlic samples. The stripping response was highly linear over MP concentrations ranging from 0.5 to 100 ng/mL, with a detection limit of 0.1 ng/mL. This new nanocomposite-based electrochemical sensor provides an opportunity to develop a field-deployable, sensitive, and quantitative method for monitoring exposure to OPs.
Biofuel cell based on direct bioelectrocatalysis.
Ramanavicius, Arunas; Kausaite, Asta; Ramanaviciene, Almira
2005-04-15
A biofuel cell, consisting of two 3 mm diameter carbon rod electrodes and operating at ambient temperature in aqueous solution at pH 6, is described. The biofuel cell is based on enzymes able to exchange electrons directly with carbon electrodes. The anode was based on immobilized quinohemoprotein alcohol dehydrogenase from Gluconobacter sp. 33 (QH-ADH), and the cathode on co-immobilized glucose oxidase from Aspergillus niger (GOx) and microperoxidase 8 from horse heart (MP-8) acting in consecutive mode. The two cathode enzymes, GOx and MP-8, acted consecutively: MP-8, oxidized by hydrogen peroxide, accepted electrons directly from the carbon rod electrode. With ethanol as the energy source the maximal open circuit potential of the biofuel cell was -125 mV; with glucose it was +145 mV. The maximal open circuit potential (270 mV) was achieved in the presence of excess concentrations (over 2 mM) of both substrates (ethanol and glucose). The operational half-life (tau(1/2)) of the biofuel cell was found to be 2.5 days.
48 CFR 225.7012 - Restriction on supercomputers.
Code of Federal Regulations, 2014 CFR
2014-10-01
Title 48 Federal Acquisition Regulations System, Section 225.7012 (Defense Acquisition Regulations), Restriction on supercomputers.
Zhou, Chong-Wen; Simmie, John M; Curran, Henry J
2010-07-14
A theoretical study of the mechanism and kinetics of the H-abstraction reaction from dimethyl (DME), ethyl methyl (EME) and iso-propyl methyl (IPME) ethers by the OH radical has been carried out using the high-level methods CCSD(T)/CBS, G3 and G3MP2BH&H. The computationally less expensive G3 and G3MP2BH&H methods yield results for DME within 0.2-0.6 and 0.7-0.9 kcal mol^-1, respectively, of the coupled cluster, CCSD(T), values extrapolated to the basis set limit, so the G3 and G3MP2BH&H methods can be used with confidence for the reactions of the higher ethers. A distinction is made between the two different kinds of H-atoms, classified as in or out of the symmetry plane, and it is found that abstraction of the out-of-plane H-atoms proceeds through a stepwise mechanism involving the formation of a reactant complex in the entrance channel and a product complex in the exit channel. The in-plane H-atom abstractions take place through a more direct mechanism and are less competitive. Rate constants for the three reactions have been calculated in the temperature range 500-3000 K using the Variflex code, based on the weak-collision master equation/microcanonical variational RRKM theory, including tunneling corrections. The computed total rate constants (cm^3 mol^-1 s^-1) have been fitted as follows: k(DME) = 2.74 × T^3.94 exp(1534.2/T), k(EME) = 20.93 × T^3.61 exp(2060.1/T) and k(IPME) = 0.55 × T^3.93 exp(2826.1/T). Expressions for the group rate constants at the three different carbon sites are also provided.
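For reference, the sketch below simply evaluates the fitted expressions quoted in the abstract, k = A T^n exp(B/T) in cm^3 mol^-1 s^-1, at a few temperatures; the coefficients are taken at face value from the text above.

```python
# Evaluate the fitted total rate-constant expressions quoted in the abstract,
# k = A * T**n * exp(B/T) in cm^3 mol^-1 s^-1, exactly as given there.
import math

FITS = {                     # (A, n, B) from the abstract
    "DME":  (2.74,  3.94, 1534.2),
    "EME":  (20.93, 3.61, 2060.1),
    "IPME": (0.55,  3.93, 2826.1),
}

def k(ether, T):
    A, n, B = FITS[ether]
    return A * T**n * math.exp(B / T)

for T in (500.0, 1000.0, 2000.0, 3000.0):
    print(T, {name: f"{k(name, T):.3e}" for name in FITS})
```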
Ishitsuka, Kenji; Shirahashi, Akihiko; Iwao, Yasuhiro; Shishime, Mikiko; Takamatsu, Yasushi; Takatsuka, Yoshifusa; Utsunomiya, Atae; Suzumiya, Junji; Hara, Syuji; Tamura, Kazuo
2004-04-01
Arsenic trioxide (As2O3) therapy at a daily dose of 0.15 mg/kg was given to a 60-yr-old Japanese male with refractory acute promyelocytic leukemia. The white blood cell (WBC) count increased from 6.6 x 10^3/microl to 134 x 10^3/microl following the administration of As2O3. Daily hydroxyurea (HU) and 6-mercaptopurine (6-MP) were added on days 7 and 19, respectively. Both HU and 6-MP were discontinued on day 28, when the WBC count declined to 54.0 x 10^3/microl. He developed unexplained fever and profound cytopenia requiring multiple blood product transfusions. Bone marrow examination on day 42 revealed massive necrosis. Pharmacokinetics confirmed a mean maximum plasma arsenic concentration (Cpmax) of 6.9 microM and a half-life (t1/2) of 3.2 h, within the therapeutic range. This is the first case of bone marrow necrosis after standard-dose As2O3 therapy.
Sierpe, R; Noyong, Michael; Simon, Ulrich; Aguayo, D; Huerta, J; Kogan, Marcelo J; Yutronic, N
2017-12-01
As a novel strategy to overcome some of the therapeutic disadvantages of 6-thioguanine (TG) and 6-mercaptopurine (MP), we propose the inclusion of these drugs in β-cyclodextrin (βCD) to form the complexes βCD-TG and βCD-MP, followed by subsequent interaction with gold nanoparticles (AuNPs), generating the ternary systems βCD-TG-AuNPs and βCD-MP-AuNPs. This modification increased the solubility and improved the stability of the drugs, with the prospect of site-specific transport owing to the nanometric dimensions of the systems, among other advantages. The formation of the complexes was confirmed using powder X-ray diffraction, thermogravimetric analysis and one- and two-dimensional NMR. A theoretical study using DFT and molecular modelling was conducted to obtain the most stable tautomeric species of TG and MP in solution and to confirm the proposed inclusion geometries. The deposition of AuNPs onto βCD-TG and βCD-MP via sputtering was confirmed by UV-vis spectroscopy. Subsequently, the ternary systems were characterized by TEM, FE-SEM and EDX to directly observe the deposited AuNPs and evaluate their sizes, size dispersion, and composition. Finally, the in vitro permeability of the ternary systems was studied using the parallel artificial membrane permeability assay (PAMPA). Copyright © 2017 Elsevier Ltd. All rights reserved.
Ab initio study of weakly bound halogen complexes: RX⋯PH3.
Georg, Herbert C; Fileti, Eudes E; Malaspina, Thaciana
2013-01-01
Ab initio calculations were employed to study the role of ipso carbon hybridization in halogenated compounds RX (R = methyl, phenyl, acetyl, H and X = F, Cl, Br and I) and their interaction with a phosphorus atom, as occurs in halogen-bonded complexes of the type RX⋯PH3. The analysis was performed using the ab initio MP2, MP4 and CCSD(T) methods. A systematic energy analysis found that the interaction energies lie in the range -4.14 to -11.92 kJ mol^-1 (at the MP2 level without ZPE correction). The effects of the electron correlation level were evaluated at the MP4 and CCSD(T) levels, and a reduction of up to 27% relative to the MP2 interaction energies was observed. Analysis of the electrostatic maps confirms that the PhCl⋯PH3 and all MeX⋯PH3 complexes are unstable. NBO analysis suggests that the charge transfer between the moieties is larger for iodine than for bromine and chlorine. The electrical properties of these complexes (dipole moment and polarizability) were determined; the most notable feature is the systematic increase in the dipole polarizability, given by the interaction polarizability. This increase is in the range 0.7-6.7 a.u. (about 3-7%).
Akerboom, Jasper; Rivera, Jonathan D Vélez; Guilbe, María M Rodríguez; Malavé, Elisa C Alfaro; Hernandez, Hector H; Tian, Lin; Hires, S Andrew; Marvin, Jonathan S; Looger, Loren L; Schreiter, Eric R
2009-03-06
The genetically encoded calcium indicator GCaMP2 shows promise for neural network activity imaging, but is currently limited by low signal-to-noise ratio. We describe x-ray crystal structures as well as solution biophysical and spectroscopic characterization of GCaMP2 in the calcium-free dark state, and in two calcium-bound bright states: a monomeric form that dominates at intracellular concentrations observed during imaging experiments and an unexpected domain-swapped dimer with decreased fluorescence. This series of structures provides insight into the mechanism of Ca2+-induced fluorescence change. Upon calcium binding, the calmodulin (CaM) domain wraps around the M13 peptide, creating a new domain interface between CaM and the circularly permuted enhanced green fluorescent protein domain. Residues from CaM alter the chemical environment of the circularly permuted enhanced green fluorescent protein chromophore and, together with flexible inter-domain linkers, block solvent access to the chromophore. Guided by the crystal structures, we engineered a series of GCaMP2 point mutants to probe the mechanism of GCaMP2 function and characterized one mutant with significantly improved signal-to-noise. The mutation is located at a domain interface and its effect on sensor function could not have been predicted in the absence of structural data.
ARC2D - EFFICIENT SOLUTION METHODS FOR THE NAVIER-STOKES EQUATIONS (DEC RISC ULTRIX VERSION)
NASA Technical Reports Server (NTRS)
Biyabani, S. R.
1994-01-01
ARC2D is a computational fluid dynamics program developed at the NASA Ames Research Center specifically for airfoil computations. The program uses implicit finite-difference techniques to solve two-dimensional Euler equations and thin layer Navier-Stokes equations. It is based on the Beam and Warming implicit approximate factorization algorithm in generalized coordinates. The methods are either time accurate or accelerated non-time accurate steady state schemes. The evolution of the solution through time is physically realistic; good solution accuracy is dependent on mesh spacing and boundary conditions. The mathematical development of ARC2D begins with the strong conservation law form of the two-dimensional Navier-Stokes equations in Cartesian coordinates, which admits shock capturing. The Navier-Stokes equations can be transformed from Cartesian coordinates to generalized curvilinear coordinates in a manner that permits one computational code to serve a wide variety of physical geometries and grid systems. ARC2D includes an algebraic mixing length model to approximate the effect of turbulence. In cases of high Reynolds number viscous flows, thin layer approximation can be applied. ARC2D allows for a variety of solutions to stability boundaries, such as those encountered in flows with shocks. The user has considerable flexibility in assigning geometry and developing grid patterns, as well as in assigning boundary conditions. However, the ARC2D model is most appropriate for attached and mildly separated boundary layers; no attempt is made to model wake regions and widely separated flows. The techniques have been successfully used for a variety of inviscid and viscous flowfield calculations. The Cray version of ARC2D is written in FORTRAN 77 for use on Cray series computers and requires approximately 5Mb memory. The program is fully vectorized. The tape includes variations for the COS and UNICOS operating systems. Also included is a sample routine for CONVEX computers to emulate Cray system time calls, which should be easy to modify for other machines as well. The standard distribution media for this version is a 9-track 1600 BPI ASCII Card Image format magnetic tape. The Cray version was developed in 1987. The IBM ES/3090 version is an IBM port of the Cray version. It is written in IBM VS FORTRAN and has the capability of executing in both vector and parallel modes on the MVS/XA operating system and in vector mode on the VM/XA operating system. Various options of the IBM VS FORTRAN compiler provide new features for the ES/3090 version, including 64-bit arithmetic and up to 2 GB of virtual addressability. The IBM ES/3090 version is available only as a 9-track, 1600 BPI IBM IEBCOPY format magnetic tape. The IBM ES/3090 version was developed in 1989. The DEC RISC ULTRIX version is a DEC port of the Cray version. It is written in FORTRAN 77 for RISC-based Digital Equipment platforms. The memory requirement is approximately 7Mb of main memory. It is available in UNIX tar format on TK50 tape cartridge. The port to DEC RISC ULTRIX was done in 1990. COS and UNICOS are trademarks and Cray is a registered trademark of Cray Research, Inc. IBM, ES/3090, VS FORTRAN, MVS/XA, and VM/XA are registered trademarks of International Business Machines. DEC and ULTRIX are registered trademarks of Digital Equipment Corporation.
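The directional sweeps of an implicit approximate-factorization scheme reduce to banded solves along grid lines. ARC2D's sweeps involve block-tridiagonal (or diagonalized scalar) systems; the scalar Thomas algorithm below is only meant to illustrate the forward-elimination/back-substitution recurrences such sweeps rely on, and the small test system is invented for the example.

```python
# Scalar Thomas (tridiagonal) solve: the kind of recurrence at the heart of an
# implicit approximate-factorization sweep.  Illustrative only; ARC2D's actual
# sweeps operate on block-tridiagonal systems in generalized coordinates.
def thomas(a, b, c, d):
    """Solve a tridiagonal system: a = sub-, b = main, c = super-diagonal, d = RHS."""
    n = len(b)
    cp = [0.0] * n
    dp = [0.0] * n
    cp[0] = c[0] / b[0]
    dp[0] = d[0] / b[0]
    for i in range(1, n):                       # forward elimination
        m = b[i] - a[i] * cp[i - 1]
        cp[i] = c[i] / m if i < n - 1 else 0.0
        dp[i] = (d[i] - a[i] * dp[i - 1]) / m
    x = [0.0] * n
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):              # back substitution
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x

# small check: a 4x4 system constructed so the solution is [1, 2, 3, 4]
a = [0.0, -1.0, -1.0, -1.0]
b = [2.0, 2.0, 2.0, 2.0]
c = [-1.0, -1.0, -1.0, 0.0]
d = [0.0, 0.0, 0.0, 5.0]                         # A applied to [1, 2, 3, 4]
print(thomas(a, b, c, d))                        # ~[1.0, 2.0, 3.0, 4.0]
```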
Automatic discovery of the communication network topology for building a supercomputer model
NASA Astrophysics Data System (ADS)
Sobolev, Sergey; Stefanov, Konstantin; Voevodin, Vadim
2016-10-01
The Research Computing Center of Lomonosov Moscow State University is developing the Octotron software suite for automatic monitoring and mitigation of emergency situations in supercomputers so as to maximize hardware reliability. The suite is based on a software model of the supercomputer. The model uses a graph to describe the computing system components and their interconnections. One of the most complex components of a supercomputer that needs to be included in the model is its communication network. This work describes the proposed approach for automatically discovering the Ethernet communication network topology in a supercomputer and its description in terms of the Octotron model. This suite automatically detects computing nodes and switches, collects information about them and identifies their interconnections. The application of this approach is demonstrated on the "Lomonosov" and "Lomonosov-2" supercomputers.
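The graph model underlying such a description can be sketched as follows: discovered neighbour relations (as an LLDP/SNMP crawl might report them) are turned into an adjacency structure that distinguishes compute nodes from switches. The neighbour table and naming convention below are invented; this is not the actual Octotron discovery code.

```python
# Illustrative sketch: build a graph of nodes, switches, and links from
# discovered neighbour information.  The table below is invented data.
from collections import defaultdict

# (device, port, peer device, peer port), as a discovery pass might report it
NEIGHBORS = [
    ("switch1", "ge-0/0/1",  "node001", "eth0"),
    ("switch1", "ge-0/0/2",  "node002", "eth0"),
    ("switch1", "ge-0/0/48", "switch2", "ge-0/0/48"),
    ("switch2", "ge-0/0/1",  "node003", "eth0"),
]

graph = defaultdict(set)          # adjacency: device -> set of neighbouring devices
links = []                        # edge list with port attributes

for dev, port, peer, peer_port in NEIGHBORS:
    graph[dev].add(peer)
    graph[peer].add(dev)
    links.append({"a": dev, "a_port": port, "b": peer, "b_port": peer_port})

switches = sorted(d for d in graph if d.startswith("switch"))
nodes = sorted(d for d in graph if d.startswith("node"))
print("switches:", switches)
print("compute nodes:", nodes)
print("links:", len(links))
print("devices attached to switch1:", sorted(graph["switch1"]))
```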
DOE Office of Scientific and Technical Information (OSTI.GOV)
Roach, Mack; De Silvio, Michelle; Valicenti, Richard
2006-11-01
Purpose: The Radiation Therapy Oncology Group (RTOG) 9413 trial demonstrated better progression-free survival (PFS) with whole-pelvis (WP) radiotherapy (RT) compared with prostate-only (PO) RT. This secondary analysis was undertaken to determine whether 'mini-pelvis' (MP; defined as ≥10 x 11 cm but <11 x 11 cm) RT resulted in PFS comparable to that of WP RT. To avoid a timing bias, this analysis was limited to patients receiving neoadjuvant and concurrent hormonal therapy (N and CHT) in Arms 1 and 2 of the study. Methods and Materials: Eligible patients had a risk of lymph node (LN) involvement >15%. Neoadjuvant and concurrent hormonal therapy (N and CHT) was administered 2 months before and during RT, for 4 months in total. From April 1, 1995, to June 1, 1999, 325 patients were randomized to WP RT + N and CHT and 324 patients were randomized to PO RT + N and CHT. Patients randomized to PO RT were dichotomized by median field size (10 x 11 cm), with the larger field considered an 'MP' field and the smaller a PO field. Results: The median PFS was 5.2, 3.7, and 2.9 years for WP, MP, and PO fields, respectively (p = 0.02). The 7-year PFS was 40%, 35%, and 27% for patients treated to WP, MP, and PO fields, respectively. There was no association between field size and late Grade 3+ genitourinary toxicity, but late Grade 3+ gastrointestinal RT complications correlated with increasing field size. Conclusions: This subset analysis demonstrates that RT field size has a major impact on PFS, and the findings support comprehensive nodal treatment in patients with a risk of LN involvement of >15%.
Transitioning NWChem to the Next Generation of Manycore Machines
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bylaska, Eric J.; Apra, E; Kowalski, Karol
The NorthWest chemistry (NWChem) modeling software is a popular molecular chemistry simulation package that was designed from the start to work on massively parallel processing supercomputers [1-3]. It contains an umbrella of modules that today includes self-consistent field (SCF), second-order Møller-Plesset perturbation theory (MP2), coupled cluster (CC), multiconfiguration self-consistent field (MCSCF), selected configuration interaction (CI), tensor contraction engine (TCE) many-body methods, density functional theory (DFT), time-dependent density functional theory (TDDFT), real-time time-dependent density functional theory, pseudopotential plane-wave density functional theory (PSPW), band structure (BAND), ab initio molecular dynamics (AIMD), Car-Parrinello molecular dynamics (MD), classical MD, hybrid quantum mechanics/molecular mechanics (QM/MM), hybrid ab initio molecular dynamics/molecular mechanics (AIMD/MM), gauge-independent atomic orbital nuclear magnetic resonance (GIAO NMR), conductor-like screening solvation model (COSMO), conductor-like screening solvation model based on density (COSMO-SMD), and reference interaction site model (RISM) solvation models, free energy simulations, reaction path optimization, and parallel-in-time methods, among other capabilities [4]. Moreover, new capabilities continue to be added with each new release.
Panizzon, Gean Pier; Bueno, Fernanda Giacomini; Ueda-Nakamura, Tânia; Nakamura, Celso Vataru; Dias Filho, Benedito Prado
2014-01-01
The most bioactive soy isoflavones (SI), daidzein (DAI) and genistein (GEN), have poor water solubility, which reduces their bioavailability and health benefits and limits their use in industry. The goal of this study was to develop and characterize a new gelatin matrix to microencapsulate DAI and GEN from soy extract (SE) by spray drying, in order to obtain solid dispersions that overcome solubility problems and allow controlled release. The influence of 1:2 (MP2) and 1:3 (MP3) SE/polymer ratios on the solid state, yield, morphology, encapsulation efficiency, particle size distribution, release kinetics and cumulative release was evaluated. Analyses showed integral microparticles and high drug content. MP3 and MP2 yields were 43.6% and 55.9%, respectively, with similar mean sizes (p > 0.05). X-ray diffraction revealed the amorphous solid state of SE. In vitro release tests showed that dissolution was drastically increased. The results indicate that SE microencapsulation may offer a good system for controlling SI release, as an alternative to improve bioavailability and industrial applications. PMID:25494200
Analysis of vibrational spectra of 3-halo-1-propanols CH(2)XCH(2)CH(2)OH (X is Cl and Br).
Badawi, Hassan M; Förner, Wolfgang
2008-12-01
The conformational stability and the three internal rotations in 3-chloro- and 3-bromo-1-propanol were investigated at the DFT-B3LYP/6-311+G and ab initio MP2/6-311+G, MP3/6-311+G and MP4(SDTQ)//MP3/6-311+G levels of theory. On the calculated potential energy surface twelve distinct minima were located, none of which was predicted to have imaginary frequencies at the B3LYP level of theory. The calculated lowest-energy minimum in the potential curves of both molecules corresponds to the gauche-gauche-trans (Ggt) conformer, in excellent agreement with earlier microwave and electron diffraction results. The equilibrium constants for the conformational interconversion of the two 3-halo-1-propanols were calculated at the B3LYP/6-311+G level and correspond to an equilibrium mixture of about 32% Ggt, 18% Ggg1, 13% Tgt, 8% Tgg and 8% Gtt conformations for 3-chloro-1-propanol and 34% Ggt, 15% Tgt, 13% Ggg1, 9% Tgg and 7% Gtt conformations for 3-bromo-1-propanol at 298.15 K. The nature of the high-energy conformations was verified by carrying out solvent experiments using formamide (ε = 109.5) and MP3 and MP4//MP3 calculations. The vibrational frequencies of each molecule in its three most stable forms were computed at the B3LYP level, and complete vibrational assignments were made based on normal coordinate calculations and comparison with experimental data for the molecules.
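Equilibrium percentages like those quoted above follow from relative conformer energies through a Boltzmann distribution at 298.15 K, as the short sketch below shows. The relative energies used here are illustrative placeholders, not values from the paper.

```python
# Boltzmann populations from relative conformer energies at 298.15 K.
# The energies below are assumed placeholders chosen only to illustrate
# how percentages of the kind quoted in the abstract arise.
import math

R = 1.987204e-3        # gas constant, kcal mol^-1 K^-1
T = 298.15             # K

rel_energy = {         # kcal/mol relative to the Ggt minimum (assumed numbers)
    "Ggt": 0.00, "Ggg1": 0.35, "Tgt": 0.55, "Tgg": 0.85, "Gtt": 0.85,
}

weights = {c: math.exp(-e / (R * T)) for c, e in rel_energy.items()}
z = sum(weights.values())
for conf, w in sorted(weights.items(), key=lambda kv: -kv[1]):
    print(f"{conf:5s} {100 * w / z:5.1f}%")
```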
Cecchinato, Francesca; Atefyekta, Saba; Wennerberg, Ann; Andersson, Martin; Jimbo, Ryo; Davies, Julia R
2016-07-01
Mesoporous (MP) titania films used as implant coatings have recently been considered as release systems for controlled administration of magnesium to enhance initial osteoblast proliferation in vitro. Tuning of the pore size in such titania films is aimed at increasing the osteogenic potential through effects on the total loading capacity and the release profile of magnesium. In this study, evaporation-induced self-assembly (EISA) was used with different structure-directing agents to form three mesoporous films with average pore sizes of 2nm (MP1), 6nm (MP2) and 7nm (MP3). Mg adsorption and release was monitored using quartz crystal microbalance with dissipation (QCM-D). The film surfaces were characterized with atomic force microscopy (AFM), scanning electron microscopy (SEM) and X-ray photoelectron spectroscopy (XPS). The effect of different Mg release on osteogenesis was investigated in human fetal osteoblasts (hFOB) using pre-designed osteogenesis arrays and real-time polymerase chain reaction (RT-PCR). Results showed a sustained release from all the films investigated, with higher magnesium adsorption into MP1 and MP3 films. No significant differences were observed in the surface nanotopography of the films, either with or without the presence of magnesium. MP3 films (7nm pore size) had the greatest effect on osteogenesis, up-regulating 15 bone-related genes after 1 week of hFOB growth and significantly promoting bone morphogenic protein (BMP4) expression after 3 weeks of growth. The findings indicate that the increase in pore width on the nano scale significantly enhanced the bioactivity of the mesoporous coating, thus accelerating osteogenesis without creating differences in surface roughness. Copyright © 2016 The Academy of Dental Materials. Published by Elsevier Ltd. All rights reserved.
TOP500 Supercomputers for June 2004
DOE Office of Scientific and Technical Information (OSTI.GOV)
Strohmaier, Erich; Meuer, Hans W.; Dongarra, Jack
2004-06-23
23rd Edition of TOP500 List of World's Fastest Supercomputers Released: Japan's Earth Simulator Enters Third Year in Top Position. MANNHEIM, Germany; KNOXVILLE, Tenn.; and BERKELEY, Calif. In what has become a closely watched event in the world of high-performance computing, the 23rd edition of the TOP500 list of the world's fastest supercomputers was released today (June 23, 2004) at the International Supercomputer Conference in Heidelberg, Germany.
78 FR 78731 - Indoxacarb; Pesticide Tolerances
Federal Register 2010, 2011, 2012, 2013, 2014
2013-12-27
[Toxicity-endpoint table garbled in extraction. Recoverable content: endpoints for indoxacarb derived by a weight-of-evidence approach from four studies, including developmental toxicity studies in rats with DPX-MP062 and DPX-KN128; a chronic RfD of 0.02 mg/kg/day with an uncertainty factor UFA = 10x is listed, along with short-term (30 days) and intermediate-term (1 to 6 months) endpoints.]
Large-scale structural analysis: The structural analyst, the CSM Testbed and the NAS System
NASA Technical Reports Server (NTRS)
Knight, Norman F., Jr.; Mccleary, Susan L.; Macy, Steven C.; Aminpour, Mohammad A.
1989-01-01
The Computational Structural Mechanics (CSM) activity is developing advanced structural analysis and computational methods that exploit high-performance computers. Methods are developed in the framework of the CSM testbed software system and applied to representative complex structural analysis problems from the aerospace industry. An overview of the CSM testbed methods development environment is presented and some numerical methods developed on a CRAY-2 are described. Selected application studies performed on the NAS CRAY-2 are also summarized.
Venkateswaran, Rajamiyer V; Dronavalli, Vamsidhar; Lambert, Peter A; Steeds, Richard P; Wilson, Ian C; Thompson, Richard D; Mascaro, Jorge G; Bonser, Robert S
2009-08-27
Brain stem death can elicit a potentially manipulable cardiotoxic proinflammatory cytokine response. We investigated the prevalence of this response, the impact of donor management with tri-iodothyronine (T3) and methylprednisolone (MP) administration, and the relationship of biomarkers to organ function and transplant suitability. In a prospective, randomized, double-blinded, factorially designed study of T3 and MP therapy, we measured serum levels of interleukin-1 and -6 (IL-1 and IL-6), tumor necrosis factor-alpha (TNF-alpha), C-reactive protein (CRP), and procalcitonin (PCT) in 79 potential heart or lung donors. Measurements were performed before and after 4 hr of algorithm-based donor management to optimize cardiorespiratory function, with or without hormone treatment. Donors were assigned to receive T3, MP, both drugs, or placebo. Initial IL-1 was elevated in 16% of donors, IL-6 in 100%, TNF-alpha in 28%, CRP in 98%, and PCT in 87%. Overall biomarker concentrations did not change between the initial and later measurements, and neither T3 nor MP effected any change. Both PCT (P = 0.02) and TNF-alpha (P = 0.044) levels were higher in donor hearts with marginal hemodynamics at initial assessment. Higher PCT levels were related to worse cardiac index and right and left ventricular ejection fractions, and a PCT level of more than 2 ng/mL may attenuate any improvement in cardiac index gained by donor management. No differences were observed between initially marginal and nonmarginal donor lungs. A PCT level of 2 ng/mL or less, but not the other biomarkers, predicted transplant suitability following management. There is a high prevalence of a proinflammatory environment in the organ donor that is not affected by tri-iodothyronine or MP therapy. High PCT and TNF-alpha levels are associated with donor heart dysfunction.
Automotive applications of supercomputers
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ginsberg, M.
1987-01-01
These proceedings compile papers on supercomputers in the automobile industry. Titles include: An automotive engineer's guide to the effective use of scalar, vector, and parallel computers; fluid mechanics, finite elements, and supercomputers; and Automotive crashworthiness performance on a supercomputer.
NASA Astrophysics Data System (ADS)
Mattacchioni, A.; Cristianini, M.; Lo Bosco, A.
2013-03-01
The purpose of this paper is to design digital rectangular phantoms, Di.Recta Multipurpose Phantoms (Di.Recta MP), for quality control of primary high-resolution medical monitors. The first approach to monitor quality evaluation is represented by the AAPM tests using the multipurpose TG18-QC phantom. The TG18-QC patterns are available in two sizes, 1024x1024 and 2048x2048, and the use of these phantoms requires a correct monitor setup. The study demonstrates that this type of phantom is suitable for CRT monitors with adequate setup procedures. In the second step, LCD monitors are analysed. Different types of primary monitors are included, in a range between 2 and 5 Mp. The difference between the resolution of the monitors and that of the phantoms does not allow a complete analysis of the entire display simply by moving the phantoms to different positions. Of course, the analysis of images in the peripheral regions of medical monitors cannot be neglected, especially because of possible legal implications. A simpler analysis of these areas can be done through the use of rectangular phantoms in place of square ones. Furthermore, because of the different technology, different analysis patches are also necessary for these types of monitors. Therefore, digital rectangular phantoms, Di.Recta MP, compatible with the spatial resolution of most commercial monitors are proposed. These phantoms are designed to simulate typical radiological conditions and to reveal significant defects using appropriate patches, such as luminance, contrast, and noise patterns. Finally, a preliminary study of dedicated Di.Recta MP phantoms for LED monitors is presented.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ghadar, Yasaman; Clark, Aurora E.
2012-02-02
The interaction potentials between immiscible polar and non-polar solvents are a major driving force behind the formation of liquid:liquid interfaces. In this work, the interaction energy of the water–pentane dimer has been determined using coupled-cluster theory with single, double, and perturbative triple excitations [CCSD(T)], second-order Møller-Plesset perturbation theory (MP2), density-fitted local MP2 (DF-LMP2), and density functional theory using a wide variety of density functionals and several different basis sets. The M05-2X exchange-correlation functional exhibits excellent agreement with CCSD(T) and DF-LMP2 after taking into account basis set superposition error. The gas-phase water–pentane interaction energy is found to be quite sensitive to the specific pentane isomer (2,2-dimethylpropane vs. n-pentane) and to the relative orientation of the monomeric constituents. Subsequent solution-phase cluster calculations of 2,2-dimethylpropane and n-pentane solvated by water indicate a positive free energy of solvation that is in good agreement with available experimental data. Structural parameters are quite sensitive to the density functional employed and reflect differences in the two-body interaction energy calculated by each method. In contrast, cluster calculations of pentane solvating an H2O solute are found to be inadequate for describing the organic solvent, likely due to limitations of the functionals employed (B3LYP, BHandH, and M05-2X).
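The counterpoise bookkeeping behind "taking into account basis set superposition error" is simple arithmetic: the interaction energy is formed from monomer energies recomputed in the full dimer basis (the standard Boys-Bernardi scheme). The numbers below are illustrative placeholders, not results from the study above.

```python
# Counterpoise-corrected interaction energy (Boys-Bernardi scheme), sketched
# with placeholder energies in hartree; not values from the paper above.
HARTREE_TO_KJ = 2625.4996

E_dimer        = -271.79025   # E(water-pentane), dimer basis (assumed)
E_water_dimerb = -76.26110    # E(water) computed in the full dimer basis (assumed)
E_pent_dimerb  = -195.52800   # E(pentane) computed in the full dimer basis (assumed)

E_int_cp = E_dimer - E_water_dimerb - E_pent_dimerb      # CP-corrected interaction
print(f"CP-corrected interaction energy: {E_int_cp * HARTREE_TO_KJ:.2f} kJ/mol")
```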
Li, Ming-Hsien; Yang, Yu-Syuan; Wang, Kuo-Chin; Chiang, Yu-Hsien; Shen, Po-Shen; Lai, Wei-Chih; Guo, Tzung-Fang; Chen, Peter
2017-12-06
A robust and recyclable monolithic substrate employing all-inorganic metal-oxide selective contacts with a nanoporous (np) Au:NiOx counter electrode is successfully demonstrated for efficient perovskite solar cells, in which the perovskite active layer is deposited in the final step of device fabrication. Through annealing of a Ni/Au bilayer, the nanoporous NiO/Au electrode is formed by virtue of an interconnected Au network embedded in oxidized Ni. By optimizing the annealing parameters and tuning the mesoscopic layer thicknesses (mp-TiO2 and mp-Al2O3), a decent power conversion efficiency (PCE) of 10.25% is delivered. With mp-TiO2/mp-Al2O3/np-Au:NiOx as a template, the original perovskite solar cell with 8.52% PCE can be rejuvenated by rinsing off the perovskite material with dimethylformamide and refilling with newly deposited perovskite. Renewed devices using the recycled substrate once and twice achieved PCEs of 8.17% and 7.72%, respectively, comparable to the original performance. This demonstrates that the novel device architecture makes it possible to recycle the expensive transparent conducting glass substrates together with all the electrode constituents. Deposition of stable multicomponent perovskite materials in the template also achieves an efficiency of 8.54%, which shows its versatility for various perovskite materials. The application of such a novel NiO/Au nanoporous electrode has promising potential for commercializing cost-effective, large-scale, and robust perovskite solar cells.
PLOT3D/AMES, DEC VAX VMS VERSION USING DISSPLA (WITHOUT TURB3D)
NASA Technical Reports Server (NTRS)
Buning, P. G.
1994-01-01
PLOT3D is an interactive graphics program designed to help scientists visualize computational fluid dynamics (CFD) grids and solutions. Today, supercomputers and CFD algorithms can provide scientists with simulations of such highly complex phenomena that obtaining an understanding of the simulations has become a major problem. Tools which help the scientist visualize the simulations can be of tremendous aid. PLOT3D/AMES offers more functions and features, and has been adapted for more types of computers than any other CFD graphics program. Version 3.6b+ is supported for five computers and graphic libraries. Using PLOT3D, CFD physicists can view their computational models from any angle, observing the physics of problems and the quality of solutions. As an aid in designing aircraft, for example, PLOT3D's interactive computer graphics can show vortices, temperature, reverse flow, pressure, and dozens of other characteristics of air flow during flight. As critical areas become obvious, they can easily be studied more closely using a finer grid. PLOT3D is part of a computational fluid dynamics software cycle. First, a program such as 3DGRAPE (ARC-12620) helps the scientist generate computational grids to model an object and its surrounding space. Once the grids have been designed and parameters such as the angle of attack, Mach number, and Reynolds number have been specified, a "flow-solver" program such as INS3D (ARC-11794 or COS-10019) solves the system of equations governing fluid flow, usually on a supercomputer. Grids sometimes have as many as two million points, and the "flow-solver" produces a solution file which contains density, x- y- and z-momentum, and stagnation energy for each grid point. With such a solution file and a grid file containing up to 50 grids as input, PLOT3D can calculate and graphically display any one of 74 functions, including shock waves, surface pressure, velocity vectors, and particle traces. PLOT3D's 74 functions are organized into five groups: 1) Grid Functions for grids, grid-checking, etc.; 2) Scalar Functions for contour or carpet plots of density, pressure, temperature, Mach number, vorticity magnitude, helicity, etc.; 3) Vector Functions for vector plots of velocity, vorticity, momentum, and density gradient, etc.; 4) Particle Trace Functions for rake-like plots of particle flow or vortex lines; and 5) Shock locations based on pressure gradient. TURB3D is a modification of PLOT3D which is used for viewing CFD simulations of incompressible turbulent flow. Input flow data consists of pressure, velocity and vorticity. Typical quantities to plot include local fluctuations in flow quantities and turbulent production terms, plotted in physical or wall units. PLOT3D/TURB3D includes both TURB3D and PLOT3D because the operation of TURB3D is identical to PLOT3D, and there is no additional sample data or printed documentation for TURB3D. Graphical capabilities of PLOT3D version 3.6b+ vary among the implementations available through COSMIC. Customers are encouraged to purchase and carefully review the PLOT3D manual before ordering the program for a specific computer and graphics library. There is only one manual for use with all implementations of PLOT3D, and although this manual generally assumes that the Silicon Graphics Iris implementation is being used, informative comments concerning other implementations appear throughout the text. 
With all implementations, the visual representation of the object and flow field created by PLOT3D consists of points, lines, and polygons. Points can be represented with dots or symbols, color can be used to denote data values, and perspective is used to show depth. Differences among implementations impact the program's ability to use graphical features that are based on 3D polygons, the user's ability to manipulate the graphical displays, and the user's ability to obtain alternate forms of output. The VAX/VMS/DISSPLA implementation of PLOT3D supports 2-D polygons as well as 2-D and 3-D lines, but does not support graphics features requiring 3-D polygons (shading and hidden line removal, for example). Views can be manipulated using keyboard commands. This version of PLOT3D is potentially able to produce files for a variety of output devices; however, site-specific capabilities will vary depending on the device drivers supplied with the user's DISSPLA library. If ARCGRAPH (ARC-12350) is installed on the user's VAX, the VMS/DISSPLA version of PLOT3D can also be used to create files for use in GAS (Graphics Animation System, ARC-12379), an IRIS program capable of animating and recording images on film. The version 3.6b+ VMS/DISSPLA implementations of PLOT3D (ARC-12777) and PLOT3D/TURB3D (ARC-12781) were developed for use on VAX computers running VMS Version 5.0 and DISSPLA Version 11.0. The standard distribution media for each of these programs is a 9-track, 6250 bpi magnetic tape in DEC VAX BACKUP format. Customers purchasing one implementation version of PLOT3D or PLOT3D/TURB3D will be given a $200 discount on each additional implementation version ordered at the same time. Version 3.6b+ of PLOT3D and PLOT3D/TURB3D are also supported for the following computers and graphics libraries: (1) generic UNIX Supercomputer and IRIS, suitable for CRAY 2/UNICOS, CONVEX, and Alliant with remote IRIS 2xxx/3xxx or IRIS 4D (ARC-12779, ARC-12784); (2) Silicon Graphics IRIS 2xxx/3xxx or IRIS 4D (ARC-12783, ARC12782); (3) generic UNIX and DISSPLA Version 11.0 (ARC-12788, ARC-12778); and (4) Apollo computers running UNIX and GMR3D Version 2.0 (ARC-12789, ARC-12785 which have no capabilities to put text on plots). Silicon Graphics Iris, IRIS 4D, and IRIS 2xxx/3xxx are trademarks of Silicon Graphics Incorporated. VAX and VMS are trademarks of Digital Electronics Corporation. DISSPLA is a trademark of Computer Associates. CRAY 2 and UNICOS are trademarks of CRAY Research, Incorporated. CONVEX is a trademark of Convex Computer Corporation. Alliant is a trademark of Alliant. Apollo and GMR3D are trademarks of Hewlett-Packard, Incorporated. UNIX is a registered trademark of AT&T.
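The point-wise arithmetic that turns the five Q-file variables described above (density, x/y/z-momentum, and stagnation energy per unit volume) into derived quantities such as pressure and Mach number can be sketched as follows, assuming an ideal gas with gamma = 1.4; file parsing and PLOT3D's exact nondimensionalization conventions are omitted, and the sample values are invented.

```python
# Pressure and Mach number from the five PLOT3D Q-variables at one grid point,
# assuming an ideal gas (gamma = 1.4).  Illustrative only; the Q-file I/O and
# the solver's nondimensionalization are not reproduced here.
import math

GAMMA = 1.4

def pressure_and_mach(rho, rho_u, rho_v, rho_w, e_total):
    u, v, w = rho_u / rho, rho_v / rho, rho_w / rho
    kinetic = 0.5 * rho * (u * u + v * v + w * w)
    p = (GAMMA - 1.0) * (e_total - kinetic)        # ideal-gas pressure
    a = math.sqrt(GAMMA * p / rho)                 # speed of sound
    mach = math.sqrt(u * u + v * v + w * w) / a
    return p, mach

# one grid point's worth of (nondimensional) sample values
print(pressure_and_mach(1.0, 0.5, 0.0, 0.0, 1.9))
```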
PLOT3D/AMES, DEC VAX VMS VERSION USING DISSPLA (WITH TURB3D)
NASA Technical Reports Server (NTRS)
Buning, P.
1994-01-01
PLOT3D is an interactive graphics program designed to help scientists visualize computational fluid dynamics (CFD) grids and solutions. Today, supercomputers and CFD algorithms can provide scientists with simulations of such highly complex phenomena that obtaining an understanding of the simulations has become a major problem. Tools which help the scientist visualize the simulations can be of tremendous aid. PLOT3D/AMES offers more functions and features, and has been adapted for more types of computers than any other CFD graphics program. Version 3.6b+ is supported for five computers and graphic libraries. Using PLOT3D, CFD physicists can view their computational models from any angle, observing the physics of problems and the quality of solutions. As an aid in designing aircraft, for example, PLOT3D's interactive computer graphics can show vortices, temperature, reverse flow, pressure, and dozens of other characteristics of air flow during flight. As critical areas become obvious, they can easily be studied more closely using a finer grid. PLOT3D is part of a computational fluid dynamics software cycle. First, a program such as 3DGRAPE (ARC-12620) helps the scientist generate computational grids to model an object and its surrounding space. Once the grids have been designed and parameters such as the angle of attack, Mach number, and Reynolds number have been specified, a "flow-solver" program such as INS3D (ARC-11794 or COS-10019) solves the system of equations governing fluid flow, usually on a supercomputer. Grids sometimes have as many as two million points, and the "flow-solver" produces a solution file which contains density, x- y- and z-momentum, and stagnation energy for each grid point. With such a solution file and a grid file containing up to 50 grids as input, PLOT3D can calculate and graphically display any one of 74 functions, including shock waves, surface pressure, velocity vectors, and particle traces. PLOT3D's 74 functions are organized into five groups: 1) Grid Functions for grids, grid-checking, etc.; 2) Scalar Functions for contour or carpet plots of density, pressure, temperature, Mach number, vorticity magnitude, helicity, etc.; 3) Vector Functions for vector plots of velocity, vorticity, momentum, and density gradient, etc.; 4) Particle Trace Functions for rake-like plots of particle flow or vortex lines; and 5) Shock locations based on pressure gradient. TURB3D is a modification of PLOT3D which is used for viewing CFD simulations of incompressible turbulent flow. Input flow data consists of pressure, velocity and vorticity. Typical quantities to plot include local fluctuations in flow quantities and turbulent production terms, plotted in physical or wall units. PLOT3D/TURB3D includes both TURB3D and PLOT3D because the operation of TURB3D is identical to PLOT3D, and there is no additional sample data or printed documentation for TURB3D. Graphical capabilities of PLOT3D version 3.6b+ vary among the implementations available through COSMIC. Customers are encouraged to purchase and carefully review the PLOT3D manual before ordering the program for a specific computer and graphics library. There is only one manual for use with all implementations of PLOT3D, and although this manual generally assumes that the Silicon Graphics Iris implementation is being used, informative comments concerning other implementations appear throughout the text. 
With all implementations, the visual representation of the object and flow field created by PLOT3D consists of points, lines, and polygons. Points can be represented with dots or symbols, color can be used to denote data values, and perspective is used to show depth. Differences among implementations impact the program's ability to use graphical features that are based on 3D polygons, the user's ability to manipulate the graphical displays, and the user's ability to obtain alternate forms of output. The VAX/VMS/DISSPLA implementation of PLOT3D supports 2-D polygons as well as 2-D and 3-D lines, but does not support graphics features requiring 3-D polygons (shading and hidden line removal, for example). Views can be manipulated using keyboard commands. This version of PLOT3D is potentially able to produce files for a variety of output devices; however, site-specific capabilities will vary depending on the device drivers supplied with the user's DISSPLA library. If ARCGRAPH (ARC-12350) is installed on the user's VAX, the VMS/DISSPLA version of PLOT3D can also be used to create files for use in GAS (Graphics Animation System, ARC-12379), an IRIS program capable of animating and recording images on film. The version 3.6b+ VMS/DISSPLA implementations of PLOT3D (ARC-12777) and PLOT3D/TURB3D (ARC-12781) were developed for use on VAX computers running VMS Version 5.0 and DISSPLA Version 11.0. The standard distribution media for each of these programs is a 9-track, 6250 bpi magnetic tape in DEC VAX BACKUP format. Customers purchasing one implementation version of PLOT3D or PLOT3D/TURB3D will be given a $200 discount on each additional implementation version ordered at the same time. Version 3.6b+ of PLOT3D and PLOT3D/TURB3D are also supported for the following computers and graphics libraries: (1) generic UNIX Supercomputer and IRIS, suitable for CRAY 2/UNICOS, CONVEX, and Alliant with remote IRIS 2xxx/3xxx or IRIS 4D (ARC-12779, ARC-12784); (2) Silicon Graphics IRIS 2xxx/3xxx or IRIS 4D (ARC-12783, ARC12782); (3) generic UNIX and DISSPLA Version 11.0 (ARC-12788, ARC-12778); and (4) Apollo computers running UNIX and GMR3D Version 2.0 (ARC-12789, ARC-12785 which have no capabilities to put text on plots). Silicon Graphics Iris, IRIS 4D, and IRIS 2xxx/3xxx are trademarks of Silicon Graphics Incorporated. VAX and VMS are trademarks of Digital Electronics Corporation. DISSPLA is a trademark of Computer Associates. CRAY 2 and UNICOS are trademarks of CRAY Research, Incorporated. CONVEX is a trademark of Convex Computer Corporation. Alliant is a trademark of Alliant. Apollo and GMR3D are trademarks of Hewlett-Packard, Incorporated. UNIX is a registered trademark of AT&T.
Waterman, R C; Sawyer, J E; Mathis, C P; Hawkins, D E; Donart, G B; Petersen, M K
2006-02-01
Cattle grazing winter range forages exhibit interannual variation in response to supplementation. This variation may be mediated by circulating concentrations and subsequent metabolism of glucose, which are influenced by forage quality and availability. A study conducted at the Corona Range and Livestock Research Center during 2 dry years evaluated responses of young postpartum beef cows (n = 51, initial BW = 408 +/- 3 kg, and BCS = 5.1 +/- 0.04 in year 1; n = 36, initial BW = 393 +/- 4 kg, and BCS = 4.5 +/- 0.05 in year 2) to supplements that met or exceeded metabolizable protein (MP) requirements. Supplements were fed at 908 g/d per cow and provided 327 g of CP, 118 g of ruminally undegradable protein (RUP), and 261 g of MP from RUP (RMP), calculated to meet the MP requirement; 327 g of CP, 175 g of RUP, and 292 g of MP from RUP (RMP+), which supplied 31 g of excess MP; or 327 g of CP, 180 g of RUP, 297 g of MP from RUP, and 100 g of propionate salt (NutroCal, Kemin Industries, Inc., Des Moines, IA; (RMP+)P), which supplied 36 g of excess MP. Body weights were recorded once every 2 wk, and blood samples were collected 1x/wk in year 1 and 2x/wk in year 2 for 100 d postpartum. Postpartum anestrous was evaluated by progesterone from weekly blood samples, and pregnancy was confirmed by rectal palpation at weaning. As MP from RUP with or without propionate increased, a decrease (P = 0.03) was observed in postpartum interval; however, differences in pregnancy percentage (P = 0.54) were not influenced by treatments. We hypothesized that additional AA from RUP along with propionate would increase supply of glucogenic precursors and, therefore, glucogenic potential of the diet. Therefore, a postpartum glucose tolerance test was conducted near the nadir of cow BW to evaluate the rate of glucose clearance. Glucose tolerance tests showed that (RMP+)- or (RMP+)P-supplemented cows had greater (P = 0.03) rates of glucose clearance, which might have influenced the observed abbreviation of the postpartum interval. A glucose tolerance test conducted at the end of supplemental treatments revealed no differences in glucose clearance (P = 0.47) among previously supplemented cows. These data suggest that not only vegetative quality, duration of lactation, and season of grazing, but also type of supplementation may play a pivotal role in the young postpartum beef cow's ability to respond and incorporate nutrients into insulin-sensitive tissues.
Scientific Visualization in High Speed Network Environments
NASA Technical Reports Server (NTRS)
Vaziri, Arsi; Kutler, Paul (Technical Monitor)
1997-01-01
In several cases, new visualization techniques have vastly increased the researcher's ability to analyze and comprehend data. Similarly, the role of networks in providing an efficient supercomputing environment has become more critical and continues to grow at a faster rate than the increase in the processing capabilities of supercomputers. A close relationship between scientific visualization and high-speed networks in providing an important link to support efficient supercomputing is identified. The two technologies are driven by the increasing complexity and volume of supercomputer data. The interaction of scientific visualization and high-speed networks in a Computational Fluid Dynamics simulation/visualization environment is described. Current capabilities supported by high-speed networks, supercomputers, and high-performance graphics workstations at the Numerical Aerodynamic Simulation Facility (NAS) at NASA Ames Research Center are described. Applied research in providing a supercomputer visualization environment to support future computational requirements is summarized.
The solution of linear systems of equations with a structural analysis code on the NAS CRAY-2
NASA Technical Reports Server (NTRS)
Poole, Eugene L.; Overman, Andrea L.
1988-01-01
Two methods for solving linear systems of equations on the NAS Cray-2 are described. One is a direct method; the other is an iterative method. Both methods exploit the architecture of the Cray-2, particularly its vectorization, and are aimed at structural analysis applications. To demonstrate and evaluate the methods, they were installed in a finite element structural analysis code denoted the Computational Structural Mechanics (CSM) Testbed. A description of the techniques used to integrate the two solvers into the Testbed is given. Storage schemes, memory requirements, operation counts, and reformatting procedures are discussed. Finally, results from the new methods are compared with results from the initial Testbed sparse Choleski equation solver for three structural analysis problems. The new direct solvers described achieve the highest computational rates of the methods compared. The new iterative methods do not achieve computation rates as high as those of the vectorized direct solvers, but are best suited to well-conditioned problems, which require fewer iterations to converge to the solution.
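The trade-off summarized above can be illustrated with a minimal, hypothetical Python/NumPy sketch; it is not the CSM Testbed code, and the matrix, function names, and tolerances here are illustrative only. It contrasts a direct Cholesky factorization with an unpreconditioned conjugate gradient loop, whose iteration count grows with the condition number of the system, which is why the iterative approach favors well-conditioned problems.

import numpy as np

def direct_cholesky_solve(A, b):
    """Direct method: factor A = L L^T, then solve the two triangular systems."""
    L = np.linalg.cholesky(A)
    y = np.linalg.solve(L, b)       # forward substitution (dense solve used here for brevity)
    return np.linalg.solve(L.T, y)  # back substitution

def conjugate_gradient(A, b, tol=1e-10, max_iter=1000):
    """Iterative method: unpreconditioned CG; iterations increase with condition number."""
    x = np.zeros_like(b)
    r = b - A @ x
    p = r.copy()
    rs_old = r @ r
    for k in range(max_iter):
        Ap = A @ p
        alpha = rs_old / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) < tol * np.linalg.norm(b):
            return x, k + 1
        p = r + (rs_new / rs_old) * p
        rs_old = rs_new
    return x, max_iter

# Small symmetric positive definite test system standing in for a stiffness matrix.
rng = np.random.default_rng(0)
M = rng.standard_normal((200, 200))
A = M @ M.T + 200 * np.eye(200)     # shift keeps the system well conditioned
b = rng.standard_normal(200)
x_direct = direct_cholesky_solve(A, b)
x_cg, iters = conjugate_gradient(A, b)
print(iters, np.linalg.norm(x_direct - x_cg))

Removing the diagonal shift in the test matrix worsens the conditioning and visibly increases the CG iteration count, while the direct solve is unaffected.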
Porting the AVS/Express scientific visualization software to Cray XT4.
Leaver, George W; Turner, Martin J; Perrin, James S; Mummery, Paul M; Withers, Philip J
2011-08-28
Remote scientific visualization, where rendering services are provided by larger scale systems than are available on the desktop, is becoming increasingly important as dataset sizes increase beyond the capabilities of desktop workstations. Uptake of such services relies on access to suitable visualization applications and the ability to view the resulting visualization in a convenient form. We consider five rules from the e-Science community to meet these goals with the porting of a commercial visualization package to a large-scale system. The application uses message-passing interface (MPI) to distribute data among data processing and rendering processes. The use of MPI in such an interactive application is not compatible with restrictions imposed by the Cray system being considered. We present details, and performance analysis, of a new MPI proxy method that allows the application to run within the Cray environment yet still support MPI communication required by the application. Example use cases from materials science are considered.
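The general proxy pattern behind such a port can be sketched briefly. The following Python/mpi4py fragment is a hypothetical illustration, not the AVS/Express implementation: one designated rank mediates between an interactive front end and the compute ranks, broadcasting commands to the worker ranks and combining their replies, so that the workers only ever see ordinary MPI traffic. The command list stands in for a socket connection to a desktop client, which a real service would use instead.

from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

# Hypothetical command stream; a real proxy would read these from a socket
# connected to the interactive desktop client outside the compute nodes.
commands = ["isosurface", "slice", "quit"]

if rank == 0:
    # Proxy rank: forwards each command to the workers and gathers their replies.
    for cmd in commands:
        comm.bcast(cmd, root=0)
        if cmd == "quit":
            break
        partial = comm.gather(None, root=0)   # workers contribute; the proxy passes None
        results = [p for p in partial if p is not None]
        print(f"proxy combined {len(results)} partial results for '{cmd}'")
else:
    # Worker (data-processing/rendering) rank: waits for broadcast commands.
    while True:
        cmd = comm.bcast(None, root=0)
        if cmd == "quit":
            break
        comm.gather(f"rank {rank} rendered {cmd}", root=0)

Run, for example, with mpiexec -n 4 python proxy_sketch.py; only rank 0 would need connectivity to the outside client, which mirrors the restriction that compute ranks cannot be reached directly.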
A Pacific Ocean general circulation model for satellite data assimilation
NASA Technical Reports Server (NTRS)
Chao, Y.; Halpern, D.; Mechoso, C. R.
1991-01-01
A tropical Pacific Ocean General Circulation Model (OGCM) to be used in satellite data assimilation studies is described. The transfer of the OGCM from a CYBER-205 at NOAA's Geophysical Fluid Dynamics Laboratory to a CRAY-2 at NASA's Ames Research Center is documented. Two 3-year model integrations, started from identical initial conditions but performed on those two computers, are compared. The model simulations are very similar to each other, as expected, but the simulation performed with the higher-precision CRAY-2 is smoother than that performed with the lower-precision CYBER-205. The CYBER-205 and CRAY-2 use 32- and 64-bit mantissa arithmetic, respectively. The major features of the oceanic circulation in the tropical Pacific, namely the North Equatorial Current, the North Equatorial Countercurrent, the South Equatorial Current, and the Equatorial Undercurrent, are realistically produced and their seasonal cycles are described. The OGCM provides a powerful tool for the study of tropical oceans and for the assimilation of satellite altimetry data.
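The sensitivity to word length that this comparison highlights can be shown with a tiny, hypothetical NumPy example; neither the CYBER-205 nor the CRAY-2 used IEEE formats, so the 32-bit and 64-bit floats below merely stand in for the two precisions named in the abstract. Accumulating the same long sum at both precisions makes the difference in round-off noise visible.

import numpy as np

# Sum many small increments that have no exact binary representation.
n = 10_000_000
increments = np.full(n, 0.1, dtype=np.float64)

sum64 = np.float64(0.0)
sum32 = np.float32(0.0)
for chunk in np.array_split(increments, 100):          # chunked NumPy sums keep the loop fast
    sum64 += chunk.sum(dtype=np.float64)
    sum32 += chunk.astype(np.float32).sum(dtype=np.float32)

exact = 0.1 * n
print(f"64-bit error: {abs(sum64 - exact):.6e}")        # small round-off
print(f"32-bit error: {abs(sum32 - exact):.6e}")        # several orders of magnitude larger

In a long time integration such round-off differences act like low-level noise, which is consistent with the smoother fields reported for the higher-precision machine.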
DOE Office of Scientific and Technical Information (OSTI.GOV)
da Motta Filho, Helio
The $x_F$, $p_t^2$, and $p_t$ distributions of $D^{\pm}$ mesons produced by 250 GeV $K^+$-nucleon interactions are measured through the decay channel $D^{\pm} \to K^{\mp}\pi^{\pm}\pi^{\pm}$ ...