Sample records for Cray X-MP computer

  1. A performance comparison of the Cray-2 and the Cray X-MP

    NASA Technical Reports Server (NTRS)

    Schmickley, Ronald; Bailey, David H.

    1986-01-01

    A suite of thirteen large Fortran benchmark codes was run on Cray-2 and Cray X-MP supercomputers. These codes were a mix of compute-intensive scientific application programs (mostly Computational Fluid Dynamics) and some special vectorized computation exercise programs. For the general class of programs tested on the Cray-2, most of which were not specially tuned for speed, the floating point operation rates varied under a variety of system load configurations from 40 percent up to 125 percent of X-MP performance rates. It is concluded that the Cray-2, in the original system configuration studied (without memory pseudo-banking), will run untuned Fortran code, on average, at about 70 percent of X-MP speeds.

  2. FFTs in external or hierarchical memory

    NASA Technical Reports Server (NTRS)

    Bailey, David H.

    1989-01-01

    A description is given of advanced techniques for computing an ordered FFT on a computer with external or hierarchical memory. These algorithms (1) require as few as two passes through the external data set, (2) use strictly unit stride, long vector transfers between main memory and external storage, (3) require only a modest amount of scratch space in main memory, and (4) are well suited for vector and parallel computation. Performance figures are included for implementations of some of these algorithms on Cray supercomputers. Of interest is the fact that a main memory version outperforms the current Cray library FFT routines on the Cray-2, the Cray X-MP, and the Cray Y-MP systems. Using all eight processors on the Cray Y-MP, this main memory routine runs at nearly 2 Gflops.
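
    The two-pass structure rests on the classical four-step (transpose) FFT factorization: view the length-N signal as an n1 x n2 array, FFT one axis, apply twiddle factors, transpose, and FFT the other axis. A minimal NumPy sketch of that factorization (illustrative only, not the paper's Cray implementation):

    ```python
    import numpy as np

    def four_step_fft(x, n1, n2):
        """Ordered DFT of length n1*n2 via the four-step (transpose) algorithm.

        Each pass sweeps the data once with long, unit-stride operations --
        the property the external-memory algorithms exploit."""
        N = n1 * n2
        # View the signal as an n1 x n2 matrix: A[j1, j2] = x[j1 + n1*j2]
        A = x.reshape(n2, n1).T
        # Step 1: n1 independent FFTs of length n2 (over j2)
        B = np.fft.fft(A, axis=1)
        # Step 2: twiddle factors exp(-2*pi*i*j1*k2/N)
        j1 = np.arange(n1)[:, None]
        k2 = np.arange(n2)[None, :]
        B *= np.exp(-2j * np.pi * j1 * k2 / N)
        # Step 3: transpose (the out-of-core data-movement pass);
        # Step 4: n2 independent FFTs of length n1 (over j1)
        X = np.fft.fft(B.T, axis=1)          # X[k2, k1]
        # Output index is k2 + n2*k1: column-major flattening
        return X.flatten(order='F')

    x = np.random.rand(1024) + 1j * np.random.rand(1024)
    assert np.allclose(four_step_fft(x, 32, 32), np.fft.fft(x))
    ```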

  3. Multitasking and microtasking experience on the NAS Cray-2 and ACF Cray X-MP

    NASA Technical Reports Server (NTRS)

    Raiszadeh, Farhad

    1987-01-01

    The fast Fourier transform (FFT) kernel of the NAS benchmark program has been utilized to experiment with the multitasking library on the Cray-2 and Cray X-MP/48, and microtasking directives on the Cray X-MP. Some performance figures are shown, and the state of multitasking software is described.

  4. New computing systems and their impact on structural analysis and design

    NASA Technical Reports Server (NTRS)

    Noor, Ahmed K.

    1989-01-01

    A review is given of the recent advances in computer technology that are likely to impact structural analysis and design. The computational needs for future structures technology are described. The characteristics of new and projected computing systems are summarized. Advances in programming environments, numerical algorithms, and computational strategies for new computing systems are reviewed, and a novel partitioning strategy is outlined for maximizing the degree of parallelism. The strategy is designed for computers with a shared memory and a small number of powerful processors (or a small number of clusters of medium-range processors). It is based on approximating the response of the structure by a combination of symmetric and antisymmetric response vectors, each obtained using a fraction of the degrees of freedom of the original finite element model. The strategy was implemented on the CRAY X-MP/4 and the Alliant FX/8 computers. For nonlinear dynamic problems on the CRAY X-MP with four CPUs, it resulted in an order of magnitude reduction in total analysis time, compared with the direct analysis on a single-CPU CRAY X-MP machine.
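
    The splitting underlying the strategy is the standard symmetry decomposition: if $R$ is the reflection operator for the structure's plane of symmetry (with $R^2 = I$), any response vector separates into symmetric and antisymmetric parts, each obtainable from a reduced model (a sketch of the standard identity; the paper's approximation details go beyond it):

    $$u = u_s + u_a, \qquad u_s = \tfrac{1}{2}(u + Ru), \qquad u_a = \tfrac{1}{2}(u - Ru), \qquad R\,u_s = u_s, \quad R\,u_a = -u_a.$$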

  5. Early MIMD experience on the CRAY X-MP

    NASA Astrophysics Data System (ADS)

    Rhoades, Clifford E.; Stevens, K. G.

    1985-07-01

    This paper describes some early experience with converting four physics simulation programs to the CRAY X-MP, a current Multiple Instruction, Multiple Data (MIMD) computer consisting of two processors, each with an architecture similar to that of the CRAY-1. As a multi-processor, the CRAY X-MP together with the high-speed Solid-state Storage Device (SSD) is an ideal machine upon which to study MIMD algorithms for solving the equations of mathematical physics because it is fast enough to run real problems. The computer programs used in this study are all FORTRAN versions of original production codes. They range in sophistication from a one-dimensional numerical simulation of collisionless plasma to a two-dimensional hydrodynamics code with heat flow to a couple of three-dimensional fluid dynamics codes with varying degrees of viscous modeling. Early research with a dual processor configuration has shown speed-ups ranging from 1.55 to 1.98. It has been observed that a few simple extensions to FORTRAN allow a typical programmer to achieve a remarkable level of efficiency. These extensions involve the concept of memory local to a concurrent subprogram and memory common to all concurrent subprograms.
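
    The two memory classes map directly onto today's shared-memory idioms. A minimal Python analogue (hypothetical, standing in for the FORTRAN extensions described) of a concurrent subprogram with local and common memory:

    ```python
    import threading

    # Memory common to all concurrent subprograms: one shared array and total.
    data = list(range(1_000_000))
    total = 0.0
    lock = threading.Lock()

    def partial_sum(lo, hi):
        # Memory local to this concurrent subprogram: a private accumulator,
        # updated with no synchronization inside the loop.
        acc = 0.0
        for i in range(lo, hi):
            acc += data[i]
        global total
        with lock:                 # common memory is touched once, under a lock
            total += acc

    threads = [threading.Thread(target=partial_sum,
                                args=(k * 250_000, (k + 1) * 250_000))
               for k in range(4)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    assert total == sum(data)
    ```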

  6. Optimization of large matrix calculations for execution on the Cray X-MP vector supercomputer

    NASA Technical Reports Server (NTRS)

    Hornfeck, William A.

    1988-01-01

    A considerable volume of large computational code has been developed for NASA over the past twenty-five years. This code represents algorithms developed for machines of an earlier generation. With the emergence of the vector supercomputer as a viable, commercially available machine, an opportunity exists to evaluate optimization strategies to improve the efficiency of existing software. The opportunity arises primarily from architectural differences between the latest generation of large-scale machines and the earlier, mostly uniprocessor, machines. A software package being used by NASA to perform computations on large matrices is described, and a strategy for conversion to the Cray X-MP vector supercomputer is also described.

  7. The International Conference on Vector and Parallel Computing (2nd)

    DTIC Science & Technology

    1989-01-17

    [Excerpt from conference proceedings; text garbled in extraction. Recoverable content: the papers include "Computation of the SVD of Bidiagonal Matrices" and "Lattice QCD as a Large Scale Scientific Computation"; the lattice QCD code was vectorized for the IBM 3090 Vector Facility, with most of the elapsed time coming from the wavefront solver routine, and was benchmarked on a large number of computers, including the Cray X-MP and Cray-2.]

  8. Optimization strategies for molecular dynamics programs on Cray computers and scalar work stations

    NASA Astrophysics Data System (ADS)

    Unekis, Michael J.; Rice, Betsy M.

    1994-12-01

    We present results of timing runs and different optimization strategies for a prototype molecular dynamics program that simulates shock waves in a two-dimensional (2-D) model of a reactive energetic solid. The performance of the program may be improved substantially by simple changes to the Fortran or by employing various vendor-supplied compiler optimizations. The optimum strategy varies among the machines used and will vary depending upon the details of the program. The effect of various compiler options and vendor-supplied subroutine calls is demonstrated. Comparison is made between two scalar workstations (IBM RS/6000 Model 370 and Model 530) and several Cray supercomputers (X-MP/48, Y-MP8/128, and C-90/16256). We find that for a scientific application program dominated by sequential, scalar statements, a relatively inexpensive high-end workstation such as the IBM RS/6000 RISC series will outperform single processor performance of the Cray X-MP/48 and perform competitively with single processor performance of the Y-MP8/128 and C-90/16256.

  9. Y-MP floating point and Cholesky factorization

    NASA Technical Reports Server (NTRS)

    Carter, Russell

    1991-01-01

    The floating point arithmetics implemented in the Cray 2 and Cray Y-MP computer systems are nearly identical, but large scale computations performed on the two systems have exhibited significant differences in accuracy. The difference in accuracy is analyzed for the Cholesky factorization algorithm, and it is found that the source of the difference is the subtract magnitude operation of the Cray Y-MP. The results from numerical experiments for a range of problem sizes are presented, and an efficient method for improving the accuracy of the factorization obtained on the Y-MP is described.
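
    For reference, the factorization under analysis is ordinary Cholesky, A = L L^T; the subtractions in the inner update are where machine-dependent rounding (such as the Y-MP's subtract magnitude behavior) enters. A minimal NumPy sketch:

    ```python
    import numpy as np

    def cholesky(a):
        """Unblocked Cholesky factorization A = L L^T (L lower triangular).

        The subtractions a[i, j] - l[i, :j] @ l[j, :j] are where rounding
        differences between machines accumulate."""
        n = a.shape[0]
        l = np.zeros_like(a, dtype=float)
        for j in range(n):
            l[j, j] = np.sqrt(a[j, j] - l[j, :j] @ l[j, :j])
            for i in range(j + 1, n):
                l[i, j] = (a[i, j] - l[i, :j] @ l[j, :j]) / l[j, j]
        return l

    a = np.random.rand(50, 50)
    a = a @ a.T + 50 * np.eye(50)      # make it symmetric positive definite
    assert np.allclose(cholesky(a), np.linalg.cholesky(a))
    ```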

  10. Distributed Finite Element Analysis Using a Transputer Network

    NASA Technical Reports Server (NTRS)

    Watson, James; Favenesi, James; Danial, Albert; Tombrello, Joseph; Yang, Dabby; Reynolds, Brian; Turrentine, Ronald; Shephard, Mark; Baehmann, Peggy

    1989-01-01

    The principal objective of this research effort was to demonstrate the extraordinarily cost effective acceleration of finite element structural analysis problems using a transputer-based parallel processing network. This objective was accomplished in the form of a commercially viable parallel processing workstation. The workstation is a desktop size, low-maintenance computing unit capable of supercomputer performance yet costs two orders of magnitude less. To achieve the principal research objective, a transputer based structural analysis workstation termed XPFEM was implemented with linear static structural analysis capabilities resembling commercially available NASTRAN. Finite element model files, generated using the on-line preprocessing module or external preprocessing packages, are downloaded to a network of 32 transputers for accelerated solution. The system currently executes at about one third Cray X-MP24 speed but additional acceleration appears likely. For the NASA selected demonstration problem of a Space Shuttle main engine turbine blade model with about 1500 nodes and 4500 independent degrees of freedom, the Cray X-MP24 required 23.9 seconds to obtain a solution while the transputer network, operated from an IBM PC-AT compatible host computer, required 71.7 seconds. Consequently, the $80,000 transputer network demonstrated a cost-performance ratio about 60 times better than the $15,000,000 Cray X-MP24 system.
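
    The quoted factor follows directly from the reported prices and run times:

    $$\frac{\$15{,}000{,}000 \;/\; \$80{,}000}{71.7\ \text{s} \;/\; 23.9\ \text{s}} \;=\; \frac{187.5}{3.0} \;\approx\; 62,$$

    i.e., roughly the 60-fold cost-performance advantage cited.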

  11. Attaching IBM-compatible 3380 disks to Cray X-MP

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Engert, D.E.; Midlock, J.L.

    1989-01-01

    A method of attaching IBM-compatible 3380 disks directly to a Cray X-MP via the XIOP with a BMC is described. The IBM 3380 disks appear to the UNICOS operating system as DD-29 disks with UNICOS file systems. IBM 3380 disks provide cheap, reliable, large capacity disk storage. Combined with a small number of high-speed Cray disks, the IBM disks provide for the bulk of the storage for small files and infrequently used files. Cray Research designed the BMC and its supporting software in the XIOP to allow IBM tapes and other devices to be attached to the X-MP. No hardware changes were necessary, and we added less than 2000 lines of code to the XIOP to accomplish this project. This system has been in operation for over eight months. Future enhancements such as the use of a cache controller and attachment to a Y-MP are also described.

  12. Implementing dense linear algebra algorithms using multitasking on the CRAY X-MP-4 (or approaching the gigaflop)

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Dongarra, J.J.; Hewitt, T.

    1985-08-01

    This note describes some experiments on simple, dense linear algebra algorithms. These experiments show that the CRAY X-MP is capable of small-grain multitasking arising from standard implementations of LU and Cholesky decomposition. The implementation described here provides the "fastest" execution rate for LU decomposition, 718 MFLOPS for a matrix of order 1000.

  13. A parallel finite-difference method for computational aerodynamics

    NASA Technical Reports Server (NTRS)

    Swisshelm, Julie M.

    1989-01-01

    A finite-difference scheme for solving complex three-dimensional aerodynamic flow on parallel-processing supercomputers is presented. The method consists of a basic flow solver with multigrid convergence acceleration, embedded grid refinements, and a zonal equation scheme. Multitasking and vectorization have been incorporated into the algorithm. Results obtained include multiprocessed flow simulations from the Cray X-MP and Cray-2. Speedups as high as 3.3 for the two-dimensional case and 3.5 for segments of the three-dimensional case have been achieved on the Cray-2. The entire solver attained a factor of 2.7 improvement over its unitasked version on the Cray-2. The performance of the parallel algorithm on each machine is analyzed.

  14. Scalable Vector Media-processors for Embedded Systems

    DTIC Science & Technology

    2002-05-01

    [Thesis excerpt; text garbled in extraction. The bibliography cites: M. August, G. Brost, C. Hsiung, and C. Schiffleger, "Cray X-MP: The Birth of a Supercomputer," IEEE Computer, 22(1):45-52, January 1989.]

  15. Experiences and results multitasking a hydrodynamics code on global and local memory machines

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Mandell, D.

    1987-01-01

    A one-dimensional, time-dependent Lagrangian hydrodynamics code using a Godunov solution method has been multitasked for the Cray X-MP/48, the Intel iPSC hypercube, the Alliant FX series, and the IBM RP3 computers. Actual multitasking results have been obtained for the Cray, Intel, and Alliant computers, and simulated results were obtained for the Cray and RP3 machines. The differences in the methods required to multitask on each of the machines are discussed. Results are presented for a sample problem involving a shock wave moving down a channel. Comparisons are made between theoretical speedups, predicted by Amdahl's law, and the actual speedups obtained. The problems of debugging on the different machines are also described.
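
    The theoretical speedups referred to come from Amdahl's law: if a fraction $p$ of the work parallelizes perfectly over $N$ processors (the values $p = 0.95$, $N = 4$ below are illustrative, not the paper's),

    $$S(N) \;=\; \frac{1}{(1-p) + p/N}, \qquad p = 0.95,\; N = 4 \;\Rightarrow\; S \approx 3.5.$$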

  16. A performance comparison of scalar, vector, and concurrent vector computers including supercomputers for modeling transport of reactive contaminants in groundwater

    NASA Astrophysics Data System (ADS)

    Tripathi, Vijay S.; Yeh, G. T.

    1993-06-01

    Sophisticated and highly computation-intensive models of transport of reactive contaminants in groundwater have been developed in recent years. Application of such models to real-world contaminant transport problems, e.g., simulation of groundwater transport of 10-15 chemically reactive elements (e.g., toxic metals) and relevant complexes and minerals in two and three dimensions over a distance of several hundred meters, requires high-performance computers including supercomputers. Although not widely recognized as such, the computational complexity and demand of these models compare with well-known computation-intensive applications including weather forecasting and quantum chemical calculations. A survey of the performance of a variety of available hardware, as measured by the run times for a reactive transport model HYDROGEOCHEM, showed that while supercomputers provide the fastest execution times for such problems, relatively low-cost reduced instruction set computer (RISC) based scalar computers provide the best performance-to-price ratio. Because supercomputers like the Cray X-MP are inherently multiuser resources, often the RISC computers also provide much better turnaround times. Furthermore, RISC-based workstations provide the best platforms for "visualization" of groundwater flow and contaminant plumes. The most notable result, however, is that current workstations costing less than $10,000 provide performance within a factor of 5 of a Cray X-MP.

  17. Using Strassen's algorithm to accelerate the solution of linear systems

    NASA Technical Reports Server (NTRS)

    Bailey, David H.; Lee, King; Simon, Horst D.

    1990-01-01

    Strassen's algorithm for fast matrix-matrix multiplication has been implemented for matrices of arbitrary shapes on the CRAY-2 and CRAY Y-MP supercomputers. Several techniques have been used to reduce the scratch space requirement for this algorithm while simultaneously preserving a high level of performance. When the resulting Strassen-based matrix multiply routine is combined with some routines from the new LAPACK library, LU decomposition can be performed with rates significantly higher than those achieved by conventional means. We succeeded in factoring a 2048 x 2048 matrix on the CRAY Y-MP at a rate equivalent to 325 MFLOPS.
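
    At the heart of the method is the Strassen recursion, which forms seven half-size products instead of the classical eight. A compact power-of-two sketch (the paper's arbitrary-shape handling and scratch-space reductions are omitted):

    ```python
    import numpy as np

    def strassen(a, b, cutoff=64):
        """Strassen multiply for n x n matrices, n a power of two.
        Seven recursive products replace the eight of the classical method."""
        n = a.shape[0]
        if n <= cutoff:
            return a @ b
        h = n // 2
        a11, a12, a21, a22 = a[:h, :h], a[:h, h:], a[h:, :h], a[h:, h:]
        b11, b12, b21, b22 = b[:h, :h], b[:h, h:], b[h:, :h], b[h:, h:]
        m1 = strassen(a11 + a22, b11 + b22, cutoff)
        m2 = strassen(a21 + a22, b11, cutoff)
        m3 = strassen(a11, b12 - b22, cutoff)
        m4 = strassen(a22, b21 - b11, cutoff)
        m5 = strassen(a11 + a12, b22, cutoff)
        m6 = strassen(a21 - a11, b11 + b12, cutoff)
        m7 = strassen(a12 - a22, b21 + b22, cutoff)
        c = np.empty_like(a)
        c[:h, :h] = m1 + m4 - m5 + m7
        c[:h, h:] = m3 + m5
        c[h:, :h] = m2 + m4
        c[h:, h:] = m1 - m2 + m3 + m6
        return c

    a, b = np.random.rand(256, 256), np.random.rand(256, 256)
    assert np.allclose(strassen(a, b), a @ b)
    ```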

  18. Parallel computation in a three-dimensional elastic-plastic finite-element analysis

    NASA Technical Reports Server (NTRS)

    Shivakumar, K. N.; Bigelow, C. A.; Newman, J. C., Jr.

    1992-01-01

    A CRAY parallel processing technique called autotasking was implemented in a three-dimensional elasto-plastic finite-element code. The technique was evaluated on two CRAY supercomputers, a CRAY 2 and a CRAY Y-MP. Autotasking was implemented in all major portions of the code, except the matrix equations solver. Compiler directives alone were not able to properly multitask the code; user-inserted directives were required to achieve better performance. It was noted that the connect time, rather than wall-clock time, was more appropriate to determine speedup in multiuser environments. For a typical example problem, a speedup of 2.1 (1.8 when the solution time was included) was achieved in a dedicated environment and 1.7 (1.6 with solution time) in a multiuser environment on a four-processor CRAY 2 supercomputer. The speedup on a three-processor CRAY Y-MP was about 2.4 (2.0 with solution time) in a multiuser environment.

  19. CRAY mini manual. Revision D

    NASA Technical Reports Server (NTRS)

    Tennille, Geoffrey M.; Howser, Lona M.

    1993-01-01

    This document briefly describes the use of the CRAY supercomputers that are an integral part of the Supercomputing Network Subsystem of the Central Scientific Computing Complex at LaRC. Features of the CRAY supercomputers are covered, including: FORTRAN, C, PASCAL, architectures of the CRAY-2 and CRAY Y-MP, the CRAY UNICOS environment, batch job submittal, debugging, performance analysis, parallel processing, utilities unique to CRAY, and documentation. The document is intended for all CRAY users as a ready reference to frequently asked questions and to more detailed information contained in the vendor manuals. It is appropriate for both the novice and the experienced user.

  20. NAS technical summaries: Numerical aerodynamic simulation program, March 1991 - February 1992

    NASA Technical Reports Server (NTRS)

    1992-01-01

    NASA created the Numerical Aerodynamic Simulation (NAS) Program in 1987 to focus resources on solving critical problems in aeroscience and related disciplines by utilizing the power of the most advanced supercomputers available. The NAS Program provides scientists with the necessary computing power to solve today's most demanding computational fluid dynamics problems and serves as a pathfinder in integrating leading-edge supercomputing technologies, thus benefiting other supercomputer centers in Government and industry. This report contains selected scientific results from the 1991-92 NAS Operational Year, March 4, 1991 to March 3, 1992, which is the fifth year of operation. During this year, the scientific community was given access to a Cray-2 and a Cray Y-MP. The Cray-2, the first generation supercomputer, has four processors, 256 megawords of central memory, and a total sustained speed of 250 million floating point operations per second. The Cray Y-MP, the second generation supercomputer, has eight processors and a total sustained speed of one billion floating point operations per second. Additional memory was installed this year, doubling capacity from 128 to 256 megawords of solid-state storage-device memory. Because of its higher performance, the Cray Y-MP delivered approximately 77 percent of the total number of supercomputer hours used during this year.

  1. Theoretical research program to study chemical reactions in AOTV bow shock tubes

    NASA Technical Reports Server (NTRS)

    Taylor, P.

    1986-01-01

    Progress in the development of computational methods for the characterization of chemical reactions in aerobraking orbit transfer vehicle (AOTV) propulsive flows is reported. Two main areas of code development were undertaken: (1) the implementation of CASSCF (complete active space self-consistent field) and SCF (self-consistent field) analytical first derivatives on the CRAY X-MP; and (2) the installation of the complete set of electronic structure codes on the CRAY 2. In the area of application calculations the main effort was devoted to performing full configuration-interaction calculations and using these results to benchmark other methods. Preprints describing some of the systems studied are included.

  2. GPU acceleration of a petascale application for turbulent mixing at high Schmidt number using OpenMP 4.5

    NASA Astrophysics Data System (ADS)

    Clay, M. P.; Buaria, D.; Yeung, P. K.; Gotoh, T.

    2018-07-01

    This paper reports on the successful implementation of a massively parallel GPU-accelerated algorithm for the direct numerical simulation of turbulent mixing at high Schmidt number. The work stems from a recent development (Comput. Phys. Commun., vol. 219, 2017, 313-328), in which a low-communication algorithm was shown to attain high degrees of scalability on the Cray XE6 architecture when overlapping communication and computation via dedicated communication threads. An even higher level of performance has now been achieved using OpenMP 4.5 on the Cray XK7 architecture, where on each node the 16 integer cores of an AMD Interlagos processor share a single Nvidia K20X GPU accelerator. In the new algorithm, data movements are minimized by performing virtually all of the intensive scalar field computations in the form of combined compact finite difference (CCD) operations on the GPUs. A memory layout in departure from usual practices is found to provide much better performance for a specific kernel required to apply the CCD scheme. Asynchronous execution enabled by adding the OpenMP 4.5 NOWAIT clause to TARGET constructs improves scalability when used to overlap computation on the GPUs with computation and communication on the CPUs. On the 27-petaflops supercomputer Titan at Oak Ridge National Laboratory, USA, a GPU-to-CPU speedup factor of approximately 5 is consistently observed at the largest problem size of 8192^3 grid points for the scalar field computed with 8192 XK7 nodes.

  3. Hot Chips and Hot Interconnects for High End Computing Systems

    NASA Technical Reports Server (NTRS)

    Saini, Subhash

    2005-01-01

    I will discuss several processors: 1. The Cray proprietary processor used in the Cray X1; 2. The IBM Power 3 and Power 4 used in IBM SP 3 and IBM SP 4 systems; 3. The Intel Itanium and Xeon, used in the SGI Altix systems and clusters respectively; 4. The IBM System-on-a-Chip used in the IBM BlueGene/L; 5. The HP Alpha EV68 processor used in the DOE ASCI Q cluster; 6. The SPARC64 V processor, which is used in the Fujitsu PRIMEPOWER HPC2500; 7. An NEC proprietary processor, which is used in the NEC SX-6/7; 8. The Power 4+ processor, which is used in the Hitachi SR11000; 9. An NEC proprietary processor, which is used in the Earth Simulator. The IBM POWER5 and Red Storm computing systems will also be discussed. The architectures of these processors will first be presented, followed by interconnection networks and a description of high-end computer systems based on these processors and networks. The performance of various hardware/programming model combinations will then be compared, based on the latest NAS Parallel Benchmark results (MPI, OpenMP/HPF, and hybrid MPI + OpenMP). The tutorial will conclude with a discussion of general trends in the field of high performance computing (quantum computing, DNA computing, cellular engineering, and neural networks).

  4. SNS programming environment user's guide

    NASA Technical Reports Server (NTRS)

    Tennille, Geoffrey M.; Howser, Lona M.; Humes, D. Creig; Cronin, Catherine K.; Bowen, John T.; Drozdowski, Joseph M.; Utley, Judith A.; Flynn, Theresa M.; Austin, Brenda A.

    1992-01-01

    The computing environment is briefly described for the Supercomputing Network Subsystem (SNS) of the Central Scientific Computing Complex of NASA Langley. The major SNS computers are a CRAY-2, a CRAY Y-MP, a CONVEX C-210, and a CONVEX C-220. The software is described that is common to all of these computers, including: the UNIX operating system, computer graphics, networking utilities, mass storage, and mathematical libraries. Also described is file management, validation, SNS configuration, documentation, and customer services.

  5. Supercomputers for engineering analysis

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Goudreau, G.L.; Benson, D.J.; Hallquist, J.O.

    1986-07-01

    The Cray-1 and Cray X-MP/48 experience in engineering computations at the Lawrence Livermore National Laboratory is surveyed. The fully vectorized explicit DYNA and implicit NIKE finite element codes are discussed with respect to solid and structural mechanics. The main efficiencies for production analyses are currently obtained by simple CFT compiler exploitation of pipeline architecture for inner do-loop optimization. Current development of outer-loop multitasking is also discussed. Applications emphasis will be on 3D examples spanning earth penetrator loads analysis, target lethality assessment, and crashworthiness. The use of a vectorized large deformation shell element in both DYNA and NIKE has substantially expanded 3D nonlinear capability.

  6. Multitasking the three-dimensional shock wave code CTH on the Cray X-MP/416

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    McGlaun, J.M.; Thompson, S.L.

    1988-01-01

    CTH is a software system under development at Sandia National Laboratories Albuquerque that models multidimensional, multi-material, large-deformation, strong shock wave physics. CTH was carefully designed to both vectorize and multitask on the Cray X-MP/416. All of the physics routines are vectorized except the thermodynamics and the interface tracer. All of the physics routines are multitasked except the boundary conditions. The Los Alamos National Laboratory multitasking library was used for the multitasking. The resulting code is easy to maintain, easy to understand, gives the same answers as the unitasked code, and achieves a measured speedup of approximately 3.5 on the four-CPU Cray. This document discusses the design, prototyping, development, and debugging of CTH. It also covers the architectural features of CTH that enhance multitasking, the granularity of the tasks, and the synchronization of tasks. The utility of system software such as simulators and interactive debuggers is also discussed.

  7. TOSCA calculations and measurements for the SLAC SLC damping ring dipole magnet

    NASA Astrophysics Data System (ADS)

    Early, R. A.; Cobb, J. K.

    1985-04-01

    The SLAC damping ring dipole magnet was originally designed with removable nose pieces at the ends. Recently, a set of magnetic measurements was taken of the vertical component of induction along the center of the magnet for four different pole-end configurations and several current settings. The three dimensional computer code TOSCA, which is currently installed on the National Magnetic Fusion Energy Computer Center's Cray X-MP, was used to compute field values for the four configurations at current settings near saturation. Comparisons were made for magnetic induction as well as effective magnetic lengths for the different configurations.

  8. A multithreaded and GPU-optimized compact finite difference algorithm for turbulent mixing at high Schmidt number using petascale computing

    NASA Astrophysics Data System (ADS)

    Clay, M. P.; Yeung, P. K.; Buaria, D.; Gotoh, T.

    2017-11-01

    Turbulent mixing at high Schmidt number is a multiscale problem which places demanding requirements on direct numerical simulations to resolve fluctuations down to the Batchelor scale. We use a dual-grid, dual-scheme and dual-communicator approach where velocity and scalar fields are computed by separate groups of parallel processes, the latter using a combined compact finite difference (CCD) scheme on a finer grid with a static 3-D domain decomposition free of the communication overhead of memory transposes. A high degree of scalability is achieved for an 8192^3 scalar field at Schmidt number 512 in turbulence with a modest inertial range, by overlapping communication with computation whenever possible. On the Cray XE6 partition of Blue Waters, use of a dedicated thread for communication combined with OpenMP locks and nested parallelism reduces CCD timings by 34% compared to an MPI baseline. The code has been further optimized for the 27-petaflops Cray XK7 machine Titan using GPUs as accelerators with the latest OpenMP 4.5 directives, giving 2.7X speedup compared to CPU-only execution at the largest problem size. Supported by NSF Grant ACI-1036170, the NCSA Blue Waters Project with subaward via UIUC, and a DOE INCITE allocation at ORNL.

  9. Ada Compiler Validation Summary Report: Certificate Number: 901112W1. 11116 Cray Research, Inc., Cray Ada Compiler, Release 2.0, Cray X-MP/EA (Host & Target)

    DTIC Science & Technology

    1990-11-12

    [Report excerpt; text garbled in extraction. Recoverable content: a compiler feature prevents significant unexpected and undesired size overhead introduced by the automatic inlining of a called subprogram; pragma PRESERVELAYOUT forces the compiler to maintain the Ada source order of a given record type, thereby preventing the compiler from reordering it; and assignments to a copied array in Ada do not affect the Fortran version of the array.]

  10. Development of a Navier-Stokes algorithm for parallel-processing supercomputers. Ph.D. Thesis - Colorado State Univ., Dec. 1988

    NASA Technical Reports Server (NTRS)

    Swisshelm, Julie M.

    1989-01-01

    An explicit flow solver, applicable to the hierarchy of model equations ranging from Euler to full Navier-Stokes, is combined with several techniques designed to reduce computational expense. The computational domain consists of local grid refinements embedded in a global coarse mesh, where the locations of these refinements are defined by the physics of the flow. Flow characteristics are also used to determine which set of model equations is appropriate for solution in each region, thereby reducing not only the number of grid points at which the solution must be obtained, but also the computational effort required to get that solution. Acceleration to steady-state is achieved by applying multigrid on each of the subgrids, regardless of the particular model equations being solved. Since each of these components is explicit, advantage can readily be taken of the vector- and parallel-processing capabilities of machines such as the Cray X-MP and Cray-2.

  11. Utilization of parallel processing in solving the inviscid form of the average-passage equation system for multistage turbomachinery

    NASA Technical Reports Server (NTRS)

    Mulac, Richard A.; Celestina, Mark L.; Adamczyk, John J.; Misegades, Kent P.; Dawson, Jef M.

    1987-01-01

    A procedure is outlined which utilizes parallel processing to solve the inviscid form of the average-passage equation system for multistage turbomachinery along with a description of its implementation in a FORTRAN computer code, MSTAGE. A scheme to reduce the central memory requirements of the program is also detailed. Both the multitasking and I/O routines referred to are specific to the Cray X-MP line of computers and its associated SSD (Solid-State Disk). Results are presented for a simulation of a two-stage rocket engine fuel pump turbine.

  12. A computational/experimental study of the flow around a body of revolution at angle of attack

    NASA Technical Reports Server (NTRS)

    Zilliac, Gregory G.

    1986-01-01

    The incompressible Navier-Stokes equations are numerically solved for steady flow around an ogive-cylinder (fineness ratio 4.5) at angle of attack. The three-dimensional vortical flow is investigated with emphasis on the tip and the near wake region. The implicit, finite-difference computation is performed on the CRAY X-MP computer using the method of pseudo-compressibility. Comparisons of computational results with results of a companion towing tank experiment are presented for two symmetric leeside flow cases at moderate angles of attack. The topology of the flow is discussed and conclusions are drawn concerning the growth and stability of the primary vortices.
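
    In its standard form, the method of pseudo-compressibility augments the incompressible continuity equation with a pseudo-time pressure derivative, so the coupled system can be marched to a steady state in which the divergence-free constraint is recovered:

    $$\frac{\partial p}{\partial \tau} + \beta\,\nabla\!\cdot\mathbf{u} = 0, \qquad \nabla\!\cdot\mathbf{u} \to 0 \ \text{as the pseudo-time } \tau \to \infty,$$

    where $\beta$ is the artificial compressibility parameter.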

  13. Transferring ecosystem simulation codes to supercomputers

    NASA Technical Reports Server (NTRS)

    Skiles, J. W.; Schulbach, C. H.

    1995-01-01

    Many ecosystem simulation computer codes have been developed in the last twenty-five years. This development took place initially on main-frame computers, then mini-computers, and more recently, on micro-computers and workstations. Supercomputing platforms (both parallel and distributed systems) have been largely unused, however, because of the perceived difficulty in accessing and using the machines. Also, significant differences in the system architectures of sequential, scalar computers and parallel and/or vector supercomputers must be considered. We have transferred a grassland simulation model (developed on a VAX) to a Cray Y-MP/C90. We describe porting the model to the Cray and the changes we made to exploit the parallelism in the application and improve code execution. The Cray executed the model 30 times faster than the VAX and 10 times faster than a Unix workstation. We achieved an additional speedup of 30 percent by using the compiler's vectorizing and 'in-lining' capabilities. The code runs at only about 5 percent of the Cray's peak speed because it ineffectively uses the vector and parallel processing capabilities of the Cray. We expect that by restructuring the code, it could execute an additional six to ten times faster.

  14. Solving large sparse eigenvalue problems on supercomputers

    NASA Technical Reports Server (NTRS)

    Philippe, Bernard; Saad, Youcef

    1988-01-01

    An important problem in scientific computing consists in finding a few eigenvalues and corresponding eigenvectors of a very large and sparse matrix. The most popular methods to solve these problems are based on projection techniques on appropriate subspaces. The main attraction of these methods is that they only require the use of the matrix in the form of matrix by vector multiplications. The implementations on supercomputers of two such methods for symmetric matrices, namely Lanczos' method and Davidson's method are compared. Since one of the most important operations in these two methods is the multiplication of vectors by the sparse matrix, methods of performing this operation efficiently are discussed. The advantages and the disadvantages of each method are compared and implementation aspects are discussed. Numerical experiments on a one processor CRAY 2 and CRAY X-MP are reported. Possible parallel implementations are also discussed.
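
    A minimal sketch of the Lanczos process (without the reorthogonalization a production eigensolver needs) shows why the sparse matrix-vector product dominates; it is the only operation that touches the full matrix:

    ```python
    import numpy as np
    import scipy.sparse as sp

    def lanczos(A, m):
        """m steps of the Lanczos process: builds a tridiagonal T whose extreme
        eigenvalues approximate those of the large sparse symmetric A."""
        rng = np.random.default_rng(0)
        n = A.shape[0]
        alpha, beta = np.zeros(m), np.zeros(m - 1)
        q_prev, b = np.zeros(n), 0.0
        q = rng.standard_normal(n)
        q /= np.linalg.norm(q)
        for j in range(m):
            w = A @ q                    # the sparse mat-vec the paper optimizes
            alpha[j] = q @ w
            w -= alpha[j] * q + b * q_prev
            if j < m - 1:
                b = np.linalg.norm(w)
                beta[j] = b
                q_prev, q = q, w / b
        return alpha, beta

    A = sp.random(10_000, 10_000, density=1e-3, format='csr', random_state=1)
    A = (A + A.T) * 0.5                  # symmetrize the test matrix
    a, b = lanczos(A, 80)
    T = np.diag(a) + np.diag(b, 1) + np.diag(b, -1)
    print(np.linalg.eigvalsh(T)[-1])     # estimate of A's largest eigenvalue
    ```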

  15. Multitasking domain decomposition fast Poisson solvers on the Cray Y-MP

    NASA Technical Reports Server (NTRS)

    Chan, Tony F.; Fatoohi, Rod A.

    1990-01-01

    The results of multitasking implementation of a domain decomposition fast Poisson solver on eight processors of the Cray Y-MP are presented. The object of this research is to study the performance of domain decomposition methods on a Cray supercomputer and to analyze the performance of different multitasking techniques using highly parallel algorithms. Two implementations of multitasking are considered: macrotasking (parallelism at the subroutine level) and microtasking (parallelism at the do-loop level). A conventional FFT-based fast Poisson solver is also multitasked. The results of different implementations are compared and analyzed. A speedup of over 7.4 on the Cray Y-MP running in a dedicated environment is achieved for all cases.
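
    A conventional FFT-based fast Poisson solver of the kind multitasked here diagonalizes the Laplacian mode by mode. A small periodic 2-D NumPy sketch (grid and right-hand side are illustrative):

    ```python
    import numpy as np

    def poisson_fft(f, L=2 * np.pi):
        """Solve -u_xx - u_yy = f on a periodic L x L grid:
        in Fourier space each mode decouples, u_hat = f_hat / (kx^2 + ky^2)."""
        n = f.shape[0]
        k = 2 * np.pi * np.fft.fftfreq(n, d=L / n)
        kx, ky = np.meshgrid(k, k, indexing='ij')
        k2 = kx**2 + ky**2
        k2[0, 0] = 1.0                    # avoid 0/0 for the mean mode
        u_hat = np.fft.fft2(f) / k2
        u_hat[0, 0] = 0.0                 # fix the arbitrary additive constant
        return np.real(np.fft.ifft2(u_hat))

    # Verify against u = sin(x)cos(2y), for which -Laplacian(u) = 5u
    n = 64
    x = np.linspace(0, 2 * np.pi, n, endpoint=False)
    X, Y = np.meshgrid(x, x, indexing='ij')
    u_exact = np.sin(X) * np.cos(2 * Y)
    assert np.allclose(poisson_fft(5 * u_exact), u_exact, atol=1e-10)
    ```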

  16. A vectorized Lanczos eigensolver for high-performance computers

    NASA Technical Reports Server (NTRS)

    Bostic, Susan W.

    1990-01-01

    The computational strategies used to implement a Lanczos-based-method eigensolver on the latest generation of supercomputers are described. Several examples of structural vibration and buckling problems are presented that show the effects of using optimization techniques to increase the vectorization of the computational steps. The data storage and access schemes and the tools and strategies that best exploit the computer resources are presented. The method is implemented on the Convex C220, the Cray 2, and the Cray Y-MP computers. Results show that very good computation rates are achieved for the most computationally intensive steps of the Lanczos algorithm and that the Lanczos algorithm is many times faster than other methods extensively used in the past.

  17. A parallel algorithm for generation and assembly of finite element stiffness and mass matrices

    NASA Technical Reports Server (NTRS)

    Storaasli, O. O.; Carmona, E. A.; Nguyen, D. T.; Baddourah, M. A.

    1991-01-01

    A new algorithm is proposed for parallel generation and assembly of the finite element stiffness and mass matrices. The proposed assembly algorithm is based on a node-by-node approach rather than the more conventional element-by-element approach. The new algorithm's generality and computation speed-up when using multiple processors are demonstrated for several practical applications on multi-processor Cray Y-MP and Cray 2 supercomputers.
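
    A 1-D bar illustrates the contrast (details hypothetical): in element-by-element assembly, neighboring elements add into the same global row, so a parallel element loop needs synchronization, whereas a node-by-node pass forms each row independently:

    ```python
    import numpy as np

    n_elem = 8
    n_node = n_elem + 1
    k_e = np.array([[1.0, -1.0], [-1.0, 1.0]])   # 2-node bar element stiffness

    # Element-by-element assembly: elements sharing a node both add into the
    # same row, so parallel element loops would need synchronization.
    K1 = np.zeros((n_node, n_node))
    for e in range(n_elem):
        dofs = [e, e + 1]
        K1[np.ix_(dofs, dofs)] += k_e

    # Node-by-node assembly: each pass writes one row, visiting only the
    # elements attached to that node, so rows can be formed independently
    # (one node per processor, no write conflicts).
    K2 = np.zeros((n_node, n_node))
    for node in range(n_node):
        for e in (node - 1, node):               # elements attached to this node
            if 0 <= e < n_elem:
                local = node - e                 # node's local index (0 or 1) in e
                K2[node, e:e + 2] += k_e[local]
    assert np.allclose(K1, K2)
    ```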

  18. The utilization of parallel processing in solving the inviscid form of the average-passage equation system for multistage turbomachinery

    NASA Technical Reports Server (NTRS)

    Mulac, Richard A.; Celestina, Mark L.; Adamczyk, John J.; Misegades, Kent P.; Dawson, Jef M.

    1987-01-01

    A procedure is outlined which utilizes parallel processing to solve the inviscid form of the average-passage equation system for multistage turbomachinery along with a description of its implementation in a FORTRAN computer code, MSTAGE. A scheme to reduce the central memory requirements of the program is also detailed. Both the multitasking and I/O routines referred to in this paper are specific to the Cray X-MP line of computers and its associated SSD (Solid-state Storage Device). Results are presented for a simulation of a two-stage rocket engine fuel pump turbine.

  19. Time-partitioning simulation models for calculation on parallel computers

    NASA Technical Reports Server (NTRS)

    Milner, Edward J.; Blech, Richard A.; Chima, Rodrick V.

    1987-01-01

    A technique allowing time-staggered solution of partial differential equations is presented in this report. Using this technique, called time-partitioning, simulation execution speedup is proportional to the number of processors used because all processors operate simultaneously, with each updating the solution grid at a different time point. The technique is limited neither by the number of processors available nor by the dimension of the solution grid. Time-partitioning was used to obtain the flow pattern through a cascade of airfoils, modeled by the Euler partial differential equations. An execution speedup factor of 1.77 was achieved using a two-processor Cray X-MP/24 computer.

  20. Vectorized and multitasked solution of the few-group neutron diffusion equations

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Zee, S.K.; Turinsky, P.J.; Shayer, Z.

    1989-03-01

    A numerical algorithm with parallelism was used to solve the two-group, multidimensional neutron diffusion equations on computers characterized by shared memory, vector pipeline, and multi-CPU architecture features. Specifically, solutions were obtained on the Cray X-MP/48, the IBM-3090 with vector facilities, and the FPS-164. The material-centered mesh finite difference method approximation and outer-inner iteration method were employed. Parallelism was introduced in the inner iterations using the cyclic line successive overrelaxation iterative method and solving in parallel across lines. The outer iterations were completed using the Chebyshev semi-iterative method that allows parallelism to be introduced in both space and energy groups. For the three-dimensional model, power, soluble boron, and transient fission product feedbacks were included. Concentrating on the pressurized water reactor (PWR), the thermal-hydraulic calculation of moderator density assumed single-phase flow and a closed flow channel, allowing parallelism to be introduced in the solution across the radial plane. Using a pinwise detail, quarter-core model of a typical PWR in cycle 1, for the two-dimensional model without feedback the measured million floating point operations per second (MFLOPS)/vector speedups were 83/11.7, 18/2.2, and 2.4/5.6 on the Cray, IBM, and FPS without multitasking, respectively. Lower performance was observed with a coarser mesh, i.e., shorter vector length, due to vector pipeline start-up. For an 18 x 18 x 30 (x-y-z) three-dimensional model with feedback of the same core, MFLOPS/vector speedups of approximately 61/6.7 and an execution time of 0.8 CPU seconds on the Cray without multitasking were measured. Finally, using two CPUs and the vector pipelines of the Cray, a multitasking efficiency of 81% was noted for the three-dimensional model.

  1. Discrete sensitivity derivatives of the Navier-Stokes equations with a parallel Krylov solver

    NASA Technical Reports Server (NTRS)

    Ajmani, Kumud; Taylor, Arthur C., III

    1994-01-01

    This paper solves an 'incremental' form of the sensitivity equations derived by differentiating the discretized thin-layer Navier-Stokes equations with respect to certain design variables of interest. The equations are solved with a parallel, preconditioned Generalized Minimal RESidual (GMRES) solver on a distributed-memory architecture. The 'serial' sensitivity analysis code is parallelized by using the Single Program Multiple Data (SPMD) programming model, domain decomposition techniques, and message-passing tools. Sensitivity derivatives are computed for low and high Reynolds number flows over a NACA 1406 airfoil on a 32-processor Intel Hypercube, and found to be identical to those computed on a single-processor Cray Y-MP. It is estimated that the parallel sensitivity analysis code has to be run on 40-50 processors of the Intel Hypercube in order to match the single-processor processing time of a Cray Y-MP.
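
    The solver pattern, restarted GMRES with a preconditioner, can be sketched with SciPy on a stand-in sparse system (the matrix, ILU preconditioner, and sizes below are illustrative, not the paper's Navier-Stokes Jacobian):

    ```python
    import numpy as np
    import scipy.sparse as sp
    from scipy.sparse.linalg import LinearOperator, gmres, spilu

    # Illustrative nonsymmetric, diagonally dominant sparse system.
    n = 2000
    A = sp.diags([-1.0, 2.5, -1.2], [-1, 0, 1], shape=(n, n), format='csc')
    b = np.ones(n)

    ilu = spilu(A)                                # incomplete LU factors of A
    M = LinearOperator((n, n), matvec=ilu.solve)  # preconditioner M ~ A^-1
    x, info = gmres(A, b, M=M, restart=30)        # restarted, preconditioned GMRES
    assert info == 0
    assert np.linalg.norm(b - A @ x) < 1e-3 * np.linalg.norm(b)
    ```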

  2. A comparison of five benchmarks

    NASA Technical Reports Server (NTRS)

    Huss, Janice E.; Pennline, James A.

    1987-01-01

    Five benchmark programs were obtained and run on the NASA Lewis CRAY X-MP/24. A comparison was made between the programs' codes and between the methods for calculating performance figures. Several multitasking jobs were run to gain experience in how parallel performance is measured.

  3. Multitasking the three-dimensional transport code TORT on CRAY platforms

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Azmy, Y.Y.; Barnett, D.A.; Burre, C.A.

    1996-04-01

    The multitasking options in the three-dimensional neutral particle transport code TORT, originally implemented for Cray's CTSS operating system, are revived and extended to run on Cray Y/MP and C90 computers using the UNICOS operating system. These include two coarse-grained domain decompositions: across octants, and across directions within an octant, termed Octant Parallel (OP) and Direction Parallel (DP), respectively. Parallel performance of the DP is significantly enhanced by increasing the task grain size and reducing load imbalance via dynamic scheduling of the discrete angles among the participating tasks. Substantial wall clock speedup factors, approaching 4.5 using 8 tasks, have been measured in a time-sharing environment, and generally depend on the test problem specifications, number of tasks, and machine loading during execution.

  4. Fluid behavior in microgravity environment

    NASA Technical Reports Server (NTRS)

    Hung, R. J.; Lee, C. C.; Tsao, Y. D.

    1990-01-01

    The instability of a liquid and gas interface can be induced by the presence of longitudinal and lateral accelerations, vehicle vibration, and rotational fields of spacecraft in a microgravity environment. In a spacecraft design, the requirements of settled propellant are different for tank pressurization, engine restart, venting, or propellant transfer. In this paper, the dynamical behavior of liquid propellant, fluid reorientation, and propellant resettling have been studied through numerical simulations of fluid management in a microgravity environment, executed on a CRAY X-MP supercomputer. Characteristics of slosh waves excited by the restoring force field of gravity jitters have also been investigated.

  5. Numerical simulation of three dimensional transonic flows

    NASA Technical Reports Server (NTRS)

    Sahu, Jubaraj; Steger, Joseph L.

    1987-01-01

    The three-dimensional flow over a projectile has been computed using an implicit, approximately factored, partially flux-split algorithm. A simple composite grid scheme has been developed in which a single grid is partitioned into a series of smaller grids for applications which require an external large memory device such as the SSD of the CRAY X-MP/48, or multitasking. The accuracy and stability of the composite grid scheme has been tested by numerically simulating the flow over an ellipsoid at angle of attack and comparing the solution with a single grid solution. The flowfield over a projectile at M = 0.96 and 4 deg angle-of-attack has been computed using a fine grid, and compared with experiment.

  6. Extensions and improvements on XTRAN3S

    NASA Technical Reports Server (NTRS)

    Borland, C. J.

    1989-01-01

    Improvements to the XTRAN3S computer program are summarized. Work on this code, for steady and unsteady aerodynamic and aeroelastic analysis in the transonic flow regime has concentrated on the following areas: (1) Maintenance of the XTRAN3S code, including correction of errors, enhancement of operational capability, and installation on the Cray X-MP system; (2) Extension of the vectorization concepts in XTRAN3S to include additional areas of the code for improved execution speed; (3) Modification of the XTRAN3S algorithm for improved numerical stability for swept, tapered wing cases and improved computational efficiency; and (4) Extension of the wing-only version of XTRAN3S to include pylon and nacelle or external store capability.

  7. A Programming Model Performance Study Using the NAS Parallel Benchmarks

    DOE PAGES

    Shan, Hongzhang; Blagojević, Filip; Min, Seung-Jai; ...

    2010-01-01

    Harnessing the power of multicore platforms is challenging due to the additional levels of parallelism present. In this paper we use the NAS Parallel Benchmarks to study three programming models, MPI, OpenMP and PGAS, to understand their performance and memory usage characteristics on current multicore architectures. To understand these characteristics we use the Integrated Performance Monitoring tool and other ways to measure communication versus computation time, as well as the fraction of the run time spent in OpenMP. The benchmarks are run on two different Cray XT5 systems and an Infiniband cluster. Our results show that in general the three programming models exhibit very similar performance characteristics. In a few cases, OpenMP is significantly faster because it explicitly avoids communication. For these particular cases, we were able to re-write the UPC versions and achieve equal performance to OpenMP. Using OpenMP was also the most advantageous in terms of memory usage. Also we compare performance differences between the two Cray systems, which have quad-core and hex-core processors. We show that at scale the performance is almost always slower on the hex-core system because of increased contention for network resources.

  8. Parallel Calculation of Sensitivity Derivatives for Aircraft Design using Automatic Differentiation

    NASA Technical Reports Server (NTRS)

    Bischof, c. H.; Green, L. L.; Haigler, K. J.; Knauff, T. L., Jr.

    1994-01-01

    Sensitivity derivative (SD) calculation via automatic differentiation (AD) typical of that required for the aerodynamic design of a transport-type aircraft is considered. Two ways of computing SD via code generated by the ADIFOR automatic differentiation tool are compared for efficiency and applicability to problems involving large numbers of design variables. A vector implementation on a Cray Y-MP computer is compared with a coarse-grained parallel implementation on an IBM SP1 computer, employing a Fortran M wrapper. The SD are computed for a swept transport wing in turbulent, transonic flow; the number of geometric design variables varies from 1 to 60 with coupling between a wing grid generation program and a state-of-the-art, 3-D computational fluid dynamics program, both augmented for derivative computation via AD. For a small number of design variables, the Cray Y-MP implementation is much faster. As the number of design variables grows, however, the IBM SP1 becomes an attractive alternative in terms of compute speed, job turnaround time, and total memory available for solutions with large numbers of design variables. The coarse-grained parallel implementation also can be moved easily to a network of workstations.
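
    ADIFOR transforms Fortran source to compute derivatives alongside values; the forward mode it implements can be illustrated with a minimal dual-number class in Python (a toy analogue, not ADIFOR's mechanics):

    ```python
    import math

    class Dual:
        """Forward-mode AD value: carries f and df/dx through each operation."""
        def __init__(self, val, dot=0.0):
            self.val, self.dot = val, dot
        def __add__(self, o):
            o = o if isinstance(o, Dual) else Dual(o)
            return Dual(self.val + o.val, self.dot + o.dot)
        __radd__ = __add__
        def __mul__(self, o):
            o = o if isinstance(o, Dual) else Dual(o)
            return Dual(self.val * o.val, self.dot * o.val + self.val * o.dot)
        __rmul__ = __mul__

    def sin(x):
        # chain rule: d(sin u) = cos(u) * du
        return Dual(math.sin(x.val), math.cos(x.val) * x.dot)

    x = Dual(2.0, 1.0)              # seed dx/dx = 1
    y = x * sin(x) + 3 * x
    print(y.val, y.dot)             # derivative is sin(2) + 2*cos(2) + 3
    ```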

  9. Multitasking 3-D forward modeling using high-order finite difference methods on the Cray X-MP/416

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Terki-Hassaine, O.; Leiss, E.L.

    1988-01-01

    The CRAY X-MP/416 was used to multitask 3-D forward modeling by the high-order finite difference method. Flowtrace analysis reveals that the most expensive operation in the unitasked program is a matrix vector multiplication. The in-core and out-of-core versions of a reentrant subroutine can perform any fraction of the matrix vector multiplication independently, a pattern compatible with multitasking. The matrix vector multiplication routine can be distributed over two to four processors. The rest of the program utilizes the microtasking feature that lets the system treat independent iterations of DO-loops as subtasks to be performed by any available processor. The availability of the Solid-State Storage Device (SSD) meant the I/O wait time was virtually zero. A performance study determined a theoretical speedup, taking into account the multitasking overhead. Multitasking programs utilizing both macrotasking and microtasking features obtained actual speedups that were approximately 80% of the ideal speedup.

  10. Some Problems and Solutions in Transferring Ecosystem Simulation Codes to Supercomputers

    NASA Technical Reports Server (NTRS)

    Skiles, J. W.; Schulbach, C. H.

    1994-01-01

    Many computer codes for the simulation of ecological systems have been developed in the last twenty-five years. This development took place initially on main-frame computers, then mini-computers, and more recently, on micro-computers and workstations. Recent recognition of ecosystem science as a High Performance Computing and Communications Program Grand Challenge area emphasizes supercomputers (both parallel and distributed systems) as the next set of tools for ecological simulation. Transferring ecosystem simulation codes to such systems is not a matter of simply compiling and executing existing code on the supercomputer since there are significant differences in the system architectures of sequential, scalar computers and parallel and/or vector supercomputers. To more appropriately match the application to the architecture (necessary to achieve reasonable performance), the parallelism (if it exists) of the original application must be exploited. We discuss our work in transferring a general grassland simulation model (developed on a VAX in the FORTRAN computer programming language) to a Cray Y-MP. We show the Cray shared-memory vector-architecture, and discuss our rationale for selecting the Cray. We describe porting the model to the Cray and executing and verifying a baseline version, and we discuss the changes we made to exploit the parallelism in the application and to improve code execution. As a result, the Cray executed the model 30 times faster than the VAX 11/785 and 10 times faster than a Sun 4 workstation. We achieved an additional speed-up of approximately 30 percent over the original Cray run by using the compiler's vectorizing capabilities and the machine's ability to put subroutines and functions "in-line" in the code. With the modifications, the code still runs at only about 5% of the Cray's peak speed because it makes ineffective use of the vector processing capabilities of the Cray. We conclude with a discussion and future plans.

  11. Scalability of Parallel Spatial Direct Numerical Simulations on Intel Hypercube and IBM SP1 and SP2

    NASA Technical Reports Server (NTRS)

    Joslin, Ronald D.; Hanebutte, Ulf R.; Zubair, Mohammad

    1995-01-01

    The implementation and performance of a parallel spatial direct numerical simulation (PSDNS) approach on the Intel iPSC/860 hypercube and IBM SP1 and SP2 parallel computers is documented. Spatially evolving disturbances associated with the laminar-to-turbulent transition in boundary-layer flows are computed with the PSDNS code. The feasibility of using the PSDNS to perform transition studies on these computers is examined. The results indicate that the PSDNS approach can effectively be parallelized on a distributed-memory parallel machine by remapping the distributed data structure during the course of the calculation. Scalability information is provided to estimate computational costs to match the actual costs relative to changes in the number of grid points. By increasing the number of processors, slower than linear speedups are achieved with optimized (machine-dependent library) routines. This slower than linear speedup results because the computational cost is dominated by the FFT routine, which yields less than ideal speedups. By using appropriate compile options and optimized library routines on the SP1, the serial code achieves 52-56 Mflops on a single node of the SP1 (45 percent of theoretical peak performance). The actual performance of the PSDNS code on the SP1 is evaluated with a "real world" simulation that consists of 1.7 million grid points. One time step of this simulation is calculated on eight nodes of the SP1 in the same time as required by a Cray Y/MP supercomputer. For the same simulation, 32 nodes of the SP1 and SP2 are required to reach the performance of a Cray C-90. A 32-node SP1 (SP2) configuration is 2.9 (4.6) times faster than a Cray Y/MP for this simulation, while the hypercube is roughly 2 times slower than the Y/MP for this application. KEY WORDS: Spatial direct numerical simulations; incompressible viscous flows; spectral methods; finite differences; parallel computing.

  12. Efficient multitasking of Choleski matrix factorization on CRAY supercomputers

    NASA Technical Reports Server (NTRS)

    Overman, Andrea L.; Poole, Eugene L.

    1991-01-01

    A Choleski method is described and used to solve linear systems of equations that arise in large scale structural analysis. The method uses a novel variable-band storage scheme and is structured to exploit fast local memory caches while minimizing data access delays between main memory and vector registers. Several parallel implementations of this method are described for the CRAY-2 and CRAY Y-MP computers demonstrating the use of microtasking and autotasking directives. A portable parallel language, FORCE, is used for comparison with the microtasked and autotasked implementations. Results are presented comparing the matrix factorization times for three representative structural analysis problems from runs made in both dedicated and multi-user modes on both computers. CPU and wall clock timings are given for the parallel implementations and are compared to single processor timings of the same algorithm.

  13. Implementation of a parallel unstructured Euler solver on shared and distributed memory architectures

    NASA Technical Reports Server (NTRS)

    Mavriplis, D. J.; Das, Raja; Saltz, Joel; Vermeland, R. E.

    1992-01-01

    An efficient three dimensional unstructured Euler solver is parallelized on a Cray Y-MP C90 shared memory computer and on an Intel Touchstone Delta distributed memory computer. This paper relates the experiences gained and describes the software tools and hardware used in this study. Performance comparisons between two differing architectures are made.

  14. Application of high-performance computing to numerical simulation of human movement

    NASA Technical Reports Server (NTRS)

    Anderson, F. C.; Ziegler, J. M.; Pandy, M. G.; Whalen, R. T.

    1995-01-01

    We have examined the feasibility of using massively-parallel and vector-processing supercomputers to solve large-scale optimization problems for human movement. Specifically, we compared the computational expense of determining the optimal controls for the single support phase of gait using a conventional serial machine (SGI Iris 4D25), a MIMD parallel machine (Intel iPSC/860), and a parallel-vector-processing machine (Cray Y-MP 8/864). With the human body modeled as a 14 degree-of-freedom linkage actuated by 46 musculotendinous units, computation of the optimal controls for gait could take up to 3 months of CPU time on the Iris. Both the Cray and the Intel are able to reduce this time to practical levels. The optimal solution for gait can be found with about 77 hours of CPU on the Cray and with about 88 hours of CPU on the Intel. Although the overall speeds of the Cray and the Intel were found to be similar, the unique capabilities of each machine are better suited to different portions of the computational algorithm used. The Intel was best suited to computing the derivatives of the performance criterion and the constraints whereas the Cray was best suited to parameter optimization of the controls. These results suggest that the ideal computer architecture for solving very large-scale optimal control problems is a hybrid system in which a vector-processing machine is integrated into the communication network of a MIMD parallel machine.

  15. NAS (Numerical Aerodynamic Simulation Program) technical summaries, March 1989 - February 1990

    NASA Technical Reports Server (NTRS)

    1990-01-01

    Given here are selected scientific results from the Numerical Aerodynamic Simulation (NAS) Program's third year of operation. During this year, the scientific community was given access to a Cray-2 and a Cray Y-MP supercomputer. Topics covered include flow field analysis of fighter wing configurations, large-scale ocean modeling, the Space Shuttle flow field, advanced computational fluid dynamics (CFD) codes for rotary-wing airloads and performance prediction, turbulence modeling of separated flows, airloads and acoustics of rotorcraft, vortex-induced nonlinearities on submarines, and standing oblique detonation waves.

  16. PAN AIR: A computer program for predicting subsonic or supersonic linear potential flows about arbitrary configurations using a higher order panel method. Volume 4: Maintenance document (version 3.0)

    NASA Technical Reports Server (NTRS)

    Purdon, David J.; Baruah, Pranab K.; Bussoletti, John E.; Epton, Michael A.; Massena, William A.; Nelson, Franklin D.; Tsurusaki, Kiyoharu

    1990-01-01

    The Maintenance Document Version 3.0 is a guide to the PAN AIR software system, a system which computes the subsonic or supersonic linear potential flow about a body of nearly arbitrary shape, using a higher order panel method. The document describes the overall system and each program module of the system. Sufficient detail is given for program maintenance, updating, and modification. It is assumed that the reader is familiar with programming and CRAY computer systems. The PAN AIR system was written in the FORTRAN IV language except for a few CAL language subroutines which exist in the PAN AIR library. Structured programming techniques were used to provide code documentation and maintainability. The operating systems accommodated are COS 1.11, COS 1.12, COS 1.13, and COS 1.14 on the CRAY 1S, 1M, and X-MP computing systems. The system is comprised of a data base management system, a program library, an execution control module, and nine separate FORTRAN technical modules. Each module calculates part of the posed PAN AIR problem. The data base manager is used to communicate between modules and within modules. The technical modules must be run in a prescribed fashion for each PAN AIR problem. In order to ease the problem of supplying the many JCL cards required to execute the modules, a set of CRAY procedures (PAPROCS) was created to automatically supply most of the JCL cards. Most of this document has not changed for Version 3.0. It now, however, strictly applies only to PAN AIR Version 3.0. The major changes are: (1) additional sections covering the new FDP module (which calculates streamlines and offbody points); (2) a complete rewrite of the section on the MAG module; and (3) strict applicability to CRAY computing systems.

  17. Computation of transonic flow about helicopter rotor blades

    NASA Technical Reports Server (NTRS)

    Arieli, R.; Tauber, M. E.; Saunders, D. A.; Caughey, D. A.

    1986-01-01

    An inviscid, nonconservative, three-dimensional full-potential flow code, ROT22, has been developed for computing the quasi-steady flow about a lifting rotor blade. The code is valid throughout the subsonic and transonic regime. Calculations from the code are compared with detailed laser velocimeter measurements made in the tip region of a nonlifting rotor at a tip Mach number of 0.95 and zero advance ratio. In addition, comparisons are made with chordwise surface pressure measurements obtained in a wind tunnel for a nonlifting rotor blade at transonic tip speeds at advance ratios from 0.40 to 0.50. The overall agreement between theoretical calculations and experiment is very good. A typical run on a CRAY X-MP computer requires about 30 CPU seconds for one rotor position at transonic tip speed.

  18. Performance Analysis of the NAS Y-MP Workload

    NASA Technical Reports Server (NTRS)

    Bergeron, Robert J.; Kutler, Paul (Technical Monitor)

    1997-01-01

    This paper describes the performance characteristics of the computational workloads on the NAS Cray Y-MP machines, a Y-MP 832 and later a Y-MP 8128. Hardware measurements indicated that the Y-MP workload performance matured over time, ultimately sustaining an average throughput of 0.8 GFLOPS and a vector operation fraction of 87%. The measurements also revealed an operation rate exceeding 1 per clock period, a well-balanced architecture featuring strong utilization of the vector functional units, and an efficient memory organization. Introduction of the larger-memory 8128 increased throughput by allowing a more efficient utilization of CPUs. Throughput also depended on the metering of the batch queues; low-idle Saturday workloads required a buffer of small jobs to prevent memory starvation of the CPU. UNICOS required about 7% of total CPU time to service the 832 workloads; this overhead decreased to 5% for the 8128 workloads. While most of the system time went to servicing I/O requests, efficient scheduling prevented excessive idle time due to I/O waits. System measurements disclosed no obvious bottlenecks in the response of the machine and UNICOS to the workloads. In most cases, Cray-provided software tools were quite sufficient for measuring the performance of both the machine and the operating system.

  19. Performance of the fusion code GYRO on four generations of Cray computers

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Fahey, Mark R

    2014-01-01

    GYRO is a code used for the direct numerical simulation of plasma microturbulence. It has been ported to a variety of modern MPP platforms including several modern commodity clusters, IBM SPs, and Cray XC, XT, and XE series machines. We briefly describe the mathematical structure of the equations, the data layout, and the redistribution scheme. Also, while the performance and scaling of GYRO on many of these systems has been shown before, here we show the comparative performance and scaling on four generations of Cray supercomputers including the newest addition, the Cray XC30. The more recently added hybrid OpenMP/MPI implementation also shows a great deal of promise on custom HPC systems that utilize fast CPUs and proprietary interconnects. Four machines of varying sizes were used in the experiment, all of which are located at the National Institute for Computational Sciences at the University of Tennessee at Knoxville and Oak Ridge National Laboratory. The advantages, limitations, and performance of using each system are discussed.

  20. Applications of CFD and visualization techniques

    NASA Technical Reports Server (NTRS)

    Saunders, James H.; Brown, Susan T.; Crisafulli, Jeffrey J.; Southern, Leslie A.

    1992-01-01

    In this paper, three applications are presented to illustrate current techniques for flow calculation and visualization. The first two applications use a commercial computational fluid dynamics (CFD) code, FLUENT, run on a Cray Y-MP. The results are animated with the aid of data visualization software, apE. The third application simulates a particulate deposition pattern using techniques inspired by developments in nonlinear dynamical systems. These computations were performed on personal computers.

  1. Massively parallel and linear-scaling algorithm for second-order Møller-Plesset perturbation theory applied to the study of supramolecular wires

    NASA Astrophysics Data System (ADS)

    Kjærgaard, Thomas; Baudin, Pablo; Bykov, Dmytro; Eriksen, Janus Juul; Ettenhuber, Patrick; Kristensen, Kasper; Larkin, Jeff; Liakh, Dmitry; Pawłowski, Filip; Vose, Aaron; Wang, Yang Min; Jørgensen, Poul

    2017-03-01

    We present a scalable cross-platform hybrid MPI/OpenMP/OpenACC implementation of the Divide-Expand-Consolidate (DEC) formalism with portable performance on heterogeneous HPC architectures. The Divide-Expand-Consolidate formalism is designed to reduce the steep computational scaling of conventional many-body methods employed in electronic structure theory to linear scaling, while providing a simple mechanism for controlling the error introduced by this approximation. Our massively parallel implementation of this general scheme has three levels of parallelism, being a hybrid of the loosely coupled task-based parallelization approach and the conventional MPI+X programming model, where X is either OpenMP or OpenACC. We demonstrate strong and weak scalability of this implementation on heterogeneous HPC systems, namely on the GPU-based Cray XK7 Titan supercomputer at the Oak Ridge National Laboratory. Using the "resolution of the identity second-order Møller-Plesset perturbation theory" (RI-MP2) as the physical model for simulating correlated electron motion, the linear-scaling DEC implementation is applied to 1-aza-adamantane-trione (AAT) supramolecular wires containing up to 40 monomers (2440 atoms, 6800 correlated electrons, 24 440 basis functions and 91 280 auxiliary functions). This represents the largest molecular system treated at the MP2 level of theory, demonstrating an efficient removal of the scaling wall pertinent to conventional quantum many-body methods.
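
    The MPI+X structure mentioned in the abstract can be made concrete with a toy example (the work loop is a placeholder; this is not code from the DEC implementation):

```c
/* Hybrid MPI+OpenMP skeleton: ranks across nodes, threads within. */
#include <mpi.h>
#include <omp.h>
#include <stdio.h>

int main(int argc, char **argv) {
    int provided, rank;
    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    double local = 0.0;
    #pragma omp parallel for reduction(+ : local)  /* thread level */
    for (int i = 0; i < 1000000; i++)
        local += 1.0 / (1.0 + i);                  /* stand-in for work */

    double global = 0.0;                           /* rank level */
    MPI_Reduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
    if (rank == 0) printf("sum = %f\n", global);
    MPI_Finalize();
    return 0;
}
```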

  2. Solving large-scale dynamic systems using band Lanczos method in Rockwell NASTRAN on CRAY X-MP

    NASA Technical Reports Server (NTRS)

    Gupta, V. K.; Zillmer, S. D.; Allison, R. E.

    1986-01-01

    Improved cost effectiveness, through better models, more accurate and faster algorithms, and large-scale computing, permits more representative dynamic analyses. The band Lanczos eigensolution method was implemented in Rockwell's version of the 1984 COSMIC-released NASTRAN finite element structural analysis computer program to effectively solve for structural vibration modes, including those of large complex systems exceeding 10,000 degrees of freedom. The Lanczos vectors were re-orthogonalized locally using the Lanczos method and globally using the modified Gram-Schmidt method for sweeping out rigid-body modes and previously generated modes and Lanczos vectors. The truncated band matrix was solved for vibration frequencies and mode shapes using Givens rotations. Numerical examples are included to demonstrate the cost effectiveness and accuracy of the method as implemented in Rockwell NASTRAN. The CRAY version is based on RPK's COSMIC/NASTRAN. The band Lanczos method was more reliable and accurate, and converged faster, than the single-vector Lanczos method. The band Lanczos method was comparable to the subspace iteration method, which is a block version of the inverse power method. However, the subspace matrix tended to be fully populated in the case of subspace iteration, not as sparse as a band matrix.
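
    The global re-orthogonalization step referred to above uses modified Gram-Schmidt, which in outline looks like the following sketch (the storage layout and names are illustrative, not NASTRAN's):

```c
/* Modified Gram-Schmidt: orthogonalize v against the kept unit
 * vectors q[0..m-1] one at a time, subtracting each projection
 * immediately, then normalize v. */
#include <math.h>
#include <stddef.h>

static double dot(const double *x, const double *y, size_t n) {
    double s = 0.0;
    for (size_t i = 0; i < n; i++) s += x[i] * y[i];
    return s;
}

void mgs_reorthogonalize(double *v, double *const *q, size_t m, size_t n) {
    for (size_t k = 0; k < m; k++) {
        double h = dot(q[k], v, n);      /* projection onto q[k] */
        for (size_t i = 0; i < n; i++)
            v[i] -= h * q[k][i];         /* subtract it immediately */
    }
    double nrm = sqrt(dot(v, v, n));
    if (nrm > 0.0)
        for (size_t i = 0; i < n; i++) v[i] /= nrm;
}
```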

  3. Multitasking for flows about multiple body configurations using the chimera grid scheme

    NASA Technical Reports Server (NTRS)

    Dougherty, F. C.; Morgan, R. L.

    1987-01-01

    The multitasking of a finite-difference scheme using multiple overset meshes is described. In this chimera, or multiple overset mesh, approach, a multiple body configuration is mapped using a major grid about the main component of the configuration, with minor overset meshes used to map each additional component. This type of code is well suited to multitasking. Both steady and unsteady two-dimensional computations are run on parallel processors on a CRAY X-MP/48, usually with one mesh per processor. Flow field results are compared with single processor results to demonstrate the feasibility of running multiple mesh codes on parallel processors and to show the increase in efficiency.

  4. A static data flow simulation study at Ames Research Center

    NASA Technical Reports Server (NTRS)

    Barszcz, Eric; Howard, Lauri S.

    1987-01-01

    Demands in computational power, particularly in the area of computational fluid dynamics (CFD), led NASA Ames Research Center to study advanced computer architectures. One architecture being studied is the static data flow architecture based on research done by Jack B. Dennis at MIT. To improve understanding of this architecture, a static data flow simulator, written in Pascal, has been implemented for use on a Cray X-MP/48. A matrix multiply and a two-dimensional fast Fourier transform (FFT), two algorithms used in CFD work at Ames, have been run on the simulator. Execution times can vary by a factor of more than 2 depending on the partitioning method used to assign instructions to processing elements. Service time for matching tokens has proved to be a major bottleneck. Loop control and array address calculation overhead can double the execution time. The best sustained MFLOPS rates were less than 50% of the maximum capability of the machine.

  5. Partitioning strategy for efficient nonlinear finite element dynamic analysis on multiprocessor computers

    NASA Technical Reports Server (NTRS)

    Noor, Ahmed K.; Peters, Jeanne M.

    1989-01-01

    A computational procedure is presented for the nonlinear dynamic analysis of unsymmetric structures on vector multiprocessor systems. The procedure is based on a novel hierarchical partitioning strategy in which the response of the structure is approximated by a combination of symmetric and antisymmetric response vectors (modes), each obtained by using only a fraction of the degrees of freedom of the original finite element model. The three key elements of the procedure which result in a high degree of concurrency throughout the solution process are: (1) mixed (or primitive variable) formulation with independent shape functions for the different fields; (2) operator splitting or restructuring of the discrete equations at each time step to delineate the symmetric and antisymmetric vectors constituting the response; and (3) a two-level iterative process for generating the response of the structure. An assessment is made of the effectiveness of the procedure on the CRAY X-MP/4 computer.

  6. Massively parallel and linear-scaling algorithm for second-order Moller–Plesset perturbation theory applied to the study of supramolecular wires

    DOE PAGES

    Kjaergaard, Thomas; Baudin, Pablo; Bykov, Dmytro; ...

    2016-11-16

    Here, we present a scalable cross-platform hybrid MPI/OpenMP/OpenACC implementation of the Divide–Expand–Consolidate (DEC) formalism with portable performance on heterogeneous HPC architectures. The Divide–Expand–Consolidate formalism is designed to reduce the steep computational scaling of conventional many-body methods employed in electronic structure theory to linear scaling, while providing a simple mechanism for controlling the error introduced by this approximation. Our massively parallel implementation of this general scheme has three levels of parallelism, being a hybrid of the loosely coupled task-based parallelization approach and the conventional MPI+X programming model, where X is either OpenMP or OpenACC. We demonstrate strong and weak scalability of this implementation on heterogeneous HPC systems, namely on the GPU-based Cray XK7 Titan supercomputer at the Oak Ridge National Laboratory. Using the “resolution of the identity second-order Moller–Plesset perturbation theory” (RI-MP2) as the physical model for simulating correlated electron motion, the linear-scaling DEC implementation is applied to 1-aza-adamantane-trione (AAT) supramolecular wires containing up to 40 monomers (2440 atoms, 6800 correlated electrons, 24 440 basis functions and 91 280 auxiliary functions). This represents the largest molecular system treated at the MP2 level of theory, demonstrating an efficient removal of the scaling wall pertinent to conventional quantum many-body methods.

  7. Simulation and analysis of a geopotential research mission

    NASA Technical Reports Server (NTRS)

    Schutz, B. E.

    1986-01-01

    A computer simulation was performed for a Geopotential Research Mission (GRM) to enable study of the gravitational sensitivity of the range/rate measurement between two satellites and to provide a set of simulated measurements to assist in the evaluation of techniques developed for the determination of the gravity field. The simulation, identified as SGRM 8511, was conducted with two satellites in near circular, frozen orbits at 160 km altitude and separated by 300 km. High precision numerical integration of the polar orbits was used with a gravitational field complete to degree and order 180, and to degree 300 in orders 0 to 10. The set of simulated data for a mission duration of about 32 days was generated on a Cray X-MP computer. The characteristics of the simulation and the nature of the results are described.

  8. Using a multifrontal sparse solver in a high performance, finite element code

    NASA Technical Reports Server (NTRS)

    King, Scott D.; Lucas, Robert; Raefsky, Arthur

    1990-01-01

    We consider the performance of the finite element method on a vector supercomputer. The computationally intensive parts of the finite element method are typically the individual element forms and the solution of the global stiffness matrix, both of which are vectorized in high performance codes. To further increase throughput, new algorithms are needed. We compare a multifrontal sparse solver to a traditional skyline solver in a finite element code on a vector supercomputer. The multifrontal solver uses the Multiple-Minimum Degree reordering heuristic to reduce the number of operations required to factor a sparse matrix and full matrix computational kernels (e.g., BLAS3) to enhance vector performance. The net result is an order-of-magnitude reduction in run time for a finite element application on one processor of a Cray X-MP.

  9. New tools using the hardware performance monitor to help users tune programs on the Cray X-MP

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Engert, D.E.; Rudsinski, L.; Doak, J.

    1991-09-25

    The performance of a Cray system is highly dependent on the tuning techniques used by individuals on their codes. Many of our users were not taking advantage of the tuning tools that allow them to monitor their own programs by using the Hardware Performance Monitor (HPM). We therefore modified UNICOS to collect HPM data for all processes and to report Mflop ratings based on users, programs, and time used. Our tuning efforts are now being focused on the users and programs that have the best potential for performance improvements. These modifications and some of the more striking performance improvements are described.

  10. NASA Langley Research Center's distributed mass storage system

    NASA Technical Reports Server (NTRS)

    Pao, Juliet Z.; Humes, D. Creig

    1993-01-01

    There is a trend in institutions with high performance computing and data management requirements to explore mass storage systems with peripherals directly attached to a high speed network. The Distributed Mass Storage System (DMSS) Project at NASA LaRC is building such a system and expects to put it into production use by the end of 1993. This paper presents the design of the DMSS, some experiences in its development and use, and a performance analysis of its capabilities. The special features of this system are: (1) workstation class file servers running UniTree software; (2) third party I/O; (3) HIPPI network; (4) HIPPI/IPI3 disk array systems; (5) Storage Technology Corporation (STK) ACS 4400 automatic cartridge system; (6) CRAY Research Incorporated (CRI) CRAY Y-MP and CRAY-2 clients; (7) file server redundancy provision; and (8) a transition mechanism from the existent mass storage system to the DMSS.

  11. An implementation of a tree code on a SIMD, parallel computer

    NASA Technical Reports Server (NTRS)

    Olson, Kevin M.; Dorband, John E.

    1994-01-01

    We describe a fast tree algorithm for gravitational N-body simulation on SIMD parallel computers. The tree construction uses fast, parallel sorts. The sorted lists are recursively divided along their x, y and z coordinates. This data structure is a completely balanced tree (i.e., each particle is paired with exactly one other particle) and maintains good spatial locality. An implementation of this tree-building algorithm on a 16k-processor MasPar MP-1 performs well and constitutes only a small fraction (approximately 15%) of the entire cycle of finding the accelerations. Each node in the tree is treated as a monopole. The tree search and the summation of accelerations also perform well. During the tree search, node data that is needed from another processor is simply fetched. Roughly 55% of the tree search time is spent in communications between processors. We apply the code to two problems of astrophysical interest. The first is a simulation of the close passage of two gravitationally interacting disk galaxies using 65,536 particles. We also simulate the formation of structure in an expanding model universe using 1,048,576 particles. Our code attains speeds comparable to one head of a Cray Y-MP, so single instruction, multiple data (SIMD) type computers can be used for these simulations. The cost/performance ratio for SIMD machines like the MasPar MP-1 makes them an extremely attractive alternative to either vector processors or large multiple instruction, multiple data (MIMD) type parallel computers. With further optimizations (e.g., more careful load balancing), speeds in excess of today's vector processing computers should be possible.

  12. Parallel-vector out-of-core equation solver for computational mechanics

    NASA Technical Reports Server (NTRS)

    Qin, J.; Agarwal, T. K.; Storaasli, O. O.; Nguyen, D. T.; Baddourah, M. A.

    1993-01-01

    A parallel/vector out-of-core equation solver is developed for shared-memory computers, such as the Cray Y-MP machine. The input/output (I/O) time is reduced by using the asynchronous BUFFER IN and BUFFER OUT statements, which can be executed simultaneously with the CPU instructions. The parallel and vector capability provided by the supercomputers is also exploited to enhance the performance. Numerical applications in large-scale structural analysis are given to demonstrate the efficiency of the present out-of-core solver.
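
    BUFFER IN and BUFFER OUT are Cray Fortran statements, but the same double-buffering idea can be sketched with POSIX asynchronous I/O; process(), BLOCK, and the file layout below are assumptions for illustration, not the solver's actual structure.

```c
/* Double buffering: start an asynchronous read of the next block,
 * compute on the current one, then wait and swap buffers. */
#include <aio.h>
#include <string.h>
#include <unistd.h>

#define BLOCK (1 << 20)                     /* bytes per block */
extern void process(double *buf, size_t n); /* user-supplied compute */

void stream_file(int fd, long nblocks) {
    static double a[BLOCK / sizeof(double)], b[BLOCK / sizeof(double)];
    double *cur = a, *next = b;
    struct aiocb cb;

    pread(fd, cur, BLOCK, 0);               /* prime with block 0 */
    for (long k = 0; k < nblocks; k++) {
        if (k + 1 < nblocks) {              /* launch the next read early */
            memset(&cb, 0, sizeof cb);
            cb.aio_fildes = fd;
            cb.aio_buf = next;
            cb.aio_nbytes = BLOCK;
            cb.aio_offset = (off_t)(k + 1) * BLOCK;
            aio_read(&cb);
        }
        process(cur, BLOCK / sizeof(double)); /* overlaps with the I/O */
        if (k + 1 < nblocks) {
            const struct aiocb *list[1] = { &cb };
            aio_suspend(list, 1, NULL);     /* wait for the read */
            aio_return(&cb);
            double *t = cur; cur = next; next = t;
        }
    }
}
```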

  13. Parallel processing a three-dimensional free-lagrange code

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Mandell, D.A.; Trease, H.E.

    1989-01-01

    A three-dimensional, time-dependent free-Lagrange hydrodynamics code has been multitasked and autotasked on a CRAY X-MP/416. The multitasking was done by using the Los Alamos Multitasking Control Library, which is a superset of the CRAY multitasking library. Autotasking is done by using constructs which are only comment cards if the source code is not run through a preprocessor. The three-dimensional algorithm has presented a number of problems that simpler algorithms, such as those for one-dimensional hydrodynamics, did not exhibit. Problems in converting the serial code, originally written for a CRAY-1, to a multitasking code are discussed. Autotasking of a rewritten version of the code is discussed. Timing results for subroutines and hot spots in the serial code are presented and suggestions for additional tools and debugging aids are given. Theoretical speedup results obtained from Amdahl's law and actual speedup results obtained on a dedicated machine are presented. Suggestions for designing large parallel codes are given.
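
    The theoretical speedups referred to come from Amdahl's law, which is compact enough to state as code; the fraction in the example is illustrative, not a figure from the paper.

```c
/* Amdahl's law: f is the parallelizable fraction of the serial run
 * time, p the number of processors. */
double amdahl_speedup(double f, int p) {
    return 1.0 / ((1.0 - f) + f / p);
}
/* Example: f = 0.95 on 4 processors gives about 3.5x. */
```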

  14. Parallel processing a real code: A case history

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Mandell, D.A.; Trease, H.E.

    1988-01-01

    A three-dimensional, time-dependent Free-Lagrange hydrodynamics code has been multitasked and autotasked on a Cray X-MP/416. The multitasking was done by using the Los Alamos Multitasking Control Library, which is a superset of the Cray multitasking library. Autotasking is done by using constructs which are only comment cards if the source code is not run through a preprocessor. The 3-D algorithm has presented a number of problems that simpler algorithms, such as 1-D hydrodynamics, did not exhibit. Problems in converting the serial code, originally written for a Cray 1, to a multitasking code are discussed. Autotasking of a rewritten version of the code is discussed. Timing results for subroutines and hot spots in the serial code are presented and suggestions for additional tools and debugging aids are given. Theoretical speedup results obtained from Amdahl's law and actual speedup results obtained on a dedicated machine are presented. Suggestions for designing large parallel codes are given. 8 refs., 13 figs.

  15. Vectorized program architectures for supercomputer-aided circuit design

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Rizzoli, V.; Ferlito, M.; Neri, A.

    1986-01-01

    Vector processors (supercomputers) can be effectively employed in MIC or MMIC applications to solve problems of large numerical size, such as broad-band nonlinear design or statistical design (yield optimization). In order to fully exploit the capabilities of a vector hardware, any program architecture must be structured accordingly. This paper presents a possible approach to the "semantic" vectorization of microwave circuit design software. Speed-up factors of the order of 50 can be obtained on a typical vector processor (Cray X-MP) with respect to the most powerful scalar computers (CDC 7600), with cost reductions of more than one order of magnitude. This could broaden the horizon of microwave CAD techniques to include problems that are practically out of the reach of conventional systems.

  16. A Block-LU Update for Large-Scale Linear Programming

    DTIC Science & Technology

    1990-01-01

    linear programming problems. Results are given from runs on the Cray Y-MP. We wish to use the simplex method [Dan63] to solve the standard linear program: minimize $c^T x$ subject to $Ax = b$, $l \le x \le u$, where $A$ is an $m$ by $n$ matrix and $c$, $x$, $l$, $u$, and $b$ are of appropriate dimension. The simplex ... the identity matrix. The basis is used to solve for the search direction $y$ and the dual variables $\pi$ in the following linear systems: $B_k y = a_q$ (1.2) and ...

  17. Comparison of the MPP with other supercomputers for LANDSAT data processing

    NASA Technical Reports Server (NTRS)

    Ozga, Martin

    1987-01-01

    The Massively Parallel Processor (MPP) is compared to the CRAY X-MP and the CYBER-205 for LANDSAT data processing. The maximum likelihood classification algorithm is the basis for comparison, since this algorithm is simple to implement and vectorizes very well. The algorithm was implemented on all three machines and tested by classifying the same full scene of LANDSAT multispectral scanner data. Timings are compared, as are features of the machines and available software.
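
    The classifier being timed is a per-pixel Gaussian discriminant, which helps explain why it vectorizes so well: each pixel is independent. A sketch, assuming precomputed class statistics (the names and the four-band layout are ours):

```c
/* Per-pixel maximum likelihood classification: pick the class c that
 * maximizes -log|S_c| - (x - m_c)' S_c^-1 (x - m_c). */
#define BANDS 4

typedef struct {
    double mean[BANDS];
    double inv_cov[BANDS][BANDS];  /* inverse covariance matrix */
    double log_det;                /* log of covariance determinant */
} ClassStats;

int classify_pixel(const double x[BANDS], const ClassStats *cls, int nclasses) {
    int best = 0;
    double best_g = -1e300;
    for (int c = 0; c < nclasses; c++) {
        double d[BANDS], q = 0.0;
        for (int i = 0; i < BANDS; i++)
            d[i] = x[i] - cls[c].mean[i];
        for (int i = 0; i < BANDS; i++)        /* quadratic form */
            for (int j = 0; j < BANDS; j++)
                q += d[i] * cls[c].inv_cov[i][j] * d[j];
        double g = -cls[c].log_det - q;        /* discriminant */
        if (g > best_g) { best_g = g; best = c; }
    }
    return best;
}
```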

  18. Dynamic overset grid communication on distributed memory parallel processors

    NASA Technical Reports Server (NTRS)

    Barszcz, Eric; Weeratunga, Sisira K.; Meakin, Robert L.

    1993-01-01

    A parallel distributed memory implementation of intergrid communication for dynamic overset grids is presented. Included are discussions of various options considered during development. Results are presented comparing an Intel iPSC/860 to a single processor Cray Y-MP. Results for grids in relative motion show the iPSC/860 implementation to be faster than the Cray implementation.

  19. INS3D - NUMERICAL SOLUTION OF THE INCOMPRESSIBLE NAVIER-STOKES EQUATIONS IN THREE-DIMENSIONAL GENERALIZED CURVILINEAR COORDINATES (CRAY VERSION)

    NASA Technical Reports Server (NTRS)

    Rogers, S. E.

    1994-01-01

    INS3D computes steady-state solutions to the incompressible Navier-Stokes equations. The INS3D approach utilizes pseudo-compressibility combined with an approximate factorization scheme. This computational fluid dynamics (CFD) code has been verified on problems such as flow through a channel, flow over a backward-facing step and flow over a circular cylinder. Three dimensional cases include flow over an ogive cylinder, flow through a rectangular duct, wind tunnel inlet flow, cylinder-wall juncture flow and flow through multiple posts mounted between two plates. INS3D uses a pseudo-compressibility approach in which a time derivative of pressure is added to the continuity equation, which together with the momentum equations form a set of four equations with pressure and velocity as the dependent variables. The equations' coordinates are transformed for general three dimensional applications. The equations are advanced in time by the implicit, non-iterative, approximately-factored, finite-difference scheme of Beam and Warming. The numerical stability of the scheme depends on the use of higher-order smoothing terms to damp out higher-frequency oscillations caused by second-order central differencing. The artificial compressibility introduces pressure (sound) waves of finite speed (whereas the speed of sound would be infinite in an incompressible fluid). As the solution converges, these pressure waves die out, causing the derivative of pressure with respect to time to approach zero. Thus, continuity is satisfied for the incompressible fluid in the steady state. Computational efficiency is achieved using a diagonal algorithm. A block tri-diagonal option is also available. When a steady-state solution is reached, the modified continuity equation will satisfy the divergence-free velocity field condition. INS3D is capable of handling several different types of boundaries encountered in numerical simulations, including solid-surface, inflow and outflow, and far-field boundaries. Three machine versions of INS3D are available. INS3D for the CRAY is written in CRAY FORTRAN for execution on a CRAY X-MP under COS, INS3D for the IBM is written in FORTRAN 77 for execution on an IBM 3090 under the VM or MVS operating system, and INS3D for DEC RISC-based systems is written in RISC FORTRAN for execution on a DEC workstation running RISC ULTRIX 3.1 or later. The CRAY version has a central memory requirement of 730279 words. The central memory requirement for the IBM is 150Mb. The memory requirement for the DEC RISC ULTRIX version is 3Mb of main memory. INS3D was developed in 1987. The port to the IBM was done in 1990. The port to the DECstation 3100 was done in 1991. CRAY is a registered trademark of Cray Research Inc. IBM is a registered trademark of International Business Machines. DEC, DECstation, and ULTRIX are trademarks of the Digital Equipment Corporation.
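
    The pseudo-compressibility device described above is usually written as follows. This is a standard statement of the artificial compressibility method in our notation (with beta the pseudo-compressibility parameter and nu the kinematic viscosity), not a transcription of the INS3D documentation:

```latex
\frac{\partial p}{\partial t} + \beta \, \nabla \cdot \mathbf{u} = 0,
\qquad
\frac{\partial \mathbf{u}}{\partial t} + (\mathbf{u} \cdot \nabla)\mathbf{u}
  = -\nabla p + \nu \, \nabla^{2} \mathbf{u}
```

    At convergence the pseudo-time derivative of pressure vanishes, so the first equation reduces to the divergence-free condition, which is exactly the steady-state behavior the abstract describes.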

  20. Proceedings of the Scientific Conference on Obscuration and Aerosol Research Held in Aberdeen Maryland on 27-30 June 1989

    DTIC Science & Technology

    1990-08-01

    corneal structure for both normal and swollen corneas. Other problems of future interest are the understanding of the structure of scarred and dystrophied ... METHOD AND RESULTS: The system of equations is solved numerically on a Cray X-MP by a finite element method with 9-node Lagrange quadrilaterals (Becker ... Appl. Math., 42, 430. Becker, E. B., G. F. Carey, and J. T. Oden, 1981. Finite Elements: An Introduction (Vol. 1), Prentice-Hall, Englewood Cliffs, New Jersey.

  1. Evaluating the networking characteristics of the Cray XC-40 Intel Knights Landing-based Cori supercomputer at NERSC

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Doerfler, Douglas; Austin, Brian; Cook, Brandon

    There are many potential issues associated with deploying the Intel Xeon Phi™ (code-named Knights Landing [KNL]) manycore processor in a large-scale supercomputer. One in particular is the ability to fully utilize the high-speed communications network, given that the serial performance of a Xeon Phi™ core is a fraction of that of a Xeon® core. In this paper, we take a look at the trade-offs associated with allocating enough cores to fully utilize the Aries high-speed network versus cores dedicated to computation, e.g., the trade-off between MPI and OpenMP. In addition, we evaluate new features of Cray MPI in support of KNL, such as internode optimizations. We also evaluate one-sided programming models such as Unified Parallel C. We quantify the impact of the above trade-offs and features using a suite of National Energy Research Scientific Computing Center applications.

  2. An Automated Parallel Image Registration Technique Based on the Correlation of Wavelet Features

    NASA Technical Reports Server (NTRS)

    LeMoigne, Jacqueline; Campbell, William J.; Cromp, Robert F.; Zukor, Dorothy (Technical Monitor)

    2001-01-01

    With the increasing importance of multiple platform/multiple remote sensing missions, fast and automatic integration of digital data from disparate sources has become critical to the success of these endeavors. Our work utilizes maxima of wavelet coefficients to form the basic features of a correlation-based automatic registration algorithm. Our wavelet-based registration algorithm is tested successfully with data from the National Oceanic and Atmospheric Administration (NOAA) Advanced Very High Resolution Radiometer (AVHRR) and the Landsat/Thematic Mapper (TM), which differ by translation and/or rotation. By the choice of high-frequency wavelet features, this method is similar to an edge-based correlation method, but by exploiting the multi-resolution nature of a wavelet decomposition, our method achieves higher computational speeds for comparable accuracies. This algorithm has been implemented on a Single Instruction Multiple Data (SIMD) massively parallel computer, the MasPar MP-2, as well as on the Cray T3D, the Cray T3E and a Beowulf cluster of Pentium workstations.

  3. A transient FETI methodology for large-scale parallel implicit computations in structural mechanics

    NASA Technical Reports Server (NTRS)

    Farhat, Charbel; Crivelli, Luis; Roux, Francois-Xavier

    1992-01-01

    Explicit codes are often used to simulate the nonlinear dynamics of large-scale structural systems, even for low frequency response, because the storage and CPU requirements entailed by the repeated factorizations traditionally found in implicit codes rapidly overwhelm the available computing resources. With the advent of parallel processing, this trend is accelerating because explicit schemes are also easier to parallelize than implicit ones. However, the time step restriction imposed by the Courant stability condition on all explicit schemes cannot yet -- and perhaps will never -- be offset by the speed of parallel hardware. Therefore, it is essential to develop efficient and robust alternatives to direct methods that are also amenable to massively parallel processing because implicit codes using unconditionally stable time-integration algorithms are computationally more efficient when simulating low-frequency dynamics. Here we present a domain decomposition method for implicit schemes that requires significantly less storage than factorization algorithms, that is several times faster than other popular direct and iterative methods, that can be easily implemented on both shared and local memory parallel processors, and that is both computationally and communication-wise efficient. The proposed transient domain decomposition method is an extension of the method of Finite Element Tearing and Interconnecting (FETI) developed by Farhat and Roux for the solution of static problems. Serial and parallel performance results on the CRAY Y-MP/8 and the iPSC-860/128 systems are reported and analyzed for realistic structural dynamics problems. These results establish the superiority of the FETI method over both the serial/parallel conjugate gradient algorithm with diagonal scaling and the serial/parallel direct method, and contrast the computational power of the iPSC-860/128 parallel processor with that of the CRAY Y-MP/8 system.

  4. DOE Office of Scientific and Technical Information (OSTI.GOV)

    D'Azevedo, Eduardo; Abbott, Stephen; Koskela, Tuomas

    The XGC fusion gyrokinetic code combines state-of-the-art, portable computational and algorithmic technologies to enable complicated multiscale simulations of turbulence and transport dynamics in ITER edge plasma on the largest US open-science computer, the Cray XK7 Titan, at its maximal heterogeneous capability. Such simulations were not previously possible because the time-to-solution fell short by a factor of more than 10 within a 5-day wall-clock limit for one physics case. Frontier techniques are employed, including nested OpenMP parallelism, adaptive parallel I/O, staging I/O and data reduction using dynamic and asynchronous application interactions, and dynamic repartitioning.
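
    Of the techniques listed, nested OpenMP parallelism is the easiest to illustrate in isolation; the team sizes below are arbitrary and the sketch is not taken from XGC.

```c
/* Nested OpenMP: an outer team of threads, each of which spawns an
 * inner team, giving two levels of thread parallelism. */
#include <omp.h>
#include <stdio.h>

int main(void) {
    omp_set_max_active_levels(2);            /* allow two nested levels */
    #pragma omp parallel num_threads(4)      /* outer team */
    {
        int outer = omp_get_thread_num();
        #pragma omp parallel num_threads(4)  /* inner team per outer thread */
        {
            int inner = omp_get_thread_num();
            printf("worker (%d,%d)\n", outer, inner);  /* 16 workers */
        }
    }
    return 0;
}
```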

  5. Implementations of BLAST for parallel computers.

    PubMed

    Jülich, A

    1995-02-01

    The BLAST sequence comparison programs have been ported to a variety of parallel computers: the shared-memory machine Cray Y-MP 8/864 and the distributed-memory architectures Intel iPSC/860 and nCUBE. Additionally, the programs were ported to run on workstation clusters. We explain the parallelization techniques and consider the pros and cons of these methods. The BLAST programs are very well suited for parallelization for a moderate number of processors. We illustrate our results using the program blastp as an example. As input data for blastp, a 799-residue protein query sequence and the protein database PIR were used.

  6. Simulation and analysis of a geopotential research mission

    NASA Technical Reports Server (NTRS)

    Schutz, B. E.

    1987-01-01

    Computer simulations were performed for a Geopotential Research Mission (GRM) to enable the study of the gravitational sensitivity of the range rate measurements between the two satellites and to provide a set of simulated measurements to assist in the evaluation of techniques developed for the determination of the gravity field. The simulations were conducted with two satellites in near circular, frozen orbits at 160 km altitude, separated by 300 km. High precision numerical integration of the polar orbits was used with a gravitational field complete to degree and order 360. The set of simulated data for a mission duration of about 32 days was generated on a Cray X-MP computer. The results presented cover the most recent simulation, S8703, and include a summary of the numerical integration of the simulated trajectories, a summary of the requirements to compute nominal reference trajectories to meet the initial orbit determination requirements for the recovery of the geopotential, an analysis of the nature of the one-way integrated Doppler measurements associated with the simulation, and a discussion of the data set to be made available.

  7. ARC-2012-ACD12-0020-005

    NASA Image and Video Library

    2012-02-10

    Then and Now: These images illustrate the dramatic improvement in NASA computing power over the last 23 years, and its effect on the number of grid points used for flow simulations. At left, an image from the first full-body Navier-Stokes simulation (1988) of an F-16 fighter jet showing pressure on the aircraft body, and fore-body streamlines at Mach 0.90. This steady-state solution took 25 hours using a single Cray X-MP processor to solve the 500,000 grid-point problem. Investigator: Neal Chaderjian, NASA Ames Research Center At right, a 2011 snapshot from a Navier-Stokes simulation of a V-22 Osprey rotorcraft in hover. The blade vortices interact with the smaller turbulent structures. This very detailed simulation used 660 million grid points, and ran on 1536 processors on the Pleiades supercomputer for 180 hours. Investigator: Neal Chaderjian, NASA Ames Research Center; Image: Tim Sandstrom, NASA Ames Research Center

  8. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Reed, D.A.; Grunwald, D.C.

    The spectrum of parallel processor designs can be divided into three sections according to the number and complexity of the processors. At one end there are simple, bit-serial processors. Any one of these processors is of little value, but when it is coupled with many others, the aggregate computing power can be large. This approach to parallel processing can be likened to a colony of termites devouring a log. The most notable examples of this approach are the NASA/Goodyear Massively Parallel Processor, which has 16K one-bit processors, and the Thinking Machines Connection Machine, which has 64K one-bit processors. At the other end of the spectrum, a small number of processors, each built using the fastest available technology and the most sophisticated architecture, are combined. An example of this approach is the Cray X-MP. This type of parallel processing is akin to four woodmen attacking the log with chainsaws.

  9. Survey of new vector computers: The CRAY 1S from CRAY research; the CYBER 205 from CDC and the parallel computer from ICL - architecture and programming

    NASA Technical Reports Server (NTRS)

    Gentzsch, W.

    1982-01-01

    Problems which can arise with vector and parallel computers are discussed in a user-oriented context. Emphasis is placed on the algorithms used and the programming techniques adopted. Three recently developed supercomputers are examined and typical application examples are given in CRAY FORTRAN, CYBER 205 FORTRAN and DAP (distributed array processor) FORTRAN. The systems' performance is compared. The addition of parts of two N x N arrays is considered. The influence of the architecture on the algorithms and programming language is demonstrated. Numerical analysis of magnetohydrodynamic differential equations by an explicit difference method is illustrated, showing very good results for all three systems. The prognosis for supercomputer development is assessed.

  10. INS3D - NUMERICAL SOLUTION OF THE INCOMPRESSIBLE NAVIER-STOKES EQUATIONS IN THREE-DIMENSIONAL GENERALIZED CURVILINEAR COORDINATES (DEC RISC ULTRIX VERSION)

    NASA Technical Reports Server (NTRS)

    Biyabani, S. R.

    1994-01-01

    INS3D computes steady-state solutions to the incompressible Navier-Stokes equations. The INS3D approach utilizes pseudo-compressibility combined with an approximate factorization scheme. This computational fluid dynamics (CFD) code has been verified on problems such as flow through a channel, flow over a backward-facing step and flow over a circular cylinder. Three dimensional cases include flow over an ogive cylinder, flow through a rectangular duct, wind tunnel inlet flow, cylinder-wall juncture flow and flow through multiple posts mounted between two plates. INS3D uses a pseudo-compressibility approach in which a time derivative of pressure is added to the continuity equation, which together with the momentum equations form a set of four equations with pressure and velocity as the dependent variables. The equations' coordinates are transformed for general three dimensional applications. The equations are advanced in time by the implicit, non-iterative, approximately-factored, finite-difference scheme of Beam and Warming. The numerical stability of the scheme depends on the use of higher-order smoothing terms to damp out higher-frequency oscillations caused by second-order central differencing. The artificial compressibility introduces pressure (sound) waves of finite speed (whereas the speed of sound would be infinite in an incompressible fluid). As the solution converges, these pressure waves die out, causing the derivative of pressure with respect to time to approach zero. Thus, continuity is satisfied for the incompressible fluid in the steady state. Computational efficiency is achieved using a diagonal algorithm. A block tri-diagonal option is also available. When a steady-state solution is reached, the modified continuity equation will satisfy the divergence-free velocity field condition. INS3D is capable of handling several different types of boundaries encountered in numerical simulations, including solid-surface, inflow and outflow, and far-field boundaries. Three machine versions of INS3D are available. INS3D for the CRAY is written in CRAY FORTRAN for execution on a CRAY X-MP under COS, INS3D for the IBM is written in FORTRAN 77 for execution on an IBM 3090 under the VM or MVS operating system, and INS3D for DEC RISC-based systems is written in RISC FORTRAN for execution on a DEC workstation running RISC ULTRIX 3.1 or later. The CRAY version has a central memory requirement of 730279 words. The central memory requirement for the IBM is 150Mb. The memory requirement for the DEC RISC ULTRIX version is 3Mb of main memory. INS3D was developed in 1987. The port to the IBM was done in 1990. The port to the DECstation 3100 was done in 1991. CRAY is a registered trademark of Cray Research Inc. IBM is a registered trademark of International Business Machines. DEC, DECstation, and ULTRIX are trademarks of the Digital Equipment Corporation.

  11. TOUGH2_MP: A parallel version of TOUGH2

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Zhang, Keni; Wu, Yu-Shu; Ding, Chris

    2003-04-09

    TOUGH2_MP is a massively parallel version of TOUGH2. It was developed for running on distributed-memory parallel computers to simulate large simulation problems that may not be solved by the standard, single-CPU TOUGH2 code. The new code implements an efficient massively parallel scheme, while preserving the full capacity and flexibility of the original TOUGH2 code. The new software uses the METIS software package for grid partitioning and the AZTEC software package for linear-equation solving. The standard message-passing interface is adopted for communication among processors. Numerical performance of the current version code has been tested on CRAY-T3E and IBM RS/6000 SP platforms. In addition, the parallel code has been successfully applied to real field problems of multi-million-cell simulations for three-dimensional multiphase and multicomponent fluid and heat flow, as well as solute transport. In this paper, we will review the development of TOUGH2_MP, and discuss the basic features, modules, and their applications.

  12. User's Guide for TOUGH2-MP - A Massively Parallel Version of the TOUGH2 Code

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Earth Sciences Division; Zhang, Keni

    TOUGH2-MP is a massively parallel (MP) version of the TOUGH2 code, designed for computationally efficient parallel simulation of isothermal and nonisothermal flows of multicomponent, multiphase fluids in one, two, and three-dimensional porous and fractured media. In recent years, computational requirements have become increasingly intensive in large or highly nonlinear problems for applications in areas such as radioactive waste disposal, CO2 geological sequestration, environmental assessment and remediation, reservoir engineering, and groundwater hydrology. The primary objective of developing the parallel-simulation capability is to significantly improve the computational performance of the TOUGH2 family of codes. The particular goal for the parallel simulator is to achieve orders-of-magnitude improvement in computational time for models with ever-increasing complexity. TOUGH2-MP is designed to perform parallel simulation on multi-CPU computational platforms. An earlier version of TOUGH2-MP (V1.0) was based on the TOUGH2 Version 1.4 with EOS3, EOS9, and T2R3D modules, a software previously qualified for applications in the Yucca Mountain project, and was designed for execution on CRAY T3E and IBM SP supercomputers. The current version of TOUGH2-MP (V2.0) includes all fluid property modules of the standard version TOUGH2 V2.0. It provides computationally efficient capabilities using supercomputers, Linux clusters, or multi-core PCs, and also offers many user-friendly features. The parallel simulator inherits all process capabilities from V2.0 together with additional capabilities for handling fractured media from V1.4. This report provides a quick-start guide on how to set up and run the TOUGH2-MP program for users with a basic knowledge of running the (standard) version TOUGH2 code. The report also gives a brief technical description of the code, including a discussion of parallel methodology, code structure, as well as mathematical and numerical methods used. To familiarize users with the parallel code, illustrative sample problems are presented.

  13. Application of a hybrid MPI/OpenMP approach for parallel groundwater model calibration using multi-core computers

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Tang, Guoping; D'Azevedo, Ed F; Zhang, Fan

    2010-01-01

    Calibration of groundwater models involves hundreds to thousands of forward solutions, each of which may solve many transient coupled nonlinear partial differential equations, resulting in a computationally intensive problem. We describe a hybrid MPI/OpenMP approach to exploit two levels of parallelism in software and hardware to reduce calibration time on multi-core computers. HydroGeoChem 5.0 (HGC5) is parallelized using OpenMP for direct solutions for a reactive transport model application, and a field-scale coupled flow and transport model application. In the reactive transport model, a single parallelizable loop is identified to account for over 97% of the total computational time using GPROF. Addition of a few lines of OpenMP compiler directives to the loop yields a speedup of about 10 on a 16-core compute node. For the field-scale model, parallelizable loops in 14 of 174 HGC5 subroutines that require 99% of the execution time are identified. As these loops are parallelized incrementally, the scalability is found to be limited by a loop where Cray PAT detects cache miss rates of over 90%. With this loop rewritten, a speedup similar to the first application is achieved. The OpenMP-parallelized code can be run efficiently on multiple workstations in a network or multiple compute nodes on a cluster as slaves using parallel PEST to speed up model calibration. To run calibration on clusters as a single task, the Levenberg-Marquardt algorithm is added to HGC5 with the Jacobian calculation and lambda search parallelized using MPI. With this hybrid approach, 100-200 compute cores are used to reduce the calibration time from weeks to a few hours for these two applications. This approach is applicable to most of the existing groundwater model codes for many applications.
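
    The kind of change described, a few directive lines around the dominant loop, can be sketched as follows; the loop body is a stand-in for the per-cell chemistry, not HGC5 code.

```c
/* Adding one OpenMP directive parallelizes the hot loop when the
 * iterations are independent, as in the abstract's reactive
 * transport example. */
void react_all_cells(double *conc, const double *rate, int ncells) {
    #pragma omp parallel for schedule(static)
    for (int i = 0; i < ncells; i++)
        conc[i] *= 1.0 - rate[i];   /* placeholder for the real kernel */
}
```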

  14. Computer aided design of monolithic microwave and millimeter wave integrated circuits and subsystems

    NASA Astrophysics Data System (ADS)

    Ku, Walter H.

    1987-08-01

    This interim technical report presents results of research on the computer aided design of monolithic microwave and millimeter wave integrated circuits and subsystems. A specific objective is to extend the state-of-the-art of the Computer Aided Design (CAD) of the monolithic microwave and millimeter wave integrated circuits (MIMIC). In this reporting period, we have derived a new model for the high electron mobility transistor (HEMT) based on a nonlinear charge control formulation which takes into consideration the variation of the 2DEG distance offset from the heterointerface as a function of bias. Pseudomorphic InGaAs/GaAs HEMT devices have been successfully fabricated at UCSD. For a 1 micron gate length, a maximum transconductance of 320 mS/mm was obtained. In cooperation with TRW, devices with 0.15 micron and 0.25 micron gate lengths have been successfully fabricated and tested. New results on the design of ultra-wideband distributed amplifiers using 0.15 micron pseudomorphic InGaAs/GaAs HEMT's have also been obtained. In addition, two-dimensional models of the submicron MESFET's, HEMT's and HBT's are currently being developed for the CRAY X-MP/48 supercomputer. Preliminary results obtained are also presented in this report.

  15. Application of a distributed network in computational fluid dynamic simulations

    NASA Technical Reports Server (NTRS)

    Deshpande, Manish; Feng, Jinzhang; Merkle, Charles L.; Deshpande, Ashish

    1994-01-01

    A general-purpose 3-D, incompressible Navier-Stokes algorithm is implemented on a network of concurrently operating workstations using parallel virtual machine (PVM) and compared with its performance on a CRAY Y-MP and on an Intel iPSC/860. The problem is relatively computationally intensive, and has a communication structure based primarily on nearest-neighbor communication, making it ideally suited to message passing. Such problems are frequently encountered in computational fluid dynamics (CFD), and their solution is increasingly in demand. The communication structure is explicitly coded in the implementation to fully exploit the regularity in message passing in order to produce a near-optimal solution. Results are presented for various grid sizes using up to eight processors.

  16. TRASYS - THERMAL RADIATION ANALYZER SYSTEM (CRAY VERSION WITH NASADIG)

    NASA Technical Reports Server (NTRS)

    Anderson, G. E.

    1994-01-01

    The Thermal Radiation Analyzer System, TRASYS, is a computer software system with generalized capability to solve the radiation related aspects of thermal analysis problems. TRASYS computes the total thermal radiation environment for a spacecraft in orbit. The software calculates internode radiation interchange data as well as incident and absorbed heat rate data originating from environmental radiant heat sources. TRASYS provides data of both types in a format directly usable by such thermal analyzer programs as SINDA/FLUINT (available from COSMIC, program number MSC-21528). One primary feature of TRASYS is that it allows users to write their own driver programs to organize and direct the preprocessor and processor library routines in solving specific thermal radiation problems. The preprocessor first reads and converts the user's geometry input data into the form used by the processor library routines. Then, the preprocessor accepts the user's driving logic, written in the TRASYS modified FORTRAN language. In many cases, the user has a choice of routines to solve a given problem. Users may also provide their own routines where desirable. In particular, the user may write output routines to provide for an interface between TRASYS and any thermal analyzer program using the R-C network concept. Input to the TRASYS program consists of Options and Edit data, Model data, and Logic Flow and Operations data. Options and Edit data provide for basic program control and user edit capability. The Model data describe the problem in terms of geometry and other properties. This information includes surface geometry data, documentation data, nodal data, block coordinate system data, form factor data, and flux data. Logic Flow and Operations data house the user's driver logic, including the sequence of subroutine calls and the subroutine library. Output from TRASYS consists of two basic types of data: internode radiation interchange data, and incident and absorbed heat rate data. The flexible structure of TRASYS allows considerable freedom in the definition and choice of solution method for a thermal radiation problem. The program's flexible structure has also allowed TRASYS to retain the same basic input structure as the authors update it in order to keep up with changing requirements. Among its other important features are the following: 1) up to 3200 node problem size capability with shadowing by intervening opaque or semi-transparent surfaces; 2) choice of diffuse, specular, or diffuse/specular radiant interchange solutions; 3) a restart capability that minimizes recomputing; 4) macroinstructions that automatically provide the executive logic for orbit generation that optimizes the use of previously completed computations; 5) a time variable geometry package that provides automatic pointing of the various parts of an articulated spacecraft and an automatic look-back feature that eliminates redundant form factor calculations; 6) capability to specify submodel names to identify sets of surfaces or components as an entity; and 7) subroutines to perform functions which save and recall the internodal and/or space form factors in subsequent steps for nodes with fixed geometry during a variable geometry run. There are two machine versions of TRASYS v27: a DEC VAX version and a Cray UNICOS version. Both versions require installation of the NASADIG library (MSC-21801 for DEC VAX or COS-10049 for CRAY), which is available from COSMIC either separately or bundled with TRASYS. 
The NASADIG (NASA Device Independent Graphics Library) plot package provides a pictorial representation of input geometry, orbital/orientation parameters, and heating rate output as a function of time. NASADIG supports Tektronix terminals. The CRAY version of TRASYS v27 is written in FORTRAN 77 for batch or interactive execution and has been implemented on CRAY X-MP and CRAY Y-MP series computers running UNICOS. The standard distribution medium for MSC-21959 (CRAY version without NASADIG) is a 1600 BPI 9-track magnetic tape in UNIX tar format. The standard distribution medium for COS-10040 (CRAY version with NASADIG) is a set of two 6250 BPI 9-track magnetic tapes in UNIX tar format. Alternate distribution media and formats are available upon request. The DEC VAX version of TRASYS v27 is written in FORTRAN 77 for batch execution (only the plotting driver program is interactive) and has been implemented on a DEC VAX 8650 computer under VMS. Since the source codes for MSC-21030 and COS-10026 are in VAX/VMS text library files and DEC Command Language files, COSMIC will only provide these programs in the following formats: MSC-21030, TRASYS (DEC VAX version without NASADIG) is available on a 1600 BPI 9-track magnetic tape in VAX BACKUP format (standard distribution medium) or in VAX BACKUP format on a TK50 tape cartridge; COS-10026, TRASYS (DEC VAX version with NASADIG), is available in VAX BACKUP format on a set of three 6250 BPI 9-track magnetic tapes (standard distribution medium) or a set of three TK50 tape cartridges in VAX BACKUP format. TRASYS was last updated in 1993.
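
    For readers unfamiliar with the radiation-interchange quantities TRASYS produces, the following minimal C sketch shows how internode radiation interchange data are typically consumed: given node areas, temperatures, and a form-factor matrix, it accumulates the net blackbody power exchanged by each node. All values are hypothetical, and the gray-body (script-F), shadowing, and orbital-flux machinery of the real program is omitted.

        /* Minimal sketch (not TRASYS itself): blackbody radiation interchange
         * between N isothermal nodes, given a form-factor matrix F and areas A. */
        #include <stdio.h>
        #include <math.h>

        #define N 3
        #define SIGMA 5.670e-8              /* Stefan-Boltzmann, W/(m^2 K^4) */

        int main(void) {
            double A[N] = {1.0, 2.0, 1.5};        /* node areas, m^2 (assumed)   */
            double T[N] = {350.0, 300.0, 250.0};  /* node temperatures, K        */
            double F[N][N] = {                    /* hypothetical form factors   */
                {0.0, 0.6, 0.4},
                {0.3, 0.0, 0.7},
                {0.2, 0.5, 0.0}
            };
            for (int i = 0; i < N; i++) {
                double q = 0.0;                   /* net power lost by node i    */
                for (int j = 0; j < N; j++)
                    q += SIGMA * A[i] * F[i][j] * (pow(T[i],4) - pow(T[j],4));
                printf("node %d: net radiated power = %.1f W\n", i, q);
            }
            return 0;
        }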

  17. High performance computing applications in neurobiological research

    NASA Technical Reports Server (NTRS)

    Ross, Muriel D.; Cheng, Rei; Doshay, David G.; Linton, Samuel W.; Montgomery, Kevin; Parnas, Bruce R.

    1994-01-01

    The human nervous system is a massively parallel processor of information. The vast numbers of neurons, synapses, and circuits are daunting to those seeking to understand the neural basis of consciousness and intellect. Pervasive obstacles are the lack of knowledge of the detailed, three-dimensional (3-D) organization of even a simple neural system and the paucity of large-scale, biologically relevant computer simulations. We use high performance graphics workstations and supercomputers to study the 3-D organization of gravity sensors as a prototype architecture foreshadowing more complex systems. Scaled-down simulations run on a Silicon Graphics workstation and scaled-up, three-dimensional versions run on the Cray Y-MP and CM5 supercomputers.

  18. Early Experiences Writing Performance Portable OpenMP 4 Codes

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Joubert, Wayne; Hernandez, Oscar R

    In this paper, we evaluate the recently available directives in OpenMP 4 to parallelize a computational kernel using both the traditional shared memory approach and the newer accelerator targeting capabilities. In addition, we explore various transformations that attempt to increase application performance portability, and examine the expressiveness and performance implications of using these approaches. For example, we want to understand if the target map directives in OpenMP 4 improve data locality when mapped to a shared memory system, as opposed to the first-touch policy of traditional OpenMP. To that end, we use recent Cray and Intel compilers to measure the performance variations of a simple application kernel when executed on the OLCF's Titan supercomputer with NVIDIA GPUs and the Beacon system with Intel Xeon Phi accelerators attached. To better understand these trade-offs, we compare our results from traditional OpenMP shared memory implementations to the newer accelerator programming model when it is used to target both the CPU and an attached heterogeneous device. We believe the results and lessons learned as presented in this paper will be useful to the larger user community by providing guidelines that can assist programmers in the development of performance portable code.
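
    As a concrete illustration of the directives discussed above, the following hedged C sketch contrasts a traditional OpenMP parallel loop with the OpenMP 4 accelerator model using target map; it is a generic example, not the kernel from the paper.

        /* Sketch: the same DAXPY-like kernel under traditional OpenMP and
         * under the OpenMP 4 accelerator model with explicit data mapping. */
        #include <stdio.h>

        #define N 1000000

        int main(void) {
            static double x[N], y[N];
            double a = 2.0;
            for (int i = 0; i < N; i++) { x[i] = i; y[i] = 1.0; }

            /* Traditional shared-memory OpenMP: first-touch placement applies. */
            #pragma omp parallel for
            for (int i = 0; i < N; i++)
                y[i] += a * x[i];

            /* OpenMP 4 accelerator model: map data to the device (or, on a
             * shared-memory host, let the runtime alias it) and offload. */
            #pragma omp target map(to: x) map(tofrom: y)
            #pragma omp teams distribute parallel for
            for (int i = 0; i < N; i++)
                y[i] += a * x[i];

            printf("y[42] = %f\n", y[42]);
            return 0;
        }

    With an OpenMP 4 compiler (e.g., via -fopenmp or the vendor equivalent), the second loop can execute on an attached device; without one, the pragmas are ignored and the code still runs serially.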

  19. User's and test case manual for FEMATS

    NASA Technical Reports Server (NTRS)

    Chatterjee, Arindam; Volakis, John; Nurnberger, Mike; Natzke, John

    1995-01-01

    The FEMATS program incorporates first-order edge-based finite elements and vector absorbing boundary conditions into the scattered field formulation for computation of the scattering from three-dimensional geometries. The code has been validated extensively for a large class of geometries containing inhomogeneities and satisfying transition conditions. For geometries that are too large for the workstation environment, the FEMATS code has been optimized to run on various supercomputers. Currently, FEMATS has been configured to run on the HP 9000 workstation, vectorized for the Cray Y-MP, and parallelized to run on the Kendall Square Research (KSR) architecture and the Intel Paragon.

  20. Synthesis, hydrolysis rates, supercomputer modeling, and antibacterial activity of bicyclic tetrahydropyridazinones.

    PubMed

    Jungheim, L N; Boyd, D B; Indelicato, J M; Pasini, C E; Preston, D A; Alborn, W E

    1991-05-01

    Bicyclic tetrahydropyridazinones, such as 13, where X are strongly electron-withdrawing groups, were synthesized to investigate their antibacterial activity. These delta-lactams are homologues of bicyclic pyrazolidinones 15, which were the first non-beta-lactam-containing compounds reported to bind to penicillin-binding proteins (PBPs). The delta-lactam compounds exhibit poor antibacterial activity despite having reactivity comparable to the gamma-lactams. Molecular modeling based on semiempirical molecular orbital calculations on a Cray X-MP supercomputer predicted that the reason for the inactivity is steric bulk hindering high-affinity binding of the compounds to PBPs, as well as high conformational flexibility of the tetrahydropyridazinone ring hampering effective alignment of the molecule in the active site. Subsequent PBP binding experiments confirmed that this class of compound does not bind to PBPs.

  1. TRASYS - THERMAL RADIATION ANALYZER SYSTEM (DEC VAX VERSION WITH NASADIG)

    NASA Technical Reports Server (NTRS)

    Anderson, G. E.

    1994-01-01

    The program description, feature list, and version information for this entry are identical, word for word, to the TRASYS abstract given earlier in these records; see that entry for the full text, including NASADIG requirements and distribution media for both the DEC VAX and Cray UNICOS versions.

  2. TRASYS - THERMAL RADIATION ANALYZER SYSTEM (DEC VAX VERSION WITHOUT NASADIG)

    NASA Technical Reports Server (NTRS)

    Vogt, R. A.

    1994-01-01

    The program description for this entry likewise duplicates, verbatim, the TRASYS abstract given earlier in these records; see that entry for the full text and distribution details.

  3. Particle simulation on heterogeneous distributed supercomputers

    NASA Technical Reports Server (NTRS)

    Becker, Jeffrey C.; Dagum, Leonardo

    1993-01-01

    We describe the implementation and performance of a three dimensional particle simulation distributed between a Thinking Machines CM-2 and a Cray Y-MP. These are connected by a combination of two high-speed networks: a high-performance parallel interface (HIPPI) and an optical network (UltraNet). This is the first application to use this configuration at NASA Ames Research Center. We describe our experience implementing and using the application and report the results of several timing measurements. We show that the distribution of applications across disparate supercomputing platforms is feasible and has reasonable performance. In addition, several practical aspects of the computing environment are discussed.

  4. Using a Cray Y-MP as an array processor for a RISC Workstation

    NASA Technical Reports Server (NTRS)

    Lamaster, Hugh; Rogallo, Sarah J.

    1992-01-01

    As microprocessors increase in power, the economics of centralized computing has changed dramatically. At the beginning of the 1980's, mainframes and supercomputers were often considered to be cost-effective machines for scalar computing. Today, microprocessor-based RISC (reduced-instruction-set computer) systems have displaced many uses of mainframes and supercomputers. Supercomputers are still cost competitive when processing jobs that require both large memory size and high memory bandwidth. One such application is array processing. Certain numerical operations are appropriate for a Remote Procedure Call (RPC)-based environment. Matrix multiplication is an example of an operation with enough arithmetic operations to amortize the cost of an RPC call. An experiment is described which demonstrates that matrix multiplication can be executed remotely on a large system faster than it executes locally on a workstation.
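
    The amortization argument can be made concrete with a back-of-the-envelope model: a remote multiply ships O(n^2) words but performs O(n^3) flops. The C sketch below compares hypothetical local and remote execution times; every rate and latency in it is an illustrative assumption, not a measured figure from the experiment.

        /* Crossover model for offloading an n x n matrix multiply via RPC. */
        #include <stdio.h>

        int main(void) {
            double net_bw   = 10.0e6;   /* bytes/s over the network (assumed) */
            double rpc_over = 0.01;     /* fixed RPC latency, s (assumed)     */
            double host_mf  = 5.0e6;    /* workstation flops/s (assumed)      */
            double cray_mf  = 300.0e6;  /* Y-MP flops/s on a multiply (assumed) */

            for (int n = 64; n <= 2048; n *= 2) {
                double flops  = 2.0 * n * n * n;       /* multiply-adds       */
                double bytes  = 3.0 * n * n * 8.0;     /* A and B in, C out   */
                double local  = flops / host_mf;
                double remote = rpc_over + bytes / net_bw + flops / cray_mf;
                printf("n=%5d  local %8.3f s  remote %8.3f s  %s\n",
                       n, local, remote, remote < local ? "offload" : "stay");
            }
            return 0;
        }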

  5. A Performance Evaluation of the Cray X1 for Scientific Applications

    NASA Technical Reports Server (NTRS)

    Oliker, Leonid; Biswas, Rupak; Borrill, Julian; Canning, Andrew; Carter, Jonathan; Djomehri, M. Jahed; Shan, Hongzhang; Skinner, David

    2003-01-01

    The last decade has witnessed a rapid proliferation of superscalar cache-based microprocessors to build high-end capability and capacity computers because of their generality, scalability, and cost effectiveness. However, the recent development of massively parallel vector systems is having a significant effect on the supercomputing landscape. In this paper, we compare the performance of the recently-released Cray X1 vector system with that of the cacheless NEC SX-6 vector machine, and the superscalar cache-based IBM Power3 and Power4 architectures for scientific applications. Overall results demonstrate that the X1 is quite promising, but performance improvements are expected as the hardware, systems software, and numerical libraries mature. Code reengineering to effectively utilize the complex architecture may also lead to significant efficiency enhancements.

  6. A Tensor Product Formulation of Strassen's Matrix Multiplication Algorithm with Memory Reduction

    DOE PAGES

    Kumar, B.; Huang, C. -H.; Sadayappan, P.; ...

    1995-01-01

    In this article, we present a program generation strategy of Strassen's matrix multiplication algorithm using a programming methodology based on tensor product formulas. In this methodology, block recursive programs such as the fast Fourier transform and Strassen's matrix multiplication algorithm are expressed as algebraic formulas involving tensor products and other matrix operations. Such formulas can be systematically translated to high-performance parallel/vector codes for various architectures. In this article, we present a nonrecursive implementation of Strassen's algorithm for shared memory vector processors such as the Cray Y-MP. A previous implementation of Strassen's algorithm synthesized from tensor product formulas required working storage of size O(7^n) for multiplying 2^n × 2^n matrices. We present a modified formulation in which the working storage requirement is reduced to O(4^n). The modified formulation exhibits sufficient parallelism for efficient implementation on a shared memory multiprocessor. Performance results on a Cray Y-MP8/64 are presented.
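
    For reference, the seven-multiplication scheme that the tensor product formulation encodes looks as follows at the 2 × 2 level (scalars stand in for matrix blocks); this is textbook Strassen, not the paper's generated code.

        /* One level of Strassen: seven multiplications instead of eight.
         * The paper's tensor-product version generates the same arithmetic
         * nonrecursively with O(4^n) workspace. */
        #include <stdio.h>

        int main(void) {
            /* A = [a11 a12; a21 a22], B = [b11 b12; b21 b22] */
            double a11=1, a12=2, a21=3, a22=4;
            double b11=5, b12=6, b21=7, b22=8;

            double m1 = (a11 + a22) * (b11 + b22);
            double m2 = (a21 + a22) * b11;
            double m3 = a11 * (b12 - b22);
            double m4 = a22 * (b21 - b11);
            double m5 = (a11 + a12) * b22;
            double m6 = (a21 - a11) * (b11 + b12);
            double m7 = (a12 - a22) * (b21 + b22);

            double c11 = m1 + m4 - m5 + m7;
            double c12 = m3 + m5;
            double c21 = m2 + m4;
            double c22 = m1 - m2 + m3 + m6;

            printf("C = [%g %g; %g %g]\n", c11, c12, c21, c22);  /* [19 22; 43 50] */
            return 0;
        }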

  7. High Performance Programming Using Explicit Shared Memory Model on Cray T3D1

    NASA Technical Reports Server (NTRS)

    Simon, Horst D.; Saini, Subhash; Grassi, Charles

    1994-01-01

    The Cray T3D system is the first-phase system in Cray Research, Inc.'s (CRI) three-phase massively parallel processing (MPP) program. This system features a heterogeneous architecture that closely couples DEC's Alpha microprocessors and CRI's parallel-vector technology, i.e., the Cray Y-MP and Cray C90. An overview of the Cray T3D hardware and available programming models is presented. Under the Cray Research adaptive Fortran (CRAFT) model, four programming methods (data parallel, work sharing, message passing using PVM, and the explicit shared memory model) are available to users. However, at this time the data parallel and work sharing programming models are not available to the user community. The differences between standard PVM and CRI's PVM are highlighted with performance measurements such as latencies and communication bandwidths. We have found that neither standard PVM nor CRI's PVM exploits the hardware capabilities of the T3D. The reasons for the poor performance of PVM as a native message-passing library are presented, illustrated by the performance of the NAS Parallel Benchmarks (NPB) programmed in the explicit shared memory model on the Cray T3D. In general, the performance of standard PVM is about 4 to 5 times lower than that obtained using the explicit shared memory model. A similar degradation is seen on the CM-5, where applications using the native message-passing library CMMD also run about 4 to 5 times slower than data parallel versions. The issues involved in programming in the explicit shared memory model (such as barriers, synchronization, and invalidating and aligning the data cache) are discussed. Comparative performance of the NPB using the explicit shared memory programming model on the Cray T3D and other highly parallel systems such as the TMC CM-5, Intel Paragon, Cray C90, and IBM SP1 is presented.
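
    The explicit shared memory model described above survives today as SHMEM/OpenSHMEM one-sided communication. The following hedged C sketch uses the modern OpenSHMEM API as a stand-in for the original Cray library; names such as shmem_long_put follow the OpenSHMEM standard rather than the T3D-era interface.

        /* One-sided put into a neighbor's symmetric memory, OpenSHMEM style. */
        #include <stdio.h>
        #include <shmem.h>

        int main(void) {
            static long src[4], dst[4];   /* symmetric (remotely accessible) */
            shmem_init();
            int me = shmem_my_pe();
            int np = shmem_n_pes();

            for (int i = 0; i < 4; i++) src[i] = me * 10 + i;

            /* Direct remote store; no matching receive on the target PE. */
            shmem_long_put(dst, src, 4, (me + 1) % np);
            shmem_barrier_all();          /* complete puts before reading  */

            printf("PE %d received dst[0] = %ld\n", me, dst[0]);
            shmem_finalize();
            return 0;
        }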

  8. LANZ: Software solving the large sparse symmetric generalized eigenproblem

    NASA Technical Reports Server (NTRS)

    Jones, Mark T.; Patrick, Merrell L.

    1990-01-01

    A package, LANZ, for solving the large symmetric generalized eigenproblem is described. The package was tested on four different architectures: Convex 200, CRAY Y-MP, Sun-3, and Sun-4. The package uses Lanczos' method and is based on recent research into solving the generalized eigenproblem.
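
    To indicate the kind of iteration LANZ builds on, here is a minimal C sketch of three Lanczos steps for the standard symmetric problem A x = lambda x; the generalized K/M problem adds M-orthogonalization and a shifted factorization, both omitted here. The small matrix is an arbitrary example, not from the package.

        /* Three steps of the symmetric Lanczos recurrence. */
        #include <stdio.h>
        #include <math.h>

        #define N 4
        #define STEPS 3

        static void matvec(const double A[N][N], const double x[N], double y[N]) {
            for (int i = 0; i < N; i++) {
                y[i] = 0.0;
                for (int j = 0; j < N; j++) y[i] += A[i][j] * x[j];
            }
        }

        int main(void) {
            double A[N][N] = {{4,1,0,0},{1,3,1,0},{0,1,2,1},{0,0,1,1}};
            double v[N] = {1,0,0,0}, vprev[N] = {0}, w[N];
            double alpha, beta = 0.0;

            for (int k = 0; k < STEPS; k++) {
                matvec(A, v, w);
                alpha = 0.0;
                for (int i = 0; i < N; i++) alpha += w[i] * v[i];
                for (int i = 0; i < N; i++) w[i] -= alpha*v[i] + beta*vprev[i];
                beta = 0.0;
                for (int i = 0; i < N; i++) beta += w[i] * w[i];
                beta = sqrt(beta);
                for (int i = 0; i < N; i++) { vprev[i] = v[i]; v[i] = w[i] / beta; }
                printf("step %d: alpha = %.4f beta = %.4f\n", k, alpha, beta);
            }
            return 0;   /* eigenvalue estimates come from the tridiagonal T */
        }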

  9. Eigensolution of finite element problems in a completely connected parallel architecture

    NASA Technical Reports Server (NTRS)

    Akl, Fred A.; Morel, Michael R.

    1989-01-01

    A parallel algorithm is presented for the solution of the generalized eigenproblem in linear elastic finite element analysis, (K)(phi) = (M)(phi)(omega), where (K) and (M) are of order N, and (omega) is of order q. The parallel algorithm is based on a completely connected parallel architecture in which each processor is allowed to communicate with all other processors. The algorithm has been successfully implemented on a tightly coupled multiple-instruction-multiple-data (MIMD) parallel processing computer, the Cray X-MP. A finite element model is divided into m domains, each of which is assumed to process n elements. Each domain is then assigned to a processor, or to a logical processor (task) if the number of domains exceeds the number of physical processors. The macro-tasking library routines are used in mapping each domain to a user task. Computational speed-up and efficiency are used to determine the effectiveness of the algorithm. The effects of the number of domains, the number of degrees-of-freedom located along the global fronts, and the dimension of the subspace on the performance of the algorithm are investigated. For a 64-element rectangular plate, speed-ups of 1.86, 3.13, 3.18, and 3.61 are achieved on two, four, six, and eight processors, respectively.

  10. LARCRIM user's guide, version 1.0

    NASA Technical Reports Server (NTRS)

    Davis, John S.; Heaphy, William J.

    1993-01-01

    LARCRIM is a relational database management system (RDBMS) which performs the conventional duties of an RDBMS with the added feature that it can store attributes which consist of arrays or matrices. This makes it particularly valuable for scientific data management. It is accessible as a stand-alone system and through an application program interface. The stand-alone system may be executed in two modes: menu or command. The menu mode prompts the user for the input required to create, update, and/or query the database. The command mode requires the direct input of LARCRIM commands. Although LARCRIM is an update of an old database family, its performance on modern computers is quite satisfactory. LARCRIM is written in FORTRAN 77 and runs under the UNIX operating system. Versions have been released for the following computers: SUN (3 & 4), Convex, IRIS, Hewlett-Packard, CRAY 2 & Y-MP.

  11. RATFOR user's guide version 2.0

    NASA Technical Reports Server (NTRS)

    Helmle, L. C.

    1985-01-01

    This document is a user's guide for RATFOR at Ames Research Center. The main part of the document is a general description of RATFOR, and the appendix is devoted to a machine-specific implementation for the Cray X-MP. The general stylistic features of RATFOR are discussed, including the block structure, keywords, source code format, and the notion of tokens. There is a section on the basic control structures (IF-ELSE, ELSE IF, WHILE, FOR, DO, REPEAT-UNTIL, BREAK, NEXT), and there is a section on the statements that extend FORTRAN's capabilities (DEFINE, MACRO, INCLUDE, STRING). The appendix discusses everything needed to compile and run a basic job, the preprocessor options, the supported character sets, the generated listings, fatal errors, program limitations, and the differences from standard FORTRAN.

  12. NASADIG - NASA DEVICE INDEPENDENT GRAPHICS LIBRARY (AMDAHL VERSION)

    NASA Technical Reports Server (NTRS)

    Rogers, J. E.

    1994-01-01

    The NASA Device Independent Graphics Library, NASADIG, can be used with many computer-based engineering and management applications. The library gives the user the opportunity to translate data into effective graphic displays for presentation. The software offers many features which allow the user flexibility in creating graphics. These include two-dimensional plots, subplot projections in 3D-space, surface contour line plots, and surface contour color-shaded plots. Routines for three-dimensional plotting, wireframe surface plots, surface plots with hidden line removal, and surface contour line plots are provided. Other features include polar and spherical coordinate plotting, world map plotting utilizing either cylindrical equidistant or Lambert equal area projection, plot translation, plot rotation, plot blowup, splines and polynomial interpolation, area blanking control, multiple log/linear axes, legends and text control, curve thickness control, and multiple text fonts (18 regular, 4 bold). NASADIG contains several groups of subroutines. Included are subroutines for plot area and axis definition; text set-up and display; area blanking; line style set-up, interpolation, and plotting; color shading and pattern control; legend, text block, and character control; device initialization; mixed alphabets setting; and other useful functions. The usefulness of many routines is dependent on the prior definition of basic parameters. The program's control structure uses a serial-level construct with each routine restricted for activation at some prescribed level(s) of problem definition. NASADIG provides the following output device drivers: Selanar 100XL, VECTOR Move/Draw ASCII and PostScript files, Tektronix 40xx, 41xx, and 4510 Rasterizer, DEC VT-240 (4014 mode), IBM AT/PC compatible with SmartTerm 240 emulator, HP Lasergrafix Film Recorder, QMS 800/1200, DEC LN03+ Laserprinters, and HP LaserJet (Series III). NASADIG is written in FORTRAN and is available for several platforms. NASADIG 5.7 is available for DEC VAX series computers running VMS 5.0 or later (MSC-21801), Cray X-MP and Y-MP series computers running UNICOS (COS-10049), and Amdahl 5990 mainframe computers running UTS (COS-10050). NASADIG 5.1 is available for UNIX-based operating systems (MSC-22001). The UNIX version has been successfully implemented on Sun4 series computers running SunOS, SGI IRIS computers running IRIX, Hewlett Packard 9000 computers running HP-UX, and Convex computers running Convex OS (MSC-22001). The standard distribution medium for MSC-21801 is a set of two 6250 BPI 9-track magnetic tapes in DEC VAX BACKUP format. It is also available on a set of two TK50 tape cartridges in DEC VAX BACKUP format. The standard distribution medium for COS-10049 and COS-10050 is a 6250 BPI 9-track magnetic tape in UNIX tar format. Other distribution media and formats may be available upon request. The standard distribution medium for MSC-22001 is a .25 inch streaming magnetic tape cartridge (Sun QIC-24) in UNIX tar format. Alternate distribution media and formats are available upon request. With minor modification, the UNIX source code can be ported to other platforms including IBM PC/AT series computers and compatibles. NASADIG is also available bundled with TRASYS, the Thermal Radiation Analysis System (COS-10026, DEC VAX version; COS-10040, CRAY version).

  13. NASADIG - NASA DEVICE INDEPENDENT GRAPHICS LIBRARY (UNIX VERSION)

    NASA Technical Reports Server (NTRS)

    Rogers, J. E.

    1994-01-01

    The abstract for this entry duplicates, verbatim, the NASADIG description given in the preceding record; see that entry for the full feature list, supported platforms, and distribution media, which apply equally to the UNIX version (MSC-22001).

  14. Chemical calculations on Cray computers

    NASA Technical Reports Server (NTRS)

    Taylor, Peter R.; Bauschlicher, Charles W., Jr.; Schwenke, David W.

    1989-01-01

    The influence of recent developments in supercomputing on computational chemistry is discussed, with particular reference to Cray computers and their pipelined vector/limited parallel architectures. After reviewing Cray hardware and software, the performance of different elementary program structures is examined, and effective methods for improving program performance are outlined. The computational strategies appropriate for obtaining optimum performance in applications to quantum chemistry and dynamics are discussed. Finally, some discussion is given of new developments and future hardware and software improvements.

  15. A multi-platform evaluation of the randomized CX low-rank matrix factorization in Spark

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Gittens, Alex; Kottalam, Jey; Yang, Jiyan

    We investigate the performance and scalability of the randomized CX low-rank matrix factorization and demonstrate its applicability through the analysis of a 1TB mass spectrometry imaging (MSI) dataset, using Apache Spark on an Amazon EC2 cluster, a Cray XC40 system, and an experimental Cray cluster. We implemented this factorization both as a parallelized C implementation with hand-tuned optimizations and in Scala using the Apache Spark high-level cluster computing framework. We obtained consistent performance across the three platforms: using Spark we were able to process the 1TB size dataset in under 30 minutes with 960 cores on all systems, with the fastest times obtained on the experimental Cray cluster. In comparison, the C implementation was 21X faster on the Amazon EC2 system, due to careful cache optimizations, bandwidth-friendly access of matrices and vector computation using SIMD units. We report these results and their implications on the hardware and software issues arising in supporting data-centric workloads in parallel and distributed environments.
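
    The core of a CX factorization is randomized column selection. The hedged C sketch below samples columns with probability proportional to their squared norms; the paper's method uses leverage scores computed from a randomized SVD, so treat this as a simplified stand-in with made-up data.

        /* Norm-based column sampling, a simplified proxy for CX selection. */
        #include <stdio.h>
        #include <stdlib.h>

        #define M 4
        #define NCOL 6

        int main(void) {
            double A[M][NCOL] = {
                {1,0,2,0,1,3},{0,1,0,2,1,0},{2,0,1,0,3,1},{0,2,0,1,0,2}};
            double p[NCOL], total = 0.0;

            for (int j = 0; j < NCOL; j++) {      /* squared column norms */
                p[j] = 0.0;
                for (int i = 0; i < M; i++) p[j] += A[i][j] * A[i][j];
                total += p[j];
            }
            srand(1);
            for (int s = 0; s < 3; s++) {         /* draw c = 3 columns   */
                double r = total * rand() / ((double)RAND_MAX + 1.0);
                int j = 0;
                while (j < NCOL - 1 && r >= p[j]) { r -= p[j]; j++; }
                printf("sampled column %d (score %.1f)\n", j, p[j]);
            }
            return 0;
        }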

  16. surf3d: A 3-D finite-element program for the analysis of surface and corner cracks in solids subjected to mode-1 loadings

    NASA Technical Reports Server (NTRS)

    Raju, I. S.; Newman, J. C., Jr.

    1993-01-01

    A computer program, surf3d, was developed that uses the 3-D finite-element method to calculate the stress-intensity factors for surface, corner, and embedded cracks in finite-thickness plates with and without circular holes. The cracks are assumed to be either elliptic or part-elliptic in shape. The computer program uses eight-noded hexahedral elements to model the solid, with a skyline storage and solver. The stress-intensity factors are evaluated using the force method, the crack-opening displacement method, and the 3-D virtual crack closure method. The manual describes the input to and the output of the surf3d program, demonstrates the use of the program, and describes the calculation of the stress-intensity factors. Several examples with sample data files are included with the manual. To facilitate modeling of the user's crack configuration and loading, a companion preprocessor program called gensurf was also developed. The gensurf program is a three-dimensional mesh generator that requires minimal input and builds a complete data file for surf3d. The program surf3d is operational on Unix machines such as the CRAY Y-MP, CRAY-2, and Convex C-220.
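
    As a pointer to what the virtual crack closure evaluation involves, the following C sketch computes a mode-I energy release rate G from a nodal force ahead of the crack front and the opening displacement behind it, then converts it to K_I under plane strain. All numbers are illustrative assumptions, not values from surf3d.

        /* Virtual crack closure idea in miniature: G = F*dw / (2*da*b),
         * K_I = sqrt(G * E / (1 - nu^2)) for plane strain. */
        #include <stdio.h>
        #include <math.h>

        int main(void) {
            double Fz = 120.0;    /* nodal force ahead of the front, N    */
            double dw = 2.0e-5;   /* crack-opening displacement behind, m */
            double da = 1.0e-3;   /* element length along the crack, m    */
            double b  = 1.0e-3;   /* element width along the front, m     */
            double E  = 70.0e9;   /* Young's modulus, Pa (assumed)        */
            double nu = 0.33;

            double G  = Fz * dw / (2.0 * da * b);    /* J/m^2            */
            double Ep = E / (1.0 - nu * nu);         /* plane strain     */
            double K1 = sqrt(G * Ep);                /* Pa*sqrt(m)       */
            printf("G = %.2f J/m^2, K_I = %.3e Pa*sqrt(m)\n", G, K1);
            return 0;
        }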

  17. WOMBAT: A Scalable and High-performance Astrophysical Magnetohydrodynamics Code

    NASA Astrophysics Data System (ADS)

    Mendygral, P. J.; Radcliffe, N.; Kandalla, K.; Porter, D.; O'Neill, B. J.; Nolting, C.; Edmon, P.; Donnert, J. M. F.; Jones, T. W.

    2017-02-01

    We present a new code for astrophysical magnetohydrodynamics specifically designed and optimized for high performance and scaling on modern and future supercomputers. We describe a novel hybrid OpenMP/MPI programming model that emerged from a collaboration between Cray, Inc. and the University of Minnesota. This design utilizes MPI-RMA optimized for thread scaling, which allows the code to run extremely efficiently at very high thread counts ideal for the latest generation of multi-core and many-core architectures. Such performance characteristics are needed in the era of “exascale” computing. We describe and demonstrate our high-performance design in detail with the intent that it may be used as a model for other, future astrophysical codes intended for applications demanding exceptional performance.
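
    A minimal C sketch of the hybrid pattern described above: OpenMP threads prepare a buffer inside each MPI rank, and a one-sided MPI_Put pushes it into a neighbor's RMA window. WOMBAT's actual exchange machinery is far more elaborate (thread-scalable RMA, overlap of computation and communication), so this only shows the basic ingredients.

        /* Hybrid OpenMP/MPI with one-sided RMA ghost exchange. */
        #include <stdio.h>
        #include <mpi.h>

        #define NGHOST 64

        int main(int argc, char **argv) {
            int provided, rank, np;
            MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
            MPI_Comm_rank(MPI_COMM_WORLD, &rank);
            MPI_Comm_size(MPI_COMM_WORLD, &np);

            double *win_buf;
            MPI_Win win;
            MPI_Win_allocate(NGHOST * sizeof(double), sizeof(double),
                             MPI_INFO_NULL, MPI_COMM_WORLD, &win_buf, &win);

            double ghosts[NGHOST];
            #pragma omp parallel for       /* threads fill the send buffer */
            for (int i = 0; i < NGHOST; i++) ghosts[i] = rank + 0.001 * i;

            MPI_Win_fence(0, win);
            MPI_Put(ghosts, NGHOST, MPI_DOUBLE, (rank + 1) % np,
                    0, NGHOST, MPI_DOUBLE, win);
            MPI_Win_fence(0, win);         /* complete the exchange        */

            printf("rank %d got %.3f\n", rank, win_buf[0]);
            MPI_Win_free(&win);
            MPI_Finalize();
            return 0;
        }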

  18. Shooting and bouncing rays - Calculating the RCS of an arbitrarily shaped cavity

    NASA Technical Reports Server (NTRS)

    Ling, Hao; Chou, Ri-Chee; Lee, Shung-Wu

    1989-01-01

    A ray-shooting approach is presented for calculating the interior radar cross section (RCS) of a partially open cavity. A dense grid of rays is launched into the cavity through the opening; the rays bounce from the cavity walls according to the laws of geometrical optics and eventually exit via the aperture. The method tracks each ray and determines its geometrical optics field by taking into consideration (1) the geometrical divergence factor, (2) polarization, and (3) material loading of the cavity walls. A physical optics scheme is then applied to compute the backscattered field from the exit rays. The method is so simple in concept that there is virtually no restriction on the shape or material loading of the cavity. Numerical results obtained by this method are compared with those from modal analysis for a circular cylinder terminated by a PEC plate. RCS results for an S-bend circular cylinder generated on the Cray X-MP supercomputer show significant RCS reduction. Some of the limitations and possible extensions of this technique are discussed.
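
    The bouncing step itself reduces to specular reflection, r = d - 2(d.n)n. The toy C sketch below traces one ray inside a 2-D strip cavity until it leaves through the aperture; a real shooting-and-bouncing-rays code would also carry the divergence factor, polarization, and wall materials mentioned above.

        /* Trace one ray in the strip 0 <= y <= 1, entering and exiting at y = 1. */
        #include <stdio.h>

        int main(void) {
            double dx = 0.8, dy = -0.6;      /* unit ray direction (assumed) */
            double x  = 0.0, y  = 1.0;       /* enters through the aperture  */
            int bounces;

            for (bounces = 0; bounces < 100; bounces++) {
                /* advance to the next wall of the strip */
                double t = (dy < 0.0) ? -y / dy : (1.0 - y) / dy;
                x += t * dx;
                y += t * dy;
                if (y > 1.0 - 1e-12) break;  /* back out through the aperture */
                dy = -dy;    /* specular: r = d - 2(d.n)n with n = (0,1) */
            }
            printf("ray exits at x = %.2f after %d wall bounces\n", x, bounces);
            return 0;
        }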

  19. A new procedure for dynamic adaption of three-dimensional unstructured grids

    NASA Technical Reports Server (NTRS)

    Biswas, Rupak; Strawn, Roger

    1993-01-01

    A new procedure is presented for the simultaneous coarsening and refinement of three-dimensional unstructured tetrahedral meshes. This algorithm allows for localized grid adaption that is used to capture aerodynamic flow features such as vortices and shock waves in helicopter flowfield simulations. The mesh-adaption algorithm is implemented in the C programming language and uses a data structure consisting of a series of dynamically-allocated linked lists. These lists allow the mesh connectivity to be rapidly reconstructed when individual mesh points are added and/or deleted. The algorithm allows the mesh to change in an anisotropic manner in order to efficiently resolve directional flow features. The procedure has been successfully implemented on a single processor of a Cray Y-MP computer. Two sample cases are presented involving three-dimensional transonic flow. Computed results show good agreement with conventional structured-grid solutions for the Euler equations.
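
    A hedged C sketch of the kind of dynamically allocated linked-list connectivity the abstract describes: each vertex keeps a list of incident cells so that refinement and coarsening can insert and unlink tetrahedra in O(1) per reference. The structure is illustrative, not the program's actual layout.

        /* Per-vertex incidence lists for a mutable tetrahedral mesh. */
        #include <stdio.h>
        #include <stdlib.h>

        typedef struct CellRef {
            int cell;                    /* index of an incident tetrahedron */
            struct CellRef *next;
        } CellRef;

        typedef struct {
            double xyz[3];
            CellRef *cells;              /* head of the incidence list */
        } Vertex;

        static void add_cell(Vertex *v, int cell) {
            CellRef *r = malloc(sizeof *r);
            r->cell = cell;
            r->next = v->cells;          /* O(1) insertion at the head */
            v->cells = r;
        }

        static void remove_cell(Vertex *v, int cell) {
            CellRef **p = &v->cells;     /* unlink when a tet is coarsened away */
            while (*p && (*p)->cell != cell) p = &(*p)->next;
            if (*p) { CellRef *dead = *p; *p = dead->next; free(dead); }
        }

        int main(void) {
            Vertex v = {{0, 0, 0}, NULL};
            add_cell(&v, 7); add_cell(&v, 12);
            remove_cell(&v, 7);
            for (CellRef *r = v.cells; r; r = r->next)
                printf("vertex touches cell %d\n", r->cell);
            return 0;
        }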

  20. ORNL Cray X1 evaluation status report

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Agarwal, P.K.; Alexander, R.A.; Apra, E.

    2004-05-01

    On August 15, 2002 the Department of Energy (DOE) selected the Center for Computational Sciences (CCS) at Oak Ridge National Laboratory (ORNL) to deploy a new scalable vector supercomputer architecture for solving important scientific problems in climate, fusion, biology, nanoscale materials and astrophysics. ''This program is one of the first steps in an initiative designed to provide U.S. scientists with the computational power that is essential to 21st century scientific leadership,'' said Dr. Raymond L. Orbach, director of the department's Office of Science. In FY03, CCS procured a 256-processor Cray X1 to evaluate the processors, memory subsystem, scalability of the architecture, and software environment, and to predict the expected sustained performance on key DOE applications codes. The results of the micro-benchmarks and kernel benchmarks show the architecture of the Cray X1 to be exceptionally fast for most operations. The best results are shown on large problems, where it is not possible to fit the entire problem into the cache of the processors. These large problems are exactly the types of problems that are important for the DOE and ultra-scale simulation. Application performance is found to be markedly improved by this architecture: - Large-scale simulations of high-temperature superconductors run 25 times faster than on an IBM Power4 cluster using the same number of processors. - Best performance of the parallel ocean program (POP v1.4.3) is 50 percent higher than on Japan's Earth Simulator and 5 times higher than on an IBM Power4 cluster. - A fusion application, global GYRO transport, was found to be 16 times faster on the X1 than on an IBM Power3. The increased performance allowed simulations to fully resolve questions raised by a prior study. - The transport kernel in the AGILE-BOLTZTRAN astrophysics code runs 15 times faster than on an IBM Power4 cluster using the same number of processors. - Molecular dynamics simulations related to the phenomenon of photon echo run 8 times faster than previously achieved. Even at 256 processors, the Cray X1 system is already outperforming other supercomputers with thousands of processors for a certain class of applications such as climate modeling and some fusion applications. This evaluation is the outcome of a number of meetings with both high-performance computing (HPC) system vendors and application experts over the past 9 months and has received broad-based support from the scientific community and other agencies.

  1. Gigaflop performance on a CRAY-2: Multitasking a computational fluid dynamics application

    NASA Technical Reports Server (NTRS)

    Tennille, Geoffrey M.; Overman, Andrea L.; Lambiotte, Jules J.; Streett, Craig L.

    1991-01-01

    The methodology is described for converting a large, long-running application code that executed on a single processor of a CRAY-2 supercomputer to a version that executed efficiently on multiple processors. Although the conversion of every application is different, a discussion of the types of modification used to achieve gigaflop performance is included to assist others in the parallelization of applications for CRAY computers, especially those that were developed for other computers. An existing application, from the discipline of computational fluid dynamics, that had utilized over 2000 hrs of CPU time on the CRAY-2 during the previous year was chosen as a test case to study the effectiveness of multitasking on a CRAY-2. The nature of the dominant calculations within the application indicated that a sustained computational rate of 1 billion floating-point operations per second, or 1 gigaflop, might be achieved. The code was first analyzed and modified for optimal performance on a single processor in a batch environment. After optimal performance on a single CPU was achieved, the code was modified to use multiple processors in a dedicated environment. The results of these two efforts were merged into a single code that had a sustained computational rate of over 1 gigaflop on a CRAY-2. Timings and analysis of performance are given for both single- and multiple-processor runs.

  2. Performance Analysis of a Hybrid Overset Multi-Block Application on Multiple Architectures

    NASA Technical Reports Server (NTRS)

    Djomehri, M. Jahed; Biswas, Rupak

    2003-01-01

    This paper presents a detailed performance analysis of a multi-block overset grid computational fluid dynamics application on multiple state-of-the-art computer architectures. The application is implemented using a hybrid MPI+OpenMP programming paradigm that exploits both coarse and fine-grain parallelism; the former via MPI message passing and the latter via OpenMP directives. The hybrid model also extends the applicability of multi-block programs to large clusters of SMP nodes by overcoming the restriction that the number of processors be less than the number of grid blocks. A key kernel of the application, namely the LU-SGS linear solver, had to be modified to enhance the performance of the hybrid approach on the target machines. Investigations were conducted on cacheless Cray SX6 vector processors, cache-based IBM Power3 and Power4 architectures, and single system image SGI Origin3000 platforms. Overall results for complex vortex dynamics simulations demonstrate that the SX6 achieves the highest performance and outperforms the RISC-based architectures; however, the best scaling performance was achieved on the Power3.
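
    The two-level decomposition can be sketched as follows in C: MPI ranks cycle over grid blocks (coarse grain) while OpenMP threads split the point loop within a block (fine grain), which is how ranks times threads can exceed the block count. The "solve" is a stand-in expression, not the LU-SGS kernel.

        /* Coarse-grain MPI over blocks, fine-grain OpenMP within a block. */
        #include <stdio.h>
        #include <mpi.h>

        #define NBLOCKS 4
        #define NPTS    100000

        int main(int argc, char **argv) {
            int rank, np;
            MPI_Init(&argc, &argv);
            MPI_Comm_rank(MPI_COMM_WORLD, &rank);
            MPI_Comm_size(MPI_COMM_WORLD, &np);

            static double q[NBLOCKS][NPTS];
            double local = 0.0, global;

            for (int b = rank; b < NBLOCKS; b += np) {   /* my blocks (MPI)  */
                #pragma omp parallel for reduction(+:local)
                for (int i = 0; i < NPTS; i++) {         /* points (OpenMP)  */
                    q[b][i] = b + 1e-6 * i;              /* stand-in "solve" */
                    local += q[b][i];
                }
            }
            MPI_Allreduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM,
                          MPI_COMM_WORLD);
            if (rank == 0) printf("residual-like sum = %e\n", global);
            MPI_Finalize();
            return 0;
        }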

  3. AMR on the CM-2

    NASA Technical Reports Server (NTRS)

    Berger, Marsha J.; Saltzman, Jeff S.

    1992-01-01

    We describe the development of a structured adaptive mesh refinement (AMR) algorithm for the Connection Machine-2 (CM-2). We develop a data layout scheme that preserves locality even for communication between fine and coarse grids. On an 8K-processor partition of a 32K machine we achieve performance slightly below that of one CPU of the Cray Y-MP. We apply our algorithm to an inviscid compressible flow problem.

  4. Factoring symmetric indefinite matrices on high-performance architectures

    NASA Technical Reports Server (NTRS)

    Jones, Mark T.; Patrick, Merrell L.

    1990-01-01

    The Bunch-Kaufman algorithm is the method of choice for factoring symmetric indefinite matrices in many applications. However, the Bunch-Kaufman algorithm does not take advantage of high-performance architectures such as the Cray Y-MP. Three new algorithms, based on Bunch-Kaufman factorization, that take advantage of such architectures are described. Results from an implementation of the third algorithm are presented.
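
    The heart of Bunch-Kaufman pivoting is a cheap test built around the constant alpha = (1 + sqrt(17))/8 that decides between 1x1 and 2x2 pivots. The C fragment below shows only the first comparison of that test, with illustrative values; the full algorithm has a second stage and the actual elimination, both omitted here.

        /* First Bunch-Kaufman pivot comparison (simplified sketch). */
        #include <stdio.h>
        #include <math.h>

        int main(void) {
            double alpha  = (1.0 + sqrt(17.0)) / 8.0;   /* ~0.6404 */
            double a_kk   = 0.001;  /* current diagonal entry (illustrative) */
            double lambda = 2.5;    /* largest |a_ik| below the diagonal     */

            if (fabs(a_kk) >= alpha * lambda)
                printf("take a 1x1 pivot on the diagonal\n");
            else
                printf("examine the column of the largest entry; "
                       "a 2x2 pivot may be needed\n");
            printf("alpha = %.4f\n", alpha);
            return 0;
        }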

  5. Performance measurements and operational characteristics of the Storage Tek ACS 4400 tape library with the Cray Y-MP EL

    NASA Technical Reports Server (NTRS)

    Hull, Gary; Ranade, Sanjay

    1993-01-01

    With over 5000 units sold, the Storage Tek Automated Cartridge System (ACS) 4400 tape library is currently the most popular large automated tape library. Based on 3480/90 tape technology, the library is used as the migration device ('nearline' storage) in high-performance mass storage systems. In its maximum configuration, one ACS 4400 tape library houses sixteen 3480/3490 tape drives and is capable of holding approximately 6000 cartridge tapes. The maximum storage capacity of one library using 3480 tapes is 1.2 TB and the advertised aggregate I/O rate is about 24 MB/s. This paper reports on an extensive set of tests designed to accurately assess the performance capabilities and operational characteristics of one STK ACS 4400 tape library holding approximately 5200 cartridge tapes and configured with eight 3480 tape drives. A Cray Y-MP EL2-256 was configured as its host machine. More than 40,000 tape jobs were run in a variety of conditions to gather data in the areas of channel speed characteristics, robotics motion, timed tape mounts, and timed tape reads and writes.

  6. WOMBAT: A Scalable and High-performance Astrophysical Magnetohydrodynamics Code

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Mendygral, P. J.; Radcliffe, N.; Kandalla, K.

    2017-02-01

    The abstract for this OSTI entry duplicates, verbatim, the WOMBAT description given earlier in these records (NASA Astrophysics Data System entry); see that entry for the full text.

  7. Cray Research, Inc. Cray 1-S, Cray FORTRAN Translator (CFT) version 1.11 Bugfix 1. Validation summary report

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Not Available

    1983-09-09

    This Validation Summary Report (VSR) for the Cray Research, Inc., CRAY FORTRAN Translator (CFT) Version 1.11 Bugfix 1 running under the CRAY Operating System (COS) Version 1.12 provides a consolidated summary of the results obtained from the validation of the subject compiler against the 1978 FORTRAN Standard (X3.9-1978/FIPS PUB 69). The compiler was validated against the Full Level FORTRAN level of FIPS PUB 69. The VSR is made up of several sections showing all the discrepancies found, if any. These include an overview of the validation which lists all categories of discrepancies together with the tests which failed.

  8. Resource-Efficient, Hierarchical Auto-Tuning of a Hybrid Lattice Boltzmann Computation on the Cray XT4

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Computational Research Division, Lawrence Berkeley National Laboratory; NERSC, Lawrence Berkeley National Laboratory; Computer Science Department, University of California, Berkeley

    2009-05-04

    We apply auto-tuning to a hybrid MPI-pthreads lattice Boltzmann computation running on the Cray XT4 at National Energy Research Scientific Computing Center (NERSC). Previous work showed that multicore-specific auto-tuning can improve the performance of lattice Boltzmann magnetohydrodynamics (LBMHD) by a factor of 4x when running on dual- and quad-core Opteron dual-socket SMPs. We extend these studies to the distributed memory arena via a hybrid MPI/pthreads implementation. In addition to conventional auto-tuning at the local SMP node, we tune at the message-passing level to determine the optimal aspect ratio as well as the correct balance between MPI tasks and threads per MPI task. Our study presents a detailed performance analysis when moving along an isocurve of constant hardware usage: fixed total memory, total cores, and total nodes. Overall, our work points to approaches for improving intra- and inter-node efficiency on large-scale multicore systems for demanding scientific applications.

  9. Revealing topographic lineaments through IHS enhancement of DEM data. [Digital Elevation Model

    NASA Technical Reports Server (NTRS)

    Murdock, Gary

    1990-01-01

    Intensity-hue-saturation (IHS) processing of slope (dip), aspect (dip direction), and elevation is used to enhance digital elevation model (DEM) data from northwestern Nevada, revealing subtle topographic lineaments that may not be obvious in the unprocessed data. This IHS method of lineament identification was applied to a mosaic of 12 square degrees using a Cray Y-MP8/864. Square arrays from 3 x 3 to 31 x 31 points were tested, as well as several different slope enhancements. When relatively few points are used to fit the plane, lineaments of various lengths are observed, and a mechanism for lineament classification is described. An area encompassing the gold deposits of the Carlin trend, extending from Rain in the southeast to Midas in the northwest, is investigated in greater detail. The orientation and density of lineaments may be determined on the gently sloping pediment surface as well as in the more steeply sloping ranges.
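
    A hedged C sketch of the preprocessing step implied above: slope and aspect computed from a DEM by centered finite differences over a 3 x 3 neighborhood, which would then drive the intensity and hue channels of an IHS display. The elevations and grid spacing are made-up values, and aspect sign conventions vary between packages.

        /* Slope and aspect from a toy DEM by centered differences. */
        #include <stdio.h>
        #include <math.h>

        #define PI 3.14159265358979
        #define NR 4
        #define NC 4

        int main(void) {
            double z[NR][NC] = {       /* toy elevations, meters */
                {100,102,104,106},{101,103,105,107},
                {102,104,106,108},{103,105,107,109}};
            double dx = 30.0;          /* grid spacing, m (assumed) */

            for (int i = 1; i < NR - 1; i++)
                for (int j = 1; j < NC - 1; j++) {
                    double dzdx = (z[i][j+1] - z[i][j-1]) / (2.0 * dx);
                    double dzdy = (z[i+1][j] - z[i-1][j]) / (2.0 * dx);
                    double slope  = atan(hypot(dzdx, dzdy)) * 180.0 / PI;
                    double aspect = atan2(dzdy, -dzdx) * 180.0 / PI;
                    printf("(%d,%d) slope %.2f deg, aspect %.1f deg\n",
                           i, j, slope, aspect);
                }
            return 0;
        }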

  10. The fusion code XGC: Enabling kinetic study of multi-scale edge turbulent transport in ITER [Book Chapter

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    D'Azevedo, Eduardo; Abbott, Stephen; Koskela, Tuomas

    The XGC fusion gyrokinetic code combines state-of-the-art, portable computational and algorithmic technologies to enable complicated multiscale simulations of turbulence and transport dynamics in ITER edge plasma on the largest US open-science computer, the CRAY XK7 Titan, at its maximal heterogeneous capability. Such simulations were not possible before because the time-to-solution was more than a factor of 10 too long to complete one physics case in under 5 days of wall-clock time. Frontier techniques such as nested OpenMP parallelism, adaptive parallel I/O, staging I/O and data reduction using dynamic and asynchronous applications interactions, dynamic repartitioning for balancing computational work in pushing particles and in grid related work, scalable and accurate discretization algorithms for non-linear Coulomb collisions, and communication-avoiding subcycling technology for pushing particles on both CPUs and GPUs are also utilized to dramatically improve the scalability and time-to-solution, hence enabling the difficult kinetic ITER edge simulation on a present-day leadership class computer.

  11. Exploring Accelerating Science Applications with FPGAs

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Storaasli, Olaf O; Strenski, Dave

    2007-01-01

    FPGA hardware and tools (VHDL, Viva, MitrionC and CHiMPS) are described. FPGA performance is evaluated on two Cray XD1 systems (Virtex-II Pro 50 and Virtex-4 LX160) for human genome (DNA and protein) sequence comparisons for a computational biology code (FASTA). Scalable FPGA speedups of 50X (Virtex-II) and 100X (Virtex-4) over a 2.2 GHz Opteron were achieved. Coding and IO issues faced for human genome data are described.

  12. The growth of the UniTree mass storage system at the NASA Center for Computational Sciences: Some lessons learned

    NASA Technical Reports Server (NTRS)

    Tarshish, Adina; Salmon, Ellen

    1994-01-01

    In October 1992, the NASA Center for Computational Sciences made its Convex-based UniTree system generally available to users. The ensuing months saw growth in every area. Within 26 months, data under UniTree control grew from nil to over 12 terabytes, nearly all of it stored on robotically mounted tape. HiPPI/UltraNet was added to enhance connectivity, and later HiPPI/TCP was added as well. Disks and robotic tape silos were added to those already under UniTree's control, and 18-track tapes were upgraded to 36-track. The primary data source for UniTree, the facility's Cray Y-MP/4-128, first doubled its processing power and then was replaced altogether by a C98/6-256 with nearly two-and-a-half times the Y-MP's combined peak gigaflops. The Convex/UniTree software was upgraded from version 1.5 to 1.7.5, and then to 1.7.6. Finally, the server itself, a Convex C3240, was upgraded to a C3830 with a second I/O bay, doubling the C3240's memory and capacity for I/O. This paper describes insights gained and reinforced with the burgeoning demands on the UniTree storage system and the significant increases in performance gained from the many upgrades.

  13. Development of a CRAY 1 version of the SINDA program. [thermo-structural analyzer program]

    NASA Technical Reports Server (NTRS)

    Juba, S. M.; Fogerson, P. E.

    1982-01-01

    The SINDA thermal analyzer program was transferred from the UNIVAC 1110 computer to a CYBER and then to a CRAY 1. Significant changes to the code were required for the program to execute efficiently on the CYBER and CRAY. The program was tested on the CRAY using a thermal math model of the shuttle that was too large to run on either the UNIVAC or the CYBER. An effort was then begun to further modify the SINDA code to make effective use of the vector capabilities of the CRAY.

  14. Parallel eigenanalysis of finite element models in a completely connected architecture

    NASA Technical Reports Server (NTRS)

    Akl, F. A.; Morel, M. R.

    1989-01-01

    A parallel algorithm is presented for the solution of the generalized eigenproblem in linear elastic finite element analysis, (K)(phi) = (M)(phi)(omega), where (K) and (M) are of order N and (omega) is of order q. The concurrent solution of the eigenproblem is based on the multifrontal/modified subspace method and is achieved in a completely connected parallel architecture in which each processor is allowed to communicate with all other processors. The algorithm was successfully implemented on a tightly coupled multiple-instruction multiple-data parallel processing machine, the Cray X-MP. A finite element model is divided into m domains, each of which is assumed to contain n elements. Each domain is then assigned to a processor, or to a logical processor (task) if the number of domains exceeds the number of physical processors. The macrotasking library routines are used in mapping each domain to a user task. Computational speedup and efficiency are used to determine the effectiveness of the algorithm. The effects of the number of domains, the number of degrees of freedom located along the global fronts, and the dimension of the subspace on the performance of the algorithm are investigated. A parallel finite element dynamic analysis program, p-feda, is documented, and the performance of its subroutines in a parallel environment is analyzed.
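
    For readers unfamiliar with the subspace method, one iteration of the classical scheme underlying the modified variant can be written schematically in the notation above, with q trial vectors. This is standard subspace iteration, not the paper's exact modified algorithm; the multifrontal solver supplies the factorization of (K) used in the first step:

        Phi_bar     = K^{-1} M Phi^{(k)}          (simultaneous inverse iteration)
        K_tilde     = Phi_bar^T K Phi_bar         (projected q x q stiffness)
        M_tilde     = Phi_bar^T M Phi_bar         (projected q x q mass)
        K_tilde Q   = M_tilde Q Omega             (reduced eigenproblem)
        Phi^{(k+1)} = Phi_bar Q                   (improved eigenvector estimates)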

  15. Preconditioned implicit solvers for the Navier-Stokes equations on distributed-memory machines

    NASA Technical Reports Server (NTRS)

    Ajmani, Kumud; Liou, Meng-Sing; Dyson, Rodger W.

    1994-01-01

    The GMRES method is parallelized, and combined with local preconditioning to construct an implicit parallel solver to obtain steady-state solutions for the Navier-Stokes equations of fluid flow on distributed-memory machines. The new implicit parallel solver is designed to preserve the convergence rate of the equivalent 'serial' solver. A static domain decomposition is used to partition the computational domain amongst the available processing nodes of the parallel machine. The SPMD (Single-Program Multiple-Data) programming model is combined with message-passing tools to develop the parallel code on a 32-node Intel Hypercube and a 512-node Intel Delta machine. The implicit parallel solver is validated for internal and external flow problems, and its solutions are found to be identical to flow solutions obtained on a Cray Y-MP/8. A peak computational speed of 2300 MFlops/sec has been achieved on 512 nodes of the Intel Delta machine, for a problem size of 1024 K equations (256 K grid points).
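
    The static domain decomposition plus SPMD message passing described above follows a pattern that can be sketched as below. The sketch uses present-day MPI rather than the Intel-specific message-passing tools of the paper, and a 1-D decomposition with ghost cells; all names are illustrative.

        ! Minimal SPMD halo-exchange pattern for a 1-D static domain
        ! decomposition: each rank owns nloc cells plus two ghost cells
        ! refreshed from its neighbors before each solver sweep.
        program halo_demo
          use mpi
          implicit none
          integer, parameter :: nloc = 1000
          real(8) :: u(0:nloc+1)
          integer :: ierr, rank, nprocs, left, right
          integer :: stat(MPI_STATUS_SIZE)
          call MPI_Init(ierr)
          call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)
          call MPI_Comm_size(MPI_COMM_WORLD, nprocs, ierr)
          left  = rank - 1; if (left  < 0)       left  = MPI_PROC_NULL
          right = rank + 1; if (right >= nprocs) right = MPI_PROC_NULL
          u = real(rank, 8)
          ! send my last owned cell right, receive left neighbor's into u(0)
          call MPI_Sendrecv(u(nloc), 1, MPI_DOUBLE_PRECISION, right, 0, &
                            u(0),    1, MPI_DOUBLE_PRECISION, left,  0, &
                            MPI_COMM_WORLD, stat, ierr)
          ! send my first owned cell left, receive right neighbor's into u(nloc+1)
          call MPI_Sendrecv(u(1),      1, MPI_DOUBLE_PRECISION, left,  1, &
                            u(nloc+1), 1, MPI_DOUBLE_PRECISION, right, 1, &
                            MPI_COMM_WORLD, stat, ierr)
          call MPI_Finalize(ierr)
        end program halo_demo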

  16. Multitasking a three-dimensional Navier-Stokes algorithm on the Cray-2

    NASA Technical Reports Server (NTRS)

    Swisshelm, Julie M.

    1989-01-01

    A three-dimensional computational aerodynamics algorithm has been multitasked for efficient parallel execution on the Cray-2. It provides a means for examining the multitasking performance of a complete CFD application code. An embedded zonal multigrid scheme is used to solve the Reynolds-averaged Navier-Stokes equations for an internal flow model problem. The explicit nature of each component of the method allows a spatial partitioning of the computational domain to achieve a well-balanced task load for MIMD computers with vector-processing capability. Experiments have been conducted with both two- and three-dimensional multitasked cases. The best speedup attained by an individual task group was 3.54 on four processors of the Cray-2, while the entire solver yielded a speedup of 2.67 on four processors for the three-dimensional case. The multiprocessing efficiency of various types of computational tasks is examined, performance on two Cray-2s with different memory access speeds is compared, and extrapolation to larger problems is discussed.
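
    For reference, with speedup S_p = T_1 / T_p and parallel efficiency E_p = S_p / p, the reported figures correspond to E_4 = 3.54/4, roughly 0.89, for the best task group, and E_4 = 2.67/4, roughly 0.67, for the full solver.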

  17. Force user's manual: A portable, parallel FORTRAN

    NASA Technical Reports Server (NTRS)

    Jordan, Harry F.; Benten, Muhammad S.; Arenstorf, Norbert S.; Ramanan, Aruna V.

    1990-01-01

    The use of Force, a portable parallel FORTRAN for shared-memory parallel computers, is described. Force simplifies writing code for parallel computers and, once the parallel code is written, it is easily ported to computers on which Force is installed. Although Force is nearly the same for all computers, specific details are included for the Cray-2, Cray Y-MP, Convex 220, Flex/32, Encore, Sequent, and Alliant computers on which it is installed.

  18. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Maxwell, Don E; Ezell, Matthew A; Becklehimer, Jeff

    While sites generally have systems in place to monitor the health of Cray computers themselves, the cooling systems are often ignored until a computer failure requires investigation into the source of the failure. The Liebert XDP units used to cool the Cray XE/XK models, as well as the Cray proprietary cooling system used for the Cray XC30 models, provide data useful for health monitoring. Unfortunately, this valuable information is often available only to custom solutions not accessible by a center-wide monitoring system, or is simply ignored entirely. In this paper, methods and tools used to harvest the available monitoring data are discussed, and the implementation needed to integrate the data into a center-wide monitoring system at the Oak Ridge National Laboratory is provided.

  19. Reorientation of rotating fluid in microgravity environment with and without gravity jitters

    NASA Technical Reports Server (NTRS)

    Hung, R. J.; Lee, C. C.; Shyu, K. L.

    1990-01-01

    In spacecraft design, the requirements for settled propellant differ for tank pressurization, engine restart, venting, and propellant transfer. The requirement to settle or position liquid fuel over the outlet end of the spacecraft propellant tank prior to main engine restart poses a microgravity fluid behavior problem. In this paper, the dynamical behavior of liquid propellant during fluid reorientation and propellant resettling is simulated on a CRAY X-MP supercomputer to study fluid management in a microgravity environment. Results show that fluid resettlement can be accomplished more efficiently in a rotating tank than in a nonrotating tank, and that, judged by the time required between the initiation and termination of geysering to complete resettlement, performance is better with gravity jitters imposed on the settling fluid than without.

  20. Analysis and modeling of summertime convective cloud and precipitation structure over the southeastern United States

    NASA Technical Reports Server (NTRS)

    Knupp, Kevin R.

    1988-01-01

    Described is work performed under NASA Grant NAG8-654 for the period 15 March to 15 September 1988. This work entails primarily data analysis and numerical modeling efforts related to the 1986 Satellite Precipitation and Cloud Experiment (SPACE). In the following, the SPACE acronym is used along with the acronym COHMEX, which represents the encompassing Cooperative Huntsville Meteorological Experiment. Progress made during the second half of the first year of the study included: (1) installation and testing of the RAMS numerical modeling system on the Alabama CRAY X-MP/24; (2) a start on the analysis of the mesoscale convective system (MCS) of the 13 July 1986 COHMEX case; and (3) a cursory examination of a small MCS that formed over the COHMEX region on 15 July 1986. Details of each of these individual tasks are given.

  1. Optimal spacecraft attitude control using collocation and nonlinear programming

    NASA Astrophysics Data System (ADS)

    Herman, A. L.; Conway, B. A.

    1992-10-01

    Direct collocation with nonlinear programming (DCNLP) is employed to find the optimal open-loop control histories for detumbling a disabled satellite. The controls are torques and forces applied to the docking arm and joint, and torques applied about the body axes of the OMV. Solutions are obtained for cases in which various constraints are placed on the controls and in which the number of controls is reduced or increased from that considered in Conway and Widhalm (1986). DCNLP works well when applied to the optimal control problem of satellite attitude control. The formulation is straightforward and produces good results in a relatively small amount of time on a Cray X-MP with no a priori information about the optimal solution. The addition of joint acceleration to the controls significantly reduces the control magnitudes and the optimal cost. In all cases, the torques and accelerations are modest, and the optimal cost is very modest.
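
    Direct collocation schemes of this family enforce the dynamics through defect constraints on each segment of the discretized trajectory. One common choice, Hermite-Simpson collocation, is shown here as an assumed illustration rather than the paper's exact formulation: with states x_k, controls u_k, step h_k, and f_k = f(x_k, u_k),

        x_c = (x_k + x_{k+1})/2 + h_k (f_k - f_{k+1})/8
        0   = x_{k+1} - x_k - (h_k/6) (f_k + 4 f(x_c, u_c) + f_{k+1})

    and the nonlinear program minimizes the cost subject to these defects and any bounds on the controls.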

  2. GASNet-EX Performance Improvements Due to Specialization for the Cray Aries Network

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Hargrove, Paul H.; Bonachea, Dan

    This document is a deliverable for milestone STPM17-6 of the Exascale Computing Project, delivered by WBS 2.3.1.14. It reports on the improvements in performance observed on Cray XC-series systems due to enhancements made to the GASNet-EX software. These enhancements, known as “specializations”, primarily consist of replacing network-independent implementations of several recently added features with implementations tailored to the Cray Aries network. Performance gains from specialization include (1) Negotiated-Payload Active Messages improve bandwidth of a ping-pong test by up to 14%, (2) Immediate Operations reduce running time of a synthetic benchmark by up to 93%, (3) non-bulk RMA Put bandwidth is increased by up to 32%, (4) Remote Atomic performance is 70% faster than the reference on a point-to-point test and allows a hot-spot test to scale robustly, and (5) non-contiguous RMA interfaces see up to 8.6x speedups for an intra-node benchmark and 26% for inter-node. These improvements are available in the GASNet-EX 2018.3.0 release.

  3. Understanding the Cray X1 System

    NASA Technical Reports Server (NTRS)

    Cheung, Samson

    2004-01-01

    This paper helps the reader understand the characteristics of the Cray X1 vector supercomputer system, and provides hints and information to enable the reader to port codes to the system. It provides a comparison between the basic performance of the X1 platform and other platforms that are available at NASA Ames Research Center. A set of codes, solving the Laplacian equation with different parallel paradigms, is used to understand some features of the X1 compiler. An example code from the NAS Parallel Benchmarks is used to demonstrate performance optimization on the X1 platform.
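
    As an example of the kind of kernel such Laplacian test codes exercise, one Jacobi relaxation sweep is sketched below; a long, unit-stride inner loop with no dependences is the pattern a vectorizing compiler like the X1's rewards. This is a generic sketch, not one of the paper's actual test codes.

        ! One Jacobi relaxation sweep for the 2-D Laplace equation on a
        ! uniform grid; interior points only, Dirichlet boundaries assumed.
        subroutine jacobi_sweep(u, unew, nx, ny)
          implicit none
          integer, intent(in) :: nx, ny
          real(8), intent(in)  :: u(nx, ny)
          real(8), intent(out) :: unew(nx, ny)
          integer :: i, j
          do j = 2, ny - 1
             do i = 2, nx - 1   ! unit-stride, no dependences: vectorizable
                unew(i,j) = 0.25d0 * (u(i-1,j) + u(i+1,j) + u(i,j-1) + u(i,j+1))
             end do
          end do
        end subroutine jacobi_sweep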

  4. High Performance Distributed Computing in a Supercomputer Environment: Computational Services and Applications Issues

    NASA Technical Reports Server (NTRS)

    Kramer, Williams T. C.; Simon, Horst D.

    1994-01-01

    This tutorial proposes to be a practical guide for the uninitiated to the main topics and themes of high-performance computing (HPC), with particular emphasis on distributed computing. The intent is first to provide some guidance and directions in the rapidly growing field of scientific computing using both massively parallel and traditional supercomputers. Because of their considerable potential computational power, loosely or tightly coupled clusters of workstations are increasingly considered as a third alternative to both the more conventional supercomputers based on a small number of powerful vector processors and the massively parallel processors. Even though many research issues concerning the effective use of workstation clusters and their integration into a large-scale production facility are still unresolved, such clusters are already used for production computing. In this tutorial we will draw on the unique experience gained at the NAS facility at NASA Ames Research Center. Over the last five years at NAS, massively parallel supercomputers such as the Connection Machines CM-2 and CM-5 from Thinking Machines Corporation and the iPSC/860 (Touchstone Gamma Machine) and Paragon machines from Intel were used in a production supercomputer center alongside traditional vector supercomputers such as the Cray Y-MP and C90.

  5. Strategies for vectorizing the sparse matrix vector product on the CRAY XMP, CRAY 2, and CYBER 205

    NASA Technical Reports Server (NTRS)

    Bauschlicher, Charles W., Jr.; Partridge, Harry

    1987-01-01

    Large, randomly sparse matrix-vector products are important in a number of applications in computational chemistry, such as matrix diagonalization and the solution of simultaneous equations. Vectorization of this process is considered for the CRAY XMP, CRAY 2, and CYBER 205, using a matrix of dimension 20,000 with from 1 percent to 6 percent nonzeros. Efficient scatter/gather capabilities add coding flexibility and yield significant improvements in performance. For the CYBER 205, it is shown that minor changes in the I/O can reduce the CPU time by a factor of 50. Similar changes in the CRAY codes yield a far smaller improvement.
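
    The kernel in question has the generic shape below (compressed-row storage is assumed for illustration; the paper's exact data structures may differ). The indirect reference x(col(k)) in the inner loop is the gather that dedicated scatter/gather hardware accelerates.

        ! Sparse matrix-vector product y = A*x in compressed-row storage.
        subroutine spmv(n, rowptr, col, val, x, y)
          implicit none
          integer, intent(in) :: n, rowptr(n+1), col(*)
          real(8), intent(in) :: val(*), x(n)
          real(8), intent(out) :: y(n)
          integer :: i, k
          real(8) :: s
          do i = 1, n
             s = 0.0d0
             do k = rowptr(i), rowptr(i+1) - 1
                s = s + val(k) * x(col(k))   ! gather of x through col(k)
             end do
             y(i) = s
          end do
        end subroutine spmv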

  6. An efficient nonlinear relaxation technique for the three-dimensional, Reynolds-averaged Navier-Stokes equations

    NASA Technical Reports Server (NTRS)

    Edwards, Jack R.; Mcrae, D. S.

    1993-01-01

    An efficient implicit method for the computation of steady, three-dimensional, compressible Navier-Stokes flowfields is presented. A nonlinear iteration strategy based on planar Gauss-Seidel sweeps is used to drive the solution toward a steady state, with approximate factorization errors within a crossflow plane reduced by the application of a quasi-Newton technique. A hybrid discretization approach is employed, with flux-vector splitting utilized in the streamwise direction and central differences with artificial dissipation used for the transverse fluxes. Convergence histories and comparisons with experimental data are presented for several 3-D shock-boundary layer interactions. Both laminar and turbulent cases are considered, with turbulent closure provided by a modification of the Baldwin-Barth one-equation model. For the problems considered (175,000-325,000 mesh points), the algorithm provides steady-state convergence in 900-2000 CPU seconds on a single processor of a Cray Y-MP.

  7. The accuracy of quantum chemical methods for large noncovalent complexes

    PubMed Central

    Pitoňák, Michal; Řezáč, Jan; Pulay, Peter

    2013-01-01

    We evaluate the performance of the most widely used wavefunction, density functional theory, and semiempirical methods for the description of noncovalent interactions in a set of larger, mostly dispersion-stabilized noncovalent complexes (the L7 data set). The methods tested include MP2, MP3, SCS-MP2, SCS(MI)-MP2, MP2.5, MP2.X, MP2C, DFT-D, DFT-D3 (B3-LYP-D3, B-LYP-D3, TPSS-D3, PW6B95-D3, M06-2X-D3) and M06-2X, and semiempirical methods augmented with dispersion and hydrogen bonding corrections: SCC-DFTB-D, PM6-D, PM6-DH2 and PM6-D3H4. The test complexes are the octadecane dimer, the guanine trimer, the circumcoronene…adenine dimer, the coronene dimer, the guanine-cytosine dimer, the circumcoronene…guanine-cytosine dimer, and an amyloid fragment trimer containing phenylalanine residues. The best-performing method is MP2.5, with a relative root-mean-square deviation (rRMSD) of 4%. It can thus be recommended as an alternative to the CCSD(T)/CBS (alternatively QCISD(T)/CBS) benchmark for molecular systems which exceed current computational capacity. The second-best non-DFT method is MP2C, with an rRMSD of 8%. The method with the most favorable “accuracy/cost” ratio belongs to the DFT family: BLYP-D3, with an rRMSD of 8%. Semiempirical methods deliver less accurate results (rRMSD exceeding 25%). Nevertheless, their absolute errors are close to those of some much more expensive methods such as M06-2X, MP2, or SCS(MI)-MP2, and thus their price/performance ratio is excellent. PMID:24098094
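
    For context, one common definition of the relative root-mean-square deviation consistent with the usage here is

        rRMSD = 100% * sqrt( (1/N) * sum_i [ (E_i - E_i_ref) / E_i_ref ]^2 )

    where E_i and E_i_ref are the tested method's and the benchmark interaction energies for complex i. This definition is stated as an assumption for the reader's orientation; the paper's exact normalization is not given in the abstract.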

  8. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Haynes, R.A.

    The Network File System (NFS) is used in UNIX-based networks to provide transparent file sharing between heterogeneous systems. Although NFS is well-known for being weak in security, it is widely used and has become a de facto standard. This paper examines the user authentication shortcomings of NFS and the approach Sandia National Laboratories has taken to strengthen it with Kerberos. The implementation on a Cray Y-MP8/864 running UNICOS is described and resource/performance issues are discussed. 4 refs., 4 figs.

  9. Improvements to the Unstructured Mesh Generator MESH3D

    NASA Technical Reports Server (NTRS)

    Thomas, Scott D.; Baker, Timothy J.; Cliff, Susan E.

    1999-01-01

    The AIRPLANE process starts with an aircraft geometry stored in a CAD system. The surface is modeled with a mesh of triangles and then the flow solver produces pressures at surface points which may be integrated to find forces and moments. The biggest advantage is that the grid generation bottleneck of the CFD process is eliminated when an unstructured tetrahedral mesh is used. MESH3D is the key to turning around the first analysis of a CAD geometry in days instead of weeks. The flow solver part of AIRPLANE has proven to be robust and accurate over a decade of use at NASA. It has been extensively validated with experimental data and compares well with other Euler flow solvers. AIRPLANE has been applied to all the HSR geometries treated at Ames over the course of the HSR program in order to verify the accuracy of other flow solvers. The unstructured approach makes handling complete and complex geometries very simple because only the surface of the aircraft needs to be discretized, i.e. covered with triangles. The volume mesh is created automatically by MESH3D. AIRPLANE runs well on multiple platforms. Vectorization on the Cray Y-MP is reasonable for a code that uses indirect addressing. Massively parallel computers such as the IBM SP2, SGI Origin 2000, and the Cray T3E have been used with an MPI version of the flow solver and the code scales very well on these systems. AIRPLANE can run on a desktop computer as well. AIRPLANE has a future. The unstructured technologies developed as part of the HSR program are now targeting high Reynolds number viscous flow simulation. The pacing item in this effort is Navier-Stokes mesh generation.

  10. Experiences From NASA/Langley's DMSS Project

    NASA Technical Reports Server (NTRS)

    1996-01-01

    There is a trend in institutions with high performance computing and data management requirements to explore mass storage systems with peripherals directly attached to a high speed network. The Distributed Mass Storage System (DMSS) Project at the NASA Langley Research Center (LaRC) has placed such a system into production use. This paper will present the experiences, both good and bad, we have had with this system since putting it into production usage. The system is comprised of: 1) National Storage Laboratory (NSL)/UniTree 2.1, 2) IBM 9570 HIPPI-attached disk arrays (both RAID 3 and RAID 5), 3) an IBM RS6000 server, 4) HIPPI/IPI3 third party transfers between the disk array systems and the supercomputer clients, a CRAY Y-MP and a CRAY 2, 5) a "warm spare" file server, 6) transition software to convert from CRAY's Data Migration Facility (DMF) based system to DMSS, 7) an NSC PS32 HIPPI switch, and 8) a STK 4490 robotic library accessed from the IBM RS6000 block mux interface. This paper will cover: the performance of the DMSS in the following areas: file transfer rates, migration and recall, and file manipulation (listing, deleting, etc.); the appropriateness of a workstation class of file server for NSL/UniTree with LaRC's present storage requirements in mind; the role of the third party transfers between the supercomputers and the DMSS disk array systems; a detailed comparison (both in performance and functionality) between the DMF and DMSS systems; LaRC's enhancements to the NSL/UniTree system administration environment; the mechanism for DMSS to provide file server redundancy; the statistics on the availability of DMSS; and the design of, and experiences with, the locally developed transparent transition software which allowed us to make over 1.5 million DMF files available to NSL/UniTree with minimal system outage.

  11. ARCGRAPH SYSTEM - AMES RESEARCH GRAPHICS SYSTEM

    NASA Technical Reports Server (NTRS)

    Hibbard, E. A.

    1994-01-01

    Ames Research Graphics System, ARCGRAPH, is a collection of libraries and utilities which assist researchers in generating, manipulating, and visualizing graphical data. In addition, ARCGRAPH defines a metafile format that contains device independent graphical data. This file format is used with various computer graphics manipulation and animation packages at Ames, including SURF (COSMIC Program ARC-12381) and GAS (COSMIC Program ARC-12379). In its full configuration, the ARCGRAPH system consists of a two stage pipeline which may be used to output graphical primitives. Stage one is associated with the graphical primitives (i.e. moves, draws, color, etc.) along with the creation and manipulation of the metafiles. Five distinct data filters make up stage one. They are: 1) PLO which handles all 2D vector primitives, 2) POL which handles all 3D polygonal primitives, 3) RAS which handles all 2D raster primitives, 4) VEC which handles all 3D vector primitives, and 5) PO2 which handles all 2D polygonal primitives. Stage two is associated with the process of displaying graphical primitives on a device. To generate the various graphical primitives, create and reprocess ARCGRAPH metafiles, and access the device drivers in the VDI (Video Device Interface) library, users link their applications to ARCGRAPH's GRAFIX library routines. Both FORTRAN and C language versions of the GRAFIX and VDI libraries exist for enhanced portability within these respective programming environments. The ARCGRAPH libraries were developed on a VAX running VMS. Minor documented modification of various routines, however, allows the system to run on the following computers: Cray X-MP running COS (no C version); Cray 2 running UNICOS; DEC VAX running BSD 4.3 UNIX, or Ultrix; SGI IRIS Turbo running GL2-W3.5 and GL2-W3.6; Convex C1 running UNIX; Amdahl 5840 running UTS; Alliant FX8 running UNIX; Sun 3/160 running UNIX (no native device driver); Stellar GS1000 running Stellex (no native device driver); and an SGI IRIS 4D running IRIX (no native device driver). Currently with version 7.0 of ARCGRAPH, the VDI library supports the following output devices: A VT100 terminal with a RETRO-GRAPHICS board installed, a VT240 using the Tektronix 4010 emulation capability, an SGI IRIS turbo using the native GL2 library, a Tektronix 4010, a Tektronix 4105, and the Tektronix 4014. ARCGRAPH version 7.0 was developed in 1988.

  12. Two-dimensional Euler and Navier-Stokes Time accurate simulations of fan rotor flows

    NASA Technical Reports Server (NTRS)

    Boretti, A. A.

    1990-01-01

    Two numerical methods are presented which describe the unsteady flow field in the blade-to-blade plane of an axial fan rotor. These methods solve the compressible, time-dependent, Euler and the compressible, turbulent, time-dependent, Navier-Stokes conservation equations for mass, momentum, and energy. The Navier-Stokes equations are written in Favre-averaged form and are closed with an approximate two-equation turbulence model with low Reynolds number and compressibility effects included. The unsteady aerodynamic component is obtained by superposing inflow or outflow unsteadiness on the steady conditions through time-dependent boundary conditions. The integration in space is performed by using a finite volume scheme, and the integration in time is performed by using k-stage Runge-Kutta schemes, k = 2,...,5. The numerical integration algorithm allows the reduction of the computational cost of an unsteady simulation involving high frequency disturbances in both CPU time and memory requirements. Less than 200 sec of CPU time are required to advance the Euler equations on a computational grid of about 2000 grid points during 10,000 time steps on a CRAY Y-MP computer, with a required memory of less than 0.3 megawords.
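
    The k-stage Runge-Kutta schemes referred to are typically of the low-storage multi-stage form shown below, stated here as an assumed illustration rather than the paper's exact coefficients:

        u^(0)   = u^n
        u^(i)   = u^n + alpha_i * dt * R(u^(i-1)),   i = 1, ..., k
        u^(n+1) = u^(k)

    where R is the discrete residual and the alpha_i are stage coefficients (a common choice is alpha_i = 1/(k - i + 1)); only the current stage and u^n need be stored, which is what keeps the memory requirement low.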

  13. Scalability study of parallel spatial direct numerical simulation code on IBM SP1 parallel supercomputer

    NASA Technical Reports Server (NTRS)

    Hanebutte, Ulf R.; Joslin, Ronald D.; Zubair, Mohammad

    1994-01-01

    The implementation and the performance of a parallel spatial direct numerical simulation (PSDNS) code are reported for the IBM SP1 supercomputer. The spatially evolving disturbances that are associated with laminar-to-turbulent transition in three-dimensional boundary-layer flows are computed with the PSDNS code. By remapping the distributed data structure during the course of the calculation, optimized serial library routines can be utilized that substantially increase the computational performance. Although the remapping incurs a high communication penalty, the parallel efficiency of the code remains above 40% for all performed calculations. By using appropriate compile options and optimized library routines, the serial code achieves 52-56 Mflops on a single node of the SP1 (45% of theoretical peak performance). The actual performance of the PSDNS code on the SP1 is evaluated with a 'real world' simulation that consists of 1.7 million grid points. One time step of this simulation is calculated on eight nodes of the SP1 in the same time as required by a Cray Y-MP for the same simulation. The scalability information provides estimated computational costs that match the actual costs relative to changes in the number of grid points.

  14. OpenMP Parallelization and Optimization of Graph-Based Machine Learning Algorithms

    DOE PAGES

    Meng, Zhaoyi; Koniges, Alice; He, Yun Helen; ...

    2016-09-21

    In this paper, we investigate the OpenMP parallelization and optimization of two novel data classification algorithms. The new algorithms are based on graph and PDE solution techniques and provide significant accuracy and performance advantages over traditional data classification algorithms in serial mode. The methods leverage the Nystrom extension to calculate eigenvalues/eigenvectors of the graph Laplacian, and this is a self-contained module that can be used in conjunction with other graph-Laplacian-based methods such as spectral clustering. We use performance tools to identify the hotspots and memory-access patterns of the serial codes and use OpenMP as the parallelization language to parallelize the most time-consuming parts. Where possible, we also use library routines. We then optimize the OpenMP implementations and detail the performance on traditional supercomputer nodes (in our case a Cray XC30), and test the optimization steps on emerging testbed systems based on Intel's Knights Corner and Landing processors. We show both performance improvement and strong scaling behavior. A large number of optimization techniques and analyses are necessary before the algorithm reaches almost ideal scaling.
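
    The parallelization step, profile first and then put OpenMP on the most time-consuming loops, follows a standard pattern. The sketch below is a generic illustration in Fortran with OpenMP (the paper's codes and data structures are not reproduced): it parallelizes the accumulation of weighted vertex degrees, the diagonal of a graph Laplacian, over a compressed-row edge structure.

        ! Weighted vertex degrees (diagonal of the graph Laplacian D) over
        ! a CSR-like adjacency structure, parallelized across vertices.
        subroutine graph_degree(n, rowptr, w, deg)
          implicit none
          integer, intent(in) :: n, rowptr(n+1)
          real(8), intent(in) :: w(*)
          real(8), intent(out) :: deg(n)
          integer :: i, k
          real(8) :: s
          !$omp parallel do private(k, s) schedule(static)
          do i = 1, n
             s = 0.0d0
             do k = rowptr(i), rowptr(i+1) - 1
                s = s + w(k)        ! sum of edge weights incident on vertex i
             end do
             deg(i) = s
          end do
          !$omp end parallel do
        end subroutine graph_degree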

  15. Large-scale structural analysis: The structural analyst, the CSM Testbed and the NAS System

    NASA Technical Reports Server (NTRS)

    Knight, Norman F., Jr.; Mccleary, Susan L.; Macy, Steven C.; Aminpour, Mohammad A.

    1989-01-01

    The Computational Structural Mechanics (CSM) activity is developing advanced structural analysis and computational methods that exploit high-performance computers. Methods are developed in the framework of the CSM testbed software system and applied to representative complex structural analysis problems from the aerospace industry. An overview of the CSM testbed methods development environment is presented and some numerical methods developed on a CRAY-2 are described. Selected application studies performed on the NAS CRAY-2 are also summarized.

  16. The CSM testbed software system: A development environment for structural analysis methods on the NAS CRAY-2

    NASA Technical Reports Server (NTRS)

    Gillian, Ronnie E.; Lotts, Christine G.

    1988-01-01

    The Computational Structural Mechanics (CSM) Activity at Langley Research Center is developing methods for structural analysis on modern computers. To facilitate that research effort, an applications development environment has been constructed to insulate the researcher from the many computer operating systems of a widely distributed computer network. The CSM Testbed development system was ported to the Numerical Aerodynamic Simulator (NAS) Cray-2, at the Ames Research Center, to provide a high end computational capability. This paper describes the implementation experiences, the resulting capability, and the future directions for the Testbed on supercomputers.

  17. The computation of pi to 29,360,000 decimal digits using Borweins' quartically convergent algorithm

    NASA Technical Reports Server (NTRS)

    Bailey, David H.

    1988-01-01

    The quartically convergent numerical algorithm developed by Borwein and Borwein (1987) for 1/pi is implemented via a prime-modulus-transform multiprecision technique on the NASA Ames Cray-2 supercomputer to compute the first 2.936 x 10^7 digits of the decimal expansion of pi. The history of pi computations is briefly recalled; the most recent algorithms are characterized; the implementation procedures are described; and samples of the output listing are presented. Statistical analyses show that the present decimal expansion is completely random, with only acceptable numbers of long repeating strings and single-digit runs.
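
    For context, the Borwein quartic iteration referred to has the standard form below, quoted from the general literature on the algorithm rather than from this report:

        y_0 = sqrt(2) - 1,   a_0 = 6 - 4 sqrt(2)
        y_{k+1} = (1 - (1 - y_k^4)^{1/4}) / (1 + (1 - y_k^4)^{1/4})
        a_{k+1} = a_k (1 + y_{k+1})^4 - 2^{2k+3} y_{k+1} (1 + y_{k+1} + y_{k+1}^2)

    Here a_k converges quartically to 1/pi, roughly quadrupling the number of correct digits per iteration, so on the order of a dozen iterations suffice for tens of millions of digits; the prime-modulus-transform multiprecision arithmetic beneath each iteration is where nearly all of the computing time goes.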

  18. CONVEX mini manual

    NASA Technical Reports Server (NTRS)

    Tennille, Geoffrey M.; Howser, Lona M.

    1993-01-01

    The use of the CONVEX computers that are an integral part of the Supercomputing Network Subsystems (SNS) of the Central Scientific Computing Complex of LaRC is briefly described. Features of the CONVEX computers that are significantly different from the CRAY supercomputers are covered, including: FORTRAN, C, architecture of the CONVEX computers, the CONVEX environment, batch job submittal, debugging, performance analysis, utilities unique to CONVEX, and documentation. This revision reflects the addition of the Applications Compiler and the X-based debugger, CXdb. The document is intended for all CONVEX users as a ready reference to frequently asked questions and to more detailed information contained within the vendor manuals. It is appropriate for both the novice and the experienced user.

  19. Multitasking the Davidson algorithm for the large, sparse eigenvalue problem

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Umar, V.M.; Fischer, C.F.

    1989-01-01

    The authors report how the Davidson algorithm, developed for handling the eigenvalue problem for large and sparse matrices arising in quantum chemistry, was modified for use in atomic structure calculations. To date these calculations have used traditional eigenvalue methods, which limit the range of feasible calculations because of their excessive memory requirements and unsatisfactory performance attributed to time-consuming and costly processing of zero-valued elements. The replacement of a traditional matrix eigenvalue method by the Davidson algorithm reduced these limitations. Significant speedup was found, which varied with the size of the underlying problem and its sparsity. Furthermore, the range of matrix sizes that can be manipulated efficiently was expanded by more than one order of magnitude. On the CRAY X-MP the code was vectorized and the importance of gather/scatter analyzed. A parallelized version of the algorithm obtained an additional 35% reduction in execution time. Speedup due to vectorization and concurrency was also measured on the Alliant FX/8.
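
    Schematically, each Davidson step expands the search space with a diagonally preconditioned residual; the standard form is shown below for context (the authors' specific modifications for atomic structure calculations are in the paper):

        r = A y - theta y                   (residual of the current Ritz pair)
        t_i = r_i / (theta - A_ii)          (diagonal correction vector)
        V <- orthonormalize([V, t])         (expanded subspace)

    Because A enters only through matrix-vector products and its diagonal, zero-valued elements need never be processed, which is what makes the method attractive for large sparse matrices and a natural target for gather/scatter vectorization.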

  20. The ASC Sequoia Programming Model

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Seager, M

    2008-08-06

    In the late 1980's and early 1990's, Lawrence Livermore National Laboratory was deeply engrossed in determining the next generation programming model for the Integrated Design Codes (IDC) beyond vectorization for the Cray 1s series of computers. The vector model, developed in the mid 1970's first for the CDC 7600 and later extended from stack-based vector operations to memory-to-memory operations for the Cray 1s, lasted approximately 20 years (see Slide 5). The Cray vector era was deemed an extremely long-lived era as it allowed vector codes to be developed over time (the Cray 1s were faster in scalar mode than the CDC 7600) with vector unit utilization increasing incrementally over time. The other attributes of the Cray vector era at LLNL were that we developed, supported and maintained the operating system (LTSS and later NLTSS), communications protocols (LINCS), compilers (Civic Fortran77 and Model), operating system tools (e.g., batch system, job control scripting, loaders, debuggers, editors, graphics utilities, you name it) and math and highly machine-optimized libraries (e.g., SLATEC and STACKLIB). Although LTSS was adopted by Cray for early system generations, Cray later developed the COS and UNICOS operating systems and environments on its own. In the late 1970s and early 1980s two trends appeared that made the Cray vector programming model (described above, including both the hardware and system software aspects) seem potentially dated and slated for major revision. These trends were the appearance of low-cost CMOS microprocessors and their attendant departmental and mini-computers, and later workstations and personal computers. With the widespread adoption of Unix in the early 1980s, it appeared that LLNL (and the other DOE Labs) would be left out of the mainstream of computing without a rapid transition to these 'Killer Micros' and modern OS and tools environments. The other interesting advance in the period is that systems were being developed with multiple 'cores' in them, called Symmetric Multi-Processor or Shared Memory Processor (SMP) systems. The parallel revolution had begun. The Laboratory started a small 'parallel processing project' in 1983 to study the new technology and its application to scientific computing with four people: Tim Axelrod, Pete Eltgroth, Paul Dubois and Mark Seager. Two years later, Eugene Brooks joined the team. This team focused on Unix and 'killer micro' SMPs. Indeed, Eugene Brooks was credited with coining the 'Killer Micro' term. After several generations of SMP platforms (e.g., the Sequent Balance 8000 with eight 33 MHz NS32032s, the Alliant FX/8 with eight MC68020s and FPGA-based vector units, and finally the BBN Butterfly with 128 cores), it became apparent to us that the killer-micro revolution would indeed overtake Crays and that we definitely needed a new programming and systems model. The model developed by Mark Seager and Dale Nielsen focused on both the system aspects (Slide 3) and the code development aspects (Slide 4). Although now succinctly captured in two attached slides, at the time there was tremendous ferment in the research community as to what parallel programming model would emerge, dominate and survive. In addition, we wanted a model that would provide portability between platforms of a single generation but also longevity over multiple, and hopefully many, generations. Only after we developed the 'Livermore Model' and worked it out in considerable detail did it become obvious that what we came up with was the right approach.
In a nutshell, the applications programming model of the Livermore Model posited that SMP parallelism would ultimately not scale indefinitely and that one would have to bite the bullet and implement MPI parallelism within the Integrated Design Codes (IDC). We also had a major emphasis on doing everything in a completely standards-based, portable methodology with POSIX/Unix as the target environment. We decided against specialized libraries like STACKLIB for performance, but kept as many general-purpose, portable math libraries as were needed by the codes. Third, we assumed that the SMPs in clusters would evolve in time to become more powerful, feature-rich and, in particular, offer more cores. Thus, we focused on OpenMP and POSIX PThreads for programming SMP parallelism. These code porting efforts were led by Dale Nielsen, A-Division code group leader, and Randy Christensen, B-Division code group leader. Most of the porting effort revolved around removing 'Crayisms' in the codes: artifacts of LTSS/NLTSS, Civic compiler extensions beyond Fortran77, I/O libraries, and dealing with new code control languages (we switched to Perl and later to Python). Adding MPI to the codes was initially problematic and error prone because the programmers used MPI directly and sprinkled the calls throughout the code.
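
In present-day terms, the Livermore Model is the now-familiar hybrid pattern of MPI between nodes and OpenMP threads within a node. A minimal sketch, a toy reduction written with modern MPI and OpenMP, not a fragment of any IDC:

    ! Hybrid MPI + OpenMP: ranks across nodes, threads within each node.
    program hybrid_demo
      use mpi
      implicit none
      integer :: ierr, rank, nprocs, provided, i
      real(8) :: local, global
      call MPI_Init_thread(MPI_THREAD_FUNNELED, provided, ierr)
      call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)
      call MPI_Comm_size(MPI_COMM_WORLD, nprocs, ierr)
      local = 0.0d0
      ! OpenMP parallelism within the node over this rank's strided share
      !$omp parallel do reduction(+:local)
      do i = rank + 1, 10000000, nprocs
         local = local + 1.0d0 / real(i, 8)**2   ! partial sum of 1/i^2
      end do
      !$omp end parallel do
      ! MPI parallelism across nodes combines the partial sums
      call MPI_Allreduce(local, global, 1, MPI_DOUBLE_PRECISION, MPI_SUM, &
                         MPI_COMM_WORLD, ierr)
      if (rank == 0) print *, 'sum = ', global   ! approaches pi**2/6
      call MPI_Finalize(ierr)
    end program hybrid_demo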

  1. Three-Dimensional Analysis and Modeling of a Wankel Engine

    NASA Technical Reports Server (NTRS)

    Raju, M. S.; Willis, E. A.

    1991-01-01

    A new computer code, AGNI-3D, has been developed for the modeling of combustion, spray, and flow properties in a stratified-charge rotary engine (SCRE). The mathematical and numerical details of the new code are described by the first author in a separate NASA publication. The solution procedure is based on an Eulerian-Lagrangian approach in which the unsteady, three-dimensional Navier-Stokes equations for a perfect gas mixture with variable properties are solved in generalized, Eulerian coordinates on a moving grid by making use of an implicit finite-volume, Steger-Warming flux vector splitting scheme. The liquid-phase equations are solved in Lagrangian coordinates. The engine configuration studied was similar to existing rotary engine flow-visualization and hot-firing test rigs. The results of limited test cases indicate a good degree of qualitative agreement between the predicted and measured pressures. It is conjectured that the impulsive nature of the torque generated by the observed pressure nonuniformity may be one of the mechanisms responsible for the excessive wear of the timing gears observed during the early stages of rotary combustion engine (RCE) development. It was identified that the turbulence intensities near top-dead-center were dominated by the compression process and only slightly influenced by the intake and exhaust processes. Slow mixing, resulting from small turbulence intensities within the rotor pocket and from the lack of any significant recirculation regions within the rotor pocket, was identified as the major factor leading to incomplete combustion. Detailed flowfield results during the exhaust and intake, fuel injection, fuel vaporization, combustion, mixing, and expansion processes are also presented. The numerical procedure is very efficient, taking 7 to 10 CPU hours on a CRAY Y-MP for one entire engine cycle when the computations are performed over a 31 x 16 x 20 grid.

  2. CSM Testbed Development and Large-Scale Structural Applications

    NASA Technical Reports Server (NTRS)

    Knight, Norman F., Jr.; Gillian, R. E.; Mccleary, Susan L.; Lotts, C. G.; Poole, E. L.; Overman, A. L.; Macy, S. C.

    1989-01-01

    A research activity called Computational Structural Mechanics (CSM) conducted at the NASA Langley Research Center is described. This activity is developing advanced structural analysis and computational methods that exploit high-performance computers. Methods are developed in the framework of the CSM Testbed software system and applied to representative complex structural analysis problems from the aerospace industry. An overview of the CSM Testbed methods development environment is presented and some new numerical methods developed on a CRAY-2 are described. Selected application studies performed on the NAS CRAY-2 are also summarized.

  3. Multitasking the INS3D-LU code on the Cray Y-MP

    NASA Technical Reports Server (NTRS)

    Fatoohi, Rod; Yoon, Seokkwan

    1991-01-01

    This paper presents the results of multitasking the INS3D-LU code on eight processors. The code is a full Navier-Stokes solver for incompressible fluid in three dimensional generalized coordinates using a lower-upper symmetric-Gauss-Seidel implicit scheme. This code has been fully vectorized on oblique planes of sweep and parallelized using autotasking with some directives and minor modifications. The timing results for five grid sizes are presented and analyzed. The code has achieved a processing rate of over one Gflops.
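
    The oblique-plane ordering works because, in a lower-triangular Gauss-Seidel sweep, every point on the plane i + j + k = const depends only on points from previously swept planes, so all points within one plane are mutually independent and the loops over a plane can vectorize. A structural sketch over interior points follows, with the per-point solver work elided; it assumes nothing about INS3D-LU's actual loop bounds:

        subroutine plane_sweep(ni, nj, nk)
          implicit none
          integer, intent(in) :: ni, nj, nk
          integer :: i, j, k, m
          do m = 6, ni + nj + nk - 3            ! plane index: i + j + k = m
             do k = max(2, m - ni - nj + 2), min(nk - 1, m - 4)
                do j = max(2, m - k - ni + 1), min(nj - 1, m - k - 2)
                   i = m - j - k
                   ! per-point lower-sweep update goes here; all (i,j,k)
                   ! on this plane are independent of one another
                end do
             end do
          end do
        end subroutine plane_sweep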

  4. Mixed polyanion glass cathodes: Glass-state conversion reactions

    DOE PAGES

    Kercher, Andrew K.; Kolopus, James A.; Carroll, Kyler; ...

    2015-11-10

    Mixed polyanion (MP) glasses can undergo glass-state conversion (GSC) reactions to provide an alternate class of high-capacity cathode materials. GSC reactions have been demonstrated in phosphate/vanadate glasses with Ag, Co, Cu, Fe, and Ni cations. These MP glasses provide high capacity and good high-power performance, but suffer from moderate voltages, large voltage hysteresis, and significant capacity fade with cycling. Details of the GSC reaction have been revealed by x-ray absorption spectroscopy, electron microscopy, and energy-dispersive x-ray spectroscopy of ex situ cathodes at key states of charge. Using the Open Quantum Materials Database (OQMD), a computational thermodynamic model has been developed to predict the near-equilibrium voltages of glass-state conversion reactions in MP glasses.

  5. Massively parallel quantum computer simulator

    NASA Astrophysics Data System (ADS)

    De Raedt, K.; Michielsen, K.; De Raedt, H.; Trieu, B.; Arnold, G.; Richter, M.; Lippert, Th.; Watanabe, H.; Ito, N.

    2007-01-01

    We describe portable software to simulate universal quantum computers on massively parallel computers. We illustrate the use of the simulation software by running various quantum algorithms on different computer architectures, such as an IBM BlueGene/L, an IBM Regatta p690+, a Hitachi SR11000/J1, a Cray X1E, an SGI Altix 3700, and clusters of PCs running Windows XP. We study the performance of the software by simulating quantum computers containing up to 36 qubits, using up to 4096 processors and up to 1 TB of memory. Our results demonstrate that the simulator exhibits nearly ideal scaling as a function of the number of processors and suggest that the simulation software described in this paper may also serve as a benchmark for testing high-end parallel computers.
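
    The quoted memory footprint follows directly from the state-vector representation: simulating n qubits requires storing 2^n complex amplitudes, so at 16 bytes per double-precision complex amplitude, n = 36 gives 2^36 x 16 B = 2^40 B = 1 TB, which matches the largest simulations reported.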

  6. Edison - A New Cray Supercomputer Advances Discovery at NERSC

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Dosanjh, Sudip; Parkinson, Dula; Yelick, Kathy

    2014-02-06

    When a supercomputing center installs a new system, users are invited to make heavy use of the computer as part of the rigorous testing. In this video, find out what top scientists have discovered using Edison, a Cray XC30 supercomputer, and how NERSC's newest supercomputer will accelerate their future research.

  7. Edison - A New Cray Supercomputer Advances Discovery at NERSC

    ScienceCinema

    Dosanjh, Sudip; Parkinson, Dula; Yelick, Kathy; Trebotich, David; Broughton, Jeff; Antypas, Katie; Lukic, Zarija; Borrill, Julian; Draney, Brent; Chen, Jackie

    2018-01-16

    When a supercomputing center installs a new system, users are invited to make heavy use of the computer as part of the rigorous testing. In this video, find out what top scientists have discovered using Edison, a Cray XC30 supercomputer, and how NERSC's newest supercomputer will accelerate their future research.

  8. Development of iterative techniques for the solution of unsteady compressible viscous flows

    NASA Technical Reports Server (NTRS)

    Hixon, Duane; Sankar, L. N.

    1993-01-01

    During the past two decades, there has been significant progress in the field of numerical simulation of unsteady compressible viscous flows. At present, a variety of solution techniques exist, such as transonic small disturbance (TSD) analyses, transonic full potential equation-based methods, unsteady Euler solvers, and unsteady Navier-Stokes solvers. These advances have been made possible by developments in three areas: (1) improved numerical algorithms; (2) automation of body-fitted grid generation schemes; and (3) advanced computer architectures with vector processing and massively parallel processing features. In this work, the GMRES scheme has been considered as a candidate for acceleration of a Newton iteration time marching scheme for unsteady 2-D and 3-D compressible viscous flow calculations; from preliminary calculations, this will provide up to a 65 percent reduction in the computer time requirements over the existing class of explicit and implicit time marching schemes. The proposed method has been tested on structured grids, but is flexible enough for extension to unstructured grids. The described scheme has been tested only on the current generation of vector processor architectures of the Cray Y-MP class, but should be suitable for adaptation to massively parallel machines.

  9. MAGNA (Materially and Geometrically Nonlinear Analysis). Part I. Finite Element Analysis Manual.

    DTIC Science & Technology

    1982-12-01

    Guidance is provided for operating the program, modifying storage capacity, preparing input data, estimating computer run times, and interpreting the output. [The remainder of this record is a fragmentary table of contents covering the CDC, CRAY, and VAX program versions, including job control language, modification of storage capacity, and typical execution times on CDC and CRAY-1 computers, followed by input data sections.]

  10. Investigating the impact of the cielo cray XE6 architecture on scientific application codes.

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Rajan, Mahesh; Barrett, Richard; Pedretti, Kevin Thomas Tauke

    2010-12-01

    Cielo, a Cray XE6, is the Department of Energy NNSA Advanced Simulation and Computing (ASC) campaign's newest capability machine. Rated at 1.37 PFLOPS, it consists of 8,944 dual-socket oct-core AMD Magny-Cours compute nodes, linked using Cray's Gemini interconnect. Its primary mission objective is to enable a suite of the ASC applications implemented using MPI to scale to tens of thousands of cores. Cielo is an evolutionary improvement to a successful architecture previously available to many of our codes, thus enabling a basis for understanding the capabilities of this new architecture. Using three codes strategically important to the ASC campaign, and supplemented with some micro-benchmarks that expose the fundamental capabilities of the XE6, we report on the performance characteristics and capabilities of Cielo.

  11. The solution of linear systems of equations with a structural analysis code on the NAS CRAY-2

    NASA Technical Reports Server (NTRS)

    Poole, Eugene L.; Overman, Andrea L.

    1988-01-01

    Two methods for solving linear systems of equations on the NAS Cray-2 are described. One is a direct method; the other is an iterative method. Both methods exploit the architecture of the Cray-2, particularly the vectorization, and are aimed at structural analysis applications. To demonstrate and evaluate the methods, they were installed in a finite element structural analysis code denoted the Computational Structural Mechanics (CSM) Testbed. A description of the techniques used to integrate the two solvers into the Testbed is given. Storage schemes, memory requirements, operation counts, and reformatting procedures are discussed. Finally, results from the new methods are compared with results from the initial Testbed sparse Choleski equation solver for three structural analysis problems. The new direct solvers described achieve the highest computational rates of the methods compared. The new iterative methods are not able to achieve as high computation rates as the vectorized direct solvers, but are best for well-conditioned problems which require fewer iterations to converge to the solution.

  12. ARC2D - EFFICIENT SOLUTION METHODS FOR THE NAVIER-STOKES EQUATIONS (CRAY VERSION)

    NASA Technical Reports Server (NTRS)

    Pulliam, T. H.

    1994-01-01

    ARC2D is a computational fluid dynamics program developed at the NASA Ames Research Center specifically for airfoil computations. The program uses implicit finite-difference techniques to solve two-dimensional Euler equations and thin layer Navier-Stokes equations. It is based on the Beam and Warming implicit approximate factorization algorithm in generalized coordinates. The methods are either time accurate or accelerated non-time accurate steady state schemes. The evolution of the solution through time is physically realistic; good solution accuracy is dependent on mesh spacing and boundary conditions. The mathematical development of ARC2D begins with the strong conservation law form of the two-dimensional Navier-Stokes equations in Cartesian coordinates, which admits shock capturing. The Navier-Stokes equations can be transformed from Cartesian coordinates to generalized curvilinear coordinates in a manner that permits one computational code to serve a wide variety of physical geometries and grid systems. ARC2D includes an algebraic mixing length model to approximate the effect of turbulence. In cases of high Reynolds number viscous flows, thin layer approximation can be applied. ARC2D allows for a variety of solutions to stability boundaries, such as those encountered in flows with shocks. The user has considerable flexibility in assigning geometry and developing grid patterns, as well as in assigning boundary conditions. However, the ARC2D model is most appropriate for attached and mildly separated boundary layers; no attempt is made to model wake regions and widely separated flows. The techniques have been successfully used for a variety of inviscid and viscous flowfield calculations. The Cray version of ARC2D is written in FORTRAN 77 for use on Cray series computers and requires approximately 5Mb memory. The program is fully vectorized. The tape includes variations for the COS and UNICOS operating systems. Also included is a sample routine for CONVEX computers to emulate Cray system time calls, which should be easy to modify for other machines as well. The standard distribution media for this version is a 9-track 1600 BPI ASCII Card Image format magnetic tape. The Cray version was developed in 1987. The IBM ES/3090 version is an IBM port of the Cray version. It is written in IBM VS FORTRAN and has the capability of executing in both vector and parallel modes on the MVS/XA operating system and in vector mode on the VM/XA operating system. Various options of the IBM VS FORTRAN compiler provide new features for the ES/3090 version, including 64-bit arithmetic and up to 2 GB of virtual addressability. The IBM ES/3090 version is available only as a 9-track, 1600 BPI IBM IEBCOPY format magnetic tape. The IBM ES/3090 version was developed in 1989. The DEC RISC ULTRIX version is a DEC port of the Cray version. It is written in FORTRAN 77 for RISC-based Digital Equipment platforms. The memory requirement is approximately 7Mb of main memory. It is available in UNIX tar format on TK50 tape cartridge. The port to DEC RISC ULTRIX was done in 1990. COS and UNICOS are trademarks and Cray is a registered trademark of Cray Research, Inc. IBM, ES/3090, VS FORTRAN, MVS/XA, and VM/XA are registered trademarks of International Business Machines. DEC and ULTRIX are registered trademarks of Digital Equipment Corporation.

  13. Interconnect Performance Evaluation of SGI Altix 3700 BX2, Cray X1, Cray Opteron Cluster, and Dell PowerEdge

    NASA Technical Reports Server (NTRS)

    Fatoohi, Rod; Saini, Subbash; Ciotti, Robert

    2006-01-01

    We study the performance of inter-process communication on four high-speed multiprocessor systems using a set of communication benchmarks. The goal is to identify certain limiting factors and bottlenecks with the interconnects of these systems, as well as to compare these interconnects. We measured network bandwidth using different numbers of communicating processors and communication patterns, such as point-to-point communication, collective communication, and dense communication patterns. The four platforms are: a 512-processor SGI Altix 3700 BX2 shared-memory machine with 3.2 GB/s links; a 64-processor (single-streaming) Cray X1 shared-memory machine with 32 1.6 GB/s links; a 128-processor Cray Opteron cluster using a Myrinet network; and a 1280-node Dell PowerEdge cluster with an InfiniBand network. Our results show the impact of the network bandwidth and topology on the overall performance of each interconnect.

  14. VizieR Online Data Catalog: ChaMP X-ray point source catalog (Kim+, 2007)

    NASA Astrophysics Data System (ADS)

    Kim, M.; Kim, D.-W.; Wilkes, B. J.; Green, P. J.; Kim, E.; Anderson, C. S.; Barkhouse, W. A.; Evans, N. R.; Ivezic, Z.; Karovska, M.; Kashyap, V. L.; Lee, M. G.; Maksym, P.; Mossman, A. E.; Silverman, J. D.; Tananbaum, H. D.

    2009-01-01

    We present the Chandra Multiwavelength Project (ChaMP) X-ray point source catalog with ~6800 X-ray sources detected in 149 Chandra observations covering ~10 deg^2. The full ChaMP catalog sample is 7 times larger than the initial published ChaMP catalog. The exposure time of the fields in our sample ranges from 0.9 to 124 ks, corresponding to a deepest X-ray flux limit of f_{0.5-8.0} = 9 x 10^-16 erg/cm^2/s. The ChaMP X-ray data have been uniformly reduced and analyzed with ChaMP-specific pipelines and then carefully validated by visual inspection. The ChaMP catalog includes X-ray photometric data in eight different energy bands as well as X-ray spectral hardness ratios and colors. To best utilize the ChaMP catalog, we also present the source reliability, detection probability, and positional uncertainty. (10 data files).

  15. Performance analysis of three dimensional integral equation computations on a massively parallel computer. M.S. Thesis

    NASA Technical Reports Server (NTRS)

    Logan, Terry G.

    1994-01-01

    The purpose of this study is to investigate the performance of integral equation computations using a numerical source field-panel method in a massively parallel processing (MPP) environment. A comparative study of the computational performance of the MPP CM-5 computer and a conventional Cray Y-MP supercomputer for a three-dimensional flow problem is made. A serial FORTRAN code is converted into a parallel CM-FORTRAN code. Performance results are obtained on the CM-5 with 32, 64, and 128 nodes, along with those on a Cray Y-MP with a single processor. The comparison of the performance indicates that the parallel CM-FORTRAN code nearly matches or outperforms the equivalent serial FORTRAN code for some cases.

  16. Integrating Grid Services into the Cray XT4 Environment

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    NERSC; Cholia, Shreyas; Lin, Hwa-Chun Wendy

    2009-05-01

    The 38,640-core Cray XT4 "Franklin" system at the National Energy Research Scientific Computing Center (NERSC) is a massively parallel resource available to Department of Energy researchers that also provides on-demand grid computing to the Open Science Grid. The integration of grid services on Franklin presented various challenges, including fundamental differences between the interactive and compute nodes, a stripped-down compute-node operating system without dynamic library support, a shared-root environment, and idiosyncratic application launching. In our work, we describe how we resolved these challenges on a running, general-purpose production system to provide on-demand compute, storage, accounting, and monitoring services through generic grid interfaces that mask the underlying system-specific details for the end user.

  17. Parallelization of Rocket Engine System Software (Press)

    NASA Technical Reports Server (NTRS)

    Cezzar, Ruknet

    1996-01-01

    The main goal is to assess parallelization requirements for the Rocket Engine Numeric Simulator (RENS) project which, aside from gathering information on liquid-propelled rocket engines and setting forth requirements, involves a large FORTRAN-based package at NASA Lewis Research Center and TDK software developed by SUBR/UWF. The ultimate aim is to develop, test, integrate, and suitably deploy a family of software packages on various aspects and facets of rocket engines using liquid propellants. At present, all project efforts by the funding agency, NASA Lewis Research Center, and the HBCU participants are disseminated over the Internet using World Wide Web home pages. Considering the obvious expense of actual field trials, the benefits of software simulators are potentially enormous. When realized, these benefits will be analogous to those provided by numerous CAD/CAM packages and flight-training simulators. According to the overall task assignments, Hampton University's role is to collect all available software, place it in a common format, assess and evaluate it, define interfaces, and provide integration. Most importantly, HU's mission is to ensure that real-time performance is achieved. This involves source code translation, porting, and distribution. The porting will be done in two phases: first, place all software on a Cray X-MP platform using FORTRAN. After testing and evaluation on the Cray X-MP, the code will be translated to C++ and ported to the parallel nCUBE platform. At present, we are evaluating another option: distributed processing over local area networks using Sun NFS, Ethernet, and TCP/IP. Considering the heterogeneous nature of the present software (e.g., it was first started as an expert system using LISP machines) which now involves FORTRAN code, the effort is expected to be quite challenging.

  18. On the parallel solution of parabolic equations

    NASA Technical Reports Server (NTRS)

    Gallopoulos, E.; Saad, Youcef

    1989-01-01

    Parallel algorithms for the solution of linear parabolic problems are proposed. The first of these methods is based on using polynomial approximation to the exponential. It does not require solving any linear systems and is highly parallelizable. The two other methods proposed are based on Padé and Chebyshev approximations to the matrix exponential. The parallelization of these methods is achieved by using partial fraction decomposition techniques to solve the resulting systems, and thus offers the potential for increased parallelism in time-dependent problems. Experimental results from the Alliant FX/8 and the Cray Y-MP/832 vector multiprocessors are also presented.
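
    The partial-fraction device can be summarized as follows (a sketch, assuming a rational approximation r(z) ≈ e^z with simple poles θ_j):

        \[ r(z) = \alpha_0 + \sum_{j=1}^{J} \frac{\alpha_j}{z - \theta_j}
           \quad\Longrightarrow\quad
           u_{n+1} = r(\Delta t\,A)\,u_n
                   = \alpha_0 u_n + \sum_{j=1}^{J} \alpha_j \,(\Delta t\,A - \theta_j I)^{-1} u_n, \]

    so each time step reduces to J shifted linear solves that are mutually independent and can be carried out in parallel.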

  19. Supercomputer analysis of purine and pyrimidine metabolism leading to DNA synthesis.

    PubMed

    Heinmets, F

    1989-06-01

    A model system is established to analyze purine and pyrimidine metabolism leading to DNA synthesis. The principal aim is to explore the flow and regulation of the terminal deoxynucleoside triphosphates (dNTPs) under various input and parametric conditions. A series of flow equations is established and subsequently converted to differential equations. These are programmed in Fortran and analyzed on a Cray X-MP/48 supercomputer. The pool concentrations are presented as a function of time under conditions in which various pertinent parameters of the system are modified. The system is formulated by 100 differential equations.
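
    The modeling pattern described (flow equations converted to differential equations and integrated numerically) can be sketched in miniature; the two-pool system and rate constants below are hypothetical stand-ins for illustration, not the paper's 100-equation model.

        from scipy.integrate import solve_ivp

        def pools(t, y, k_in=1.0, k_conv=0.5, k_use=0.3):
            # hypothetical two-pool fragment: precursor -> dNTP -> consumption by DNA synthesis
            precursor, dntp = y
            d_precursor = k_in - k_conv * precursor
            d_dntp = k_conv * precursor - k_use * dntp
            return [d_precursor, d_dntp]

        sol = solve_ivp(pools, (0.0, 50.0), [0.0, 0.0], dense_output=True)
        # sol.y[1] traces the dNTP pool concentration as a function of time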

  20. Performance of a Bounce-Averaged Global Model of Super-Thermal Electron Transport in the Earth's Magnetic Field

    NASA Technical Reports Server (NTRS)

    McGuire, Tim

    1998-01-01

    In this paper, we report the results of our recent research on the application of a multiprocessor Cray T916 supercomputer in modeling super-thermal electron transport in the earth's magnetic field. In general, this mathematical model requires numerical solution of a system of partial differential equations. The code we use for this model is moderately vectorized. By using Amdahl's Law for vector processors, it can be verified that the code is about 60% vectorized on a Cray computer. Speedup factors on the order of 2.5 were obtained compared to the unvectorized code. In the following sections, we discuss the methodology of improving the code. In addition to our goal of optimizing the code for solution on the Cray computer, we had the goal of scalability in mind. Scalability combines the concepts of portability with near-linear speedup. Specifically, a scalable program is one whose performance is portable across many different architectures with differing numbers of processors for many different problem sizes. Though we have access to a Cray at this time, the goal was also to have code which would run well on a variety of architectures.
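
    The figures quoted above are mutually consistent under Amdahl's Law for vector processors: with a vectorized fraction f and a vector-to-scalar speed ratio v, the overall speedup is

        \[ S(v) = \frac{1}{(1 - f) + f/v}, \qquad f = 0.6 \;\Longrightarrow\; S \le \frac{1}{1 - 0.6} = 2.5, \]

    so a code that is 60% vectorized is bounded by a factor of 2.5 even as v grows large, matching the observed speedup of about 2.5.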

  1. Production Experiences with the Cray-Enabled TORQUE Resource Manager

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Ezell, Matthew A; Maxwell, Don E; Beer, David

    High performance computing resources utilize batch systems to manage the user workload. Cray systems are uniquely different from typical clusters due to Cray's Application Level Placement Scheduler (ALPS). ALPS manages binary transfer, job launch and monitoring, and error handling. Batch systems require special support to integrate with ALPS using an XML protocol called BASIL. Previous versions of Adaptive Computing's TORQUE and Moab batch suite integrated with ALPS from within Moab, using Perl scripts to interface with BASIL. This would occasionally lead to problems when the components became unsynchronized. Version 4.1 of the TORQUE Resource Manager introduced new features that allow it to directly integrate with ALPS using BASIL. This paper describes production experiences at Oak Ridge National Laboratory using the new TORQUE software versions, as well as ongoing and future work to improve TORQUE.

  2. A Performance Evaluation of the Cray X1 for Scientific Applications

    NASA Technical Reports Server (NTRS)

    Oliker, Leonid; Biswas, Rupak; Borrill, Julian; Canning, Andrew; Carter, Jonathan; Djomehri, M. Jahed; Shan, Hongzhang; Skinner, David

    2004-01-01

    The last decade has witnessed a rapid proliferation of superscalar cache-based microprocessors as the building blocks of high-end systems, largely because of their generality, scalability, and cost effectiveness. However, the recent development of massively parallel vector systems is having a significant effect on the supercomputing landscape. In this paper, we compare the performance of the recently released Cray X1 vector system with that of the cacheless NEC SX-6 vector machine and the superscalar cache-based IBM Power3 and Power4 architectures for scientific applications. Overall results demonstrate that the X1 is quite promising, but performance improvements are expected as the hardware, systems software, and numerical libraries mature. Code reengineering to effectively utilize the complex architecture may also lead to significant efficiency enhancements.

  3. Extraction of guided wave dispersion curve in isotropic and anisotropic materials by Matrix Pencil method.

    PubMed

    Chang, C Y; Yuan, F G

    2018-05-16

    Guided wave dispersion curves in isotropic and anisotropic materials are extracted automatically from measured data by the Matrix Pencil (MP) method, applied in either the k-t or the x-ω domain with a broadband signal. A piezoelectric wafer emits a broadband excitation, a linear chirp signal, to generate guided waves in the plate. The propagating waves are measured at discrete locations along lines using a one-dimensional laser Doppler vibrometer (1-D LDV). Measurements are first Fourier transformed into either the wavenumber-time (k-t) domain or the space-frequency (x-ω) domain. The MP method is then employed to extract the dispersion curves explicitly associated with different wave modes. In addition, the phase and group velocities are deduced from the relations between wavenumbers and frequencies. In this research, inspections of the dispersion relations of an aluminum plate by the MP method in the k-t and x-ω domains are demonstrated and compared with the two-dimensional Fourier transform (2-D FFT). Further experiments on a thicker aluminum plate, for higher modes, and on a composite plate are analyzed by the MP method. The extracted relations for the composite plate are confirmed by three-dimensional (3-D) theoretical curves computed numerically. The results show that the MP method not only distinguishes the dispersion curves of isotropic materials more accurately, but also agrees well with theoretical curves for anisotropic and laminated materials. Copyright © 2018 Elsevier B.V. All rights reserved.
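
    A minimal sketch of the generic Matrix Pencil estimator follows (not the authors' exact pipeline; the sampling interval, model order, and pencil parameter are illustrative assumptions). Given uniform samples of a sum of complex exponentials, the dominant generalized eigenvalues of the shifted Hankel pencil recover the poles, whose angles give the wavenumbers.

        import numpy as np

        def matrix_pencil(y, p, L=None):
            """Estimate p complex poles from uniform samples y[n] ~ sum_i a_i z_i**n."""
            N = len(y)
            if L is None:
                L = N // 3  # common pencil-parameter choice
            Y = np.array([y[i:i + L + 1] for i in range(N - L)])  # Hankel data matrix
            Y1, Y2 = Y[:, :-1], Y[:, 1:]
            z = np.linalg.eigvals(np.linalg.pinv(Y1) @ Y2)
            return z[np.argsort(-np.abs(z))[:p]]  # keep the p dominant poles

        # toy check with two hypothetical spatial wavenumbers (rad/m)
        dx = 1e-3
        x = np.arange(200) * dx
        y = np.exp(1j * 200.0 * x) + 0.7 * np.exp(1j * 450.0 * x)
        k_est = np.angle(matrix_pencil(y, p=2)) / dx  # recovered wavenumbers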

  4. Modeling high-temperature superconductors and metallic alloys on the Intel iPSC/860

    NASA Astrophysics Data System (ADS)

    Geist, G. A.; Peyton, B. W.; Shelton, W. A.; Stocks, G. M.

    Oak Ridge National Laboratory has embarked on several computational Grand Challenges, which require the close cooperation of physicists, mathematicians, and computer scientists. One of these projects is the determination of the material properties of alloys from first principles and, in particular, the electronic structure of high-temperature superconductors. While the present focus of the project is on superconductivity, the approach is general enough to permit study of other properties of metallic alloys such as strength and magnetic properties. This paper describes the progress to date on this project. We include a description of a self-consistent KKR-CPA method, parallelization of the model, and the incorporation of a dynamic load balancing scheme into the algorithm. We also describe the development and performance of a consolidated KKR-CPA code capable of running on CRAYs, workstations, and several parallel computers without source code modification. Performance of this code on the Intel iPSC/860 is also compared to a CRAY 2, CRAY YMP, and several workstations. Finally, some density of state calculations of two perovskite superconductors are given.

  5. Research in Parallel Algorithms and Software for Computational Aerosciences

    NASA Technical Reports Server (NTRS)

    Domel, Neal D.

    1996-01-01

    Phase I is complete for the development of a Computational Fluid Dynamics parallel code with automatic grid generation and adaptation for the Euler analysis of flow over complex geometries. SPLITFLOW, an unstructured Cartesian grid code developed at Lockheed Martin Tactical Aircraft Systems, has been modified for a distributed memory/massively parallel computing environment. The parallel code is operational on an SGI network, Cray J90 and C90 vector machines, SGI Power Challenge, and Cray T3D and IBM SP2 massively parallel machines. Parallel Virtual Machine (PVM) is the message passing protocol for portability to various architectures. A domain decomposition technique was developed which enforces dynamic load balancing to improve solution speed and memory requirements. A host/node algorithm distributes the tasks. The solver parallelizes very well, and scales with the number of processors. Partially parallelized and non-parallelized tasks consume most of the wall clock time in a very fine grain environment. Timing comparisons on a Cray C90 demonstrate that Parallel SPLITFLOW runs 2.4 times faster on 8 processors than its non-parallel counterpart autotasked over 8 processors.

  7. A comparative study of the structures and electronic properties of graphene fragments: A DFT and MP2 survey

    NASA Astrophysics Data System (ADS)

    de Carvalho, E. F. V.; Lopez-Castillo, A.; Roberto-Neto, O.

    2018-01-01

    Graphene can be viewed as a sheet of benzene rings fused together, forming a variety of structures including the trioxotriangulenes (TOTs), a class of organic molecules with electro-active properties. In order to clarify such properties, the structures and electronic properties of the graphene fragments phenalenyl, triangulene, 6-oxophenalenoxyl, and X3TOT (X = H, F, Cl) are computed. Validation of the methodologies is carried out using the density functionals B3LYP, M06-2X, and B2PLYP-D and the MP2 theory, giving equilibrium geometries of benzene, naphthalene, and anthracene with mean unsigned errors (MUE) of only 0.003, 0.007, 0.004, and 0.007 Å, respectively, in relation to experiment.

  8. Effects of Ordering Strategies and Programming Paradigms on Sparse Matrix Computations

    NASA Technical Reports Server (NTRS)

    Oliker, Leonid; Li, Xiaoye; Husbands, Parry; Biswas, Rupak; Biegel, Bryan (Technical Monitor)

    2002-01-01

    The Conjugate Gradient (CG) algorithm is perhaps the best-known iterative technique for solving sparse linear systems that are symmetric and positive definite. For systems that are ill-conditioned, it is often necessary to use a preconditioning technique. In this paper, we investigate the effects of various ordering and partitioning strategies on the performance of parallel CG and ILU(0)-preconditioned CG (PCG) using different programming paradigms and architectures. Results show that, for this class of applications, ordering significantly improves overall performance on both distributed and distributed shared-memory systems, cache reuse may be more important than reducing communication, it is possible to achieve message-passing performance using shared-memory constructs through careful data ordering and distribution, and a hybrid MPI+OpenMP paradigm increases programming complexity with little performance gain. An implementation of CG on the Cray MTA does not require special ordering or partitioning to obtain high efficiency and scalability, giving it a distinct advantage for adaptive applications; however, it shows limited scalability for PCG due to a lack of thread-level parallelism.
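
    For reference, the unpreconditioned CG iteration discussed above is compact enough to state in full; this dense-matrix sketch is illustrative only, as the paper's solvers operate on distributed sparse matrices.

        import numpy as np

        def cg(A, b, tol=1e-8, maxiter=1000):
            """Plain conjugate gradient for a symmetric positive definite A (dense, float)."""
            x = np.zeros_like(b)
            r = b - A @ x          # initial residual
            p = r.copy()           # initial search direction
            rs = r @ r
            for _ in range(maxiter):
                Ap = A @ p
                alpha = rs / (p @ Ap)
                x += alpha * p
                r -= alpha * Ap
                rs_new = r @ r
                if np.sqrt(rs_new) < tol:
                    break
                p = r + (rs_new / rs) * p   # conjugate direction update
                rs = rs_new
            return x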

  9. CDC to CRAY FORTRAN conversion manual

    NASA Technical Reports Server (NTRS)

    Mcgary, C.; Diebert, D.

    1983-01-01

    Documentation describing software differences between two general purpose computers for scientific applications is presented. Descriptions of the use of the FORTRAN and FORTRAN 77 high level programming language on a CDC 7600 under SCOPE and a CRAY XMP under COS are offered. Itemized differences of the FORTRAN language sets of the two machines are also included. The material is accompanied by numerous examples of preferred programming techniques for the two machines.

  10. ARC2D - EFFICIENT SOLUTION METHODS FOR THE NAVIER-STOKES EQUATIONS (DEC RISC ULTRIX VERSION)

    NASA Technical Reports Server (NTRS)

    Biyabani, S. R.

    1994-01-01

    ARC2D is a computational fluid dynamics program developed at the NASA Ames Research Center specifically for airfoil computations. The program uses implicit finite-difference techniques to solve two-dimensional Euler equations and thin layer Navier-Stokes equations. It is based on the Beam and Warming implicit approximate factorization algorithm in generalized coordinates. The methods are either time accurate or accelerated non-time accurate steady state schemes. The evolution of the solution through time is physically realistic; good solution accuracy is dependent on mesh spacing and boundary conditions. The mathematical development of ARC2D begins with the strong conservation law form of the two-dimensional Navier-Stokes equations in Cartesian coordinates, which admits shock capturing. The Navier-Stokes equations can be transformed from Cartesian coordinates to generalized curvilinear coordinates in a manner that permits one computational code to serve a wide variety of physical geometries and grid systems. ARC2D includes an algebraic mixing length model to approximate the effect of turbulence. In cases of high Reynolds number viscous flows, thin layer approximation can be applied. ARC2D allows for a variety of solutions to stability boundaries, such as those encountered in flows with shocks. The user has considerable flexibility in assigning geometry and developing grid patterns, as well as in assigning boundary conditions. However, the ARC2D model is most appropriate for attached and mildly separated boundary layers; no attempt is made to model wake regions and widely separated flows. The techniques have been successfully used for a variety of inviscid and viscous flowfield calculations. The Cray version of ARC2D is written in FORTRAN 77 for use on Cray series computers and requires approximately 5Mb memory. The program is fully vectorized. The tape includes variations for the COS and UNICOS operating systems. Also included is a sample routine for CONVEX computers to emulate Cray system time calls, which should be easy to modify for other machines as well. The standard distribution media for this version is a 9-track 1600 BPI ASCII Card Image format magnetic tape. The Cray version was developed in 1987. The IBM ES/3090 version is an IBM port of the Cray version. It is written in IBM VS FORTRAN and has the capability of executing in both vector and parallel modes on the MVS/XA operating system and in vector mode on the VM/XA operating system. Various options of the IBM VS FORTRAN compiler provide new features for the ES/3090 version, including 64-bit arithmetic and up to 2 GB of virtual addressability. The IBM ES/3090 version is available only as a 9-track, 1600 BPI IBM IEBCOPY format magnetic tape. The IBM ES/3090 version was developed in 1989. The DEC RISC ULTRIX version is a DEC port of the Cray version. It is written in FORTRAN 77 for RISC-based Digital Equipment platforms. The memory requirement is approximately 7Mb of main memory. It is available in UNIX tar format on TK50 tape cartridge. The port to DEC RISC ULTRIX was done in 1990. COS and UNICOS are trademarks and Cray is a registered trademark of Cray Research, Inc. IBM, ES/3090, VS FORTRAN, MVS/XA, and VM/XA are registered trademarks of International Business Machines. DEC and ULTRIX are registered trademarks of Digital Equipment Corporation.

  11. Characterizing and Mitigating Work Time Inflation in Task Parallel Programs

    DOE PAGES

    Olivier, Stephen L.; de Supinski, Bronis R.; Schulz, Martin; ...

    2013-01-01

    Task parallelism raises the level of abstraction in shared memory parallel programming to simplify the development of complex applications. However, task parallel applications can exhibit poor performance due to thread idleness, scheduling overheads, and work time inflation - additional time spent by threads in a multithreaded computation beyond the time required to perform the same work in a sequential computation. We identify the contributions of each factor to lost efficiency in various task parallel OpenMP applications and diagnose the causes of work time inflation in those applications. Increased data access latency can cause significant work time inflation in NUMA systems. Our locality framework for task parallel OpenMP programs mitigates this cause of work time inflation. Our extensions to the Qthreads library demonstrate that locality-aware scheduling can improve performance up to 3X compared to the Intel OpenMP task scheduler.

  12. Demonstration of Cost-Effective, High-Performance Computing at Performance and Reliability Levels Equivalent to a 1994 Vector Supercomputer

    NASA Technical Reports Server (NTRS)

    Babrauckas, Theresa

    2000-01-01

    The Affordable High Performance Computing (AHPC) project demonstrated that high-performance computing based on a distributed network of computer workstations is a cost-effective alternative to vector supercomputers for running CPU- and memory-intensive design and analysis tools. The AHPC project created an integrated system called a Network Supercomputer. By connecting computer workstations through a network and utilizing the workstations when they are idle, the resulting distributed-workstation environment has the same performance and reliability levels as the Cray C90 vector supercomputer at less than 25 percent of the C90 cost. In fact, the cost comparison between a Cray C90 supercomputer and Sun workstations showed that the number of distributed networked workstations equivalent to a C90 costs approximately 8 percent of the C90.

  13. Input/output behavior of supercomputing applications

    NASA Technical Reports Server (NTRS)

    Miller, Ethan L.

    1991-01-01

    The collection and analysis of supercomputer I/O traces and their use in a collection of buffering and caching simulations are described. This serves two purposes. First, it gives a model of how individual applications running on supercomputers request file system I/O, allowing system designers to optimize I/O hardware and file system algorithms to that model. Second, the buffering simulations show what resources are needed to maximize the CPU utilization of a supercomputer given a very bursty I/O request rate. By using read-ahead and write-behind in a large solid-state disk, one or two applications were sufficient to fully utilize a Cray Y-MP CPU.

  14. Data communication requirements for the advanced NAS network

    NASA Technical Reports Server (NTRS)

    Levin, Eugene; Eaton, C. K.; Young, Bruce

    1986-01-01

    The goal of the Numerical Aerodynamic Simulation (NAS) Program is to provide a powerful computational environment for advanced research and development in aeronautics and related disciplines. The present NAS system consists of a Cray 2 supercomputer connected by a data network to a large mass storage system, to sophisticated local graphics workstations, and by remote communications to researchers throughout the United States. The program plan is to continue acquiring the most powerful supercomputers as they become available. In the 1987/1988 time period it is anticipated that a computer with 4 times the processing speed of a Cray 2 will be obtained and by 1990 an additional supercomputer with 16 times the speed of the Cray 2. The implications of this 20-fold increase in processing power on the data communications requirements are described. The analysis was based on models of the projected workload and system architecture. The results are presented together with the estimates of their sensitivity to assumptions inherent in the models.

  15. Introducing Argonne’s Theta Supercomputer

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    None

    Theta, the Argonne Leadership Computing Facility’s (ALCF) new Intel-Cray supercomputer, is officially open to the research community. Theta’s massively parallel, many-core architecture puts the ALCF on the path to Aurora, the facility’s future Intel-Cray system. Capable of nearly 10 quadrillion calculations per second, Theta enables researchers to break new ground in scientific investigations that range from modeling the inner workings of the brain to developing new materials for renewable energy applications.

  16. Achieving High Performance on the i860 Microprocessor

    NASA Technical Reports Server (NTRS)

    Lee, King; Kutler, Paul (Technical Monitor)

    1998-01-01

    The i860 is a high performance microprocessor used in the Intel Touchstone project. This paper proposes a paradigm for programming the i860 that is modelled on the vector instructions of the Cray computers. Fortran-callable assembler subroutines were written that mimic the concurrent vector instructions of the Cray. Cache takes the place of vector registers. Using this paradigm we have achieved twice the performance of compiled code on a traditional solver.

  17. First Detection of the Hatchett-McCray Effect in the High-Mass X-ray Binary 4U1700-37

    NASA Technical Reports Server (NTRS)

    Sonneborn, G.; Iping, R. C.; Kaper, L.; Hammerschiag-Hensberge, G.; Hutchings, J. B.

    2004-01-01

    The orbital modulation of stellar wind UV resonance line profiles as a result of ionization of the wind by the X-ray source has been observed in the high-mass X-ray binary 4U1700-37/HD 153919 for the first time. Far-UV observations (905-1180 Angstroms, resolution 0.05 Angstroms) were made at the four quadrature points of the binary orbit with the Far Ultraviolet Spectroscopic Explorer (FUSE) in 2003 April and August. The O6.5 Iaf primary eclipses the X-ray source (neutron star or black hole) with a 3.41-day period. Orbital modulation of the UV resonance lines, resulting from X-ray photoionization of the dense stellar wind, the so-called Hatchett-McCray (HM) effect, was predicted for 4U1700-37/HD 153919 (Hatchett & McCray 1977, ApJ, 211, 522) but was not seen in N V 1240, Si IV 1400, or C IV 1550 in IUE and HST spectra. The FUSE spectra show that the P V 1118-1128 and S IV 1063-1073 P-Cygni lines vary as expected for the HM effect, weakest at phase 0.5 (X-ray source conjunction) and strongest at phase 0.0 (X-ray source eclipse). The phase modulation of the O VI 1032-1037 lines, however, is opposite to that of P V and S IV, implying that O VI may be a byproduct of the wind's ionization by the X-ray source. Such variations were not observed in N V, Si IV, and C IV because of their high optical depth. Due to their lower cosmic abundance, the P V and S IV wind lines are unsaturated, making them excellent tracers of the ionization conditions in the O star's wind.

  18. ATLAS and LHC computing on CRAY

    NASA Astrophysics Data System (ADS)

    Sciacca, F. G.; Haug, S.; ATLAS Collaboration

    2017-10-01

    Access to and exploitation of large-scale computing resources, such as those offered by general-purpose HPC centres, is one important measure for ATLAS and the other Large Hadron Collider experiments to meet the challenge posed by the full exploitation of future data within the constraints of flat budgets. We report on the effort of moving the Swiss WLCG T2 computing, serving ATLAS, CMS, and LHCb, from a dedicated cluster to the large Cray systems at the Swiss National Supercomputing Centre CSCS. These systems not only offer very efficient hardware, cooling, and highly competent operators, but also have large backfill potential due to their size and multidisciplinary usage, and potential gains from economies of scale. Technical solutions, performance, expected return, and future plans are discussed.

  19. Basic JCL for the CRAY-1 operating system (COS) with emphasis on making the transition from CDC 7600/SCOPE

    NASA Technical Reports Server (NTRS)

    Howe, G.; Saunders, D.

    1983-01-01

    Users of the CDC 7600 at Ames are assisted in making the transition to the CRAY-1. Similarities and differences in the basic JCL are summarized, and a dozen or so examples of typical batch jobs for the two systems are shown in parallel. Some changes to look for in FORTRAN programs and in the use of UPDATE are also indicated. No attempt is made to cover magnetic tape handling. The material here should not be considered a substitute for reading the more conventional manuals or the User's Guide for the Advanced Computational Facility, available from the Computer Information Center.

  20. Tuning HDF5 subfiling performance on parallel file systems

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Byna, Suren; Chaarawi, Mohamad; Koziol, Quincey

    Subfiling is a technique used on parallel file systems to reduce locking and contention issues when multiple compute nodes interact with the same storage target node. Subfiling provides a compromise between the single-shared-file approach, which instigates lock contention problems on parallel file systems, and having one file per process, which results in a massive and unmanageable number of files. In this paper, we evaluate and tune the performance of the recently implemented subfiling feature in HDF5. Specifically, we explain the implementation strategy of the subfiling feature in HDF5, provide examples of using the feature, and evaluate and tune the parallel I/O performance of this feature on the parallel file systems of the Cray XC40 system at NERSC (Cori), which include a burst buffer storage and a Lustre disk-based storage. We also evaluate I/O performance on the Cray XC30 system, Edison, at NERSC. Our results show a 1.2X to 6X performance advantage with subfiling compared to writing a single shared HDF5 file. We present our exploration of configurations, such as the number of subfiles and the number of Lustre storage targets for storing files, as optimization parameters to obtain superior I/O performance. Based on this exploration, we discuss recommendations for achieving good I/O performance as well as limitations of the subfiling feature.
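
    For context, the single-shared-file baseline that subfiling improves upon looks like the following when written with h5py and mpi4py (assuming an HDF5/h5py build with MPI support; the file name and sizes are arbitrary). Subfiling itself is enabled through HDF5's native API rather than this code.

        from mpi4py import MPI
        import h5py
        import numpy as np

        comm = MPI.COMM_WORLD
        rank, size = comm.Get_rank(), comm.Get_size()
        n_local = 1_000_000  # elements written by each rank

        # every rank writes a disjoint slab of one shared dataset in one shared file,
        # which is exactly the access pattern that causes lock contention on Lustre
        with h5py.File('shared.h5', 'w', driver='mpio', comm=comm) as f:
            dset = f.create_dataset('x', shape=(size * n_local,), dtype='f8')
            start = rank * n_local
            dset[start:start + n_local] = np.random.rand(n_local)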

  1. Genetically engineered Pseudomonas putida X3 strain and its potential ability to bioremediate soil microcosms contaminated with methyl parathion and cadmium.

    PubMed

    Zhang, Rong; Xu, Xingjian; Chen, Wenli; Huang, Qiaoyun

    2016-02-01

    A multifunctional Pseudomonas putida X3 strain was successfully engineered by introducing methyl parathion (MP)-degrading gene and enhanced green fluorescent protein (EGFP) gene in P. putida X4 (CCTCC: 209319). In liquid cultures, the engineered X3 strain utilized MP as sole carbon source for growth and degraded 100 mg L(-1) of MP within 24 h; however, this strain did not further metabolize p-nitrophenol (PNP), an intermediate metabolite of MP. No discrepancy in minimum inhibitory concentrations (MICs) to cadmium (Cd), copper (Cu), zinc (Zn), and cobalt (Co) was observed between the engineered X3 strain and its host strain. The inoculated X3 strain accelerated MP degradation in different polluted soil microcosms with 100 mg MP kg(-1) dry soil and/or 5 mg Cd kg(-1) dry soil; MP was completely eliminated within 40 h. However, the presence of Cd in the early stage of remediation slightly delayed MP degradation. The application of X3 strain in Cd-contaminated soil strongly affected the distribution of Cd fractions and immobilized Cd by reducing bioavailable Cd concentrations with lower soluble/exchangeable Cd and organic-bound Cd. The inoculated X3 strain also colonized and proliferated in various contaminated microcosms. Our results suggested that the engineered X3 strain is a potential bioremediation agent showing competitive advantage in complex contaminated environments.

  2. a Physical Parameterization of Snow Albedo for Use in Climate Models.

    NASA Astrophysics Data System (ADS)

    Marshall, Susan Elaine

    The albedo of a natural snowcover is highly variable, ranging from 90 percent for clean, new snow to 30 percent for old, dirty snow. This range in albedo represents a difference in surface energy absorption of 10 to 70 percent of incident solar radiation. Most general circulation models (GCMs) fail to calculate the surface snow albedo accurately, yet the results of these models are sensitive to the assumed value of the snow albedo. This study replaces the current simple empirical parameterizations of snow albedo with a physically-based parameterization which is accurate (within +/- 3% of theoretical estimates) yet efficient to compute. The parameterization is designed as a FORTRAN subroutine (called SNOALB) which can be easily implemented in model code. The subroutine requires less than 0.02 seconds of computer time (CRAY X-MP) per call and adds only one new parameter to the model calculations, the snow grain size. The snow grain size can be calculated according to one of the two methods offered in this thesis. All other input variables to the subroutine are available from a climate model. The subroutine calculates a visible, a near-infrared, and a solar (0.2-5 μm) snow albedo and offers a choice of two wavelengths (0.7 and 0.9 μm) at which the solar spectrum is separated into its visible and near-infrared components. The parameterization is incorporated into the National Center for Atmospheric Research (NCAR) Community Climate Model, version 1 (CCM1), and the results of a five-year, seasonal-cycle, fixed-hydrology experiment are compared to the current model snow albedo parameterization. The results show the SNOALB albedos to be comparable to the old CCM1 snow albedos for current climate conditions, with generally higher visible and lower near-infrared snow albedos using the new subroutine. However, this parameterization offers greater predictability for climate change experiments outside the range of current snow conditions because it is physically-based and not tuned to current empirical results.

  3. Machine characterization and benchmark performance prediction

    NASA Technical Reports Server (NTRS)

    Saavedra-Barrera, Rafael H.

    1988-01-01

    From runs of standard benchmarks or benchmark suites, it is not possible to characterize the machine nor to predict the run time of other benchmarks which have not been run. A new approach to benchmarking and machine characterization is reported. The creation and use of a machine analyzer is described, which measures the performance of a given machine on FORTRAN source language constructs. The machine analyzer yields a set of parameters which characterize the machine and spotlight its strong and weak points. Also described is a program analyzer, which analyzes FORTRAN programs and determines the frequency of execution of each of the same set of source language operations. It is then shown that by combining a machine characterization and a program characterization, we are able to predict with good accuracy the run time of a given benchmark on a given machine. Characterizations are provided for the Cray X-MP/48, Cyber 205, IBM 3090/200, Amdahl 5840, Convex C-1, VAX 8600, VAX 11/785, VAX 11/780, SUN 3/50, and IBM RT-PC/125, and for the following benchmark programs or suites: Los Alamos (BMK8A1), Baskett, Linpack, Livermore Loops, Mandelbrot Set, NAS Kernels, Shell Sort, Smith, Whetstone, and Sieve of Eratosthenes.
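
    The combination step can be summarized as a linear model (a paraphrase of the approach, not the paper's own notation):

        \[ \hat{T}(P, M) = \sum_i n_i(P)\,\tau_i(M), \]

    where n_i(P) is the execution frequency of source-language operation i in program P, measured by the program analyzer, and τ_i(M) is the time per operation on machine M, measured by the machine analyzer.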

  4. Quantum Mechanics Approach to Hydration Energies and Structures of Alanine and Dialanine.

    PubMed

    Lanza, Giuseppe; Chiacchio, Maria A

    2017-06-20

    A systematic approach to the phenomena related to hydration of biomolecules is reported at the state of the art of electronic-structure methods. Large-scale CCSD(T), MP4-SDQ, MP2, and DFT(M06-2X) calculations for some hydrated complexes of alanine and dialanine (Ala·13H2O, Ala2H+·18H2O, and Ala2·18H2O) are compared with experimental data and other elaborate modeling to assess the reliability of a simple bottom-up approach. The inclusion of a minimal number of water molecules for microhydration of the polar groups, together with the polarizable continuum model, is sufficient to reproduce the relative bulk thermodynamic functions of the considered biomolecules. These quantities depend on the adopted electronic-structure method, which should be chosen with great care. Nevertheless, the computationally feasible MP2 and M06-2X functionals with the aug-cc-pVTZ basis set satisfactorily reproduce values derived by high-level CCSD(T) and MP4-SDQ methods, and thus they are suitable for future developments of more elaborate and hence more biochemically significant peptides. © 2017 Wiley-VCH Verlag GmbH & Co. KGaA, Weinheim.

  5. GPU-Accelerated Large-Scale Electronic Structure Theory on Titan with a First-Principles All-Electron Code

    NASA Astrophysics Data System (ADS)

    Huhn, William Paul; Lange, Björn; Yu, Victor; Blum, Volker; Lee, Seyong; Yoon, Mina

    Density-functional theory has been well established as the dominant quantum-mechanical computational method in the materials community. Large, accurate simulations become very challenging on small to mid-scale computers and require high-performance compute platforms to succeed. GPU acceleration is one promising approach. In this talk, we present a first implementation of all-electron density-functional theory in the FHI-aims code for massively parallel GPU-based platforms. Special attention is paid to the update of the density and to the integration of the Hamiltonian and overlap matrices, realized in a domain decomposition scheme on non-uniform grids. The initial implementation scales well across nodes on ORNL's Titan Cray XK7 supercomputer (8 to 64 nodes, 16 MPI ranks/node) and shows an overall speedup in runtime of 1.4x from utilization of the K20X Tesla GPUs on each Titan node, with the charge density update showing a speedup of 2x. Further acceleration opportunities will be discussed. Work supported by the LDRD Program of ORNL managed by UT-Battelle, LLC, for the U.S. DOE and by the Oak Ridge Leadership Computing Facility, which is a DOE Office of Science User Facility supported under Contract DE-AC05-00OR22725.

  6. Internal computational fluid mechanics on supercomputers for aerospace propulsion systems

    NASA Technical Reports Server (NTRS)

    Andersen, Bernhard H.; Benson, Thomas J.

    1987-01-01

    The accurate calculation of three-dimensional internal flowfields for application to aerospace propulsion systems requires computational resources available only on supercomputers. A survey is presented of three-dimensional calculations of hypersonic, transonic, and subsonic internal flowfields conducted at the Lewis Research Center. A steady-state Parabolized Navier-Stokes (PNS) solution of flow in a Mach 5.0 mixed compression inlet, a Navier-Stokes solution of flow in the vicinity of a terminal shock, and a PNS solution of flow in a diffusing S-bend with vortex generators are presented and discussed. All of these calculations were performed on either the NAS Cray-2 or the Lewis Research Center Cray X-MP.

  7. Using archived ITS data to measure the operational benefits of a system-wide adaptive ramp metering system : appendix online 4 : I-205 NB flow-occupancy plots.

    DOT National Transportation Integrated Search

    2008-12-01

    The appendix includes various flow-occupancy plots, including: I-205 NB, Gladstone MP 11.05; I-205 NB, Gladstone Hway MP 12.94; I-205 NB, Lawnfield MP 13.58; I-205 NB, Sunnybrook MP 14.32; I-205 NB, Sunnyside MP 14.7; I-205 NB, Johnson Creek MP 16.2;...

  8. Using archived ITS data to measure the operational benefits of a system-wide adaptive ramp metering system : appendix online 3 : I-205 NB speed-occupancy plots.

    DOT National Transportation Integrated Search

    2008-12-01

    The appendix includes various speed-occupancy plots, including: I-205 NB, Gladstone MP 11.05; I-205 NB, Gladstone Hway MP 12.94; I-205 NB, Lawnfield MP 13.58; I-205 NB, Sunnybrook MP 14.32; I-205 NB, Sunnyside MP 14.7; I-205 NB, Johnson Creek MP 16.2...

  9. An Evaluation of Architectural Platforms for Parallel Navier-Stokes Computations

    NASA Technical Reports Server (NTRS)

    Jayasimha, D. N.; Hayder, M. E.; Pillay, S. K.

    1996-01-01

    We study the computational, communication, and scalability characteristics of a computational fluid dynamics application, which solves the time accurate flow field of a jet using the compressible Navier-Stokes equations, on a variety of parallel architecture platforms. The platforms chosen for this study are a cluster of workstations (the LACE experimental testbed at NASA Lewis), a shared memory multiprocessor (the Cray YMP), and distributed memory multiprocessors with different topologies - the IBM SP and the Cray T3D. We investigate the impact of various networks connecting the cluster of workstations on the performance of the application and the overheads induced by popular message passing libraries used for parallelization. The work also highlights the importance of matching the memory bandwidth to the processor speed for good single processor performance. By studying the performance of an application on a variety of architectures, we are able to point out the strengths and weaknesses of each of the example computing platforms.

  10. Parallelizing Navier-Stokes Computations on a Variety of Architectural Platforms

    NASA Technical Reports Server (NTRS)

    Jayasimha, D. N.; Hayder, M. E.; Pillay, S. K.

    1997-01-01

    We study the computational, communication, and scalability characteristics of a Computational Fluid Dynamics application, which solves the time accurate flow field of a jet using the compressible Navier-Stokes equations, on a variety of parallel architectural platforms. The platforms chosen for this study are a cluster of workstations (the LACE experimental testbed at NASA Lewis), a shared memory multiprocessor (the Cray YMP), and distributed memory multiprocessors with different topologies - the IBM SP and the Cray T3D. We investigate the impact of various networks connecting the cluster of workstations on the performance of the application and the overheads induced by popular message passing libraries used for parallelization. The work also highlights the importance of matching the memory bandwidth to the processor speed for good single processor performance. By studying the performance of an application on a variety of architectures, we are able to point out the strengths and weaknesses of each of the example computing platforms.

  11. New Computer Simulations of Macular Neural Functioning

    NASA Technical Reports Server (NTRS)

    Ross, Muriel D.; Doshay, D.; Linton, S.; Parnas, B.; Montgomery, K.; Chimento, T.

    1994-01-01

    We use high performance graphics workstations and supercomputers to study the functional significance of the three-dimensional (3-D) organization of gravity sensors. These sensors have a prototypic architecture foreshadowing more complex systems. Scaled-down simulations run on a Silicon Graphics workstation and scaled-up, 3-D versions run on a Cray Y-MP supercomputer. A semi-automated method of reconstruction of neural tissue from serial sections studied in a transmission electron microscope has been developed to eliminate tedious conventional photography. The reconstructions use a mesh as a step in generating a neural surface for visualization. Two meshes are required to model calyx surfaces. The meshes are connected and the resulting prisms represent the cytoplasm and the bounding membranes. A finite volume analysis method is employed to simulate voltage changes along the calyx in response to synapse activation on the calyx or on calyceal processes. The finite volume method insures that charge is conserved at the calyx-process junction. These and other models indicate that efferent processes act as voltage followers, and that the morphology of some afferent processes affects their functioning. In a final application, morphological information is symbolically represented in three dimensions in a computer. The possible functioning of the connectivities is tested using mathematical interpretations of physiological parameters taken from the literature. Symbolic, 3-D simulations are in progress to probe the functional significance of the connectivities. This research is expected to advance computer-based studies of macular functioning and of synaptic plasticity.

  12. Using archived ITS data to measure the operational benefits of a system-wide adaptive ramp metering system : appendix online 8 : OR-217 NB ML speed-occupancy plots.

    DOT National Transportation Integrated Search

    2008-12-01

    The appendix includes various ML speed-occupancy plots: OR-217 NB, 72nd MP 6.61; OR-217 NB, 99W-EB MP 5.9; OR-217 NB, 99W-WB MP 5.85; OR-217 NB, Greenburg MP 4.65; OR-217 NB, Scholls MP 3.85; OR-217 NB, Denney MP 2.68; OR-217 NB, Allen MP 2.16; OR-21...

  13. Using archived ITS data to measure the operational benefits of a system-wide adaptive ramp metering system : appendix online 7 : OR-217 NB ML speed-flow plots.

    DOT National Transportation Integrated Search

    2008-12-01

    The appendix includes various ML speed-flow plots: OR-217 NB, 72nd MP 6.61; OR-217 NB, 99W-EB MP 5.9; OR-217 NB, 99W-WB MP 5.85; OR-217 NB, Greenburg MP 4.65; OR-217 NB, Scholls MP 3.85; OR-217 NB, Denney MP 2.68; OR-217 NB, Allen MP 2.16; OR-217 NB,...

  14. Using archived ITS data to measure the operational benefits of a system-wide adaptive ramp metering system : appendix online 9 : OR-217 NB ML flow-occupancy plots.

    DOT National Transportation Integrated Search

    2008-12-01

    The appendix includes various ML flow occupancy plots: OR-217 NB, 72nd MP 6.61; OR-217 NB, 99W-EB MP 5.9; OR-217 NB, 99W-WB MP 5.85; OR-217 NB, Greenburg MP 4.65; OR-217 NB, Scholls MP 3.85; OR-217 NB, Denney MP 2.68; OR-217 NB, Allen MP 2.16; OR-217...

  15. A matrix-algebraic formulation of distributed-memory maximal cardinality matching algorithms in bipartite graphs

    DOE PAGES

    Azad, Ariful; Buluç, Aydın

    2016-05-16

    We describe parallel algorithms for computing maximal cardinality matching in a bipartite graph on distributed-memory systems. Unlike traditional algorithms that match one vertex at a time, our algorithms process many unmatched vertices simultaneously using a matrix-algebraic formulation of maximal matching. This generic matrix-algebraic framework is used to develop three efficient maximal matching algorithms with minimal changes. The newly developed algorithms have two benefits over existing graph-based algorithms. First, unlike existing parallel algorithms, the cardinality of the matching obtained by the new algorithms stays constant with increasing processor counts, which is important for predictable and reproducible performance. Second, relying on bulk-synchronous matrix operations, these algorithms expose a higher degree of parallelism on distributed-memory platforms than existing graph-based algorithms. We report high-performance implementations of three maximal matching algorithms using hybrid OpenMP-MPI and evaluate the performance of these algorithms using more than 35 real and randomly generated graphs. On real instances, our algorithms achieve up to 200x speedup on 2048 cores of a Cray XC30 supercomputer. Even higher speedups are obtained on larger synthetically generated graphs, where our algorithms show good scaling on up to 16,384 cores.
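
    The round-based structure of such algorithms (many unmatched vertices handled per iteration, conflicts resolved, matched vertices retired) can be illustrated with a serial toy on a dense biadjacency matrix; the paper's versions express these rounds with distributed sparse matrix primitives, which this sketch does not attempt.

        import numpy as np

        def maximal_matching(A):
            """Round-based maximal matching on a dense 0/1 biadjacency matrix (rows x cols)."""
            nr, nc = A.shape
            row_match = np.full(nr, -1)
            col_match = np.full(nc, -1)
            active = A.astype(bool).copy()
            while True:
                free_rows = np.where((row_match == -1) & active.any(axis=1))[0]
                if len(free_rows) == 0:
                    break
                # every free row proposes to its first still-available neighbour
                proposals = np.argmax(active[free_rows], axis=1)
                # conflict resolution: a column accepts only its first proposer this round
                for r, c in zip(free_rows, proposals):
                    if col_match[c] == -1:
                        col_match[c] = r
                        row_match[r] = c
                # matched columns are retired from further consideration
                active[:, col_match != -1] = False
            return row_match, col_match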

  16. Full speed ahead for software

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Wolfe, A.

    1986-03-10

    Supercomputing software is moving into high gear, spurred by the rapid spread of supercomputers into new applications. The critical challenge is how to develop tools that will make it easier for programmers to write applications that take advantage of vectorizing in the classical supercomputer and the parallelism that is emerging in supercomputers and minisupercomputers. Writing parallel software is a challenge that every programmer must face because parallel architectures are springing up across the range of computing. Cray is developing a host of tools for programmers. Tools to support multitasking (in supercomputer parlance, multitasking means dividing up a single program to run on multiple processors) are high on Cray's agenda. On tap for multitasking is Premult, dubbed a microtasking tool. As a preprocessor for Cray's CFT77 FORTRAN compiler, Premult will provide fine-grain multitasking.

  17. The SGI/CRAY T3E: Experiences and Insights

    NASA Technical Reports Server (NTRS)

    Bernard, Lisa Hamet

    1999-01-01

    The focus of the HPCC Earth and Space Sciences (ESS) Project is capability computing - pushing highly scalable computing testbeds to their performance limits. The drivers of this focus are the Grand Challenge problems in Earth and space science: those that could not be addressed in a capacity computing environment where large jobs must continually compete for resources. These Grand Challenge codes require a high degree of communication, large memory, and very large I/O (throughout the duration of the processing, not just in loading initial conditions and saving final results). This set of parameters led to the selection of an SGI/Cray T3E as the current ESS Computing Testbed. The T3E at the Goddard Space Flight Center is a unique computational resource within NASA. As such, it must be managed to effectively support the diverse research efforts across the NASA research community yet still enable the ESS Grand Challenge Investigator teams to achieve their performance milestones, for which the system was intended. To date, all Grand Challenge Investigator teams have achieved the 10 GFLOPS milestone, eight of nine have achieved the 50 GFLOPS milestone, and three have achieved the 100 GFLOPS milestone. In addition, many technical papers have been published highlighting results achieved on the NASA T3E, including some at this Workshop. The successes enabled by the NASA T3E computing environment are best illustrated by the 512 PE upgrade funded by the NASA Earth Science Enterprise earlier this year. Never before has an HPCC computing testbed been so well received by the general NASA science community that it was deemed critical to the success of a core NASA science effort. NASA looks forward to many more success stories before the conclusion of the NASA-SGI/Cray cooperative agreement in June 1999.

  18. Critical Test of Some Computational Chemistry Methods for Prediction of Gas-Phase Acidities and Basicities.

    PubMed

    Toomsalu, Eve; Koppel, Ilmar A; Burk, Peeter

    2013-09-10

    Gas-phase acidities and basicities were calculated for 64 neutral bases (covering the scale from 139.9 kcal/mol to 251.9 kcal/mol) and 53 neutral acids (covering the scale from 299.5 kcal/mol to 411.7 kcal/mol). The following methods were used: AM1, PM3, PM6, PDDG, G2, G2MP2, G3, G3MP2, G4, G4MP2, CBS-QB3, B1B95, B2PLYP, B2PLYPD, B3LYP, B3PW91, B97D, B98, BLYP, BMK, BP86, CAM-B3LYP, HSEh1PBE, M06, M062X, M06HF, M06L, mPW2PLYP, mPW2PLYPD, O3LYP, OLYP, PBE1PBE, PBEPBE, tHCTHhyb, TPSSh, VSXC, and X3LYP. The addition of Grimme's empirical dispersion correction (D) to B2PLYP and mPW2PLYP was evaluated, and it was found that adding this correction gave more accurate results for acidities. Calculations with the B3LYP, B97D, BLYP, B2PLYPD, and PBE1PBE methods were carried out with five basis sets (6-311G**, 6-311+G**, TZVP, cc-pVTZ, and aug-cc-pVTZ) to evaluate the effect of basis sets on the accuracy of the calculations. It was found that the best basis sets, considering both accuracy and computation time, were 6-311+G** and TZVP. Among semiempirical methods, AM1 had the best ability to reproduce experimental acidities and basicities (mean absolute error (MAE) of 7.3 kcal/mol). Among DFT methods, the best method considering accuracy, robustness, and computation time was PBE1PBE/6-311+G** (MAE = 2.7 kcal/mol). Four Gaussian-type methods (G2, G2MP2, G4, and G4MP2) gave results similar to each other (MAE = 2.3 kcal/mol). Gaussian-type methods are quite accurate, but their downside is the relatively long computational time.

  19. Climate Data Assimilation on a Massively Parallel Supercomputer

    NASA Technical Reports Server (NTRS)

    Ding, Hong Q.; Ferraro, Robert D.

    1996-01-01

    We have designed and implemented a set of highly efficient and highly scalable algorithms for an unstructured computational package, the PSAS data assimilation package, as demonstrated by detailed performance analysis of systematic runs on up to 512-nodes of an Intel Paragon. The preconditioned Conjugate Gradient solver achieves a sustained 18 Gflops performance. Consequently, we achieve an unprecedented 100-fold reduction in time to solution on the Intel Paragon over a single head of a Cray C90. This not only exceeds the daily performance requirement of the Data Assimilation Office at NASA's Goddard Space Flight Center, but also makes it possible to explore much larger and challenging data assimilation problems which are unthinkable on a traditional computer platform such as the Cray C90.

  20. An ab initio/Rice-Ramsperger-Kassel-Marcus study of the hydrogen-abstraction reactions of methyl ethers, H3COCH(3-x)(CH3)x, x = 0-2, by OH; mechanism and kinetics.

    PubMed

    Zhou, Chong-Wen; Simmie, John M; Curran, Henry J

    2010-07-14

    A theoretical study of the mechanism and kinetics of the H-abstraction reaction from dimethyl (DME), ethylmethyl (EME) and iso-propylmethyl (IPME) ethers by the OH radical has been carried out using the high-level methods CCSD(T)/CBS, G3 and G3MP2BH&H. The computationally less-expensive methods of G3 and G3MP2BH&H yield results for DME within 0.2-0.6 and 0.7-0.9 kcal mol^-1, respectively, of the coupled cluster, CCSD(T), values extrapolated to the basis set limit. So the G3 and G3MP2BH&H methods can be confidently used for the reactions of the higher ethers. A distinction is made between the two different kinds of H-atoms, classified as in/out-of the symmetry plane, and it is found that abstraction from the out-of-plane H-atoms proceeds through a stepwise mechanism involving the formation of a reactant complex in the entrance channel and a product complex in the exit channel. The in-plane H-atom abstractions take place through a more direct mechanism and are less competitive. Rate constants of the three reactions have been calculated in the temperature range of 500-3000 K using the Variflex code, based on the weak collision, master equation/microcanonical variational RRKM theory including tunneling corrections. The computed total rate constants (cm^3 mol^-1 s^-1) have been fitted as follows: k(DME) = 2.74 x T^3.94 exp(1534.2/T), k(EME) = 20.93 x T^3.61 exp(2060.1/T), and k(IPME) = 0.55 x T^3.93 exp(2826.1/T). Expressions of the group rate constants for the three different carbon sites are also provided.
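
    The fitted expressions are modified Arrhenius forms and can be evaluated directly; the following sketch simply tabulates the three fits quoted above over part of the stated temperature range (units as given, cm^3 mol^-1 s^-1).

        import numpy as np

        def k_fit(A, n, c, T):
            # modified Arrhenius form k = A * T**n * exp(c / T), as quoted in the text
            return A * T**n * np.exp(c / T)

        T = np.array([500.0, 1000.0, 2000.0, 3000.0])  # K
        k_dme  = k_fit(2.74,  3.94, 1534.2, T)
        k_eme  = k_fit(20.93, 3.61, 2060.1, T)
        k_ipme = k_fit(0.55,  3.93, 2826.1, T)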

  1. Evaluation of DFT methods for computing the interaction energies of homomolecular and heteromolecular dimers of monosubstituted benzene

    NASA Astrophysics Data System (ADS)

    Godfrey-Kittle, Andrew; Cafiero, Mauricio

    We present density functional theory (DFT) interaction energies for the sandwich and T-shaped conformers of substituted benzene dimers. The DFT functionals studied include TPSS, HCTH407, B3LYP, and X3LYP. We also include Hartree-Fock (HF) and second-order Møller-Plesset perturbation theory (MP2) calculations, as well as calculations using a new functional, P3LYP, which combines PBE and HF exchange with LYP correlation. Although DFT methods do not explicitly account for the dispersion interactions important in benzene-dimer binding, we find that our new method, P3LYP, as well as HCTH407 and TPSS, match MP2 and CCSD(T) calculations much better than the hybrid B3LYP and X3LYP functionals do.

  2. Optimizing Blocking and Nonblocking Reduction Operations for Multicore Systems: Hierarchical Design and Implementation

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Gorentla Venkata, Manjunath; Shamis, Pavel; Graham, Richard L

    2013-01-01

    Many scientific simulations, using the Message Passing Interface (MPI) programming model, are sensitive to the performance and scalability of reduction collective operations such as MPI Allreduce and MPI Reduce. These operations are the most widely used abstractions to perform mathematical operations over all processes that are part of the simulation. In this work, we propose a hierarchical design to implement the reduction operations on multicore systems. This design aims to improve the efficiency of reductions by 1) tailoring the algorithms and customizing the implementations for the various communication mechanisms in the system, 2) providing the ability to configure the depth of the hierarchy to match the system architecture, and 3) providing the ability to progress each level of this hierarchy independently. Using this design, we implement the MPI Allreduce and MPI Reduce operations (and their nonblocking variants MPI Iallreduce and MPI Ireduce) for all message sizes, and evaluate them on multiple architectures including InfiniBand clusters and the Cray XT5. We leverage and enhance our existing infrastructure, Cheetah, a framework for implementing hierarchical collective operations, to implement these reductions. The experimental results show that the Cheetah reduction operations outperform production-grade MPI implementations such as Open MPI default, Cray MPI, and MVAPICH2, demonstrating their efficiency, flexibility, and portability. On InfiniBand systems, with a microbenchmark, a 512-process Cheetah nonblocking Allreduce and Reduce achieve speedups of 23x and 10x, respectively, compared to the default Open MPI reductions. The blocking variants of the reduction operations show similar performance benefits. A 512-process nonblocking Cheetah Allreduce achieves a speedup of 3x compared to the default MVAPICH2 Allreduce implementation. On a Cray XT5 system, a 6144-process Cheetah Allreduce outperforms the Cray MPI by 145%. The evaluation with an application kernel, a Conjugate Gradient solver, shows that the Cheetah reductions speed up the total time to solution by 195%, demonstrating the potential benefits for scientific simulations.
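
    The two-level pattern described (algorithms tailored per level of the communication hierarchy) can be sketched with mpi4py; this is a generic intra-node/inter-node allreduce, not the Cheetah implementation itself.

        from mpi4py import MPI
        import numpy as np

        comm = MPI.COMM_WORLD
        # split into one communicator per shared-memory node
        node = comm.Split_type(MPI.COMM_TYPE_SHARED)
        # node leaders (node rank 0) form the inter-node communicator
        color = 0 if node.Get_rank() == 0 else MPI.UNDEFINED
        leaders = comm.Split(color, comm.Get_rank())

        x = np.random.rand(4)
        buf = np.empty_like(x)
        node.Reduce(x, buf, op=MPI.SUM, root=0)                # stage 1: reduce within node
        if node.Get_rank() == 0:
            leaders.Allreduce(MPI.IN_PLACE, buf, op=MPI.SUM)   # stage 2: combine across leaders
        node.Bcast(buf, root=0)                                # stage 3: broadcast within node
        # buf now holds the global sum on every rank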

  3. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Mubarak, Misbah; Ross, Robert B.

    This technical report describes the experiments performed to validate the MPI performance measurements reported by the CODES dragonfly network simulation against the Theta Cray XC system at the Argonne Leadership Computing Facility (ALCF).

  4. EFFECTS OF TUMORS ON INHALED PHARMACOLOGIC DRUGS: II. PARTICLE MOTION

    EPA Science Inventory

    ABSTRACT

    Computer simulations were conducted to describe drug particle motion in human lung bifurcations with tumors. The computations used FIDAP with a Cray T90 supercomputer. The objective was to better understand particle behavior as affected by particle characteristics...

  5. Implementation of an ADI method on parallel computers

    NASA Technical Reports Server (NTRS)

    Fatoohi, Raad A.; Grosch, Chester E.

    1987-01-01

    The implementation of an ADI method for solving the diffusion equation on three parallel/vector computers is discussed. The computers were chosen so as to encompass a variety of architectures. They are: the MPP, an SIMD machine with 16K bit-serial processors; FLEX/32, an MIMD machine with 20 processors; and CRAY/2, an MIMD machine with four vector processors. The Gaussian elimination algorithm is used to solve a set of tridiagonal systems on the FLEX/32 and CRAY/2, while the cyclic elimination algorithm is used to solve these systems on the MPP. The implementation of the method is discussed in relation to these architectures and measures of the performance on each machine are given. Simple performance models are used to describe the performance. These models highlight the bottlenecks and limiting factors for this algorithm on these architectures. Finally, conclusions are presented.
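
    The tridiagonal solves at the heart of the ADI sweeps can be illustrated with the standard Thomas (Gaussian elimination) algorithm. The sketch below is a generic serial version, not the authors' FLEX/32 or CRAY/2 code:

      import numpy as np

      def thomas(a, b, c, d):
          """Solve a tridiagonal system with sub-diagonal a, diagonal b,
          super-diagonal c and right-hand side d (1-D arrays of length n;
          a[0] and c[-1] are unused)."""
          n = len(b)
          cp, dp = np.empty(n), np.empty(n)
          cp[0], dp[0] = c[0] / b[0], d[0] / b[0]
          for i in range(1, n):                    # forward elimination
              m = b[i] - a[i] * cp[i - 1]
              cp[i] = c[i] / m if i < n - 1 else 0.0
              dp[i] = (d[i] - a[i] * dp[i - 1]) / m
          x = np.empty(n)
          x[-1] = dp[-1]
          for i in range(n - 2, -1, -1):           # back substitution
              x[i] = dp[i] - cp[i] * x[i + 1]
          return x

      # One implicit 1-D diffusion sweep: 1 + 2r on the diagonal, -r off it.
      n, r = 8, 0.5
      print(thomas(np.full(n, -r), np.full(n, 1 + 2 * r),
                   np.full(n, -r), np.ones(n)))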

  6. Implementation of an ADI method on parallel computers

    NASA Technical Reports Server (NTRS)

    Fatoohi, Raad A.; Grosch, Chester E.

    1987-01-01

    In this paper the implementation of an ADI method for solving the diffusion equation on three parallel/vector computers is discussed. The computers were chosen so as to encompass a variety of architectures. They are the MPP, an SIMD machine with 16K bit-serial processors; Flex/32, an MIMD machine with 20 processors; and Cray/2, an MIMD machine with four vector processors. The Gaussian elimination algorithm is used to solve a set of tridiagonal systems on the Flex/32 and Cray/2, while the cyclic elimination algorithm is used to solve these systems on the MPP. The implementation of the method is discussed in relation to these architectures and measures of the performance on each machine are given. Simple performance models are used to describe the performance. These models highlight the bottlenecks and limiting factors for this algorithm on these architectures. Finally, conclusions are presented.

  7. Barrier-breaking performance for industrial problems on the CRAY C916

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Graffunder, S.K.

    1993-12-31

    Nine applications, including third-party codes, were submitted to the Gordon Bell Prize committee showing the CRAY C916 supercomputer providing record-breaking time to solution for industrial problems in several disciplines. Performance was obtained by balancing raw hardware speed; effective use of large, real, shared memory; compiler vectorization and autotasking; hand optimization; asynchronous I/O techniques; and new algorithms. The highest performance among the submissions was 11.1 GFLOPS out of a peak advertised performance of 16 GFLOPS for the CRAY C916 system. One program achieved a 15.45x speedup from the compiler with just two hand-inserted directives to scope variables properly for the mathematical library. New I/O techniques hide tens of gigabytes of I/O behind parallel computations. Finally, new iterative solver algorithms have demonstrated times to solution on one CPU up to 70 times faster than the best direct solvers.

  8. Parallel algorithms for modeling flow in permeable media. Annual report, February 15, 1995 - February 14, 1996

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    G.A. Pope; K. Sepehrnoori; D.C. McKinney

    1996-03-15

    This report describes the application of distributed-memory parallel programming techniques to a compositional simulator called UTCHEM. The University of Texas Chemical Flooding reservoir simulator (UTCHEM) is a general-purpose vectorized chemical flooding simulator that models the transport of chemical species in three-dimensional, multiphase flow through permeable media. The parallel version of UTCHEM addresses the solution of large-scale problems by reducing the amount of time required to obtain the solution as well as by providing a flexible and portable programming environment. In this work, the original parallel version of UTCHEM was modified and ported to the CRAY T3D and CRAY T3E distributed-memory, multiprocessor computers, using CRAY-PVM as the interprocessor communication library. Also, the data communication routines were modified such that portability of the original code across different computer architectures was made possible.

  9. RISC Processors and High Performance Computing

    NASA Technical Reports Server (NTRS)

    Saini, Subhash; Bailey, David H.; Lasinski, T. A. (Technical Monitor)

    1995-01-01

    In this tutorial, we will discuss the top five current RISC microprocessors: the IBM Power2, which is used in the IBM RS6000/590 workstation and in the IBM SP2 parallel supercomputer; the DEC Alpha, which is used in the DEC Alpha workstation and in the Cray T3D; the MIPS R8000, which is used in the SGI Power Challenge; the HP PA-RISC 7100, which is used in the HP 700 series workstations and in the Convex Exemplar; and the Cray proprietary processor, which is used in the new Cray J916. The architecture of these microprocessors will first be presented. The effective performance of these processors will then be compared, both by citing standard benchmarks and in the context of implementing real applications. In the process, different programming models such as data parallel (CM Fortran and HPF) and message passing (PVM and MPI) will be introduced and compared. The latest NAS Parallel Benchmark (NPB) absolute performance and performance per dollar figures will be presented. The next generation of the NPB will also be described. The tutorial will conclude with a discussion of general trends in the field of high performance computing, including likely future developments in hardware and software technology, and the relative roles of vector supercomputers, tightly coupled parallel computers, and clusters of workstations. This tutorial will provide a unique cross-machine comparison not available elsewhere.

  10. Accelerating MP2C dispersion corrections for dimers and molecular crystals

    NASA Astrophysics Data System (ADS)

    Huang, Yuanhang; Shao, Yihan; Beran, Gregory J. O.

    2013-06-01

    The MP2C dispersion correction of Pitonak and Hesselmann [J. Chem. Theory Comput. 6, 168 (2010); DOI: 10.1021/ct9005882] substantially improves the performance of second-order Møller-Plesset perturbation theory for non-covalent interactions, albeit with non-trivial computational cost. Here, the MP2C correction is computed in a monomer-centered basis instead of a dimer-centered one. When applied to a single-dimer MP2 calculation, this change accelerates the MP2C dispersion correction several-fold while introducing only trivial new errors. More significantly, in the context of fragment-based molecular crystal studies, combination of the new monomer-basis algorithm and the periodic symmetry of the crystal reduces the cost of computing the dispersion correction by two orders of magnitude. This speed-up reduces the MP2C dispersion correction calculation from a significant computational expense to a negligible one in crystals like aspirin or oxalyl dihydrazide, without compromising accuracy.

  11. Parallel Navier-Stokes computations on shared and distributed memory architectures

    NASA Technical Reports Server (NTRS)

    Hayder, M. Ehtesham; Jayasimha, D. N.; Pillay, Sasi Kumar

    1995-01-01

    We study a high order finite difference scheme to solve the time accurate flow field of a jet using the compressible Navier-Stokes equations. As part of our ongoing efforts, we have implemented our numerical model on three parallel computing platforms to study the computational, communication, and scalability characteristics. The platforms chosen for this study are a cluster of workstations connected through fast networks (the LACE experimental testbed at NASA Lewis), a shared memory multiprocessor (the Cray YMP), and a distributed memory multiprocessor (the IBM SP1). Our focus in this study is on the LACE testbed. We present some results for the Cray YMP and the IBM SP1 mainly for comparison purposes. On the LACE testbed, we study: (1) the communication characteristics of Ethernet, FDDI, and the ALLNODE networks and (2) the overheads induced by the PVM message passing library used for parallelizing the application. We demonstrate that clustering of workstations is effective and has the potential to be computationally competitive with supercomputers at a fraction of the cost.

  12. FAST: A multi-processed environment for visualization of computational fluid dynamics

    NASA Technical Reports Server (NTRS)

    Bancroft, Gordon V.; Merritt, Fergus J.; Plessel, Todd C.; Kelaita, Paul G.; Mccabe, R. Kevin

    1991-01-01

    Three-dimensional, unsteady, multi-zoned fluid dynamics simulations over full scale aircraft are typical of the problems being investigated at NASA Ames' Numerical Aerodynamic Simulation (NAS) facility on Cray-2 and Cray Y-MP supercomputers. With multiple-processor workstations available in the 10-30 Mflop range, we feel that these new developments in scientific computing warrant a new approach to the design and implementation of analysis tools. These larger, more complex problems create a need for new visualization techniques not possible with the existing software or systems available as of this writing. The visualization techniques will change as the supercomputing environment, and hence the scientific methods employed, evolves even further. The Flow Analysis Software Toolkit (FAST), an implementation of a software system for fluid mechanics analysis, is discussed.

  13. Density functional theory study of the interaction of vinyl radical, ethyne, and ethene with benzene, aimed to define an affordable computational level to investigate stability trends in large van der Waals complexes

    NASA Astrophysics Data System (ADS)

    Maranzana, Andrea; Giordana, Anna; Indarto, Antonius; Tonachini, Glauco; Barone, Vincenzo; Causà, Mauro; Pavone, Michele

    2013-12-01

    Our purpose is to identify a computational level sufficiently dependable and affordable to assess trends in the interaction of a variety of radical or closed-shell unsaturated hydrocarbons A adsorbed on soot platelet models B. These systems, of environmental interest, would unavoidably have rather large sizes, thus prompting us to explore in this paper the performances of relatively low-level computational methods and compare them with higher-level reference results. To this end, the interaction of three complexes between non-polar species, vinyl radical, ethyne, or ethene (A) with benzene (B) is studied, since these species, involved themselves in growth processes of polycyclic aromatic hydrocarbons (PAHs) and soot particles, are small enough to allow high-level reference calculations of the interaction energy ΔEAB. Counterpoise-corrected interaction energies ΔEAB are used at all stages. (1) Density Functional Theory (DFT) unconstrained optimizations of the A-B complexes are carried out, using the B3LYP-D, ωB97X-D, and M06-2X functionals, with six basis sets: 6-31G(d), 6-311G(2d,p), and 6-311++G(3df,3pd); aug-cc-pVDZ and aug-cc-pVTZ; N07T. (2) Then, unconstrained optimizations by Møller-Plesset second order Perturbation Theory (MP2), with each basis set, allow subsequent single point Coupled Cluster Singles Doubles and perturbative estimate of the Triples energy computations with the same basis sets [CCSD(T)//MP2]. (3) Based on an additivity assumption of (i) the estimated MP2 energy at the complete basis set limit [EMP2/CBS] and (ii) the higher-order correlation energy effects in passing from MP2 to CCSD(T) at the aug-cc-pVTZ basis set, ΔECC-MP, a CCSD(T)/CBS estimate is obtained and taken as a computational energy reference. At DFT, variations in ΔEAB with basis set are not large for the title molecules, and the three functionals perform rather satisfactorily even with rather small basis sets [6-31G(d) and N07T], exhibiting deviations from the computational reference of less than 1 kcal mol-1. The zero-point vibrational energy corrected estimates Δ(EAB+ZPE), obtained with the three functionals and the 6-31G(d) and N07T basis sets, are compared with experimental D0 measurements, when available. In particular, this comparison is finally extended to the naphthalene and coronene dimers and to three π-π associations of different PAHs (R, made by 10, 16, or 24 C atoms) and P (80 C atoms).

  14. Density functional theory study of the interaction of vinyl radical, ethyne, and ethene with benzene, aimed to define an affordable computational level to investigate stability trends in large van der Waals complexes.

    PubMed

    Maranzana, Andrea; Giordana, Anna; Indarto, Antonius; Tonachini, Glauco; Barone, Vincenzo; Causà, Mauro; Pavone, Michele

    2013-12-28

    Our purpose is to identify a computational level sufficiently dependable and affordable to assess trends in the interaction of a variety of radical or closed-shell unsaturated hydrocarbons A adsorbed on soot platelet models B. These systems, of environmental interest, would unavoidably have rather large sizes, thus prompting us to explore in this paper the performances of relatively low-level computational methods and compare them with higher-level reference results. To this end, the interaction of three complexes between non-polar species, vinyl radical, ethyne, or ethene (A) with benzene (B) is studied, since these species, involved themselves in growth processes of polycyclic aromatic hydrocarbons (PAHs) and soot particles, are small enough to allow high-level reference calculations of the interaction energy ΔEAB. Counterpoise-corrected interaction energies ΔEAB are used at all stages. (1) Density Functional Theory (DFT) unconstrained optimizations of the A-B complexes are carried out, using the B3LYP-D, ωB97X-D, and M06-2X functionals, with six basis sets: 6-31G(d), 6-311G(2d,p), and 6-311++G(3df,3pd); aug-cc-pVDZ and aug-cc-pVTZ; N07T. (2) Then, unconstrained optimizations by Møller-Plesset second order Perturbation Theory (MP2), with each basis set, allow subsequent single point Coupled Cluster Singles Doubles and perturbative estimate of the Triples energy computations with the same basis sets [CCSD(T)//MP2]. (3) Based on an additivity assumption of (i) the estimated MP2 energy at the complete basis set limit [EMP2/CBS] and (ii) the higher-order correlation energy effects in passing from MP2 to CCSD(T) at the aug-cc-pVTZ basis set, ΔECC-MP, a CCSD(T)/CBS estimate is obtained and taken as a computational energy reference. At DFT, variations in ΔEAB with basis set are not large for the title molecules, and the three functionals perform rather satisfactorily even with rather small basis sets [6-31G(d) and N07T], exhibiting deviations from the computational reference of less than 1 kcal mol-1. The zero-point vibrational energy corrected estimates Δ(EAB+ZPE), obtained with the three functionals and the 6-31G(d) and N07T basis sets, are compared with experimental D0 measurements, when available. In particular, this comparison is finally extended to the naphthalene and coronene dimers and to three π-π associations of different PAHs (R, made by 10, 16, or 24 C atoms) and P (80 C atoms).

  15. Highlights of X-Stack ExM Deliverable Swift/T

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Wozniak, Justin M.

    Swift/T is a key success from the "ExM: System support for extreme-scale, many-task applications" X-Stack project, which proposed to use concurrent dataflow as an innovative programming model to exploit extreme parallelism in exascale computers. The Swift/T component of the project reimplemented the Swift language from scratch to allow applications that compose scientific modules together to be built and run on available petascale computers (Blue Gene, Cray). Swift/T does this via a new compiler and runtime that generates and executes the application as an MPI program. We assume that mission-critical emerging exascale applications will be composed as scalable applications using existing software components, connected by data dependencies. Developers wrap native code fragments using a higher-level language, then build composite applications to form a computational experiment. This exemplifies hierarchical concurrency: lower-level messaging libraries are used for fine-grained parallelism; high-level control is used for inter-task coordination. These patterns are best expressed with dataflow, but static DAGs (i.e., other workflow languages) limit the applications that can be built; they do not provide the expressiveness of Swift, such as conditional execution, iteration, and recursive functions.

  16. Research on Spectroscopy, Opacity, and Atmospheres

    NASA Technical Reports Server (NTRS)

    Kurucz, Robert L.

    1999-01-01

    To make my calculations more readily accessible I have set up a web site, cfaku5.harvard.edu, that can also be accessed by FTP. It has five 9-GB disks that hold all of my atomic and diatomic molecular data, my tables of distribution function opacities, my grids of model atmospheres, colors, fluxes, etc., my programs that are ready for distribution, and most of my recent papers. Atlases and computed spectra will be added as they are completed. New atomic and molecular calculations will be added as they are completed. I got my atomic programs that had been running on a Cray at the San Diego Supercomputer Center to run on my VAXes and Alpha. I started with Ni and Co because there were new laboratory analyses that included isotopic and hyperfine splitting. Those calculations are described in the appended abstract for the 6th Atomic Spectroscopy and Oscillator Strengths meeting in Victoria last summer. A surprising finding is that quadrupole transitions have been grossly in error because mixing with higher levels has not been included. I now have enough memory in my Alpha to treat 3000 x 3000 matrices. I now include all levels up through n=9 for Fe I and Fe II, the spectra for which the most information is available. I am finishing those calculations right now. After Fe I and Fe II, all other spectra are "easy", and I will be in mass production. ATLAS12, my opacity sampling program for computing models with arbitrary abundances, has been put on the web server. I wrote a new distribution function opacity program for workstations that replaces the one I used on the Cray at the San Diego Supercomputer Center. Each set of abundances would take 100 Cray hours, costing $100,000. I ran 25 cases. Each of my opacity CDs contains three abundances. I have a new program running on the Alpha that takes about a week. I am going to have to get a faster processor or I will have to dedicate a whole workstation just to opacities.

  17. Electronic response of rare-earth magnetic-refrigeration compounds GdX2 (X = Fe and Co)

    NASA Astrophysics Data System (ADS)

    Bhatt, Samir; Ahuja, Ushma; Kumar, Kishor; Heda, N. L.

    2018-05-01

    We present the Compton profiles (CPs) of rare-earth-transition metal compounds GdX2 (X = Fe and Co) measured using a 740 GBq 137Cs Compton spectrometer. To compare with the experimental momentum densities, we have also computed the CPs, electronic band structure, density of states (DOS) and Mulliken populations (MP) using the linear combination of atomic orbitals (LCAO) method. Local density and generalized gradient approximations within density functional theory (DFT), along with hybrids of Hartree-Fock and DFT (B3LYP and PBE0), have been considered under the framework of the LCAO scheme. It is seen that the LCAO-B3LYP based momentum densities give a better agreement with the experimental data for both the compounds. The energy bands and DOS for both the spin-up and spin-down states show metallic-like character of the reported intermetallic compounds. The localization of 3d electrons of Co and Fe has also been discussed in terms of equally normalized CPs and MP data. Discussion of magnetization using the LCAO method is also included.

  18. A performance comparison of current HPC systems: Blue Gene/Q, Cray XE6 and InfiniBand systems

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Kerbyson, Darren J.; Barker, Kevin J.; Vishnu, Abhinav

    2014-01-01

    We present here a performance analysis of three current architectures that have become commonplace in the High Performance Computing world. Blue Gene/Q is the third generation of systems from IBM that use modestly performing cores, but at large scale, in order to achieve high performance. The XE6 is the latest in a long line of Cray systems that use a 3-D topology but the first to use the Gemini interconnection network. InfiniBand provides the flexibility of using compute nodes from many vendors that can be connected in many possible topologies. The performance characteristics of each vary vastly, and the way in which nodes are allocated in each type of system can significantly impact achieved performance. In this work we compare these three systems using a combination of micro-benchmarks and a set of production applications. In addition we also examine the differences in performance variability observed on each system and quantify the lost performance using a combination of both empirical measurements and performance models. Our results show that significant performance can be lost in normal production operation of the Cray XE6 and InfiniBand clusters in comparison to Blue Gene/Q.

  19. 40 CFR 86.145-82 - Calculations; particulate emissions.

    Code of Federal Regulations, 2014 CFR

    2014-07-01

    ... final reported test results for the mass particulate (Mp) in grams/mile shall be computed as follows. Mp = 0.43(Mp1 + Mp2)/(Dct + Ds) + 0.57(Mp3 + Mp2)/(Dht + Ds) where: (1) Mp1 = Mass of particulate...) for determination.) (2) Mp2 = Mass of particulate determined from the “stabilized” phase of the cold...
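
    Apart from the truncation, the weighting formula is self-contained; a short worked evaluation (with hypothetical masses and distances in place of real test data) makes the arithmetic explicit:

      def mass_particulate(mp1, mp2, mp3, dct, ds, dht):
          """Weighted particulate mass (grams/mile) per 40 CFR 86.145-82:
          Mp = 0.43(Mp1 + Mp2)/(Dct + Ds) + 0.57(Mp3 + Mp2)/(Dht + Ds)."""
          return (0.43 * (mp1 + mp2) / (dct + ds)
                  + 0.57 * (mp3 + mp2) / (dht + ds))

      # Hypothetical illustrative values (grams, miles), not regulatory data.
      print(mass_particulate(mp1=0.012, mp2=0.020, mp3=0.010,
                             dct=3.59, ds=3.86, dht=3.59))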

  20. 40 CFR 86.145-82 - Calculations; particulate emissions.

    Code of Federal Regulations, 2012 CFR

    2012-07-01

    ... final reported test results for the mass particulate (Mp) in grams/mile shall be computed as follows. Mp = 0.43(Mp1 + Mp2)/(Dct + Ds) + 0.57(Mp3 + Mp2)/(Dht + Ds) where: (1) Mp1 = Mass of particulate...) for determination.) (2) Mp2 = Mass of particulate determined from the “stabilized” phase of the cold...

  1. Analytic energy gradients for orbital-optimized MP3 and MP2.5 with the density-fitting approximation: An efficient implementation.

    PubMed

    Bozkaya, Uğur

    2018-03-15

    Efficient implementations of analytic gradients for the orbital-optimized MP3 and MP2.5 and their standard versions with the density-fitting approximation, which are denoted as DF-MP3, DF-MP2.5, DF-OMP3, and DF-OMP2.5, are presented. The DF-MP3, DF-MP2.5, DF-OMP3, and DF-OMP2.5 methods are applied to a set of alkanes and noncovalent interaction complexes to compare the computational cost with the conventional MP3, MP2.5, OMP3, and OMP2.5. Our results demonstrate that the density-fitted perturbation theory (DF-MP) methods considered substantially reduce the computational cost compared to the conventional MP methods. The efficiency of our DF-MP methods arises from the reduced input/output (I/O) time and the acceleration of gradient-related terms, such as computations of particle density and generalized Fock matrices (PDMs and GFM), solution of the Z-vector equation, back-transformations of PDMs and GFM, and evaluation of analytic gradients in the atomic orbital basis. Further, application results show that errors introduced by the DF approach are negligible. Mean absolute errors for bond lengths of a molecular set, with the cc-pCVQZ basis set, are 0.0001-0.0002 Å. © 2017 Wiley Periodicals, Inc.

  2. Dense and Sparse Matrix Operations on the Cell Processor

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Williams, Samuel W.; Shalf, John; Oliker, Leonid

    2005-05-01

    The slowing pace of commodity microprocessor performance improvements combined with ever-increasing chip power demands has become of utmost concern to computational scientists. Therefore, the high performance computing community is examining alternative architectures that address the limitations of modern superscalar designs. In this work, we examine STI's forthcoming Cell processor: a novel, low-power architecture that combines a PowerPC core with eight independent SIMD processing units coupled with a software-controlled memory to offer high FLOP/s/Watt. Since neither Cell hardware nor cycle-accurate simulators are currently publicly available, we develop an analytic framework to predict Cell performance on dense and sparse matrix operations, using a variety of algorithmic approaches. Results demonstrate Cell's potential to deliver more than an order of magnitude better GFLOP/s per watt performance, when compared with the Intel Itanium2 and Cray X1 processors.

  3. THC-MP: High performance numerical simulation of reactive transport and multiphase flow in porous media

    NASA Astrophysics Data System (ADS)

    Wei, Xiaohui; Li, Weishan; Tian, Hailong; Li, Hongliang; Xu, Haixiao; Xu, Tianfu

    2015-07-01

    The numerical simulation of multiphase flow and reactive transport in porous media for complex subsurface problems is a computationally intensive application. To meet these increasing computational requirements, this paper presents a parallel computing method and architecture. Derived from TOUGHREACT, a well-established code for simulating subsurface multiphase flow and reactive transport problems, we developed THC-MP, a high-performance code for massively parallel computers, which greatly extends the computational capability of the original code. The domain decomposition method was applied to the coupled numerical computing procedure in THC-MP. We designed the distributed data structure and implemented the data initialization, the data exchange between the computing nodes, and the core solving module using a hybrid parallel iterative and direct solver. Numerical accuracy of THC-MP was verified through a CO2 injection-induced reactive transport problem by comparing the results obtained from the parallel computation with those from the sequential computation (original code). Execution efficiency and code scalability were examined through field-scale carbon sequestration applications on a multicore cluster. The results successfully demonstrate the enhanced performance of THC-MP on parallel computing facilities.
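
    The domain-decomposition pattern described here, each process owning a slab of the grid and exchanging boundary (ghost) data with its neighbours, can be sketched with mpi4py; this is an illustrative stand-in, not the CRAY-PVM communication layer used in the original work:

      from mpi4py import MPI
      import numpy as np

      comm = MPI.COMM_WORLD
      rank, size = comm.Get_rank(), comm.Get_size()

      # Each rank owns n_local interior cells plus one ghost cell per side.
      n_local = 10
      u = np.full(n_local + 2, float(rank))

      left = rank - 1 if rank > 0 else MPI.PROC_NULL
      right = rank + 1 if rank < size - 1 else MPI.PROC_NULL

      # Exchange ghost cells with both neighbours (no-ops at domain ends).
      comm.Sendrecv(u[1:2], dest=left, recvbuf=u[-1:], source=right)
      comm.Sendrecv(u[-2:-1], dest=right, recvbuf=u[0:1], source=left)

      print(f"rank {rank}: ghost cells = ({u[0]}, {u[-1]})")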

  4. A Pacific Ocean general circulation model for satellite data assimilation

    NASA Technical Reports Server (NTRS)

    Chao, Y.; Halpern, D.; Mechoso, C. R.

    1991-01-01

    A tropical Pacific Ocean General Circulation Model (OGCM) to be used in satellite data assimilation studies is described. The transfer of the OGCM from a CYBER-205 at NOAA's Geophysical Fluid Dynamics Laboratory to a CRAY-2 at NASA's Ames Research Center is documented. Two 3-year model integrations from identical initial conditions but performed on those two computers are compared. The model simulations are very similar to each other, as expected, but the simulation performed with the higher-precision CRAY-2 is smoother than that with the lower-precision CYBER-205. The CYBER-205 and CRAY-2 use 32- and 64-bit mantissa arithmetic, respectively. The major features of the oceanic circulation in the tropical Pacific, namely the North Equatorial Current, the North Equatorial Countercurrent, the South Equatorial Current, and the Equatorial Undercurrent, are realistically produced and their seasonal cycles are described. The OGCM provides a powerful tool for the study of tropical oceans and for the assimilation of satellite altimetry data.
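
    The precision effect noted here is easy to reproduce in miniature: accumulating the same series in 32-bit and 64-bit floating point drifts apart through round-off. A minimal illustration, unrelated to the OGCM itself:

      import numpy as np

      rng = np.random.default_rng(0)
      x64 = rng.random(200_000)            # 64-bit samples in [0, 1)
      x32 = x64.astype(np.float32)

      # A naive running sum accumulates round-off differently at each width.
      s32 = np.float32(0.0)
      for v in x32:
          s32 += v
      s64 = x64.sum()

      print("float32 running sum:", float(s32))
      print("float64 sum:        ", s64)
      print("relative difference:", abs(float(s32) - s64) / s64)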

  5. Three-Dimensional Analysis of Mandibular Angle Classification and Aesthetic Evaluation of the Lower Face in Chinese Female Adults.

    PubMed

    Mao, Xiaoyan; Fu, Xi; Niu, Feng; Chen, Ying; Jin, Qi; Qiao, Jia; Gui, Lai

    2018-05-14

    Reduction gonioplasty is very popular in East Asia. However, there have been few quantitative criteria for mandibular angle classification or aesthetics. The aim of this study was to investigate the quantitative differences between mandibular angle types and determine the morphologic features of the mandibular angle in attractive women. We created a database of skull computed tomography scans and standardized frontal and lateral photographs of 96 Chinese female adults. The mandibular angle was classified into 3 groups, namely extraversion, introversion, and healthy, based on the position of the gonion. We used a 5-point Likert scale to quantify attractiveness based on the photographs. Those who scored 4 or higher were defined as attractive women. Three types of computed tomography measurements of the mandible were taken: 4 distances, 4 angles, and 3 proportions. Discriminant analysis was applied to establish a mathematical model for mandibular angle aesthetics evaluation. Significant differences were observed between the different types of mandibular angle in lower facial width (Gol-Gor), mandibular angle (Co-Go-Me), and gonion divergence angle (Gol-Me-Gor) (P < 0.01). Attractive Chinese women had a mandibular angle of 123.913 ± 2.989 degrees, a FH-MP of 27.033 ± 2.695 degrees, and a Go-Me/Co-Go index of 2.0. The "healthy" women had a mandibular angle of 116.402 ± 5.373 degrees, a FH-MP of 19.556 ± 5.999 degrees, and a Go-Me/Co-Go index of 1.6. The estimated Fisher linear discriminant function for the identification of attractive women was as follows: Y = -0.1516X1(Co-Go) + 0.128X2(Go-Me) + 0.04936X3(Co-Go-Me) + 0.0218X4(FH-MP). Our study quantified the differences between mandibular angle types and identified the morphological features of the mandibular angle in attractive Chinese female adults. Our results could assist plastic surgeons in the presurgical design of a new aesthetic gonion and help to evaluate lower face aesthetics.
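
    The reported discriminant is a plain linear combination of four measurements, so the scoring arithmetic is one line. In the sketch below the function only returns the score, since the abstract does not give the classification cut-off, and the Co-Go and Go-Me inputs are hypothetical placeholders (the abstract quotes only their ratio):

      def aesthetic_score(co_go, go_me, co_go_me, fh_mp):
          """Fisher linear discriminant from the abstract:
          Y = -0.1516*(Co-Go) + 0.128*(Go-Me)
              + 0.04936*(Co-Go-Me) + 0.0218*(FH-MP)."""
          return (-0.1516 * co_go + 0.128 * go_me
                  + 0.04936 * co_go_me + 0.0218 * fh_mp)

      # Angles from the attractive-group means; lengths are hypothetical.
      print(aesthetic_score(co_go=55.0, go_me=110.0,
                            co_go_me=123.913, fh_mp=27.033))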

  6. Assessment of higher order correlation effects with the help of Moller-Plesset perturbation theory up to sixth order

    NASA Astrophysics Data System (ADS)

    He, Yuan; Cremer, Dieter

    For 30 molecules and two atoms, MPn correlation energies up to n = 6 are computed and used to analyse higher order correlation effects and the initial convergence behaviour of the MPn series. Particularly useful is the analysis of correlation contributions E(n)XY... (n = 4, 5, 6; X, Y, ... = S, D, T, Q denoting single, double, triple, and quadruple excitations) in the form of correlation energy spectra. Two classes of system are distinguished, namely class A systems possessing well separated electron pairs and class B systems which are characterized by electron clustering in certain regions of atomic and molecular space. For class A systems, electron pair correlation effects as described by D, Q, DD, DQ, QQ, DDD, etc., contributions are most important, which are stepwise included at MPn with n = 2, ..., 6. Class A systems are reasonably described by MPn theory, which is reflected by the fact that convergence of the MPn series is monotonic (but relatively slow) for class A systems. The description of class B systems is difficult since three- and four-electron correlation effects and couplings between two-, three-, and four-electron correlation effects missing from lower order perturbation theory are significant. MPn methods which do not cover these effects simulate higher order with lower order correlation effects, thus exaggerating the latter, which has to be corrected with increasing n. Consequently, the MPn series oscillates for class B systems at low orders. A possible divergence of the MPn series is mostly a consequence of an unbalanced basis set. For example, diffuse functions added to an unsaturated sp basis lead to an exaggeration of higher order correlation effects, which can cause enhanced oscillations and divergence of the MPn series.
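
    The class A/class B distinction can be phrased operationally: if successive MPn energy increments keep the same sign, the series converges monotonically (class A behaviour); if they alternate, it oscillates (class B behaviour). A toy classifier with hypothetical increments, purely for illustration:

      def convergence_pattern(increments):
          """Classify successive MPn energy increments E(n) - E(n-1)
          as 'monotonic' (one sign) or 'oscillating' (mixed signs)."""
          signs = {inc > 0 for inc in increments}
          return "monotonic" if len(signs) == 1 else "oscillating"

      # Hypothetical increments (hartree), illustration only.
      class_a = [-0.0300, -0.0060, -0.0015, -0.0004]  # separated pairs
      class_b = [-0.0300, +0.0080, -0.0050, +0.0030]  # clustered electrons
      print("class A-like:", convergence_pattern(class_a))
      print("class B-like:", convergence_pattern(class_b))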

  7. Comparison of scientific computing platforms for MCNP4A Monte Carlo calculations

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Hendricks, J.S.; Brockhoff, R.C.

    1994-04-01

    The performance of seven computer platforms is evaluated with the widely used and internationally available MCNP4A Monte Carlo radiation transport code. All results are reproducible and are presented in such a way as to enable comparison with computer platforms not in the study. The authors observed that the HP/9000-735 workstation runs MCNP 50% faster than the Cray YMP 8/64. Compared with the Cray YMP 8/64, the IBM RS/6000-560 is 68% as fast, the Sun Sparc10 is 66% as fast, the Silicon Graphics ONYX is 90% as fast, the Gateway 2000 model 4DX2-66V personal computer is 27% as fast, and the Sun Sparc2 is 24% as fast. In addition to comparing the timing performance of the seven platforms, the authors observe that changes in compilers and software over the past 2 yr have resulted in only modest performance improvements, that hardware improvements have enhanced performance by less than a factor of approximately 3, that timing studies are very problem dependent, and that MCNP4A runs about as fast as MCNP4.

  8. Performance of a plasma fluid code on the Intel parallel computers

    NASA Technical Reports Server (NTRS)

    Lynch, V. E.; Carreras, B. A.; Drake, J. B.; Leboeuf, J. N.; Liewer, P.

    1992-01-01

    One approach to improving the real-time efficiency of plasma turbulence calculations is to use a parallel algorithm. A parallel algorithm for plasma turbulence calculations was tested on the Intel iPSC/860 hypercube and the Touchstone Delta machine. Using the 128 processors of the Intel iPSC/860 hypercube, a factor of 5 improvement over a single-processor CRAY-2 is obtained. For the Touchstone Delta machine, the corresponding improvement factor is 16. For plasma edge turbulence calculations, an extrapolation of the present results to the Intel sigma machine gives an improvement factor close to 64 over the single-processor CRAY-2.

  9. Optimal Full Information Synthesis for Flexible Structures Implemented on Cray Supercomputers

    NASA Technical Reports Server (NTRS)

    Lind, Rick; Balas, Gary J.

    1995-01-01

    This paper considers an algorithm for the synthesis of optimal controllers for full information feedback. The synthesis procedure reduces to a single linear matrix inequality which may be solved via established convex optimization algorithms. The computational cost of the optimization is investigated. It is demonstrated that the problem dimension and corresponding matrices can become large for practical engineering problems, making the algorithm impractical on standard workstations for large-order systems. A flexible structure is presented as a design example. Control synthesis requires several days on a workstation but may be completed in a reasonable amount of time using a Cray supercomputer.
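
    Feasibility problems of this single-LMI form can be posed directly with a modern convex solver. The sketch below uses cvxpy (an assumption of this illustration, which obviously postdates the Cray-era work) and checks a Lyapunov-type LMI on a small test system rather than the paper's full-information synthesis inequality:

      import cvxpy as cp
      import numpy as np

      # A small stable test system (illustrative only, not the paper's
      # flexible-structure model).
      A = np.array([[0.0, 1.0],
                    [-2.0, -0.5]])
      n = A.shape[0]

      P = cp.Variable((n, n), symmetric=True)

      # Lyapunov LMI: P > 0 and A'P + PA < 0 certify stability.
      constraints = [P >> np.eye(n),
                     A.T @ P + P @ A << -1e-6 * np.eye(n)]
      prob = cp.Problem(cp.Minimize(cp.trace(P)), constraints)
      prob.solve()
      print("status:", prob.status)
      print("P =", P.value)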

  10. A Comparison Between the PLM and the MC68020 as Prolog Processors

    DTIC Science & Technology

    1988-01-01

  11. Running ATLAS workloads within massively parallel distributed applications using Athena Multi-Process framework (AthenaMP)

    NASA Astrophysics Data System (ADS)

    Calafiura, Paolo; Leggett, Charles; Seuster, Rolf; Tsulaia, Vakhtang; Van Gemmeren, Peter

    2015-12-01

    AthenaMP is a multi-process version of the ATLAS reconstruction, simulation and data analysis framework Athena. By leveraging Linux fork and copy-on-write mechanisms, it allows for sharing of memory pages between event processors running on the same compute node with little to no change in the application code. Originally targeted to optimize the memory footprint of reconstruction jobs, AthenaMP has demonstrated that it can reduce the memory usage of certain configurations of ATLAS production jobs by a factor of 2. AthenaMP has also evolved to become the parallel event-processing core of the recently developed ATLAS infrastructure for fine-grained event processing (Event Service), which allows the running of AthenaMP inside massively parallel distributed applications on hundreds of compute nodes simultaneously. We present the architecture of AthenaMP, various strategies implemented by AthenaMP for scheduling workload to worker processes (for example, Shared Event Queue and Shared Distributor of Event Tokens) and the usage of AthenaMP across the diverse ATLAS event processing workloads on various computing resources: Grid, opportunistic resources and HPC.
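
    The Shared Event Queue strategy maps naturally onto the POSIX fork model AthenaMP relies on. A toy stand-in using Python's multiprocessing (not the ATLAS framework itself) shows workers, forked from a common parent so memory pages are shared copy-on-write, pulling event identifiers from one queue:

      import multiprocessing as mp
      import os

      def worker(queue):
          """Pull event identifiers until a sentinel arrives."""
          while True:
              event = queue.get()
              if event is None:
                  break
              print(f"pid {os.getpid()} processed event {event}")

      if __name__ == "__main__":
          # On Linux the default start method is fork, so workers share
          # the parent's memory pages copy-on-write, as AthenaMP does.
          queue = mp.Queue()
          procs = [mp.Process(target=worker, args=(queue,)) for _ in range(4)]
          for p in procs:
              p.start()
          for event_id in range(20):
              queue.put(event_id)
          for _ in procs:
              queue.put(None)          # one sentinel per worker
          for p in procs:
              p.join()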

  12. Computing sextic centrifugal distortion constants by DFT: A benchmark analysis on halogenated compounds

    NASA Astrophysics Data System (ADS)

    Pietropolli Charmet, Andrea; Stoppa, Paolo; Tasinato, Nicola; Giorgianni, Santi

    2017-05-01

    This work presents a benchmark study on the calculation of sextic centrifugal distortion constants employing cubic force fields computed by means of density functional theory (DFT). For a set of semi-rigid halogenated organic compounds, several functionals (B2PLYP, B3LYP, B3PW91, M06, M06-2X, O3LYP, X3LYP, ωB97XD, CAM-B3LYP, LC-ωPBE, PBE0, B97-1 and B97-D) were used for computing the sextic centrifugal distortion constants. The effects related to the size of the basis sets and the performance of hybrid approaches, where harmonic data obtained at a higher level of electronic correlation are coupled with cubic force constants yielded by DFT functionals, are presented and discussed. The predicted values were compared both to the available data published in the literature and to those obtained by calculations carried out at increasing levels of electronic correlation: Hartree-Fock Self-Consistent Field (HF-SCF), second-order Møller-Plesset perturbation theory (MP2), and the coupled-cluster singles and doubles (CCSD) level of theory. Different hybrid approaches, having the cubic force field computed at the DFT level of theory coupled to harmonic data computed at increasing levels of electronic correlation (up to the CCSD level of theory augmented by a perturbational estimate of the effects of connected triple excitations, CCSD(T)), were considered. The results demonstrate that these hybrid approaches can represent reliable and computationally affordable methods to predict sextic centrifugal terms with an accuracy almost comparable to that yielded by the more expensive anharmonic force fields fully computed at the MP2 and CCSD levels of theory. In view of their reduced computational cost, these hybrid approaches pave the way to the study of more complex systems.

  13. Improved Access to Supercomputers Boosts Chemical Applications.

    ERIC Educational Resources Information Center

    Borman, Stu

    1989-01-01

    Supercomputing is described in terms of computing power and abilities. The increase in availability of supercomputers for use in chemical calculations and modeling are reported. Efforts of the National Science Foundation and Cray Research are highlighted. (CW)

  14. Parallelization of ARC3D with Computer-Aided Tools

    NASA Technical Reports Server (NTRS)

    Jin, Haoqiang; Hribar, Michelle; Yan, Jerry; Saini, Subhash (Technical Monitor)

    1998-01-01

    A series of efforts have been devoted to investigating methods of porting and parallelizing applications quickly and efficiently for new architectures, such as the SGI Origin 2000 and Cray T3E. This report presents the parallelization of a CFD application, ARC3D, using the computer-aided parallelization tool CAPTools. The steps of parallelizing this code and the requirements for achieving better performance are discussed. The generated parallel version has achieved reasonably good performance, for example a speedup of 30 on 36 Cray T3E processors. However, this performance could not be obtained without modification of the original serial code. It is suggested that in many cases improving the serial code and performing necessary code transformations are important parts of the automated parallelization process, although user intervention in many of these parts is still necessary. Nevertheless, development and improvement of useful software tools, such as CAPTools, can help trim down many tedious parallelization details and improve the processing efficiency.

  15. Three-Dimensional High-Lift Analysis Using a Parallel Unstructured Multigrid Solver

    NASA Technical Reports Server (NTRS)

    Mavriplis, Dimitri J.

    1998-01-01

    A directional implicit unstructured agglomeration multigrid solver is ported to shared and distributed memory massively parallel machines using the explicit domain-decomposition and message-passing approach. Because the algorithm operates on local implicit lines in the unstructured mesh, special care is required in partitioning the problem for parallel computing. A weighted partitioning strategy is described which avoids breaking the implicit lines across processor boundaries, while incurring minimal additional communication overhead. Good scalability is demonstrated on a 128 processor SGI Origin 2000 machine and on a 512 processor CRAY T3E machine for reasonably fine grids. The feasibility of performing large-scale unstructured grid calculations with the parallel multigrid algorithm is demonstrated by computing the flow over a partial-span flap wing high-lift geometry on a highly resolved grid of 13.5 million points in approximately 4 hours of wall clock time on the CRAY T3E.

  16. STARS: An Integrated, Multidisciplinary, Finite-Element, Structural, Fluids, Aeroelastic, and Aeroservoelastic Analysis Computer Program

    NASA Technical Reports Server (NTRS)

    Gupta, K. K.

    1997-01-01

    A multidisciplinary, finite element-based, highly graphics-oriented, linear and nonlinear analysis capability that includes such disciplines as structures, heat transfer, linear aerodynamics, computational fluid dynamics, and controls engineering has been achieved by integrating several new modules in the original STARS (STructural Analysis RoutineS) computer program. Each individual analysis module is general-purpose in nature and is effectively integrated to yield aeroelastic and aeroservoelastic solutions of complex engineering problems. Examples of advanced NASA Dryden Flight Research Center projects analyzed by the code in recent years include the X-29A, F-18 High Alpha Research Vehicle/Thrust Vectoring Control System, B-52/Pegasus Generic Hypersonics, National AeroSpace Plane (NASP), SR-71/Hypersonic Launch Vehicle, and High Speed Civil Transport (HSCT) projects. Extensive graphics capabilities exist for convenient model development and postprocessing of analysis results. The program is written in modular form in standard FORTRAN language to run on a variety of computers, such as the IBM RISC/6000, SGI, DEC, Cray, and personal computer; associated graphics codes use OpenGL and IBM/graPHIGS language for color depiction. This program is available from COSMIC, the NASA agency for distribution of computer programs.

  17. Calculations of molecular multipole electric moments of a series of exo-unsaturated four-membered heterocycles, Y = CCH2CH2X

    NASA Astrophysics Data System (ADS)

    Romero, Angel H.

    2017-10-01

    The influence of the ring puckering angle on the multipole moments of sixteen four-membered heterocycles (1-16) was theoretically estimated using MP2 and different DFTs in combination with the 6-31+G(d,p) basis set. To obtain an accurate evaluation, calculations at the CCSD/cc-pVDZ level, and with the MP2 and PBE1PBE methods in combination with the aug-cc-pVDZ and aug-cc-pVTZ basis sets, were performed on the planar geometries of 1-16. In general, the DFT and MP2 approaches provided an identical dependence of the electrical properties on the puckering angle for 1-16. Quantitatively, the quality of the level of theory and basis set significantly affects the predictions of the multipole moments, in particular for the heterocycles containing C=O and C=S bonds. Basis set convergence within the MP2 and PBE1PBE approximations is reached in the dipole moment calculations when the aug-cc-pVTZ basis set is used, while the quadrupole and octupole moment computations require a basis set larger than aug-cc-pVTZ. On the other hand, the multipole moments showed a strong dependence on the molecular geometry and the nature of the carbon-heteroatom bonds. Specifically, the C-X bond determines the behavior of the μ(ϕ), θ(ϕ) and Ω(ϕ) functions, while the C=Y bond plays an important role in the magnitude of the studied properties.

  18. A high level computational study of the CH4/CF4 dimer: how does it compare with the CH4/CH4 and CF4/CF4 dimers?

    NASA Astrophysics Data System (ADS)

    Biller, Matthew J.; Mecozzi, Sandro

    2012-04-01

    The interaction within the methane-methane (CH4/CH4), perfluoromethane-perfluoromethane (CF4/CF4), and methane-perfluoromethane (CH4/CF4) dimers was calculated using the Hartree-Fock (HF) method, multiple orders of Møller-Plesset perturbation theory [MP2, MP3, MP4(DQ), MP4(SDQ), MP4(SDTQ)], and coupled cluster theory [CCSD, CCSD(T)], as well as the PW91, B97D, and M06-2X density functional theory (DFT) functionals. The basis sets of Dunning and coworkers (aug-cc-pVxZ, x = D, T, Q), Krishnan and coworkers [6-311++G(d,p), 6-311++G(2d,2p)], and Tsuzuki and coworkers [aug(df,pd)-6-311G(d,p)] were used. Basis set superposition error (BSSE) was corrected via the counterpoise method in all cases. Interaction energies obtained with the MP2 method do not fit with the experimental finding that the methane-perfluoromethane system phase separates at 94.5 K. It was not until the CCSD(T) method was considered that the interaction energy of the methane-perfluoromethane dimer (-0.69 kcal mol-1) was found to be intermediate between those of the methane (-0.51 kcal mol-1) and perfluoromethane (-0.78 kcal mol-1) dimers. This suggests that a perfluoromethane molecule interacts preferentially with another perfluoromethane (by about 0.09 kcal mol-1) rather than with a methane molecule. At temperatures much lower than the CH4/CF4 critical solution temperature of 94.5 K, this energy difference becomes significant and leads perfluoromethane molecules to associate with themselves, forming a phase separation. The DFT functionals yielded erratic results for the three dimers. Further development of DFT is needed in order to model dispersion interactions in hydrocarbon/perfluorocarbon systems.
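
    The counterpoise-corrected interaction energies used throughout combine the dimer energy with monomer energies evaluated in the full dimer basis (the Boys-Bernardi scheme). A bookkeeping sketch, with hypothetical absolute energies chosen so the difference reproduces the quoted -0.69 kcal/mol:

      def counterpoise_interaction(e_ab, e_a_in_ab, e_b_in_ab):
          """Counterpoise-corrected interaction energy: both monomer
          energies are computed in the full dimer basis set."""
          return e_ab - e_a_in_ab - e_b_in_ab

      # Hypothetical energies in hartree, illustration only.
      e_dimer = -477.5571
      e_ch4_in_dimer_basis = -40.4280
      e_cf4_in_dimer_basis = -437.1280
      de = counterpoise_interaction(e_dimer, e_ch4_in_dimer_basis,
                                    e_cf4_in_dimer_basis)
      print(f"Delta E_AB = {de:.4f} hartree = {de * 627.509:.2f} kcal/mol")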

  19. ARC-1986-AC86-0746-2

    NASA Image and Video Library

    1986-10-10

    Ames Director William 'Bill' Ballhaus (center left) joins visitor Sir Jeffrey Pope from the Royal Aircraft Industry, England (center right) at the NAS Facility Cray 2 computer with Ron Deiss, NAS Deputy Manager (L) and Vic Peterson, Ames Deputy Director (R).

  20. Using archived ITS data to measure the operational benefits of a system-wide adaptive ramp metering system : appendix online 10 : OR-217 NB ramp flow & ML speed-flow plots.

    DOT National Transportation Integrated Search

    2008-12-01

    The appendix includes various ramp flow and ML speed-flow plots: OR-217 NB, 72nd MP 6.61; OR-217 NB, 99W-EB MP 5.9; OR-217 NB, 99W-WB MP 5.85; OR-217 NB, Greenburg MP 4.65; OR-217 NB, Scholls MP 3.85; OR-217 NB, Denney MP 2.68; OR-217 NB, Allen MP 2....

  1. Using archived ITS data to measure the operational benefits of a system-wide adaptive ramp metering system : appendix online 2 : I-205 NB speed flow plots.

    DOT National Transportation Integrated Search

    2008-12-01

    The appendix includes various speed flow plots, including: I-205 NB, Gladstone MP 11.05; I-205 NB, Gladstone Hway MP 12.94; I-205 NB, Lawnfield MP 13.58; I-205 NB, Sunnybrook MP 14.32; I-205 NB, Sunnyside MP 14.7; I-205 NB, Johnson Creek MP 16.2; I-2...

  2. Adaptation of MSC/NASTRAN to a supercomputer

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Gloudeman, J.F.; Hodge, J.C.

    1982-01-01

    MSC/NASTRAN is a large-scale general purpose digital computer program which solves a wide variety of engineering analysis problems by the finite element method. The program capabilities include static and dynamic structural analysis (linear and nonlinear), heat transfer, acoustics, electromagnetism and other types of field problems. It is used worldwide by large and small companies in such diverse fields as automotive, aerospace, civil engineering, shipbuilding, offshore oil, industrial equipment, chemical engineering, biomedical research, optics and government research. The paper presents the significant aspects of the adaptation of MSC/NASTRAN to the Cray-1. First, the general architecture and predominant functional use of MSC/NASTRAN are discussed to help explain the imperatives and the challenges of this undertaking. The key characteristics of the Cray-1 which influenced the decision to undertake this effort are then reviewed to help identify performance targets. An overview of the MSC/NASTRAN adaptation effort is then given to help define the scope of the project. Finally, some measures of MSC/NASTRAN's operational performance on the Cray-1 are given, along with a few guidelines to help avoid improper interpretation. 17 references.

  3. Implementation and analysis of a Navier-Stokes algorithm on parallel computers

    NASA Technical Reports Server (NTRS)

    Fatoohi, Raad A.; Grosch, Chester E.

    1988-01-01

    The results of the implementation of a Navier-Stokes algorithm on three parallel/vector computers are presented. The object of this research is to determine how well, or poorly, a single numerical algorithm would map onto three different architectures. The algorithm is a compact difference scheme for the solution of the incompressible, two-dimensional, time-dependent Navier-Stokes equations. The computers were chosen so as to encompass a variety of architectures. They are the following: the MPP, an SIMD machine with 16K bit-serial processors; Flex/32, an MIMD machine with 20 processors; and Cray/2. The implementation of the algorithm is discussed in relation to these architectures and measures of the performance on each machine are given. The basic comparison is among SIMD instruction parallelism on the MPP, MIMD process parallelism on the Flex/32, and vectorization of a serial code on the Cray/2. Simple performance models are used to describe the performance. These models highlight the bottlenecks and limiting factors for this algorithm on these architectures. Finally, conclusions are presented.

  4. Amperometric determination of 6-mercaptopurine on functionalized multi-wall carbon nanotubes modified electrode by liquid chromatography coupled with microdialysis and its application to pharmacokinetics in rabbit.

    PubMed

    Cao, Xu-Ni; Lin, Li; Zhou, Yu-Yan; Shi, Guo-Yue; Zhang, Wen; Yamamoto, Katsunobu; Jin, Li-Tong

    2003-07-27

    In this paper, a multi-wall carbon nanotube electrode functionalized with carboxylic groups (MWNT-COOH CME) was fabricated. This chemically modified electrode (CME) can be used as the working electrode in liquid chromatography for the determination of 6-mercaptopurine (6-MP). The results indicate that the CME exhibits efficient electrocatalytic oxidation of 6-MP with relatively high sensitivity, stability and long life. The peak currents of 6-MP are linear in its concentration over the range 4.0 x 10^-7 to 1.0 x 10^-4 mol l^-1, with a calculated detection limit (S/N = 3) of 2.0 x 10^-7 mol l^-1. Coupled with microdialysis, the method has been successfully applied to the pharmacokinetic study of 6-MP in rabbit blood. This method provides a fast, sensitive and simple technique for the pharmacokinetic study of 6-MP in vivo.
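
    The linear range and S/N = 3 detection limit quoted here follow from an ordinary least-squares calibration line. A generic sketch with hypothetical current/concentration pairs (not the paper's data), where LOD = 3 x (baseline noise)/(slope):

      import numpy as np

      # Hypothetical calibration: concentration (mol/L) vs peak current (A).
      conc = np.array([4e-7, 1e-6, 1e-5, 5e-5, 1e-4])
      current = np.array([8.1e-9, 2.0e-8, 2.0e-7, 1.0e-6, 2.0e-6])

      slope, intercept = np.polyfit(conc, current, 1)
      noise_sd = 1.3e-9              # baseline noise (A), hypothetical

      lod = 3 * noise_sd / slope     # S/N = 3 detection limit
      print(f"slope = {slope:.3e} A.L/mol, LOD = {lod:.1e} mol/L")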

  5. Scaled MP3 non-covalent interaction energies agree closely with accurate CCSD(T) benchmark data.

    PubMed

    Pitonák, Michal; Neogrády, Pavel; Cerný, Jirí; Grimme, Stefan; Hobza, Pavel

    2009-01-12

    Scaled MP3 interaction energies calculated as a sum of MP2/CBS (complete basis set limit) interaction energies and scaled third-order energy contributions obtained in small or medium size basis sets agree very closely with the estimated CCSD(T)/CBS interaction energies for the 22 H-bonded, dispersion-controlled and mixed non-covalent complexes from the S22 data set. Performance of this so-called MP2.5 (third-order scaling factor of 0.5) method has also been tested for 33 nucleic acid base pairs and two stacked conformers of porphine dimer. In all the test cases, performance of the MP2.5 method was shown to be superior to the scaled spin-component MP2 based methods, e.g. SCS-MP2, SCSN-MP2 and SCS(MI)-MP2. In particular, a very balanced treatment of hydrogen-bonded compared to stacked complexes is achieved with MP2.5. The main advantage of the approach is that it employs only a single empirical parameter and is thus biased by two rigorously defined, asymptotically correct ab-initio methods, MP2 and MP3. The method is proposed as an accurate but computationally feasible alternative to CCSD(T) for the computation of the properties of various kinds of non-covalently bound systems.
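
    The MP2.5 recipe itself is one line of arithmetic once the component energies are in hand: add half of the third-order correction, evaluated in a small or medium basis, to the MP2/CBS interaction energy. A sketch with hypothetical energies:

      def mp2_5(e_mp2_cbs, e3_small_basis, scale=0.5):
          """Scaled-MP3 (MP2.5) interaction energy: MP2/CBS plus a
          scaled third-order (MP3 - MP2) contribution obtained in a
          smaller basis set."""
          return e_mp2_cbs + scale * e3_small_basis

      # Hypothetical stacked-complex energies in kcal/mol.
      e_mp2_cbs = -12.3   # MP2 complete-basis-set interaction energy
      e3 = +2.8           # third-order contribution, medium basis
      print(f"MP2.5 estimate: {mp2_5(e_mp2_cbs, e3):+.2f} kcal/mol")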

  6. Research in Computational Aeroscience Applications Implemented on Advanced Parallel Computing Systems

    NASA Technical Reports Server (NTRS)

    Wigton, Larry

    1996-01-01

    Improvements to the numerical linear algebra routines for use in new Navier-Stokes codes, specifically Tim Barth's unstructured grid code, with spin-offs to TRANAIR, are reported. A fast distance calculation routine for Navier-Stokes codes using the new one-equation turbulence models was written. The primary focus of this work was devoted to improving matrix-iterative methods. New algorithms have been developed which activate the full potential of classical Cray-class computers as well as distributed-memory parallel computers.

  7. Turbomachinery Forced Response Prediction System (FREPS): User's Manual

    NASA Technical Reports Server (NTRS)

    Morel, M. R.; Murthy, D. V.

    1994-01-01

    The turbomachinery forced response prediction system (FREPS), version 1.2, is capable of predicting the aeroelastic behavior of axial-flow turbomachinery blades. This document is meant to serve as a guide in the use of the FREPS code with specific emphasis on its use at NASA Lewis Research Center (LeRC). A detailed explanation of the aeroelastic analysis and its development is beyond the scope of this document, and may be found in the references. FREPS has been developed by the NASA LeRC Structural Dynamics Branch. The manual is divided into three major parts: an introduction, the preparation of input, and the procedure to execute FREPS. Part 1 includes a brief background on the necessity of FREPS, a description of the FREPS system, the steps needed to be taken before FREPS is executed, an example input file with instructions, presentation of the geometric conventions used, and the input/output files employed and produced by FREPS. Part 2 contains a detailed description of the command names needed to create the primary input file that is required to execute the FREPS code. Also, Part 2 has an example data file to aid the user in creating their own input files. Part 3 explains the procedures required to execute the FREPS code on the Cray Y-MP, a computer system available at the NASA LeRC.

  8. Analysis of vibrational spectra of 3-halo-1-propanols CH(2)XCH(2)CH(2)OH (X is Cl and Br).

    PubMed

    Badawi, Hassan M; Förner, Wolfgang

    2008-12-01

    The conformational stability and the three-rotor internal rotations in 3-chloro- and 3-bromo-1-propanols were investigated at the DFT-B3LYP/6-311+G and ab initio MP2/6-311+G, MP3/6-311+G and MP4(SDTQ)//MP3/6-311+G levels of theory. On the calculated potential energy surface twelve distinct minima were located, none of which was predicted to have imaginary frequencies at the B3LYP level of theory. The calculated lowest energy minimum in the potential curves of both molecules was predicted to correspond to the Gauche-gauche-trans (Ggt) conformer, in excellent agreement with earlier microwave and electron diffraction results. The equilibrium constants for the conformational interconversion of the two 3-halo-1-propanols were calculated at the B3LYP/6-311+G level and found to correspond to an equilibrium mixture of about 32% Ggt, 18% Ggg1, 13% Tgt, 8% Tgg and 8% Gtt conformations for 3-chloro-1-propanol and 34% Ggt, 15% Tgt, 13% Ggg1, 9% Tgg and 7% Gtt conformations for 3-bromo-1-propanol at 298.15 K. The nature of the high-energy conformations was verified by carrying out solvent experiments using formamide (ε = 109.5) and MP3 and MP4//MP3 calculations. The vibrational frequencies of each molecule in its three most stable forms were computed at the B3LYP level and complete vibrational assignments were made based on normal coordinate calculations and comparison with experimental data for the molecules.
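
    Conformer percentages like those quoted follow from Boltzmann weighting of the relative energies. A generic sketch (the relative energies below are hypothetical placeholders, not the paper's computed values):

      import math

      R = 1.987e-3    # gas constant, kcal mol^-1 K^-1
      T = 298.15      # kelvin

      def boltzmann_populations(rel_energies_kcal):
          """Fractional populations from relative energies at temperature T."""
          weights = [math.exp(-e / (R * T)) for e in rel_energies_kcal]
          total = sum(weights)
          return [w / total for w in weights]

      # Hypothetical relative energies (kcal/mol) for five conformers.
      conformers = {"Ggt": 0.00, "Ggg1": 0.35, "Tgt": 0.50,
                    "Tgg": 0.80, "Gtt": 0.85}
      pops = boltzmann_populations(list(conformers.values()))
      for name, pop in zip(conformers, pops):
          print(f"{name}: {100 * pop:.0f}%")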

  9. Franklin: User Experiences

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    National Energy Research Supercomputing Center; He, Yun; Kramer, William T.C.

    2008-05-07

    The newest workhorse of the National Energy Research Scientific Computing Center is a Cray XT4 with 9,736 dual-core nodes. This paper summarizes Franklin user experiences from the friendly early-user period through the production period. Selected user success stories are presented, along with the top issues that affected the user experience.

  10. Antenna pattern control using impedance surfaces

    NASA Technical Reports Server (NTRS)

    Balanis, Constantine A.; Liu, Kefeng

    1992-01-01

    During this research period, we effectively transferred existing computer codes from the CRAY supercomputer to workstation-based systems. The workstation-based version of our code preserves the accuracy of the numerical computations while giving much better turnaround time than the CRAY supercomputer. This task relieved us of heavy dependence on the supercomputer account budget and made the codes developed in this research project more practical for applications. The analysis of pyramidal horns with impedance surfaces was our major focus during this research period. Three different modeling algorithms for analyzing lossy impedance surfaces were investigated and compared with measured data. Through this investigation, we discovered that a hybrid Fourier transform technique, which uses the eigenmodes in the stepped waveguide section and the Fourier-transformed field distributions across the stepped discontinuities for lossy impedance coatings, gives better accuracy in analyzing lossy coatings. After further refinement of the present technique, we will perform an accurate radiation pattern synthesis in the coming reporting period.

  11. Studying an Eulerian Computer Model on Different High-performance Computer Platforms and Some Applications

    NASA Astrophysics Data System (ADS)

    Georgiev, K.; Zlatev, Z.

    2010-11-01

    The Danish Eulerian Model (DEM) is an Eulerian model for studying the transport of air pollutants on a large scale. Originally, the model was developed at the National Environmental Research Institute of Denmark. The model's computational domain covers Europe and neighbouring parts of the Atlantic Ocean, Asia and Africa. If the DEM is to be applied on fine grids, its discretization leads to a huge computational problem, which implies that such a model must be run on high-performance computer architectures. The implementation and tuning of such a complex large-scale model on each new computer is a non-trivial task. Here we present comparison results from running this model on different kinds of vector computers (CRAY C92A, Fujitsu, etc.), parallel computers with distributed memory (IBM SP, CRAY T3E, Beowulf clusters, Macintosh G4 clusters, etc.), parallel computers with shared memory (SGI Origin, SUN, etc.) and parallel computers with two levels of parallelism (IBM SMP, IBM BlueGene/P, clusters of multiprocessor nodes, etc.). The main idea in the parallel version of DEM is a domain-partitioning approach. The effective use of the caches and hierarchical memories of modern computers, as well as the performance, speed-ups and efficiency achieved, are discussed. The parallel code of DEM, created using the standard MPI library, appears to be highly portable and shows good efficiency and scalability on different kinds of vector and parallel computers. Some important applications of the computer model output are briefly presented.

  12. A Computational Study of Chalcogen-containing H2X…YF and (CH3)2X…YF (X=O, S, Se; Y=F, Cl, H) and Pnicogen-containing H3X'…YF and (CH3)3X'…YF (X'=N, P, As) Complexes.

    PubMed

    McDowell, Sean A C; Buckingham, A David

    2018-04-20

    A computational study was undertaken for the model complexes H2X…YF and (CH3)2X…YF (X=O, S, Se; Y=F, Cl, H), and H3X'…YF and (CH3)3X'…YF (X'=N, P, As), at the MP2/6-311++G(d,p) level of theory. For H2X…YF and H3X'…YF, noncovalent interactions dominate the binding in order of increasing YF dipole moment, except for H3As…F2, and possibly H3As…ClF. However, for the methyl-substituted complexes (CH3)2X…YF and (CH3)3X'…YF the binding is especially strong for the complexes containing F2, implying significant chemical bonding between the interacting molecules. The relative stability of these complexes can be rationalized by the difference in the electronegativity of the X or X' and Y atoms. © 2018 Wiley-VCH Verlag GmbH & Co. KGaA, Weinheim.

  13. The application of CFD to rotary wing flow problems

    NASA Technical Reports Server (NTRS)

    Caradonna, F. X.

    1990-01-01

    Rotorcraft aerodynamics is especially rich in unsolved problems, and for this reason the need for independent computational and experimental studies is great. Three-dimensional, unsteady, nonlinear potential methods are becoming fast enough to permit their use in parametric design studies. At present, combined CAMRAD/FPR analyses for a complete trimmed-rotor solution can be performed in about an hour on a CRAY Y-MP (or ten minutes with multiple processors). These computational speeds indicate that in the near future many of the large CFD problems will no longer require a supercomputer. The ability to convect circulation is routine for integral methods, but only recently was it discovered how to do the same with differential methods. It is clear that differential CFD rotor analyses are poised to enter the engineering workplace; integral methods already constitute a mainstay. Ultimately, it is the users who will integrate CFD into the entire engineering process and provide a new measure of confidence in design and analysis. It should be recognized that the above classes of analyses do not include several major limiting phenomena, which will continue to require empirical treatment because of computational time constraints and limited physical understanding. Such empirical treatment should, however, be incorporated into the developing engineering-level CFD analyses. It is likely that properly constructed flow models containing corrections from physical testing will be able to fill unavoidable gaps in the experimental data base, both for basic studies and for specific configuration testing. For these kinds of applications, computational cost is not an issue. Finally, it should be recognized that although rotorcraft are probably the most complex of aircraft, the rotorcraft engineering community is very small compared to the fixed-wing community. Likewise, rotorcraft CFD resources can never reach fixed-wing proportions and must be used wisely. Therefore, the fixed-wing literature must be gleaned for many of the basic methods.

  14. An Experimental/Analytical Investigation into the Performance of a 20-Percent Thick, 8.5-Percent Cambered, Circulation Controlled Airfoil.

    DTIC Science & Technology

    1982-12-01


  15. Parallel 3D Mortar Element Method for Adaptive Nonconforming Meshes

    NASA Technical Reports Server (NTRS)

    Feng, Huiyu; Mavriplis, Catherine; VanderWijngaart, Rob; Biswas, Rupak

    2004-01-01

    High-order methods are frequently used in computational simulation for their high accuracy. An efficient way to avoid unnecessary computation in smooth regions of the solution is to use adaptive meshes, which employ fine grids only in areas where they are needed. Nonconforming spectral elements allow the grid to be flexibly adjusted to satisfy the computational accuracy requirements. The method is suitable for computational simulations of unsteady problems with very disparate length scales or unsteady moving features, such as heat transfer, fluid dynamics or flame combustion. In this work, we select the Mortar Element Method (MEM) to handle the nonconforming interfaces between elements. A new technique is introduced to efficiently implement MEM on 3-D nonconforming meshes. By introducing an "intermediate mortar", the proposed method decomposes the projection between 3-D elements and mortars into two steps; in each step, projection matrices derived in 2-D are used. The two-step method avoids explicitly forming or deriving large projection matrices for 3-D meshes, and also helps to simplify the implementation. This new technique can be used for both h- and p-type adaptation. The method is applied to an unsteady 3-D moving heat source problem. With our new MEM implementation, mesh adaptation is able to efficiently refine the grid near the heat source and coarsen the grid once the heat source passes. The savings in computational work resulting from the dynamic mesh adaptation are demonstrated by the reduction in the number of elements used and the CPU time spent. MEM and mesh adaptation, respectively, bring irregularity and dynamics to the computer memory access pattern. Hence, they provide a good way to gauge the performance of computer systems when running scientific applications whose memory access patterns are irregular and unpredictable. We select a 3-D moving heat source problem as the Unstructured Adaptive (UA) grid benchmark, a new component of the NAS Parallel Benchmarks (NPB). In this paper, we present some interesting performance results of our OpenMP parallel implementation on different architectures, such as the SGI Origin2000, SGI Altix, and Cray MTA-2.
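
    The two-step idea is simple to express in code. The following Python sketch (hypothetical matrices, not the authors' implementation) projects a 2-D element-face trace onto a mortar one direction at a time, which is algebraically equivalent to applying the full tensor-product (Kronecker) operator without ever forming it:

        import numpy as np

        def two_step_projection(U, P_xi, P_eta):
            """Project face values U (n x n) onto an (m x m) mortar in two sweeps.

            Step 1 applies the 1-D projection P_xi (m x n) along the first
            direction, producing the "intermediate mortar"; step 2 applies
            P_eta (m x n) along the second. This avoids building the
            (m*m) x (n*n) tensor-product projection matrix explicitly.
            """
            intermediate = P_xi @ U        # step 1: project rows
            return intermediate @ P_eta.T  # step 2: project columns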

  16. Multilevel Parallelization of AutoDock 4.2.

    PubMed

    Norgan, Andrew P; Coffman, Paul K; Kocher, Jean-Pierre A; Katzmann, David J; Sosa, Carlos P

    2011-04-28

    Virtual (computational) screening is an increasingly important tool for drug discovery. AutoDock is a popular open-source application for performing molecular docking, the prediction of ligand-receptor interactions. AutoDock is a serial application, though several previous efforts have parallelized various aspects of the program. In this paper, we report on a multi-level parallelization of AutoDock 4.2 (mpAD4). Using MPI and OpenMP, AutoDock 4.2 was parallelized for use on MPI-enabled systems and to multithread the execution of individual docking jobs. In addition, code was implemented to reduce input/output (I/O) traffic by reusing grid maps at each node from docking to docking. The performance of mpAD4 was examined on two multiprocessor computers. Using MPI with OpenMP multithreading, mpAD4 scales with near linearity on the multiprocessor systems tested. In situations where I/O is limiting, reuse of grid maps reduces both system I/O and overall screening time. Multithreading of AutoDock's Lamarckian Genetic Algorithm with OpenMP increases the speed of execution of individual docking jobs and, when combined with MPI parallelization, can significantly reduce the execution time of virtual screens. This work is significant in that mpAD4 speeds the execution of certain molecular docking workloads and allows the user to optimize the degree of system-level (MPI) and node-level (OpenMP) parallelization to best fit both workloads and computational resources.
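
    As an analogous sketch of the two-level scheme (not the mpAD4 source, which is C with MPI and OpenMP), the Python fragment below distributes docking jobs across MPI ranks with mpi4py, reuses the grid maps across jobs on each rank, and uses a thread pool as a stand-in for node-level OpenMP threading; all helper names and values are hypothetical:

        from mpi4py import MPI
        from concurrent.futures import ThreadPoolExecutor

        def load_grid_maps():                      # stand-in: read maps once per rank
            return {"receptor": "grid.maps.fld"}

        def run_docking(job, grid_maps):           # stand-in for one GA docking run
            return (job, 0.0)                      # (ligand name, pretend score)

        comm = MPI.COMM_WORLD
        rank, size = comm.Get_rank(), comm.Get_size()

        jobs = [f"ligand_{i}" for i in range(64)]  # hypothetical screening list
        my_jobs = jobs[rank::size]                 # round-robin over MPI ranks

        grid_maps = load_grid_maps()               # reused docking-to-docking (cuts I/O)
        with ThreadPoolExecutor(max_workers=4) as pool:   # node-level threading
            results = list(pool.map(lambda j: run_docking(j, grid_maps), my_jobs))

        all_results = comm.gather(results, root=0)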

  17. A leap forward with UTK's Cray XC30

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Fahey, Mark R

    2014-01-01

    This paper shows a significant productivity leap for several science groups and the accomplishments they have made to date on Darter, a Cray XC30 at the University of Tennessee, Knoxville. The increased productivity is due to the faster processors and interconnect combined in a new generation from Cray, yet the machine retains a programming environment very similar to that of previous generations of Cray systems, which makes porting easy.

  18. Late evolution of very low mass X-ray binaries sustained by radiation from their primaries

    NASA Technical Reports Server (NTRS)

    Ruderman, M.; Shaham, J.; Tavani, M.; Eichler, D.

    1989-01-01

    The accretion-powered radiation from the X-ray pulsar system Her X-1 (McCray et al. 1982) is studied. The changes in the soft X-ray and gamma-ray flux and in the accompanying electron-positron wind are discussed. These are believed to be associated with the inward movement of the inner edge of the accretion disk corresponding to the boundary with the neutron star's corotating magnetosphere (Alfven radius). LMXB evolution which is self-sustained by secondary winds intercepting the radiation emitted near an LMXB neutron star is investigated as well.

  19. Comparing the Performance of Blue Gene/Q with Leading Cray XE6 and InfiniBand Systems

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Kerbyson, Darren J.; Barker, Kevin J.; Vishnu, Abhinav

    2013-01-21

    Three types of systems dominate the current High Performance Computing landscape: the Cray XE6, the IBM Blue Gene, and commodity clusters using InfiniBand. These systems have quite different characteristics, making the choice for a particular deployment difficult. The XE6 uses Cray's proprietary Gemini 3-D torus interconnect with two nodes at each network endpoint. The latest IBM Blue Gene/Q uses a single socket integrating processor and communication in a 5-D torus network. InfiniBand provides the flexibility of using nodes from many vendors connected in many possible topologies. The performance characteristics of each vary vastly, along with their utilization models. In this work we compare the performance of these three systems using a combination of micro-benchmarks and a set of production applications. In particular, we discuss the causes of variability in performance across the systems and also quantify where performance is lost using a combination of measurements and models. Our results show that significant performance can be lost in normal production operation of the Cray XE6 and InfiniBand clusters in comparison to Blue Gene/Q.

  20. G3X-K theory: A composite theoretical method for thermochemical kinetics

    NASA Astrophysics Data System (ADS)

    da Silva, Gabriel

    2013-02-01

    A composite theoretical method for accurate thermochemical kinetics, G3X-K, is described. This method is accurate to around 0.5 kcal mol^-1 for barrier heights and 0.8 kcal mol^-1 for enthalpies of formation. G3X-K is a modification of G3SX theory using the M06-2X density functional for structures and zero-point energies and parameterized for a test set of 223 heats of formation and 23 barrier heights. A reduced perturbation-order variant, G3X(MP3)-K, is also developed, providing around 0.7 kcal mol^-1 accuracy for barrier heights and 0.9 kcal mol^-1 accuracy for enthalpies, at reduced computational cost. Some opportunities to further improve Gn composite methods are identified and briefly discussed.

  1. The Chandra Multi-Wavelength Project (ChaMP): A Serendipitous X-Ray Survey Using Chandra Archival Data

    NASA Technical Reports Server (NTRS)

    Wilkes, Belinda; Lavoie, Anthony R. (Technical Monitor)

    2000-01-01

    The launch of the Chandra X-ray Observatory in July 1999 opened a new era in X-ray astronomy. Its unprecedented (<1 arcsec) spatial resolution and low background are providing views of the X-ray sky 10-100 times fainter than previously possible. We have begun to carry out a serendipitous survey of the X-ray sky using Chandra archival data, to flux limits covering the range between those reached by current satellites and those of the small-area Chandra deep surveys. We estimate the survey will cover about 8 sq. deg. per year to X-ray fluxes (2-10 keV) in the range 10^(-13) to 6 x 10^(-16) erg/cm^2/s and include about 3000 sources per year, roughly two thirds of which are expected to be active galactic nuclei (AGN). Optical imaging of the ChaMP fields is underway at NOAO and SAO telescopes using g', r', z' colors, with which we will be able to classify the X-ray sources into object types and, in some cases, estimate their redshifts. We are also planning to obtain optical spectroscopy of a well-defined subset to allow confirmation of classification and redshift determination. All X-ray and optical results and supporting optical data will be placed in the ChaMP archive within a year of the completion of our data analysis. Over the five years of Chandra operations, ChaMP will provide both a major resource for Chandra observers and a key research tool for the study of the cosmic X-ray background and the individual source populations that comprise it. ChaMP promises profoundly new science return on a number of key questions at the current frontier of many areas of astronomy, including solving the spectral paradox by resolving the CXRB, locating and studying high-redshift clusters and so constraining cosmological parameters, defining the true, possibly absorbed, population of quasars, and studying coronal emission from late-type stars as their cores become fully convective. The current status and initial results from the ChaMP will be presented.

  2. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Christoph, G.G.; Jackson, K.A.; Neuman, M.C.

    An effective method for detecting computer misuse is the automatic auditing and analysis of on-line user activity. This activity is reflected in the system audit record, by changes in the vulnerability posture of the system configuration, and in other evidence found through active testing of the system. In 1989 we started developing an automatic misuse detection system for the Integrated Computing Network (ICN) at Los Alamos National Laboratory. Since 1990 this system has been operational, monitoring a variety of network systems and services. We call it the Network Anomaly Detection and Intrusion Reporter, or NADIR. During the last year and a half, we expanded NADIR to include processing of audit and activity records for the Cray UNICOS operating system. This new component is called the UNICOS Real-time NADIR, or UNICORN. UNICORN summarizes user activity and system configuration information in statistical profiles. In near real-time, it can compare current activity to historical profiles and test activity against expert rules that express our security policy and define improper or suspicious behavior. It reports suspicious behavior to security auditors and provides tools to aid in follow-up investigations. UNICORN is currently operational on four Crays in Los Alamos' main computing network, the ICN.

  3. Researchers Mine Information from Next-Generation Subsurface Flow Simulations

    DOE PAGES

    Gedenk, Eric D.

    2015-12-01

    A research team based at Virginia Tech University leveraged computing resources at the US Department of Energy's (DOE's) Oak Ridge National Laboratory to explore subsurface multiphase flow phenomena that can't be experimentally observed. Using the Cray XK7 Titan supercomputer at the Oak Ridge Leadership Computing Facility, the team took Micro-CT images of subsurface geologic systems and created two-phase flow simulations. The team's model development has implications for computational research pertaining to carbon sequestration, oil recovery, and contaminant transport.

  4. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Painter, J.; McCormick, P.; Krogh, M.

    This paper presents the ACL (Advanced Computing Lab) Message Passing Library. It is a high-throughput, low-latency communications library, based on Thinking Machines Corp.'s CMMD, upon which message passing applications can be built. The library has been implemented on the Cray T3D, Thinking Machines CM-5, SGI workstations, and on top of PVM.

  5. Computing Operating Characteristics Of Bearing/Shaft Systems

    NASA Technical Reports Server (NTRS)

    Moore, James D.

    1996-01-01

    SHABERTH computer program predicts operating characteristics of bearings in multibearing load-support system. Lubricated and nonlubricated bearings modeled. Calculates loads, torques, temperatures, and fatigue lives of ball and/or roller bearings on single shaft. Provides for analysis of reaction of system to termination of supply of lubricant to bearings and other lubricated mechanical elements. Valuable in design and analysis of shaft/bearing systems. Two versions of SHABERTH available. Cray version (LEW-14860), "Computing Thermal Performances Of Shafts and Bearings". IBM PC version (MFS-28818), written for IBM PC-series and compatible computers running MS-DOS.

  6. LASL benchmark performance 1978. [CDC STAR-100, 6600, 7600, Cyber 73, and CRAY-1

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    McKnight, A.L.

    1979-08-01

    This report presents the results of running several benchmark programs on a CDC STAR-100, a Cray Research CRAY-1, a CDC 6600, a CDC 7600, and a CDC Cyber 73. The benchmark effort included CRAY-1's at several installations running different operating systems and compilers. This benchmark is part of an ongoing program at Los Alamos Scientific Laboratory to collect performance data and monitor the development trend of supercomputers. 3 tables.

  7. Reconstruction for time-domain in vivo EPR 3D multigradient oximetric imaging--a parallel processing perspective.

    PubMed

    Dharmaraj, Christopher D; Thadikonda, Kishan; Fletcher, Anthony R; Doan, Phuc N; Devasahayam, Nallathamby; Matsumoto, Shingo; Johnson, Calvin A; Cook, John A; Mitchell, James B; Subramanian, Sankaran; Krishna, Murali C

    2009-01-01

    Three-dimensional oximetric electron paramagnetic resonance imaging using the Single Point Imaging modality generates unpaired spin density and oxygen images that can readily distinguish between normal and tumor tissues in small animals. With fast imaging it is also possible to track the changes in tissue oxygenation in response to the oxygen content of the breathing air. However, this involves dealing with gigabytes of data for each 3D oximetric imaging experiment, involving digital band-pass filtering and background noise subtraction, followed by 3D Fourier reconstruction. This process is rather slow on a conventional uniprocessor system. This paper presents a parallelization framework using OpenMP runtime support and parallel MATLAB to execute such computationally intensive programs. The Intel compiler is used to develop a parallel C++ code based on OpenMP. The code is executed on four dual-core AMD Opteron shared-memory processors, reducing the computational burden of the filtration task significantly. The results show that the parallel filtration code achieved a speed-up factor of 46.66 relative to the equivalent serial MATLAB code. In addition, a parallel MATLAB code has been developed to perform 3D Fourier reconstruction. Speed-up factors of 4.57 and 4.25 were achieved for the reconstruction process and the oximetry computation, for a data set with 23 x 23 x 23 gradient steps. The execution time was measured for both the serial and parallel implementations using different dimensions of the data and is presented for comparison. The reported system has been designed to be easily accessible even from low-cost personal computers through the local network (NIHnet). The experimental results demonstrate that parallel computing provides a source of high computational power for obtaining biophysical parameters from 3D EPR oximetric imaging, almost in real time.
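
    The two computational kernels are conceptually simple; the cost comes from the data volume. A serial NumPy stand-in for both (the paper's versions are OpenMP C++ and parallel MATLAB) might look like this:

        import numpy as np

        def bandpass(signal, fs, f_lo, f_hi):
            """Digital band-pass by zeroing FFT bins outside [f_lo, f_hi]."""
            spec = np.fft.rfft(signal)
            freqs = np.fft.rfftfreq(signal.size, d=1.0 / fs)
            spec[(freqs < f_lo) | (freqs > f_hi)] = 0.0
            return np.fft.irfft(spec, n=signal.size)

        def reconstruct(kspace):
            """3-D Fourier reconstruction of a gradient-step data cube."""
            return np.abs(np.fft.fftshift(np.fft.ifftn(kspace)))

        # a toy 23 x 23 x 23 gradient-step data set, as in the paper
        rng = np.random.default_rng(0)
        kspace = (rng.standard_normal((23, 23, 23))
                  + 1j * rng.standard_normal((23, 23, 23)))
        image = reconstruct(kspace)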

  8. Surprising performance for vibrational frequencies of the distinguishable clusters with singles and doubles (DCSD) and MP2.5 approximations

    NASA Astrophysics Data System (ADS)

    Kesharwani, Manoj K.; Sylvetsky, Nitai; Martin, Jan M. L.

    2017-11-01

    We show that the DCSD (distinguishable clusters with all singles and doubles) correlation method permits the calculation of vibrational spectra at near-CCSD(T) quality but at no more than CCSD cost, and with comparatively inexpensive analytical gradients. For systems dominated by a single reference configuration, even MP2.5 is a viable alternative, at MP3 cost. MP2.5 performance for vibrational frequencies is comparable to double hybrids such as DSD-PBEP86-D3BJ, but without resorting to empirical parameters. DCSD is also quite suitable for computing zero-point vibrational energies in computational thermochemistry.
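
    For reference, the MP2.5 energy referred to here is MP2 plus half of the third-order correction, i.e. the mean of the MP2 and MP3 energies; the numbers below are hypothetical:

        def mp2_5(e_mp2, e_mp3):
            """MP2.5 energy: the average of MP2 and MP3 (equivalently, MP2
            plus half of the third-order correction)."""
            return 0.5 * (e_mp2 + e_mp3)

        # illustrative correlation energies in hartree
        print(mp2_5(-0.3501, -0.3557))   # -0.3529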

  9. Evaluation of the performance of MP4-based procedures for a wide range of thermochemical and kinetic properties

    NASA Astrophysics Data System (ADS)

    Yu, Li-Juan; Wan, Wenchao; Karton, Amir

    2016-11-01

    We evaluate the performance of standard and modified MPn procedures for a wide set of thermochemical and kinetic properties, including atomization energies, structural isomerization energies, conformational energies, and reaction barrier heights. The reference data are obtained at the CCSD(T)/CBS level by means of the Wn thermochemical protocols. We find that none of the MPn-based procedures show acceptable performance for the challenging W4-11 and BH76 databases. For the other thermochemical/kinetic databases, the MP2.5 and MP3.5 procedures provide the most attractive accuracy-to-computational cost ratios. The MP2.5 procedure results in a weighted-total-root-mean-square deviation (WTRMSD) of 3.4 kJ/mol, whilst the computationally more expensive MP3.5 procedure results in a WTRMSD of 1.9 kJ/mol (the same WTRMSD obtained for the CCSD(T) method in conjunction with a triple-zeta basis set). We also assess the performance of the computationally economical CCSD(T)/CBS(MP2) method, which provides the best overall performance for all the considered databases, including W4-11 and BH76.

  10. Fast and Accurate Simulation of the Cray XMT Multithreaded Supercomputer

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Villa, Oreste; Tumeo, Antonino; Secchi, Simone

    Irregular applications, such as data mining and analysis or graph-based computations, show unpredictable memory/network access patterns and control structures. Highly multithreaded architectures with large processor counts, like the Cray MTA-1, MTA-2 and XMT, appear to address their requirements better than commodity clusters. However, research on highly multithreaded systems is currently limited by the lack of adequate architectural simulation infrastructures, due to issues such as the size of the machines, memory footprint, simulation speed, accuracy and customization. At the same time, shared-memory multiprocessors (SMPs) with multi-core processors have become an attractive platform on which to simulate large-scale machines. In this paper, we introduce a cycle-level simulator of the highly multithreaded Cray XMT supercomputer. The simulator runs unmodified XMT applications. We discuss how we tackled the challenges posed by its development, detailing the techniques introduced to make the simulation as fast as possible while maintaining high accuracy. By mapping XMT processors (ThreadStorm with 128 hardware threads) to host computing cores, the simulation speed remains constant as the number of simulated processors increases, up to the number of available host cores. The simulator supports zero-overhead switching among different accuracy levels at run-time and includes a network model that takes contention into account. On a modern 48-core SMP host, our infrastructure simulates a large set of irregular applications 500 to 2000 times slower than real time when compared to a 128-processor XMT, while remaining within 10% of accuracy. Emulation is only 25 to 200 times slower than real time.

  11. Research on Spectroscopy, Opacity, and Atmospheres

    NASA Technical Reports Server (NTRS)

    Kurucz, Robert L.

    1999-01-01

    A web site (cfakus.harvard.edu) has been set up to make the calculations accessible; the data can also be accessed by FTP. It has all of the atomic and diatomic molecular data, tables of distribution-function opacities, grids of model atmospheres, colors, fluxes, etc., programs that are ready for distribution, and most of the recent papers developed during this grant. Atlases and computed spectra will be added as they are completed, as will new atomic and molecular calculations. The atomic programs that had been running on a Cray at the San Diego Supercomputer Center can now run on the VAXes and the Alpha. The work started with Ni and Co because there were new laboratory analyses that included isotopic and hyperfine splitting. Those calculations are described in the appended abstract for the 6th Atomic Spectroscopy and Oscillator Strengths meeting in Victoria last summer. A surprising finding is that quadrupole transitions have been grossly in error because mixing with higher levels has not been included. All levels up through n=9 for Fe I and II, the spectra for which the most information is available, are now included. After Fe I and Fe II, all other spectra are "easy". ATLAS12, the opacity-sampling program for computing models with arbitrary abundances, has been put on the web server. A new distribution-function opacity program for workstations, replacing the one used on the Cray at the San Diego Supercomputer Center, has been written. Each set of abundances would take 100 Cray hours, costing $100,000.

  12. The reliability of dental x-ray film in assessment of MP3 stages of the pubertal growth spurt.

    PubMed

    Abdel-Kader, H M

    1998-10-01

    The main objective of this clinical study is to provide a simple and practical method for assessing the pubertal growth spurt stages of a subject by recording MP3 stages with a dental periapical radiograph and a standard dental x-ray machine.

  13. Cpu/gpu Computing for AN Implicit Multi-Block Compressible Navier-Stokes Solver on Heterogeneous Platform

    NASA Astrophysics Data System (ADS)

    Deng, Liang; Bai, Hanli; Wang, Fang; Xu, Qingxin

    2016-06-01

    CPU/GPU computing allows scientists to tremendously accelerate their numerical codes. In this paper, we port and optimize a double-precision alternating direction implicit (ADI) solver for the three-dimensional compressible Navier-Stokes equations from our in-house Computational Fluid Dynamics (CFD) software to a heterogeneous platform. First, we implement a full GPU version of the ADI solver to remove redundant data transfers between CPU and GPU, and then design two fine-grain schemes, namely “one-thread-one-point” and “one-thread-one-line”, to maximize performance. Second, we present a dual-level parallelization scheme using the CPU/GPU collaborative model to exploit the computational resources of both the multi-core CPUs and the many-core GPUs within the heterogeneous platform. Finally, considering that memory on a single node becomes inadequate when the simulation size grows, we present a tri-level hybrid programming pattern, MPI-OpenMP-CUDA, that merges fine-grain parallelism using OpenMP and CUDA threads with coarse-grain parallelism using MPI for inter-node communication. We also propose a strategy to overlap computation with communication using the advanced features of CUDA and MPI programming. We obtain speedups of 6.0 for the ADI solver on one Tesla M2050 GPU compared with two Xeon X5670 CPUs. Scalability tests show that our implementation can offer significant performance improvement on the heterogeneous platform.
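
    The overlap strategy the authors describe can be illustrated with non-blocking MPI alone. A minimal mpi4py sketch (an analogue, not the paper's MPI-OpenMP-CUDA code) posts a halo exchange, computes on the interior while the messages are in flight, and only then touches the boundary data:

        import numpy as np
        from mpi4py import MPI

        comm = MPI.COMM_WORLD
        rank, size = comm.Get_rank(), comm.Get_size()
        left, right = (rank - 1) % size, (rank + 1) % size

        u = np.random.rand(1024)
        recv_l, recv_r = np.empty(1), np.empty(1)
        reqs = [comm.Isend(u[:1].copy(), dest=left),    # post sends/receives ...
                comm.Isend(u[-1:].copy(), dest=right),
                comm.Irecv(recv_l, source=left),
                comm.Irecv(recv_r, source=right)]

        interior = 0.5 * (u[:-2] + u[2:])               # ... compute interior meanwhile

        MPI.Request.Waitall(reqs)                       # halos ready for boundary work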

  14. Importance of the Electron Correlation and Dispersion Corrections in Calculations Involving Enamines, Hemiaminals, and Aminals. Comparison of B3LYP, M06-2X, MP2, and CCSD Results with Experimental Data.

    PubMed

    Castro-Alvarez, Alejandro; Carneros, Héctor; Sánchez, Dani; Vilarrasa, Jaume

    2015-12-18

    While B3LYP, M06-2X, and MP2 calculations predict the ΔG° values for exchange equilibria between enamines and ketones with similar acceptable accuracy, the M06-2X/6-311+G(d,p) and MP2/6-311+G(d,p) methods are required for enamine formation reactions (for example, for enamine 5a, arising from 3-methylbutanal and pyrrolidine). Stronger disagreement was observed when calculated energies of hemiaminals (N,O-acetals) and aminals (N,N-acetals) were compared with experimental equilibrium constants, which are reported here for the first time. Although it is known that the B3LYP method does not provide a good description of the London dispersion forces, while M06-2X and MP2 may overestimate them, it is shown here how large the gaps are and that at least single-point calculations at the CCSD(T)/6-31+G(d) level should be used for these reaction intermediates; CCSD(T)/6-31+G(d) and CCSD(T)/6-311+G(d,p) calculations afford ΔG° values in some cases quite close to MP2/6-311+G(d,p) while in others closer to M06-2X/6-311+G(d,p). The effect of solvents is similarly predicted by the SMD, CPCM, and IEFPCM approaches (with energy differences below 1 kcal/mol).
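
    When comparing calculated free energies with measured equilibrium constants, the conversion is the standard ΔG° = -RT ln K relation. A small helper (with hypothetical inputs) shows why errors near 1 kcal/mol matter: at 298 K, every 1.36 kcal/mol shifts K by a factor of ten.

        import math

        R_KCAL = 1.987204e-3         # gas constant, kcal/(mol*K)

        def K_from_dG(dG_kcal, T=298.15):
            """Equilibrium constant from a standard free energy (kcal/mol)."""
            return math.exp(-dG_kcal / (R_KCAL * T))

        print(K_from_dG(-1.0))       # ~5.4
        print(K_from_dG(-2.0))       # ~29: 1 kcal/mol more shifts K ~5x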

  15. A dual communicator and dual grid-resolution algorithm for petascale simulations of turbulent mixing at high Schmidt number

    NASA Astrophysics Data System (ADS)

    Clay, M. P.; Buaria, D.; Gotoh, T.; Yeung, P. K.

    2017-10-01

    A new dual-communicator algorithm with very favorable performance characteristics has been developed for direct numerical simulation (DNS) of turbulent mixing of a passive scalar governed by an advection-diffusion equation. We focus on the regime of high Schmidt number (Sc), where because of low molecular diffusivity the grid-resolution requirements for the scalar field are stricter than those for the velocity field by a factor sqrt(Sc). Computational throughput is improved by simulating the velocity field on a coarse grid of Nv^3 points with a Fourier pseudo-spectral (FPS) method, while the passive scalar is simulated on a fine grid of Nθ^3 points with a combined compact finite difference (CCD) scheme which computes first and second derivatives at eighth-order accuracy. A static three-dimensional domain decomposition and a parallel solution algorithm for the CCD scheme are used to avoid the heavy communication cost of memory transposes. A kernel is used to evaluate several approaches to optimize the performance of the CCD routines, which account for 60% of the overall simulation cost. On the petascale supercomputer Blue Waters at the University of Illinois, Urbana-Champaign, scalability is improved substantially with a hybrid MPI-OpenMP approach in which a dedicated thread per NUMA domain overlaps communication calls with computational tasks performed by a separate team of threads spawned using OpenMP nested parallelism. At a target production problem size of 8192^3 (0.5 trillion) grid points on 262,144 cores, CCD timings are reduced by 34% compared to a pure-MPI implementation. Timings for 16384^3 (4 trillion) grid points on 524,288 cores encouragingly maintain scalability greater than 90%, although the wall clock time is too high for production runs at this size. Performance monitoring with CrayPat for problem sizes up to 4096^3 shows that the CCD routines can achieve nearly 6% of the peak flop rate. The new DNS code is built upon two existing FPS and CCD codes. With the grid ratio Nθ/Nv = 8, the disparity in the computational requirements for the velocity and scalar problems is addressed by splitting the global communicator MPI_COMM_WORLD into disjoint communicators for the velocity and scalar fields, respectively. Inter-communicator transfer of the velocity field from the velocity communicator to the scalar communicator is handled with discrete send and non-blocking receive calls, which are overlapped with other operations on the scalar communicator. For production simulations at Nθ = 8192 and Nv = 1024 on 262,144 cores for the scalar field, the DNS code achieves 94% strong scaling relative to 65,536 cores and 92% weak scaling relative to Nθ = 1024 and Nv = 128 on 512 cores.
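
    A minimal mpi4py sketch of the communicator split (toy sizes and tags, blocking transfers for brevity; the production code overlaps non-blocking receives with scalar-side work):

        import numpy as np
        from mpi4py import MPI

        world = MPI.COMM_WORLD
        rank, size = world.Get_rank(), world.Get_size()

        n_vel = max(1, size // 9)                 # e.g. a 1:8 velocity:scalar split
        color = 0 if rank < n_vel else 1          # 0 = velocity group, 1 = scalar group
        sub = world.Split(color=color, key=rank)  # disjoint communicator per field

        # inter-communicator transfer of the velocity field: velocity rank i
        # pairs with scalar rank n_vel + i, using point-to-point calls on world
        field = np.empty(64)
        if color == 0 and n_vel + rank < size:
            field[:] = rank
            world.Send(field, dest=n_vel + rank, tag=11)
        elif color == 1 and rank < 2 * n_vel:
            world.Recv(field, source=rank - n_vel, tag=11)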

  16. Logistic model analysis of neurological findings in Minamata disease and the predicting index.

    PubMed

    Nakagawa, Masanori; Kodama, Tomoko; Akiba, Suminori; Arimura, Kimiyoshi; Wakamiya, Junji; Futatsuka, Makoto; Kitano, Takao; Osame, Mitsuhiro

    2002-01-01

    To establish a statistical diagnostic method to identify patients with Minamata disease (MD) considering factors of aging and sex, we analyzed the neurological findings in MD patients, inhabitants of a methylmercury-polluted (MP) area, and inhabitants of a non-MP area. We compared the neurological findings in MD patients and inhabitants aged more than 40 years in the non-MP area. Based on the different frequencies of the neurological signs in the two groups, we devised the following formula to calculate the predicting index for MD: predicting index = 1/(1 + e^(-x)) x 100. (The value of x is calculated using the regression coefficients of each neurological finding obtained from logistic analysis; an index of 100 indicates MD, and 0 non-MD.) Using this method, we found that 100% of male and 98% of female patients with MD (95 cases) gave predicting indices higher than 95. Five percent of the aged inhabitants in the MP area (598 inhabitants) and 0.2% of those in the non-MP area (558 inhabitants) gave predicting indices of 50 or higher. Our statistical diagnostic method for MD was useful in distinguishing MD patients from healthy elders based on their neurological findings.
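
    The index is an ordinary logistic score rescaled to 0-100, so it is straightforward to compute; in the sketch below the coefficients are made-up placeholders, not the paper's fitted values:

        import math

        def predicting_index(findings, coefficients, intercept):
            """100 / (1 + exp(-x)), with x the logistic-regression score of
            the 0/1 neurological findings."""
            x = intercept + sum(c * f for c, f in zip(coefficients, findings))
            return 100.0 / (1.0 + math.exp(-x))

        # three illustrative findings with hypothetical coefficients
        print(predicting_index([1, 1, 0], [2.1, 1.4, 0.9], -2.0))   # ~81.8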

  17. Crystallization and preliminary X-ray characterization of the genetically encoded fluorescent calcium indicator protein GCaMP2

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Rodríguez Guilbe, María M.; Protein Research and Development Center, University of Puerto Rico; Alfaro Malavé, Elisa C.

    The genetically encoded fluorescent calcium-indicator protein GCaMP2 was crystallized in the calcium-saturated form. X-ray diffraction data were collected to 2.0 Å resolution and the structure was solved by molecular replacement. Fluorescent proteins and their engineered variants have played an important role in the study of biology. The genetically encoded calcium-indicator protein GCaMP2 comprises a circularly permuted fluorescent protein coupled to the calcium-binding protein calmodulin and a calmodulin target peptide, M13, derived from the intracellular calmodulin target myosin light-chain kinase, and has been used to image calcium transients in vivo. To aid rational efforts to engineer improved variants of GCaMP2, this protein was crystallized in the calcium-saturated form. X-ray diffraction data were collected to 2.0 Å resolution. The crystals belong to space group C2, with unit-cell parameters a = 126.1, b = 47.1, c = 68.8 Å, β = 100.5° and one GCaMP2 molecule in the asymmetric unit. The structure was phased by molecular replacement and refinement is currently under way.

  18. A Reinvestigation of the Dimer of para-Benzoquinone with Pyrimidine with MP2, CCSD(T) and DFT using Functionals including those Designed to Describe Dispersion

    PubMed Central

    Marianski, Mateusz; Oliva, Antoni

    2012-01-01

    We reevaluate the interaction of pyridine and p-benzoquinone using functionals designed to treat dispersion. We compare the relative energies of four different structures: stacked, T-shaped (identified for the first time) and two planar H-bonded geometries using these functionals (B97-D, ωB97x-D, M05, M05-2X, M06, M06L, M06-2X), other functionals (PBE1PBE, B3LYP, X3LYP), MP2 and CCSD(T) using basis sets as large as cc-pVTZ. The functionals designed to treat dispersion behave erratically as the predictions of the most stable structure vary considerably. MP2 predicts the experimentally observed structure (H-bonded) to be the least stable, while single point CCSD(T) at the MP2 optimized geometry correctly predicts the observed structure to be most stable. We have confirmed the assignment of the experimental structure using new calculations of the vibrational frequency shifts previously used to identify the structure. The MP2/cc-pVTZ vibrational calculations are in excellent agreement with the observations. All methods used to calculate the energies provide vibrational shifts that agree with the observed structure even though most do not predict this structure to be most stable. The implications for evaluating possible π-stacking in biologically important systems are discussed. PMID:22765283

  19. A reinvestigation of the dimer of para-benzoquinone and pyrimidine with MP2, CCSD(T), and DFT using functionals including those designed to describe dispersion.

    PubMed

    Marianski, Mateusz; Oliva, Antoni; Dannenberg, J J

    2012-08-02

    We reevaluate the interaction of pyridine and p-benzoquinone using functionals designed to treat dispersion. We compare the relative energies of four different structures: stacked, T-shaped (identified for the first time), and two planar H-bonded geometries using these functionals (B97-D, ωB97x-D, M05, M05-2X, M06, M06L, and M06-2X), other functionals (PBE1PBE, B3LYP, X3LYP), MP2, and CCSD(T) using basis sets as large as cc-pVTZ. The functionals designed to treat dispersion behave erratically as the predictions of the most stable structure vary considerably. MP2 predicts the experimentally observed structure (H-bonded) to be the least stable, while single-point CCSD(T) at the MP2 optimized geometry correctly predicts the observed structure to be the most stable. We have confirmed the assignment of the experimental structure using new calculations of the vibrational frequency shifts previously used to identify the structure. The MP2/cc-pVTZ vibrational calculations are in excellent agreement with the observations. All methods used to calculate the energies provide vibrational shifts that agree with the observed structure even though most do not predict this structure to be most stable. The implications for evaluating possible π-stacking in biologically important systems are discussed.

  20. The Research of the Parallel Computing Development from the Angle of Cloud Computing

    NASA Astrophysics Data System (ADS)

    Peng, Zhensheng; Gong, Qingge; Duan, Yanyu; Wang, Yun

    2017-10-01

    Cloud computing is the development of parallel computing, distributed computing and grid computing. The development of cloud computing brings parallel computing into people's everyday lives. Firstly, this paper expounds the concept of cloud computing and introduces several traditional parallel programming models. Secondly, it analyzes and studies the principles, advantages and disadvantages of OpenMP, MPI and MapReduce, respectively. Finally, it compares the MPI and OpenMP models with MapReduce from the perspective of cloud computing. The results of this paper are intended to provide a reference for the development of parallel computing.
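
    The contrast the paper draws can be made concrete in a few lines. Below, a sum of squares is computed in the MapReduce style (a data-parallel map followed by a reduction), here sketched with Python's multiprocessing rather than a real MapReduce runtime; an MPI code would instead form local partial sums and combine them with an allreduce, and OpenMP would use a parallel loop with a sum reduction:

        from functools import reduce
        from multiprocessing import Pool

        def square(x):
            return x * x

        if __name__ == "__main__":
            data = range(1000)
            with Pool(4) as pool:
                mapped = pool.map(square, data)            # "map" phase
            total = reduce(lambda a, b: a + b, mapped)     # "reduce" phase
            print(total)                                   # 332833500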

  1. Improved analysis of SP and CoSaMP under total perturbations

    NASA Astrophysics Data System (ADS)

    Li, Haifeng

    2016-12-01

    Practically, in the underdetermined model y = Ax, where x is a K-sparse vector (i.e., it has no more than K nonzero entries), both y and A can be totally perturbed. A more relaxed condition means that, from a theoretical standpoint, fewer measurements are needed to ensure sparse recovery. In this paper, based on the restricted isometry property (RIP), two relaxed sufficient conditions are presented for subspace pursuit (SP) and compressed sampling matching pursuit (CoSaMP) that guarantee recovery of the sparse vector x under total perturbations. Taking a random matrix as the measurement matrix, we also discuss the advantage of our condition. Numerical experiments validate that SP and CoSaMP can provide oracle-order recovery performance.
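
    For readers unfamiliar with CoSaMP, a bare-bones NumPy version of the unperturbed algorithm (our own illustration, not the paper's analysis machinery) fits in a few lines:

        import numpy as np

        def cosamp(A, y, K, max_iter=20, tol=1e-10):
            """Recover a K-sparse x from y = A x via CoSaMP."""
            n = A.shape[1]
            x, r = np.zeros(n), y.copy()
            for _ in range(max_iter):
                proxy = A.T @ r                              # signal proxy
                omega = np.argsort(np.abs(proxy))[-2 * K:]   # 2K largest entries
                T = np.union1d(omega, np.flatnonzero(x))     # merge supports
                b = np.zeros(n)
                b[T] = np.linalg.lstsq(A[:, T], y, rcond=None)[0]
                keep = np.argsort(np.abs(b))[-K:]            # prune to K terms
                x = np.zeros(n)
                x[keep] = b[keep]
                r = y - A @ x                                # update residual
                if np.linalg.norm(r) <= tol:
                    break
            return x

        rng = np.random.default_rng(0)
        A = rng.standard_normal((64, 256)) / 8.0
        x_true = np.zeros(256)
        x_true[[5, 77, 200]] = [1.0, -2.0, 0.5]
        x_hat = cosamp(A, A @ x_true, K=3)   # recovers the three spikes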

  2. Multigrid direct numerical simulation of the whole process of flow transition in 3-D boundary layers

    NASA Technical Reports Server (NTRS)

    Liu, Chaoqun; Liu, Zhining

    1993-01-01

    A new technology was developed in this study which provides a successful numerical simulation of the whole process of flow transition in 3-D boundary layers, including linear growth, secondary instability, breakdown, and transition, at relatively low CPU cost. Most other spatial numerical simulations require high CPU cost and blow up at the stage of flow breakdown. A fourth-order finite difference scheme on stretched and staggered grids, a fully implicit time-marching technique, a semi-coarsening multigrid based on the so-called approximate line-box relaxation, and a buffer domain for the outflow boundary conditions were all used for high-order accuracy, good stability, and fast convergence. A new fine-coarse-fine grid-mapping technique was developed to keep the code running after the laminar flow breaks down. The computational results are in good agreement with linear stability theory, secondary instability theory, and some experiments. The cost for a typical case with a 162 x 34 x 34 grid is around 2 CRAY Y-MP CPU hours for 10 T-S periods.

  3. Field-scale multi-phase LNAPL remediation: Validating a new computational framework against sequential field pilot trials.

    PubMed

    Sookhak Lari, Kaveh; Johnston, Colin D; Rayner, John L; Davis, Greg B

    2018-03-05

    Remediation of subsurface systems, including groundwater, soil and soil gas, contaminated with light non-aqueous phase liquids (LNAPLs) is challenging. Field-scale pilot trials of multi-phase remediation were undertaken at a site to determine the effectiveness of recovery options. Sequential LNAPL skimming and vacuum-enhanced skimming, with and without water table drawdown, were trialled over 78 days, in total extracting over 5 m3 of LNAPL. For the first time, a multi-component simulation framework (including the multi-phase multi-component code TMVOC-MP and processing codes) was developed and applied to simulate the broad range of multi-phase remediation and recovery methods used in the field trials. This framework was validated against the sequential pilot trials by comparing predicted and measured LNAPL mass removal rates and compositional changes. The framework was tested on both a Cray supercomputer and a cluster. Simulations mimicked trends in LNAPL recovery rates (from 0.14 to 3 mL/s) across all remediation techniques, each operating over periods of 4-14 days within the 78-day trial. The code also approximated order-of-magnitude compositional changes of hazardous chemical concentrations in extracted gas during vacuum-enhanced recovery. The verified framework enables longer-term prediction of the effectiveness of remediation approaches, allowing better determination of remediation endpoints and long-term risks. Copyright © 2017 Commonwealth Scientific and Industrial Research Organisation. Published by Elsevier B.V. All rights reserved.

  4. Ultraviolet, X-ray, and infrared observations of HDE 226868 equals Cygnus X-1

    NASA Technical Reports Server (NTRS)

    Treves, A.; Chiappetti, L.; Tanzi, E. G.; Tarenghi, M.; Gursky, H.; Dupree, A. K.; Hartmann, L. W.; Raymond, J.; Davis, R. J.; Black, J.

    1980-01-01

    During April, May, and July of 1978, HDE 226868, the optical counterpart of Cygnus X-1, was repeatedly observed in the ultraviolet with the IUE satellite. Some X-ray and infrared observations have been made during the same period. The general shape of the spectrum is that expected from a late O supergiant. Strong absorption features are apparent in the ultraviolet, some of which have been identified. The equivalent widths of the most prominent lines appear to be modulated with the orbital phase. This modulation is discussed in terms of the ionization contours calculated by Hatchett and McCray, for a binary X-ray source in the stellar wind of the companion.

  5. High Performance Programming Using Explicit Shared Memory Model on the Cray T3D

    NASA Technical Reports Server (NTRS)

    Saini, Subhash; Simon, Horst D.; Lasinski, T. A. (Technical Monitor)

    1994-01-01

    The Cray T3D is the first-phase system in Cray Research Inc.'s (CRI) three-phase massively parallel processing program. In this report we describe the architecture of the T3D, as well as the CRAFT (Cray Research Adaptive Fortran) programming model, and contrast it with PVM, which is also supported on the T3D. We present some performance data based on the NAS Parallel Benchmarks to illustrate both architectural and software features of the T3D.

  6. Networking for large-scale science: infrastructure, provisioning, transport and application mapping

    NASA Astrophysics Data System (ADS)

    Rao, Nageswara S.; Carter, Steven M.; Wu, Qishi; Wing, William R.; Zhu, Mengxia; Mezzacappa, Anthony; Veeraraghavan, Malathi; Blondin, John M.

    2005-01-01

    Large-scale science computations and experiments require unprecedented network capabilities in the form of large bandwidth and dynamically stable connections to support data transfers, interactive visualizations, and monitoring and steering operations. A number of component technologies dealing with the infrastructure, provisioning, transport and application mappings must be developed and/or optimized to achieve these capabilities. We present a brief account of the following technologies that contribute toward achieving these network capabilities: (a) the DOE UltraScienceNet and NSF CHEETAH network testbeds, which provide on-demand and scheduled dedicated network connections; (b) experimental results on transport protocols that achieve close to 100% utilization on dedicated 1 Gbps wide-area channels; (c) a scheme for optimally mapping a visualization pipeline onto a network to minimize the end-to-end delays; and (d) interconnect configurations and protocols that provide multiple Gbps flows from the Cray X1 to external hosts.

  7. Data Movement Dominates: Final Report

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Jacob, Bruce L.

    Over the past three years in this project, what we have observed is that the primary reason for data movement in large-scale systems is that the per-node capacity is not large enough; i.e., one of the solutions to the data-movement problem (certainly not the only solution that is required, but a significant one nonetheless) is to increase per-node capacity so that inter-node traffic is reduced. This unfortunately is not as simple as it sounds. Today's main memory systems for datacenters, enterprise computing systems, and supercomputers fail to provide high per-socket capacity [Dirik & Jacob 2009; Cooper-Balis et al. 2012], except at extremely high price points (factors of 10-100x the cost/bit of consumer main-memory systems) [Stokes 2008]. The reason is that our choice of technology for today's main memory systems, i.e., DRAM, which we have used as a main-memory technology since the 1970s [Jacob et al. 2007], can no longer keep up with our needs for density and price per bit. Main memory systems have always been built from the cheapest, densest, lowest-power memory technology available, and DRAM is no longer the cheapest, the densest, nor the lowest-power storage technology out there. It is now time for DRAM to go the way that SRAM went: move out of the way for a cheaper, slower, denser storage technology, and become a cache instead. This inflection point has happened before, in the context of SRAM yielding to DRAM. There was once a time when SRAM was the storage technology of choice for all main memories [Tomasulo 1967; Thornton 1970; Kidder 1981]. However, once DRAM hit volume production in the 1970s and 80s, it supplanted SRAM as a main memory technology because it was cheaper, and it was denser. It also happened to be lower power, but that was not the primary consideration of the day. At the time, it was recognized that DRAM was much slower than SRAM, but it was only at the supercomputer level (for instance, the Cray X-MP in the 1980s and its follow-on, the Cray Y-MP, in the 1990s) that one could afford to build ever-larger main memories out of SRAM. The reasoning for moving to DRAM was that an appropriately designed memory hierarchy, built of DRAM as main memory and SRAM as a cache, would approach the performance of SRAM at the price-per-bit of DRAM [Mashey 1999]. Today it is quite clear that, were one to build an entire multi-gigabyte main memory out of SRAM instead of DRAM, one could improve the performance of almost any computer system by up to an order of magnitude, but this option is not even considered, because to build that system would be prohibitively expensive. It is now time to revisit the same design choice in the context of modern technologies and modern systems. For reasons both technical and economic, we can no longer afford to build ever-larger main memory systems out of DRAM. Flash memory, on the other hand, is significantly cheaper and denser than DRAM and therefore should take its place. While it is true that flash is significantly slower than DRAM, one can afford to build much larger main memories out of flash than out of DRAM, and we show that an appropriately designed memory hierarchy, built of flash as main memory and DRAM as a cache, will approach the performance of DRAM at the price-per-bit of flash. In our studies as part of this project, we have investigated Non-Volatile Main Memory (NVMM), a new main-memory architecture for large-scale computing systems, one that is specifically designed to address the weaknesses described previously.
In particular, it provides the following features. Non-volatility: the bulk of the storage is comprised of NAND flash, and in this organization DRAM is used only as a cache, not as main memory; furthermore, the flash is journaled, which means that operations such as checkpoint/restore are already built into the system. 1+ terabytes of storage per socket: SSDs and DRAM DIMMs have roughly the same form factor (several square inches of PCB surface area), and terabyte SSDs are now commonplace. Performance approaching that of DRAM: DRAM is used as a cache to the flash system. Price-per-bit approaching that of NAND: flash is currently well under $0.50 per gigabyte, while DDR3 SDRAM is just over $10 per gigabyte [Newegg 2014]. Even today, one can build an easily affordable main memory system with a terabyte or more of NAND storage per CPU socket (which would be extremely expensive were one to use DRAM), and our cycle-accurate, full-system experiments show that this can be done at a performance point that lies within a factor of two of DRAM.
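
    The core price/performance argument can be checked with back-of-the-envelope arithmetic; the latencies and prices below are illustrative placeholders, not the report's measurements:

        def avg_latency_ns(hit_rate, t_dram_ns=100.0, t_flash_ns=50_000.0):
            """Mean access time of a DRAM cache in front of flash."""
            return hit_rate * t_dram_ns + (1.0 - hit_rate) * t_flash_ns

        def blended_cost_per_gb(dram_gb, flash_gb, p_dram=10.0, p_flash=0.5):
            """Capacity-weighted price per gigabyte of the hierarchy."""
            return (dram_gb * p_dram + flash_gb * p_flash) / (dram_gb + flash_gb)

        print(avg_latency_ns(0.99))           # 599.0 ns with a 99% hit rate
        print(blended_cost_per_gb(64, 1024))  # ~1.06 $/GB, near flash pricing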

  8. Large-Scale Parallel Viscous Flow Computations using an Unstructured Multigrid Algorithm

    NASA Technical Reports Server (NTRS)

    Mavriplis, Dimitri J.

    1999-01-01

    The development and testing of a parallel unstructured agglomeration multigrid algorithm for steady-state aerodynamic flows is discussed. The agglomeration multigrid strategy uses a graph algorithm to construct the coarse multigrid levels from the given fine grid, similar to an algebraic multigrid approach, but operates directly on the non-linear system using the FAS (Full Approximation Scheme) approach. The scalability and convergence rate of the multigrid algorithm are examined on the SGI Origin 2000 and the Cray T3E. An argument is given which indicates that the asymptotic scalability of the multigrid algorithm should be similar to that of its underlying single grid smoothing scheme. For medium size problems involving several million grid points, near perfect scalability is obtained for the single grid algorithm, while only a slight drop-off in parallel efficiency is observed for the multigrid V- and W-cycles, using up to 128 processors on the SGI Origin 2000, and up to 512 processors on the Cray T3E. For a large problem using 25 million grid points, good scalability is observed for the multigrid algorithm using up to 1450 processors on a Cray T3E, even when the coarsest grid level contains fewer points than the total number of processors.

  9. Modeling Subsurface Reactive Flows Using Leadership-Class Computing

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Mills, Richard T; Hammond, Glenn; Lichtner, Peter

    2009-01-01

    We describe our experiences running PFLOTRAN - a code for simulation of coupled hydro-thermal-chemical processes in variably saturated, non-isothermal, porous media - on leadership-class supercomputers, including initial experiences running on the petaflop incarnation of Jaguar, the Cray XT5 at the National Center for Computational Sciences at Oak Ridge National Laboratory. PFLOTRAN utilizes fully implicit time-stepping and is built on top of the Portable, Extensible Toolkit for Scientific Computation (PETSc). We discuss some of the hurdles to 'at scale' performance with PFLOTRAN and the progress we have made in overcoming them on leadership-class computer architectures.

  10. Cross-scale efficient tensor contractions for coupled cluster computations through multiple programming model backends

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Ibrahim, Khaled Z.; Epifanovsky, Evgeny; Williams, Samuel

    Coupled-cluster methods provide highly accurate models of molecular structure through explicit numerical calculation of tensors representing the correlation between electrons. These calculations are dominated by a sequence of tensor contractions, motivating the development of numerical libraries for such operations. While based on matrix–matrix multiplication, these libraries are specialized to exploit symmetries in the molecular structure and in electronic interactions, and thus reduce the size of the tensor representation and the complexity of contractions. The resulting algorithms are irregular and their parallelization has been previously achieved via the use of dynamic scheduling or specialized data decompositions. We introduce our efforts to extend the Libtensor framework to work in the distributed-memory environment in a scalable and energy-efficient manner. We achieve up to 240× speedup compared with the optimized shared-memory implementation of Libtensor. We attain scalability to hundreds of thousands of compute cores on three distributed-memory architectures (Cray XC30 and XC40, and IBM Blue Gene/Q), and on a heterogeneous GPU-CPU system (Cray XK7). As the bottlenecks shift from compute-bound DGEMMs to communication-bound collectives with the size of the molecular system, we adopt two radically different parallelization approaches for handling load imbalance: tasking and bulk-synchronous models. Nevertheless, we preserve a unified interface to both programming models to maintain the productivity of computational quantum chemists.

  11. Cross-scale efficient tensor contractions for coupled cluster computations through multiple programming model backends

    DOE PAGES

    Ibrahim, Khaled Z.; Epifanovsky, Evgeny; Williams, Samuel; ...

    2017-03-08

    Coupled-cluster methods provide highly accurate models of molecular structure through explicit numerical calculation of tensors representing the correlation between electrons. These calculations are dominated by a sequence of tensor contractions, motivating the development of numerical libraries for such operations. While based on matrix–matrix multiplication, these libraries are specialized to exploit symmetries in the molecular structure and in electronic interactions, and thus reduce the size of the tensor representation and the complexity of contractions. The resulting algorithms are irregular and their parallelization has been previously achieved via the use of dynamic scheduling or specialized data decompositions. We introduce our efforts to extend the Libtensor framework to work in the distributed-memory environment in a scalable and energy-efficient manner. We achieve up to 240× speedup compared with the optimized shared-memory implementation of Libtensor. We attain scalability to hundreds of thousands of compute cores on three distributed-memory architectures (Cray XC30 and XC40, and IBM Blue Gene/Q), and on a heterogeneous GPU-CPU system (Cray XK7). As the bottlenecks shift from compute-bound DGEMMs to communication-bound collectives with the size of the molecular system, we adopt two radically different parallelization approaches for handling load imbalance: tasking and bulk-synchronous models. Nevertheless, we preserve a unified interface to both programming models to maintain the productivity of computational quantum chemists.

  12. How Forcefully Should Universities Enforce Copyright Law on Audio Files?

    ERIC Educational Resources Information Center

    McCollum, Kelly

    1999-01-01

    The Recording Industry Association of America is aggressively pursuing copyright violations on campuses concerning MP3 music recordings being exchanged on computer networks. Carnegie Mellon University (Pennsylvania), to avoid litigation, has been searching public folders of students' computers to find illegally copied MP3s. Controversy over…

  13. Automatic Multilevel Parallelization Using OpenMP

    NASA Technical Reports Server (NTRS)

    Jin, Hao-Qiang; Jost, Gabriele; Yan, Jerry; Ayguade, Eduard; Gonzalez, Marc; Martorell, Xavier; Biegel, Bryan (Technical Monitor)

    2002-01-01

    In this paper we describe the extension of the CAPO (CAPtools (Computer Aided Parallelization Toolkit) OpenMP) parallelization support tool to support multilevel parallelism based on OpenMP directives. CAPO generates OpenMP directives with extensions supported by the NanosCompiler to allow for directive nesting and definition of thread groups. We report some results for several benchmark codes and one full application that have been parallelized using our system.

  14. Study of the interaction of 6-mercaptopurine with protein by microdialysis coupled with LC and electrochemical detection based on functionalized multi-wall carbon nanotubes modified electrode.

    PubMed

    Cao, Xu-Ni; Lin, Li; Zhou, Yu-Yan; Zhang, Wen; Shi, Guo-Yue; Yamamoto, Katsunobu; Jin, Li-Tong

    2003-07-14

    Microdialysis sampling coupled with liquid chromatography and electrochemical detection (LC-ECD) was developed and applied to study the interaction of 6-mercaptopurine (6-MP) with bovine serum albumin (BSA). In the LC-ECD, a multi-wall carbon nanotube electrode functionalized with carboxylic groups (MWNT-COOH CME) was used as the working electrode for the determination of 6-MP. The results indicated that this chemically modified electrode (CME) exhibited efficient electrocatalytic oxidation of 6-MP with relatively high sensitivity, stability, and long life. The peak currents of 6-MP were linear with its concentration over the range 4.0 x 10(-7) to 1.0 x 10(-4) mol l(-1), with a calculated detection limit (S/N = 3) of 2.0 x 10(-7) mol l(-1). The method was successfully applied to assess the association constant (K) and the number of binding sites (n) on a BSA molecule, which, calculated by the Scatchard equation, were 3.97 x 10(3) mol(-1) l and 1.51, respectively. This method provides a fast, sensitive, and simple technique for the study of drug-protein interactions.
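
    For reference, the Scatchard analysis used to obtain K and n above has the standard form below (the general textbook form; the paper's exact working is not reproduced here).

    ```latex
    % r = moles of 6-MP bound per mole of BSA; [D_f] = free-drug concentration;
    % n = number of binding sites; K = association constant.
    \[
      \frac{r}{[D_f]} = K(n - r) = nK - rK
    \]
    % A plot of r/[D_f] against r is linear with slope -K and intercept nK;
    % the reported values K = 3.97 x 10^3 mol^-1 l and n = 1.51 come from
    % such a fit.
    ```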

  15. Cross-scale efficient tensor contractions for coupled cluster computations through multiple programming model backends

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Ibrahim, Khaled Z.; Epifanovsky, Evgeny; Williams, Samuel W.

    Coupled-cluster methods provide highly accurate models of molecular structure by explicit numerical calculation of tensors representing the correlation between electrons. These calculations are dominated by a sequence of tensor contractions, motivating the development of numerical libraries for such operations. While based on matrix-matrix multiplication, these libraries are specialized to exploit symmetries in the molecular structure and in electronic interactions, and thus reduce the size of the tensor representation and the complexity of contractions. The resulting algorithms are irregular and their parallelization has been previously achieved via the use of dynamic scheduling or specialized data decompositions. We introduce our efforts to extend the Libtensor framework to work in the distributed-memory environment in a scalable and energy-efficient manner. We achieve up to 240× speedup compared with the best optimized shared-memory implementation. We attain scalability to hundreds of thousands of compute cores on three distributed-memory architectures (Cray XC30 and XC40, and IBM Blue Gene/Q), and on a heterogeneous GPU-CPU system (Cray XK7). As the bottlenecks shift from compute-bound DGEMMs to communication-bound collectives with the size of the molecular system, we adopt two radically different parallelization approaches for handling load imbalance. Nevertheless, we preserve a unified interface to both programming models to maintain the productivity of computational quantum chemists.

  16. S-HARP: A parallel dynamic spectral partitioner

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Sohn, A.; Simon, H.

    1998-01-01

    Computational science problems with adaptive meshes involve dynamic load balancing when implemented on parallel machines. This dynamic load balancing requires fast partitioning of computational meshes at run time. The authors present in this report a fast parallel dynamic partitioner, called S-HARP. The underlying principles of S-HARP are the fast feature of inertial partitioning and the quality feature of spectral partitioning. S-HARP partitions a graph from scratch, requiring no partition information from previous iterations. Two types of parallelism have been exploited in S-HARP: fine-grain loop-level parallelism and coarse-grain recursive parallelism. The parallel partitioner has been implemented in the Message Passing Interface on the Cray T3E and IBM SP2 for portability. Experimental results indicate that S-HARP can partition a mesh of over 100,000 vertices into 256 partitions in 0.2 seconds on a 64-processor Cray T3E. S-HARP is much more scalable than other dynamic partitioners, giving over 15-fold speedup on 64 processors while ParaMeTiS 1.0 gives only a few-fold speedup. Experimental results demonstrate that S-HARP is three to ten times faster than the dynamic partitioners ParaMeTiS and Jostle on six computational meshes of size over 100,000 vertices.
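
    The "fast feature of inertial partitioning" that S-HARP combines with spectral quality can be sketched in a few lines: project the vertex coordinates onto their principal inertia axis and split at the median. The Python sketch below shows that single ingredient only; it is not the S-HARP code.

    ```python
    import numpy as np

    def inertial_bisect(coords):
        """coords: (n, d) array of mesh-vertex coordinates.
        Returns a boolean mask marking one half of the bisection."""
        centered = coords - coords.mean(axis=0)
        # Principal axis = eigenvector of the largest eigenvalue of the
        # coordinate covariance (inertia) matrix.
        eigvals, eigvecs = np.linalg.eigh(centered.T @ centered)
        projection = centered @ eigvecs[:, -1]
        # Median split balances the vertex counts of the two halves.
        return projection > np.median(projection)

    # Applying the bisection recursively k times yields 2^k partitions.
    ```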

  17. Derivation of general analytic gradient expressions for density-fitted post-Hartree-Fock methods: An efficient implementation for the density-fitted second-order Møller–Plesset perturbation theory

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Bozkaya, Uğur, E-mail: ugur.bozkaya@atauni.edu.tr

    General analytic gradient expressions (with the frozen-core approximation) are presented for density-fitted post-HF methods. An efficient implementation of frozen-core analytic gradients for second-order Møller–Plesset perturbation theory (MP2) with the density-fitting (DF) approximation (applied to both reference and correlation energies), denoted DF-MP2, is reported. The DF-MP2 method is applied to a set of alkanes, conjugated dienes, and noncovalent interaction complexes to compare the computational cost of single-point analytic gradients with MP2 using the resolution-of-the-identity approach (RI-MP2) [F. Weigend and M. Häser, Theor. Chem. Acc. 97, 331 (1997); R. A. Distasio, R. P. Steele, Y. M. Rhee, Y. Shao, and M. Head-Gordon, J. Comput. Chem. 28, 839 (2007)]. In the RI-MP2 method, the DF approach is used only for the correlation energy. Our results demonstrate that the DF-MP2 method substantially accelerates RI-MP2 for analytic gradient computations due to the reduced input/output (I/O) time. Because in DF-MP2 the DF approach is used for both reference and correlation energies, the storage of 4-index electron repulsion integrals (ERIs) is avoided; 3-index ERI tensors are employed instead. Further, as in the case of the integrals, our gradient equation completely avoids construction or storage of the 4-index two-particle density matrix (TPDM); instead we use 2- and 3-index TPDMs. Hence, the I/O bottleneck of a gradient computation is significantly overcome. Therefore, the cost of the generalized-Fock matrix (GFM), the TPDM, the solution of the Z-vector equations, the back transformation of the TPDM, and the integral derivatives is substantially reduced when the DF approach is used for the entire energy expression. Further application results show that the DF approach introduces negligible errors for closed-shell reaction energies and equilibrium bond lengths.
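
    For context, the density-fitting factorization that underlies these savings expresses the 4-index electron repulsion integrals through 3-index intermediates over an auxiliary basis {P, Q}; this is the standard DF construction, not a result specific to the paper.

    ```latex
    % 4-index ERIs assembled from 3-index DF tensors:
    \[
      (pq|rs) \approx \sum_{Q} b_{pq}^{Q}\, b_{rs}^{Q},
      \qquad
      b_{pq}^{Q} = \sum_{P} (pq|P)\,\bigl[\mathbf{J}^{-1/2}\bigr]_{PQ},
      \qquad
      J_{PQ} = (P|Q).
    \]
    % Storing only the 3-index b tensors instead of the 4-index (pq|rs)
    % is what removes the I/O bottleneck described in the abstract.
    ```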

  18. Design, Implementation, and Characterization of a Dedicated Breast Computed Mammo Tomography System for Enhanced Lesion Imaging

    DTIC Science & Technology

    2007-03-01

    common FOV of each system. ... SPECT System: Our current emission tomography system uses a compact 16x20 cm2 field-of-view Cadmium Zinc Telluride (CZT) ... Brzymialkiewicz, M.P. Tornai, R.L. McKinley, J.E. Bowsher, "Evaluation of Fully 3D Emission Mammotomography with a Compact Cadmium Zinc Telluride Detector" ... Stacks of breast tissue equivalent plates, each 2.0 cm thick (CIRS Inc., Norfolk, VA), having either 100% glandular or 100% adipose composition

  19. Predicting vapor liquid equilibria using density functional theory: A case study of argon

    NASA Astrophysics Data System (ADS)

    Goel, Himanshu; Ling, Sanliang; Ellis, Breanna Nicole; Taconi, Anna; Slater, Ben; Rai, Neeraj

    2018-06-01

    Predicting vapor liquid equilibria (VLE) of molecules governed by weak van der Waals (vdW) interactions using the first principles approach is a significant challenge. Due to the poor scaling of the post Hartree-Fock wave function theory with system size/basis functions, the Kohn-Sham density functional theory (DFT) is preferred for systems with a large number of molecules. However, traditional DFT cannot adequately account for medium to long range correlations which are necessary for modeling vdW interactions. Recent developments in DFT such as dispersion corrected models and nonlocal van der Waals functionals have attempted to address this weakness with a varying degree of success. In this work, we predict the VLE of argon and assess the performance of several density functionals and the second order Møller-Plesset perturbation theory (MP2) by determining critical and structural properties via first principles Monte Carlo simulations. PBE-D3, BLYP-D3, and rVV10 functionals were used to compute vapor liquid coexistence curves, while PBE0-D3, M06-2X-D3, and MP2 were used for computing liquid density at a single state point. The performance of the PBE-D3 functional for VLE is superior to other functionals (BLYP-D3 and rVV10). At T = 85 K and P = 1 bar, MP2 performs well for the density and structural features of the first solvation shell in the liquid phase.

  20. Ab Initio Theoretical Studies on the Kinetics of Hydrogen Abstraction Type Reactions of Hydroxyl Radicals with CH3CCl2F and CH3CClF2

    NASA Astrophysics Data System (ADS)

    Saheb, Vahid; Maleki, Samira

    2018-03-01

    The hydrogen abstraction reactions from CH3CCl2F (R-141b) and CH3CClF2 (R-142b) by OH radicals are studied theoretically by semi-classical transition state theory. The stationary points for the reactions are located using the KMLYP density functional method with the 6-311++G(2d,2p) basis set and the MP2 method with the 6-311+G(d,p) basis set. Single-point energy calculations are performed with the CBS-Q and G4 composite methods on the geometries optimized at the KMLYP/6-311++G(2d,2p) level of theory. Vibrational anharmonicity coefficients, x_ij, which are needed for semi-classical transition state theory calculations, are computed at the KMLYP/6-311++G(2d,2p) and MP2/6-311+G(d,p) levels of theory. The computed barrier heights are slightly sensitive to the quantum-chemical method. Thermal rate coefficients are computed over the temperature range from 200 to 2000 K and are shown to be in accordance with available experimental data. On the basis of the computed rate coefficients, the tropospheric lifetimes of CH3CCl2F and CH3CClF2 are estimated to be about 6.5 and 12.0 years, respectively.

  1. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Thorson, L.D.

    A description is given of a new version of the TRUMP (UCRL-14754) computer code, NOTRUMP, which runs on both the CDC-7600 and CRAY-1. There are slight differences in the input and major changes in output capability. A postprocessor, AFTER, is available to manipulate some of the new output features. Old data decks for TRUMP will normally run with only minor changes.

  2. Flux Pinning Enhancement in YBa2Cu3O7-x Films for Coated Conductor Applications (Postprint)

    DTIC Science & Technology

    2010-01-01

    YBa2Cu3O7-x Films for Coated Conductor Applications, Maiorov, B., Civale, L., Lin, Y., Hawley, M.E., Maley, M.P., and Peterson, D.E. ... L., Maiorov, B., Hawley, M.E., Maley, M.P., and Peterson, D.E. (2004) Nat. Mater., 3, 439. 30 Kang, S., Goyal, ... 1864. 47 Civale, L., Maiorov, B., Serquis, A., Willis, J.O., Coulter, J.Y., Wang, H., Jia, Q.X., Arendt, P.N

  3. Arsenate tolerance mechanism of Oenothera odorata from a mine population involves the induction of phytochelatins in roots.

    PubMed

    Kim, Dae-Yeon; Park, Hyun; Lee, Sang-Hwan; Koo, Namin; Kim, Jeong-Gyu

    2009-04-01

    We investigated the arsenate tolerance mechanisms of Oenothera odorata by comparing two populations [i.e., one population from a mine site (MP) and the other from an uncontaminated site (UP)] via exposure to hydroponic solutions containing arsenate (0-50 microM). The MP plants were significantly more tolerant to arsenate than UP plants. The UP plants accumulated more As in their shoots and roots than did the MP plants. The UP plants translocated up to 21 microg g(-1) of As into shoots, whereas MP plants translocated less As (up to 4.5 microg g(-1)) to shoots over all treatments. The results of lipid peroxidation indicated that MP plants were less damaged by oxidative stress than were UP plants. Phytochelatin (PC) content correlated linearly with root As concentration in the MP (i.e., [PCs](root)=1.69x[As](root), r(2)=0.945) and UP (i.e., [PCs](root)=0.89x[As](root), r(2)=0.979) plants. This relationship suggests that an increased PC-to-As ratio may be associated with increased tolerance. Our results suggest that PC induction in roots plays a critical role in the As tolerance of O. odorata.

  4. POMESH - DIFFRACTION ANALYSIS OF REFLECTOR ANTENNAS

    NASA Technical Reports Server (NTRS)

    Hodges, R. E.

    1994-01-01

    POMESH is a computer program capable of predicting the performance of reflector antennas. Both far field pattern and gain calculations are performed using the Physical Optics (PO) approximation of the equivalent surface currents. POMESH is primarily intended for relatively small reflectors. It is useful in situations where the surface is described by irregular data that must be interpolated and for cases where the surface derivatives are not known. This method is flexible and robust and also supports near field calculations. Because of the near field computation ability, this computational engine is quite useful for subreflector computations. The program is constructed in a highly modular form so that it may be readily adapted to perform tasks other than the one explicitly described here. Since the computationally intensive portions of the algorithm are simple loops, the program can be easily adapted to take advantage of vector processor and parallel architectures. In POMESH the reflector is represented as a piecewise planar surface comprised of triangular regions known as facets. A uniform physical optics (PO) current is assumed to exist on each triangular facet. The PO integral on a facet is then approximated by the product of the PO current value at the center and the area of the triangle. In this way, the PO integral over the reflector surface is reduced to a summation of the contributions from the triangular facets. The source horn, or feed, that illuminates the subreflector is approximated by a linear combination of plane patterns. POMESH contains three polarization pattern definitions for the feed: a linear x-polarized element, a linear y-polarized element, and a circularly polarized element. If a more general feed pattern is required, it is a simple matter to replace the subroutine that implements the pattern definitions. POMESH obtains the information necessary to specify the coordinate systems, the location of other data files, and the parameters of the desired calculation from a user-provided data file. A numerical description of the principal plane patterns of the source horn must also be provided. The program is supplied with an analytically defined parabolic reflector surface. However, it is a simple matter to replace it with a user-defined reflector surface. Output is given in the form of a data stream to the terminal; a summary of the parameters used in the computation and some sample results in a file; and a data file of the results of the pattern calculations suitable for plotting. POMESH is written in FORTRAN 77 for execution on CRAY series computers running UNICOS. With minor modifications, it has also been successfully implemented on a Sun4 series computer running SunOS, a DEC VAX series computer running VMS, and an IBM PC series computer running OS/2. It requires 2.5Mb of RAM under SunOS 4.1.1, 2.5Mb of RAM under VMS 5-4.3, and 2.5Mb of RAM under OS/2. The OS/2 version requires the Lahey F77L compiler. The standard distribution medium for this program is one 5.25 inch 360K MS-DOS format diskette. It is also available on a .25 inch streaming magnetic tape cartridge in UNIX tar format and a 9-track 1600 BPI magnetic tape in DEC VAX FILES-11 format. POMESH was developed in 1989 and is a copyrighted work with all copyright vested in NASA. CRAY and UNICOS are registered trademarks of Cray Research, Inc. SunOS and Sun4 are trademarks of Sun Microsystems, Inc. DEC, DEC FILES-11, VAX and VMS are trademarks of Digital Equipment Corporation.
IBM PC and OS/2 are registered trademarks of International Business Machines, Inc. UNIX is a registered trademark of Bell Laboratories.
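
    The facet summation described above (PO integral on a facet ≈ current at the facet center times facet area, summed over facets) can be sketched as follows. This is a scalar, single-polarization toy version that assumes the PO current samples are already known; it is an illustration of the quadrature idea, not the POMESH implementation.

    ```python
    import numpy as np

    def far_field(centroids, areas, currents, k, direction):
        """centroids: (n, 3) facet centers; areas: (n,) facet areas;
        currents: (n,) complex PO current samples at the centroids;
        k: wavenumber; direction: (3,) unit observation vector."""
        # Each facet contributes (current) x (area) x (phase term).
        phase = np.exp(1j * k * centroids @ direction)
        return np.sum(currents * areas * phase)

    def tri_area(v0, v1, v2):
        """Area of the triangle with vertices v0, v1, v2 (3-vectors),
        as used to weight each facet's contribution."""
        return 0.5 * np.linalg.norm(np.cross(v1 - v0, v2 - v0))
    ```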

  5. Model potentials for main group elements Li through Rn

    NASA Astrophysics Data System (ADS)

    Sakai, Yoshiko; Miyoshi, Eisaku; Klobukowski, Mariusz; Huzinaga, Sigeru

    1997-05-01

    Model potential (MP) parameters and valence basis sets were systematically determined for the main group elements Li through Rn. For alkali and alkaline-earth metal atoms, the outermost core (n-1)p electrons were treated explicitly together with the ns valence electrons. For the remaining atoms, only the valence ns and np electrons were treated explicitly. The major relativistic effects at the level of Cowan and Griffin's quasi-relativistic Hartree-Fock method (QRHF) were incorporated in the MPs for all atoms heavier than Kr. The valence orbitals thus obtained have inner nodal structure. The reliability of the MP method was tested in calculations for X-, X, and X+ (X=Br, I, and At) at the SCF level and the results were compared with the corresponding values given by the numerical HF (or QRHF) calculations. Calculations that include electron correlation were done for X-, X, and X+ (X=Cl and Br) at the SDCI level and for As2 at the CASSCF and MRSDCI levels. These results were compared with those of all-electron (AE) calculations using the well-tempered basis sets. Close agreement between the MP and AE results was obtained at all levels of the treatment.

  6. Parallel k-means++

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    A parallelization of the k-means++ seed selection algorithm on three distinct hardware platforms: GPU, multicore CPU, and multithreaded architecture. K-means++ was developed by David Arthur and Sergei Vassilvitskii in 2007 as an extension of the k-means data clustering technique. These algorithms allow people to cluster multidimensional data by attempting to minimize the mean distance of data points within a cluster. K-means++ improved upon traditional k-means by using a more intelligent approach to selecting the initial seeds for the clustering process. While k-means++ has become a popular alternative to traditional k-means clustering, little work has been done to parallelize this technique. We have developed original C++ code for parallelizing the algorithm on three unique hardware architectures: GPU using NVIDIA's CUDA/Thrust framework, multicore CPU using OpenMP, and the Cray XMT multithreaded architecture. By parallelizing the process for these platforms, we are able to perform k-means++ clustering much more quickly than was possible before.
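
    For reference, the serial k-means++ seed selection being parallelized looks like the sketch below (the D² sampling of Arthur and Vassilvitskii). The distance evaluation over all points is the hot loop that the GPU, OpenMP, and Cray XMT versions each parallelize.

    ```python
    import numpy as np

    def kmeanspp_seeds(points, k, rng=None):
        """points: (n, d) array. Returns a (k, d) array of seeds."""
        if rng is None:
            rng = np.random.default_rng()
        seeds = [points[rng.integers(len(points))]]
        for _ in range(k - 1):
            # Squared distance from each point to its nearest seed so far.
            d2 = np.min([np.sum((points - s) ** 2, axis=1) for s in seeds],
                        axis=0)
            # Draw the next seed with probability proportional to d^2.
            seeds.append(points[rng.choice(len(points), p=d2 / d2.sum())])
        return np.array(seeds)
    ```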

  7. Orbital-Optimized MP3 and MP2.5 with Density-Fitting and Cholesky Decomposition Approximations.

    PubMed

    Bozkaya, Uğur

    2016-03-08

    Efficient implementations of the orbital-optimized MP3 and MP2.5 methods with the density-fitting (DF-OMP3 and DF-OMP2.5) and Cholesky decomposition (CD-OMP3 and CD-OMP2.5) approaches are presented. The DF/CD-OMP3 and DF/CD-OMP2.5 methods are applied to a set of alkanes to compare the computational cost with the conventional orbital-optimized MP3 (OMP3) [Bozkaya J. Chem. Phys. 2011, 135, 224103] and the orbital-optimized MP2.5 (OMP2.5) [Bozkaya and Sherrill J. Chem. Phys. 2014, 141, 204105]. Our results demonstrate that the DF-OMP3 and DF-OMP2.5 methods provide considerably lower computational costs than OMP3 and OMP2.5. Further application results show that the orbital-optimized methods are very helpful for the study of open-shell noncovalent interactions, aromatic bond dissociation energies, and hydrogen transfer reactions. We conclude that the DF-OMP3 and DF-OMP2.5 methods are very promising for molecular systems with challenging electronic structures.

  8. Comparison of Implicit Collocation Methods for the Heat Equation

    NASA Technical Reports Server (NTRS)

    Kouatchou, Jules; Jezequel, Fabienne; Zukor, Dorothy (Technical Monitor)

    2001-01-01

    We combine a high-order compact finite difference scheme to approximate spatial derivatives and collocation techniques for the time component to numerically solve the two-dimensional heat equation. We use two approaches to implement the collocation methods. The first one is based on an explicit computation of the coefficients of polynomials and the second one relies on differential quadrature. We compare them by studying their merits and analyzing their numerical performance. All our computations, based on parallel algorithms, are carried out on the CRAY SV1.

  9. Opening Remarks: SciDAC 2007

    NASA Astrophysics Data System (ADS)

    Strayer, Michael

    2007-09-01

    Good morning. Welcome to Boston, the home of the Red Sox, Celtics and Bruins, baked beans, tea parties, Robert Parker, and SciDAC 2007. A year ago I stood before you to share the legacy of the first SciDAC program and identify the challenges that we must address on the road to petascale computing—a road E E Cummings described as `. . . never traveled, gladly beyond any experience.' Today, I want to explore the preparations for the rapidly approaching extreme scale (X-scale) generation. These preparations are the first step propelling us along the road of burgeoning scientific discovery enabled by the application of X-scale computing. We look to petascale computing and beyond to open up a world of discovery that cuts across scientific fields and leads us to a greater understanding of not only our world, but our universe. As part of the President's American Competitiveness Initiative, the ASCR Office has been preparing a ten-year vision for computing. As part of this planning, LBNL, together with ORNL and ANL, hosted three town hall meetings on Simulation and Modeling at the Exascale for Energy, Ecological Sustainability and Global Security (E3). The proposed E3 initiative is organized around four programmatic themes: engaging our top scientists, engineers, computer scientists and applied mathematicians; investing in pioneering large-scale science; developing scalable analysis algorithms and storage architectures to accelerate discovery; and accelerating the build-out and future development of the DOE open computing facilities. It is clear that we have only just started down the path to extreme scale computing. Plan to attend Thursday's session on the out-briefing and discussion of these meetings. The road to the petascale has been at best rocky. In FY07, the continuing resolution provided 12% less money for Advanced Scientific Computing than either the President, the Senate, or the House. As a consequence, many of you had to absorb a no-cost extension for your SciDAC work. I am pleased that the President's FY08 budget restores the funding for SciDAC. Quoting from the Advanced Scientific Computing Research description in the House Energy and Water Development Appropriations Bill for FY08, "Perhaps no other area of research at the Department is so critical to sustaining U.S. leadership in science and technology, revolutionizing the way science is done and improving research productivity." As a society we need to revolutionize our approaches to energy, environmental and global security challenges. As we go forward along the road to the X-scale generation, the use of computation will continue to be a critical tool, along with theory and experiment, in understanding the behavior of the fundamental components of nature as well as for fundamental discovery and exploration of the behavior of complex systems. The foundation to overcome these societal challenges will build from the experiences and knowledge gained as you, members of our SciDAC research teams, work together to attack problems at the tera- and peta-scale. If SciDAC is viewed as an experiment for revolutionizing scientific methodology, then a strategic goal of the ASCR program must be to broaden the intellectual base prepared to address the challenges of the new X-scale generation of computing. We must focus our computational science experiences gained over the past five years on the opportunities introduced with extreme scale computing. Our facilities are on a path to provide the resources needed to undertake the first part of our journey.
Using the newly upgraded 119 teraflop Cray XT system at the Leadership Computing Facility, SciDAC research teams have in three days performed a 100-year study of the time evolution of the atmospheric CO2 concentration originating from the land surface. The simulation of the El Nino/Southern Oscillation which was part of this study has been characterized as `the most impressive new result in ten years.' SciDAC teams have also gained new insight into the behavior of superheated ionic gas in the ITER reactor as a result of an AORSA run on 22,500 processors that achieved over 87 trillion calculations per second (87 teraflops), 74% of the system's theoretical peak. Tomorrow, Argonne and IBM will announce that the first IBM Blue Gene/P, a 100 teraflop system, will be shipped to the Argonne Leadership Computing Facility later this fiscal year. By the end of FY2007, ASCR high performance and leadership computing resources will include the 114 teraflop IBM Blue Gene/P; a 102 teraflop Cray XT4 at NERSC; and a 119 teraflop Cray XT system at Oak Ridge. Before ringing in the New Year, Oak Ridge will upgrade to 250 teraflops with the replacement of the dual-core processors with quad-core processors, Argonne will upgrade to between 250 and 500 teraflops, and next year a petascale Cray Baker system is scheduled for delivery at Oak Ridge. The multidisciplinary teams in our SciDAC Centers for Enabling Technologies and our SciDAC Institutes must continue to work with our Scientific Application teams to overcome the barriers that prevent effective use of these new systems. These challenges include: the need for new algorithms as well as operating system and runtime software and tools which scale to parallel systems composed of hundreds of thousands of processors; program development environments and tools which scale effectively and provide ease of use for developers and scientific end users; and visualization and data management systems that support moving, storing, analyzing, manipulating and visualizing multi-petabytes of scientific data and objects. The SciDAC Centers, located primarily at our DOE national laboratories, will take the lead in ensuring that critical computer science and applied mathematics issues are addressed in a timely and comprehensive fashion and will address issues associated with the research software lifecycle. In contrast, the SciDAC Institutes, which are university-led centers of excellence, will have more flexibility to pursue new research topics through a range of research collaborations. The Institutes will also work to broaden the intellectual and researcher base—conducting short courses and summer schools to take advantage of new high performance computing capabilities. The SciDAC Outreach Center at Lawrence Berkeley National Laboratory complements the outreach efforts of the SciDAC Institutes. The Outreach Center is our clearinghouse for SciDAC activities and resources and will communicate with the high performance computing community in part to understand their needs for workshops, summer schools and institutes. SciDAC is not ASCR's only effort to broaden the computational science community needed to meet the challenges of the new X-scale generation. I hope that you were able to attend the Computational Science Graduate Fellowship poster session last night. ASCR developed the fellowship in 1991 to meet the nation's growing need for scientists and technology professionals with advanced computer skills. CSGF, now jointly funded between ASCR and NNSA, is more than a traditional academic fellowship.
It has provided more than 200 of the best and brightest graduate students with guidance, support and community in preparing them as computational scientists. Today CSGF alumni are bringing their diverse top-level skills and knowledge to research teams at DOE laboratories and in industries such as Procter and Gamble, Lockheed Martin and Intel. At universities they are working to train the next generation of computational scientists. To build on this success, we intend to develop a wholly new Early Career Principal Investigator (ECPI) program. Our objective is to stimulate academic research in scientific areas within ASCR's purview, especially among faculty in the early stages of their academic careers. Last February, we lost Ken Kennedy, one of the leading lights of our community. As we move forward into the extreme computing generation, his vision and insight will be greatly missed. As a memorial to Ken Kennedy, we shall designate the ECPI grants to beginning faculty in Computer Science as the Ken Kennedy Fellowship. Watch the ASCR website for more information about ECPI and other early career programs in the computational sciences. We look to you, our scientists, researchers, and visionaries to take X-scale computing and use it to explode scientific discovery in your fields. We at SciDAC will work to ensure that this tool is the sharpest and most precise and efficient instrument to carve away the unknown and reveal the most exciting secrets and stimulating scientific discoveries of our time. The partnership between research and computing is the marriage that will spur greater discovery, and as Spenser said to Susan in Robert Parker's novel, `Sudden Mischief', `We stick together long enough, and we may get as smart as hell'. Michael Strayer

  10. Parallel peak pruning for scalable SMP contour tree computation

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Carr, Hamish A.; Weber, Gunther H.; Sewell, Christopher M.

    As data sets grow to exascale, automated data analysis and visualisation are increasingly important, to intermediate human understanding and to reduce demands on disk storage via in situ analysis. Trends in the architecture of high performance computing systems necessitate analysis algorithms that make effective use of combinations of massively multicore and distributed systems. One of the principal analytic tools is the contour tree, which analyses relationships between contours to identify features of more than local importance. Unfortunately, the predominant algorithms for computing the contour tree are explicitly serial, and founded on serial metaphors, which has limited the scalability of this form of analysis. While there is some work on distributed contour tree computation, and separately on hybrid GPU-CPU computation, there is no efficient algorithm with strong formal guarantees on performance allied with fast practical performance. In this paper, we report the first shared-memory (SMP) algorithm for fully parallel contour tree computation, with formal guarantees of O(lg n lg t) parallel steps and O(n lg n) work, and implementations with up to 10× parallel speedup in OpenMP and up to 50× speedup in NVIDIA Thrust.
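
    For orientation, the serial baseline such work departs from computes merge (join/split) trees by sweeping vertices in sorted order with a union-find structure, as in the sketch below. This is the classic serial metaphor the abstract refers to, not the paper's parallel peak-pruning algorithm.

    ```python
    # Serial join-tree sketch: sweep vertices from highest to lowest value,
    # uniting each vertex with the components of already-swept (higher)
    # neighbors; every union is a merge event, i.e. an arc of the join tree.
    def join_tree_arcs(values, adjacency):
        """values: dict vertex -> scalar; adjacency: dict vertex -> neighbors.
        Returns a list of (component_root, merge_vertex) arcs."""
        parent = {}

        def find(v):                      # union-find with path compression
            while parent[v] != v:
                parent[v] = parent[parent[v]]
                v = parent[v]
            return v

        arcs = []
        for v in sorted(values, key=values.get, reverse=True):
            parent[v] = v                 # v starts its own component
            for nbr in adjacency[v]:
                if nbr in parent:         # neighbor already swept
                    root = find(nbr)
                    if root != v:
                        parent[root] = v  # two components merge at v
                        arcs.append((root, v))
        return arcs
    ```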

  11. Experiences Using OpenMP Based on Compiler Directed Software DSM on a PC Cluster

    NASA Technical Reports Server (NTRS)

    Hess, Matthias; Jost, Gabriele; Mueller, Matthias; Ruehle, Roland

    2003-01-01

    In this work we report on our experiences running OpenMP programs on a commodity cluster of PCs running a software distributed shared memory (DSM) system. We describe our test environment and report on the performance of a subset of the NAS Parallel Benchmarks that have been automatically parallelized for OpenMP. We compare the performance of the OpenMP implementations with that of their message passing counterparts and discuss performance differences.

  12. Automatic Generation of OpenMP Directives and Its Application to Computational Fluid Dynamics Codes

    NASA Technical Reports Server (NTRS)

    Yan, Jerry; Jin, Haoqiang; Frumkin, Michael; Yan, Jerry (Technical Monitor)

    2000-01-01

    The shared-memory programming model is a very effective way to achieve parallelism on shared memory parallel computers. As great progress has been made in hardware and software technologies, the performance of parallel programs using compiler directives has improved substantially. The introduction of OpenMP directives, the industry standard for shared-memory programming, has minimized the issue of portability. In this study, we have extended CAPTools, a computer-aided parallelization toolkit, to automatically generate OpenMP-based parallel programs with nominal user assistance. We outline techniques used in the implementation of the tool and discuss the application of this tool to the NAS Parallel Benchmarks and several computational fluid dynamics codes. This work demonstrates the great potential of using the tool to quickly port parallel programs and to achieve good performance that exceeds that of some commercial tools.

  13. A heterogeneous computing environment for simulating astrophysical fluid flows

    NASA Technical Reports Server (NTRS)

    Cazes, J.

    1994-01-01

    In the Concurrent Computing Laboratory in the Department of Physics and Astronomy at Louisiana State University we have constructed a heterogeneous computing environment that permits us to routinely simulate complicated three-dimensional fluid flows and to readily visualize the results of each simulation via three-dimensional animation sequences. An 8192-node MasPar MP-1 computer with 0.5 GBytes of RAM provides 250 MFlops of execution speed for our fluid flow simulations. Utilizing the Parallel Virtual Machine (PVM) library, data is automatically transferred at periodic intervals from the MP-1 to a cluster of workstations, where individual three-dimensional images are rendered for inclusion in a single animation sequence. Work is underway to replace executions on the MP-1 with simulations performed on the 512-node CM-5 at NCSA and to simultaneously gain access to more potent volume rendering workstations.

  14. Effects of the bilayer nano-hydroxyapatite/mineralized collagen-guided bone regeneration membrane on site preservation in dogs.

    PubMed

    Sun, Yi; Wang, Chengyue; Chen, Qixin; Liu, Hai; Deng, Chao; Ling, Peixue; Cui, Fu-Zhai

    2017-08-01

    This study was aimed at assessing the effects of a porous mineralized collagen plug, with or without a bilayer mineralized collagen-guided bone regeneration membrane, on alveolar ridge preservation in dogs. The third premolars in the bilateral maxilla of mongrel dogs (N = 12) were extracted. Twenty-four alveolar sockets were thus randomly divided into three groups: membrane + collagen plug (MP, n = 8), non-membrane + collagen plug (NP, n = 8), and a blank group without any implantation (BG, n = 8). Radiographic assessment was carried out immediately and in the 2nd, 6th, and 12th weeks after surgery. The bone-repairing effects of the two grafts were evaluated by clinical observation, X-ray micro-computed tomography examination, and histological analysis in the 8th and 12th weeks after surgery. All three groups presented excellent osseointegration without any inflammation or dehiscence. X-ray micro-computed tomography and histological assessment indicated that the ratios of new bone formation in the MP group were significantly higher than those in the NP and BG groups in the 8th and 12th weeks after surgery (P < 0.05). As a result, the porous mineralized collagen plug, with or without the bilayer membrane, could reduce the resorption of the alveolar ridge compared to the BG group, and the combined use of the porous mineralized collagen plug and bilayer mineralized collagen-guided bone regeneration membrane could further improve the activity of bone regeneration.

  15. Electronic and optical response of Cr-doped MoSe2 and WSe2: Compton measurements and first-principles strategies

    NASA Astrophysics Data System (ADS)

    Kumar, Kishor; Heda, N. L.; Jani, A. R.; Ahuja, B. L.

    2017-08-01

    In this paper, we present energy bands, density of states and Mulliken population (MP) data computed using the linear combination of atomic orbitals (LCAO) method. To compare with the theoretical momentum densities, we have also employed a 100 mCi 241Am Compton spectrometer to measure the Compton profiles of Cr0.5X0.5Se2 (X = Mo and W). The experimental Compton data have been used to check the performance of various exchange and correlation energies for the present mixed dichalcogenides within the LCAO scheme. It is seen that CPs based on the hybridization of Hartree-Fock and density functional theory give better agreement with the experimental data than the other schemes employed in the present investigations. All theoretical approximations show an indirect band gap between the Γ and K points of the Brillouin zone. Further, equal-valence-electron-density scaled experimental data predict a more ionic character in Cr0.5W0.5Se2 than in Cr0.5Mo0.5Se2, which is in tune with our MP data. Going beyond the computation of electronic properties using LCAO, we also report accurate electronic and optical properties computed using the modified Becke-Johnson (mBJ) potential within the full-potential augmented plane wave (FP-LAPW) method. Optical properties computed using the FP-LAPW-mBJ method show the feasibility of using both mixed dichalcogenides in photovoltaic devices.

  16. Assessment of Orbital-Optimized MP2.5 for Thermochemistry and Kinetics: Dramatic Failures of Standard Perturbation Theory Approaches for Aromatic Bond Dissociation Energies and Barrier Heights of Radical Reactions.

    PubMed

    Soydaş, Emine; Bozkaya, Uğur

    2015-04-14

    An assessment of orbital-optimized MP2.5 (OMP2.5) [Bozkaya, U.; Sherrill, C. D. J. Chem. Phys. 2014, 141, 204105] for thermochemistry and kinetics is presented. The OMP2.5 method is applied to closed- and open-shell reaction energies, barrier heights, and aromatic bond dissociation energies. The performance of OMP2.5 is compared with that of the MP2, OMP2, MP2.5, MP3, OMP3, CCSD, and CCSD(T) methods. For most of the test sets, the OMP2.5 method performs better than MP2.5 and CCSD, and provides accurate results. For barrier heights of radical reactions and aromatic bond dissociation energies, the OMP2.5-MP2.5, OMP2-MP2, and OMP3-MP3 differences become obvious. Especially for aromatic bond dissociation energies, standard perturbation theory (MP) approaches dramatically fail, providing mean absolute errors (MAEs) of 22.5 (MP2), 17.7 (MP2.5), and 12.8 (MP3) kcal mol(-1), while the MAE values of the orbital-optimized counterparts are 2.7, 2.4, and 2.4 kcal mol(-1), respectively. Hence, there are five- to eight-fold reductions in errors when optimized orbitals are employed. Our results demonstrate that standard MP approaches dramatically fail when the reference wave function suffers from the spin-contamination problem. On the other hand, the OMP2.5 method can reduce spin contamination in the unrestricted Hartree-Fock (UHF) initial guess orbitals. For overall evaluation, we conclude that the OMP2.5 method is very helpful not only for challenging open-shell systems and transition states but also for closed-shell molecules. Hence, one may prefer OMP2.5 over MP2.5 and CCSD as an O(N(6)) method, where N is the number of basis functions, for thermochemistry and kinetics. The cost of the OMP2.5 method is comparable with that of CCSD for energy computations. However, for analytic gradient computations, the OMP2.5 method is only half as expensive as CCSD.

  17. Competition of hydrogen bonds and halogen bonds in complexes of hypohalous acids with nitrogenated bases.

    PubMed

    Alkorta, Ibon; Blanco, Fernando; Solimannejad, Mohammad; Elguero, Jose

    2008-10-30

    A theoretical study of the complexes formed by hypohalous acids (HOX, X = F, Cl, Br, I, and At) with three nitrogenated bases (NH3, N2, and NCH) has been carried out by means of ab initio methods, up to the MP2/aug-cc-pVTZ level. In general, two minimum-energy complexes are found, one with an OH...N hydrogen bond and the other with an X...N halogen bond. While the first is more stable for the smallest halogen derivatives, the two complexes present similar stabilities in the iodine case, and the halogen-bonded structure is the most stable one for the hypoastatous acid complexes.

  18. Three-dimensional multigrid Navier-Stokes computations for turbomachinery applications

    NASA Astrophysics Data System (ADS)

    Subramanian, S. V.

    1989-07-01

    The fully three-dimensional, time-dependent, compressible Navier-Stokes equations in cylindrical coordinates are used, in conjunction with a multistage Runge-Kutta numerical integration scheme for the solution of the governing flow equations, to simulate complex flowfields within turbomachinery components, capturing the effects of viscosity, compressibility, blade rotation, and tip clearance. Computed results are presented for selected cascades, emphasizing the code's capabilities in the accurate prediction of such features as airfoil loadings, exit flow angles, shocks, and secondary flows. Computations for several test cases have been performed on a Cray Y-MP, using nearly 90,000 grid points.
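
    The multistage Runge-Kutta time stepping mentioned above advances the solution by repeatedly blending freshly evaluated residuals into the stage solution. A minimal sketch with illustrative Jameson-style coefficients follows; the paper's exact scheme and coefficients are not stated in the abstract.

    ```python
    def multistage_rk_step(u, residual, dt, alphas=(0.25, 1/3, 0.5, 1.0)):
        """One multistage step for du/dt = R(u).
        u: solution array; residual: callable returning R(u); dt: time step.
        alphas: stage coefficients (illustrative values, assumed here)."""
        u_stage = u
        for alpha in alphas:
            # Each stage restarts from u^n and blends in the latest residual.
            u_stage = u + alpha * dt * residual(u_stage)
        return u_stage
    ```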

  19. 1993 Gordon Bell Prize Winners

    NASA Technical Reports Server (NTRS)

    Karp, Alan H.; Simon, Horst; Heller, Don; Cooper, D. M. (Technical Monitor)

    1994-01-01

    The Gordon Bell Prize recognizes significant achievements in the application of supercomputers to scientific and engineering problems. In 1993, finalists were named for work in three categories: (1) Performance, which recognizes those who solved a real problem in the quickest elapsed time. (2) Price/performance, which encourages the development of cost-effective supercomputing. (3) Compiler-generated speedup, which measures how well compiler writers are facilitating the programming of parallel processors. The winners were announced November 17 at the Supercomputing 93 conference in Portland, Oregon. Gordon Bell, an independent consultant in Los Altos, California, is sponsoring $2,000 in prizes each year for 10 years to promote practical parallel processing research. This is the sixth year of the prize, which Computer administers. Something unprecedented in Gordon Bell Prize competition occurred this year: A computer manufacturer was singled out for recognition. Nine entries reporting results obtained on the Cray C90 were received, seven of the submissions orchestrated by Cray Research. Although none of these entries showed sufficiently high performance to win outright, the judges were impressed by the breadth of applications that ran well on this machine, all nine running at more than a third of the peak performance of the machine.

  20. Vectorization of a particle simulation method for hypersonic rarefied flow

    NASA Technical Reports Server (NTRS)

    Mcdonald, Jeffrey D.; Baganoff, Donald

    1988-01-01

    An efficient particle simulation technique for hypersonic rarefied flows is presented at an algorithmic and implementation level. The implementation is for a vector computer architecture, specifically the Cray-2. The method models an ideal diatomic Maxwell molecule with three translational and two rotational degrees of freedom. Algorithms are designed specifically for compatibility with fine-grain parallelism by reducing the number of data dependencies in the computation. By insisting on this compatibility, the method is capable of performing simulation on a much larger scale than previously possible. A two-dimensional simulation of supersonic flow over a wedge is carried out for the near-continuum limit, where the gas is in equilibrium and the ideal solution can be used as a check on the accuracy of the gas model employed in the method. Also, a three-dimensional, Mach 8, rarefied flow about a finite-span flat plate at a 45 degree angle of attack was simulated. It utilized over 10^7 particles carried through 400 discrete time steps in less than one hour of Cray-2 CPU time. This problem was chosen to exhibit the capability of the method in handling a large number of particles and a true three-dimensional geometry.

  1. Experiences Using OpenMP Based on Compiler Directed Software DSM on a PC Cluster

    NASA Technical Reports Server (NTRS)

    Hess, Matthias; Jost, Gabriele; Mueller, Matthias; Ruehle, Roland; Biegel, Bryan (Technical Monitor)

    2002-01-01

    In this work we report on our experiences running OpenMP programs on a commodity cluster of PCs (personal computers) running a software distributed shared memory (DSM) system. We describe our test environment and report on the performance of a subset of the NAS (NASA Advanced Supercomputing) Parallel Benchmarks that have been automatically parallelized for OpenMP. We compare the performance of the OpenMP implementations with that of their message passing counterparts and discuss performance differences.

  2. Assessment of Orbital-Optimized Third-Order Møller-Plesset Perturbation Theory and Its Spin-Component and Spin-Opposite Scaled Variants for Thermochemistry and Kinetics.

    PubMed

    Soydaş, Emine; Bozkaya, Uğur

    2013-03-12

    An assessment of the OMP3 method and its spin-component and spin-opposite scaled variants for thermochemistry and kinetics is presented. For reaction energies of closed-shell systems, the CCSD, SCS-MP3, and SCS-OMP3 methods show better performances than the other considered methods, and no significant improvement is observed due to orbital optimization. For barrier heights, OMP3 and SCS-OMP3 provide the lowest mean absolute deviations. The MP3 method yields considerably higher errors, and the spin-scaling approaches do not help to improve upon MP3, but worsen it. For radical stabilization energies, the CCSD, OMP3, and SCS-OMP3 methods exhibit noticeably better performances than MP3 and its variants. Our results demonstrate that if the reference wave function suffers from spin contamination, then the MP3 methods dramatically fail. On the other hand, the OMP3 method and its variants can tolerate spin contamination in the reference wave function. For overall evaluation, we conclude that OMP3 is quite helpful, especially in electronically challenging systems, such as free radicals or transition states, where spin contamination dramatically deteriorates the quality of the canonical MP3 and SCS-MP3 methods. Both the OMP3 and CCSD methods scale as n(6), where n is the number of basis functions. However, the OMP3 method generally converges in far fewer iterations than CCSD. In practice, OMP3 is several times faster than CCSD in energy computations. Further, the stationary properties of OMP3 make it much more favorable than CCSD in the evaluation of analytic derivatives. For OMP3, analytic gradient computations are much less expensive than for CCSD. For frequency computations, both methods require the evaluation of the perturbed amplitudes and orbitals. However, in the OMP3 case there are still significant computational time savings due to simplifications in the analytic Hessian expression owing to the stationary property of OMP3. Hence, the OMP3 method emerges as a very useful tool for computational quantum chemistry.

  3. Thrust chamber performance using Navier-Stokes solution. [space shuttle main engine viscous nozzle calculation]

    NASA Technical Reports Server (NTRS)

    Chan, J. S.; Freeman, J. A.

    1984-01-01

    The viscous, axisymmetric flow in the thrust chamber of the space shuttle main engine (SSME) was computed on the CRAY 205 computer using the general interpolants method (GIM) code. Results show that Navier-Stokes codes can be used for these flows to study trends and viscous effects as well as to determine flow patterns, but further research and development is needed before they can be used as production tools for nozzle performance calculations. The GIM formulation, numerical scheme, and computer code are described. The actual SSME nozzle computation, showing grid points, flow contours, and flow parameter plots, is discussed. The computer system and run times/costs are detailed.

  4. Aerodynamic optimization studies on advanced architecture computers

    NASA Technical Reports Server (NTRS)

    Chawla, Kalpana

    1995-01-01

    The approach to carrying out multi-discipline aerospace design studies in the future, especially in massively parallel computing environments, comprises choosing (1) suitable solvers to compute solutions to the equations characterizing a discipline, and (2) efficient optimization methods. In addition, for aerodynamic optimization problems, (3) smart methodologies must be selected to modify the surface shape. In this research effort, a 'direct' optimization method is implemented on the Cray C90 to improve aerodynamic design. It is coupled with an existing implicit Navier-Stokes solver, OVERFLOW, to compute flow solutions. The optimization method is chosen such that it can accommodate multi-discipline optimization in future computations. In this work, however, only single-discipline aerodynamic optimization is included.

  5. Parallelization of interpolation, solar radiation and water flow simulation modules in GRASS GIS using OpenMP

    NASA Astrophysics Data System (ADS)

    Hofierka, Jaroslav; Lacko, Michal; Zubal, Stanislav

    2017-10-01

    In this paper, we describe the parallelization of three complex and computationally intensive modules of GRASS GIS using the OpenMP application programming interface for multi-core computers. These include the v.surf.rst module for spatial interpolation, the r.sun module for solar radiation modeling and the r.sim.water module for water flow simulation. We briefly describe the functionality of the modules and parallelization approaches used in the modules. Our approach includes the analysis of the module's functionality, identification of source code segments suitable for parallelization and proper application of OpenMP parallelization code to create efficient threads processing the subtasks. We document the efficiency of the solutions using the airborne laser scanning data representing land surface in the test area and derived high-resolution digital terrain model grids. We discuss the performance speed-up and parallelization efficiency depending on the number of processor threads. The study showed a substantial increase in computation speeds on a standard multi-core computer while maintaining the accuracy of results in comparison to the output from original modules. The presented parallelization approach showed the simplicity and efficiency of the parallelization of open-source GRASS GIS modules using OpenMP, leading to an increased performance of this geospatial software on standard multi-core computers.

  6. OPAL: An Open-Source MPI-IO Library over Cray XT

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Yu, Weikuan; Vetter, Jeffrey S; Canon, Richard Shane

    Parallel IO over the Cray XT is supported by a vendor-supplied MPI-IO package. This package contains a proprietary ADIO implementation built on top of the sysio library. While it is reasonable to maintain a stable code base for application scientists' convenience, it is also very important for system developers and researchers to analyze and assess the effectiveness of parallel IO software and, accordingly, tune and optimize the MPI-IO implementation. A proprietary parallel IO code base precludes such flexibility. On the other hand, a generic UFS-based MPI-IO implementation is typically used on many Linux-based platforms. We have developed an open-source MPI-IO package over Lustre, referred to as OPAL (OPportunistic and Adaptive MPI-IO Library over Lustre). OPAL provides a single source-code base for MPI-IO over Lustre on Cray XT and Linux platforms. Compared to the Cray implementation, OPAL provides a number of good features, including arbitrary specification of striping patterns and Lustre-stripe-aligned file domain partitioning. This paper presents performance comparisons between OPAL and Cray's proprietary implementation. Our evaluation demonstrates that OPAL achieves performance comparable to the Cray implementation. We also exemplify the benefits of an open-source package in revealing the underpinnings of parallel IO performance.
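
    The interface that OPAL implements beneath applications is the standard MPI-IO API. A minimal sketch of the collective-write pattern such a library serves, expressed here with mpi4py for brevity (the file name and block size are arbitrary; OPAL itself is C code inside the MPI-IO/ADIO stack):

    ```python
    from mpi4py import MPI
    import numpy as np

    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()

    # Each rank prepares its own contiguous block of the shared file.
    block = np.full(1024, rank, dtype=np.float64)
    fh = MPI.File.Open(comm, "output.dat",
                       MPI.MODE_CREATE | MPI.MODE_WRONLY)
    offset = rank * block.nbytes          # disjoint file regions per rank
    fh.Write_at_all(offset, block)        # collective MPI-IO write
    fh.Close()
    ```

    How such collective writes are mapped onto Lustre stripes (the file-domain partitioning mentioned above) is exactly the layer OPAL opens up for inspection and tuning.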

  7. From the X-rays to a reliable “low cost” computational structure of caffeic acid: DFT, MP2, HF and integrated molecular dynamics-X-ray diffraction approach to condensed phases

    NASA Astrophysics Data System (ADS)

    Lombardo, Giuseppe M.; Portalone, Gustavo; Colapietro, Marcello; Rescifina, Antonio; Punzo, Francesco

    2011-05-01

    The ability of caffeic acid to act as an antioxidant against hyperoxo-radicals, as well as its recently found therapeutic properties in the treatment of hepatocarcinoma, still makes this compound, more than 20 years after the refinement of its crystal structure, an object of study. It belongs to the vast family of humic substances, which play a key role in biodegradation processes and easily form complexes with ions widely diffused in the environment. This class of compounds is therefore interesting for potential environmental chemistry applications concerning the possible complexation of heavy metals. Our study focused on the characterization of caffeic acid as a necessary starting step, to be followed in the future by application of our findings to the study of the interaction of the caffeate anion with heavy metal ions. To reach this goal, we applied a low-cost approach - in terms of computational time and resources - aimed at achieving a high-resolution, robust and trustworthy structure, using single-crystal X-ray data, re-collected at higher resolution, as a touchstone for a detailed check. A comparison was performed between calculations carried out with density functional theory (DFT), the Hartree-Fock (HF) method and the post-SCF second-order Møller-Plesset perturbation method (MP2), at the 6-31G** level of theory, molecular mechanics (MM) and molecular dynamics (MD). As a consequence, we explain on the one hand the possible reasons for the pitfalls of the DFT approach, and on the other the benefits of using a good and robust force field developed for condensed phases, such as AMBER, with MM and MD. The reliability of the latter, highlighted by an overall agreement that extends up to the anisotropic displacement parameters calculated by means of MD and those obtained from X-ray measurements, makes it very promising for the above-mentioned goals.

  8. Synthesis and characterization of a series of nickel( ii ) alkoxide precursors and their utility for Ni(0) nanoparticle production

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Treadwell, LaRico J.; Boyle, Timothy J.; Phelan, W. Adam

    2017-02-22

    We synthesized a series of nickel(II) aryloxide ([Ni(OAr)2(py)x]) precursors from an amide-alcohol exchange using [Ni(NR2)2] in the presence of pyridine (py). The H-OAr selected were the mono- and di-ortho-substituted 2-alkyl phenols: alkyl = methyl (H-oMP), iso-propyl (H-oPP), tert-butyl (H-oBP), and the 2,6-di-alkyl phenols: alkyl = di-iso-propyl (H-DIP), di-tert-butyl (H-DBP), di-phenyl (H-DPhP). The crystalline products were solved as solvated monomers and structurally characterized as [Ni(OAr)2(py)x], where x = 4: OAr = oMP (1), oPP (2); x = 3: OAr = oBP (3), DIP (4); x = 2: OAr = DBP (5), DPhP (6). The excited states (singlet or triplet) and various geometries of 1-6 were identified by experimental UV-vis and verified by computational modeling. The magnetic susceptibility of the representative compound 4 was fit to a Curie-Weiss model that yielded a magnetic moment of 4.38(3) μB, consistent with a Ni2+ center. Compounds 1 and 6 were selected for decomposition studies under solution precipitation routes since they represent the two extremes of coordination. The particle size and crystalline structure were characterized using transmission electron microscopy (TEM) and powder X-ray diffraction (PXRD). Finally, the materials isolated from 1 and 6 were found by TEM to form irregularly shaped nanomaterials (8-15 nm), which by PXRD were found to be Ni(0) hcp (PDF: 01-089-7129) and fcc (PDF: 01-070-0989), respectively.

  9. Numerical Methods for 2-Dimensional Modeling

    DTIC Science & Technology

    1980-12-01

    high-order finite element methods, and a multidimensional version of the method of lines, both utilizing an optimized stiff integrator for the time... integration. The finite element methods have proved disappointing, but the method of lines has provided an unexpectedly large gain in speed. Two... diffusion problems with the same number of unknowns (a 21 x 41 grid), solved by second-order finite element methods, took over seven minutes on the Cray-1

  10. Vectorization of a particle code used in the simulation of rarefied hypersonic flow

    NASA Technical Reports Server (NTRS)

    Baganoff, D.

    1990-01-01

    A limitation of the direct simulation Monte Carlo (DSMC) method is that it does not allow efficient use of the vector architectures that predominate in current supercomputers. Consequently, the problems that can be handled are limited to those of one- and two-dimensional flows. This work focuses on a reformulation of the DSMC method with the objective of designing a procedure that is optimized for the vector architectures found on machines such as the Cray-2. In addition, it focuses on finding a better balance between algorithmic complexity and the total number of particles employed in a simulation so that the overall performance of a particle simulation scheme can be greatly improved. Simulations of the flow about a 3D blunt body are performed with 10^7 particles and 4 x 10^5 mesh cells. Good statistics are obtained with time averaging over 800 time steps using 4.5 h of Cray-2 single-processor CPU time.

  11. A study of workstation computational performance for real-time flight simulation

    NASA Technical Reports Server (NTRS)

    Maddalon, Jeffrey M.; Cleveland, Jeff I., II

    1995-01-01

    With recent advances in microprocessor technology, some have suggested that modern workstations provide enough computational power to properly operate a real-time simulation. This paper presents the results of a computational benchmark, based on actual real-time flight simulation code used at Langley Research Center, which was executed on various workstation-class machines. The benchmark was executed on different machines from several companies including: CONVEX Computer Corporation, Cray Research, Digital Equipment Corporation, Hewlett-Packard, Intel, International Business Machines, Silicon Graphics, and Sun Microsystems. The machines are compared by their execution speed, computational accuracy, and porting effort. The results of this study show that the raw computational power needed for real-time simulation is now offered by workstations.

  12. A Faster Parallel Algorithm and Efficient Multithreaded Implementations for Evaluating Betweenness Centrality on Massive Datasets

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Madduri, Kamesh; Ediger, David; Jiang, Karl

    2009-02-15

    We present a new lock-free parallel algorithm for computing betweenness centrality of massive small-world networks. With minor changes to the data structures, our algorithm also achieves better spatial cache locality compared to previous approaches. Betweenness centrality is a key algorithm kernel in HPCS SSCA#2, a benchmark extensively used to evaluate the performance of emerging high-performance computing architectures for graph-theoretic computations. We design optimized implementations of betweenness centrality and the SSCA#2 benchmark for two hardware multithreaded systems: a Cray XMT system with the Threadstorm processor, and a single-socket Sun multicore server with the UltraSPARC T2 processor. For a small-world network of 134 million vertices and 1.073 billion edges, the 16-processor XMT system and the 8-core Sun Fire T5120 server achieve TEPS scores (an algorithmic performance count for the SSCA#2 benchmark) of 160 million and 90 million respectively, which corresponds to more than a 2X performance improvement over the previous parallel implementations. To better characterize the performance of these multithreaded systems, we correlate the SSCA#2 performance results with data from the memory-intensive STREAM and RandomAccess benchmarks. Finally, we demonstrate the applicability of our implementation to analyze massive real-world datasets by computing approximate betweenness centrality for a large-scale IMDb movie-actor network.
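
    For reference, the sequential baseline that such parallel algorithms build on is Brandes' algorithm. A compact C sketch for an unweighted, undirected graph in CSR form (both directions of each edge stored) is given below; this is the textbook formulation, not the authors' lock-free parallel variant.

      #include <stdlib.h>
      #include <string.h>

      /* Brandes' algorithm for betweenness centrality of an unweighted
         graph in CSR form (row[], col[]); bc[] receives the scores. */
      void brandes(int n, const int *row, const int *col, double *bc)
      {
          int    *queue = malloc(n * sizeof(int));
          int    *order = malloc(n * sizeof(int));
          int    *dist  = malloc(n * sizeof(int));
          double *sigma = malloc(n * sizeof(double));
          double *delta = malloc(n * sizeof(double));

          memset(bc, 0, n * sizeof(double));
          for (int s = 0; s < n; s++) {
              for (int i = 0; i < n; i++) { dist[i] = -1; sigma[i] = 0.0; }
              dist[s] = 0; sigma[s] = 1.0;
              int head = 0, tail = 0, cnt = 0;
              queue[tail++] = s;
              while (head < tail) {                 /* BFS from source s */
                  int v = queue[head++];
                  order[cnt++] = v;
                  for (int e = row[v]; e < row[v + 1]; e++) {
                      int w = col[e];
                      if (dist[w] < 0) { dist[w] = dist[v] + 1; queue[tail++] = w; }
                      if (dist[w] == dist[v] + 1) sigma[w] += sigma[v];
                  }
              }
              memset(delta, 0, n * sizeof(double));
              for (int i = cnt - 1; i > 0; i--) {   /* dependency accumulation */
                  int w = order[i];
                  for (int e = row[w]; e < row[w + 1]; e++) {
                      int v = col[e];
                      if (dist[v] == dist[w] - 1)
                          delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w]);
                  }
                  bc[w] += delta[w];
              }
          }
          free(queue); free(order); free(dist); free(sigma); free(delta);
      }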

  13. A Faster Parallel Algorithm and Efficient Multithreaded Implementations for Evaluating Betweenness Centrality on Massive Datasets

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Madduri, Kamesh; Ediger, David; Jiang, Karl

    2009-05-29

    We present a new lock-free parallel algorithm for computing betweenness centrality of massive small-world networks. With minor changes to the data structures, our algorithm also achieves better spatial cache locality compared to previous approaches. Betweenness centrality is a key algorithm kernel in the HPCS SSCA#2 Graph Analysis benchmark, which has been extensively used to evaluate the performance of emerging high-performance computing architectures for graph-theoretic computations. We design optimized implementations of betweenness centrality and the SSCA#2 benchmark for two hardware multithreaded systems: a Cray XMT system with the ThreadStorm processor, and a single-socket Sun multicore server with the UltraSparc T2 processor. For a small-world network of 134 million vertices and 1.073 billion edges, the 16-processor XMT system and the 8-core Sun Fire T5120 server achieve TEPS scores (an algorithmic performance count for the SSCA#2 benchmark) of 160 million and 90 million respectively, which corresponds to more than a 2X performance improvement over the previous parallel implementations. To better characterize the performance of these multithreaded systems, we correlate the SSCA#2 performance results with data from the memory-intensive STREAM and RandomAccess benchmarks. Finally, we demonstrate the applicability of our implementation to analyze massive real-world datasets by computing approximate betweenness centrality for a large-scale IMDb movie-actor network.

  14. Do Some X-ray Stars Have White Dwarf Companions?

    NASA Technical Reports Server (NTRS)

    McCollum, Bruce

    1995-01-01

    Some Be stars which are intermittent X-ray sources may have white dwarf companions rather than neutron stars. It is not possible to prove or rule out the existence of Be+WD systems using X-ray or optical data. However, the presence of a white dwarf could be established by the detection of its EUV continuum shortward of the Be star's continuum turnover at 1000 Å. Either the detection or the nondetection of Be+WD systems would have implications for models of Be star variability, models of Be binary system formation and evolution, and models of wind-fed accretion.

  15. Improving the accuracy of Møller-Plesset perturbation theory with neural networks

    NASA Astrophysics Data System (ADS)

    McGibbon, Robert T.; Taube, Andrew G.; Donchev, Alexander G.; Siva, Karthik; Hernández, Felipe; Hargus, Cory; Law, Ka-Hei; Klepeis, John L.; Shaw, David E.

    2017-10-01

    Noncovalent interactions are of fundamental importance across the disciplines of chemistry, materials science, and biology. Quantum chemical calculations on noncovalently bound complexes, which allow for the quantification of properties such as binding energies and geometries, play an essential role in advancing our understanding of, and building models for, a vast array of complex processes involving molecular association or self-assembly. Because of its relatively modest computational cost, second-order Møller-Plesset perturbation (MP2) theory is one of the most widely used methods in quantum chemistry for studying noncovalent interactions. MP2 is, however, plagued by serious errors due to its incomplete treatment of electron correlation, especially when modeling van der Waals interactions and π-stacked complexes. Here we present spin-network-scaled MP2 (SNS-MP2), a new semi-empirical MP2-based method for dimer interaction-energy calculations. To correct for errors in MP2, SNS-MP2 uses quantum chemical features of the complex under study in conjunction with a neural network to reweight terms appearing in the total MP2 interaction energy. The method has been trained on a new data set consisting of over 200 000 complete basis set (CBS)-extrapolated coupled-cluster interaction energies, which are considered the gold standard for chemical accuracy. SNS-MP2 predicts gold-standard binding energies of unseen test compounds with a mean absolute error of 0.04 kcal mol-1 (root-mean-square error 0.09 kcal mol-1), a 6- to 7-fold improvement over MP2. To the best of our knowledge, its accuracy exceeds that of all extant density functional theory- and wavefunction-based methods of similar computational cost, and is very close to the intrinsic accuracy of our benchmark coupled-cluster methodology itself. Furthermore, SNS-MP2 provides reliable per-conformation confidence intervals on the predicted interaction energies, a feature not available from any alternative method.

  16. Improving the accuracy of Møller-Plesset perturbation theory with neural networks.

    PubMed

    McGibbon, Robert T; Taube, Andrew G; Donchev, Alexander G; Siva, Karthik; Hernández, Felipe; Hargus, Cory; Law, Ka-Hei; Klepeis, John L; Shaw, David E

    2017-10-28

    Noncovalent interactions are of fundamental importance across the disciplines of chemistry, materials science, and biology. Quantum chemical calculations on noncovalently bound complexes, which allow for the quantification of properties such as binding energies and geometries, play an essential role in advancing our understanding of, and building models for, a vast array of complex processes involving molecular association or self-assembly. Because of its relatively modest computational cost, second-order Møller-Plesset perturbation (MP2) theory is one of the most widely used methods in quantum chemistry for studying noncovalent interactions. MP2 is, however, plagued by serious errors due to its incomplete treatment of electron correlation, especially when modeling van der Waals interactions and π-stacked complexes. Here we present spin-network-scaled MP2 (SNS-MP2), a new semi-empirical MP2-based method for dimer interaction-energy calculations. To correct for errors in MP2, SNS-MP2 uses quantum chemical features of the complex under study in conjunction with a neural network to reweight terms appearing in the total MP2 interaction energy. The method has been trained on a new data set consisting of over 200 000 complete basis set (CBS)-extrapolated coupled-cluster interaction energies, which are considered the gold standard for chemical accuracy. SNS-MP2 predicts gold-standard binding energies of unseen test compounds with a mean absolute error of 0.04 kcal mol-1 (root-mean-square error 0.09 kcal mol-1), a 6- to 7-fold improvement over MP2. To the best of our knowledge, its accuracy exceeds that of all extant density functional theory- and wavefunction-based methods of similar computational cost, and is very close to the intrinsic accuracy of our benchmark coupled-cluster methodology itself. Furthermore, SNS-MP2 provides reliable per-conformation confidence intervals on the predicted interaction energies, a feature not available from any alternative method.

  17. Airloads on Bluff Bodies, with Application to the Rotor-Induced Downloads on Tilt-Rotor Aircraft.

    DTIC Science & Technology

    1983-09-01

    interference aerodynamics ... on hover performance (Ref. 11) ... to study the two-dimensional section characteristics of a wing in the wake of a ... resources for large numbers of vortices; a typical case requires 10-15 min CPU time on the Ames Cray 1S computer. Figure 6 shows a typical result. ... CPU time per case on a Prime 550 computer to converge to a steady solution; this would be equivalent to one or two seconds on

  18. LAMPS software

    NASA Technical Reports Server (NTRS)

    Perkey, D. J.; Kreitzberg, C. W.

    1984-01-01

    The dynamic prediction model, along with its macro-processor capability and data flow system, from the Drexel Limited-Area and Mesoscale Prediction System (LAMPS) was converted and recoded for the Perkin-Elmer 3220. The previous version of this model was written for the Control Data Corporation 7600 and CRAY-1A computer environment which existed until recently at the National Center for Atmospheric Research (NCAR). The purpose of this conversion was to prepare LAMPS for porting to computer environments other than that encountered at NCAR. The emphasis was shifted from programming tasks to model simulation and evaluation tests.

  19. Atomic orbital-based SOS-MP2 with tensor hypercontraction. II. Local tensor hypercontraction

    NASA Astrophysics Data System (ADS)

    Song, Chenchen; Martínez, Todd J.

    2017-01-01

    In the first paper of the series [Paper I, C. Song and T. J. Martinez, J. Chem. Phys. 144, 174111 (2016)], we showed how tensor-hypercontracted (THC) SOS-MP2 could be accelerated by exploiting sparsity in the atomic orbitals and using graphical processing units (GPUs). This reduced the formal scaling of the SOS-MP2 energy calculation to cubic with respect to system size. The computational bottleneck then becomes the THC metric matrix inversion, which scales cubically with a large prefactor. In this work, the local THC approximation is proposed to reduce the computational cost of inverting the THC metric matrix to linear scaling with respect to molecular size. By doing so, we have removed the primary bottleneck to THC-SOS-MP2 calculations on large molecules with O(1000) atoms. The errors introduced by the local THC approximation are less than 0.6 kcal/mol for molecules with up to 200 atoms and 3300 basis functions. Together with the graphical processing unit techniques and locality-exploiting approaches introduced in previous work, the scaled opposite spin MP2 (SOS-MP2) calculations exhibit O(N^2.5) scaling in practice up to 10 000 basis functions. The new algorithms make it feasible to carry out SOS-MP2 calculations on small proteins like ubiquitin (1231 atoms/10 294 atomic basis functions) on a single node in less than a day.

  20. Atomic orbital-based SOS-MP2 with tensor hypercontraction. II. Local tensor hypercontraction.

    PubMed

    Song, Chenchen; Martínez, Todd J

    2017-01-21

    In the first paper of the series [Paper I, C. Song and T. J. Martinez, J. Chem. Phys. 144, 174111 (2016)], we showed how tensor-hypercontracted (THC) SOS-MP2 could be accelerated by exploiting sparsity in the atomic orbitals and using graphical processing units (GPUs). This reduced the formal scaling of the SOS-MP2 energy calculation to cubic with respect to system size. The computational bottleneck then becomes the THC metric matrix inversion, which scales cubically with a large prefactor. In this work, the local THC approximation is proposed to reduce the computational cost of inverting the THC metric matrix to linear scaling with respect to molecular size. By doing so, we have removed the primary bottleneck to THC-SOS-MP2 calculations on large molecules with O(1000) atoms. The errors introduced by the local THC approximation are less than 0.6 kcal/mol for molecules with up to 200 atoms and 3300 basis functions. Together with the graphical processing unit techniques and locality-exploiting approaches introduced in previous work, the scaled opposite spin MP2 (SOS-MP2) calculations exhibit O(N^2.5) scaling in practice up to 10 000 basis functions. The new algorithms make it feasible to carry out SOS-MP2 calculations on small proteins like ubiquitin (1231 atoms/10 294 atomic basis functions) on a single node in less than a day.

  1. HEAT.PRO - THERMAL IMBALANCE FORCE SIMULATION AND ANALYSIS USING PDE2D

    NASA Technical Reports Server (NTRS)

    Vigue, Y.

    1994-01-01

    HEAT.PRO calculates the thermal imbalance force resulting from satellite surface heating. The heated body of a satellite re-radiates energy at a rate that is proportional to its temperature, losing the energy in the form of photons. By conservation of momentum, this momentum flux out of the body creates a reaction force against the radiating surface, and the net thermal force can be observed as a small perturbation that affects the long-term orbital behavior of the satellite. HEAT.PRO calculates this thermal imbalance force and then determines its effects on satellite orbits, especially where the Earth's shadowing of an orbiting satellite causes periodic changes in the spacecraft's thermal environment. HEAT.PRO implements a finite element method routine called PDE2D which incorporates material properties to determine the solar panel surface temperatures. The nodal temperatures are computed at specified time steps and are used to determine the magnitude and direction of the thermal force on the spacecraft. These calculations are based on the solar panel orientation and the satellite's position with respect to the Earth and Sun. It is necessary to have accurate, current knowledge of surface emissivity, thermal conductivity, heat capacity, and material density. These parameters, which may change due to degradation of materials in the space environment, influence the computed nodal temperatures and thus the thermal force calculations. HEAT.PRO was written in FORTRAN 77 for Cray series computers running UNICOS. The source code contains directives for and is used as input to the required partial differential equation solver, PDE2D. HEAT.PRO is available on a 9-track 1600 BPI magnetic tape in UNIX tar format (standard distribution medium) or a .25 inch streaming magnetic tape cartridge in UNIX tar format. An electronic copy of the documentation in Macintosh Microsoft Word format is included on the distribution tape. HEAT.PRO was developed in 1991. Cray and UNICOS are registered trademarks of Cray Research, Inc. UNIX is a trademark of AT&T Bell Laboratories. PDE2D is available from Granville Sewell, Mathematics Dept., University of Texas at El Paso, El Paso, Texas 79968.
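
    The per-element force computation sketched below illustrates the physics the abstract describes: a diffusely emitting (Lambertian) element of area A at temperature T radiates power eps*sigma*T^4*A, and the recoil force along the surface normal carries a 2/3 geometric factor for diffuse emission. Treating each finite element this way and summing over elements is an assumption about how such a calculation proceeds, not code taken from HEAT.PRO itself (which is Fortran 77).

      #include <stdio.h>

      #define SIGMA   5.670374419e-8   /* Stefan-Boltzmann, W m^-2 K^-4 */
      #define C_LIGHT 299792458.0      /* speed of light, m/s */

      /* Recoil force (N) along the inward normal of one Lambertian
         element; eps is emissivity, area in m^2, temp in kelvin. */
      double element_force(double eps, double area_m2, double temp_k)
      {
          double t4 = temp_k * temp_k * temp_k * temp_k;
          double q  = eps * SIGMA * t4 * area_m2;   /* radiated power, W */
          return (2.0 / 3.0) * q / C_LIGHT;
      }

      int main(void)
      {
          /* hypothetical solar-panel element: 0.1 m^2 at 320 K, eps = 0.85 */
          printf("F = %.3e N\n", element_force(0.85, 0.1, 320.0));
          return 0;
      }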

  2. Employing Nested OpenMP for the Parallelization of Multi-Zone Computational Fluid Dynamics Applications

    NASA Technical Reports Server (NTRS)

    Ayguade, Eduard; Gonzalez, Marc; Martorell, Xavier; Jost, Gabriele

    2004-01-01

    In this paper we describe the parallelization of the multi-zone code versions of the NAS Parallel Benchmarks employing multi-level OpenMP parallelism. For our study we use the NanosCompiler, which supports nesting of OpenMP directives and provides clauses to control the grouping of threads, load balancing, and synchronization. We report the benchmark results, compare the timings with those of different hybrid parallelization paradigms, and discuss OpenMP implementation issues which affect the performance of multi-level parallel applications.

  3. The Atomization Energy of Mg4

    NASA Technical Reports Server (NTRS)

    Bauschlicher, Charles W., Jr.; Arnold, James O. (Technical Monitor)

    1999-01-01

    The atomization energy of Mg4 is determined using the MP2 and CCSD(T) levels of theory. Basis set incompleteness, basis set extrapolation, and core-valence effects are discussed. Our best atomization energy, including the zero-point energy and scalar relativistic effects, is 24.6 +/- 1.6 kcal/mol. Our computed and extrapolated values are compared with previous results, where it is observed that our extrapolated MP2 value is in good agreement with the MP2-R12 value. The CCSD(T) and MP2 core effects are found to have opposite signs.

  4. Screening, isolation and optimization of anti–white spot syndrome virus drug derived from marine plants

    PubMed Central

    Chakraborty, Somnath; Ghosh, Upasana; Balasubramanian, Thangavel; Das, Punyabrata

    2014-01-01

    Objective To screen, isolate and optimize an anti-white spot syndrome virus (WSSV) drug derived from various marine floral ecosystems and to evaluate its efficacy in a host-pathogen interaction model. Methods Thirty species of marine plants were subjected to Soxhlet extraction using water, ethanol, methanol and hexane as solvents. The 120 plant isolates thus obtained were screened for their in vivo anti-WSSV property in Litopenaeus vannamei. By means of chemical processes, the purified anti-WSSV plant isolate MP07X was derived. The drug was optimized at various concentrations. Viral and immune genes were analysed using reverse transcriptase PCR to confirm the potency of the drug. Results Nine plant isolates exhibited significant survivability in the host. The drug MP07X thus formulated showed 85% survivability in the host. The surviving shrimps were nested-PCR negative at the end of the 15-day experiment. The lowest concentration of MP07X required intramuscularly for virucidal activity was 10 mg/mL. An oral dosage of 1 000 mg/kg body weight/day resulted in an 85% survival rate. Neither VP28 nor ie1 was expressed in the test samples at the 42nd and 84th hour post viral infection. Conclusions The drug MP07X, derived from Rhizophora mucronata, is a potent anti-WSSV drug. PMID:25183065

  5. Crystallization and preliminary X-ray characterization of the genetically encoded fluorescent calcium indicator protein GCaMP2

    PubMed Central

    Rodríguez Guilbe, María M.; Alfaro Malavé, Elisa C.; Akerboom, Jasper; Marvin, Jonathan S.; Looger, Loren L.; Schreiter, Eric R.

    2008-01-01

    Fluorescent proteins and their engineered variants have played an important role in the study of biology. The genetically encoded calcium-indicator protein GCaMP2 comprises a circularly permuted fluorescent protein coupled to the calcium-binding protein calmodulin and a calmodulin target peptide, M13, derived from the intracellular calmodulin target myosin light-chain kinase and has been used to image calcium transients in vivo. To aid rational efforts to engineer improved variants of GCaMP2, this protein was crystallized in the calcium-saturated form. X-ray diffraction data were collected to 2.0 Å resolution. The crystals belong to space group C2, with unit-cell parameters a = 126.1, b = 47.1, c = 68.8 Å, β = 100.5° and one GCaMP2 molecule in the asymmetric unit. The structure was phased by molecular replacement and refinement is currently under way. PMID:18607093

  6. IBM PC enhances the world's future

    NASA Technical Reports Server (NTRS)

    Cox, Jozelle

    1988-01-01

    Although the purpose of this research is to illustrate the importance of computers to the public, particularly the IBM PC, present examinations will include computers developed before the IBM PC was brought into use. IBM, as well as other computing facilities, began serving the public years ago, and is continuing to find ways to enhance the existence of man. With new developments in supercomputers like the Cray-2, and the recent advances in artificial intelligence programming, the human race is gaining knowledge at a rapid pace. All have benefited from the development of computers in the world; not only have they brought new assets to life, but they have made life more and more of a challenge every day.

  7. Breaking the bottleneck: Use of molecular tailoring approach for the estimation of binding energies at MP2/CBS limit for large water clusters

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Singh, Gurmeet; Nandi, Apurba; Gadre, Shridhar R., E-mail: gadre@iitk.ac.in

    2016-03-14

    A pragmatic method based on the molecular tailoring approach (MTA) for accurately estimating the complete basis set (CBS) limit at the Møller-Plesset second-order perturbation (MP2) level for large molecular clusters with limited computational resources is developed. It is applied to water clusters, (H2O)n (n = 7, 8, 10, 16, 17, and 25), optimized employing the aug-cc-pVDZ (aVDZ) basis set. Binding energies (BEs) of these clusters are estimated at the MP2/aug-cc-pVNZ (aVNZ) [N = T, Q, and 5 (whenever possible)] levels of theory employing the grafted MTA (GMTA) methodology and are found to lie within 0.2 kcal/mol of the corresponding full-calculation MP2 BE, wherever available. The results are extrapolated to the CBS limit using a three-point formula. The GMTA-MP2 calculations are feasible on off-the-shelf hardware and show around 50%-65% savings of computational time. The methodology has potential for application to molecular clusters containing ∼100 atoms.
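
    A three-point CBS extrapolation of the kind mentioned above can be illustrated with the common exponential form E(X) = E_CBS + A*exp(-B*X); whether this is the exact formula used in the paper is an assumption. The C sketch below solves for E_CBS from energies at cardinal numbers X = 3, 4, 5 (aVTZ/aVQZ/aV5Z), assuming smooth, monotone convergence; the sample energies are hypothetical.

      #include <stdio.h>

      /* Three-point CBS extrapolation assuming E(X) = E_CBS + A*exp(-B*X).
         With equally spaced X, the ratio of successive differences equals
         exp(-B), which lets E_CBS be eliminated in closed form. */
      double cbs_three_point(double e3, double e4, double e5)
      {
          double d1 = e3 - e4, d2 = e4 - e5;
          double r  = d2 / d1;               /* equals exp(-B) */
          return e5 - d2 * r / (1.0 - r);
      }

      int main(void)
      {
          /* hypothetical correlation energies in hartree */
          double e = cbs_three_point(-0.512300, -0.524100, -0.528000);
          printf("E(CBS) = %.6f Eh\n", e);
          return 0;
      }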

  8. An efficient MPI/OpenMP parallelization of the Hartree–Fock–Roothaan method for the first generation of Intel® Xeon Phi™ processor architecture

    DOE PAGES

    Mironov, Vladimir; Moskovsky, Alexander; D’Mello, Michael; ...

    2017-10-04

    The Hartree-Fock (HF) method in the quantum chemistry package GAMESS represents one of the most irregular algorithms in computation today. Major steps in the calculation are the irregular computation of electron repulsion integrals (ERIs) and the building of the Fock matrix. These are the central components of the main Self Consistent Field (SCF) loop, the key hotspot in Electronic Structure (ES) codes. By threading the MPI ranks in the official release of the GAMESS code, we not only speed up the main SCF loop (4x to 6x for large systems), but also achieve a significant (>2x) reduction in the overall memory footprint. These improvements are a direct consequence of memory access optimizations within the MPI ranks. We benchmark our implementation against the official release of the GAMESS code on an Intel® Xeon Phi™ supercomputer. Scaling numbers are reported on up to 7,680 cores on Intel Xeon Phi coprocessors.

  9. Parallel computing using a Lagrangian formulation

    NASA Technical Reports Server (NTRS)

    Liou, May-Fun; Loh, Ching Yuen

    1991-01-01

    A new Lagrangian formulation of the Euler equation is adopted for the calculation of 2-D supersonic steady flow. The Lagrangian formulation represents the inherent parallelism of the flow field better than the common Eulerian formulation and offers a competitive alternative on parallel computers. The implementation of the Lagrangian formulation on the Thinking Machines Corporation CM-2 Computer is described. The program uses a finite volume, first-order Godunov scheme and exhibits high accuracy in dealing with multidimensional discontinuities (slip-line and shock). By using this formulation, a better than six-fold speed-up was achieved on an 8192-processor CM-2 over a single processor of a CRAY-2.

  10. Parallel computing using a Lagrangian formulation

    NASA Technical Reports Server (NTRS)

    Liou, May-Fun; Loh, Ching-Yuen

    1992-01-01

    This paper adopts a new Lagrangian formulation of the Euler equation for the calculation of two dimensional supersonic steady flow. The Lagrangian formulation represents the inherent parallelism of the flow field better than the common Eulerian formulation and offers a competitive alternative on parallel computers. The implementation of the Lagrangian formulation on the Thinking Machines Corporation CM-2 Computer is described. The program uses a finite volume, first-order Godunov scheme and exhibits high accuracy in dealing with multidimensional discontinuities (slip-line and shock). By using this formulation, we have achieved a better than six-fold speed-up on an 8192-processor CM-2 over a single processor of a CRAY-2.

  11. Multitasking the code ARC3D. [for computational fluid dynamics

    NASA Technical Reports Server (NTRS)

    Barton, John T.; Hsiung, Christopher C.

    1986-01-01

    The CRAY multitasking system was developed in order to utilize all four processors and sharply reduce the wall clock run time. This paper describes the techniques used to modify the computational fluid dynamics code ARC3D for this run and analyzes the achieved speedup. The ARC3D code solves either the Euler or thin-layer N-S equations using an implicit approximate factorization scheme. Results indicate that multitask processing can be used to achieve wall clock speedup factors of over three times, depending on the nature of the program code being used. Multitasking appears to be particularly advantageous for large-memory problems running on multiple CPU computers.

  12. Involvement of RNA2-encoded proteins in the specific transmission of Grapevine fanleaf virus by its nematode vector Xiphinema index.

    PubMed

    Belin, C; Schmitt, C; Demangeat, G; Komar, V; Pinck, L; Fuchs, M

    2001-12-05

    The nepovirus Grapevine fanleaf virus (GFLV) is specifically transmitted by the nematode Xiphinema index. To identify the RNA2-encoded proteins involved in X. index-mediated spread of GFLV, chimeric RNA2 constructs were engineered by replacing the 2A, 2B(MP), and/or 2C(CP) sequences of GFLV with their counterparts in Arabis mosaic virus (ArMV), a closely related nepovirus which is transmitted by Xiphinema diversicaudatum but not by X. index. Among the recombinant viruses obtained from transcripts of GFLV RNA1 and chimeric RNA2, only those which contained the 2C(CP) gene (504 aa) and the 9 contiguous C-terminal residues of 2B(MP) from GFLV were transmitted by X. index as efficiently as natural and synthetic wild-type GFLV, regardless of the origin of the 2A and 2B(MP) genes. As expected, ArMV was not transmitted, probably because it is not retained by X. index. These results indicate that the determinants responsible for the specific spread of GFLV by X. index are located within the 513 C-terminal residues of the polyprotein encoded by RNA2. Copyright 2001 Elsevier Science.

  13. Efficient, massively parallel eigenvalue computation

    NASA Technical Reports Server (NTRS)

    Huo, Yan; Schreiber, Robert

    1993-01-01

    In numerical simulations of disordered electronic systems, one of the most common approaches is to diagonalize random Hamiltonian matrices and to study the eigenvalues and eigenfunctions of a single electron in the presence of a random potential. An effort to implement a matrix diagonalization routine for real symmetric dense matrices on massively parallel SIMD computers, the MasPar MP-1 and MP-2 systems, is described. Results of numerical tests and timings are also presented.
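
    A serial analogue of the core computation, diagonalizing a dense random real symmetric Hamiltonian, can be written in a few lines of C against the LAPACKE interface (link with -llapacke -llapack). This uses LAPACK's dsyev rather than the authors' SIMD routine, and the matrix size and random potential below are placeholders.

      #include <stdio.h>
      #include <stdlib.h>
      #include <lapacke.h>

      int main(void)
      {
          const int n = 500;
          double *h = malloc((size_t)n * n * sizeof(double));
          double *w = malloc(n * sizeof(double));

          /* fill a symmetric matrix with a uniform random potential */
          srand(42);
          for (int i = 0; i < n; i++)
              for (int j = 0; j <= i; j++) {
                  double v = (double)rand() / RAND_MAX - 0.5;
                  h[i * n + j] = h[j * n + i] = v;
              }

          /* 'V' also returns eigenvectors in h; 'L' = lower triangle used */
          int info = LAPACKE_dsyev(LAPACK_ROW_MAJOR, 'V', 'L', n, h, n, w);
          if (info == 0)
              printf("lowest eigenvalue: %f\n", w[0]);
          free(h); free(w);
          return 0;
      }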

  14. Tautomerization, molecular structure, transition state structure, and vibrational spectra of 2-aminopyridines: a combined computational and experimental study.

    PubMed

    Al-Otaibi, Jamelah S

    2015-01-01

    2-Aminopyridine derivatives have attracted considerable interest because they are useful precursors for the synthesis of a variety of heterocyclic compounds possessing medicinal value. In this work we aim to study the structural and electronic properties, as well as high-quality vibrational spectra, of 2-amino-3-methylpyridine (2A3MP) and 2-amino-4-methylpyridine (2A4MP). Møller-Plesset perturbation theory (the MP2/6-31G(d) and MP2/6-31++G(d,p) methods) was used to investigate the structure and vibrational analysis of 2A3MP and 2A4MP. Tautomerization of 2A4MP was investigated by the Density Functional Theory (DFT/B3LYP) method in the gas phase. For the first time, all tautomers, including NH → NH conversions as well as those usually omitted, NH → CH and CH → CH, were considered. The canonical structure (2A4MP1) is the most stable tautomer; it is 13.60 kcal/mol more stable than the next (2A4MP2). Transition state structures of pyramidal N inversion and proton transfer were computed at B3LYP/6-311++G(d,p). The barrier to the proton-transfer transition state is calculated as 44.81 kcal/mol, and the activation energy of pyramidal inversion at the amino N is found to be 0.41 kcal/mol using the same method. Bond orders and natural atomic charges were also calculated at the same level. The Raman and FT-IR spectra of 2A3MP and 2A4MP were measured (4000-400 cm(-1)). The optimized molecular geometries, frequencies and vibrational band intensities were calculated at the ab initio (MP2) and DFT (B3LYP) levels of theory with the 6-31G(d), 6-31++G(d,p) and 6-311++G(d,p) basis sets. The calculated vibrational frequencies were compared with the experimentally measured FT-IR and FT-Raman spectra. Reconsidering the vibrational analysis of 2A3MP and 2A4MP with a more accurate FT-IR instrument and highly accurate animation programs results in new, improved vibrational assignments. Sophisticated quantum mechanics methods enable studying the transition state structures of different chemical systems.

  15. Behavior Models for Software Architecture

    DTIC Science & Technology

    2014-11-01

    MP. Existing process modeling frameworks (BPEL, BPMN [Grosskopf et al. 2009], IDEF) usually follow the "single flowchart" paradigm. MP separates... Process: Business Process Modeling using BPMN, Meghan Kiffer Press. HAREL, D., 1987, A Visual Formalism for Complex Systems. Science of Computer

  16. Maximum parsimony, substitution model, and probability phylogenetic trees.

    PubMed

    Weng, J F; Thomas, D A; Mareels, I

    2011-01-01

    The problem of inferring phylogenies (phylogenetic trees) is one of the main problems in computational biology. There are three main methods for inferring phylogenies: Maximum Parsimony (MP), Distance Matrix (DM) and Maximum Likelihood (ML), of which the MP method is the most well-studied and popular. In the MP method the optimization criterion is the number of substitutions of the nucleotides, computed from the differences in the investigated nucleotide sequences. However, the MP method is often criticized because it counts only the substitutions observable at the current time, while all the unobservable substitutions that really occurred in the evolutionary history are omitted. In order to take into account the unobservable substitutions, substitution models have been established and are now widely used in the DM and ML methods, but these substitution models cannot be used within the classical MP method. Recently the authors proposed a probability representation model for phylogenetic trees; the trees reconstructed in this model are called probability phylogenetic trees. One of the advantages of the probability representation model is that it can include a substitution model to infer phylogenetic trees based on the MP principle. In this paper we explain how to use a substitution model in the reconstruction of probability phylogenetic trees and show the advantage of this approach with examples.
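
    The MP optimization criterion itself, counting substitutions on a fixed tree, is computed by Fitch's algorithm. The C sketch below scores a single site on a tiny hardcoded binary tree using bitmask state sets; it illustrates the classical count the abstract refers to, not the authors' probability-tree extension.

      #include <stdio.h>

      /* Fitch's small-parsimony count for one site on a binary tree.
         States are bitmasks over {A,C,G,T}; a disjoint union at an
         internal node costs one substitution. */
      static int cost;

      int fitch(int left, int right)
      {
          int inter = left & right;
          if (inter) return inter;    /* intersection: no substitution */
          cost++;                     /* disjoint children: one substitution */
          return left | right;
      }

      int main(void)
      {
          enum { A = 1, C = 2, G = 4, T = 8 };
          /* tree ((A,C),(A,T)) at a single site */
          cost = 0;
          int n1 = fitch(A, C);
          int n2 = fitch(A, T);
          (void)fitch(n1, n2);
          printf("parsimony score = %d\n", cost);   /* prints 2 */
          return 0;
      }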

  17. Distributed process manager for an engineering network computer

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Gait, J.

    1987-08-01

    MP is a manager for systems of cooperating processes in a local area network of engineering workstations. MP supports transparent continuation by maintaining multiple copies of each process on different workstations. Computational bandwidth is optimized by executing processes in parallel on different workstations. Responsiveness is high because workstations compete among themselves to respond to requests. The technique is to select a master from among a set of replicas of a process by a competitive election between the copies. Migration of the master when a fault occurs or when response slows down is effected by inducing the election of a new master. Competitive response stabilizes system behavior under load, so MP exhibits real-time behavior.

  18. Collective Framework and Performance Optimizations to Open MPI for Cray XT Platforms

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Ladd, Joshua S; Gorentla Venkata, Manjunath; Shamis, Pavel

    2011-01-01

    The performance and scalability of collective operations play a key role in the performance and scalability of many scientific applications. Within the Open MPI code base we have developed a general-purpose hierarchical collective operations framework called Cheetah, and applied it at large scale on the Oak Ridge Leadership Computing Facility's (OLCF) Jaguar platform, obtaining better performance and scalability than the native MPI implementation. This paper discusses Cheetah's design and implementation, and optimizations to the framework for Cray XT5 platforms. Our results show that Cheetah's Broadcast and Barrier perform better than the native MPI implementation. For medium data, Cheetah's Broadcast outperforms the native MPI implementation by 93% at a problem size of 49,152 processes. For small and large data, it outperforms the native MPI implementation by 10% and 9%, respectively, at a problem size of 24,576 processes. Cheetah's Barrier performs 10% better than the native MPI implementation at a problem size of 12,288 processes.
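
    The layered idea behind a hierarchical broadcast can be sketched with standard MPI-3 calls: split the world communicator into on-node groups, broadcast among node leaders, then within each node. This mimics the general design only; it is not Cheetah's actual implementation.

      #include <mpi.h>
      #include <stdio.h>

      int main(int argc, char **argv)
      {
          MPI_Comm node, leaders;
          int wrank, nrank;
          double payload = 0.0;

          MPI_Init(&argc, &argv);
          MPI_Comm_rank(MPI_COMM_WORLD, &wrank);
          if (wrank == 0) payload = 3.14;

          /* group ranks that share a node */
          MPI_Comm_split_type(MPI_COMM_WORLD, MPI_COMM_TYPE_SHARED, 0,
                              MPI_INFO_NULL, &node);
          MPI_Comm_rank(node, &nrank);

          /* one leader (node rank 0) per node forms the upper layer */
          MPI_Comm_split(MPI_COMM_WORLD, nrank == 0 ? 0 : MPI_UNDEFINED,
                         wrank, &leaders);

          if (leaders != MPI_COMM_NULL) {
              MPI_Bcast(&payload, 1, MPI_DOUBLE, 0, leaders); /* inter-node */
              MPI_Comm_free(&leaders);
          }
          MPI_Bcast(&payload, 1, MPI_DOUBLE, 0, node);        /* intra-node */

          printf("rank %d got %f\n", wrank, payload);
          MPI_Comm_free(&node);
          MPI_Finalize();
          return 0;
      }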

  19. Sparse maps—A systematic infrastructure for reduced-scaling electronic structure methods. I. An efficient and simple linear scaling local MP2 method that uses an intermediate basis of pair natural orbitals

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Pinski, Peter; Riplinger, Christoph; Neese, Frank, E-mail: evaleev@vt.edu, E-mail: frank.neese@cec.mpg.de

    2015-07-21

    In this work, a systematic infrastructure is described that formalizes concepts implicit in previous work and greatly simplifies computer implementation of reduced-scaling electronic structure methods. The key concept is sparse representation of tensors using chains of sparse maps between two index sets. Sparse map representation can be viewed as a generalization of compressed sparse row, a common representation of a sparse matrix, to tensor data. By combining a few elementary operations on sparse maps (inversion, chaining, intersection, etc.), complex algorithms can be developed, illustrated here by a linear-scaling transformation of three-center Coulomb integrals based on our compact code library that implements sparse maps and operations on them. The sparsity of the three-center integrals arises from spatial locality of the basis functions and the domain density fitting approximation. A novel feature of our approach is the use of differential overlap integrals computed in linear-scaling fashion for screening products of basis functions. Finally, a robust linear scaling domain-based local pair natural orbital second-order Møller-Plesset (DLPNO-MP2) method is described based on the sparse map infrastructure that only depends on a minimal number of cutoff parameters that can be systematically tightened to approach 100% of the canonical MP2 correlation energy. With default truncation thresholds, DLPNO-MP2 recovers more than 99.9% of the canonical resolution of the identity MP2 (RI-MP2) energy while still showing a very early crossover with respect to the computational effort. Based on extensive benchmark calculations, relative energies are reproduced with an error of typically <0.2 kcal/mol. The efficiency of the local MP2 (LMP2) method can be drastically improved by carrying out the LMP2 iterations in a basis of pair natural orbitals. While the present work focuses on local electron correlation, it is of much broader applicability to computation with sparse tensors in quantum chemistry and beyond.

  20. Sparse maps—A systematic infrastructure for reduced-scaling electronic structure methods. I. An efficient and simple linear scaling local MP2 method that uses an intermediate basis of pair natural orbitals.

    PubMed

    Pinski, Peter; Riplinger, Christoph; Valeev, Edward F; Neese, Frank

    2015-07-21

    In this work, a systematic infrastructure is described that formalizes concepts implicit in previous work and greatly simplifies computer implementation of reduced-scaling electronic structure methods. The key concept is sparse representation of tensors using chains of sparse maps between two index sets. Sparse map representation can be viewed as a generalization of compressed sparse row, a common representation of a sparse matrix, to tensor data. By combining a few elementary operations on sparse maps (inversion, chaining, intersection, etc.), complex algorithms can be developed, illustrated here by a linear-scaling transformation of three-center Coulomb integrals based on our compact code library that implements sparse maps and operations on them. The sparsity of the three-center integrals arises from spatial locality of the basis functions and the domain density fitting approximation. A novel feature of our approach is the use of differential overlap integrals computed in linear-scaling fashion for screening products of basis functions. Finally, a robust linear scaling domain-based local pair natural orbital second-order Møller-Plesset (DLPNO-MP2) method is described based on the sparse map infrastructure that only depends on a minimal number of cutoff parameters that can be systematically tightened to approach 100% of the canonical MP2 correlation energy. With default truncation thresholds, DLPNO-MP2 recovers more than 99.9% of the canonical resolution of the identity MP2 (RI-MP2) energy while still showing a very early crossover with respect to the computational effort. Based on extensive benchmark calculations, relative energies are reproduced with an error of typically <0.2 kcal/mol. The efficiency of the local MP2 (LMP2) method can be drastically improved by carrying out the LMP2 iterations in a basis of pair natural orbitals. While the present work focuses on local electron correlation, it is of much broader applicability to computation with sparse tensors in quantum chemistry and beyond.

  1. Systolic array IC for genetic computation

    NASA Technical Reports Server (NTRS)

    Anderson, D.

    1991-01-01

    Measuring similarities between large sequences of genetic information is a formidable task requiring enormous amounts of computer time. Geneticists claim that nearly two months of CRAY-2 time would be required to run a single comparison of the known database against the new bases that will be found this year, more than a CRAY-2 year for next year's genetic discoveries, and so on. The DNA IC, designed at HP-ICBD in cooperation with the California Institute of Technology and the Jet Propulsion Laboratory, is being implemented in order to move the task of genetic comparison onto workstations and personal computers, while vastly improving performance. The chip is a systolic (pumped) array comprising 16 processors, control logic, and global RAM, totaling 400,000 FETs. At 12 MHz, each chip performs 2.7 billion 16-bit operations per second. Using 35 of these chips in series on one PC board (performing nearly 100 billion operations per second), a sequence of 560 bases can be compared against the eventual total genome of 3 billion bases in minutes - on a personal computer. While the designed purpose of the DNA chip is genetic research, other disciplines requiring similarity measurements between strings of 7-bit encoded data, such as cryptography and speech recognition, could make use of this chip as well. A mix of full custom design and standard cells, in CMOS34, was used to achieve these goals. Innovative test methods were developed to enhance controllability and observability in the array. This paper describes these techniques as well as the chip's functionality. The chip was designed in the 1989-90 timeframe.
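
    The similarity measurement such a chip pipelines is, in essence, a dynamic-programming recurrence; successive anti-diagonals are independent, which is what a systolic array exploits. Below is a serial C sketch of a local-alignment score (Smith-Waterman style, with assumed match/mismatch/gap weights); the chip's exact recurrence is not given in the abstract, so this is a representative stand-in.

      #include <stdio.h>
      #include <string.h>

      /* Local-alignment similarity score with match +2, mismatch -1,
         gap -1; only two DP rows are kept. */
      int similarity(const char *a, const char *b)
      {
          enum { MAXN = 1024 };
          static int prev[MAXN + 1], cur[MAXN + 1];
          int la = (int)strlen(a), lb = (int)strlen(b), best = 0;

          memset(prev, 0, sizeof prev);
          for (int i = 1; i <= la; i++) {
              cur[0] = 0;
              for (int j = 1; j <= lb; j++) {
                  int m = prev[j - 1] + (a[i - 1] == b[j - 1] ? 2 : -1);
                  int d = prev[j] - 1, ins = cur[j - 1] - 1;
                  int v = m > d ? m : d;
                  if (ins > v) v = ins;
                  if (v < 0) v = 0;          /* local alignment resets */
                  cur[j] = v;
                  if (v > best) best = v;
              }
              memcpy(prev, cur, (lb + 1) * sizeof(int));
          }
          return best;
      }

      int main(void)
      {
          printf("%d\n", similarity("GATTACA", "GCATGCU"));
          return 0;
      }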

  2. The keto-enol equilibrium in substituted acetaldehydes: focal-point analysis and ab initio limit

    NASA Astrophysics Data System (ADS)

    Balabin, Roman M.

    2011-10-01

    High-level ab initio electronic structure calculations up to the CCSD(T) level of theory, including extrapolations to the complete basis set (CBS) limit, yielded high-precision energetics of the tautomeric equilibrium in 2-substituted acetaldehydes (XH2C-CHO). The CCSD(T)/CBS relative energies of the tautomers were estimated using CCSD(T)/aug-cc-pVTZ, MP3/aug-cc-pVQZ, and MP2/aug-cc-pV5Z calculations with MP2/aug-cc-pVTZ geometries. The relative enol (XHC=CHOH) stabilities (ΔE(e,CCSD(T)/CBS)) were found to be 5.98 ± 0.17, -1.67 ± 0.82, 7.64 ± 0.21, 8.39 ± 0.31, 2.82 ± 0.52, 10.27 ± 0.39, 9.12 ± 0.18, 5.47 ± 0.53, 7.50 ± 0.43, 10.12 ± 0.51, 8.49 ± 0.33, and 6.19 ± 0.18 kcal mol-1 for X = BeH, BH2, CH3, Cl, CN, F, H, NC, NH2, OCH3, OH, and SH, respectively. Inconsistencies were found between the results of composite energy methods (G2, G3, CBS-4M, and CBS-QB3) and the high-level ab initio methods (CCSD(T)/CBS and MP2/CBS). DFT/aug-cc-pVTZ results with the B3LYP, PBE0 (PBE1PBE), TPSS, and BMK density functionals were close to the CCSD(T)/CBS values (MAD = 1.04 kcal mol-1).

  3. Parallel-vector computation for structural analysis and nonlinear unconstrained optimization problems

    NASA Technical Reports Server (NTRS)

    Nguyen, Duc T.

    1990-01-01

    Practical engineering applications can often be formulated as constrained optimization problems. There are several solution algorithms for solving a constrained optimization problem. One approach is to convert a constrained problem into a series of unconstrained problems; furthermore, unconstrained solution algorithms can be used as part of the constrained solution algorithms. Structural optimization is an iterative process where one starts with an initial design, and a finite element structural analysis is then performed to calculate the response of the system (such as displacements, stresses, eigenvalues, etc.). Based upon the sensitivity information on the objective and constraint functions, an optimizer such as ADS or IDESIGN can be used to find the new, improved design. For the structural analysis phase, the equation solver for the system of simultaneous linear equations plays a key role since it is needed for static, eigenvalue, and dynamic analysis alike. For practical, large-scale structural analysis-synthesis applications, computational time can be excessively large. Thus, it is necessary to have a new structural analysis-synthesis code which employs new solution algorithms to exploit both the parallel and vector capabilities offered by modern, high-performance computers such as the Convex, Cray-2 and Cray Y-MP. The objective of this research project is, therefore, to incorporate the latest developments in parallel-vector equation solvers, PVSOLVE, into the widely popular finite-element production code SAP-4. Furthermore, several nonlinear unconstrained optimization subroutines have also been developed and tested under a parallel computer environment. The unconstrained optimization subroutines are not only useful in their own right, but they can also be incorporated into a more popular constrained optimization code, such as ADS.

  4. Impact of machining on the flexural fatigue strength of glass and polycrystalline CAD/CAM ceramics.

    PubMed

    Fraga, Sara; Amaral, Marina; Bottino, Marco Antônio; Valandro, Luiz Felipe; Kleverlaan, Cornelis Johannes; May, Liliana Gressler

    2017-11-01

    To assess the effect of machining on the flexural fatigue strength and surface roughness of different computer-aided design/computer-aided manufacturing (CAD/CAM) ceramics by comparing machined specimens with specimens polished after machining. Disc-shaped specimens of yttria-stabilized polycrystalline tetragonal zirconia (Y-TZP), leucite-, and lithium disilicate-based glass ceramics were prepared by CAD/CAM machining, and divided into two groups: machining (M) and machining followed by polishing (MP). The surface roughness was measured and the flexural fatigue strength was evaluated by the step-test method (n=20). The initial load and the load increment for each ceramic material were based on a monotonic test (n=5). A maximum of 10,000 cycles was applied in each load step, at 1.4 Hz. Weibull probability statistics were used for the analysis of the flexural fatigue strength, and the Mann-Whitney test (α=5%) to compare roughness between the M and MP conditions. Machining resulted in lower values of characteristic flexural fatigue strength than machining followed by polishing. The greatest reduction in flexural fatigue strength from MP to M was observed for Y-TZP (40%; M=536.48 MPa; MP=894.50 MPa), followed by lithium disilicate (33%; M=187.71 MPa; MP=278.93 MPa) and leucite (29%; M=72.61 MPa; MP=102.55 MPa). Significantly higher roughness values (Ra) were observed for M compared to MP (leucite: M=1.59 μm and MP=0.08 μm; lithium disilicate: M=1.84 μm and MP=0.13 μm; Y-TZP: M=1.79 μm and MP=0.18 μm). Machining negatively affected the flexural fatigue strength of CAD/CAM ceramics, indicating that machining of partially or fully sintered ceramics is deleterious to fatigue strength. Copyright © 2017 The Academy of Dental Materials. Published by Elsevier Ltd. All rights reserved.

  5. A computational study of hydrogen-bonded X3CH⋯YZ (X = Cl, F, NC; YZ = FLi, BF, CO, N2) complexes

    NASA Astrophysics Data System (ADS)

    McDowell, Sean A. C.

    2018-03-01

    An MP2/6-311++G(3df,3pd) computational study of a series of hydrogen-bonded complexes X3CH⋯YZ (X = Cl, F, NC; YZ = FLi, BF, CO, N2) was undertaken to assess the trends in the relative stability and other molecular properties with variation of both the X group and the chemical hardness of the Y atom of YZ. The red- and blue-shifting propensities of the proton donor X3CH were investigated by considering the C-H bond length change and its associated vibrational frequency shift. The proton donor Cl3CH, which has a positive dipole moment derivative with respect to C-H bond extension, tends to form red-shifted complexes, this tendency being modified by the hardness (and dipole moment) associated with the proton acceptor. On the other hand, F3CH has a negative dipole moment derivative and tends to form blue-shifted complexes, suggesting that as X becomes more electron-withdrawing, the proton donor should have a negative dipole moment derivative and form blue-shifted complexes. Surprisingly, the most polar proton donor (NC)3CH was found to have a positive dipole moment derivative and produces red-shifted complexes. A perturbative model was found useful in rationalizing the trends for the C-H bond length change and associated frequency shift.

  6. User and Performance Impacts from Franklin Upgrades

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    He, Yun

    2009-05-10

    The NERSC flagship computer, the Cray XT4 system "Franklin", has gone through three major upgrades during the past year: a quad-core processor upgrade, a CLE 2.1 upgrade, and an IO upgrade. In this paper, we discuss various aspects of the user impact of these upgrades, such as user access, user environment, and user issues. The performance impact on kernel benchmarks and selected application benchmarks is also presented.

  7. Performance Analysis of the Unitree Central File

    NASA Technical Reports Server (NTRS)

    Pentakalos, Odysseas I.; Flater, David

    1994-01-01

    This report consists of two parts. The first part briefly comments on the documentation status of two major systems at NASA's Center for Computational Sciences, specifically the Cray C98 and the Convex C3830. The second part describes the work done on improving the performance of file transfers between the Unitree Mass Storage System running on the Convex file server and the users' workstations distributed over a large geographic area.

  8. Scalable Low-Power Deep Machine Learning with Analog Computation

    DTIC Science & Technology

    2013-07-19

    transimpedance amplifier (TIA) that measures the output current ... [circuit schematic labels: Cf, Vbias, MP1, MN1, Vdd, Vox] ... amplifier. The amplifier has Cf as its feedback capacitor and the FG voltage Vfg as its input. The two MUXs at the sources of MP1 and MP2 control the... as a simple operational transconductance amplifier (OTA), converts voltage Vout to output current Iout. Vref determines the nominal voltage of Vout

  9. Toward Enhancing OpenMP's Work-Sharing Directives

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Chapman, B M; Huang, L; Jin, H

    2006-05-17

    OpenMP provides a portable programming interface for shared memory parallel computers (SMPs). Although this interface has proven successful for small SMPs, it requires greater flexibility in light of the steadily growing size of individual SMPs and the recent advent of multithreaded chips. In this paper, we describe two application development experiences that exposed these expressivity problems in the current OpenMP specification. We then propose mechanisms to overcome these limitations, including thread subteams and thread topologies. Thus, we identify language features that improve OpenMP application performance on emerging and large-scale platforms while preserving ease of programming.

  10. NavP: Structured and Multithreaded Distributed Parallel Programming

    NASA Technical Reports Server (NTRS)

    Pan, Lei

    2007-01-01

    We present Navigational Programming (NavP) -- a distributed parallel programming methodology based on the principles of migrating computations and multithreading. The four major steps of NavP are: (1) Distribute the data using the data communication pattern in a given algorithm; (2) Insert navigational commands for the computation to migrate and follow large-sized distributed data; (3) Cut the sequential migrating thread and construct a mobile pipeline; and (4) Loop back for refinement. NavP is significantly different from the currently prevailing Message Passing (MP) approach. The advantages of NavP include: (1) NavP is structured distributed programming and does not change the code structure of the original algorithm, in sharp contrast to MP, whose implementations generally do not resemble the original sequential code; (2) NavP implementations are competitive with the best MPI implementations in terms of performance, whereas approaches such as DSM or HPF, though easier to use than MP, have so far failed to deliver satisfactory performance; (3) NavP provides incremental parallelization, which is beyond the reach of MP; and (4) NavP is a unifying approach that exploits both fine-grained (multithreading on shared memory) and coarse-grained (pipelined tasks on distributed memory) parallelism, in contrast to the currently popular hybrid use of MP+OpenMP, which is known to be complex to use. We present experimental results that demonstrate the effectiveness of NavP.

  11. Ab initio SCF study of the barrier to internal rotation in simple amides. Part 3. Thioamides

    NASA Astrophysics Data System (ADS)

    Vassilev, Nikolay G.; Dimitrov, Valentin S.

    2003-06-01

    The free energies of activation for rotation about the thiocarbonyl C-N bond in X-C(S)N(CH3)2 (X=H, F, Cl, CH3, CF3) were calculated at the MP2(fc)/6-31+G*//6-31G* and MP2(fc)/6-311++G**//6-311++G** levels and compared with literature NMR gas-phase data. The results of the calculations indicate that the nonbonded interactions in the ground state (GS) are mainly responsible for the differences in the rotational barriers. For X=H, CH3, and CF3, the anti transition state (TS) is more stable; for X=Cl, the syn TS is more stable, while for X=F the two TSs are energetically almost equivalent.

  12. Extending molecular simulation time scales: Parallel in time integrations for high-level quantum chemistry and complex force representations

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Bylaska, Eric J., E-mail: Eric.Bylaska@pnnl.gov; Weare, Jonathan Q., E-mail: weare@uchicago.edu; Weare, John H., E-mail: jweare@ucsd.edu

    2013-08-21

    Parallel in time simulation algorithms are presented and applied to conventional molecular dynamics (MD) and ab initio molecular dynamics (AIMD) models of realistic complexity. Assuming that a forward time integrator f (e.g., the Verlet algorithm) is available to propagate the system from time t_i (trajectory positions and velocities x_i = (r_i, v_i)) to time t_{i+1} (x_{i+1}) by x_{i+1} = f_i(x_i), the dynamics problem spanning an interval from t_0 to t_M can be transformed into a root finding problem, F(X) = [x_i − f(x_{i−1})]_{i=1,...,M} = 0, for the trajectory variables. The root finding problem is solved using a variety of root finding techniques, including quasi-Newton and preconditioned quasi-Newton schemes that are all unconditionally convergent. The algorithms are parallelized by assigning a processor to each time-step entry in the columns of F(X). The relation of this approach to other recently proposed parallel in time methods is discussed, and the effectiveness of various approaches to solving the root finding problem is tested. We demonstrate that more efficient dynamical models based on simplified interactions or coarsening time-steps provide preconditioners for the root finding problem. However, for MD and AIMD simulations, such preconditioners are not required to obtain reasonable convergence and their cost must be considered in the performance of the algorithm. The parallel in time algorithms developed are tested by applying them to MD and AIMD simulations of size and complexity similar to those encountered in present day applications. These include a 1000 Si atom MD simulation using Stillinger-Weber potentials, and a HCl + 4H2O AIMD simulation at the MP2 level. The maximum speedup ((serial execution time)/(parallel execution time)) obtained by parallelizing the Stillinger-Weber MD simulation was nearly 3.0. For the AIMD MP2 simulations, the algorithms achieved speedups of up to 14.3. The parallel in time algorithms can be implemented in a distributed computing environment using very slow transmission control protocol/Internet protocol networks. Scripts written in Python that make calls to a precompiled quantum chemistry package (NWChem) are demonstrated to provide an actual speedup of 8.2 for a 2.5 ps AIMD simulation of HCl + 4H2O at the MP2/6-31G* level. Implemented in this way, these algorithms can be used for long time high-level AIMD simulations at a modest cost using machines connected by very slow networks such as WiFi, or in different time zones connected by the Internet. The algorithms can also be used with programs that are already parallel. Using these algorithms, we are able to reduce the cost of an MP2/6-311++G(2d,2p) simulation that had reached its maximum possible speedup in the parallelization of the electronic structure calculation from 32 s/time step to 6.9 s/time step.
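
    The root-finding formulation above can be illustrated with a toy version: the sketch below applies plain fixed-point (Picard) sweeps to F(X) = [x_i − f(x_{i−1})] = 0 for a scalar ODE with a forward-Euler propagator. The ODE, sweep count, and tolerance are invented for the example, and the paper's quasi-Newton and preconditioned solvers are not reproduced; note that plain Picard propagates information only one step per sweep, which is precisely why preconditioning matters.

```c
/* Toy parallel-in-time iteration: recast x_{i+1} = f(x_i), i = 0..M-1,
 * as F(X) = [x_i - f(x_{i-1})]_{i=1..M} = 0 and relax all time steps
 * at once.  Each f(X[i-1]) inside a sweep is independent of the
 * others, which is what permits one processor per time step. */
#include <stdio.h>
#include <math.h>

#define M  100
#define DT 0.01

/* forward-Euler propagator for the scalar ODE x' = -x */
static double f(double x) { return x + DT * (-x); }

int main(void)
{
    double X[M + 1];
    X[0] = 1.0;                                 /* initial condition */
    for (int i = 1; i <= M; i++) X[i] = X[0];   /* crude initial guess */

    for (int k = 0; k < 2 * M; k++) {    /* sweeps over the whole trajectory */
        double Xnew[M + 1], res = 0.0;
        Xnew[0] = X[0];
        for (int i = 1; i <= M; i++) {   /* independent -> parallel over i */
            Xnew[i] = f(X[i - 1]);
            res += fabs(Xnew[i] - X[i]);
        }
        for (int i = 1; i <= M; i++) X[i] = Xnew[i];
        /* unpreconditioned Picard moves information one step per sweep,
         * hence the need for the preconditioners discussed above */
        if (res < 1e-14) { printf("converged after %d sweeps\n", k + 1); break; }
    }
    printf("x(T) = %f   (e^{-T} = %f)\n", X[M], exp(-M * DT));
    return 0;
}
```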

  13. Median prior constrained TV algorithm for sparse view low-dose CT reconstruction.

    PubMed

    Liu, Yi; Shangguan, Hong; Zhang, Quan; Zhu, Hongqing; Shu, Huazhong; Gui, Zhiguo

    2015-05-01

    It is known that lowering the X-ray tube current (mAs) or tube voltage (kVp) and simultaneously reducing the total number of X-ray views (sparse view) is an effective means to achieve a low dose in computed tomography (CT) scans. However, the associated image quality from conventional filtered back-projection (FBP) usually degrades due to excessive quantum noise. Although sparse-view CT reconstruction via total variation (TV), in the scanning protocol of reduced X-ray tube current, has been demonstrated to achieve significant radiation dose reduction while maintaining image quality, noticeable patchy artifacts still exist in the reconstructed images. In this study, to address the problem of patchy artifacts, we propose a median prior constrained TV regularization that retains image quality by introducing an auxiliary vector m in register with the object. Specifically, the approximate action of m is to draw, in each iteration, an object voxel toward its own local median, aiming to improve low-dose image quality with sparse-view projection measurements. An alternating optimization algorithm is then adopted to optimize the associated objective function. We refer to the median prior constrained TV regularization as "TV_MP" for simplicity. Experimental results on digital phantoms and a clinical phantom demonstrate that the proposed TV_MP with appropriate control parameters not only ensures a higher signal-to-noise ratio (SNR) in the reconstructed image but also better resolution than the original TV method. Copyright © 2015 Elsevier Ltd. All rights reserved.
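
    The median-drawing step described above can be sketched as follows; this is a minimal illustration assuming a 3x3 neighborhood and a relaxation weight beta, with the TV and data-fidelity updates of the full alternating algorithm omitted.

```c
/* One median-prior relaxation step: draw each voxel toward the median
 * of its 3x3 neighborhood with weight beta.  Window size and beta are
 * illustrative, not the authors' parameters. */
#include <stdio.h>
#include <string.h>

#define NX 256
#define NY 256

static double med9(double v[9])              /* median of 9 values */
{
    for (int i = 1; i < 9; i++) {            /* insertion sort */
        double key = v[i]; int j = i - 1;
        while (j >= 0 && v[j] > key) { v[j + 1] = v[j]; j--; }
        v[j + 1] = key;
    }
    return v[4];
}

static void median_prior_step(double img[NY][NX], double beta)
{
    static double out[NY][NX];               /* scratch copy (not reentrant) */
    for (int y = 0; y < NY; y++)
        for (int x = 0; x < NX; x++) {
            double w[9]; int n = 0;
            for (int dy = -1; dy <= 1; dy++)
                for (int dx = -1; dx <= 1; dx++) {
                    int yy = y + dy, xx = x + dx;
                    if (yy < 0) yy = 0; if (yy >= NY) yy = NY - 1;
                    if (xx < 0) xx = 0; if (xx >= NX) xx = NX - 1;
                    w[n++] = img[yy][xx];    /* clamped at the border */
                }
            out[y][x] = (1.0 - beta) * img[y][x] + beta * med9(w);
        }
    memcpy(img, out, sizeof(out));
}

int main(void)
{
    static double img[NY][NX];               /* stand-in reconstruction */
    img[128][128] = 1.0;                     /* isolated bright voxel */
    median_prior_step(img, 0.5);
    printf("voxel after one step: %f\n", img[128][128]);   /* 0.5 */
    return 0;
}
```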

  14. MLP: A Parallel Programming Alternative to MPI for New Shared Memory Parallel Systems

    NASA Technical Reports Server (NTRS)

    Taft, James R.

    1999-01-01

    Recent developments at the NASA Ames Research Center's NAS Division have demonstrated that the new generation of NUMA-based Symmetric Multi-Processing systems (SMPs), such as the Silicon Graphics Origin 2000, can successfully execute legacy vector-oriented CFD production codes at sustained rates far exceeding the processing rates possible on dedicated 16-CPU Cray C90 systems. This high level of performance is achieved via shared memory based Multi-Level Parallelism (MLP). This programming approach, developed at NAS and outlined below, is distinct from the message passing paradigm of MPI. It offers parallelism at both the fine- and coarse-grained levels, with communication latencies that are approximately 50-100 times lower than typical MPI implementations on the same platform. Such latency reductions offer the promise of performance scaling to very large CPU counts. The method draws on, but is also distinct from, the newly defined OpenMP specification, which uses compiler directives to support a limited subset of multi-level parallel operations. The NAS MLP method is general, and applicable to a large class of NASA CFD codes.

  15. Ab initio study of the barrier to internal rotation in simple amides. 1. N, N-dimethylformamide and N, N-dimethylcarbamic halogenides

    NASA Astrophysics Data System (ADS)

    Vassilev, Nikolay G.; Dimitrov, Valentin S.

    1999-06-01

    Free energies of activation for rotation about the amide C-N bond in X-C(O)N(CH3)2 (X=H, F, Cl, and Br) were calculated at the MP2(fc)/6-31+G*//6-31G* and MP2(fc)/6-311++G**//6-311++G** levels and compared with NMR gas-phase data. The results of the calculations indicate that the repulsion between X and the methyl group in the ground state, and the repulsion between X or oxygen and the nitrogen lone pair in the transition states (TS), are largely responsible for the differences in the free energies of the studied amides. For X=H (DMF), the anti TS is more stable; for X=Cl, Br, the syn TS is more stable, while for X=F the two transition states are energetically almost equivalent.

  16. COMPPAP - COMPOSITE PLATE BUCKLING ANALYSIS PROGRAM (IBM PC VERSION)

    NASA Technical Reports Server (NTRS)

    Smith, J. P.

    1994-01-01

    The Composite Plate Buckling Analysis Program (COMPPAP) was written to help engineers determine buckling loads of orthotropic (or isotropic) irregularly shaped plates without requiring hand calculations from design curves or extensive finite element modeling. COMPPAP is a one-element finite element program that utilizes high-order displacement functions. The high order of the displacement functions enables the user to produce results more accurate than traditional h-finite elements. This program uses these high-order displacement functions to perform a plane stress analysis of a general plate, followed by a buckling calculation based on the stresses found in the plane stress solution. The current version assumes a flat plate (constant thickness) subject to a constant edge load (normal or shear) on one or more edges. COMPPAP uses the power method to find the eigenvalues of the buckling problem. The power method provides an efficient solution when only one eigenvalue is desired. Once the eigenvalue is found, the eigenvector, which corresponds to the plate buckling mode shape, results as a by-product. A positive feature of the power method is that the dominant eigenvalue is found first, which in this case is the plate buckling load. The reported eigenvalue expresses the load factor that induces plate buckling. COMPPAP is written in ANSI FORTRAN 77. Two machine versions are available from COSMIC: a PC version (MSC-22428), which is for IBM PC 386 series and higher computers and compatibles running MS-DOS; and a UNIX version (MSC-22286). The distribution medium for both machine versions includes source code for both single and double precision versions of COMPPAP. The PC version includes source code which has been optimized for implementation within DOS memory constraints as well as sample executables for both the single and double precision versions of COMPPAP. The double precision versions of COMPPAP have been successfully implemented on an IBM PC 386 compatible running MS-DOS, a Sun4 series computer running SunOS, an HP-9000 series computer running HP-UX, and a CRAY X-MP series computer running UNICOS. COMPPAP requires 1Mb of RAM and the BLAS and LINPACK math libraries, which are included on the distribution medium. The COMPPAP documentation provides instructions for using the commercial post-processing package PATRAN for graphical interpretation of COMPPAP output. The UNIX version includes two electronic versions of the documentation: one in LaTeX format and one in PostScript format. The standard distribution medium for the PC version (MSC-22428) is a 5.25 inch 1.2Mb MS-DOS format diskette. The standard distribution medium for the UNIX version (MSC-22286) is a .25 inch streaming magnetic tape cartridge (Sun QIC-24) in UNIX tar format. For the UNIX version, alternate distribution media and formats are available upon request. COMPPAP was developed in 1992.
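
    The power iteration at the heart of COMPPAP's buckling calculation can be sketched in a few lines; the tiny symmetric matrix below stands in for the assembled stability problem and is not taken from the program.

```c
/* Power iteration: repeated matrix-vector products converge to the
 * dominant eigenvalue (the buckling load factor), with the eigenvector
 * (the mode shape) obtained as a by-product. */
#include <stdio.h>
#include <math.h>

#define N 3

int main(void)
{
    double A[N][N] = {{4, 1, 0}, {1, 3, 1}, {0, 1, 2}};  /* sample matrix */
    double v[N] = {1, 1, 1}, lambda = 0.0;

    for (int it = 0; it < 1000; it++) {
        double w[N], norm = 0.0;
        for (int i = 0; i < N; i++) {                    /* w = A v */
            w[i] = 0.0;
            for (int j = 0; j < N; j++) w[i] += A[i][j] * v[j];
            norm += w[i] * w[i];
        }
        norm = sqrt(norm);
        double old = lambda;
        lambda = norm;                  /* ||Av|| -> |dominant eigenvalue| */
        for (int i = 0; i < N; i++) v[i] = w[i] / norm;  /* renormalize */
        if (fabs(lambda - old) < 1e-12) break;
    }
    printf("dominant eigenvalue ~ %f\n", lambda);
    printf("mode shape: %f %f %f\n", v[0], v[1], v[2]);
    return 0;
}
```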

  17. COMPPAP - COMPOSITE PLATE BUCKLING ANALYSIS PROGRAM (UNIX VERSION)

    NASA Technical Reports Server (NTRS)

    Smith, J. P.

    1994-01-01

    The Composite Plate Buckling Analysis Program (COMPPAP) was written to help engineers determine buckling loads of orthotropic (or isotropic) irregularly shaped plates without requiring hand calculations from design curves or extensive finite element modeling. COMPPAP is a one-element finite element program that utilizes high-order displacement functions. The high order of the displacement functions enables the user to produce results more accurate than traditional h-finite elements. This program uses these high-order displacement functions to perform a plane stress analysis of a general plate, followed by a buckling calculation based on the stresses found in the plane stress solution. The current version assumes a flat plate (constant thickness) subject to a constant edge load (normal or shear) on one or more edges. COMPPAP uses the power method to find the eigenvalues of the buckling problem. The power method provides an efficient solution when only one eigenvalue is desired. Once the eigenvalue is found, the eigenvector, which corresponds to the plate buckling mode shape, results as a by-product. A positive feature of the power method is that the dominant eigenvalue is found first, which in this case is the plate buckling load. The reported eigenvalue expresses the load factor that induces plate buckling. COMPPAP is written in ANSI FORTRAN 77. Two machine versions are available from COSMIC: a PC version (MSC-22428), which is for IBM PC 386 series and higher computers and compatibles running MS-DOS; and a UNIX version (MSC-22286). The distribution medium for both machine versions includes source code for both single and double precision versions of COMPPAP. The PC version includes source code which has been optimized for implementation within DOS memory constraints as well as sample executables for both the single and double precision versions of COMPPAP. The double precision versions of COMPPAP have been successfully implemented on an IBM PC 386 compatible running MS-DOS, a Sun4 series computer running SunOS, an HP-9000 series computer running HP-UX, and a CRAY X-MP series computer running UNICOS. COMPPAP requires 1Mb of RAM and the BLAS and LINPACK math libraries, which are included on the distribution medium. The COMPPAP documentation provides instructions for using the commercial post-processing package PATRAN for graphical interpretation of COMPPAP output. The UNIX version includes two electronic versions of the documentation: one in LaTeX format and one in PostScript format. The standard distribution medium for the PC version (MSC-22428) is a 5.25 inch 1.2Mb MS-DOS format diskette. The standard distribution medium for the UNIX version (MSC-22286) is a .25 inch streaming magnetic tape cartridge (Sun QIC-24) in UNIX tar format. For the UNIX version, alternate distribution media and formats are available upon request. COMPPAP was developed in 1992.

  18. The multidimensional perturbation value: a single metric to measure similarity and activity of treatments in high-throughput multidimensional screens.

    PubMed

    Hutz, Janna E; Nelson, Thomas; Wu, Hua; McAllister, Gregory; Moutsatsos, Ioannis; Jaeger, Savina A; Bandyopadhyay, Somnath; Nigsch, Florian; Cornett, Ben; Jenkins, Jeremy L; Selinger, Douglas W

    2013-04-01

    Screens using high-throughput, information-rich technologies such as microarrays, high-content screening (HCS), and next-generation sequencing (NGS) have become increasingly widespread. Compared with single-readout assays, these methods produce a more comprehensive picture of the effects of screened treatments. However, interpreting such multidimensional readouts is challenging. Univariate statistics such as t-tests and Z-factors cannot easily be applied to multidimensional profiles, leaving no obvious way to answer common screening questions such as "Is treatment X active in this assay?" and "Is treatment X different from (or equivalent to) treatment Y?" We have developed a simple, straightforward metric, the multidimensional perturbation value (mp-value), which can be used to answer these questions. Here, we demonstrate application of the mp-value to three data sets: a multiplexed gene expression screen of compounds and genomic reagents, a microarray-based gene expression screen of compounds, and an HCS compound screen. In all data sets, active treatments were successfully identified using the mp-value, and simulations and follow-up analyses supported the mp-value's statistical and biological validity. We believe the mp-value represents a promising way to simplify the analysis of multidimensional data while taking full advantage of its richness.
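
    The abstract does not give the mp-value's formula, so the sketch below shows only the generic permutation construction that such metrics build on: score a treatment-versus-control separation in the full multidimensional readout space, then assess it against label permutations. The centroid-distance score and synthetic data are stand-ins, not the published definition.

```c
/* Generic multivariate permutation test: distance between treatment
 * and control centroids, with a p-value from label shuffling.  The
 * published mp-value refines the distance measure; see the paper. */
#include <stdio.h>
#include <stdlib.h>
#include <math.h>

#define D  4            /* readout dimensions */
#define NT 6            /* treatment replicates */
#define NC 6            /* control replicates */

static double score(double x[][D], int lab[], int n)
{
    double ct[D] = {0}, cc[D] = {0};
    int nt = 0, nc = 0;
    for (int i = 0; i < n; i++) {
        for (int k = 0; k < D; k++)
            if (lab[i]) ct[k] += x[i][k]; else cc[k] += x[i][k];
        if (lab[i]) nt++; else nc++;
    }
    double s = 0.0;
    for (int k = 0; k < D; k++) {
        double d = ct[k] / nt - cc[k] / nc;
        s += d * d;
    }
    return sqrt(s);     /* Euclidean distance between centroids */
}

int main(void)
{
    double x[NT + NC][D];
    int lab[NT + NC];
    /* synthetic data: treatment shifted by +1 in every dimension */
    for (int i = 0; i < NT + NC; i++) {
        lab[i] = (i < NT);
        for (int k = 0; k < D; k++)
            x[i][k] = rand() / (double)RAND_MAX + (lab[i] ? 1.0 : 0.0);
    }

    double obs = score(x, lab, NT + NC);
    int worse = 0, nperm = 10000;
    for (int p = 0; p < nperm; p++) {
        int perm[NT + NC];
        for (int i = 0; i < NT + NC; i++) perm[i] = lab[i];
        for (int i = NT + NC - 1; i > 0; i--) {   /* Fisher-Yates shuffle */
            int j = rand() % (i + 1), t = perm[i];
            perm[i] = perm[j]; perm[j] = t;
        }
        if (score(x, perm, NT + NC) >= obs) worse++;
    }
    printf("observed score %.3f, permutation p ~ %.4f\n",
           obs, (worse + 1.0) / (nperm + 1.0));
    return 0;
}
```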

  19. Study of the TRAC Airfoil Table Computational System

    NASA Technical Reports Server (NTRS)

    Hu, Hong

    1999-01-01

    This report documents the study of the application of the TRAC airfoil table computational package (TRACFOIL) to the prediction of 2D airfoil force and moment data over a wide range of angles of attack and Mach numbers. TRACFOIL generates the standard C-81 airfoil table for input into rotorcraft comprehensive codes such as CAMRAD. The existing TRACFOIL computer package was successfully modified to run on Digital Alpha workstations and on Cray C90 supercomputers. Step-by-step instructions for using the package on both computer platforms are provided. The newer version of TRACFOIL is applied to two airfoil sections. The C-81 data obtained using the TRACFOIL method are compared with wind-tunnel data, and results are presented.

  20. Functional diversification of cerato-platanins in Moniliophthora perniciosa as seen by differential expression and protein function specialization.

    PubMed

    de O Barsottini, Mario R; de Oliveira, Juliana F; Adamoski, Douglas; Teixeira, Paulo J P L; do Prado, Paula F V; Tiezzi, Henrique O; Sforça, Mauricio L; Cassago, Alexandre; Portugal, Rodrigo V; de Oliveira, Paulo S L; de M Zeri, Ana C; Dias, Sandra M G; Pereira, Gonçalo A G; Ambrosio, Andre L B

    2013-11-01

    Cerato-platanins (CP) are small, cysteine-rich fungal-secreted proteins involved in the various stages of the host-fungus interaction process, acting as phytotoxins, elicitors, and allergens. We identified 12 CP genes (MpCP1 to MpCP12) in the genome of Moniliophthora perniciosa, the causal agent of witches' broom disease in cacao, and showed that they present distinct expression profiles throughout fungal development and infection. We determined the X-ray crystal structures of MpCP1, MpCP2, MpCP3, and MpCP5, representative of different branches of a phylogenetic tree and expressed at different stages of the disease. Structure-based biochemistry, in combination with nuclear magnetic resonance and mass spectrometry, allowed us to define specialized capabilities regarding self-assembling and the direct binding to chitin and N-acetylglucosamine (NAG) tetramers, a fungal cell wall building block, and to map a previously unknown binding region in MpCP5. Moreover, fibers of MpCP2 were shown to act as expansin and facilitate basidiospore germination whereas soluble MpCP5 blocked NAG6-induced defense response. The correlation between these roles, the fungus life cycle, and its tug-of-war interaction with cacao plants is discussed.

  1. Synthesis and characterization of a series of Group 4 phenoxy-thiol derivatives

    DOE PAGES

    Boyle, Timothy J.; Neville, Michael L.; Parkes, Marie V.

    2016-02-11

    In this study, a series of Group 4 phenoxy-thiols were developed from the reaction products of a series of metal tert-butoxides ([M(OBu^t)4]) with four equivalents of 4-mercaptophenol (H-4MP). The products were found by single crystal X-ray diffraction to adopt the general structure [(HOBu^t)(4MP)3M(μ-4MP)]2 [where M = Ti (1), Zr (2), Hf (3)] from toluene and [(py)2M(4MP)] where M = Ti (4), Zr (5) and [(py)(4MP)3Hf(μ-4MP)]2 (6) from pyridine (py). Varying the [Ti(OR)4] precursors (OR = iso-propoxide (OPr^i) or neo-pentoxide (ONep)) in toluene led to [(HOR)(4MP)3Ti(μ-4MP)]2 (OR = OPr^i (7), ONep (8)), which were structurally similar to 1. Lower stoichiometric reactions in toluene led to partial substitution by the 4MP ligands yielding [H][Ti(μ-4MP)(4MP)(ONep)3]2 (9). Independent of the stoichiometry, all of the Ti derivatives were found to be red in color, whereas the heavier congeners were colorless. Attempts to understand this phenomenon led to investigation with a series of varied -SH substituted phenols. From the reaction of H-2MP and H-3MP (2-mercaptophenol and 3-mercaptophenol, respectively), the isolated products had identical arrangements: [(ONep)2(2MP)Ti(μ,η2-2MP)]2 (10) and [(HOR)(3MP)M(μ-3MP)]2 (M/OR = Ti/ONep (11); Zr/OBu^t (12)) with a similar red color. Based on the simulated and observed UV-Vis spectra, it was reasoned that the color was generated due to a ligand-to-metal charge transfer for Ti that was not available for the larger congeners.

  2. SHABERTH - ANALYSIS OF A SHAFT BEARING SYSTEM (CRAY VERSION)

    NASA Technical Reports Server (NTRS)

    Coe, H. H.

    1994-01-01

    The SHABERTH computer program was developed to predict operating characteristics of bearings in a multibearing load support system. Lubricated and non-lubricated bearings can be modeled. SHABERTH calculates the loads, torques, temperatures, and fatigue life for ball and/or roller bearings on a single shaft. The program also allows for an analysis of the system reaction to the termination of lubricant supply to the bearings and other lubricated mechanical elements. SHABERTH has proven to be a valuable tool in the design and analysis of shaft bearing systems. The SHABERTH program is structured with four nested calculation schemes. The thermal scheme performs steady state and transient temperature calculations which predict system temperatures for a given operating state. The bearing dimensional equilibrium scheme uses the bearing temperatures, predicted by the temperature mapping subprograms, and the rolling element raceway load distribution, predicted by the bearing subprogram, to calculate bearing diametral clearance for a given operating state. The shaft-bearing system load equilibrium scheme calculates bearing inner ring positions relative to the respective outer rings such that the external loading applied to the shaft is brought into equilibrium by the rolling element loads which develop at each bearing inner ring for a given operating state. The bearing rolling element and cage load equilibrium scheme calculates the rolling element and cage equilibrium positions and rotational speeds based on the relative inner-outer ring positions, inertia effects, and friction conditions. The ball bearing subprograms in the current SHABERTH program have several model enhancements over similar programs. These enhancements include an elastohydrodynamic (EHD) film thickness model that accounts for thermal heating in the contact area and lubricant film starvation; a new model for traction combined with an asperity load sharing model; a model for the hydrodynamic rolling and shear forces in the inlet zone of lubricated contacts, which accounts for the degree of lubricant film starvation; modeling normal and friction forces between a ball and a cage pocket, which account for the transition between the hydrodynamic and elastohydrodynamic regimes of lubrication; and a model of the effect on fatigue life of the ratio of the EHD plateau film thickness to the composite surface roughness. SHABERTH is intended to be as general as possible. The models in SHABERTH allow for the complete mathematical simulation of real physical systems. Systems are limited to a maximum of five bearings supporting the shaft, a maximum of thirty rolling elements per bearing, and a maximum of one hundred temperature nodes. The SHABERTH program structure is modular and has been designed to permit refinement and replacement of various component models as the need and opportunities develop. A preprocessor is included in the IBM PC version of SHABERTH to provide a user friendly means of developing SHABERTH models and executing the resulting code. The preprocessor allows the user to create and modify data files with minimal effort and a reduced chance for errors. Data is utilized as it is entered; the preprocessor then decides what additional data is required to complete the model. Only this required information is requested. The preprocessor can accommodate data input for any SHABERTH compatible shaft bearing system model. The system may include ball bearings, roller bearings, and/or tapered roller bearings. 
SHABERTH is written in FORTRAN 77, and two machine versions are available from COSMIC. The CRAY version (LEW-14860) has a RAM requirement of 176K of 64 bit words. The IBM PC version (MFS-28818) is written for IBM PC series and compatible computers running MS-DOS, and includes a sample MS-DOS executable. For execution, the PC version requires at least 1Mb of RAM and an 80386 or 486 processor machine with an 80x87 math co-processor. The standard distribution medium for the IBM PC version is a set of two 5.25 inch 360K MS-DOS format diskettes. The contents of the diskettes are compressed using the PKWARE archiving tools. The utility to unarchive the files, PKUNZIP.EXE, is included. The standard distribution medium for the CRAY version is also a 5.25 inch 360K MS-DOS format diskette, but alternate distribution media and formats are available upon request. The original version of SHABERTH was developed in FORTRAN IV at Lewis Research Center for use on a UNIVAC 1100 series computer. The Cray version was released in 1988, and was updated in 1990 to incorporate fluid rheological data for Rocket Propellant 1 (RP-1), thereby allowing the analysis of bearings lubricated with RP-1. The PC version is a port of the 1990 CRAY version and was developed in 1992 by SRS Technologies under contract to NASA Marshall Space Flight Center.

  3. Water interaction and bond strength to dentin of dye-labelled adhesive as a function of the addition of rhodamine B.

    PubMed

    Wang, Linda; Bim, Odair; Lopes, Adolfo Coelho de Oliveira; Francisconi-Dos-Rios, Luciana Fávaro; Maenosono, Rafael Massunari; D'Alpino, Paulo Henrique Perlatti; Honório, Heitor Marques; Atta, Maria Teresa

    2016-01-01

    This study investigated the effect of adding the fluorescent dye rhodamine B (RB), used for interfacial micromorphology analysis of dental composite restorations, on the water sorption/solubility (WS/WSL) and microtensile bond strength to dentin (µTBS) of a 3-step total-etch and a 2-step self-etch adhesive system. The adhesives Adper Scotchbond Multi-Purpose (MP) and Clearfil SE Bond (SE) were mixed with 0.1 mg/mL of RB. For the WS/WSL tests, cured resin disks (5.0 mm in diameter x 0.8 mm thick) were prepared and assigned into four groups (n=10): MP, MP-RB, SE, and SE-RB. For µTBS assessment, extracted human third molars (n=40) had the flat occlusal dentin prepared and assigned into the same experimental groups (n=10). After the bonding and restoration procedures, specimens were sectioned in rectangular beams, stored in water and tested after seven days or after 12 months. The failure mode of fractured specimens was qualitatively evaluated under an optical microscope (x40). Data from WS/WSL and µTBS were assessed by one-way and three-way ANOVA, respectively, and Tukey's test (α=5%). RB increased the WSL of MP and SE. On the other hand, WS of both MP and SE was not affected by the addition of RB. No significant difference in µTBS between MP and MP-RB was observed at seven days or one year, whereas for SE a decrease in the µTBS means occurred at both storage times. RB should be incorporated into non-simplified DBSs with caution, as it can interfere with their physical-mechanical properties, leading to a possible misinterpretation of the bonded interface.

  4. Water interaction and bond strength to dentin of dye-labelled adhesive as a function of the addition of rhodamine B

    PubMed Central

    WANG, Linda; BIM, Odair; LOPES, Adolfo Coelho de Oliveira; FRANCISCONI-DOS-RIOS, Luciana Fávaro; MAENOSONO, Rafael Massunari; D’ALPINO, Paulo Henrique Perlatti; HONÓRIO, Heitor Marques; ATTA, Maria Teresa

    2016-01-01

    Objective This study investigated the effect of adding the fluorescent dye rhodamine B (RB), used for interfacial micromorphology analysis of dental composite restorations, on the water sorption/solubility (WS/WSL) and microtensile bond strength to dentin (µTBS) of a 3-step total-etch and a 2-step self-etch adhesive system. Material and Methods The adhesives Adper Scotchbond Multi-Purpose (MP) and Clearfil SE Bond (SE) were mixed with 0.1 mg/mL of RB. For the WS/WSL tests, cured resin disks (5.0 mm in diameter x 0.8 mm thick) were prepared and assigned into four groups (n=10): MP, MP-RB, SE, and SE-RB. For µTBS assessment, extracted human third molars (n=40) had the flat occlusal dentin prepared and assigned into the same experimental groups (n=10). After the bonding and restoration procedures, specimens were sectioned in rectangular beams, stored in water and tested after seven days or after 12 months. The failure mode of fractured specimens was qualitatively evaluated under an optical microscope (x40). Data from WS/WSL and µTBS were assessed by one-way and three-way ANOVA, respectively, and Tukey's test (α=5%). Results RB increased the WSL of MP and SE. On the other hand, WS of both MP and SE was not affected by the addition of RB. No significant difference in µTBS between MP and MP-RB was observed at seven days or one year, whereas for SE a decrease in the µTBS means occurred at both storage times. Conclusions RB should be incorporated into non-simplified DBSs with caution, as it can interfere with their physical-mechanical properties, leading to a possible misinterpretation of the bonded interface. PMID:27556201

  5. Efficient Process Migration for Parallel Processing on Non-Dedicated Networks of Workstations

    NASA Technical Reports Server (NTRS)

    Chanchio, Kasidit; Sun, Xian-He

    1996-01-01

    This paper presents the design and preliminary implementation of MpPVM, a software system that supports process migration for PVM application programs in a non-dedicated heterogeneous computing environment. New concepts of migration points, migration point analysis, and necessary data analysis are introduced. In MpPVM, process migrations occur only at previously inserted migration points. Migration point analysis determines appropriate locations to insert migration points, whereas necessary data analysis provides a minimum set of variables to be transferred at each migration point. A new methodology to perform reliable point-to-point data communications in a migration environment is also discussed. Finally, a preliminary implementation of MpPVM and its experimental results are presented, showing the correctness and promising performance of our process migration mechanism in a scalable non-dedicated heterogeneous computing environment. While MpPVM is developed on top of PVM, the process migration methodology introduced in this study is general and can be applied to any distributed software environment.

  6. Approximate first-principles anharmonic calculations of polyatomic spectra using MP2 and B3LYP potentials: comparisons with experiment.

    PubMed

    Roy, Tapta Kanchan; Carrington, Tucker; Gerber, R Benny

    2014-08-21

    Anharmonic vibrational spectroscopy calculations using MP2 and B3LYP computed potential surfaces are carried out for a series of molecules, and frequencies and intensities are compared with those from experiment. The vibrational self-consistent field with second-order perturbation correction (VSCF-PT2) is used in computing the spectra. The test calculations have been performed for the molecules HNO3, C2H4, C2H4O, H2SO4, CH3COOH, glycine, and alanine. Both MP2 and B3LYP give results in good accord with experimental frequencies, though, on the whole, MP2 gives very slightly better agreement. A statistical analysis of deviations in frequencies from experiment is carried out that gives interesting insights. The most probable percentage deviation from experimental frequencies is about -2% (to the red of the experiment) for B3LYP and +2% (to the blue of the experiment) for MP2. There is a higher probability for relatively large percentage deviations when B3LYP is used. The calculated intensities are also found to be in good accord with experiment, but the percentage deviations are much larger than those for frequencies. The results show that both MP2 and B3LYP potentials, used in VSCF-PT2 calculations, account well for anharmonic effects in the spectroscopy of molecules of the types considered.

  7. Maximum entropy method applied to deblurring images on a MasPar MP-1 computer

    NASA Technical Reports Server (NTRS)

    Bonavito, N. L.; Dorband, John; Busse, Tim

    1991-01-01

    A statistical inference method based on the principle of maximum entropy is developed for the purpose of enhancing and restoring satellite images. The proposed maximum entropy image restoration method is shown to overcome the difficulties associated with image restoration and provide the smoothest and most appropriate solution consistent with the measured data. An implementation of the method on the MP-1 computer is described, and results of tests on simulated data are presented.
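
    One standard way to pose such a restoration, shown below as a 1-D toy, is to minimize a least-squares data term plus an entropy penalty by gradient descent with a positivity floor. The kernel, regularization weight, and step size are invented for the example; the paper's method and its MasPar MP-1 data-parallel implementation are not reproduced here.

```c
/* Toy entropy-regularized restoration: minimize ||h*f - g||^2 +
 * alpha * sum f ln f over positive f by plain gradient descent. */
#include <stdio.h>
#include <math.h>

#define N  64
#define KW 3                                     /* kernel width */

static const double h[KW] = {0.25, 0.5, 0.25};   /* blur kernel */

static void blur(const double f[N], double out[N])
{
    for (int i = 0; i < N; i++) {
        out[i] = 0.0;
        for (int k = 0; k < KW; k++) {
            int j = i + k - 1;                   /* centered kernel */
            if (j >= 0 && j < N) out[i] += h[k] * f[j];
        }
    }
}

int main(void)
{
    double truth[N] = {0}, g[N], f[N];
    truth[20] = 1.0; truth[40] = 0.5;            /* two point sources */
    blur(truth, g);                              /* simulated blurred data */

    for (int i = 0; i < N; i++) f[i] = 0.1;      /* flat positive start */

    double alpha = 1e-3, step = 0.5;
    for (int it = 0; it < 2000; it++) {
        double Hf[N], r[N], grad[N];
        blur(f, Hf);
        for (int i = 0; i < N; i++) r[i] = Hf[i] - g[i];
        /* data-term gradient is H^T r; the kernel is symmetric, so the
         * transpose (correlation) equals the convolution */
        blur(r, grad);
        for (int i = 0; i < N; i++) {
            grad[i] = 2.0 * grad[i] + alpha * (1.0 + log(f[i]));
            f[i] -= step * grad[i];
            if (f[i] < 1e-8) f[i] = 1e-8;        /* keep entropy defined */
        }
    }
    printf("restored peaks: f[20]=%.3f f[40]=%.3f (truth 1.0 and 0.5)\n",
           f[20], f[40]);
    return 0;
}
```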

  8. COLD-SAT dynamic model

    NASA Technical Reports Server (NTRS)

    Adams, Neil S.; Bollenbacher, Gary

    1992-01-01

    This report discusses the development and underlying mathematics of a rigid-body computer model of a proposed cryogenic on-orbit liquid depot storage, acquisition, and transfer spacecraft (COLD-SAT). This model, referred to in this report as the COLD-SAT dynamic model, consists of both a trajectory model and an attitudinal model. All disturbance forces and torques expected to be significant for the actual COLD-SAT spacecraft are modeled to the required degree of accuracy. Control and experimental thrusters are modeled, as well as fluid slosh. The model also computes microgravity disturbance accelerations at any specified point in the spacecraft. The model was developed by using the Boeing EASY5 dynamic analysis package and will run on Apollo, Cray, and other computing platforms.

  9. An Evaluation of One-Sided and Two-Sided Communication Paradigms on Relaxed-Ordering Interconnect

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Ibrahim, Khaled Z.; Hargrove, Paul H.; Iancu, Costin

    The Cray Gemini interconnect hardware provides multiple transfer mechanisms and out-of-order message delivery to improve communication throughput. In this paper we quantify the performance of one-sided and two-sided communication paradigms with respect to: 1) the optimal available hardware transfer mechanism, 2) message ordering constraints, 3) per node and per core message concurrency. In addition to using Cray native communication APIs, we use UPC and MPI micro-benchmarks to capture one- and two-sided semantics respectively. Our results indicate that relaxing the message delivery order can improve performance up to 4.6x when compared with strict ordering. When hardware allows it, high-level one-sided programming models can already take advantage of message reordering. Enforcing the ordering semantics of two-sided communication comes with a performance penalty. Furthermore, we argue that exposing out-of-order delivery at the application level is required for the next-generation programming models. Any ordering constraints in the language specifications reduce communication performance for small messages and increase the number of active cores required for peak throughput.
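
    A minimal skeleton of the two semantics being compared is given below: a matched MPI_Send/MPI_Recv pair versus an MPI_Put epoch bounded by fences. Timing loops, message-size sweeps, and the UPC variant used in the paper are omitted.

```c
/* Two-sided vs. one-sided MPI transfers in miniature.  Run with 2
 * ranks, e.g. mpirun -np 2 ./a.out */
#include <stdio.h>
#include <mpi.h>

#define COUNT 1024

int main(int argc, char **argv)
{
    int rank;
    double sbuf[COUNT], rbuf[COUNT];
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    for (int i = 0; i < COUNT; i++) sbuf[i] = rank;

    /* Two-sided: the receiver must post and match the message. */
    if (rank == 0)
        MPI_Send(sbuf, COUNT, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD);
    else if (rank == 1)
        MPI_Recv(rbuf, COUNT, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);

    /* One-sided: rank 0 puts directly into rank 1's exposed window;
     * completion is governed by the fence, not by message matching,
     * so the hardware is free to deliver data out of order. */
    MPI_Win win;
    MPI_Win_create(rbuf, COUNT * sizeof(double), sizeof(double),
                   MPI_INFO_NULL, MPI_COMM_WORLD, &win);
    MPI_Win_fence(0, win);
    if (rank == 0)
        MPI_Put(sbuf, COUNT, MPI_DOUBLE, 1, 0, COUNT, MPI_DOUBLE, win);
    MPI_Win_fence(0, win);
    MPI_Win_free(&win);

    if (rank == 1) printf("rbuf[0] = %f\n", rbuf[0]);
    MPI_Finalize();
    return 0;
}
```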

  10. Automatic differentiation for design sensitivity analysis of structural systems using multiple processors

    NASA Technical Reports Server (NTRS)

    Nguyen, Duc T.; Storaasli, Olaf O.; Qin, Jiangning; Qamar, Ramzi

    1994-01-01

    An automatic differentiation tool (ADIFOR) is incorporated into a finite element based structural analysis program for shape and non-shape design sensitivity analysis of structural systems. The entire analysis and sensitivity procedures are parallelized and vectorized for high performance computation. Small scale examples to verify the accuracy of the proposed program and a medium scale example to demonstrate the parallel vector performance on multiple CRAY C90 processors are included.
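
    ADIFOR itself transforms Fortran source, but the derivative-propagation rule underlying any forward-mode tool can be shown compactly with dual-number arithmetic; the function below is an invented example.

```c
/* Forward-mode automatic differentiation in miniature: carry
 * (value, derivative) pairs through every operation, which is the
 * same propagation rule a source transformer like ADIFOR generates. */
#include <stdio.h>
#include <math.h>

typedef struct { double v, d; } dual;          /* value and d/dx */

static dual d_mul(dual a, dual b) { return (dual){a.v * b.v, a.d * b.v + a.v * b.d}; }
static dual d_add(dual a, dual b) { return (dual){a.v + b.v, a.d + b.d}; }
static dual d_sin(dual a)         { return (dual){sin(a.v), cos(a.v) * a.d}; }

int main(void)
{
    dual x = {2.0, 1.0};                       /* seed dx/dx = 1 */
    /* f(x) = x*x + sin(x);  f'(x) = 2x + cos(x) */
    dual f = d_add(d_mul(x, x), d_sin(x));
    printf("f(2)  = %f\n", f.v);
    printf("f'(2) = %f (exact %f)\n", f.d, 2.0 * 2.0 + cos(2.0));
    return 0;
}
```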

  11. P ≠NP Millenium-Problem(MP) TRIVIAL Physics Proof Via NATURAL TRUMPS Artificial-``Intelligence'' Via: Euclid Geometry, Plato Forms, Aristotle Square-of-Opposition, Menger Dimension-Theory Connections!!! NO Computational-Complexity(CC)/ANYthing!!!: Geometry!!!

    NASA Astrophysics Data System (ADS)

    Clay, London; Menger, Karl; Rota, Gian-Carlo; Euclid, Alexandria; Siegel, Edward

    P ≠NP MP proof is by computer-''science''/SEANCE(!!!)(CS) computational-''intelligence'' lingo jargonial-obfuscation(JO) NATURAL-Intelligence(NI) DISambiguation! CS P =(?) =NP MEANS (Deterministic)(PC) = (?) =(Non-D)(PC) i.e. D(P) =(?) = N(P). For inclusion(equality) vs. exclusion (inequality) irrelevant (P) simply cancels!!! (Equally any/all other CCs IF both sides identical). Crucial question left: (D) =(?) =(ND), i.e. D =(?) = N. Algorithmics[Sipser[Intro. Thy.Comp.(`97)-p.49Fig.1.15!!!

  12. Orbital-optimized MP2.5 and its analytic gradients: Approaching CCSD(T) quality for noncovalent interactions

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Bozkaya, Uğur, E-mail: ugur.bozkaya@atauni.edu.tr; Center for Computational Molecular Science and Technology, School of Chemistry and Biochemistry, and School of Computational Science and Engineering, Georgia Institute of Technology, Atlanta, Georgia 30332; Sherrill, C. David

    2014-11-28

    Orbital-optimized MP2.5 [or simply “optimized MP2.5,” OMP2.5, for short] and its analytic energy gradients are presented. The cost of the presented method is as much as that of coupled-cluster singles and doubles (CCSD) [O(N^6) scaling] for energy computations. However, for analytic gradient computations the OMP2.5 method is only half as expensive as CCSD because there is no need to solve λ2-amplitude equations for OMP2.5. The performance of the OMP2.5 method is compared with that of the standard second-order Møller–Plesset perturbation theory (MP2), MP2.5, CCSD, and coupled-cluster singles and doubles with perturbative triples (CCSD(T)) methods for equilibrium geometries, hydrogen transfer reactions between radicals, and noncovalent interactions. For bond lengths of both closed and open-shell molecules, the OMP2.5 method improves upon MP2.5 and CCSD by 38%–43% and 31%–28%, respectively, with Dunning's cc-pCVQZ basis set. For complete basis set (CBS) predictions of hydrogen transfer reaction energies, the OMP2.5 method exhibits a substantially better performance than MP2.5, providing a mean absolute error of 1.1 kcal mol^-1, which is more than 10 times lower than that of MP2.5 (11.8 kcal mol^-1), and comparing to MP2 (14.6 kcal mol^-1) there is a more than 12-fold reduction in errors. For noncovalent interaction energies (at CBS limits), the OMP2.5 method maintains the very good performance of MP2.5 for closed-shell systems, and for open-shell systems it significantly outperforms MP2.5 and CCSD, and approaches CCSD(T) quality. The MP2.5 errors decrease by a factor of 5 when the optimized orbitals are used for open-shell noncovalent interactions, and comparing to CCSD there is a more than 3-fold reduction in errors. Overall, the present application results indicate that the OMP2.5 method is very promising for open-shell noncovalent interactions and other chemical systems with difficult electronic structures.

  13. Geometry planning and image registration in magnetic particle imaging using bimodal fiducial markers

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Werner, F., E-mail: f.werner@uke.de; Hofmann, M.; Them, K.

    Purpose: Magnetic particle imaging (MPI) is a quantitative imaging modality that allows the distribution of superparamagnetic nanoparticles to be visualized. Compared to other imaging techniques like x-ray radiography, computed tomography (CT), and magnetic resonance imaging (MRI), MPI only provides a signal from the administered tracer, but no additional morphological information, which complicates geometry planning and the interpretation of MP images. The purpose of the authors’ study was to develop bimodal fiducial markers that can be visualized by MPI and MRI in order to create MP–MR fusion images. Methods: A certain arrangement of three bimodal fiducial markers was developed and used in a combined MRI/MPI phantom and also during in vivo experiments in order to investigate its suitability for geometry planning and image fusion. An algorithm for automated marker extraction in both MR and MP images and rigid registration was established. Results: The developed bimodal fiducial markers can be visualized by MRI and MPI and allow for geometry planning as well as automated registration and fusion of MR–MP images. Conclusions: To date, exact positioning of the object to be imaged within the field of view (FOV) and the assignment of reconstructed MPI signals to corresponding morphological regions has been difficult. The developed bimodal fiducial markers and the automated image registration algorithm help to overcome these difficulties.

  14. A Hybrid MPI/OpenMP Approach for Parallel Groundwater Model Calibration on Multicore Computers

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Tang, Guoping; D'Azevedo, Ed F; Zhang, Fan

    2010-01-01

    Groundwater model calibration is becoming increasingly computationally time intensive. We describe a hybrid MPI/OpenMP approach that exploits two levels of parallelism in software and hardware to reduce calibration time on multicore computers with minimal parallelization effort. First, HydroGeoChem 5.0 (HGC5) is parallelized using OpenMP for a uranium transport model with over a hundred species involved in nearly a hundred reactions, and for a field-scale coupled flow and transport model. In the first application, a single parallelizable loop is identified that consumes over 97% of the total computational time. With a few lines of OpenMP compiler directives inserted into the code, the computational time is reduced about tenfold on a compute node with 16 cores. The performance is further improved by selectively parallelizing a few more loops. For the field-scale application, parallelizable loops in 15 of the 174 subroutines in HGC5 are identified that take more than 99% of the execution time. By adding the preconditioned conjugate gradient solver and BICGSTAB, and using a coloring scheme to separate the elements, nodes, and boundary sides, the subroutines for finite element assembly, soil property update, and boundary condition application are parallelized, resulting in a speedup of about 10 on a 16-core compute node. The Levenberg-Marquardt (LM) algorithm is added to HGC5, with the Jacobian calculation and lambda search parallelized using MPI. With this hybrid approach, as many compute nodes as there are adjustable parameters (when the forward difference is used for the Jacobian approximation), or twice that number (if the center difference is used), can be employed to reduce the calibration time from days and weeks to a few hours for the two applications. This approach can be extended to global optimization schemes and Monte Carlo analysis, where thousands of compute nodes can be efficiently utilized.
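
    A skeletal version of the two-level scheme is sketched below: MPI ranks each own one forward-difference Jacobian column (one adjustable parameter per rank, as in the paper), while an OpenMP loop parallelizes the forward-model evaluation. The toy model() and problem sizes are stand-ins for HGC5's flow-and-transport solve.

```c
/* Hybrid MPI/OpenMP Jacobian sketch.  Run with exactly NPAR ranks,
 * e.g. mpirun -np 4 ./a.out, compiled with OpenMP enabled. */
#include <stdio.h>
#include <math.h>
#include <mpi.h>

#define NPAR 4      /* adjustable parameters, one per MPI rank */
#define NOBS 8      /* observations */

static void model(const double p[NPAR], double y[NOBS])
{
    /* toy forward model; the OpenMP loop mimics the parallelized
     * element/node loops inside the real solver */
    #pragma omp parallel for
    for (int i = 0; i < NOBS; i++)
        y[i] = p[0] * exp(-p[1] * i) + p[2] * i + p[3];
}

int main(int argc, char **argv)
{
    int rank;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    double p[NPAR] = {1.0, 0.5, 0.1, 2.0}, h = 1e-6;
    double y0[NOBS], yp[NOBS], col[NOBS], J[NPAR][NOBS];

    model(p, y0);                       /* baseline run */

    /* each rank perturbs "its" parameter: forward difference */
    double pp[NPAR];
    for (int j = 0; j < NPAR; j++) pp[j] = p[j];
    pp[rank] += h;
    model(pp, yp);
    for (int i = 0; i < NOBS; i++) col[i] = (yp[i] - y0[i]) / h;

    /* gather the columns so every rank holds the full Jacobian */
    MPI_Allgather(col, NOBS, MPI_DOUBLE, J, NOBS, MPI_DOUBLE,
                  MPI_COMM_WORLD);

    if (rank == 0)
        printf("J[0][0] = %f (dy0/dp0, exact 1.0)\n", J[0][0]);
    MPI_Finalize();
    return 0;
}
```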

  15. Background studies in gas ionizing x ray detectors

    NASA Technical Reports Server (NTRS)

    Eldridge, Hudson B.

    1989-01-01

    The background response of a gas ionizing proportional x ray detector is estimated by solving the one-dimensional photon transport equation for two regions using Monte Carlo techniques. The solution was effected using the SSL VAX 780 and the CRAY X-MP computers at Marshall Space Flight Center. The isotropic photon energy spectrum encompassing the range from 1 to 1000 keV incident onto the first region, the shield, is taken to represent the measured spectrum at an altitude of 3 mb over Palestine, Texas. The differential energy spectrum deposited in the gas region, xenon, over the range of 0 to 100 keV is written to an output file. In addition, the photon flux emerging from the shield region, tin, over the range of 1 to 1000 keV is also tabulated and written to a separate file. Published tabular cross sections for photoelectric, elastic, and inelastic Compton scattering, as well as the total absorption coefficient, are used. Histories of each incident photon, as well as secondary photons from Compton and photoelectric interactions, are followed until the photon either is absorbed or exits from the regions under consideration. The effect of shield thickness upon the energy spectrum deposited in the xenon region for this background spectrum incident upon the tin shield was studied.
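
    The transport scheme reduces, in its simplest form, to the sketch below: a single gray (energy-independent) slab in which each history samples an exponential free path and then absorbs, scatters, or escapes. The real study tracked energy-dependent photoelectric and Compton interactions through separate tin and xenon regions, which this toy omits.

```c
/* Minimal 1-D Monte Carlo photon transport through a gray slab:
 * follow each history until it is absorbed or exits the region. */
#include <stdio.h>
#include <stdlib.h>
#include <math.h>

#define NHIST     100000
#define THICKNESS 2.0          /* slab thickness in units of 1/SIGMA_T */
#define SIGMA_T   1.0          /* total interaction cross section */
#define P_ABSORB  0.7          /* absorption probability per collision */

static double urand(void) { return (rand() + 1.0) / ((double)RAND_MAX + 2.0); }

int main(void)
{
    int absorbed = 0, transmitted = 0, reflected = 0;
    for (int n = 0; n < NHIST; n++) {
        double x = 0.0, mu = 1.0;                  /* enter moving in +x */
        for (;;) {
            x += mu * (-log(urand()) / SIGMA_T);   /* sample free path */
            if (x >= THICKNESS) { transmitted++; break; }
            if (x < 0.0)        { reflected++;   break; }
            if (urand() < P_ABSORB) { absorbed++; break; }
            mu = (urand() < 0.5) ? -1.0 : 1.0;     /* isotropic in 1-D */
        }
    }
    printf("absorbed %.3f  transmitted %.3f  reflected %.3f\n",
           (double)absorbed / NHIST, (double)transmitted / NHIST,
           (double)reflected / NHIST);
    return 0;
}
```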

  16. VizieR Online Data Catalog: ChaMP. I. First X-ray source catalog (Kim+, 2004)

    NASA Astrophysics Data System (ADS)

    Kim, D.-W.; Cameron, R. A.; Drake, J. J.; Evans, N. R.; Freeman, P.; Gaetz, T. J.; Ghosh, H.; Green, P. J.; Harnden, F. R. Jr; Karovska, M.; Kashyap, V.; Maksym, P. W.; Ratzlaff, P. W.; Schlegel, E. M.; Silverman, J. D.; Tananbaum, H. D.; Vikhlinin, A. A.; Wilkes, B. J.; Grimes, J. P.

    2004-01-01

    The Chandra Multiwavelength Project (ChaMP) is a wide-area (~14 deg^2) survey of serendipitous Chandra X-ray sources, aiming to establish fair statistical samples covering a wide range of characteristics (such as absorbed active galactic nuclei and high-z clusters of galaxies) at flux levels (fX ~ 10^-15 to 10^-14 erg/s/cm^2) intermediate between the Chandra deep surveys and previous missions. We present the first ChaMP catalog, which consists of 991 near on-axis, bright X-ray sources obtained from the initial sample of 62 observations. The data have been uniformly reduced and analyzed with techniques specifically developed for the ChaMP and then validated by visual examination. To assess source reliability and positional uncertainty, we perform a series of simulations and also use Chandra data to complement the simulation study. The false source detection rate is found to be as good as or better than expected for a given limiting threshold. On the other hand, the chance of missing a real source is rather complex, depending on the source counts, off-axis distance (or PSF), and background rate. The positional error (95% confidence level) is usually less than 1" for a bright source, regardless of its off-axis distance, while it can be as large as 4" for a weak source (~20 counts) at a large off-axis distance (D_off-axis > 8'). We have also developed new methods to find spatially extended or temporally variable sources, and those sources are listed in the catalog. (5 data files).

  17. Performance Evaluation of Supercomputers using HPCC and IMB Benchmarks

    NASA Technical Reports Server (NTRS)

    Saini, Subhash; Ciotti, Robert; Gunney, Brian T. N.; Spelce, Thomas E.; Koniges, Alice; Dossa, Don; Adamidis, Panagiotis; Rabenseifner, Rolf; Tiyyagura, Sunil R.; Mueller, Matthias

    2006-01-01

    The HPC Challenge (HPCC) benchmark suite and the Intel MPI Benchmark (IMB) are used to compare and evaluate the combined performance of the processor, memory subsystem, and interconnect fabric of five leading supercomputers: SGI Altix BX2, Cray X1, Cray Opteron Cluster, Dell Xeon cluster, and NEC SX-8. These five systems use five different networks (SGI NUMALINK4, Cray network, Myrinet, InfiniBand, and NEC IXS). The complete set of HPCC benchmarks is run on each of these systems. Additionally, we present Intel MPI Benchmark (IMB) results to study the performance of 11 MPI communication functions on these systems.

  18. Can Motor Proficiency in Preschool Age Affect Physical Activity in Adolescence?

    PubMed

    Venetsanou, Fotini; Kambas, Antonis

    2017-05-01

    This study investigated whether motor proficiency (MP) at preschool age is associated with physical activity (PA) in adolescence. In 2004, the Bruininks-Oseretsky Test of Motor Proficiency-Short Form (BOTMP-SF) (7) was administered to 413 children, aged 4-6 years, who were classified into MP groups according to their BOTMP-SF total score (TS). In 2014, the PA of 106 former participants (47 boys, 59 girls) was measured with Omron pedometers. A 3 (MP: high, above average, average) × 2 (gender) ANOVA with Bonferroni tests was computed on the average steps/week. A significant interaction between the two factors was revealed (F = 15.27, p < .001, η² = .153), indicating that MP influenced male and female PA differently. Only in the average MP group did males present higher PA than females, whereas there were no differences between the two genders in the higher MP groups. Moreover, the only significant difference in PA among male groups was that between the high and above average MP groups, while in females there were significant differences among all groups. High MP at preschool age was positively associated with PA in adolescence, especially in females. Emphasis on the development of proficient young movers might be beneficial for lifelong PA.

  19. Phenol-quinone tautomerism in (arylazo)naphthols and the analogous Schiff bases: benchmark calculations.

    PubMed

    Ali, S Tahir; Antonov, Liudmil; Fabian, Walter M F

    2014-01-30

    Tautomerization energies of a series of isomeric [(4-R-phenyl)azo]naphthols and the analogous Schiff bases (R = N(CH3)2, OCH3, H, CN, NO2) are calculated by LPNO-CEPA/1-CBS using the def2-TZVPP and def2-QZVPP basis sets for extrapolation. The performance of various density functionals (B3LYP, M06-2X, PW6B95, B2PLYP, mPW2PLYP, PWPB95) as well as MP2 and SCS-MP2 is evaluated against these results. M06-2X and SCS-MP2 yield results close to the LPNO-CEPA/1-CBS values. Solvent effects (CCl4, CHCl3, CH3CN, and CH3OH) are treated by a variety of bulk solvation models (SM8, IEFPCM, COSMO, PBF, and SMD) as well as explicit solvation (Monte Carlo free energy perturbation using the OPLSAA force field).

  20. 6-mercaptopurine dosage and pharmacokinetics influence the degree of bone marrow toxicity following high-dose methotrexate in children with acute lymphoblastic leukemia.

    PubMed

    Schmiegelow, K; Bretton-Meyer, U

    2001-01-01

    Through inhibition of purine de novo synthesis and enhancement of 6-mercaptopurine (6MP) bioavailability, high-dose methotrexate (HDM) may increase the incorporation into DNA of 6-thioguanine nucleotides (6TGN), the cytotoxic metabolites of 6MP. Thus, coadministration of 6MP could increase myelotoxicity following HDM. Twenty-one children with standard-risk (SR) and 25 with intermediate-risk (IR) acute lymphoblastic leukemia (ALL) were studied. During consolidation therapy they received either three courses of HDM at 2-week intervals without concurrent oral 6MP (SR-ALL) or four courses of HDM at 2-week intervals with 25 mg/m2 of oral 6MP daily (IR-ALL). During the first year of maintenance with oral 6MP (75 mg/m2/day) and oral MTX (20 mg/m2/week), they all received five courses of HDM at 8-week intervals. In all cases, HDM consisted of 5,000 mg of MTX/m2 given over 24 h with intraspinal MTX and leucovorin rescue. Erythrocyte levels of 6TGN (E-6TGN) and methotrexate (E-MTX) were, on average, measured every second week during maintenance therapy. When SR consolidation (6MP: 0 mg), IR consolidation (6MP: 25 mg/m2), and SR/IR maintenance therapy (6MP: 75 mg/m2) were compared, the white cell and absolute neutrophil count (ANC) nadir, lymphocyte count nadir, thrombocyte count nadir, and hemoglobin nadir after HDM decreased significantly with increasing doses of oral 6MP. Three percent of the HDM courses given without oral 6MP (SR consolidation) were followed by an ANC nadir <0.5 x 10(9)/l, compared to 50% of the HDM courses given during SR/IR maintenance therapy. Similarly, only 13% of the HDM courses given as SR-ALL consolidation induced a thrombocyte count nadir <100 x 10(9)/l, compared to 58% of the HDM courses given during maintenance therapy. The best-fit model to predict the ANC nadir following HDM during maintenance therapy included the dose of 6MP prior to HDM (beta = -0.017, P = 0.001), the average ANC level during maintenance therapy (beta = 0.82, P = 0.004), and E-6TGN (beta = -0.0029, P = 0.02). The best-fit model to predict the thrombocyte nadir following HDM during maintenance therapy included only mPLATE (beta = 0.0057, P = 0.046). In conclusion, the study indicates that reducing the dose of concurrently given oral 6MP could be one way to lower the risk of significant myelotoxicity following HDM during maintenance therapy of childhood ALL.

  1. Cooperative and diminutive unusual weak bonding in F3CX···HMgH···Y and F3CX···Y···HMgH trimers (X = Cl, Br; Y = HCN, and HNC).

    PubMed

    Solimannejad, Mohammad; Malekani, Masumeh; Alkorta, Ibon

    2010-11-18

    MP2 calculations with the cc-pVTZ basis set were used to analyze intermolecular interactions in F(3)CX···HMgH···Y and F(3)CX···Y···HMgH triads (X = Cl, Br; Y = HCN, and HNC), which are connected by three kinds of unusual weak interactions, namely halogen-hydride, dihydrogen, and σ-hole bonds. To understand the properties of the systems better, the corresponding dyads were also studied. Molecular geometries, binding energies, and infrared spectra of the monomers, dyads, and triads were investigated at the MP2/cc-pVTZ computational level. Particular attention is given to parameters such as cooperative energies, cooperative dipole moments, and many-body interaction energies. Complexes with the simultaneous presence of a σ-hole bond and a dihydrogen bond show cooperative energies ranging between -1.02 and -2.31 kJ mol^(-1), whereas those with a halogen-hydride bond and a dihydrogen bond are diminutive, with this energetic effect between 0.1 and 0.63 kJ mol^(-1). The electronic properties of the complexes have been analyzed using the molecular electrostatic potential (MEP), electron density shift maps, and parameters derived from the atoms in molecules (AIM) methodology.

  2. Three-dimensional transonic potential flow about complex 3-dimensional configurations

    NASA Technical Reports Server (NTRS)

    Reyhner, T. A.

    1984-01-01

    An analysis has been developed and a computer code written to predict three-dimensional subsonic or transonic potential flow fields about lifting or nonlifting configurations. Possible configurations include inlets, nacelles, nacelles with ground planes, S-ducts, turboprop nacelles, wings, and wing-pylon-nacelle combinations. The solution of the full partial differential equation for compressible potential flow, written in terms of a velocity potential, is obtained using finite differences, line relaxation, and multigrid. The analysis uses either a cylindrical or Cartesian coordinate system. The computational mesh is not body fitted. The analysis has been programmed in FORTRAN for both the CDC CYBER 203 and the CRAY-1 computers. Comparisons of computed results with experimental measurements are presented. Descriptions of the program input and output formats are included.
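
    Of the solution techniques named above, line relaxation is the easiest to show in miniature: each sweep solves all unknowns along one grid line implicitly with the Thomas algorithm. The sketch below applies it to a 2-D Laplace model problem on a unit square rather than the full potential equation, and omits the multigrid acceleration.

```c
/* Line (tridiagonal) relaxation for Laplace's equation on a square:
 * sweep row by row, solving each row implicitly with the Thomas
 * algorithm while the off-row neighbors are lagged. */
#include <stdio.h>
#include <math.h>

#define NX 32
#define NY 32

int main(void)
{
    static double u[NY][NX];
    for (int i = 0; i < NX; i++) u[0][i] = 1.0;   /* boundary: top row = 1 */

    for (int sweep = 0; sweep < 2000; sweep++) {
        double change = 0.0;
        for (int j = 1; j < NY - 1; j++) {        /* relax one line at a time */
            double a[NX], b[NX], c[NX], d[NX];
            for (int i = 1; i < NX - 1; i++) {
                a[i] = 1.0; b[i] = -4.0; c[i] = 1.0;
                d[i] = -(u[j - 1][i] + u[j + 1][i]);   /* off-line neighbors */
            }
            d[1]      -= u[j][0];          /* fold in left/right boundaries */
            d[NX - 2] -= u[j][NX - 1];
            /* Thomas algorithm: forward elimination... */
            for (int i = 2; i < NX - 1; i++) {
                double m = a[i] / b[i - 1];
                b[i] -= m * c[i - 1];
                d[i] -= m * d[i - 1];
            }
            /* ...and back substitution, writing the new line in place */
            double xnew = d[NX - 2] / b[NX - 2];
            change += fabs(xnew - u[j][NX - 2]); u[j][NX - 2] = xnew;
            for (int i = NX - 3; i >= 1; i--) {
                xnew = (d[i] - c[i] * u[j][i + 1]) / b[i];
                change += fabs(xnew - u[j][i]); u[j][i] = xnew;
            }
        }
        if (change < 1e-10) { printf("converged in %d sweeps\n", sweep + 1); break; }
    }
    printf("u at center = %f\n", u[NY / 2][NX / 2]);
    return 0;
}
```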

  3. High Performance Computing Software Applications for Space Situational Awareness

    NASA Astrophysics Data System (ADS)

    Giuliano, C.; Schumacher, P.; Matson, C.; Chun, F.; Duncan, B.; Borelli, K.; Desonia, R.; Gusciora, G.; Roe, K.

    The High Performance Computing Software Applications Institute for Space Situational Awareness (HSAI-SSA) has completed its first full year of applications development. The emphasis of our work in this first year was on improving space surveillance sensor models and image enhancement software. These applications are the Space Surveillance Network Analysis Model (SSNAM), the Air Force Space Fence simulation (SimFence), and the physically constrained iterative de-convolution (PCID) image enhancement software tool. Specifically, we have demonstrated order-of-magnitude speed-ups in those codes running on the latest Cray XD-1 Linux supercomputer (Hoku) at the Maui High Performance Computing Center. The software application improvements that HSAI-SSA has made have had a significant impact on the warfighter and have fundamentally changed the role of high performance computing in SSA.

  4. TOP500 Sublist for November 2001

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Strohmaier, Erich; Meuer, Hans W.; Dongarra, Jack J.

    2001-11-09

    18th Edition of TOP500 List of World's Fastest Supercomputers Released. MANNHEIM, GERMANY; KNOXVILLE, TENN.; BERKELEY, CALIF. In what has become a much-anticipated event in the world of high-performance computing, the 18th edition of the TOP500 list of the world's fastest supercomputers was released today (November 9, 2001). The latest edition of the twice-yearly ranking finds IBM as the leader in the field, with 32 percent in terms of installed systems and 37 percent in terms of total performance of all the installed systems. In a surprise move, Hewlett-Packard captured second place with 30 percent of the systems. Most of these systems are smaller in size, and as a consequence HP's share of installed performance is smaller, at 15 percent. This is still enough for second place in this category. SGI, Cray and Sun follow in the number of TOP500 systems with 41 (8 percent), 39 (8 percent), and 31 (6 percent), respectively. In the category of installed performance, Cray Inc. keeps the third position with 11 percent, ahead of SGI (8 percent) and Compaq (8 percent).

  5. Designing Next Generation Massively Multithreaded Architectures for Irregular Applications

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Tumeo, Antonino; Secchi, Simone; Villa, Oreste

    Irregular applications, such as data mining or graph-based computations, show unpredictable memory/network access patterns and control structures. Massively multi-threaded architectures with large node counts, like the Cray XMT, have been shown to address their requirements better than commodity clusters. In this paper we present the approaches that we are currently pursuing to design future generations of these architectures. First, we introduce the Cray XMT and compare it to other multithreaded architectures. We then propose an evolution of the architecture, integrating multiple cores per node and a next generation network interconnect. We advocate the use of hardware support for remote memory reference aggregation to optimize network utilization. For this evaluation we developed a highly parallel, custom simulation infrastructure for multi-threaded systems. Our simulator executes unmodified XMT binaries with very large datasets, capturing effects due to contention and hot-spotting, while predicting execution times with greater than 90% accuracy. We also discuss the FPGA prototyping approach that we are employing to study efficient support for irregular applications in next generation manycore processors.

  6. Phenothiazine-anthraquinone donor-acceptor molecules: synthesis, electronic properties and DFT-TDDFT computational study.

    PubMed

    Zhang, Wen-Wei; Mao, Wei-Li; Hu, Yun-Xia; Tian, Zi-Qi; Wang, Zhi-Lin; Meng, Qing-Jin

    2009-09-17

    Two donor-acceptor molecules with different π-electron conjugative units, 1-((10-methyl-10H-phenothiazin-3-yl)ethynyl)anthracene-9,10-dione (AqMp) and 1,1'-(10-methyl-10H-phenothiazine-3,7-diyl)bis(ethyne-2,1-diyl)dianthracene-9,10-dione (Aq2Mp), have been synthesized and investigated for their photochemical and electrochemical properties. Density functional theory (DFT) calculations provide insights into their molecular geometry, electronic structures, and properties. These studies satisfactorily explain the electrochemistry of the two compounds and indicate that a larger conjugative effect leads to a smaller HOMO-LUMO gap (Eg) in Aq2Mp. Both compounds show ICT and π → π* transitions in the UV-visible range in solution, and Aq2Mp has a bathochromic shift and shows a higher oscillator strength of the absorption, which has been verified by time-dependent DFT (TDDFT) calculations. The differences between AqMp and Aq2Mp indicate that the structural and conjugative effects have great influence on the electronic properties of the molecules.

  7. United Information Services, Inc. , CRAY 1-s/2000, FORTRAN CFT 1. 10. Validation summary report

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Not Available

    1983-12-13

    This Validation Summary Report (VSR) for the United Information Services, Inc., FORTRAN CFT 1.10 compiler running under COS Level C12 1.11 provides a consolidated summary of the results obtained from the validation of the subject compiler against the 1978 FORTRAN Standard (X3.9-1978/FIPS PUB 69). The compiler was validated against the Full Level FORTRAN level of FIPS PUB 69. The VSR is made up of several sections showing all the discrepancies found, if any. These include an overview of the validation, which lists all categories of discrepancies within X3.9-1978, and a detailed listing of discrepancies together with the tests which failed.

  8. Caught on the Web

    ERIC Educational Resources Information Center

    Isakson, Carol

    2006-01-01

    A podcast is essentially a radio program that can be downloaded for enjoyment. Its content includes radio broadcasts, lectures, walking tours, and student-created audio projects. Most are in the standard MP3 file format that can be played on a computer, MP3 player, PDA, or newer CD or DVD players. This article presents resources for learning about…

  9. Performance Portability Strategies for Grid C++ Expression Templates

    NASA Astrophysics Data System (ADS)

    Boyle, Peter A.; Clark, M. A.; DeTar, Carleton; Lin, Meifeng; Rana, Verinder; Vaquero Avilés-Casco, Alejandro

    2018-03-01

    One of the key requirements for the Lattice QCD Application Development effort within the US Exascale Computing Project is performance portability across multiple architectures. Using the Grid C++ expression template framework as a starting point, we report on the progress made with regard to the Grid GPU offloading strategies. We present both the successes and the issues encountered in using CUDA, OpenACC and Just-In-Time compilation. Experimentation and performance on GPUs with an SU(3)×SU(3) streaming test are reported. We also report on the challenges of using the current OpenMP 4.x standard for GPU offloading in the same code.
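
    As an illustration of the OpenMP 4.x offload model the abstract refers to, here is a minimal, generic sketch in C (a SAXPY kernel, not code from Grid; names and sizes are arbitrary):

        /* Minimal illustration of the OpenMP 4.x "target" offload model;
         * a generic SAXPY, not code from the Grid library. */
        #include <stdio.h>
        #include <stdlib.h>

        int main(void)
        {
            const int n = 1 << 20;
            float a = 2.0f;
            float *x = malloc(n * sizeof *x);
            float *y = malloc(n * sizeof *y);
            for (int i = 0; i < n; i++) { x[i] = 1.0f; y[i] = 2.0f; }

            /* map() clauses move data to/from the device; teams/distribute
             * expose the hierarchical parallelism GPUs expect. */
            #pragma omp target teams distribute parallel for \
                    map(to: x[0:n]) map(tofrom: y[0:n])
            for (int i = 0; i < n; i++)
                y[i] = a * x[i] + y[i];

            printf("y[0] = %f\n", y[0]);  /* expect 4.0 */
            free(x); free(y);
            return 0;
        }

    With an offloading compiler this loop runs on the device; without one, the directives degrade gracefully to host execution, which is part of their appeal for performance portability.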

  10. XaNSoNS: GPU-accelerated simulator of diffraction patterns of nanoparticles

    NASA Astrophysics Data System (ADS)

    Neverov, V. S.

    XaNSoNS is open-source software with GPU support, which simulates X-ray and neutron 1D (or 2D) diffraction patterns and pair-distribution functions (PDF) for amorphous or crystalline nanoparticles (up to ∼10⁷ atoms) of heterogeneous structural content. Among the multiple parameters of the structure, the user may specify atomic displacements, site occupancies, molecular displacements and molecular rotations. The software uses general equations nonspecific to crystalline structures to calculate the scattering intensity. It supports four major standards of parallel computing: MPI, OpenMP, Nvidia CUDA and OpenCL, enabling it to run on various architectures, from CPU-based HPCs to consumer-level GPUs.
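
    The "general equations nonspecific to crystalline structures" are presumably of the Debye-equation type, which sums scattered intensity over all atom pairs. A simplified sketch in C follows (unit scattering factors, hypothetical function name; not the XaNSoNS implementation):

        /* Debye-type scattering sum: I(q) = sum_ij sin(q*r_ij)/(q*r_ij).
         * Simplified sketch with unit scattering factors. */
        #include <math.h>
        #include <stdio.h>

        double debye_intensity(int n, const double x[], const double y[],
                               const double z[], double q)
        {
            double I = 0.0;
            #pragma omp parallel for reduction(+:I) schedule(dynamic)
            for (int i = 0; i < n; i++)
                for (int j = 0; j < n; j++) {
                    double dx = x[i]-x[j], dy = y[i]-y[j], dz = z[i]-z[j];
                    double r = sqrt(dx*dx + dy*dy + dz*dz);
                    I += (r > 0.0) ? sin(q*r)/(q*r) : 1.0;  /* i==j term is 1 */
                }
            return I;
        }

        int main(void)
        {
            double x[2] = {0.0, 1.5}, y[2] = {0.0, 0.0}, z[2] = {0.0, 0.0};
            printf("I(q=2.0) = %f\n", debye_intensity(2, x, y, z, 2.0));
            return 0;
        }

    The O(n²) pair sum is the reason GPU acceleration matters at large atom counts.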

  11. Structure and vibrational spectra of pyridine betaine hydrochloride

    NASA Astrophysics Data System (ADS)

    Szafran, Mirosław; Koput, Jacek; Baran, Jan; Głowiak, Tadeusz

    1997-12-01

    The crystal structure of pyridine betaine hydrochloride (PBET·HCl) was determined by X-ray diffraction to be monoclinic, space group P2₁/c, with a = 8.533(2) Å, b = 9.548(2) Å, c = 10.781(2) Å, β = 107.228(3)° and Z = 4. Betaine is protonated and the carboxyl group forms a hydrogen bond with the chloride ion: the O···Cl⁻ distance is 2.928(3) Å. The interaction of pyridine betaine (PBET) with HCl was examined by ab initio self-consistent field (SCF), second-order Møller-Plesset (MP2) and density functional theory (DFT) methods using the 6-31G(d,p) basis set. Two minima are located on the potential surface at the SCF level (PBETH⁺·Cl⁻ and PBET·HCl, with the latter being 1.2 kcal mol⁻¹ lower in energy) and only one minimum (PBET·HCl) at the MP2 and DFT levels. The molecular parameters of PBETH⁺·Cl⁻, computed by the SCF method, reproduce the corresponding experimental data. The computed vibrational frequencies of PBETH⁺·Cl⁻ reproduce the experimental vibrational spectrum in the solid state well. The root-mean-square (r.m.s.) deviations between the experimental and calculated SCF frequencies are 65 cm⁻¹ for all bands and 15 cm⁻¹ without the νClH band. All measured IR bands were interpreted in terms of the calculated vibrational modes.

  12. The growth of the UniTree mass storage system at the NASA Center for Computational Sciences

    NASA Technical Reports Server (NTRS)

    Tarshish, Adina; Salmon, Ellen

    1993-01-01

    In October 1992, the NASA Center for Computational Sciences made its Convex-based UniTree system generally available to users. The ensuing months saw the growth of near-online data from nil to nearly three terabytes, a doubling of the number of CPUs on the facility's Cray Y-MP (the primary data source for UniTree), and the necessity for an aggressive regimen for repacking sparse tapes and hierarchically 'vaulting' old files to freestanding tape. Connectivity was enhanced as well with the addition of UltraNet HiPPI. This paper describes the increasing demands placed on the storage system's performance and throughput that resulted from the significant augmentation of compute-server processor power and network speed.

  13. A Strassen-Newton algorithm for high-speed parallelizable matrix inversion

    NASA Technical Reports Server (NTRS)

    Bailey, David H.; Ferguson, Helaman R. P.

    1988-01-01

    Techniques are described for computing matrix inverses by algorithms that are highly suited to massively parallel computation. The techniques are based on an algorithm suggested by Strassen (1969). Variations of this scheme use matrix Newton iterations and other methods to improve the numerical stability while at the same time preserving a very high level of parallelism. One-processor Cray-2 implementations of these schemes range from one that is up to 55 percent faster than a conventional library routine to one that is slower than a library routine but achieves excellent numerical stability. The problem of computing the solution to a single set of linear equations is discussed, and it is shown that this problem can also be solved efficiently using these techniques.
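
    For context, the matrix Newton iteration referred to above is, in its simplest (Newton-Schulz) form, the following; the paper's stabilized variants may differ in detail:

        \[
          X_{k+1} = X_k\,(2I - A X_k),
        \]
        % converges quadratically to A^{-1} whenever \|I - A X_0\| < 1, and
        % consists entirely of matrix-matrix multiplies, hence the high degree
        % of parallelism and the natural pairing with Strassen multiplication.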

  14. Ion Bernstein instability dependence on the proton-to-electron mass ratio: Linear dispersion theory

    NASA Astrophysics Data System (ADS)

    Min, Kyungguk; Liu, Kaijun

    2016-07-01

    Fast magnetosonic waves, which have as their source ion Bernstein instabilities driven by tenuous ring-like proton velocity distributions, are frequently observed in the inner magnetosphere. One major difficulty in the simulation of these waves is that they are excited in a wide frequency range with a discrete harmonic nature and require time-consuming computations. To overcome this difficulty, recent simulation studies assumed a reduced proton-to-electron mass ratio, m_p/m_e, and a reduced light-to-Alfvén speed ratio, c/v_A, to reduce the number of unstable modes and, therefore, computational costs. Although these studies argued that the physics of wave-particle interactions would essentially remain the same, a detailed investigation of the effect of this reduced system on the excited waves has not been done. In this study, we investigate how the complex frequency, ω = ω_r + iγ, of the ion Bernstein modes varies with m_p/m_e for a sufficiently large c/v_A (such that ω_pe²/Ω_e² ≡ (m_e/m_p)(c/v_A)² ≫ 1) using linear dispersion theory, assuming two different types of energetic proton velocity distributions, namely ring and shell. The results show that low- and high-frequency harmonic modes respond differently to the change of m_p/m_e. For the low harmonic modes (i.e., ω_r ∼ Ω_p), both ω_r/Ω_p and γ/Ω_p are roughly independent of m_p/m_e, where Ω_p is the proton cyclotron frequency. For the high harmonic modes (i.e., Ω_p ≪ ω_r ≲ ω_lh, where ω_lh is the lower hybrid frequency), γ/ω_lh (at fixed ω_r/ω_lh) stays independent of m_p/m_e when the parallel wave number, k_∥, is sufficiently large, and becomes inversely proportional to (m_p/m_e)^(1/4) when k_∥ goes to zero. On the other hand, the frequency range of the unstable modes normalized to ω_lh remains independent of m_p/m_e, regardless of k_∥.
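
    As a check on the identity quoted in parentheses above, it follows directly from the standard definitions of the plasma frequency, electron cyclotron frequency, and Alfvén speed for a hydrogen plasma (n_e = n_p = n), using c² = 1/(ε₀μ₀):

        \[
          \frac{\omega_{pe}^2}{\Omega_e^2}
          = \frac{n e^2/(\varepsilon_0 m_e)}{e^2 B^2/m_e^2}
          = \frac{n m_e}{\varepsilon_0 B^2}
          = \frac{m_e}{m_p}\,\frac{\mu_0 n m_p c^2}{B^2}
          = \frac{m_e}{m_p}\left(\frac{c}{v_A}\right)^2,
          \qquad v_A = \frac{B}{\sqrt{\mu_0 n m_p}}.
        \]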

  15. An efficient parallel algorithm for the calculation of canonical MP2 energies.

    PubMed

    Baker, Jon; Pulay, Peter

    2002-09-01

    We present the parallel version of a previous serial algorithm for the efficient calculation of canonical MP2 energies (Pulay, P.; Saebo, S.; Wolinski, K. Chem Phys Lett 2001, 344, 543). It is based on the Saebo-Almlöf direct-integral transformation, coupled with an efficient prescreening of the AO integrals. The parallel algorithm avoids synchronization delays by spawning a second set of slaves during the bin-sort prior to the second half-transformation. Results are presented for systems with up to 2000 basis functions. MP2 energies for molecules with 400-500 basis functions can be routinely calculated to microhartree accuracy on a small number of processors (6-8) in a matter of minutes with modern PC-based parallel computers. Copyright 2002 Wiley Periodicals, Inc. J Comput Chem 23: 1150-1156, 2002
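
    For context, the quantity computed is the standard canonical closed-shell MP2 energy; the expensive step the algorithm distributes is the AO-to-MO transformation that produces the (ia|jb) integrals:

        \[
          E^{(2)} = \sum_{i,j}^{\mathrm{occ}} \sum_{a,b}^{\mathrm{virt}}
          \frac{(ia|jb)\,\bigl[2\,(ia|jb) - (ib|ja)\bigr]}
               {\varepsilon_i + \varepsilon_j - \varepsilon_a - \varepsilon_b}
        \]
        % (ia|jb): two-electron integrals over canonical MOs; epsilon: orbital
        % energies. The O(N^5) integral transformation dominates the cost and
        % is the step parallelized across processors.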

  16. Axially deformed solution of the Skyrme-Hartree-Fock-Bogoliubov equations using the transformed harmonic oscillator basis (II) HFBTHO v2.00d: A new version of the program

    NASA Astrophysics Data System (ADS)

    Stoitsov, M. V.; Schunck, N.; Kortelainen, M.; Michel, N.; Nam, H.; Olsen, E.; Sarich, J.; Wild, S.

    2013-06-01

    We describe the new version 2.00d of the code HFBTHO that solves the nuclear Skyrme-Hartree-Fock (HF) or Skyrme-Hartree-Fock-Bogoliubov (HFB) problem by using the cylindrical transformed deformed harmonic oscillator basis. In the new version, we have implemented the following features: (i) the modified Broyden method for non-linear problems, (ii) optional breaking of reflection symmetry, (iii) calculation of axial multipole moments, (iv) the finite temperature formalism for the HFB method, (v) the linear constraint method based on the approximation of the Random Phase Approximation (RPA) matrix for multi-constraint calculations, (vi) blocking of quasi-particles in the Equal Filling Approximation (EFA), (vii) a framework for generalized energy density functionals with arbitrary density-dependences, and (viii) shared memory parallelism via OpenMP pragmas. Program summary: Program title: HFBTHO v2.00d. Catalog identifier: ADUI_v2_0. Program summary URL: http://cpc.cs.qub.ac.uk/summaries/ADUI_v2_0.html. Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland. Licensing provisions: GNU General Public License version 3. No. of lines in distributed program, including test data, etc.: 167228. No. of bytes in distributed program, including test data, etc.: 2672156. Distribution format: tar.gz. Programming language: FORTRAN-95. Computer: Intel Pentium-III, Intel Xeon, AMD-Athlon, AMD-Opteron, Cray XT5, Cray XE6. Operating system: UNIX, LINUX, Windows XP. RAM: 200 Mwords. Word size: 8 bits. Classification: 17.22. Does the new version supersede the previous version?: Yes. Catalog identifier of previous version: ADUI_v1_0. Journal reference of previous version: Comput. Phys. Comm. 167 (2005) 43. Nature of problem: The solution of self-consistent mean-field equations for weakly-bound paired nuclei requires a correct description of the asymptotic properties of nuclear quasi-particle wave functions. In the present implementation, this is achieved by using the single-particle wave functions of the transformed harmonic oscillator, which allows for an accurate description of deformation effects and pairing correlations in nuclei arbitrarily close to the particle drip lines. Solution method: The program uses the axial Transformed Harmonic Oscillator (THO) single-particle basis to expand quasi-particle wave functions. It iteratively diagonalizes the Hartree-Fock-Bogoliubov Hamiltonian based on generalized Skyrme-like energy densities and zero-range pairing interactions until a self-consistent solution is found. A previous version of the program was presented in: M.V. Stoitsov, J. Dobaczewski, W. Nazarewicz, P. Ring, Comput. Phys. Commun. 167 (2005) 43-63. Reasons for new version: Version 2.00d of HFBTHO provides a number of new options such as the optional breaking of reflection symmetry, the calculation of axial multipole moments, the finite temperature formalism for the HFB method, optimized multi-constraint calculations, the treatment of odd-even and odd-odd nuclei in the blocking approximation, and the framework for generalized energy density functionals with arbitrary density-dependences. It is also the first version of HFBTHO to contain threading capabilities.
    Summary of revisions: the modified Broyden method; optional breaking of reflection symmetry; calculation of all axial multipole moments up to λ=8; the finite temperature formalism for the HFB method; the linear constraint method based on the approximation of the Random Phase Approximation (RPA) matrix for multi-constraint calculations; blocking of quasi-particles in the Equal Filling Approximation (EFA); a framework for generalized energy density functionals with arbitrary density-dependence; and shared memory parallelism via OpenMP pragmas have all been implemented. Restrictions: Axial and time-reversal symmetries are assumed. Unusual features: The user must have access to the LAPACK subroutines DSYEVD, DSYTRF and DSYTRI, and their dependencies, which compute eigenvalues and eigenfunctions of real symmetric matrices, the LAPACK subroutines DGETRI and DGETRF, which invert arbitrary real matrices, and the BLAS routines DCOPY, DSCAL, DGEMM and DGEMV for double-precision linear algebra (or provide another set of subroutines that can perform such tasks). The BLAS and LAPACK subroutines can be obtained from the Netlib Repository at the University of Tennessee, Knoxville: http://netlib2.cs.utk.edu/. Running time: Highly variable, as it depends on the nucleus, the size of the basis, the requested accuracy, the requested configuration, the compiler and libraries, and the hardware architecture. An order of magnitude would be a few seconds for ground-state configurations in small bases (N≈8-12), to a few minutes in a very deformed configuration of a heavy nucleus with a large basis (N>20).
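
    As an illustration of the dense symmetric eigenproblem the summary delegates to LAPACK's DSYEVD, here is a minimal C sketch using the LAPACKE interface (HFBTHO itself calls the Fortran routines directly):

        /* Diagonalize a real symmetric matrix with LAPACK's DSYEVD,
         * via the LAPACKE C interface. */
        #include <stdio.h>
        #include <lapacke.h>

        int main(void)
        {
            /* 3x3 real symmetric matrix, row-major */
            double a[9] = { 2.0, 1.0, 0.0,
                            1.0, 2.0, 1.0,
                            0.0, 1.0, 2.0 };
            double w[3];  /* eigenvalues, ascending */

            lapack_int info =
                LAPACKE_dsyevd(LAPACK_ROW_MAJOR, 'V', 'U', 3, a, 3, w);
            if (info != 0) {
                fprintf(stderr, "dsyevd failed: %d\n", (int)info);
                return 1;
            }
            for (int i = 0; i < 3; i++)
                printf("lambda_%d = %f\n", i, w[i]);  /* 2-sqrt(2), 2, 2+sqrt(2) */
            return 0;
        }

    Compile with, e.g., cc file.c -llapacke -llapack (assuming a LAPACKE installation).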

  17. What Is a Computer Program?

    ERIC Educational Resources Information Center

    Gemignani, Michael

    1981-01-01

    The concept of computer programs is discussed from many perspectives and shown to be many different things. The ambiguity of the term is reviewed in light of potential ramifications for computer specialists, attorneys, and the general public. (MP)

  18. The SGI/Cray T3E: Experiences and Insights

    NASA Technical Reports Server (NTRS)

    Bernard, Lisa Hamet

    1998-01-01

    The NASA Goddard Space Flight Center is home to the fifth most powerful supercomputer in the world, a 1024-processor SGI/Cray T3E-600. The original 512-processor system was placed at Goddard in March 1997 as part of a cooperative agreement between the High Performance Computing and Communications Program's Earth and Space Sciences Project (ESS) and SGI/Cray Research. The goal of this system is to facilitate achievement of the Project milestones of 10, 50 and 100 GFLOPS sustained performance on selected Earth and space science application codes. The additional 512 processors were purchased in March 1998 by the NASA Earth Science Enterprise for the NASA Seasonal to Interannual Prediction Project (NSIPP). These two "halves" still operate as a single system, and must satisfy the unique requirements of both aforementioned groups, as well as guest researchers from the Earth, space, microgravity, manned space flight and aeronautics communities. Few large scalable parallel systems are configured for capability computing, so models are hard to find. This unique environment has created a challenging system administration task, and has yielded some insights into the supercomputing needs of the various NASA Enterprises, as well as insights into the strengths and weaknesses of the T3E architecture and software. The T3E is a distributed memory system in which the processing elements (PEs) are connected by a low latency, high bandwidth bidirectional 3-D torus. Due to the focus on high speed communication between PEs, the T3E requires PEs to be allocated contiguously per job. Further, jobs will only execute on the user-specified number of PEs, and PE timesharing is possible but impractical. With a job mix highly varied in both size and runtime, the resulting scenario is PE fragmentation and an inability to achieve near 100% utilization. SGI/Cray has provided several scheduling and configuration tools to minimize the impact of fragmentation. These tools include PScheD (the political scheduler), GRM (the global resource manager) and NQE (the Network Queuing Environment). Features and impact of these tools will be discussed, as will resulting performance and utilization data. As a distributed memory system, the T3E is designed to be programmed through explicit message passing. Consequently, certain assumptions related to code design are made by the operating system (UNICOS/mk) and its scheduling tools. With the exception of HPF, which does run on the T3E, however poorly, alternative programming styles have the potential to impact the T3E in unexpected and undesirable ways. Several examples will be presented (preceded by the disclaimer, "Don't try this at home! Violators will be prosecuted!")

  19. Appropriate description of intermolecular interactions in the methane hydrates: an assessment of DFT methods.

    PubMed

    Liu, Yuan; Zhao, Jijun; Li, Fengyu; Chen, Zhongfang

    2013-01-15

    Accurate description of the hydrogen-bonding energies between water molecules and the van der Waals interactions between guest molecules and host water cages is crucial for the study of methane hydrates (MHs). Using high-level ab initio MP2 and CCSD(T) results as the reference, we carefully assessed the performance of a variety of exchange-correlation functionals and various basis sets in describing the noncovalent interactions in MH. The functionals under investigation include the conventional GGA, meta-GGA, and hybrid functionals (PBE, PW91, TPSS, TPSSh, B3LYP, and X3LYP), long-range corrected functionals (ωB97X, ωB97, LC-ωPBE, CAM-B3LYP, and LC-TPSS), the newly developed Minnesota-class functionals (M06-L, M06-HF, M06, and M06-2X), and the dispersion-corrected density functional theory (DFT-D) methods (B97-D, ωB97X-D, PBE-TS, PBE-Grimme, and PW91-OBS). We found that the conventional functionals are not suitable for MH; notably, the widely used B3LYP functional even predicts a repulsive interaction between CH₄ and the (H₂O)₆ cluster. M06-2X is the best among the M06-class functionals. The ωB97X-D functional outperforms the other DFT-D methods and is recommended for accurate first-principles calculations of MH. B97-D is also acceptable as a compromise between computational cost and precision. Considering both accuracy and efficiency, the B97-D, ωB97X-D, and M06-2X functionals with the 6-311++G(2d,2p) basis set without basis set superposition error (BSSE) correction are recommended. Though a fairly large basis set (e.g., aug-cc-pVTZ) and BSSE correction are necessary for a reliable MP2 calculation, DFT methods are less sensitive to the basis set and BSSE correction if the basis set is sufficient (e.g., 6-311++G(2d,2p)). These assessments provide useful guidance for choosing an appropriate methodology for first-principles simulation of MH and related systems. Copyright © 2012 Wiley Periodicals, Inc.
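
    The BSSE correction discussed above is conventionally the Boys-Bernardi counterpoise procedure; the abstract does not name the scheme, so the standard form is assumed here:

        \[
          E_{\mathrm{int}}^{\mathrm{CP}} = E_{AB}^{AB} - E_{A}^{AB} - E_{B}^{AB}
        \]
        % Superscripts denote the basis: each monomer is recomputed in the full
        % dimer basis, removing the artificial stabilization that the partner's
        % basis functions provide. Uncorrected: E_int = E_AB^{AB} - E_A^A - E_B^B.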

  20. Lanczos eigensolution method for high-performance computers

    NASA Technical Reports Server (NTRS)

    Bostic, Susan W.

    1991-01-01

    The theory, computational analysis, and applications of a Lanczos algorithm on high performance computers are presented. The computationally intensive steps of the algorithm are identified as the matrix factorization, the forward/backward equation solution, and the matrix-vector multiplies. These computational steps are optimized to exploit the vector and parallel capabilities of high performance computers. The savings in computational time from applying optimization techniques such as variable-band and sparse data storage and access, loop unrolling, use of local memory, and compiler directives are presented. Two large-scale structural analysis applications are described: the buckling of a composite blade-stiffened panel with a cutout, and the vibration analysis of a high speed civil transport. The sequential computational time of 181.6 seconds for the panel problem executed on a CONVEX computer was decreased to 14.1 seconds with the optimized vector algorithm. The best computational time of 23 seconds for the transport problem with 17,000 degrees of freedom was obtained on the Cray Y-MP using an average of 3.63 processors.
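
    The kernel operations identified above sit inside the Lanczos three-term recurrence, shown here for a standard eigenproblem A q = λ q; in shift-invert mode the matrix factorization and forward/backward solves appear because A is replaced by a shifted, factorized operator:

        \begin{align*}
        r_j &= A q_j - \beta_j q_{j-1}, & \alpha_j &= q_j^{T} r_j,\\
        r_j &\leftarrow r_j - \alpha_j q_j, & \beta_{j+1} &= \|r_j\|_2,
        \qquad q_{j+1} = r_j/\beta_{j+1}.
        \end{align*}
        % Each step costs one matrix-vector multiply plus vector updates; the
        % eigenvalues of the small tridiagonal matrix built from the alpha and
        % beta coefficients approximate those of A.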

  1. Coupling of Noah-MP and the High Resolution CI-WATER ADHydro Hydrological Model

    NASA Astrophysics Data System (ADS)

    Moreno, H. A.; Goncalves Pureza, L.; Ogden, F. L.; Steinke, R. C.

    2014-12-01

    ADHydro is a physics-based, high-resolution, distributed hydrological model suitable for simulating large watersheds in a massively parallel computing environment. It simulates important processes such as rainfall and infiltration, snowfall and snowmelt in complex terrain, vegetation and evapotranspiration, soil heat flux and freezing, overland flow, channel flow, groundwater flow, and water management. For the vegetation and evapotranspiration processes, ADHydro uses the validated community land surface model (LSM) Noah-MP. Noah-MP offers multiple options for key land-surface hydrology processes and was developed to facilitate climate predictions with physically based ensembles. This presentation discusses the lessons learned in coupling Noah-MP to ADHydro. Noah-MP is delivered with a main driver program and not as a library with a clear interface to be called from other codes. This required some investigation to determine the correct functions to call and the appropriate parameter values. ADHydro runs Noah-MP as a point process on each mesh element and provides initialization and forcing data for each element. Modeling data are acquired from various sources including the Soil Survey Geographic Database (SSURGO), the Weather Research and Forecasting (WRF) model, and internal ADHydro simulation states. Despite these challenges in coupling Noah-MP to ADHydro, the use of Noah-MP provides the benefits of a supported community code.

  2. Analytic energy gradients for the orbital-optimized third-order Møller-Plesset perturbation theory

    NASA Astrophysics Data System (ADS)

    Bozkaya, Uǧur

    2013-09-01

    Analytic energy gradients for the orbital-optimized third-order Møller-Plesset perturbation theory (OMP3) [U. Bozkaya, J. Chem. Phys. 135, 224103 (2011); doi:10.1063/1.3665134] are presented. The OMP3 method is applied to problematic chemical systems with challenging electronic structures. The performance of the OMP3 method is compared with those of canonical second-order Møller-Plesset perturbation theory (MP2), third-order Møller-Plesset perturbation theory (MP3), coupled-cluster singles and doubles (CCSD), and coupled-cluster singles and doubles with perturbative triples [CCSD(T)] for investigating equilibrium geometries, vibrational frequencies, and open-shell reaction energies. For bond lengths, the performance of OMP3 is in between those of MP3 and CCSD. For harmonic vibrational frequencies, the OMP3 method significantly eliminates the singularities arising from the abnormal response contributions observed for MP3 in the case of symmetry-breaking problems, and provides noticeably improved vibrational frequencies for open-shell molecules. For open-shell reaction energies, OMP3 exhibits a better performance than MP3 and CCSD, as in the case of barrier heights and radical stabilization energies. As discussed in previous studies, the OMP3 method is several times faster than CCSD in energy computations. Further, in analytic gradient computations for the CCSD method one needs to solve λ-amplitude equations, whereas for OMP3 one does not, since λ_{ab}^{ij(1)} = t_{ij}^{ab(1)} and λ_{ab}^{ij(2)} = t_{ij}^{ab(2)}. Additionally, one needs to solve orbital Z-vector equations for CCSD, but for OMP3 the orbital response contributions are zero owing to the stationary property of OMP3. Overall, for analytic gradient computations the OMP3 method is several times less expensive than CCSD (roughly ~4-6 times). Considering the balance of computational cost and accuracy, we conclude that the OMP3 method emerges as a very useful tool for the study of electronically challenging chemical systems.

  3. Analytic energy gradients for the orbital-optimized third-order Møller-Plesset perturbation theory.

    PubMed

    Bozkaya, Uğur

    2013-09-14

    Analytic energy gradients for the orbital-optimized third-order Møller-Plesset perturbation theory (OMP3) [U. Bozkaya, J. Chem. Phys. 135, 224103 (2011)] are presented. The OMP3 method is applied to problematic chemical systems with challenging electronic structures. The performance of the OMP3 method is compared with those of canonical second-order Møller-Plesset perturbation theory (MP2), third-order Møller-Plesset perturbation theory (MP3), coupled-cluster singles and doubles (CCSD), and coupled-cluster singles and doubles with perturbative triples [CCSD(T)] for investigating equilibrium geometries, vibrational frequencies, and open-shell reaction energies. For bond lengths, the performance of OMP3 is in between those of MP3 and CCSD. For harmonic vibrational frequencies, the OMP3 method significantly eliminates the singularities arising from the abnormal response contributions observed for MP3 in the case of symmetry-breaking problems, and provides noticeably improved vibrational frequencies for open-shell molecules. For open-shell reaction energies, OMP3 exhibits a better performance than MP3 and CCSD, as in the case of barrier heights and radical stabilization energies. As discussed in previous studies, the OMP3 method is several times faster than CCSD in energy computations. Further, in analytic gradient computations for the CCSD method one needs to solve λ-amplitude equations, whereas for OMP3 one does not, since λ_{ab}^{ij(1)} = t_{ij}^{ab(1)} and λ_{ab}^{ij(2)} = t_{ij}^{ab(2)}. Additionally, one needs to solve orbital Z-vector equations for CCSD, but for OMP3 the orbital response contributions are zero owing to the stationary property of OMP3. Overall, for analytic gradient computations the OMP3 method is several times less expensive than CCSD (roughly ~4-6 times). Considering the balance of computational cost and accuracy, we conclude that the OMP3 method emerges as a very useful tool for the study of electronically challenging chemical systems.

  4. Heart Fibrillation and Parallel Supercomputers

    NASA Technical Reports Server (NTRS)

    Kogan, B. Y.; Karplus, W. J.; Chudin, E. E.

    1997-01-01

    The Luo and Rudy 3 cardiac cell mathematical model is implemented on the parallel supercomputer CRAY T3D. The splitting algorithm, combined with a variable time step and an explicit method of integration, provides reasonable solution times and almost perfect scaling for rectilinear wave propagation. The computer simulation makes it possible to observe new phenomena: the break-up of spiral waves caused by the intracellular calcium dynamics, and the non-uniformity of the calcium distribution in space during the onset of the spiral wave.

  5. Parallel FEM Simulation of Electromechanics in the Heart

    NASA Astrophysics Data System (ADS)

    Xia, Henian; Wong, Kwai; Zhao, Xiaopeng

    2011-11-01

    Cardiovascular disease is the leading cause of death in America. Computer simulation of the complicated dynamics of the heart could provide valuable quantitative guidance for diagnosis and treatment of heart problems. In this paper, we present an integrated numerical model which encompasses the interaction of cardiac electrophysiology, electromechanics, and mechanoelectrical feedback. The model is solved by the finite element method on a Linux cluster and the Cray XT5 supercomputer Kraken. Dynamical influences between the effects of electromechanical coupling and mechanoelectrical feedback are shown.

  6. Hierarchical resilience with lightweight threads.

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Wheeler, Kyle Bruce

    2011-10-01

    This paper proposes a methodology for providing robustness and resilience for a highly threaded distributed- and shared-memory environment based on well-defined inputs and outputs to lightweight tasks. These inputs and outputs form a failure 'barrier', allowing tasks to be restarted or duplicated as necessary. These barriers must be expanded based on task behavior, such as communication between tasks, but do not prohibit any given behavior. One of the trends in high-performance computing codes seems to be a trend toward self-contained functions that mimic functional programming. Software designers are trending toward a model of software design where their core functions are specified in side-effect-free or low-side-effect ways, wherein the inputs and outputs of the functions are well-defined. This provides the ability to copy the inputs to wherever they need to be, whether that's the other side of the PCI bus or the other side of the network, do work on that input using local memory, and then copy the outputs back (as needed). This design pattern is popular among new distributed threading environment designs. Such designs include the Barcelona StarSs system, distributed OpenMP systems, the Habanero-C and Habanero-Java systems from Vivek Sarkar at Rice University, the HPX/ParalleX model from LSU, as well as our own Scalable Parallel Runtime effort (SPR) and the Trilinos stateless kernels. This design pattern is also shared by CUDA and several OpenMP extensions for GPU-type accelerators (e.g., the PGI OpenMP extensions).
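
    A schematic C sketch of the "well-defined inputs and outputs" task pattern described above follows; the types and names are hypothetical, not an API from SPR, HPX, or any named runtime:

        #include <stddef.h>
        #include <stdio.h>

        typedef struct { const double *in;  size_t n; } task_input;
        typedef struct { double       *out; size_t n; } task_output;

        /* Side-effect-free body: reads only its input, writes only its output.
         * Because the boundary is explicit, a runtime can copy `in` anywhere
         * (across the PCI bus, across the network), run the task there, copy
         * `out` back, or transparently re-run the task after a failure. */
        static void task_body(const task_input *in, task_output *out)
        {
            for (size_t i = 0; i < in->n && i < out->n; i++)
                out->out[i] = 2.0 * in->in[i];
        }

        int main(void)
        {
            double in_buf[4] = {1, 2, 3, 4}, out_buf[4];
            task_input  in  = { in_buf, 4 };
            task_output out = { out_buf, 4 };
            task_body(&in, &out);  /* a resilient runtime would invoke this remotely */
            printf("out[3] = %g\n", out.out[3]);  /* 8 */
            return 0;
        }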

  7. The Science of Computing: Virtual Memory

    NASA Technical Reports Server (NTRS)

    Denning, Peter J.

    1986-01-01

    In the March-April issue, I described how a computer's storage system is organized as a hierarchy consisting of cache, main memory, and secondary memory (e.g., disk). The cache and main memory form a subsystem that functions like main memory but attains speeds approaching cache. What happens if a program and its data are too large for the main memory? This is not a frivolous question. Every generation of computer users has been frustrated by insufficient memory. A new line of computers may have sufficient storage for the computations of its predecessor, but new programs will soon exhaust its capacity. In 1960, a long-range planning committee at MIT dared to dream of a computer with 1 million words of main memory. In 1985, the Cray-2 was delivered with 256 million words. Computational physicists dream of computers with 1 billion words. Computer architects have done an outstanding job of enlarging main memories, yet they have never kept up with demand. Only the shortsighted believe they can.

  8. Analysis of clinical value of CT in the diagnosis of pediatric pneumonia and mycoplasma pneumonia.

    PubMed

    Gong, Liang; Zhang, Chong-Lin; Zhen, Qing

    2016-04-01

    Pneumonia is an infectious disease of the lung that causes significant mortality. Mycoplasma pneumonia (MP) is an atypical bacterial pneumonia that can damage several organs. Lung computed tomography (CT) has been utilized in its identification. The aim of the present study was to examine the value of computed tomography in the diagnosis of pediatric MP. The present study prospectively analyzed the clinical and imaging data of 1,280 cases of pediatric MP in the out- and inpatient departments from March 2010 to March 2014; analyzed the morphology and distribution of the pneumonic lesions in the lungs; and summarized the value of CT diagnosis for pediatric MP. In the included children, there were 688 cases of lesions in the unilateral lobe, 592 cases of lesions in the bilateral lobes, 1,101 cases of extensive patchy opacity, 496 cases of mottled opacity, 432 cases of increased lung marking, 256 cases of streak opacity, 192 cases of ground-glass opacity, 992 cases of thickened bronchial wall in the lesions, and 128 cases of lymphadenopathy in the hilar and mediastinal lymph nodes; lung CT also showed 32 cases of pulmonary cavity and 144 cases of pleural effusion. In conclusion, the CT signals of pediatric MP were of several types, with some children exhibiting complicated changes. The child's clinical manifestation and symptoms should thus be considered in the diagnosis to improve the diagnostic rate.

  9. Enhancing Application Performance Using Mini-Apps: Comparison of Hybrid Parallel Programming Paradigms

    NASA Technical Reports Server (NTRS)

    Lawson, Gary; Poteat, Michael; Sosonkina, Masha; Baurle, Robert; Hammond, Dana

    2016-01-01

    In this work, several mini-apps have been created to enhance the performance of a real-world application, namely the VULCAN code for complex flow analysis developed at the NASA Langley Research Center. These mini-apps explore hybrid parallel programming paradigms with the Message Passing Interface (MPI) for distributed memory accesses and either Shared MPI (SMPI) or OpenMP for shared memory accesses. Performance testing shows that MPI+SMPI yields the best execution performance, while requiring the largest number of code changes. A maximum speedup of 23X was measured for MPI+SMPI, but only 10X was measured for MPI+OpenMP.
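
    A generic skeleton of the hybrid MPI+OpenMP paradigm compared above is sketched in C below; it is illustrative only, not code from VULCAN or the mini-apps:

        /* Hybrid MPI+OpenMP skeleton: MPI across nodes, OpenMP within a node. */
        #include <mpi.h>
        #include <omp.h>
        #include <stdio.h>

        int main(int argc, char **argv)
        {
            int provided, rank, nranks;
            /* FUNNELED: only the thread that called MPI_Init_thread makes
             * MPI calls, the usual contract for loop-level OpenMP hybrids. */
            MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
            MPI_Comm_rank(MPI_COMM_WORLD, &rank);
            MPI_Comm_size(MPI_COMM_WORLD, &nranks);

            double local = 0.0;
            /* Each rank takes a strided slice; threads share it via OpenMP. */
            #pragma omp parallel for reduction(+:local)
            for (int i = rank; i < 1000000; i += nranks)
                local += 1.0 / (1.0 + (double)i);   /* arbitrary per-rank work */

            double global;
            MPI_Reduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
            if (rank == 0)
                printf("sum = %f (%d ranks x %d threads)\n",
                       global, nranks, omp_get_max_threads());
            MPI_Finalize();
            return 0;
        }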

  10. NETS - A NEURAL NETWORK DEVELOPMENT TOOL, VERSION 3.0 (MACINTOSH VERSION)

    NASA Technical Reports Server (NTRS)

    Phillips, T. A.

    1994-01-01

    NETS, A Tool for the Development and Evaluation of Neural Networks, provides a simulation of Neural Network algorithms plus an environment for developing such algorithms. Neural Networks are a class of systems modeled after the human brain. Artificial Neural Networks are formed from hundreds or thousands of simulated neurons, connected to each other in a manner similar to brain neurons. Problems which involve pattern matching readily fit the class of problems which NETS is designed to solve. NETS uses the back propagation learning method for all of the networks which it creates. The nodes of a network are usually grouped together into clumps called layers. Generally, a network will have an input layer through which the various environment stimuli are presented to the network, and an output layer for determining the network's response. The number of nodes in these two layers is usually tied to some features of the problem being solved. Other layers, which form intermediate stops between the input and output layers, are called hidden layers. NETS allows the user to customize the patterns of connections between layers of a network. NETS also provides features for saving the weight values of a network during the learning process, which allows for more precise control over the learning process. NETS is an interpreter. Its method of execution is the familiar "read-evaluate-print" loop found in interpreted languages such as BASIC and LISP. The user is presented with a prompt which is the simulator's way of asking for input. After a command is issued, NETS will attempt to evaluate the command, which may produce more prompts requesting specific information or an error if the command is not understood. The typical process involved when using NETS consists of translating the problem into a format which uses input/output pairs, designing a network configuration for the problem, and finally training the network with input/output pairs until an acceptable error is reached. NETS allows the user to generate C code to implement the network loaded into the system. This permits the placement of networks as components, or subroutines, in other systems. In short, once a network performs satisfactorily, the Generate C Code option provides the means for creating a program separate from NETS to run the network. Other features: files may be stored in binary or ASCII format; multiple input propagation is permitted; bias values may be included; capability to scale data without writing scaling code; quick interactive testing of network from the main menu; and several options that allow the user to manipulate learning efficiency. NETS is written in ANSI standard C language to be machine independent. The Macintosh version (MSC-22108) includes code for both a graphical user interface version and a command line interface version. The machine independent version (MSC-21588) only includes code for the command line interface version of NETS 3.0. The Macintosh version requires a Macintosh II series computer and has been successfully implemented under System 7. Four executables are included on these diskettes, two for floating point operations and two for integer arithmetic. It requires Think C 5.0 to compile. A minimum of 1Mb of RAM is required for execution. Sample input files and executables for both the command line version and the Macintosh user interface version are provided on the distribution medium. The Macintosh version is available on a set of three 3.5 inch 800K Macintosh format diskettes. 
The machine independent version has been successfully implemented on an IBM PC series compatible running MS-DOS, a DEC VAX running VMS, a SunIPC running SunOS, and a CRAY Y-MP running UNICOS. Two executables for the IBM PC version are included on the MS-DOS distribution media, one compiled for floating point operations and one for integer arithmetic. The machine independent version is available on a set of three 5.25 inch 360K MS-DOS format diskettes (standard distribution medium) or a .25 inch streaming magnetic tape cartridge in UNIX tar format. NETS was developed in 1989 and updated in 1992. IBM PC is a registered trademark of International Business Machines. MS-DOS is a registered trademark of Microsoft Corporation. DEC, VAX, and VMS are trademarks of Digital Equipment Corporation. SunIPC and SunOS are trademarks of Sun Microsystems, Inc. CRAY Y-MP and UNICOS are trademarks of Cray Research, Inc.
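
    As a didactic illustration of the back propagation learning method described above, here is its single-neuron special case (the delta rule for a sigmoid unit) in C; this is a sketch, not NETS code:

        /* Delta-rule training of one sigmoid neuron: the single-layer
         * special case of back propagation. Learns the OR function. */
        #include <math.h>
        #include <stdio.h>

        int main(void)
        {
            double x[4][2] = {{0,0},{0,1},{1,0},{1,1}}, t[4] = {0,1,1,1};
            double w[2] = {0.1, -0.1}, b = 0.0, eta = 0.5;

            for (int epoch = 0; epoch < 5000; epoch++)
                for (int p = 0; p < 4; p++) {
                    double y = 1.0 / (1.0 + exp(-(w[0]*x[p][0] + w[1]*x[p][1] + b)));
                    double delta = (t[p] - y) * y * (1.0 - y);  /* error * sigmoid' */
                    w[0] += eta * delta * x[p][0];
                    w[1] += eta * delta * x[p][1];
                    b    += eta * delta;
                }
            for (int p = 0; p < 4; p++)
                printf("%g OR %g -> %.3f\n", x[p][0], x[p][1],
                       1.0/(1.0+exp(-(w[0]*x[p][0] + w[1]*x[p][1] + b))));
            return 0;
        }

    In the multi-layer case, the same error term is propagated backward through the hidden layers, which is the method NETS applies to all the networks it creates.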

  11. NETS - A NEURAL NETWORK DEVELOPMENT TOOL, VERSION 3.0 (MACHINE INDEPENDENT VERSION)

    NASA Technical Reports Server (NTRS)

    Baffes, P. T.

    1994-01-01

    NETS, A Tool for the Development and Evaluation of Neural Networks, provides a simulation of Neural Network algorithms plus an environment for developing such algorithms. Neural Networks are a class of systems modeled after the human brain. Artificial Neural Networks are formed from hundreds or thousands of simulated neurons, connected to each other in a manner similar to brain neurons. Problems which involve pattern matching readily fit the class of problems which NETS is designed to solve. NETS uses the back propagation learning method for all of the networks which it creates. The nodes of a network are usually grouped together into clumps called layers. Generally, a network will have an input layer through which the various environment stimuli are presented to the network, and an output layer for determining the network's response. The number of nodes in these two layers is usually tied to some features of the problem being solved. Other layers, which form intermediate stops between the input and output layers, are called hidden layers. NETS allows the user to customize the patterns of connections between layers of a network. NETS also provides features for saving the weight values of a network during the learning process, which allows for more precise control over the learning process. NETS is an interpreter. Its method of execution is the familiar "read-evaluate-print" loop found in interpreted languages such as BASIC and LISP. The user is presented with a prompt which is the simulator's way of asking for input. After a command is issued, NETS will attempt to evaluate the command, which may produce more prompts requesting specific information or an error if the command is not understood. The typical process involved when using NETS consists of translating the problem into a format which uses input/output pairs, designing a network configuration for the problem, and finally training the network with input/output pairs until an acceptable error is reached. NETS allows the user to generate C code to implement the network loaded into the system. This permits the placement of networks as components, or subroutines, in other systems. In short, once a network performs satisfactorily, the Generate C Code option provides the means for creating a program separate from NETS to run the network. Other features: files may be stored in binary or ASCII format; multiple input propagation is permitted; bias values may be included; capability to scale data without writing scaling code; quick interactive testing of network from the main menu; and several options that allow the user to manipulate learning efficiency. NETS is written in ANSI standard C language to be machine independent. The Macintosh version (MSC-22108) includes code for both a graphical user interface version and a command line interface version. The machine independent version (MSC-21588) only includes code for the command line interface version of NETS 3.0. The Macintosh version requires a Macintosh II series computer and has been successfully implemented under System 7. Four executables are included on these diskettes, two for floating point operations and two for integer arithmetic. It requires Think C 5.0 to compile. A minimum of 1Mb of RAM is required for execution. Sample input files and executables for both the command line version and the Macintosh user interface version are provided on the distribution medium. The Macintosh version is available on a set of three 3.5 inch 800K Macintosh format diskettes. 
The machine independent version has been successfully implemented on an IBM PC series compatible running MS-DOS, a DEC VAX running VMS, a SunIPC running SunOS, and a CRAY Y-MP running UNICOS. Two executables for the IBM PC version are included on the MS-DOS distribution media, one compiled for floating point operations and one for integer arithmetic. The machine independent version is available on a set of three 5.25 inch 360K MS-DOS format diskettes (standard distribution medium) or a .25 inch streaming magnetic tape cartridge in UNIX tar format. NETS was developed in 1989 and updated in 1992. IBM PC is a registered trademark of International Business Machines. MS-DOS is a registered trademark of Microsoft Corporation. DEC, VAX, and VMS are trademarks of Digital Equipment Corporation. SunIPC and SunOS are trademarks of Sun Microsystems, Inc. CRAY Y-MP and UNICOS are trademarks of Cray Research, Inc.

  12. A secure file manager for UNIX

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    DeVries, R.G.

    1990-12-31

    The development of a secure file management system for a UNIX-based computer facility with supercomputers and workstations is described. Specifically, UNIX in its usual form does not address: (1) operation which would satisfy rigorous security requirements; (2) online space management in an environment where total data demands would be many times the actual online capacity; and (3) making the file management system part of a computer network in which users of any computer in the local network could retrieve data generated on any other computer in the network. The characteristics of UNIX can be exploited to develop a portable, secure file manager which would operate on computer systems ranging from workstations to supercomputers. Implementation considerations making unusual use of UNIX features, rather than requiring extensive internal system changes, are described, and an implementation using the Cray Research Inc. UNICOS operating system is outlined.

  13. Report on the Acceptance Test of the CRI Y-MP 8128, 10 February - 12 March 1990

    NASA Technical Reports Server (NTRS)

    Carter, Russell; Kutler, Paul (Technical Monitor)

    1998-01-01

    The NAS Numerical Aerodynamic Simulation Facility's HSP 2 computer system, a CRI Y-MP 832 SN #1002, underwent a major hardware upgrade in February of 1990. The 32 MWord, 6.3 ns mainframe component of the system was replaced with a 128 MWord, 6.0 ns CRI Y-MP 8128 mainframe, SN #1030. A 30 day Acceptance Test of the computer system was performed by the NAS RND HSP group from 08:00 February 10, 1990 to 08:00 March 12, 1990. Overall responsibility for the RND HSP Acceptance Test was assumed by Duane Carbon. The terms of the contract required that the SN #1030 achieve an effectiveness level of greater than or equal to ninety (90) percent for 30 consecutive days within a 60 day time frame. After the first thirty days, the effectiveness level of SN #1030 was 94.4 percent, hence the acceptance test was passed.

  14. Extracting Depth From Motion Parallax in Real-World and Synthetic Displays

    NASA Technical Reports Server (NTRS)

    Hecht, Heiko; Kaiser, Mary K.; Aiken, William; Null, Cynthia H. (Technical Monitor)

    1994-01-01

    In psychophysical studies on human sensitivity to visual motion parallax (MP), the use of computer displays is pervasive. However, a number of potential problems are associated with such displays: cue conflicts arise when observers accommodate to the screen surface, and observer head and body movements are often not reflected in the displays. We investigated observers' sensitivity to depth information in MP (slant, depth order, relative depth) using various real-world displays and their computer-generated analogs. Angle judgments of real-world stimuli were consistently superior to judgments that were based on computer-generated stimuli. Similar results were found for perceived depth order and relative depth. Perceptual competence of observers tends to be underestimated in research that is based on computer generated displays. Such findings cannot be generalized to more realistic viewing situations.

  15. Analysis OpenMP performance of AMD and Intel architecture for breaking waves simulation using MPS

    NASA Astrophysics Data System (ADS)

    Alamsyah, M. N. A.; Utomo, A.; Gunawan, P. H.

    2018-03-01

    A simulation of breaking waves using the Navier-Stokes equations via the moving particle semi-implicit (MPS) method over a closed domain is given. The results show that parallel computing on a multicore architecture using the OpenMP platform can reduce the computational time to almost half of the serial time. Here, a comparison of two computer architectures (AMD and Intel) is performed. The Intel architecture yields better CPU times than the AMD architecture. In efficiency, however, the computer with the AMD architecture scores slightly higher than the Intel. For the simulation with 1512 particles, the CPU times using Intel and AMD are 12662.47 and 28282.30, respectively. Moreover, for a similar number of particles, AMD obtains an efficiency of 50.09% and Intel up to 49.42%.
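
    The percentages quoted above follow the usual definitions; the abstract does not state the thread count p, so only the general form is given:

        \[
          S = \frac{T_{\mathrm{serial}}}{T_{\mathrm{parallel}}}, \qquad
          E = \frac{S}{p}
        \]
        % Halving the wall time gives S = 2; the efficiencies near 50%
        % reported above are E expressed as a percentage.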

  16. PARALLELISATION OF THE MODEL-BASED ITERATIVE RECONSTRUCTION ALGORITHM DIRA.

    PubMed

    Örtenberg, A; Magnusson, M; Sandborg, M; Alm Carlsson, G; Malusek, A

    2016-06-01

    New paradigms for parallel programming have been devised to simplify software development on multi-core processors and many-core graphical processing units (GPUs). Despite their obvious benefits, the parallelisation of existing computer programs is not an easy task. In this work, the use of the Open Multiprocessing (OpenMP) and Open Computing Language (OpenCL) frameworks is considered for the parallelisation of the model-based iterative reconstruction algorithm DIRA, with the aim of significantly shortening the code's execution time. Selected routines were parallelised using the OpenMP and OpenCL libraries; some routines were converted from MATLAB to C and optimised. Parallelisation of the code with OpenMP was easy and resulted in an overall speedup of 15 on a 16-core computer. Parallelisation with OpenCL was more difficult owing to differences between the central processing unit and GPU architectures. The resulting speedup was substantially lower than the theoretical peak performance of the GPU; the cause is explained. © The Author 2015. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com.

  17. Computer-Aided Detection of Prostate Cancer with MRI: Technology and Applications

    PubMed Central

    Liu, Lizhi; Tian, Zhiqiang; Zhang, Zhenfeng; Fei, Baowei

    2016-01-01

    One in six men will develop prostate cancer in his lifetime. Early detection and accurate diagnosis of the disease can improve cancer survival and reduce treatment costs. Recently, imaging of prostate cancer has greatly advanced since the introduction of multi-parametric magnetic resonance imaging (mp-MRI). Mp-MRI consists of T2-weighted sequences combined with functional sequences including dynamic contrast-enhanced MRI, diffusion-weighted MRI, and MR spectroscopy imaging. Due to the large volume of data and the variations in imaging sequences, detection can be affected by multiple factors such as observer variability and the visibility and complexity of the lesions. In order to improve quantitative assessment of the disease, various computer-aided detection systems have been designed to help radiologists in their clinical practice. This review paper presents an overview of the literature on computer-aided detection of prostate cancer with mp-MRI, including the technology and its applications. The aim of the survey is threefold: an introduction for those new to the field, an overview for those working in the field, and a reference for those searching for literature on a specific application. PMID:27133005

  18. Hybrid MPI/OpenMP Implementation of the ORAC Molecular Dynamics Program for Generalized Ensemble and Fast Switching Alchemical Simulations.

    PubMed

    Procacci, Piero

    2016-06-27

    We present a new release (6.0β) of the ORAC program [Marsili et al. J. Comput. Chem. 2010, 31, 1106-1116] with hybrid OpenMP/MPI (Open Multiprocessing / Message Passing Interface) multilevel parallelism tailored for generalized ensemble (GE) and fast switching double annihilation (FS-DAM) nonequilibrium technology, aimed at evaluating the binding free energy in drug-receptor systems on high performance computing platforms. The production of the GE or FS-DAM trajectories is handled using a weak-scaling parallel approach on the MPI level only, while a strong-scaling force decomposition scheme is implemented for intranode computations with shared memory access at the OpenMP level. The efficiency, simplicity, and inherent parallel nature of the ORAC implementation of the FS-DAM algorithm project the code as a possible effective tool for second-generation high-throughput virtual screening in drug discovery and design. The code, along with documentation, testing, and ancillary tools, is distributed under the provisions of the General Public License and can be freely downloaded at www.chim.unifi.it/orac.

  19. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Barbara Chapman

    OpenMP was not well recognized at the beginning of the project, around 2003, because of its limited use in DoE production applications and the immature hardware support for an efficient implementation. Yet in recent years it has gradually been adopted both in HPC applications, mostly in the form of MPI+OpenMP hybrid code, and in mid-scale desktop applications for scientific and experimental studies. We have observed this trend and worked diligently to improve our OpenMP compiler and runtimes, as well as to work with the OpenMP standard organization to make sure OpenMP evolves in a direction close to DoE missions. In the Center for Programming Models for Scalable Parallel Computing project, the HPCTools team at the University of Houston (UH), directed by Dr. Barbara Chapman, has been working with project partners, external collaborators, and hardware vendors to increase the scalability and applicability of OpenMP for multi-core (and future manycore) platforms and for distributed memory systems by exploring different programming models, language extensions, compiler optimizations, and runtime library support.

  20. How to compute isomerization energies of organic molecules with quantum chemical methods.

    PubMed

    Grimme, Stefan; Steinmetz, Marc; Korth, Martin

    2007-03-16

    The reaction energies for 34 typical organic isomerizations including oxygen and nitrogen heteroatoms are investigated with modern quantum chemical methods that have the perspective of also being applicable to large systems. The experimental reaction enthalpies are corrected for vibrational and thermal effects, and the thus derived "experimental" reaction energies are compared to corresponding theoretical data. A series of standard AO basis sets in combination with second-order perturbation theory (MP2, SCS-MP2), conventional density functionals (e.g., PBE, TPSS, B3-LYP, MPW1K, BMK), and new perturbative functionals (B2-PLYP, mPW2-PLYP) are tested. In three cases, obvious errors of the experimental values could be detected, and accurate coupled-cluster [CCSD(T)] reference values have been used instead. It is found that only triple-zeta quality AO basis sets provide results close enough to the basis set limit and that sets like the popular 6-31G(d) should be avoided in accurate work. Augmentation of small basis sets with diffuse functions has a notable effect in B3-LYP calculations that is attributed to intramolecular basis set superposition error and covers basic deficiencies of the functional. The new methods based on perturbation theory (SCS-MP2, X2-PLYP) are found to be clearly superior to many other approaches; that is, they provide mean absolute deviations of less than 1.2 kcal mol-1 and only a few (<10%) outliers. The best performance in the group of conventional functionals is found for the highly parametrized BMK hybrid meta-GGA. Contrary to accepted opinion, hybrid density functionals offer no real advantage over simple GGAs. For reasonably large AO basis sets, results of poor quality are obtained with the popular B3-LYP functional that cannot be recommended for thermochemical applications in organic chemistry. The results of this study are complementary to often used benchmarks based on atomization energies and should guide chemists in their search for accurate and efficient computational thermochemistry methods.
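
    The back-correction from measured enthalpies to "experimental" reaction energies described above can be written schematically as follows (a standard decomposition, not notation from the paper); for an isomerization the number of moles of gas does not change, so the pressure-volume term vanishes:

        \Delta E_e \approx \Delta H_{298} - \Delta(\mathrm{ZPVE}) - \Delta E_{\mathrm{therm}}(0 \to 298\,\mathrm{K}), \qquad \Delta(pV) = \Delta n_{\mathrm{gas}} RT = 0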

  1. Explicitly correlated benchmark calculations on C8H8 isomer energy separations: how accurate are DFT, double-hybrid, and composite ab initio procedures?

    NASA Astrophysics Data System (ADS)

    Karton, Amir; Martin, Jan M. L.

    2012-10-01

    Accurate isomerization energies are obtained for a set of 45 C8H8 isomers by means of the high-level, ab initio W1-F12 thermochemical protocol. The 45 isomers involve a range of hydrocarbon functional groups, including (linear and cyclic) polyacetylene, polyyne, and cumulene moieties, as well as aromatic, anti-aromatic, and highly-strained rings. Performance of a variety of DFT functionals for the isomerization energies is evaluated. This proves to be a challenging test: only six of the 56 tested functionals attain root mean square deviations (RMSDs) below 3 kcal mol-1 (the performance of MP2), namely: 2.9 (B972-D), 2.8 (PW6B95), 2.7 (B3PW91-D), 2.2 (PWPB95-D3), 2.1 (ωB97X-D), and 1.2 (DSD-PBEP86) kcal mol-1. Isomers involving highly-strained fused rings or long cumulenic chains provide a 'torture test' for most functionals. Finally, we evaluate the performance of composite procedures (e.g. G4, G4(MP2), CBS-QB3, and CBS-APNO), as well as that of standard ab initio procedures (e.g. MP2, SCS-MP2, MP4, CCSD, and SCS-CCSD). Both connected triples and post-MP4 singles and doubles are important for accurate results. SCS-MP2 actually outperforms MP4(SDQ) for this problem, while SCS-MP3 yields performance similar to that of CCSD and slightly bests MP4. All the tested empirical composite procedures show excellent performance with RMSDs below 1 kcal mol-1.
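
    For reference, the RMSD quoted above is the usual root-mean-square deviation of a functional's isomerization energies from the W1-F12 reference values over the N = 45 isomers:

        \mathrm{RMSD} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(\Delta E_i^{\mathrm{DFT}} - \Delta E_i^{\mathrm{W1\text{-}F12}}\right)^{2}}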

  2. Very-large-area CCD image sensors: concept and cost-effective research

    NASA Astrophysics Data System (ADS)

    Bogaart, E. W.; Peters, I. M.; Kleimann, A. C.; Manoury, E. J. P.; Klaassens, W.; de Laat, W. T. F. M.; Draijer, C.; Frost, R.; Bosiers, J. T.

    2009-01-01

    A new-generation full-frame 36x48 mm2 48Mp CCD image sensor with vertical anti-blooming for professional digital still camera applications has been developed by means of the so-called building block concept. The 48Mp devices are formed by stitching 1kx1k building blocks with 6.0 µm pixel pitch in 6x8 (hxv) format. This concept allows us to design four large-area (48Mp) and sixty-two basic (1Mp) devices per 6" wafer. The basic image sensor is kept relatively small in order to obtain data from many devices: evaluation of basic parameters such as the image pixel and the on-chip amplifier provides statistical data from a limited number of wafers. The large-area devices, in turn, are evaluated for aspects typical of large-sensor operation and performance, such as charge transport efficiency. Combined with the use of multi-layer reticles, this makes the sensor development cost-effective for prototyping. Optimisation of the sensor design and technology has resulted in a pixel charge capacity of 58 ke- and significantly reduced readout noise (12 electrons at 25 MHz pixel rate, after CDS). Hence, a dynamic range of 73 dB is obtained. Microlens and stack optimisation resulted in an excellent angular response that meets the demands of wide-angle photography.
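
    The quoted dynamic range follows directly from the full-well capacity and the read noise (the standard definition; the intermediate arithmetic is ours):

        \mathrm{DR} = 20\log_{10}\!\left(\frac{58\,000\ e^-}{12\ e^-}\right) \approx 20 \times 3.68 \approx 73.7\ \mathrm{dB}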

  3. Molecular orbital studies of the bonding in heavy element organometallics: Progress report

    NASA Astrophysics Data System (ADS)

    Bursten, B. E.

    1988-03-01

    Over the past two years we have made considerable progress in the understanding of the bonding in heavy element mononuclear and binuclear complexes. For mononuclear complexes, our strategy has been to study the orbital interactions between the actinide metal center and the surrounding ligands. One particular system which has been studied extensively is X3AnL (where X = Cp, Cl, NH2; An = actinide; and L = neutral or anionic ligand). We are interested not only in the mechanics of the An-X orbital interactions, but also how the relative donor characteristics of X may influence coordination of the fourth ligand L to the actinide. For binuclear systems, we are interested not only in homobimetallic complexes, but also in heterobimetallic complexes containing actinides and transition metals. In order to make the calculations of such large systems tractable, we have transferred the X-alpha-SW codes to the newly acquired Cray X-MP/24 at the Ohio Supercomputer Center. This has resulted in significant savings of money and time.

  4. Low energy isomers of (H2O)25 from a hierarchical method based on Monte Carlo Temperature Basin Paving and Molecular Tailoring Approaches benchmarked by full MP2 calculations

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Sahu, Nityananda; Gadre, Shridhar R.; Bandyopadhyay, Pradipta

    We report new global minimum candidate structures for the (H2O)25 cluster that are lower in energy than the ones reported previously and correspond to hydrogen bonded networks with 42 hydrogen bonds and an interior, fully coordinated water molecule. These were obtained as a result of a hierarchical approach based on initial Monte Carlo Temperature Basin Paving (MCTBP) sampling of the cluster's Potential Energy Surface (PES) with the Effective Fragment Potential (EFP), subsequent geometry optimization using the Molecular Tailoring fragmentation Approach (MTA), and final refinement at the second-order Møller-Plesset perturbation (MP2) level of theory. The MTA geometry optimizations used between 14 and 18 main fragments with maximum sizes between 11 and 14 water molecules and an average size of 10 water molecules, whose energies and gradients were computed at the MP2 level. The MTA-MP2 optimized geometries were found to be quite close (within 0.5 kcal/mol) to the ones obtained from the MP2 optimization of the whole cluster. The grafting of the MTA-MP2 energies yields electronic energies that are within 5×10⁻⁴ a.u. of the MP2 results for the whole cluster while preserving their energy order. The MTA-MP2 method was also found to reproduce the MP2 harmonic vibrational frequencies in both the HOH bending and the OH stretching regions.
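
    In fragmentation schemes of the MTA type, the total energy is assembled from the overlapping main fragments by inclusion-exclusion over their intersections; a schematic form (notation ours, not taken from the paper) is

        E \approx \sum_{i} E(f_i) \;-\; \sum_{i<j} E(f_i \cap f_j) \;+\; \sum_{i<j<k} E(f_i \cap f_j \cap f_k) \;-\; \cdots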

  5. Large-scale FMO-MP3 calculations on the surface proteins of influenza virus, hemagglutinin (HA) and neuraminidase (NA)

    NASA Astrophysics Data System (ADS)

    Mochizuki, Yuji; Yamashita, Katsumi; Fukuzawa, Kaori; Takematsu, Kazutomo; Watanabe, Hirofumi; Taguchi, Naoki; Okiyama, Yoshio; Tsuboi, Misako; Nakano, Tatsuya; Tanaka, Shigenori

    2010-06-01

    Two proteins on the influenza virus surface are well known. One is hemagglutinin (HA), associated with infection of cells. Fragment molecular orbital (FMO) calculations were performed on a complex consisting of the HA trimer and two Fab fragments at the third-order Møller-Plesset perturbation (MP3) level. The numbers of residues and 6-31G basis functions were 2351 and 201276, respectively, and thus a massively parallel-vector computer was utilized to accelerate the processing. This FMO-MP3 job was completed in 5.8 h with 1024 processors. The other protein is neuraminidase (NA), involved in the escape from infected cells. An FMO-MP3 calculation was also applied to analyze the interactions between oseltamivir and the surrounding residues in the pharmacophore.
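
    For reference, the two-body FMO expansion underlying such calculations combines monomer and dimer energies as (the standard FMO2 form; in FMO-MP3 each term is evaluated at the MP3 level):

        E \approx \sum_{I} E_I + \sum_{I>J} \left( E_{IJ} - E_I - E_J \right)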

  6. The anabolic response to a meal containing different amounts of protein is not limited by the maximal stimulation of protein synthesis in healthy young adults.

    PubMed

    Kim, Il-Young; Schutzler, Scott; Schrader, Amy; Spencer, Horace J; Azhar, Gohar; Ferrando, Arny A; Wolfe, Robert R

    2016-01-01

    We have determined whole body protein kinetics, i.e., protein synthesis (PS), breakdown (PB), and net balance (NB), in human subjects in the fasted state and following ingestion of ~40 g of protein [moderate protein (MP)], which has been reported to maximize the protein synthetic response, or ~70 g of protein [higher protein (HP)], more representative of the amount of protein in the dinner of an average American diet. Twenty-three healthy young adults who had performed prior resistance exercise (X-MP or X-HP) or time-matched resting (R-MP or R-HP) were studied during a primed continuous infusion of l-[(2)H5]phenylalanine and l-[(2)H2]tyrosine. Subjects were randomly assigned into an exercise (X, n = 12) or resting (R, n = 11) group, and each group was studied at the two levels of dietary protein intake in random order. PS, PB, and NB were expressed as increases above the basal, fasting values (mg·kg lean body mass⁻¹·min⁻¹). Exercise did not significantly affect protein kinetics and blood chemistry. Feeding resulted in positive NB at both levels of protein intake: NB was greater in response to the meal containing HP vs. MP (P < 0.00001). The greater NB with HP was achieved primarily through a greater reduction in PB and to a lesser extent through stimulation of protein synthesis (for all, P < 0.0001). HP resulted in greater plasma essential amino acid responses (P < 0.01) vs. MP, with no differences in insulin and glucose responses. In conclusion, whole body net protein balance improves with protein intake above the amount previously suggested to maximally stimulate muscle protein synthesis, because of a simultaneous reduction in protein breakdown. Copyright © 2016 the American Physiological Society.
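
    The three kinetic quantities are linked by a simple balance, which is why a larger meal can improve net balance through suppressed breakdown even if synthesis is already near its maximum:

        \mathrm{NB} = \mathrm{PS} - \mathrm{PB}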

  7. A transfer learning approach for classification of clinical significant prostate cancers from mpMRI scans

    NASA Astrophysics Data System (ADS)

    Chen, Quan; Xu, Xiang; Hu, Shiliang; Li, Xiao; Zou, Qing; Li, Yunpeng

    2017-03-01

    Deep learning has shown great potential in computer-aided diagnosis. However, in many applications a large dataset is not available, which makes the training of a sophisticated deep neural network (DNN) difficult. In this study, we demonstrate that with transfer learning we can quickly retrain state-of-the-art DNN models with the limited data provided by the ProstateX challenge. The training data consist of 330 lesions, of which only 78 were clinically significant. Efforts were made to balance the data during training. We used ImageNet pre-trained InceptionV3 and VGG-16 models and obtained AUCs of 0.81 and 0.83, respectively, on the ProstateX test data, good for a 4th place finish. We noticed that models trained for different prostate zones have different sensitivities. Applying scaling factors before merging the results improves the AUC of the final result.

  8. Solving the Cauchy-Riemann equations on parallel computers

    NASA Technical Reports Server (NTRS)

    Fatoohi, Raad A.; Grosch, Chester E.

    1987-01-01

    Discussed is the implementation of a single algorithm on three parallel-vector computers. The algorithm is a relaxation scheme for the solution of the Cauchy-Riemann equations, a set of coupled first-order partial differential equations. The computers were chosen so as to encompass a variety of architectures. They are: the MPP, an SIMD machine with 16K bit-serial processors; the FLEX/32, an MIMD machine with 20 processors; and the Cray-2, an MIMD machine with four vector processors. The machine architectures are briefly described. The implementation of the algorithm is discussed in relation to these architectures, and measures of the performance on each machine are given. Simple performance models are used to describe the performance. These models highlight the bottlenecks and limiting factors for this algorithm on these architectures. Conclusions are presented.
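
    For reference, the Cauchy-Riemann equations treated by the relaxation scheme are the coupled first-order system

        \frac{\partial u}{\partial x} = \frac{\partial v}{\partial y}, \qquad \frac{\partial u}{\partial y} = -\frac{\partial v}{\partial x}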

  9. Enabling Diverse Software Stacks on Supercomputers using High Performance Virtual Clusters.

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Younge, Andrew J.; Pedretti, Kevin; Grant, Ryan

    While large-scale simulations have been the hallmark of the High Performance Computing (HPC) community for decades, Large Scale Data Analytics (LSDA) workloads are gaining attention within the scientific community, not only as a processing component of large HPC simulations but also as standalone scientific tools for knowledge discovery. With the path towards Exascale, new HPC runtime systems are also emerging in a way that differs from classical distributed computing models. However, system software for such capabilities on the latest extreme-scale DOE supercomputers needs to be enhanced to more appropriately support these types of emerging software ecosystems. In this paper, we propose the use of Virtual Clusters on advanced supercomputing resources to enable systems to support not only HPC workloads but also emerging big data stacks. Specifically, we have deployed the KVM hypervisor within Cray's Compute Node Linux on an XC-series supercomputer testbed. We also use libvirt and QEMU to manage and provision VMs directly on compute nodes, leveraging Ethernet-over-Aries network emulation. To our knowledge, this is the first known use of KVM on a true MPP supercomputer. We investigate the overhead of our solution using HPC benchmarks, evaluating both single-node performance and weak scaling of a 32-node virtual cluster. Overall, we find that single-node performance of our solution using KVM on a Cray is very efficient, with near-native performance. However, overhead increases by up to 20% as virtual cluster size increases, due to limitations of the Ethernet-over-Aries bridged network. Furthermore, we deploy Apache Spark with large data analysis workloads in a Virtual Cluster, effectively demonstrating how diverse software ecosystems can be supported by High Performance Virtual Clusters.

  10. Climate Ocean Modeling on a Beowulf Class System

    NASA Technical Reports Server (NTRS)

    Cheng, B. N.; Chao, Y.; Wang, P.; Bondarenko, M.

    2000-01-01

    With the growing power and shrinking cost of personal computers, the availability of fast Ethernet interconnections, and public domain software packages, it is now possible to combine them to build desktop parallel computers (named Beowulf or PC clusters) at a fraction of what it would cost to buy systems of comparable power from supercomputer companies. This led us to build and assemble our own system, specifically for climate ocean modeling. In this article, we present our experience with such a system, discuss its network performance, and provide some performance comparison data with both the HP SPP2000 and the Cray T3E for an ocean model used in present-day oceanographic research.

  11. Close to real life. [solving for transonic flow about lifting airfoils using supercomputers

    NASA Technical Reports Server (NTRS)

    Peterson, Victor L.; Bailey, F. Ron

    1988-01-01

    NASA's Numerical Aerodynamic Simulation (NAS) facility for CFD modeling of highly complex aerodynamic flows employs as its basic hardware two Cray-2s, an ETA-10 Model Q, an Amdahl 5880 mainframe computer that furnishes both support processing and access to 300 Gbytes of disk storage, several minicomputers and superminicomputers, and a Thinking Machines 16,000-device 'connection machine' processor. NAS, which was the first supercomputer facility to standardize operating-system and communication software on all processors, has performed important Space Shuttle aerodynamics simulations and will be critical to the configurational refinement of the National Aerospace Plane and its integrated powerplant, which will involve complex, high-temperature reactive gasdynamic computations.

  12. Computers and the Multiplicity of Polynomial Roots.

    ERIC Educational Resources Information Center

    Wavrik, John J.

    1982-01-01

    Described are stages in the development of a computer program to solve a particular algebra problem and the nature of algebraic computation is presented. A program in BASIC is provided to give ideas to others for developing their own programs. (MP)

  13. Impact! Chandra Images a Young Supernova Blast Wave

    NASA Astrophysics Data System (ADS)

    2000-05-01

    Two images made by NASA's Chandra X-ray Observatory, one in October 1999, the other in January 2000, show for the first time the full impact of the actual blast wave from Supernova 1987A (SN 1987A). The observations are the first time that X-rays from a shock wave have been imaged at such an early stage of a supernova explosion. Recent observations of SN 1987A with the Hubble Space Telescope revealed gradually brightening hot spots from a ring of matter ejected by the star thousands of years before it exploded. Chandra's X-ray images show the cause of this brightening ring. A shock wave is smashing into portions of the ring at a speed of 10 million miles per hour (4,500 kilometers per second). The gas behind the shock wave has a temperature of about ten million degrees Celsius, and is visible only with an X-ray telescope. "With Hubble we heard the whistle from the oncoming train," said David Burrows of Pennsylvania State University, University Park, the leader of the team of scientists involved in analyzing the Chandra data on SN 1987A. "Now, with Chandra, we can see the train." The X-ray observations appear to confirm the general outlines of a model developed by team member Richard McCray of the University of Colorado, Boulder, and others, which holds that a shock wave has been moving out ahead of the debris expelled by the explosion. As this shock wave collides with material outside the ring, it heats it to millions of degrees. "We are witnessing the birth of a supernova remnant for the first time," McCray said. The Chandra images clearly show the previously unseen, shock-heated matter just inside the optical ring. Comparison of the observations made with Chandra in October and January with those made with Hubble in February 2000 shows that the X-ray emission peaks close to the newly discovered optical hot spots, and indicates that the wave is beginning to hit the ring. In the next few years, the shock wave will light up still more material in the ring, and an inward moving, or reverse, shock wave will heat the material ejected in the explosion itself. "The supernova is digging up its own past," said McCray. The observations were made on October 6, 1999, using the Advanced CCD Imaging Spectrometer (ACIS) and the High Energy Transmission Grating, and again on January 17, 2000, using ACIS. Other members of the team were Eli Michael of the University of Colorado; Dr. Una Hwang, Dr. Steven Holt, and Dr. Rob Petre of NASA's Goddard Space Flight Center in Greenbelt, MD; Professor Roger Chevalier of the University of Virginia, Charlottesville; and Professors Gordon Garmire and John Nousek of Pennsylvania State University. The results will be published in an upcoming issue of the Astrophysical Journal. The ACIS instrument was built for NASA by the Massachusetts Institute of Technology, Cambridge, and Pennsylvania State University. The High Energy Transmission Grating was built by the Massachusetts Institute of Technology. NASA's Marshall Space Flight Center in Huntsville, AL, manages the Chandra program. TRW, Inc., Redondo Beach, CA, is the prime contractor for the spacecraft. The Smithsonian's Chandra X-ray Center controls science and flight operations from Cambridge, MA. Images to illustrate this release and more information on Chandra's progress can be found on the Internet at: http://chandra.harvard.edu/photo/2000/sn1987a/index.html and http://chandra.nasa.gov

  14. Mucoperiosteal exostoses in the tympanic bulla of African lions (Panthera leo).

    PubMed

    Novales, M; Ginel, P J; Diz, A; Blanco, B; Zafra, R; Guerra, R; Mozos, E

    2015-03-01

    Mucoperiosteal exostoses (MpEs) of the tympanic bulla (TB), also referred to as middle-ear otoliths, have occasionally been described in dogs and cats in association with clinical signs of otitis media or as an incidental finding, but they have not been recorded in other species. In this report, we describe the radiographic, gross, and histopathologic features of MpEs in 8 African lions (Panthera leo). All animals (5 males and 3 females) were adults that had been kept in captivity and had their skeletons conserved as part of an anatomic academic collection. A radiographic study revealed mineralized structures in the TB consistent with MpEs in 7 of the 16 examined TB; a computed tomography study identified MpEs in 12 of the 16 TB. Six TB from 4 lions were sectioned, and several MpEs were demineralized for histopathologic analysis. Grossly, MpEs appeared variable in number and shape. Some were globular structures loosely attached to the mucosal surface of the TB; others were isolated to coalescent bone spicules extending from the mucoperiosteum. Position was also variable, but MpEs frequently developed in the hypotympanum, especially on the ventromedial aspect of the TB wall. Microscopically, MpEs were composed of osteonal bone growing from the periosteum and were not formed by dystrophic calcification of necrotic tissue debris, as is hypothesized in dogs. © The Author(s) 2014.

  15. Using Intel's Knight Landing Processor to Accelerate Global Nested Air Quality Prediction Modeling System (GNAQPMS) Model

    NASA Astrophysics Data System (ADS)

    Wang, H.; Chen, H.; Chen, X.; Wu, Q.; Wang, Z.

    2016-12-01

    The Global Nested Air Quality Prediction Modeling System for Hg (GNAQPMS-Hg) is a global chemical transport model coupled with a mercury transport module to investigate mercury pollution. In this study, we present our work porting the GNAQPMS model to the Intel Xeon Phi processor Knights Landing (KNL) to accelerate the model. KNL is the second-generation product of the Many Integrated Core (MIC) architecture. Compared with the first-generation Knights Corner (KNC), KNL has new hardware features and can be used as a standalone processor as well as a coprocessor alongside other CPUs. Using the VTune tool, the high-overhead modules in the GNAQPMS model were identified, including the CBMZ gas chemistry, the advection and convection module, and the wet deposition module. These modules were accelerated by optimizing the code and exploiting new KNL capabilities. The following optimization measures were taken: 1) changing the pure MPI parallel mode to a hybrid MPI/OpenMP parallel mode; 2) vectorizing the code to use the 512-bit wide vector computation units; 3) reducing unnecessary memory accesses and calculations; 4) reducing Thread Local Storage (TLS) for common variables within each OpenMP thread in CBMZ; and 5) changing global communication from file writing and reading to MPI functions. After optimization, the performance of GNAQPMS is greatly increased on both the CPU and KNL platforms: single-node tests showed that the optimized version has a 2.6x speedup on a two-socket CPU platform and a 3.3x speedup on a one-socket KNL platform compared with the baseline code, meaning KNL delivers a 1.29x speedup over the two-socket CPU platform.
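
    A minimal C sketch of measures 1-2 above (the array names and the placeholder chemistry update are illustrative, not GNAQPMS code): the parallel for distributes grid cells across OpenMP threads, and the simd clause asks the compiler to emit 512-bit vector instructions, e.g. when compiling with Intel's -xMIC-AVX512 flag on KNL.

        #include <omp.h>
        #include <stdio.h>

        #define N 100000

        static double conc[N], rate[N];

        int main(void) {
            for (int i = 0; i < N; ++i) { conc[i] = 1.0; rate[i] = 1e-3 * i; }

            /* Measure 1: threads split the grid cells (hybrid codes add MPI
               across nodes). Measure 2: 'simd' requests use of the wide
               vector units, e.g. icc -qopenmp -xMIC-AVX512 on KNL. */
            #pragma omp parallel for simd
            for (int i = 0; i < N; ++i)
                conc[i] += rate[i] * 0.5;   /* placeholder chemistry update */

            printf("conc[N-1] = %f\n", conc[N - 1]);
            return 0;
        }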

  16. 6-mercaptopurine inhibits atherosclerosis in apolipoprotein e*3-leiden transgenic mice through atheroprotective actions on monocytes and macrophages.

    PubMed

    Pols, Thijs W H; Bonta, Peter I; Pires, Nuno M M; Otermin, Iker; Vos, Mariska; de Vries, Margreet R; van Eijk, Marco; Roelofsen, Jeroen; Havekes, Louis M; Quax, Paul H A; van Kuilenburg, André B P; de Waard, Vivian; Pannekoek, Hans; de Vries, Carlie J M

    2010-08-01

    6-Mercaptopurine (6-MP), the active metabolite of the immunosuppressive prodrug azathioprine, is commonly used in autoimmune diseases and transplant recipients, who are at high risk for cardiovascular disease. Here, we aimed to gain knowledge on the action of 6-MP in atherosclerosis, with a focus on monocytes and macrophages. We demonstrate that 6-MP induces apoptosis of THP-1 monocytes, involving decreased expression of the intrinsic antiapoptotic factors B-cell CLL/Lymphoma-2 (Bcl-2) and Bcl2-like 1 (Bcl-x(L)). In addition, we show that 6-MP decreases expression of the monocyte adhesion molecules platelet endothelial adhesion molecule-1 (PECAM-1) and very late antigen-4 (VLA-4) and inhibits monocyte adhesion. Screening of a panel of cytokines relevant to atherosclerosis revealed that 6-MP robustly inhibits monocyte chemoattractant protein-1 (MCP-1) expression in macrophages stimulated with lipopolysaccharide (LPS). Finally, local delivery of 6-MP to the vessel wall, using a drug-eluting cuff, attenuates atherosclerosis in hypercholesterolemic apolipoprotein E*3-Leiden transgenic mice (P<0.05). In line with our in vitro data, this inhibition of atherosclerosis by 6-MP was accompanied by decreased lesion MCP-1 levels, enhanced vascular apoptosis, and reduced macrophage content. We report novel, previously unrecognized atheroprotective actions of 6-MP in cultured monocytes/macrophages and in a mouse model of atherosclerosis, providing further insight into the effect of the immunosuppressive drug azathioprine in atherosclerosis.

  17. A new system for assessment of growth using mandibular canine calcification stages and its correlation with modified MP3 stages.

    PubMed

    Hegde, Gautham; Hegde, Nanditha; Kumar, Anil; Keshavaraj

    2014-07-01

    Orthodontic diagnosis and treatment planning for growing children must involve growth prediction, especially in the treatment of skeletal problems. Studies have shown that a strong association exists between skeletal maturity and dental calcification stages. The present study was therefore undertaken to provide a simple and practical method for assessing skeletal maturity using a dental periapical film and a standard dental X-ray machine, to compare the developmental stages of the mandibular canine with the developmental stages of modified MP3, to find out whether any correlation exists, and to determine whether the developmental stages of the mandibular canine alone can be used as a reliable indicator for assessment of skeletal maturity. A total of 160 periapical radiographs of the mandibular right canine and the MP3 region were taken and assessed according to Demirjian's stages of dental calcification and the modified MP3 stages. The correlation coefficient between MP3 stages and developmental stages of the mandibular canine was found to be significant in both male and female groups. When the canine calcification stages were compared with the MP3 stages, it was found that, with the exception of the D stage of canine calcification, the remaining stages showed a very high correlation with the modified MP3 stages. The correlation between the mandibular canine calcification stages and the MP3 stages was found to be significant. Canine calcification could thus be used as a sole indicator for assessment of skeletal maturity.

  18. Monte Carlo MP2 on Many Graphical Processing Units.

    PubMed

    Doran, Alexander E; Hirata, So

    2016-10-11

    In the Monte Carlo second-order many-body perturbation (MC-MP2) method, the long sum-of-product matrix expression of the MP2 energy, whose literal evaluation may be poorly scalable, is recast into a single high-dimensional integral of functions of electron pair coordinates, which is evaluated by the scalable method of Monte Carlo integration. The sampling efficiency is further accelerated by the redundant-walker algorithm, which allows a maximal reuse of electron pairs. Here, a multitude of graphical processing units (GPUs) offers a uniquely ideal platform to expose multilevel parallelism: fine-grain data-parallelism for the redundant-walker algorithm in which millions of threads compute and share orbital amplitudes on each GPU; coarse-grain instruction-parallelism for near-independent Monte Carlo integrations on many GPUs with few and infrequent interprocessor communications. While the efficiency boost by the redundant-walker algorithm on central processing units (CPUs) grows linearly with the number of electron pairs and tends to saturate when the latter exceeds the number of orbitals, on a GPU it grows quadratically before it increases linearly and then eventually saturates at a much larger number of pairs. This is because the orbital constructions are nearly perfectly parallelized on a GPU and thus completed in a near-constant time regardless of the number of pairs. In consequence, an MC-MP2/cc-pVDZ calculation of a benzene dimer is 2700 times faster on 256 GPUs (using 2048 electron pairs) than on two CPUs, each with 8 cores (which can use only up to 256 pairs effectively). We also numerically determine that the cost to achieve a given relative statistical uncertainty in an MC-MP2 energy increases as O(n³) or better with system size n, which may be compared with the O(n⁵) scaling of the conventional implementation of deterministic MP2. We thus establish the scalability of MC-MP2 with both system and computer sizes.
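
    The quantity being sampled is the standard closed-shell MP2 correlation energy (textbook form), whose sum over occupied orbitals i, j and virtual orbitals a, b is what MC-MP2 recasts as a high-dimensional integral over electron-pair coordinates:

        E^{(2)} = \sum_{ij}^{\mathrm{occ}} \sum_{ab}^{\mathrm{virt}} \frac{\langle ij|ab\rangle \left( 2\langle ij|ab\rangle - \langle ij|ba\rangle \right)}{\varepsilon_i + \varepsilon_j - \varepsilon_a - \varepsilon_b}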

  19. Shared Memory Parallelization of an Implicit ADI-type CFD Code

    NASA Technical Reports Server (NTRS)

    Hauser, Th.; Huang, P. G.

    1999-01-01

    A parallelization study designed for ADI-type algorithms is presented using the OpenMP specification for shared-memory multiprocessor programming. Details of optimizations specifically addressed to cache-based computer architectures are described, and performance measurements for the single- and multiprocessor implementations are summarized. The paper demonstrates that optimization of memory access on a cache-based computer architecture controls the performance of the computational algorithm. A hybrid MPI/OpenMP approach is proposed for clusters of shared memory machines to further enhance the parallel performance. The method is applied to develop a new LES/DNS code, named LESTool. A preliminary DNS calculation of a fully developed channel flow at a friction Reynolds number Re_τ = 180 has shown good agreement with existing data.
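
    A minimal illustration of the kind of cache-oriented memory-access optimization the paper emphasizes, using loop tiling on a matrix transpose (a generic example, not code from LESTool): the blocked version touches memory in cache-sized tiles instead of striding across whole rows.

        #include <stdio.h>

        #define N    1024
        #define TILE 64     /* chosen so a block of a and b fits in cache */

        static double a[N][N], b[N][N];

        int main(void) {
            /* Blocked transpose: each TILE x TILE block of 'a' is read and
               the matching block of 'b' written while both stay cache-resident. */
            for (int ii = 0; ii < N; ii += TILE)
                for (int jj = 0; jj < N; jj += TILE)
                    for (int i = ii; i < ii + TILE; ++i)
                        for (int j = jj; j < jj + TILE; ++j)
                            b[j][i] = a[i][j];
            printf("%f\n", b[N - 1][0]);
            return 0;
        }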

  20. MATH77 - A LIBRARY OF MATHEMATICAL SUBPROGRAMS FOR FORTRAN 77, RELEASE 4.0

    NASA Technical Reports Server (NTRS)

    Lawson, C. L.

    1994-01-01

    MATH77 is a high quality library of ANSI FORTRAN 77 subprograms implementing contemporary algorithms for the basic computational processes of science and engineering. The portability of MATH77 meets the needs of present-day scientists and engineers who typically use a variety of computing environments. Release 4.0 of MATH77 contains 454 user-callable and 136 lower-level subprograms. Usage of the user-callable subprograms is described in 69 sections of the 416-page users' manual. The topics covered by MATH77 are indicated by the following list of chapter titles in the users' manual: Mathematical Functions, Pseudo-random Number Generation, Linear Systems of Equations and Linear Least Squares, Matrix Eigenvalues and Eigenvectors, Matrix Vector Utilities, Nonlinear Equation Solving, Curve Fitting, Table Look-Up and Interpolation, Definite Integrals (Quadrature), Ordinary Differential Equations, Minimization, Polynomial Rootfinding, Finite Fourier Transforms, Special Arithmetic, Sorting, Library Utilities, Character-based Graphics, and Statistics. Besides subprograms that are adaptations of public domain software, MATH77 contains a number of unique packages developed by the authors of MATH77. Instances of the latter type include (1) adaptive quadrature, allowing for exceptional generality in multidimensional cases, (2) the ordinary differential equations solver used in spacecraft trajectory computation for JPL missions, (3) univariate and multivariate table look-up and interpolation, allowing for "ragged" tables, and providing error estimates, and (4) univariate and multivariate derivative-propagation arithmetic. MATH77 release 4.0 is a subroutine library which has been carefully designed to be usable on any computer system that supports the full ANSI standard FORTRAN 77 language. It has been successfully implemented on a CRAY Y/MP computer running UNICOS, a UNISYS 1100 computer running EXEC 8, a DEC VAX series computer running VMS, a Sun4 series computer running SunOS, a Hewlett-Packard 720 computer running HP-UX, a Macintosh computer running MacOS, and an IBM PC compatible computer running MS-DOS. Accompanying the library is a set of 196 "demo" drivers that exercise all of the user-callable subprograms. The FORTRAN source code for MATH77 comprises 109K lines of code in 375 files with a total size of 4.5Mb. The demo drivers comprise 11K lines of code and 418Kb. Forty-four percent of the lines of the library code and 29% of those in the demo code are comment lines. The standard distribution medium for MATH77 is a 0.25 inch streaming magnetic tape cartridge in UNIX tar format. It is also available on a 9-track 1600 BPI magnetic tape in VAX BACKUP format and a TK50 tape cartridge in VAX BACKUP format. An electronic copy of the documentation is included on the distribution media. Previous releases of MATH77 have been used over a number of years in a variety of JPL applications. MATH77 Release 4.0 was completed in 1992. MATH77 is a copyrighted work with all copyright vested in NASA.

  1. LAMMPS strong scaling performance optimization on Blue Gene/Q

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Coffman, Paul; Jiang, Wei; Romero, Nichols A.

    2014-11-12

    LAMMPS "Large-scale Atomic/Molecular Massively Parallel Simulator" is an open-source molecular dynamics package from Sandia National Laboratories. Significant performance improvements in strong-scaling and time-to-solution for this application on IBM's Blue Gene/Q have been achieved through computational optimizations of the OpenMP versions of the short-range Lennard-Jones term of the CHARMM force field and the long-range Coulombic interaction implemented with the PPPM (particle-particle-particle mesh) algorithm, enhanced by runtime parameter settings controlling thread utilization. Additionally, MPI communication performance improvements were made to the PPPM calculation by re-engineering the parallel 3D FFT to use MPICH collectives instead of point-to-point. Performance testing was done using anmore » 8.4-million atom simulation scaling up to 16 racks on the Mira system at Argonne Leadership Computing Facility (ALCF). Speedups resulting from this effort were in some cases over 2x.« less

  2. CLOMP v1.5

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Gyllenhaal, J.

    CLOMP is the C version of the Livermore OpenMP benchmark developed to measure OpenMP overheads and other performance impacts due to threading. For simplicity, it does not use MPI by default but it is expected to be run on the resources a threaded MPI task would use (e.g., a portion of a shared memory compute node). Compiling with -DWITH_MPI allows packing one or more nodes with CLOMP tasks and having CLOMP report OpenMP performance for the slowest MPI task. On current systems, the strong scaling performance results for 4, 8, or 16 threads are of the most interest. Suggested weak scaling inputs are provided for evaluating future systems. Since MPI is often used to place at least one MPI task per coherence or NUMA domain, it is recommended to focus OpenMP runtime measurements on a subset of node hardware where it is most possible to have low OpenMP overheads (e.g., within one coherence domain or NUMA domain).
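
    The kind of measurement CLOMP automates can be sketched in a few lines: time many near-empty parallel regions so that thread fork/join and barrier costs dominate (a toy version for illustration, not CLOMP itself):

        #include <omp.h>
        #include <stdio.h>

        int main(void) {
            const int reps = 10000;
            int sink = 0;                 /* keeps the region from being elided */

            double t0 = omp_get_wtime();
            for (int r = 0; r < reps; ++r) {
                /* Near-empty parallel region: elapsed time is dominated by
                   thread fork/join plus the implicit barrier at region exit. */
                #pragma omp parallel
                {
                    #pragma omp atomic
                    sink++;
                }
            }
            double us = (omp_get_wtime() - t0) / reps * 1e6;

            printf("~%.2f us per region, %d threads (sink=%d)\n",
                   us, omp_get_max_threads(), sink);
            return 0;
        }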

  3. Differences in motor performance between children and adolescents in Mozambique and Portugal: impact of allometric scaling.

    PubMed

    Dos Santos, Fernanda Karina; Nevill, Allan; Gomes, Thayse Natacha Q F; Chaves, Raquel; Daca, Timóteo; Madeira, Aspacia; Katzmarzyk, Peter T; Prista, António; Maia, José A R

    2016-05-01

    Children from developed and developing countries have different anthropometric characteristics, which may affect their motor performance (MP). We used an allometric approach to model the relationship between body size and MP in youth from two countries differing in socio-economic status: Portugal and Mozambique. A total of 2946 subjects, 1280 Mozambicans (688 girls) and 1666 Portuguese (826 girls), aged 10-15 years, were sampled. Height and weight were measured, and the reciprocal ponderal index (RPI) was computed. MP included handgrip strength, 1-mile run/walk, curl-ups, and standing long jump tests. A multiplicative allometric model was adopted to adjust for body size differences across countries. Differences in MP between Mozambican and Portuguese children exist, invariably favouring the latter. The allometric models used to adjust MP for differences in body size identified the optimal body shape to be either the RPI or even more linear, i.e., approximately height/mass^0.25. Having adjusted the MP variables for differences in body size, the differences between Mozambican and Portuguese children were invariably reduced and, in the case of grip strength, reversed. These results reinforce the notion that significant differences exist in MP across countries, even after adjusting for differences in body size.
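
    A multiplicative allometric model of the kind used here (Nevill-style; the exponent symbols are generic, not the paper's fitted values) enters body size as power terms and is fitted after log-transformation:

        \mathrm{MP} = a \cdot m^{k_1} \cdot h^{k_2} \cdot \varepsilon \quad\Longrightarrow\quad \ln \mathrm{MP} = \ln a + k_1 \ln m + k_2 \ln h + \ln \varepsilon

    The fitted exponents define the "optimal body shape"; a ratio close to height/mass^0.25, as reported above, indicates a shape index slightly more linear than the RPI.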

  4. Engineering mesenchymal stem cell spheroids by incorporation of mechanoregulator microparticles.

    PubMed

    Abbasi, Fatemeh; Ghanian, Mohammad Hossein; Baharvand, Hossein; Vahidi, Bahman; Eslaminejad, Mohamadreza Baghaban

    2018-05-03

    Mechanical forces throughout human mesenchymal stem cell (hMSC) spheroids (mesenspheres) play a predominant role in determining the cellular functions of growth, proliferation, and differentiation through mechanotransductional mechanisms. Here, we introduce microparticle (MP) incorporation as a mechanical intervention method to alter the tensional homeostasis of the mesensphere and explore MSC differentiation in response to MP stiffness. Microparticulate mechanoregulators with different elastic moduli (34 kPa, 0.6 MPa, and 2.2 MPa) were prepared by controlled crosslinking of cell-sized microdroplets of polydimethylsiloxane (PDMS). Preparation of MP-MSC composite spheroids enabled us to study the possible effects of MPs through experimental and computational assays. Our results showed that MP incorporation selectively primed MSCs toward osteogenesis, yet hindered adipogenesis. Interestingly, this behavior depended on MP mechanics, as the spheroids that contained MPs with intermediate stiffness behaved similarly to control MP-free mesenspheres, with more tendency toward chondrogenesis. However, with the soft or stiff MPs, the MP-mesenspheres showed significant signs of osteogenesis. This may be explained by the complex of forces acting within the cell spheroid, which together establish a state of tensional homeostasis. Incorporation of cell-sized polymer MPs as mechanoregulators of cell spheroids could be utilized as a new engineering toolkit for multicellular organoids in disease modeling and tissue engineering applications. Copyright © 2018 Elsevier Ltd. All rights reserved.

  5. Computers and Creativity.

    ERIC Educational Resources Information Center

    Ten Dyke, Richard P.

    1982-01-01

    A traditional question is whether or not computers shall ever think like humans. This question is redirected to a discussion of whether computers shall ever be truly creative. Creativity is defined, and a program is described that is designed to creatively complete a series problem in mathematics. (MP)

  6. Quantitation of intracellular metabolites of [35S]-6-mercaptopurine in L5178Y cells grown in time-course incubates.

    PubMed

    Breter, H J; Zahn, R K

    1979-09-01

    6-Mercaptopurine (6MP) metabolism was quantitatively determined in L5178Y murine lymphoma cells grown in time-course incubates with [35S]-6MP. The cells were extracted with cold perchloric acid, and the buffered extracts were subjected to high-performance liquid cation-exchange chromatography prior to and after hydrolysis with alkaline phosphatase. Free sulfate, 6-thiouric acid, 6-thioxanthosine, 6-thioguanosine, 6-thioinosine, free 6MP, and 6-methylthioinosine were separated from each other; identified in the radiochromatograms by elution volume, UV spectroscopic data, and enzymatic peak-shifting analyses with purine nucleoside phosphorylase; and quantitatively determined by means of 35S radioactivity. Gross intracellular 35S concentrations remained constant at 5 × 10⁻⁵ M after 1 hr of incubation. 6MP metabolism in L5178Y cells was distinguished into an early phase (to 1 hr of incubation) in which 6MP was predominantly catabolized to 6-thiouric acid and free sulfate, an intermediate phase (to 8 hr) in which substantial amounts of free 6MP and of ribonucleotides of 6-thioxanthosine and 6-thioguanosine were present while the concentrations of nonnucleotide oxidation products sharply decreased, and a late phase (to 24 hr) in which the ribonucleotides of 6MP, of 6-thioguanosine and, in particular, of 6-methylthioinosine were the most abundant metabolites.

  7. Improving the dissolution and bioavailability of 6-mercaptopurine via co-crystallization with isonicotinamide.

    PubMed

    Wang, Jian-Rong; Yu, Xueping; Zhou, Chun; Lin, Yunfei; Chen, Chen; Pan, Guoyu; Mei, Xuefeng

    2015-03-01

    6-Mercaptopurine (6-MP) is a clinically important antitumor drug. The commercially available form is the monohydrate and belongs to the BCS class II category. Co-crystallization screening by the reaction crystallization method (RCM), monitored by powder X-ray diffraction, led to the discovery of a new co-crystal formed between 6-MP and isonicotinamide (co-crystal 1). Co-crystal 1 was thoroughly characterized by X-ray diffraction, FT-IR and Raman spectroscopy, and thermal analysis. Notably, in vitro and in vivo studies revealed that co-crystal 1 possesses an improved dissolution rate and superior bioavailability in an animal model. Copyright © 2015 Elsevier Ltd. All rights reserved.

  8. Improvement and speed optimization of numerical tsunami modelling program using OpenMP technology

    NASA Astrophysics Data System (ADS)

    Chernov, A.; Zaytsev, A.; Yalciner, A.; Kurkin, A.

    2009-04-01

    Currently, the basic problem of tsunami modeling is the low speed of calculations, which is unacceptable for operational warning services. Existing algorithms for numerical modeling of the hydrodynamic processes of tsunami waves were developed without taking advantage of modern computer facilities. Considerable acceleration of the calculations can be achieved by using parallel algorithms. We discuss here a new approach to parallelizing tsunami modeling code using OpenMP technology (for multiprocessor systems with shared memory). Nowadays, multiprocessor systems are easily accessible to everyone; their cost is much lower than that of clusters, which also lets programmers apply multithreading algorithms on the desktop computers of researchers. Another important advantage of this approach is the shared-memory mechanism: there is no need to send data over slow networks (for example, Ethernet). All memory is common to all computing threads, which yields almost linear scalability of the program. In the new version of NAMI DANCE, using OpenMP technology and a multithreading algorithm provides an 80% gain in speed over the single-threaded version on a dual-processor unit, and a 320% gain was attained on a quad-core PC. Thus, it was possible to considerably reduce the computation time on scientific workstations (desktops) without a complete redesign of the program and user interfaces. Further modernization of the algorithms for preparing initial data and processing results using OpenMP looks reasonable. The final version of NAMI DANCE with the increased computational speed can be used not only for research purposes but also in real-time Tsunami Warning Systems.
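
    A minimal C sketch of the shared-memory approach described above, with a generic five-point smoothing update standing in for the actual NAMI DANCE hydrodynamics (grid sizes and array names are illustrative): rows of the grid are split across OpenMP threads, and no data crosses a network because every thread addresses the same arrays.

        #include <omp.h>
        #include <stdio.h>

        #define NX 1000
        #define NY 1000

        static double eta[NX][NY], eta_new[NX][NY];

        int main(void) {
            eta[NX / 2][NY / 2] = 1.0;       /* initial free-surface bump */

            /* One smoothing step standing in for the hydrodynamic update:
               rows are split across threads, and all threads read and write
               the same shared arrays -- no message passing is needed. */
            #pragma omp parallel for
            for (int i = 1; i < NX - 1; ++i)
                for (int j = 1; j < NY - 1; ++j)
                    eta_new[i][j] = 0.25 * (eta[i - 1][j] + eta[i + 1][j] +
                                            eta[i][j - 1] + eta[i][j + 1]);

            printf("eta_new near bump: %f\n", eta_new[NX / 2][NY / 2 - 1]);
            return 0;
        }

    Compiled with, e.g., gcc -fopenmp, the thread count is then controlled through OMP_NUM_THREADS with no change to the code.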

  9. Deviations from idealised geometries part 3: approximately tetrahedral molecules of form MX2Y2 studied by SCF and MP2 calculations

    NASA Astrophysics Data System (ADS)

    Palmer, Michael H.

    1997-03-01

    The relatively minor deviations from true tetrahedral geometry for molecules of type MX2Y2, where M is tetravalent and X, Y are either H, Me, or halogen, are discussed in the light of ab initio calculations of equilibrium geometry with a large (triple-zeta valence + polarisation) basis, at both the SCF and MP2 levels. The results are compared with known experimental structural and dipole moment data; in most cases a very close correlation with experiment is found, with slight improvements in the MP2 data. The study is coupled with a localised orbital study of relevance to Bent's Rule.

  10. An Alternative Mechanism for the Dimerization of Formic Acid

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Brinkman, Nicole R.; Tschumper, Gregory; Yan, Ge

    Gas-phase formic acid exists primarily as a cyclic dimer. The mechanism of dimerization has traditionally been considered a synchronous process; however, recent experimental findings suggest a possible alternative mechanism by which two formic acid monomers proceed through an acyclic dimer to the cyclic dimer in a stepwise process. To investigate this newly proposed process of dimerization in formic acid, density functional theory and second-order Møller-Plesset perturbation theory (MP2) have been used to optimize cis and trans monomers of formic acid, the acyclic and cyclic dimers, and the acyclic and cyclic transition states between minima. Single-point energies of the trans monomer, dimer minima, and transition states at the MP2/TZ2P+diff optimized geometries were computed at the coupled-cluster level of theory including singles and doubles with perturbatively applied triple excitations [CCSD(T)] with an aug-cc-pVTZ basis set to obtain an accurate determination of energy barriers and dissociation energies. A counterpoise correction was performed to estimate the basis set superposition error in computing relative energies. The explicitly correlated MP2 method of Kutzelnigg and Klopper (MP2-R12) was used to provide an independent means of obtaining the MP2 one-particle limit. The cyclic minimum is predicted to be 6.3 kcal/mol more stable than the acyclic minimum, and the barrier to double proton transfer is 7.1 kcal/mol.
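
    The counterpoise estimate of basis set superposition error mentioned above follows the standard Boys-Bernardi recipe, evaluating each monomer in the full dimer basis:

        E_{\mathrm{int}}^{\mathrm{CP}} = E_{AB}^{AB} - E_{A}^{AB} - E_{B}^{AB}

    where the superscript denotes the basis set in which each fragment energy is computed.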

  11. Assessment of skeletal age using MP3 and hand-wrist radiographs and its correlation with dental and chronological ages in children.

    PubMed

    Bala, M; Pathak, A; Jain, R L

    2010-01-01

    The purpose of the study was to assess skeletal age using MP3 and hand-wrist radiographs and to find the correlation among the skeletal, dental, and chronological ages. One hundred and sixty healthy North-Indian children in the age group of 8-14 years, comprising equal numbers of males and females, were included in the study. The children were radiographed for the middle phalanx of the third finger (MP3) and the hand-wrist of the right hand, with an intraoral periapical X-ray of the right permanent maxillary canine. Skeletal age was assessed from MP3 and hand-wrist radiographs according to the standards of Greulich and Pyle. Dental age was assessed from IOPA radiographs of the right permanent maxillary canine based on Nolla's calcification stages. Skeletal age from MP3 and hand-wrist radiographs shows high correlation in all the age groups for both sexes. Females were more advanced in skeletal maturation than males. Skeletal age showed high correlation with dental age in the 12-14 years age group. Chronological age showed inconsistent correlation with dental and skeletal ages.

  12. Searching for Physics Beyond the Standard Model: Strongly-Coupled Field Theories at the Intensity and Energy Frontiers

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Brower, Richard C.

    This proposal is to develop the software and algorithmic infrastructure needed for the numerical study of quantum chromodynamics (QCD), and of theories that have been proposed to describe physics beyond the Standard Model (BSM) of high energy physics, on current and future computers. This infrastructure will enable users (1) to improve the accuracy of QCD calculations to the point where they no longer limit what can be learned from high-precision experiments that seek to test the Standard Model, and (2) to determine the predictions of BSM theories in order to understand which of them are consistent with the data that will soon be available from the LHC. Work will include the extension and optimization of community codes for the next generation of leadership-class computers, the IBM Blue Gene/Q and the Cray XE/XK, and for the dedicated hardware funded for our field by the Department of Energy. Members of our collaboration at Brookhaven National Laboratory and Columbia University worked on the design of the Blue Gene/Q, and have begun to develop software for it. Under this grant we will build upon their experience to produce high-efficiency production codes for this machine. Cray XE/XK computers with many thousands of GPU accelerators will soon be available, and the dedicated commodity clusters we obtain with DOE funding include growing numbers of GPUs. We will work with our partners in NVIDIA's Emerging Technology group to scale our existing software to thousands of GPUs, and to produce highly efficient production codes for these machines. Work under this grant will also include the development of new algorithms for the effective use of heterogeneous computers, and their integration into our codes. It will include improvements of Krylov solvers and the development of new multigrid methods in collaboration with members of the FASTMath SciDAC Institute, using their HYPRE framework, as well as work on improved symplectic integrators.

  13. Red blood cell hypoxanthine phosphoribosyltransferase activity measured using 6-mercaptopurine as a substrate: a population study in children with acute lymphoblastic leukaemia.

    PubMed Central

    Lennard, L; Hale, J P; Lilleyman, J S

    1993-01-01

    1. 6-Mercaptopurine (6-MP) is used in the continuing chemotherapy of childhood acute lymphoblastic leukaemia. The formation of red blood cell (RBC) 6-thioguanine nucleotide (6-TGN) active metabolites, not the dose of 6-MP, is related to cytotoxicity and prognosis. But there is an apparent sex difference in 6-MP metabolism. Boys require more 6-MP than girls to produce the same range of 6-TGN concentrations. Given the same dose, they experience fewer dose reductions because of cytotoxicity, and have a higher relapse rate. 2. The enzyme hypoxanthine phosphoribosyltransferase (HPRT) catalyses the initial activation step in the metabolism of 6-MP to 6-TGNs, a step that requires endogenous phosphoribosyl pyrophosphate (PRPP) as a cosubstrate. Both HPRT and the enzyme responsible for the formation of PRPP are X-linked. 3. RBC HPRT activity was measured in two populations, 86 control children and 63 children with acute lymphoblastic leukaemia. 6-MP was used as the substrate and the formation of the nucleotide product, 6-thioinosinic acid (TIA), was measured. RBC 6-TGN concentrations were measured in the leukaemic children at a standard dose of 6-MP. 4. There was a 1.3- to 1.7-fold range in HPRT activity when measured under optimal conditions. The leukaemic children had significantly higher HPRT activities than the controls (median difference 4.2 µmol TIA ml⁻¹ RBCs h⁻¹, 95% C.I. 3.7 to 4.7, P < 0.0001). In the leukaemic children HPRT activity (range 20.4 to 26.6 µmol TIA ml⁻¹ RBCs h⁻¹, median 23.6) was not related to the production of 6-TGNs (range 60 to 1,024 pmol/8 × 10⁸ RBCs, median 323). RBC HPRT was present at a high activity even in those children with low 6-TGN concentrations. 5. When HPRT is measured under optimal conditions it does not appear to be the metabolic step responsible for the observed sex difference in 6-MP metabolism. This may be because RBC HPRT activity is not representative of other tissues, but it could equally be because other sex-linked factors are influencing substrate availability. PMID:12959304

  14. The CPU and You: Mastering the Microcomputer.

    ERIC Educational Resources Information Center

    Kansky, Robert

    1983-01-01

    Computers are both understandable and controllable. Educators need some understanding of a computer's cognitive profile, component parts, and systematic nature in order to set it to work on some of the teaching tasks that need to be done. Much computer-related vocabulary is discussed. (MP)

  15. Beyond the Face of Race: Emo-Cognitive Explorations of White Neurosis and Racial Cray-Cray

    ERIC Educational Resources Information Center

    Matias, Cheryl E.; DiAngelo, Robin

    2013-01-01

    In this article, the authors focus on the emotional and cognitive context that underlies whiteness. They employ interdisciplinary approaches of critical Whiteness studies and critical race theory to entertain how common White responses to racial material stem from the need for Whites to deny race, a traumatizing process that begins in childhood.…

  16. Parallel processing on the Livermore VAX 11/780-4 parallel processor system with compatibility to Cray Research, Inc. (CRI) multitasking. Version 1

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Werner, N.E.; Van Matre, S.W.

    1985-05-01

    This manual describes the CRI Subroutine Library and Utility Package. The CRI library provides Cray multitasking functionality on the four-processor shared memory VAX 11/780-4. Additional functionality has been added for more flexibility. A discussion of the library, utilities, error messages, and example programs is provided.

  17. Space shuttle main engine numerical modeling code modifications and analysis

    NASA Technical Reports Server (NTRS)

    Ziebarth, John P.

    1988-01-01

    The user of computational fluid dynamics (CFD) codes must be concerned with the accuracy and efficiency of the codes if they are to be used for timely design and analysis of complicated three-dimensional fluid flow configurations. A brief discussion of how accuracy and efficiency affect the CFD solution process is given. A more detailed discussion of how efficiency can be enhanced by using a few Cray Research Inc. utilities to address vectorization is presented, and these utilities are applied to a three-dimensional Navier-Stokes CFD code (INS3D).

  18. Unstructured-grid methods development: Lessons learned

    NASA Technical Reports Server (NTRS)

    Batina, John T.

    1991-01-01

    The development of unstructured-grid methods for the solution of the equations of fluid flow is summarized, and some of the lessons learned are shared. The 3-D Euler equations are solved, with attention to spatial discretizations, temporal discretizations, and boundary conditions. An example calculation with an upwind implicit method using a CFL (Courant-Friedrichs-Lewy) number of infinity is presented for the Boeing 747 aircraft. The results were obtained in less than one hour of CPU time on a Cray-2 computer, demonstrating the speed and robustness of the present capability.

  19. Parallel Climate Data Assimilation PSAS Package

    NASA Technical Reports Server (NTRS)

    Ding, Hong Q.; Chan, Clara; Gennery, Donald B.; Ferraro, Robert D.

    1996-01-01

    We have designed and implemented a set of highly efficient and highly scalable algorithms for an unstructured computational package, the PSAS data assimilation package, as demonstrated by detailed performance analysis of systematic runs on up to a 512-node Intel Paragon. The equation solver achieves a sustained 18 Gflops performance. As a result, we achieved an unprecedented 100-fold reduction in solution time on the Intel Paragon parallel platform over the Cray C90. This not only meets and exceeds the DAO time requirements, but also significantly enlarges the window of exploration in climate data assimilation.

  20. Computer Center Reference Manual

    DTIC Science & Technology

    1988-06-20

