NASA Technical Reports Server (NTRS)
Bailey, D. H.; Barszcz, E.; Barton, J. T.; Carter, R. L.; Lasinski, T. A.; Browning, D. S.; Dagum, L.; Fatoohi, R. A.; Frederickson, P. O.; Schreiber, R. S.
1991-01-01
A new set of benchmarks has been developed for the performance evaluation of highly parallel supercomputers in the framework of the NASA Ames Numerical Aerodynamic Simulation (NAS) Program. These consist of five 'parallel kernel' benchmarks and three 'simulated application' benchmarks. Together they mimic the computation and data movement characteristics of large-scale computational fluid dynamics applications. The principal distinguishing feature of these benchmarks is their 'pencil and paper' specification: all details of these benchmarks are specified only algorithmically. In this way many of the difficulties associated with conventional benchmarking approaches on highly parallel systems are avoided.
Object-Oriented Implementation of the NAS Parallel Benchmarks using Charm++
NASA Technical Reports Server (NTRS)
Krishnan, Sanjeev; Bhandarkar, Milind; Kale, Laxmikant V.
1996-01-01
This report describes experiences with implementing the NAS Computational Fluid Dynamics benchmarks using a parallel object-oriented language, Charm++. Our main objective in implementing the NAS CFD kernel benchmarks was to develop a code that could be used to easily experiment with different domain decomposition strategies and dynamic load balancing. We also wished to leverage the object-orientation provided by the Charm++ parallel object-oriented language, to develop reusable abstractions that would simplify the process of developing parallel applications. We first describe the Charm++ parallel programming model and the parallel object array abstraction, then go into detail about each of the Scalar Pentadiagonal (SP) and Lower/Upper Triangular (LU) benchmarks, along with performance results. Finally we conclude with an evaluation of the methodology used.
Unstructured Adaptive (UA) NAS Parallel Benchmark. Version 1.0
NASA Technical Reports Server (NTRS)
Feng, Huiyu; VanderWijngaart, Rob; Biswas, Rupak; Mavriplis, Catherine
2004-01-01
We present a complete specification of a new benchmark for measuring the performance of modern computer systems when solving scientific problems featuring irregular, dynamic memory accesses. It complements the existing NAS Parallel Benchmark suite. The benchmark involves the solution of a stylized heat transfer problem in a cubic domain, discretized on an adaptively refined, unstructured mesh.
Parallelization of NAS Benchmarks for Shared Memory Multiprocessors
NASA Technical Reports Server (NTRS)
Waheed, Abdul; Yan, Jerry C.; Saini, Subhash (Technical Monitor)
1998-01-01
This paper presents our experiences of parallelizing the sequential implementation of NAS benchmarks using compiler directives on SGI Origin2000 distributed shared memory (DSM) system. Porting existing applications to new high performance parallel and distributed computing platforms is a challenging task. Ideally, a user develops a sequential version of the application, leaving the task of porting to new generations of high performance computing systems to parallelization tools and compilers. Due to the simplicity of programming shared-memory multiprocessors, compiler developers have provided various facilities to allow the users to exploit parallelism. Native compilers on SGI Origin2000 support multiprocessing directives to allow users to exploit loop-level parallelism in their programs. Additionally, supporting tools can accomplish this process automatically and present the results of parallelization to the users. We experimented with these compiler directives and supporting tools by parallelizing sequential implementation of NAS benchmarks. Results reported in this paper indicate that with minimal effort, the performance gain is comparable with the hand-parallelized, carefully optimized, message-passing implementations of the same benchmarks.
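The loop-level directive style this abstract describes can be shown in a minimal sketch (our own illustration, not code from the paper; OpenMP's pragma stands in for the SGI-native multiprocessing directives it discusses):

```c
/* A sketch of directive-based loop-level parallelism. Compiled without
 * OpenMP support the pragma is simply ignored and the loop runs
 * serially, producing the same result -- which is exactly the appeal
 * of the directive approach for porting sequential code. */
void saxpy(int n, double a, const double *x, double *y) {
    /* Each iteration is independent, so the directive lets the
     * compiler/runtime divide the iteration space among threads. */
    #pragma omp parallel for
    for (int i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}
```

Annotating the few loops that dominate runtime is often enough to recover most of the available parallelism, which is why the paper reports performance comparable to hand-parallelized versions with minimal effort.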
Comparison of Origin 2000 and Origin 3000 Using NAS Parallel Benchmarks
NASA Technical Reports Server (NTRS)
Turney, Raymond D.
2001-01-01
This report describes results of benchmark tests on the Origin 3000 system currently being installed at the NASA Ames National Advanced Supercomputing facility. This machine will ultimately contain 1024 R14K processors. The first part of the system, installed in November 2000 and named mendel, is an Origin 3000 with 128 R12K processors. For comparison purposes, the tests were also run on lomax, an Origin 2000 with R12K processors. The BT, LU, and SP application benchmarks in the NAS Parallel Benchmark Suite and the kernel benchmark FT were chosen to determine system performance and measure the impact of changes on the machine as it evolves. Having been written to measure performance on Computational Fluid Dynamics applications, these benchmarks are assumed appropriate to represent the NAS workload. Since the NAS runs both message passing (MPI) and shared-memory, compiler directive type codes, both MPI and OpenMP versions of the benchmarks were used. The MPI versions used were the latest official release of the NAS Parallel Benchmarks, version 2.3. The OpenMP versions used were PBN3b2, a beta version that is in the process of being released. NPB 2.3 and PBN 3b2 are technically different benchmarks, and NPB results are not directly comparable to PBN results.
New NAS Parallel Benchmarks Results
NASA Technical Reports Server (NTRS)
Yarrow, Maurice; Saphir, William; VanderWijngaart, Rob; Woo, Alex; Kutler, Paul (Technical Monitor)
1997-01-01
NPB2 (NAS (NASA Advanced Supercomputing) Parallel Benchmarks 2) is an implementation, based on Fortran and the MPI (message passing interface) message passing standard, of the original NAS Parallel Benchmark specifications. NPB2 programs are run with little or no tuning, in contrast to NPB vendor implementations, which are highly optimized for specific architectures. NPB2 results complement, rather than replace, NPB results. Because they have not been optimized by vendors, NPB2 implementations approximate the performance a typical user can expect for a portable parallel program on distributed memory parallel computers. Together these results provide an insightful comparison of the real-world performance of high-performance computers. New NPB2 features: New implementation (CG), new workstation class problem sizes, new serial sample versions, more performance statistics.
Statistical Analysis of NAS Parallel Benchmarks and LINPACK Results
NASA Technical Reports Server (NTRS)
Meuer, Hans-Werner; Simon, Horst D.; Strohmaier, Erich; Lasinski, T. A. (Technical Monitor)
1994-01-01
In the last three years extensive performance data have been reported for parallel machines, based both on the NAS Parallel Benchmarks and on LINPACK. In this study we have used the reported benchmark results and performed a number of statistical experiments using factor, cluster, and regression analyses. In addition to the performance results of LINPACK and the eight NAS Parallel Benchmarks, we have also included the peak performance of the machine, and the LINPACK n and n(sub 1/2) values. Some of the results and observations can be summarized as follows: 1) All benchmarks are strongly correlated with peak performance. 2) LINPACK and EP each have a unique signature. 3) The remaining NPB can be grouped into three groups as follows: (CG and IS), (LU and SP), and (MG, FT, and BT). Hence three (or four with EP) benchmarks are sufficient to characterize the overall NPB performance. Our poster presentation will follow a standard poster format, and will present the data of our statistical analysis in detail.
Implementation of the NAS Parallel Benchmarks in Java
NASA Technical Reports Server (NTRS)
Frumkin, Michael A.; Schultz, Matthew; Jin, Haoqiang; Yan, Jerry; Biegel, Bryan (Technical Monitor)
2002-01-01
Several features make Java an attractive choice for High Performance Computing (HPC). In order to gauge the applicability of Java to Computational Fluid Dynamics (CFD), we have implemented the NAS (NASA Advanced Supercomputing) Parallel Benchmarks in Java. The performance and scalability of the benchmarks point out the areas where improvement in Java compiler technology and in Java thread implementation would position Java closer to Fortran in the competition for CFD applications.
Performance and Scalability of the NAS Parallel Benchmarks in Java
NASA Technical Reports Server (NTRS)
Frumkin, Michael A.; Schultz, Matthew; Jin, Haoqiang; Yan, Jerry; Biegel, Bryan A. (Technical Monitor)
2002-01-01
Several features make Java an attractive choice for scientific applications. In order to gauge the applicability of Java to Computational Fluid Dynamics (CFD), we have implemented the NAS (NASA Advanced Supercomputing) Parallel Benchmarks in Java. The performance and scalability of the benchmarks point out the areas where improvement in Java compiler technology and in Java thread implementation would position Java closer to Fortran in the competition for scientific applications.
Performance Characteristics of the Multi-Zone NAS Parallel Benchmarks
NASA Technical Reports Server (NTRS)
Jin, Haoqiang; VanderWijngaart, Rob F.
2003-01-01
We describe a new suite of computational benchmarks that models applications featuring multiple levels of parallelism. Such parallelism is often available in realistic flow computations on systems of grids, but had not previously been captured in benchmarks. The new suite, named NPB Multi-Zone, is extended from the NAS Parallel Benchmarks suite, and involves solving the application benchmarks LU, BT and SP on collections of loosely coupled discretization meshes. The solutions on the meshes are updated independently, but after each time step they exchange boundary value information. This strategy provides relatively easily exploitable coarse-grain parallelism between meshes. Three reference implementations are available: one serial, one hybrid using the Message Passing Interface (MPI) and OpenMP, and another hybrid using a shared memory multi-level programming model (SMP+OpenMP). We examine the effectiveness of hybrid parallelization paradigms in these implementations on three different parallel computers. We also use an empirical formula to investigate the performance characteristics of the multi-zone benchmarks.
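The coarse-grain pattern described here, independent per-zone updates followed by a boundary exchange, can be sketched in miniature (a toy 1-D stand-in of our own, not the benchmark code):

```c
#include <string.h>

#define NZ 2   /* zones */
#define NP 8   /* points per zone, incl. one ghost cell at each end */

/* One coarse-grain "time step" in the multi-zone style: every zone is
 * updated independently (hence easily done in parallel across zones),
 * then neighboring zones exchange boundary values via ghost cells. */
void step(double z[NZ][NP]) {
    double next[NZ][NP];
    /* Independent per-zone update: 3-point average on the interior. */
    for (int k = 0; k < NZ; k++) {
        memcpy(next[k], z[k], sizeof next[k]);
        for (int i = 1; i < NP - 1; i++)
            next[k][i] = (z[k][i-1] + z[k][i] + z[k][i+1]) / 3.0;
        memcpy(z[k], next[k], sizeof next[k]);
    }
    /* Boundary exchange between the two adjacent zones. */
    z[1][0]    = z[0][NP-2];  /* zone 0's last interior -> zone 1's ghost */
    z[0][NP-1] = z[1][1];     /* zone 1's first interior -> zone 0's ghost */
}
```

In the real suite each "zone update" is a full SP, BT, or LU solve on a 3-D mesh, and the exchange carries face data between neighboring zones, but the structure above is why the inter-zone level parallelizes so cheaply.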
The OpenMP Implementation of NAS Parallel Benchmarks and its Performance
NASA Technical Reports Server (NTRS)
Jin, Hao-Qiang; Frumkin, Michael; Yan, Jerry
1999-01-01
As the new ccNUMA architecture became popular in recent years, parallel programming with compiler directives on these machines has evolved to accommodate new needs. In this study, we examine the effectiveness of OpenMP directives for parallelizing the NAS Parallel Benchmarks. Implementation details will be discussed and performance will be compared with the MPI implementation. We have demonstrated that OpenMP can achieve very good results for parallelization on a shared memory system, but effective use of memory and cache is very important.
Implementation of NAS Parallel Benchmarks in Java
NASA Technical Reports Server (NTRS)
Frumkin, Michael; Schultz, Matthew; Jin, Hao-Qiang; Yan, Jerry
2000-01-01
A number of features make Java an attractive but debatable choice for High Performance Computing (HPC). In order to gauge the applicability of Java to Computational Fluid Dynamics (CFD), we have implemented the NAS Parallel Benchmarks in Java. The performance and scalability of the benchmarks point out the areas where improvement in Java compiler technology and in Java thread implementation would move Java closer to Fortran in the competition for CFD applications.
NASA Technical Reports Server (NTRS)
Bailey, David (Editor); Barton, John (Editor); Lasinski, Thomas (Editor); Simon, Horst (Editor)
1993-01-01
A new set of benchmarks was developed for the performance evaluation of highly parallel supercomputers. These benchmarks consist of a set of kernels, the 'Parallel Kernels,' and a simulated application benchmark. Together they mimic the computation and data movement characteristics of large scale computational fluid dynamics (CFD) applications. The principal distinguishing feature of these benchmarks is their 'pencil and paper' specification - all details of these benchmarks are specified only algorithmically. In this way many of the difficulties associated with conventional benchmarking approaches on highly parallel systems are avoided.
NASA Technical Reports Server (NTRS)
VanderWijngaart, Rob; Biegel, Bryan A. (Technical Monitor)
2002-01-01
We describe a new problem size, called Class D, for the NAS Parallel Benchmarks (NPB), whose MPI source code implementation is being released as NPB 2.4. A brief rationale is given for how the new class is derived. We also describe the modifications made to the MPI (Message Passing Interface) implementation to allow the new class to be run on systems with 32-bit integers, and with moderate amounts of memory. Finally, we give the verification values for the new problem size.
MPI, HPF or OpenMP: A Study with the NAS Benchmarks
NASA Technical Reports Server (NTRS)
Jin, Hao-Qiang; Frumkin, Michael; Hribar, Michelle; Waheed, Abdul; Yan, Jerry; Saini, Subhash (Technical Monitor)
1999-01-01
Porting applications to new high performance parallel and distributed platforms is a challenging task. Writing parallel code by hand is time consuming and costly, but the task can be simplified by high level languages and would even better be automated by parallelizing tools and compilers. The definition of the HPF (High Performance Fortran, based on the data parallel model) and OpenMP (based on the shared memory parallel model) standards has offered great opportunity in this respect. Both provide simple and clear interfaces to languages like Fortran and simplify many tedious tasks encountered in writing message passing programs. In our study we implemented the parallel versions of the NAS Benchmarks with HPF and OpenMP directives. Comparison of their performance with the MPI implementation and the pros and cons of the different approaches will be discussed, along with our experience of using computer-aided tools to help parallelize these benchmarks. Based on the study, the potential of applying some of the techniques to realistic aerospace applications will be presented.
NAS Grid Benchmarks: A Tool for Grid Space Exploration
NASA Technical Reports Server (NTRS)
Frumkin, Michael; VanderWijngaart, Rob F.; Biegel, Bryan (Technical Monitor)
2001-01-01
We present an approach for benchmarking services provided by computational Grids. It is based on the NAS Parallel Benchmarks (NPB) and is called the NAS Grid Benchmark (NGB) in this paper. We present NGB as a data flow graph encapsulating an instance of an NPB code in each graph node, which communicates with other nodes by sending/receiving initialization data. These nodes may be mapped to the same or different Grid machines. Like NPB, NGB will specify several different classes (problem sizes). NGB also specifies the generic Grid services sufficient for running the benchmark. The implementor has the freedom to choose any specific Grid environment. However, we describe a reference implementation in Java, and present some scenarios for using NGB.
NASA Technical Reports Server (NTRS)
VanderWijngaart, Rob; Frumkin, Michael; Biegel, Bryan A. (Technical Monitor)
2002-01-01
We provide a paper-and-pencil specification of a benchmark suite for computational grids. It is based on the NAS (NASA Advanced Supercomputing) Parallel Benchmarks (NPB) and is called the NAS Grid Benchmarks (NGB). NGB problems are presented as data flow graphs encapsulating an instance of a slightly modified NPB task in each graph node, which communicates with other nodes by sending/receiving initialization data. Like NPB, NGB specifies several different classes (problem sizes). In this report we describe classes S, W, and A, and provide verification values for each. The implementor has the freedom to choose any language, grid environment, security model, fault tolerance/error correction mechanism, etc., as long as the resulting implementation passes the verification test and reports the turnaround time of the benchmark.
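The data-flow-graph structure NGB specifies can be sketched abstractly (our own toy illustration; in NGB each node runs a slightly modified NPB task and the edges carry initialization data):

```c
#define MAXN 8

/* A miniature data-flow graph in the NGB spirit: each node runs once
 * all of its predecessors have finished, consuming their outputs.
 * Here a node's "task" merely sums its inputs plus a seed value, a
 * stand-in for launching an NPB solver on some grid machine. */
typedef struct {
    int    npred;        /* number of predecessor nodes       */
    int    pred[MAXN];   /* indices of predecessor nodes      */
    double out;          /* result produced by this node      */
} Node;

/* Nodes must be listed in topological order, as an NGB graph can be. */
void run_graph(Node *g, int n) {
    for (int i = 0; i < n; i++) {
        double in = 1.0;                 /* seed "initialization data" */
        for (int p = 0; p < g[i].npred; p++)
            in += g[g[i].pred[p]].out;   /* receive predecessor results */
        g[i].out = in;                   /* run the node's task */
    }
}
```

A diamond-shaped graph (one source fanning out to two nodes that feed one sink) already exercises the fork and join patterns the benchmark classes are built from.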
RISC Processors and High Performance Computing
NASA Technical Reports Server (NTRS)
Bailey, David H.; Saini, Subhash; Craw, James M. (Technical Monitor)
1995-01-01
This tutorial will discuss the top five RISC microprocessors and the parallel systems in which they are used. It will provide a unique cross-machine comparison not available elsewhere. The effective performance of these processors will be compared by citing standard benchmarks in the context of real applications. The latest NAS Parallel Benchmark results, both absolute performance and performance per dollar, will be listed. The next generation of the NPB will be described. The tutorial will conclude with a discussion of future directions in the field. Technology Transfer Considerations: All of these computer systems are commercially available internationally. Information about these processors is available in the public domain, mostly from the vendors themselves. The NAS Parallel Benchmarks and their results have been previously approved numerous times for public release, beginning back in 1991.
Combining Phase Identification and Statistic Modeling for Automated Parallel Benchmark Generation
DOE Office of Scientific and Technical Information (OSTI.GOV)
Jin, Ye; Ma, Xiaosong; Liu, Qing Gary
2015-01-01
Parallel application benchmarks are indispensable for evaluating/optimizing HPC software and hardware. However, it is very challenging and costly to obtain high-fidelity benchmarks reflecting the scale and complexity of state-of-the-art parallel applications. Hand-extracted synthetic benchmarks are time- and labor-intensive to create. Real applications themselves, while offering the most accurate performance evaluation, are expensive to compile, port, reconfigure, and often plainly inaccessible due to security or ownership concerns. This work contributes APPRIME, a novel tool for trace-based automatic parallel benchmark generation. Taking as input standard communication-I/O traces of an application's execution, it couples accurate automatic phase identification with statistical regeneration of event parameters to create compact, portable, and to some degree reconfigurable parallel application benchmarks. Experiments with four NAS Parallel Benchmarks (NPB) and three real scientific simulation codes confirm the fidelity of APPRIME benchmarks. They retain the original applications' performance characteristics, in particular the relative performance across platforms.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bailey, David H.
The NAS Parallel Benchmarks (NPB) are a suite of parallel computer performance benchmarks. They were originally developed at the NASA Ames Research Center in 1991 to assess high-end parallel supercomputers. Although they are no longer used as widely as they once were for comparing high-end system performance, they continue to be studied and analyzed a great deal in the high-performance computing community. The acronym 'NAS' originally stood for the Numerical Aerodynamic Simulation Program at NASA Ames. The name of this organization was subsequently changed to the Numerical Aerospace Simulation Program, and more recently to the NASA Advanced Supercomputing Center, although the acronym remains 'NAS.' The developers of the original NPB suite were David H. Bailey, Eric Barszcz, John Barton, David Browning, Russell Carter, Leo Dagum, Rod Fatoohi, Samuel Fineberg, Paul Frederickson, Thomas Lasinski, Rob Schreiber, Horst Simon, V. Venkatakrishnan and Sisira Weeratunga. The original NAS Parallel Benchmarks consisted of eight individual benchmark problems, each of which focused on some aspect of scientific computing. The principal focus was in computational aerophysics, although most of these benchmarks have much broader relevance, since in a much larger sense they are typical of many real-world scientific computing applications. The NPB suite grew out of the need for a more rational procedure to select new supercomputers for acquisition by NASA. The emergence of commercially available highly parallel computer systems in the late 1980s offered an attractive alternative to parallel vector supercomputers that had been the mainstay of high-end scientific computing. However, the introduction of highly parallel systems was accompanied by a regrettable level of hype, not only on the part of the commercial vendors but even, in some cases, by scientists using the systems.
As a result, it was difficult to discern whether the new systems offered any fundamental performance advantage over vector supercomputers, and, if so, which of the parallel offerings would be most useful in real-world scientific computation. In part to draw attention to some of the performance reporting abuses prevalent at the time, the present author wrote a humorous essay 'Twelve Ways to Fool the Masses,' which described in a light-hearted way a number of the questionable ways in which both vendor marketing people and scientists were inflating and distorting their performance results. All of this underscored the need for an objective and scientifically defensible measure to compare performance on these systems.
NASA Technical Reports Server (NTRS)
Ayguade, Eduard; Gonzalez, Marc; Martorell, Xavier; Jost, Gabriele
2004-01-01
In this paper we describe the parallelization of the multi-zone code versions of the NAS Parallel Benchmarks employing multi-level OpenMP parallelism. For our study we use the NanosCompiler, which supports nesting of OpenMP directives and provides clauses to control the grouping of threads, load balancing, and synchronization. We report the benchmark results, compare the timings with those of different hybrid parallelization paradigms, and discuss OpenMP implementation issues that affect the performance of multi-level parallel applications.
Analysis of 100Mb/s Ethernet for the Whitney Commodity Computing Testbed
NASA Technical Reports Server (NTRS)
Fineberg, Samuel A.; Pedretti, Kevin T.; Kutler, Paul (Technical Monitor)
1997-01-01
We evaluate the performance of a Fast Ethernet network configured with a single large switch, a single hub, and a 4x4 2D torus topology in a testbed cluster of "commodity" Pentium Pro PCs. We also evaluated a mixed network composed of Ethernet hubs and switches. An MPI collective communication benchmark and the NAS Parallel Benchmarks version 2.2 (NPB2) show that the torus network performs best for all sizes that we were able to test (up to 16 nodes). For larger networks the Ethernet switch outperforms the hub, though its performance is far less than peak. The hub/switch combination tests indicate that the NAS Parallel Benchmarks are relatively insensitive to hub densities of less than 7 nodes per hub.
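For reference, the wraparound neighbor arithmetic of a 4x4 2D torus like the testbed's looks like this (a sketch under the usual row-major node numbering, which the abstract does not spell out):

```c
/* Neighbor computation for a 4x4 2D torus: node id = row*DIM + col,
 * with wraparound in both dimensions. Every node has exactly four
 * neighbors, which is what gives the torus its aggregate bandwidth
 * advantage over a single shared hub. */
#define DIM 4

int torus_east(int id)  { return (id / DIM) * DIM + (id % DIM + 1) % DIM; }
int torus_west(int id)  { return (id / DIM) * DIM + (id % DIM + DIM - 1) % DIM; }
int torus_south(int id) { return ((id / DIM + 1) % DIM) * DIM + id % DIM; }
int torus_north(int id) { return ((id / DIM + DIM - 1) % DIM) * DIM + id % DIM; }
```

With 16 nodes each having four links, nearest-neighbor traffic never shares a segment, consistent with the torus winning at every size the authors could test.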
NAS Parallel Benchmark Results 11-96. 1.0
NASA Technical Reports Server (NTRS)
Bailey, David H.; Bailey, David; Chancellor, Marisa K. (Technical Monitor)
1997-01-01
The NAS Parallel Benchmarks have been developed at NASA Ames Research Center to study the performance of parallel supercomputers. The eight benchmark problems are specified in a "pencil and paper" fashion. In other words, the complete details of the problem to be solved are given in a technical document, and except for a few restrictions, benchmarkers are free to select the language constructs and implementation techniques best suited for a particular system. These results represent the best results that have been reported to us by the vendors for the specific systems listed. In this report, we present new NPB (Version 1.0) performance results for the following systems: DEC Alpha Server 8400 5/440, Fujitsu VPP Series (VX, VPP300, and VPP700), HP/Convex Exemplar SPP2000, IBM RS/6000 SP P2SC node (120 MHz), NEC SX-4/32, SGI/CRAY T3E, SGI Origin200, and SGI Origin2000. We also report High Performance Fortran (HPF) based NPB results for IBM SP2 Wide Nodes, HP/Convex Exemplar SPP2000, and SGI/CRAY T3D. These results have been submitted by Applied Parallel Research (APR) and Portland Group Inc. (PGI). We also present sustained performance per dollar for Class B LU, SP and BT benchmarks.
High Performance Computing at NASA
NASA Technical Reports Server (NTRS)
Bailey, David H.; Cooper, D. M. (Technical Monitor)
1994-01-01
The speaker will give an overview of high performance computing in the U.S. in general and within NASA in particular, including a description of the recently signed NASA-IBM cooperative agreement. The latest performance figures of various parallel systems on the NAS Parallel Benchmarks will be presented. The speaker was one of the authors of the NAS (Numerical Aerodynamic Simulation) Parallel Benchmarks, which are now widely cited in the industry as a measure of sustained performance on realistic high-end scientific applications. It will be shown that significant progress has been made by the highly parallel supercomputer industry during the past year or so, with several new systems, based on high-performance RISC processors, that now deliver superior performance per dollar compared to conventional supercomputers. Various pitfalls in reporting performance will be discussed. The speaker will then conclude by assessing the general state of the high performance computing field.
NASA Technical Reports Server (NTRS)
Saini, Subhash; Frumkin, Michael; Hribar, Michelle; Jin, Hao-Qiang; Waheed, Abdul; Yan, Jerry
1998-01-01
Porting applications to new high performance parallel and distributed computing platforms is a challenging task. Since writing parallel code by hand is extremely time consuming and costly, porting codes would ideally be automated by using some parallelization tools and compilers. In this paper, we compare the performance of the hand-written NAS Parallel Benchmarks against three parallel versions generated with the help of tools and compilers: 1) CAPTools, an interactive computer-aided parallelization tool that generates message passing code; 2) the Portland Group's HPF compiler; and 3) compiler directives with the native Fortran77 compiler on the SGI Origin2000.
Heterogeneous Distributed Computing for Computational Aerosciences
NASA Technical Reports Server (NTRS)
Sunderam, Vaidy S.
1998-01-01
The research supported under this award focuses on heterogeneous distributed computing for high-performance applications, with particular emphasis on computational aerosciences. The overall goal of this project was to investigate issues in, and develop solutions to, the efficient execution of computational aeroscience codes in heterogeneous concurrent computing environments. In particular, we worked in the context of the PVM[1] system and, subsequent to detailed conversion efforts and performance benchmarking, devised novel techniques to increase the efficacy of heterogeneous networked environments for computational aerosciences. Our work has been based upon the NAS Parallel Benchmark suite, but has also recently expanded in scope to include the NAS I/O benchmarks as specified in the NHT-1 document. In this report we summarize our research accomplishments under the auspices of the grant.
Performance Comparison of HPF and MPI Based NAS Parallel Benchmarks
NASA Technical Reports Server (NTRS)
Saini, Subhash
1997-01-01
Compilers supporting High Performance Fortran (HPF) features first appeared in late 1994 and early 1995 from Applied Parallel Research (APR), Digital Equipment Corporation, and The Portland Group (PGI). IBM introduced an HPF compiler for the IBM RS/6000 SP2 in April of 1996. Over the past two years, these implementations have shown steady improvement in terms of both features and performance. The performance of various hardware/programming model (HPF and MPI) combinations will be compared, based on the latest NAS Parallel Benchmark results, thus providing a cross-machine and cross-model comparison. Specifically, HPF based NPB results will be compared with MPI based NPB results to provide perspective on the performance currently obtainable using HPF versus MPI or versus hand-tuned implementations such as those supplied by the hardware vendors. In addition, we will also present NPB (Version 1.0) performance results for the following systems: DEC Alpha Server 8400 5/440, Fujitsu VPP Series (VX, VPP300, and VPP700), HP/Convex Exemplar SPP2000, IBM RS/6000 SP P2SC node (120 MHz), NEC SX-4/32, SGI/CRAY T3E, and SGI Origin2000. We will also present sustained performance per dollar for Class B LU, SP and BT benchmarks.
A Programming Model Performance Study Using the NAS Parallel Benchmarks
Shan, Hongzhang; Blagojević, Filip; Min, Seung-Jai; ...
2010-01-01
Harnessing the power of multicore platforms is challenging due to the additional levels of parallelism present. In this paper we use the NAS Parallel Benchmarks to study three programming models, MPI, OpenMP and PGAS, to understand their performance and memory usage characteristics on current multicore architectures. To understand these characteristics we use the Integrated Performance Monitoring tool and other ways to measure communication versus computation time, as well as the fraction of the run time spent in OpenMP. The benchmarks are run on two different Cray XT5 systems and an InfiniBand cluster. Our results show that in general the three programming models exhibit very similar performance characteristics. In a few cases, OpenMP is significantly faster because it explicitly avoids communication. For these particular cases, we were able to re-write the UPC versions and achieve equal performance to OpenMP. Using OpenMP was also the most advantageous in terms of memory usage. We also compare performance differences between the two Cray systems, which have quad-core and hex-core processors. We show that at scale the performance is almost always slower on the hex-core system because of increased contention for network resources.
NASA Technical Reports Server (NTRS)
Waheed, Abdul; Yan, Jerry
1998-01-01
This paper presents a model to evaluate the performance and overhead of parallelizing sequential code using compiler directives for multiprocessing on distributed shared memory (DSM) systems. With the increasing popularity of shared address space architectures, it is essential to understand their performance impact on programs that benefit from shared memory multiprocessing. We present a simple model to characterize the performance of programs that are parallelized using compiler directives for shared memory multiprocessing. We parallelized the sequential implementation of the NAS benchmarks using native Fortran77 compiler directives for an Origin2000, which is a DSM system based on a cache-coherent Non-Uniform Memory Access (ccNUMA) architecture. We report measurement-based performance of these parallelized benchmarks from four perspectives: efficacy of the parallelization process; scalability; parallelization overhead; and comparison with hand-parallelized and -optimized versions of the same benchmarks. Our results indicate that sequential programs can conveniently be parallelized for DSM systems using compiler directives, but realizing performance gains as predicted by the performance model depends primarily on minimizing architecture-specific data locality overhead.
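The paper's model itself is not reproduced in the abstract; a simple model in the same spirit (our own assumption, not the authors' exact formula) charges a serial term, a parallelizable term divided across processors, and an architecture-dependent overhead term:

```c
/* A toy performance model of directive-based parallelization:
 *   T(p) = t_serial + t_parallel / p + overhead_per_proc * p
 * The last term is a stand-in for architecture-specific costs
 * (data locality, synchronization) that grow with processor count. */
double model_time(double t_serial, double t_parallel,
                  double overhead_per_proc, int p) {
    return t_serial + t_parallel / p + overhead_per_proc * p;
}

/* Predicted speedup relative to the one-processor time. */
double speedup(double t_serial, double t_parallel,
               double overhead_per_proc, int p) {
    double t1 = t_serial + t_parallel;
    return t1 / model_time(t_serial, t_parallel, overhead_per_proc, p);
}
```

With zero overhead this reduces to the Amdahl bound; a nonzero overhead term eventually dominates as p grows, which mirrors the abstract's conclusion that realized gains hinge on minimizing data locality overhead.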
NASA Technical Reports Server (NTRS)
Saini, Subash; Bailey, David; Chancellor, Marisa K. (Technical Monitor)
1997-01-01
High Performance Fortran (HPF), the high-level language for parallel Fortran programming, is based on Fortran 90. HPF was defined by an informal standards committee known as the High Performance Fortran Forum (HPFF) in 1993, and modeled on TMC's CM Fortran language. Several HPF features have since been incorporated into the draft ANSI/ISO Fortran 95, the next formal revision of the Fortran standard. HPF allows users to write a single parallel program that can execute on a serial machine, a shared-memory parallel machine, or a distributed-memory parallel machine. HPF eliminates the complex, error-prone task of explicitly specifying how, where, and when to pass messages between processors on distributed-memory machines, or when to synchronize processors on shared-memory machines. HPF is designed in a way that allows the programmer to code an application at a high level, and then selectively optimize portions of the code by dropping into message passing or calling tuned library routines as 'extrinsics'. Compilers supporting High Performance Fortran features first appeared in late 1994 and early 1995 from Applied Parallel Research (APR), Digital Equipment Corporation, and The Portland Group (PGI). IBM introduced an HPF compiler for the IBM RS/6000 SP2 in April of 1996. Over the past two years, these implementations have shown steady improvement in terms of both features and performance. The performance of various hardware/programming-model combinations (HPF and MPI (message passing interface)) will be compared, based on the latest NAS (NASA Advanced Supercomputing) Parallel Benchmark (NPB) results, thus providing a cross-machine and cross-model comparison. Specifically, HPF-based NPB results will be compared with MPI-based NPB results to provide perspective on the performance currently obtainable using HPF versus MPI or versus hand-tuned implementations such as those supplied by the hardware vendors.
In addition, we present NPB (Version 1.0) performance results for the following systems: DEC AlphaServer 8400 5/440, Fujitsu VPP Series (VX, VPP300, and VPP700), HP/Convex Exemplar SPP2000, IBM RS/6000 SP P2SC node (120 MHz), NEC SX-4/32, SGI/CRAY T3E, and SGI Origin2000.
Implementation of BT, SP, LU, and FT of NAS Parallel Benchmarks in Java
NASA Technical Reports Server (NTRS)
Schultz, Matthew; Frumkin, Michael; Jin, Hao-Qiang; Yan, Jerry
2000-01-01
A number of Java features make it an attractive but debatable choice for High Performance Computing. We have implemented the benchmarks that operate on a single structured grid, BT, SP, LU and FT, in Java. The performance and scalability of the Java code show that significant improvements in Java compiler technology and in Java thread implementation are necessary for Java to compete with Fortran in HPC applications.
An efficient parallel algorithm for matrix-vector multiplication
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hendrickson, B.; Leland, R.; Plimpton, S.
The multiplication of a vector by a matrix is the kernel computation of many algorithms in scientific computation. A fast parallel algorithm for this calculation is therefore necessary if one is to make full use of the new generation of parallel supercomputers. This paper presents a high performance, parallel matrix-vector multiplication algorithm that is particularly well suited to hypercube multiprocessors. For an n x n matrix on p processors, the communication cost of this algorithm is O(n/sqrt(p) + log(p)), independent of the matrix sparsity pattern. The performance of the algorithm is demonstrated by employing it as the kernel in the well-known NAS conjugate gradient benchmark, where a run time of 6.09 seconds was observed. This is the best published performance on this benchmark achieved to date using a massively parallel supercomputer.
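The block-decomposition idea behind such algorithms can be illustrated with a serial simulation of a 2D-block matrix-vector multiply. This is an assumption-laden sketch of the general technique, not the authors' hypercube implementation: the matrix is split over a q x q grid of "processors", each multiplying its local block, and the partial results are summed across each processor row, the step whose parallel cost contributes the O(n/sqrt(p)) term, with an additional O(log p) reduction on a hypercube.

```c
/* Serial simulation of a 2D-block parallel matrix-vector multiply.
 * The n x n matrix A (row-major) is viewed as a q x q grid of
 * (n/q) x (n/q) blocks, one per "processor".  Each processor
 * multiplies its block by its slice of x; accumulating partial
 * results across a processor row is the communication step. */
void matvec_2d_blocks(int n, int q, const double *A,
                      const double *x, double *y) {
    int b = n / q;                      /* block size; assumes q divides n */
    for (int i = 0; i < n; i++)
        y[i] = 0.0;
    for (int r = 0; r < q; r++)         /* processor-grid row    */
        for (int c = 0; c < q; c++)     /* processor-grid column */
            for (int i = r * b; i < (r + 1) * b; i++)
                for (int j = c * b; j < (c + 1) * b; j++)
                    y[i] += A[i * n + j] * x[j];  /* local block product */
}
```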
Unstructured Adaptive Meshes: Bad for Your Memory?
NASA Technical Reports Server (NTRS)
Biswas, Rupak; Feng, Hui-Yu; VanderWijngaart, Rob
2003-01-01
This viewgraph presentation explores the need for a NASA Advanced Supercomputing (NAS) parallel benchmark for problems with irregular dynamic memory access. This benchmark is important and necessary because: 1) Problems with localized error sources benefit from adaptive nonuniform meshes; 2) Certain machines perform poorly on such problems; 3) Parallel implementation may provide further performance improvement but is difficult. Topics covered that involve irregular dynamic memory access include: 1) The heat transfer problem; 2) The heat source term; 3) The spectral element method; 4) Basis functions; 5) Elemental discrete equations; 6) Global discrete equations. The nonconforming mesh and the mortar element method are covered in greater detail in this presentation.
Predicting Cost/Performance Trade-Offs for Whitney: A Commodity Computing Cluster
NASA Technical Reports Server (NTRS)
Becker, Jeffrey C.; Nitzberg, Bill; VanderWijngaart, Rob F.; Kutler, Paul (Technical Monitor)
1997-01-01
Recent advances in low-end processor and network technology have made it possible to build a "supercomputer" out of commodity components. We develop simple models of the NAS Parallel Benchmarks version 2 (NPB 2) to explore the cost/performance trade-offs involved in building a balanced parallel computer supporting a scientific workload. We develop closed form expressions detailing the number and size of messages sent by each benchmark. Coupling these with measured single processor performance, network latency, and network bandwidth, our models predict benchmark performance to within 30%. A comparison based on total system cost reveals that current commodity technology (200 MHz Pentium Pros with 100baseT Ethernet) is well balanced for the NPBs up to a total system cost of around $1,000,000.
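The shape of such a closed-form model can be sketched as a three-term cost expression. The function and its parameters below are illustrative assumptions, not the paper's actual NPB 2 expressions:

```c
/* Predicted benchmark time on a commodity cluster (hypothetical
 * sketch): single-processor compute time, plus a latency charge per
 * message, plus total communication volume divided by bandwidth. */
double cluster_time(double compute_s, long n_msgs, double bytes_total,
                    double latency_s, double bandwidth_Bps) {
    return compute_s
         + (double)n_msgs * latency_s      /* message-count term */
         + bytes_total / bandwidth_Bps;    /* volume term        */
}
```

Plugging measured message counts and sizes into a model of this shape, together with measured single-processor performance, is the kind of calculation that yields predictions of the sort reported above.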
Experiences using OpenMP based on Compiler Directed Software DSM on a PC Cluster
NASA Technical Reports Server (NTRS)
Hess, Matthias; Jost, Gabriele; Mueller, Matthias; Ruehle, Roland
2003-01-01
In this work we report on our experiences running OpenMP programs on a commodity cluster of PCs running a software distributed shared memory (DSM) system. We describe our test environment and report on the performance of a subset of the NAS Parallel Benchmarks that have been automatically parallelized for OpenMP. We compare the performance of the OpenMP implementations with that of their message passing counterparts and discuss performance differences.
Testing New Programming Paradigms with NAS Parallel Benchmarks
NASA Technical Reports Server (NTRS)
Jin, H.; Frumkin, M.; Schultz, M.; Yan, J.
2000-01-01
Over the past decade, high performance computing has evolved rapidly, not only in hardware architectures but also in the increasing complexity of real applications. Technologies have been developed that aim at scaling to thousands of processors on both distributed and shared memory systems. Developing parallel programs on these computers remains a challenging task. Today, writing parallel programs with message passing (e.g. MPI) is the most popular way of achieving scalability and high performance. However, writing message passing programs is difficult and error prone. In recent years new efforts have been made to define new parallel programming paradigms. The best examples are HPF (based on data parallelism) and OpenMP (based on shared memory parallelism). Both provide simple and clear extensions to sequential programs, thus greatly simplifying the tedious tasks encountered in writing message passing programs. HPF is independent of the memory hierarchy; however, due to the immaturity of compiler technology, its performance is still questionable. Although the use of parallel compiler directives is not new, OpenMP offers a portable solution in the shared-memory domain. Another important development is the tremendous progress in the internet and its associated technology. Although still in its infancy, Java promises portability in a heterogeneous environment and offers the possibility to "compile once and run anywhere." To test these new technologies, we implemented new parallel versions of the NAS Parallel Benchmarks (NPBs) with HPF and OpenMP directives, and extended the work to Java and Java threads. The purpose of this study is to examine the effectiveness of alternative programming paradigms. The NPBs consist of five kernels and three simulated applications that mimic the computation and data movement of large-scale computational fluid dynamics (CFD) applications. We started with the serial version included in NPB2.3.
Optimization of memory and cache usage was applied to several benchmarks, notably BT and SP, resulting in better sequential performance. To overcome the lack of an HPF performance model and to guide the development of the HPF codes, we employed an empirical performance model for several primitives found in the benchmarks. We encountered a few limitations of HPF, such as lack of support for the 'REDISTRIBUTION' directive and no easy way to handle irregular computation. The parallelization with OpenMP directives was done at the outer-most loop level to achieve the largest granularity. The performance of six HPF and OpenMP benchmarks is compared with their MPI counterparts for the Class A problem size in the figure on the next page. These results were obtained on an SGI Origin2000 (195 MHz) with the MIPSpro f77 compiler 7.2.1 for the OpenMP and MPI codes and the PGI pghpf 2.4.3 compiler with MPI interface for the HPF programs.
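The outer-most-loop strategy described above can be sketched with a minimal example, an illustrative Jacobi-style sweep rather than one of the actual benchmark loops; the pragma is simply ignored when OpenMP is not enabled:

```c
/* Each OpenMP thread receives a contiguous chunk of iterations of
 * the outermost i loop -- the coarsest available granularity.
 * Writing to a separate output array keeps iterations independent. */
void sweep2d(int n, const double *u, double *unew) {
    #pragma omp parallel for      /* directive on the outer loop only */
    for (int i = 1; i < n - 1; i++)
        for (int j = 1; j < n - 1; j++)
            unew[i * n + j] = 0.25 * (u[(i - 1) * n + j]
                                    + u[(i + 1) * n + j]
                                    + u[i * n + j - 1]
                                    + u[i * n + j + 1]);
}
```

Placing the directive on the inner j loop instead would fork threads on every i iteration and hand each thread far less work, which is why the coarser outer-loop placement is preferred.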
NASA Technical Reports Server (NTRS)
Feng, Hui-Yu; VanderWijngaart, Rob; Biswas, Rupak; Biegel, Bryan (Technical Monitor)
2001-01-01
We describe the design of a new method for the measurement of the performance of modern computer systems when solving scientific problems featuring irregular, dynamic memory accesses. The method involves the solution of a stylized heat transfer problem on an unstructured, adaptive grid. A Spectral Element Method (SEM) with an adaptive, nonconforming mesh is selected to discretize the transport equation. The relatively high order of the SEM lowers the fraction of wall clock time spent on inter-processor communication, which eases the load balancing task and allows us to concentrate on the memory accesses. The benchmark is designed to be three-dimensional. Parallelization and load balance issues of a reference implementation will be described in detail in future reports.
Experiences Using OpenMP Based on Compiler Directed Software DSM on a PC Cluster
NASA Technical Reports Server (NTRS)
Hess, Matthias; Jost, Gabriele; Mueller, Matthias; Ruehle, Roland; Biegel, Bryan (Technical Monitor)
2002-01-01
In this work we report on our experiences running OpenMP programs on a commodity cluster of PCs (personal computers) running a software distributed shared memory (DSM) system. We describe our test environment and report on the performance of a subset of the NAS (NASA Advanced Supercomputing) Parallel Benchmarks that have been automatically parallelized for OpenMP. We compare the performance of the OpenMP implementations with that of their message passing counterparts and discuss performance differences.
Scalability and Portability of Two Parallel Implementations of ADI
NASA Technical Reports Server (NTRS)
Phung, Thanh; VanderWijngaart, Rob F.
1994-01-01
Two domain decompositions for the implementation of the NAS Scalar Penta-diagonal Parallel Benchmark on MIMD systems are investigated, namely transposition and multi-partitioning. Hardware platforms considered are the Intel iPSC/860 and Paragon XP/S-15, and clusters of SGI workstations on ethernet, communicating through PVM. It is found that the multi-partitioning strategy offers the kind of coarse granularity that allows scaling up to hundreds of processors on a massively parallel machine. Moreover, efficiency is retained when the code is ported verbatim (save message passing syntax) to a PVM environment on a modest size cluster of workstations.
A new deadlock resolution protocol and message matching algorithm for the extreme-scale simulator
Engelmann, Christian; Naughton, III, Thomas J.
2016-03-22
Investigating the performance of parallel applications at scale on future high-performance computing (HPC) architectures and the performance impact of different HPC architecture choices is an important component of HPC hardware/software co-design. The Extreme-scale Simulator (xSim) is a simulation toolkit for investigating the performance of parallel applications at scale. xSim scales to millions of simulated Message Passing Interface (MPI) processes. The overhead introduced by a simulation tool is an important performance and productivity aspect. This paper documents two improvements to xSim: (1) a new deadlock resolution protocol to reduce the parallel discrete event simulation overhead and (2) a new simulated MPI message matching algorithm to reduce the oversubscription management overhead. The results clearly show a significant performance improvement. The simulation overhead for running the NAS Parallel Benchmark suite was reduced from 102% to 0% for the embarrassingly parallel (EP) benchmark and from 1,020% to 238% for the conjugate gradient (CG) benchmark. xSim offers a highly accurate simulation mode for better tracking of injected MPI process failures. Furthermore, with highly accurate simulation, the overhead was reduced from 3,332% to 204% for EP and from 37,511% to 13,808% for CG.
Understanding the Cray X1 System
NASA Technical Reports Server (NTRS)
Cheung, Samson
2004-01-01
This paper helps the reader understand the characteristics of the Cray X1 vector supercomputer system, and provides hints and information to enable the reader to port codes to the system. It provides a comparison between the basic performance of the X1 platform and other platforms that are available at NASA Ames Research Center. A set of codes, solving the Laplacian equation with different parallel paradigms, is used to understand some features of the X1 compiler. An example code from the NAS Parallel Benchmarks is used to demonstrate performance optimization on the X1 platform.
High Performance Programming Using Explicit Shared Memory Model on the Cray T3D
NASA Technical Reports Server (NTRS)
Saini, Subhash; Simon, Horst D.; Lasinski, T. A. (Technical Monitor)
1994-01-01
The Cray T3D is the first-phase system in Cray Research Inc.'s (CRI) three-phase massively parallel processing program. In this report we describe the architecture of the T3D as well as the CRAFT (Cray Research Adaptive Fortran) programming model, and contrast it with PVM, which is also supported on the T3D. We present some performance data based on the NAS Parallel Benchmarks to illustrate both architectural and software features of the T3D.
Automatic Generation of OpenMP Directives and Its Application to Computational Fluid Dynamics Codes
NASA Technical Reports Server (NTRS)
Yan, Jerry; Jin, Haoqiang; Frumkin, Michael; Yan, Jerry (Technical Monitor)
2000-01-01
The shared-memory programming model is a very effective way to achieve parallelism on shared memory parallel computers. As great progress has been made in hardware and software technologies, the performance of parallel programs using compiler directives has improved substantially. The introduction of OpenMP directives, the industry standard for shared-memory programming, has minimized the issue of portability. In this study, we have extended CAPTools, a computer-aided parallelization toolkit, to automatically generate OpenMP-based parallel programs with nominal user assistance. We outline techniques used in the implementation of the tool and discuss the application of this tool to the NAS Parallel Benchmarks and several computational fluid dynamics codes. This work demonstrates the great potential of using the tool to quickly port parallel programs and to achieve performance that exceeds that of some commercial tools.
Automatic Generation of Directive-Based Parallel Programs for Shared Memory Parallel Systems
NASA Technical Reports Server (NTRS)
Jin, Hao-Qiang; Yan, Jerry; Frumkin, Michael
2000-01-01
The shared-memory programming model is a very effective way to achieve parallelism on shared memory parallel computers. As great progress has been made in hardware and software technologies, the performance of parallel programs using compiler directives has improved substantially. The introduction of OpenMP directives, the industry standard for shared-memory programming, has minimized the issue of portability. Due to its ease of programming and its good performance, the technique has become very popular. In this study, we have extended CAPTools, a computer-aided parallelization toolkit, to automatically generate directive-based, OpenMP parallel programs. We outline techniques used in the implementation of the tool and present test results on the NAS Parallel Benchmarks and ARC3D, a CFD application. This work demonstrates the great potential of using computer-aided tools to quickly port parallel programs and also achieve good performance.
Performance and scalability evaluation of "Big Memory" on Blue Gene Linux.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Yoshii, K.; Iskra, K.; Naik, H.
2011-05-01
We address memory performance issues observed in Blue Gene Linux and discuss the design and implementation of 'Big Memory' - an alternative, transparent memory space introduced to eliminate the memory performance issues. We evaluate the performance of Big Memory using custom memory benchmarks, the NAS Parallel Benchmarks, and the Parallel Ocean Program, at a scale of up to 4,096 nodes. We find that Big Memory successfully resolves the performance issues normally encountered in Blue Gene Linux. For the ocean simulation program, we even find that Linux with Big Memory provides better scalability than does the lightweight compute node kernel designed solely for high-performance applications. Originally intended exclusively for compute node tasks, our new memory subsystem dramatically improves the performance of certain I/O node applications as well. We demonstrate this performance using the central processor of the LOw Frequency ARray radio telescope as an example.
The NAS kernel benchmark program
NASA Technical Reports Server (NTRS)
Bailey, D. H.; Barton, J. T.
1985-01-01
A collection of benchmark test kernels that measure supercomputer performance has been developed for the use of the NAS (Numerical Aerodynamic Simulation) program at the NASA Ames Research Center. This benchmark program is described in detail and the specific ground rules are given for running the program as a performance test.
Using domain decomposition in the multigrid NAS parallel benchmark on the Fujitsu VPP500
DOE Office of Scientific and Technical Information (OSTI.GOV)
Wang, J.C.H.; Lung, H.; Katsumata, Y.
1995-12-01
In this paper, we demonstrate how domain decomposition can be applied to the multigrid algorithm to convert the code for MPP architectures. We also discuss the performance and scalability of this implementation on the new product line of Fujitsu's vector parallel computer, the VPP500. This computer uses Fujitsu's well-known vector processor as the PE, each rated at 1.6 GFLOPS. A high-speed crossbar network rated at 800 MB/s provides the inter-PE communication. The results show that physical domain decomposition is the best way to solve MG problems on the VPP500.
Automated Instrumentation, Monitoring and Visualization of PVM Programs Using AIMS
NASA Technical Reports Server (NTRS)
Mehra, Pankaj; VanVoorst, Brian; Yan, Jerry; Lum, Henry, Jr. (Technical Monitor)
1994-01-01
We present views and analysis of the execution of several PVM (Parallel Virtual Machine) codes for Computational Fluid Dynamics on a network of Sparcstations, including: (1) NAS Parallel Benchmarks CG and MG; (2) a multi-partitioning algorithm for NAS Parallel Benchmark SP; and (3) an overset grid flowsolver. These views and analysis were obtained using our Automated Instrumentation and Monitoring System (AIMS) version 3.0, a toolkit for debugging the performance of PVM programs. We will describe the architecture, operation and application of AIMS. The AIMS toolkit contains: (1) Xinstrument, which can automatically instrument various computational and communication constructs in message-passing parallel programs; (2) Monitor, a library of runtime trace-collection routines; (3) VK (Visual Kernel), an execution-animation tool with source-code clickback; and (4) Tally, a tool for statistical analysis of execution profiles. Currently, Xinstrument can handle C and Fortran 77 programs using PVM 3.2.x; Monitor has been implemented and tested on Sun 4 systems running SunOS 4.1.2; and VK uses X11R5 and Motif 1.2. Data and views obtained using AIMS clearly illustrate several characteristic features of executing parallel programs on networked workstations: (1) the impact of long message latencies; (2) the impact of multiprogramming overheads and associated load imbalance; (3) cache and virtual-memory effects; and (4) significant skews between workstation clocks. Interestingly, AIMS can compensate for constant skew (zero drift) by calibrating the skew between a parent and its spawned children. In addition, AIMS' skew-compensation algorithm can adjust timestamps in a way that eliminates physically impossible communications (e.g., messages going backwards in time). Our current efforts are directed toward creating new views to explain the observed performance of PVM programs.
Some of the features planned for the near future include: (1) ConfigView, showing the physical topology of the virtual machine, inferred using specially formatted IP (Internet Protocol) packets; and (2) LoadView, synchronous animation of PVM-program execution and resource-utilization patterns.
Applications Performance Under MPL and MPI on NAS IBM SP2
NASA Technical Reports Server (NTRS)
Saini, Subhash; Simon, Horst D.; Lasinski, T. A. (Technical Monitor)
1994-01-01
On July 5, 1994, an IBM Scalable POWERparallel System (IBM SP2) with 64 nodes was installed at the Numerical Aerodynamic Simulation (NAS) Facility. Each node of the NAS IBM SP2 is a "wide node" consisting of a RISC 6000/590 workstation module with a clock of 66.5 MHz that can perform four floating point operations per clock, for a peak performance of 266 Mflop/s. By the end of 1994, the 64-node IBM SP2 will be upgraded to 160 nodes with a peak performance of 42.5 Gflop/s. An overview of the IBM SP2 hardware is presented. A basic understanding of the architectural details of the RS 6000/590 will help application scientists in porting, optimizing, and tuning codes from other machines such as the CRAY C90 and the Paragon to the NAS SP2. Optimization techniques such as quad-word loading, effective utilization of the two floating point units, and data cache optimization on the RS 6000/590 are illustrated, with examples giving performance gains at each optimization step. The conversion of codes using Intel's message passing library NX to codes using the native Message Passing Library (MPL) and the Message Passing Interface (MPI) library available on the IBM SP2 is illustrated. In particular, we will present the performance of the Fast Fourier Transform (FFT) kernel from the NAS Parallel Benchmarks (NPB) under MPL and MPI. We have also optimized some of the Fortran BLAS 2 and BLAS 3 routines; e.g., the optimized Fortran DAXPY runs at 175 Mflop/s and the optimized Fortran DGEMM at 230 Mflop/s per node. The performance of the NPB (Class B) on the IBM SP2 is compared with the CRAY C90, Intel Paragon, TMC CM-5E, and the CRAY T3D.
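The flavor of the unrolling used to keep both floating-point units busy can be sketched as follows. This is an illustration of the technique only, not the authors' tuned DAXPY; a real POWER2 version would also align loads to quad-word boundaries:

```c
/* y := a*x + y with a 4-way unrolled loop: the four multiply-adds
 * in the body are independent of one another, so a dual-FPU
 * processor such as the POWER2 can issue two per cycle. */
void daxpy_unrolled4(int n, double a, const double *x, double *y) {
    int i = 0;
    for (; i + 3 < n; i += 4) {
        y[i]     += a * x[i];
        y[i + 1] += a * x[i + 1];
        y[i + 2] += a * x[i + 2];
        y[i + 3] += a * x[i + 3];
    }
    for (; i < n; i++)            /* remainder iterations */
        y[i] += a * x[i];
}
```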
An Application-Based Performance Characterization of the Columbia Supercluster
NASA Technical Reports Server (NTRS)
Biswas, Rupak; Djomehri, Jahed M.; Hood, Robert; Jin, Hoaqiang; Kiris, Cetin; Saini, Subhash
2005-01-01
Columbia is a 10,240-processor supercluster consisting of 20 Altix nodes with 512 processors each, and currently ranked as the second-fastest computer in the world. In this paper, we present the performance characteristics of Columbia obtained on up to four computing nodes interconnected via the InfiniBand and/or NUMAlink4 communication fabrics. We evaluate floating-point performance, memory bandwidth, message passing communication speeds, and compilers using a subset of the HPC Challenge benchmarks, and some of the NAS Parallel Benchmarks including the multi-zone versions. We present detailed performance results for three scientific applications of interest to NASA, one from molecular dynamics, and two from computational fluid dynamics. Our results show that both the NUMAlink4 and the InfiniBand hold promise for application scaling to a large number of processors.
Automated Generation of Message-Passing Programs: An Evaluation Using CAPTools
NASA Technical Reports Server (NTRS)
Hribar, Michelle R.; Jin, Haoqiang; Yan, Jerry C.; Saini, Subhash (Technical Monitor)
1998-01-01
Scientists at NASA Ames Research Center have been developing computational aeroscience applications on highly parallel architectures over the past ten years. During that same time period, a steady transition of hardware and system software also occurred, forcing us to expend great effort in migrating and re-coding our applications. As applications and machine architectures become increasingly complex, the cost and time required for this process will become prohibitive. In this paper, we present the first set of results in our evaluation of interactive parallelization tools. In particular, we evaluate CAPTools' ability to parallelize computational aeroscience applications. CAPTools was tested on serial versions of the NAS Parallel Benchmarks and ARC3D, a computational fluid dynamics application, on two platforms: the SGI Origin 2000 and the Cray T3E. This evaluation covers performance, the amount of user interaction required, limitations, and portability. Based on these results, a discussion of the feasibility of computer-aided parallelization of aerospace applications is presented, along with suggestions for future work.
I/O-Efficient Scientific Computation Using TPIE
NASA Technical Reports Server (NTRS)
Vengroff, Darren Erik; Vitter, Jeffrey Scott
1996-01-01
In recent years, input/output (I/O)-efficient algorithms for a wide variety of problems have appeared in the literature. However, systems specifically designed to assist programmers in implementing such algorithms have remained scarce. TPIE is a system designed to support I/O-efficient paradigms for problems from a variety of domains, including computational geometry, graph algorithms, and scientific computation. The TPIE interface frees programmers from having to deal not only with explicit read and write calls, but also the complex memory management that must be performed for I/O-efficient computation. In this paper we discuss applications of TPIE to problems in scientific computation. We discuss algorithmic issues underlying the design and implementation of the relevant components of TPIE and present performance results of programs written to solve a series of benchmark problems using our current TPIE prototype. Some of the benchmarks we present are based on the NAS parallel benchmarks while others are of our own creation. We demonstrate that the central processing unit (CPU) overhead required to manage I/O is small and that even with just a single disk, the I/O overhead of I/O-efficient computation ranges from negligible to the same order of magnitude as CPU time. We conjecture that if we use a number of disks in parallel this overhead can be all but eliminated.
RISC Processors and High Performance Computing
NASA Technical Reports Server (NTRS)
Saini, Subhash; Bailey, David H.; Lasinski, T. A. (Technical Monitor)
1995-01-01
In this tutorial, we will discuss the top five current RISC microprocessors: the IBM Power2, which is used in the IBM RS6000/590 workstation and in the IBM SP2 parallel supercomputer; the DEC Alpha, which is used in the DEC Alpha workstation and in the Cray T3D; the MIPS R8000, which is used in the SGI Power Challenge; the HP PA-RISC 7100, which is used in the HP 700 series workstations and in the Convex Exemplar; and the Cray proprietary processor, which is used in the new Cray J916. The architecture of these microprocessors will first be presented. The effective performance of these processors will then be compared, both by citing standard benchmarks and in the context of implementing real applications. In the process, different programming models such as data parallel (CM Fortran and HPF) and message passing (PVM and MPI) will be introduced and compared. The latest NAS Parallel Benchmark (NPB) absolute performance and performance-per-dollar figures will be presented. The next generation of the NPB will also be described. The tutorial will conclude with a discussion of general trends in the field of high performance computing, including likely future developments in hardware and software technology, and the relative roles of vector supercomputers, tightly coupled parallel computers, and clusters of workstations. This tutorial will provide a unique cross-machine comparison not available elsewhere.
Automation of Data Traffic Control on DSM Architecture
NASA Technical Reports Server (NTRS)
Frumkin, Michael; Jin, Hao-Qiang; Yan, Jerry
2001-01-01
The design of distributed shared memory (DSM) computers liberates users from the duty of distributing data across processors and allows for the incremental development of parallel programs using, for example, OpenMP or Java threads. DSM architecture greatly simplifies the development of parallel programs that perform well on a few processors. However, achieving good program scalability on DSM computers requires that the user understand the data flow in the application and use various techniques to avoid data traffic congestion. In this paper we discuss a number of such techniques, including data blocking, data placement, data transposition, and page size control, and evaluate their efficiency on the NAS (NASA Advanced Supercomputing) Parallel Benchmarks. We also present a tool that automates the detection of constructs causing data congestion in Fortran array-oriented codes and advises the user on code transformations for improving data traffic in the application.
Implementation of ADI Schemes on MIMD parallel computers
NASA Technical Reports Server (NTRS)
Vanderwijngaart, Rob F.
1993-01-01
In order to simulate the effects of the impingement of hot exhaust jets of High Performance Aircraft on landing surfaces a multi-disciplinary computation coupling flow dynamics to heat conduction in the runway needs to be carried out. Such simulations, which are essentially unsteady, require very large computational power in order to be completed within a reasonable time frame of the order of an hour. Such power can be furnished by the latest generation of massively parallel computers. These remove the bottleneck of ever more congested data paths to one or a few highly specialized central processing units (CPU's) by having many off-the-shelf CPU's work independently on their own data, and exchange information only when needed. During the past year the first phase of this project was completed, in which the optimal strategy for mapping an ADI-algorithm for the three dimensional unsteady heat equation to a MIMD parallel computer was identified. This was done by implementing and comparing three different domain decomposition techniques that define the tasks for the CPU's in the parallel machine. These implementations were done for a Cartesian grid and Dirichlet boundary conditions. The most promising technique was then used to implement the heat equation solver on a general curvilinear grid with a suite of nontrivial boundary conditions. Finally, this technique was also used to implement the Scalar Penta-diagonal (SP) benchmark, which was taken from the NAS Parallel Benchmarks report. All implementations were done in the programming language C on the Intel iPSC/860 computer.
Six Years of Parallel Computing at NAS (1987 - 1993): What Have we Learned?
NASA Technical Reports Server (NTRS)
Simon, Horst D.; Cooper, D. M. (Technical Monitor)
1994-01-01
In the fall of 1987 the age of parallelism at NAS began with the installation of a 32K processor CM-2 from Thinking Machines. In 1987 this was described as an "experiment" in parallel processing. In the six years since, NAS acquired a series of parallel machines, and conducted an active research and development effort focused on the use of highly parallel machines for applications in the computational aerosciences. In this time period parallel processing for scientific applications evolved from a fringe research topic into one of the main activities at NAS. In this presentation I will review the history of parallel computing at NAS in the context of the major progress that has been made in the field in general. I will attempt to summarize the lessons we have learned so far, and the contributions NAS has made to the state of the art. Based on these insights I will comment on the current state of parallel computing (including the HPCC effort) and try to predict some trends for the next six years.
Automated Instrumentation, Monitoring and Visualization of PVM Programs Using AIMS
NASA Technical Reports Server (NTRS)
Mehra, Pankaj; VanVoorst, Brian; Yan, Jerry; Tucker, Deanne (Technical Monitor)
1994-01-01
We present views and analysis of the execution of several PVM codes for Computational Fluid Dynamics on a network of Sparcstations, including (a) the NAS Parallel Benchmarks CG and MG (White, Alund and Sunderam 1993); (b) a multi-partitioning algorithm for NAS Parallel Benchmark SP (Wijngaart 1993); and (c) an overset grid flowsolver (Smith 1993). These views and analysis were obtained using our Automated Instrumentation and Monitoring System (AIMS) version 3.0, a toolkit for debugging the performance of PVM programs. We will describe the architecture, operation and application of AIMS. The AIMS toolkit contains (a) Xinstrument, which can automatically instrument various computational and communication constructs in message-passing parallel programs; (b) Monitor, a library of run-time trace-collection routines; (c) VK (Visual Kernel), an execution-animation tool with source-code clickback; and (d) Tally, a tool for statistical analysis of execution profiles. Currently, Xinstrument can handle C and Fortran77 programs using PVM 3.2.x; Monitor has been implemented and tested on Sun 4 systems running SunOS 4.1.2; and VK uses X11R5 and Motif 1.2. Data and views obtained using AIMS clearly illustrate several characteristic features of executing parallel programs on networked workstations: (a) the impact of long message latencies; (b) the impact of multiprogramming overheads and associated load imbalance; (c) cache and virtual-memory effects; and (d) significant skews between workstation clocks. Interestingly, AIMS can compensate for constant skew (zero drift) by calibrating the skew between a parent and its spawned children. In addition, AIMS' skew-compensation algorithm can adjust timestamps in a way that eliminates physically impossible communications (e.g., messages going backwards in time). Our current efforts are directed toward creating new views to explain the observed performance of PVM programs.
Some of the features planned for the near future include: (a) ConfigView, showing the physical topology of the virtual machine, inferred using specially formatted IP (Internet Protocol) packets; and (b) LoadView, synchronous animation of PVM-program execution and resource-utilization patterns.
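The skew-compensation idea can be sketched in a few lines (a simplified model of the constant-skew case, not the AIMS algorithm itself): shift the child's clock forward by the smallest constant offset that makes every recorded parent-to-child message arrive no earlier than it was sent.

```python
def skew_offset(messages):
    """messages: list of (send_ts_parent, recv_ts_child) pairs.
    Returns the constant to add to the child's timestamps so that
    no message appears to travel backwards in time."""
    # A message "received before it was sent" betrays clock skew;
    # the largest such violation fixes the required offset.
    return max([send - recv for send, recv in messages] + [0.0])

def compensate(messages):
    """Apply the offset to every child-side timestamp."""
    off = skew_offset(messages)
    return [(send, recv + off) for send, recv in messages]
```

After compensation, every receive timestamp is at least its send timestamp, which is the physical-possibility condition the abstract describes.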
Adding Fault Tolerance to NPB Benchmarks Using ULFM
DOE Office of Scientific and Technical Information (OSTI.GOV)
Parchman, Zachary W; Vallee, Geoffroy R; Naughton III, Thomas J
2016-01-01
In the world of high-performance computing, fault tolerance and application resilience are becoming some of the primary concerns because of increasing hardware failures and memory corruptions. While the research community has been investigating various options, from system-level solutions to application-level solutions, standards such as the Message Passing Interface (MPI) are also starting to include such capabilities. The current proposal for MPI fault tolerance is centered around the User-Level Failure Mitigation (ULFM) concept, which provides means for fault detection and recovery of the MPI layer. This approach does not address application-level recovery, which is currently left to application developers. In this work, we present a modification of some of the benchmarks of the NAS parallel benchmark (NPB) suite to include support of the ULFM capabilities as well as application-level strategies and mechanisms for application-level failure recovery. As such, we present: (i) an application-level library to checkpoint and restore data, (ii) extensions of NPB benchmarks for fault tolerance based on different strategies, (iii) a fault injection tool, and (iv) some preliminary results that show the impact of such fault tolerant strategies on the application execution.
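The shape of such an application-level checkpoint/restore library can be sketched as follows (a toy single-process model with invented API names; the actual library described above works across MPI ranks): snapshot the solver state every few iterations, and roll back to the last snapshot when a failure is reported.

```python
import copy

class Checkpointer:
    """Toy in-memory checkpoint/restore store."""
    def __init__(self):
        self._saved = {}

    def checkpoint(self, name, data):
        # Deep-copy so later mutation of the live state
        # cannot corrupt the saved snapshot.
        self._saved[name] = copy.deepcopy(data)

    def restore(self, name):
        return copy.deepcopy(self._saved[name])

# Typical use inside an iterative solver: checkpoint every 3 iterations,
# then restore the last snapshot to simulate recovery after a failure.
cp = Checkpointer()
state = {"iter": 0, "u": [0.0] * 4}
for it in range(1, 7):
    state["iter"] = it
    state["u"] = [x + 1.0 for x in state["u"]]
    if it % 3 == 0:
        cp.checkpoint("solver", state)

state = cp.restore("solver")   # recovery: resume from iteration 6
```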
NASA Technical Reports Server (NTRS)
Pedretti, Kevin T.; Fineberg, Samuel A.; Kutler, Paul (Technical Monitor)
1997-01-01
A variety of different network technologies and topologies are currently being evaluated as part of the Whitney Project. This paper reports on the implementation and performance of a Fast Ethernet network configured in a 4x4 2D torus topology in a testbed cluster of 'commodity' Pentium Pro PCs. Several benchmarks were used for performance evaluation: an MPI point-to-point message passing benchmark, an MPI collective communication benchmark, and the NAS Parallel Benchmarks version 2.2 (NPB2). Our results show that for point-to-point communication on an unloaded network, the hub and 1-hop routes on the torus have about the same bandwidth and latency. However, the bandwidth decreases and the latency increases on the torus for each additional route hop. Collective communication benchmarks show that the torus provides roughly four times more aggregate bandwidth and eight times faster MPI barrier synchronizations than a hub-based network for 16-processor systems. Finally, the NPB2 benchmarks, which simulate real-world CFD applications, generally demonstrated substantially better performance on the torus than on the hub. In the few cases where the hub was faster, the difference was negligible. In total, our experimental results lead to the conclusion that for Fast Ethernet networks, the torus topology has better performance and scales better than a hub-based network.
NASA Technical Reports Server (NTRS)
Deardorff, Glenn; Djomehri, M. Jahed; Freeman, Ken; Gambrel, Dave; Green, Bryan; Henze, Chris; Hinke, Thomas; Hood, Robert; Kiris, Cetin; Moran, Patrick;
2001-01-01
A series of NASA presentations for the Supercomputing 2001 conference are summarized. The topics include: (1) Mars Surveyor Landing Sites "Collaboratory"; (2) Parallel and Distributed CFD for Unsteady Flows with Moving Overset Grids; (3) IP Multicast for Seamless Support of Remote Science; (4) Consolidated Supercomputing Management Office; (5) Growler: A Component-Based Framework for Distributed/Collaborative Scientific Visualization and Computational Steering; (6) Data Mining on the Information Power Grid (IPG); (7) Debugging on the IPG; (8) DeBakey Heart Assist Device; (9) Unsteady Turbopump for Reusable Launch Vehicle; (10) Exploratory Computing Environments Component Framework; (11) OVERSET Computational Fluid Dynamics Tools; (12) Control and Observation in Distributed Environments; (13) Multi-Level Parallelism Scaling on NASA's Origin 1024 CPU System; (14) Computing, Information, & Communications Technology; (15) NAS Grid Benchmarks; (16) IPG: A Large-Scale Distributed Computing and Data Management System; and (17) ILab: Parameter Study Creation and Submission on the IPG.
The Design and Evaluation of "CAPTools"--A Computer Aided Parallelization Toolkit
NASA Technical Reports Server (NTRS)
Yan, Jerry; Frumkin, Michael; Hribar, Michelle; Jin, Haoqiang; Waheed, Abdul; Johnson, Steve; Cross, Mark; Evans, Emyr; Ierotheou, Constantinos; Leggett, Pete;
1998-01-01
Writing applications for high performance computers is a challenging task. Although writing code by hand still offers the best performance, it is extremely costly and often not very portable. The Computer Aided Parallelization Tools (CAPTools) are a toolkit designed to help automate the mapping of sequential FORTRAN scientific applications onto multiprocessors. CAPTools consists of the following major components: an inter-procedural dependence analysis module that incorporates user knowledge; a 'self-propagating' data partitioning module driven via user guidance; an execution control mask generation and optimization module for the user to fine tune parallel processing of individual partitions; a program transformation/restructuring facility for source code clean up and optimization; a set of browsers through which the user interacts with CAPTools at each stage of the parallelization process; and a code generator supporting multiple programming paradigms on various multiprocessors. Besides describing the rationale behind the architecture of CAPTools, the parallelization process is illustrated via case studies involving structured and unstructured meshes. The programming process and the performance of the generated parallel programs are compared against other programming alternatives based on the NAS Parallel Benchmarks, ARC3D and other scientific applications. Based on these results, a discussion on the feasibility of constructing architecture-independent parallel applications is presented.
NASA Technical Reports Server (NTRS)
Ierotheou, C.; Johnson, S.; Leggett, P.; Cross, M.; Evans, E.; Jin, Hao-Qiang; Frumkin, M.; Yan, J.; Biegel, Bryan (Technical Monitor)
2001-01-01
The shared-memory programming model is a very effective way to achieve parallelism on shared memory parallel computers. Historically, the lack of a programming standard for using directives and the rather limited performance due to poor scalability have affected the take-up of this programming model. Significant progress has been made in hardware and software technologies, and as a result the performance of parallel programs with compiler directives has also improved. The introduction of an industrial standard for shared-memory programming with directives, OpenMP, has also addressed the issue of portability. In this study, we have extended the computer aided parallelization toolkit (developed at the University of Greenwich) to automatically generate OpenMP-based parallel programs with nominal user assistance. We outline the way in which loop types are categorized and how efficient OpenMP directives can be defined and placed using the in-depth interprocedural analysis that is carried out by the toolkit. We also discuss the application of the toolkit to the NAS Parallel Benchmarks and a number of real-world application codes. This work not only demonstrates the great potential of using the toolkit to quickly parallelize serial programs, but also the good performance achievable on up to 300 processors for hybrid message passing and directive-based parallelizations.
A Parallel Multigrid Solver for Viscous Flows on Anisotropic Structured Grids
NASA Technical Reports Server (NTRS)
Prieto, Manuel; Montero, Ruben S.; Llorente, Ignacio M.; Bushnell, Dennis M. (Technical Monitor)
2001-01-01
This paper presents an efficient parallel multigrid solver for speeding up the computation of a 3-D model that treats the flow of a viscous fluid over a flat plate. The main interest of this simulation lies in exhibiting some basic difficulties that prevent optimal multigrid efficiencies from being achieved. As the computing platform, we have used Coral, a Beowulf-class system based on Intel Pentium processors and equipped with GigaNet cLAN and switched Fast Ethernet networks. Our study not only examines the scalability of the solver but also includes a performance evaluation of Coral where the investigated solver has been used to compare several of its design choices, namely, the interconnection network (GigaNet versus switched Fast-Ethernet) and the node configuration (dual nodes versus single nodes). As a reference, the performance results have been compared with those obtained with the NAS-MG benchmark.
Evaluating the Information Power Grid using the NAS Grid Benchmarks
NASA Technical Reports Server (NTRS)
VanderWijngaart, Rob F.; Frumkin, Michael A.
2004-01-01
The NAS Grid Benchmarks (NGB) are a collection of synthetic distributed applications designed to rate the performance and functionality of computational grids. We compare several implementations of the NGB to determine the programmability and efficiency of NASA's Information Power Grid (IPG), whose services are mostly based on the Globus Toolkit. We report on the overheads involved in porting existing NGB reference implementations to the IPG. No changes were made to the component tasks of the NGB themselves, though the efficiency of their execution on the IPG can still be improved.
Automatic Data Traffic Control on DSM Architecture
NASA Technical Reports Server (NTRS)
Frumkin, Michael; Jin, Hao-Qiang; Yan, Jerry; Kwak, Dochan (Technical Monitor)
2000-01-01
We study data traffic on distributed shared memory machines and conclude that data placement and grouping improve the performance of scientific codes. We present several methods which the user can employ to improve data traffic in their code. We report on the implementation of a tool which detects the code fragments causing data congestion and advises the user on improvements of data routing in these fragments. The capabilities of the tool include deduction of data alignment and affinity from the source code; detection of the code constructs having abnormally high cache or TLB misses; and generation of data placement constructs. We demonstrate the capabilities of the tool in experiments with the NAS Parallel Benchmarks and with a simple computational fluid dynamics application, ARC3D.
Integrating the Apache Big Data Stack with HPC for Big Data
NASA Astrophysics Data System (ADS)
Fox, G. C.; Qiu, J.; Jha, S.
2014-12-01
There is perhaps a broad consensus as to the important issues in practical parallel computing as applied to large scale simulations; this is reflected in supercomputer architectures, algorithms, libraries, languages, compilers and best practice for application development. However, the same is not so true for data intensive computing, even though commercial clouds devote much more resources to data analytics than supercomputers devote to simulations. We look at a sample of over 50 big data applications to identify characteristics of data intensive applications and to deduce needed runtime and architectures. We suggest a big data version of the famous Berkeley dwarfs and NAS parallel benchmarks and use these to identify a few key classes of hardware/software architectures. Our analysis builds on combining HPC and ABDS, the Apache Big Data Stack that is widely used in modern cloud computing. Initial results on clouds and HPC systems are encouraging. We propose the development of SPIDAL (Scalable Parallel Interoperable Data Analytics Library), built on system and data abstractions suggested by the HPC-ABDS architecture. We discuss how it can be used in several application areas including Polar Science.
A performance study of the time-varying cache behavior: a study on APEX, Mantevo, NAS, and PARSEC
Siddique, Nafiul A.; Grubel, Patricia A.; Badawy, Abdel-Hameed A.; ...
2017-09-20
Cache has long been used to minimize the latency of main memory accesses by storing frequently used data near the processor. Processor performance depends on the underlying cache performance. Therefore, significant research has been done to identify the most crucial metrics of cache performance. Although the majority of research focuses on measuring cache hit rates and data movement as the primary cache performance metrics, cache utilization is significantly important. We investigate the application's locality using cache utilization metrics. In addition, we present cache utilization and traditional cache performance metrics as the program progresses, providing detailed insights into the dynamic application behavior of parallel applications from four benchmark suites running on multiple cores. We explore cache utilization for APEX, Mantevo, NAS, and PARSEC, mostly scientific benchmark suites. Our results indicate that 40% of the data bytes in a cache line are accessed at least once before line eviction. Also, on average a byte is accessed two times before the cache line is evicted for these applications. Moreover, we present runtime cache utilization, as well as conventional performance metrics, to give a holistic understanding of cache behavior. To facilitate this research, we built a memory simulator incorporated into the Structural Simulation Toolkit (Rodrigues et al. in SIGMETRICS Perform Eval Rev 38(4):37-42, 2011). Finally, our results suggest that variable cache line size can result in better performance and can also conserve power.
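The utilization metric itself (the fraction of a cache line's bytes touched before the line is evicted) can be sketched with a tiny direct-mapped cache model; the line size, set count, and function name here are our own illustration, not the paper's simulator:

```python
def line_utilization(addresses, line=64, nsets=4):
    """Average touched fraction of each cache line at eviction
    (or at the end of the trace), for a direct-mapped cache."""
    resident = {}            # set index -> (tag, set of touched byte offsets)
    fractions = []
    for a in addresses:
        tag, s, off = a // line // nsets, (a // line) % nsets, a % line
        if s in resident and resident[s][0] == tag:
            resident[s][1].add(off)       # hit: one more byte touched
        else:
            if s in resident:             # miss evicts the resident line
                fractions.append(len(resident[s][1]) / line)
            resident[s] = (tag, {off})
    for _tag, touched in resident.values():   # flush remaining lines
        fractions.append(len(touched) / line)
    return sum(fractions) / len(fractions)
```

A unit-stride sweep touches every byte of every line (utilization 1.0), while a stride-64 sweep touches a single byte per line (utilization 1/64), which is the contrast the metric is designed to expose.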
Applications Performance on NAS Intel Paragon XP/S-15
NASA Technical Reports Server (NTRS)
Saini, Subhash; Simon, Horst D.; Cooper, D. M. (Technical Monitor)
1994-01-01
The Numerical Aerodynamic Simulation (NAS) Systems Division received an Intel Touchstone Sigma prototype model Paragon XP/S-15 in February 1993. The i860 XP microprocessor, with an integrated floating point unit and operating in dual-instruction mode, gives a peak performance of 75 million floating point operations (MFLOPS) per second for 64-bit floating point arithmetic. It is used in the Paragon XP/S-15 which has been installed at NAS, NASA Ames Research Center. The NAS Paragon has 208 nodes and its peak performance is 15.6 GFLOPS. Here, we report on early experience using the Paragon XP/S-15. We have tested its performance using both kernels and applications of interest to NAS. We have measured the performance of BLAS 1, 2 and 3, both assembly-coded and Fortran-coded, on the NAS Paragon XP/S-15. Furthermore, we have investigated the performance of a single node one-dimensional FFT, a distributed two-dimensional FFT and a distributed three-dimensional FFT. Finally, we measured the performance of the NAS Parallel Benchmarks (NPB) on the Paragon and compared it with the performance obtained on other highly parallel machines, such as the CM-5, CRAY T3D, IBM SP1, etc. In particular, we investigated the following issues, which can strongly affect the performance of the Paragon: a. Impact of the operating system: Intel currently uses as a default the OSF/1 AD operating system from the Open Software Foundation (OSF). The paging of the OSF server, at 22 MB, to make more memory available for the application degrades the performance. We found that when the limit of 26 MB per node, out of the 32 MB available, is reached, the application is paged out of main memory using virtual memory. When the application starts paging, the performance is considerably reduced. We found that dynamic memory allocation can help application performance under certain circumstances. b. Impact of data cache on the i860 XP: We measured the performance of the BLAS, both assembly-coded and Fortran-coded. We found that the measured performance of the assembly-coded BLAS is much less than what the memory bandwidth limitation would predict. The influence of the data cache on different sizes of vectors is also investigated using one-dimensional FFTs. c. Impact of processor layout: There are several different ways processors can be laid out within the two-dimensional grid of processors on the Paragon. We have used the FFT example to investigate performance differences based on processor layout.
Performance Metrics for Monitoring Parallel Program Executions
NASA Technical Reports Server (NTRS)
Sarukkai, Sekkar R.; Gotwais, Jacob K.; Yan, Jerry; Lum, Henry, Jr. (Technical Monitor)
1994-01-01
Existing tools for debugging the performance of parallel programs provide either graphical representations of program execution or profiles of program executions. However, for performance debugging tools to be useful, such information has to be augmented with information that highlights the cause of poor program performance. Identifying the cause of poor performance requires not only determining the significance of various performance problems for the execution time of the program, but also considering the effect of interprocessor communication on individual source-level data structures. In this paper, we present a suite of normalized indices which provide a convenient mechanism for focusing on a region of code with poor performance and highlight the cause of the problem in terms of processors, procedures and data structure interactions. All the indices are generated from trace files augmented with data structure information. Further, we show, with the help of examples from the NAS benchmark suite, that the indices help in detecting potential causes of poor performance, based on augmented execution traces obtained by monitoring the program.
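One such normalized index might look like the following (our own simplified illustration, not the paper's actual definition): score each procedure by its communication share of elapsed time, normalized so the worst offender scores 1.0 and is immediately visible.

```python
def comm_indices(profile):
    """profile: {procedure: (compute_time, comm_time)}.
    Returns each procedure's communication share of its total time,
    normalized by the largest share so the worst offender scores 1.0."""
    shares = {p: comm / (comp + comm)
              for p, (comp, comm) in profile.items()}
    worst = max(shares.values())
    return {p: s / worst for p, s in shares.items()}
```

Running this on a toy trace immediately points at the halo exchange as the region worth inspecting first, which is the "focusing" role the abstract describes.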
Navigation in Grid Space with the NAS Grid Benchmarks
NASA Technical Reports Server (NTRS)
Frumkin, Michael; Hood, Robert; Biegel, Bryan A. (Technical Monitor)
2002-01-01
We present a navigational tool for computational grids. The navigational process is based on measuring the grid characteristics with the NAS Grid Benchmarks (NGB) and using the measurements to assign tasks of a grid application to the grid machines. The tool allows the user to explore the grid space and to navigate the execution of a grid application to minimize its turnaround time. We introduce the notion of gridscape as a user view of the grid and show how it can be measured by NGB. Then we demonstrate how the gridscape can be used with two different schedulers to navigate a grid application through a rudimentary grid.
Multi-partitioning for ADI-schemes on message passing architectures
NASA Technical Reports Server (NTRS)
Vanderwijngaart, Rob F.
1994-01-01
A kind of discrete-operator splitting called Alternating Direction Implicit (ADI) has been found to be useful in simulating fluid flow problems. In particular, it is being used to study the effects of hot exhaust jets from high performance aircraft on landing surfaces. Decomposition techniques that minimize load imbalance and message-passing frequency are described. Three strategies that are investigated for implementing the NAS Scalar Penta-diagonal Parallel Benchmark (SP) are transposition, pipelined Gaussian elimination, and multipartitioning. The multipartitioning strategy, which was used on Ethernet, was found to be the most efficient, although it was considered only a moderate success because of Ethernet's limited communication properties. The efficiency derived largely from the coarse granularity of the strategy, which reduced latencies and allowed overlap of communication and computation.
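The essence of the multipartitioning strategy can be sketched in a couple of lines (a 2-D simplification with an invented function name): the domain is cut into a p-by-p grid of blocks and ownership is shifted diagonally, so every processor owns exactly one block in each block row and each block column. An ADI sweep in either direction then keeps all processors busy at every pipeline stage, which is where the coarse granularity and communication/computation overlap come from.

```python
def owner(i, j, p):
    """Processor that owns block (i, j) of a p-by-p block grid
    under a diagonally shifted (multipartition-style) assignment."""
    return (i + j) % p

# Property check: every processor appears exactly once in each
# block row and in each block column of the grid.
p = 4
rows_ok = all(sorted(owner(i, j, p) for j in range(p)) == list(range(p))
              for i in range(p))
cols_ok = all(sorted(owner(i, j, p) for i in range(p)) == list(range(p))
              for j in range(p))
```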
NAS Requirements Checklist for Job Queuing/Scheduling Software
NASA Technical Reports Server (NTRS)
Jones, James Patton
1996-01-01
The increasing reliability of parallel systems and clusters of computers has resulted in these systems becoming more attractive for true production workloads. Today, the primary obstacle to production use of clusters of computers is the lack of a functional and robust Job Management System for parallel applications. This document provides a checklist of NAS requirements for job queuing and scheduling in order to make most efficient use of parallel systems and clusters for parallel applications. Future requirements are also identified to assist software vendors with design planning.
Evaluation Metrics for the Paragon XP/S-15
NASA Technical Reports Server (NTRS)
Traversat, Bernard; McNab, David; Nitzberg, Bill; Fineberg, Sam; Blaylock, Bruce T. (Technical Monitor)
1993-01-01
On February 17th, 1993, the Numerical Aerodynamic Simulation (NAS) facility located at the NASA Ames Research Center installed a 224-node Intel Paragon XP/S-15 system. After its installation, the Paragon was found to be in a very immature state and was unable to support the NAS users' workload, composed of a wide range of development and production activities. As a first step towards addressing this problem, we implemented a set of metrics to objectively monitor the system as operating system and hardware upgrades were installed. The metrics were designed to measure four aspects of the system that we consider essential to support our workload: availability, utilization, functionality, and performance. This report presents the metrics collected from February 1993 to August 1993. Since its installation, the Paragon's availability has improved from a low of 15% uptime to a high of 80%, while its utilization has remained low. Functionality and performance have improved from merely running one of the NAS Parallel Benchmarks to running all of them faster (between 1 and 2 times) than on the iPSC/860. In spite of the progress accomplished, fundamental limitations of the Paragon operating system are restricting the Paragon from supporting the NAS workload. The maximum operating system message passing (NORMA IPC) bandwidth was measured at 11 Mbytes/s, well below the peak hardware bandwidth (175 Mbytes/s), limiting overall virtual memory and Unix services (i.e., disk and HiPPI I/O) performance. The high NX application message passing latency (184 microseconds), three times that on the iPSC/860, was found to significantly degrade the performance of applications relying on small message sizes. The amount of memory available for an application was found to be approximately 10 Mbytes per node, indicating that the OS is taking more space than anticipated (6 Mbytes per node).
Evaluation of Cache-based Superscalar and Cacheless Vector Architectures for Scientific Computations
NASA Technical Reports Server (NTRS)
Oliker, Leonid; Carter, Jonathan; Shalf, John; Skinner, David; Ethier, Stephane; Biswas, Rupak; Djomehri, Jahed; VanderWijngaart, Rob
2003-01-01
The growing gap between sustained and peak performance for scientific applications has become a well-known problem in high performance computing. The recent development of parallel vector systems offers the potential to bridge this gap for a significant number of computational science codes and deliver a substantial increase in computing capabilities. This paper examines the intranode performance of the NEC SX6 vector processor and the cache-based IBM Power3/4 superscalar architectures across a number of key scientific computing areas. First, we present the performance of a microbenchmark suite that examines a full spectrum of low-level machine characteristics. Next, we study the behavior of the NAS Parallel Benchmarks using some simple optimizations. Finally, we evaluate the performance of several numerical codes from key scientific computing domains. Overall results demonstrate that the SX6 achieves high performance on a large fraction of our application suite and in many cases significantly outperforms the RISC-based architectures. However, certain classes of applications are not easily amenable to vectorization and would likely require extensive reengineering of both algorithm and implementation to utilize the SX6 effectively.
A survey of parallel programming tools
NASA Technical Reports Server (NTRS)
Cheng, Doreen Y.
1991-01-01
This survey examines 39 parallel programming tools. Focus is placed on those tool capabilities needed for parallel scientific programming rather than for general computer science. The tools are classified with the current and future needs of the Numerical Aerodynamic Simulator (NAS) in mind: existing and anticipated NAS supercomputers and workstations, operating systems, programming languages, and applications. They are divided into four categories: suggested acquisitions; tools already brought in; tools worth tracking; and tools eliminated from further consideration at this time.
Second Evaluation of Job Queuing/Scheduling Software. Phase 1
NASA Technical Reports Server (NTRS)
Jones, James Patton; Brickell, Cristy; Chancellor, Marisa (Technical Monitor)
1997-01-01
The recent proliferation of high performance workstations and the increased reliability of parallel systems have illustrated the need for robust job management systems to support parallel applications. To address this issue, NAS compiled a requirements checklist for job queuing/scheduling software. Next, NAS evaluated the leading job management system (JMS) software packages against the checklist. A year has now elapsed since the first comparison was published, and NAS has repeated the evaluation. This report describes this second evaluation, and presents the results of Phase 1: Capabilities versus Requirements. We show that JMS support for running parallel applications on clusters of workstations and parallel systems is still lacking; however, definite progress has been made by the vendors to correct the deficiencies. This report is supplemented by a WWW interface to the data collected, to aid other sites in extracting the evaluation information on specific requirements of interest.
MLP: A Parallel Programming Alternative to MPI for New Shared Memory Parallel Systems
NASA Technical Reports Server (NTRS)
Taft, James R.
1999-01-01
Recent developments at the NASA Ames Research Center's NAS Division have demonstrated that the new generation of NUMA-based Symmetric Multi-Processing systems (SMPs), such as the Silicon Graphics Origin 2000, can successfully execute legacy vector-oriented CFD production codes at sustained rates far exceeding the processing rates possible on dedicated 16-CPU Cray C90 systems. This high level of performance is achieved via shared memory based Multi-Level Parallelism (MLP). This programming approach, developed at NAS and outlined below, is distinct from the message passing paradigm of MPI. It offers parallelism at both the fine and coarse grained level, with communication latencies that are approximately 50-100 times lower than typical MPI implementations on the same platform. Such latency reductions offer the promise of performance scaling to very large CPU counts. The method draws on, but is also distinct from, the newly defined OpenMP specification, which uses compiler directives to support a limited subset of multi-level parallel operations. The NAS MLP method is general, and applicable to a large class of NASA CFD codes.
Parallelization of Lower-Upper Symmetric Gauss-Seidel Method for Chemically Reacting Flow
NASA Technical Reports Server (NTRS)
Yoon, Seokkwan; Jost, Gabriele; Chang, Sherry
2005-01-01
Development of technologies for exploration of the solar system has revived interest in computational simulation of chemically reacting flows, since planetary probe vehicles exhibit non-equilibrium phenomena during atmospheric entry at a planet or a moon as well as during reentry to Earth. Stability in combustion is essential for new propulsion systems. Numerical solution of real-gas flows often increases the computational work by an order of magnitude compared to perfect-gas flows, partly because of the increased complexity of the equations to be solved. Recently, as part of Project Columbia, NASA has integrated a cluster of interconnected SGI Altix systems to provide a ten-fold increase in current supercomputing capacity, which includes an SGI Origin system. Both the new and existing machines are based on a cache-coherent non-uniform memory access architecture. The Lower-Upper Symmetric Gauss-Seidel (LU-SGS) relaxation method has been implemented in both perfect- and real-gas flow codes, including the Real-Gas Aerodynamic Simulator (RGAS). However, the vectorized RGAS code runs inefficiently on cache-based shared-memory machines such as the SGI systems. Parallelization of a Gauss-Seidel method is nontrivial due to its sequential nature. The LU-SGS method has been vectorized on an oblique plane in the INS3D-LU code, one of the base codes for the NAS Parallel Benchmarks. The oblique plane has been called a hyperplane by computer scientists. It is straightforward to parallelize a Gauss-Seidel method by partitioning the hyperplanes once they are formed. Another way to parallelize is to schedule processors like a pipeline in software. Both the hyperplane and pipeline methods have been implemented using OpenMP directives. The present paper reports the performance of the parallelized RGAS code on SGI Origin and Altix systems.
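The hyperplane idea can be sketched independently of the RGAS code: on a structured grid, cells on the oblique plane i + j + k = const depend only on cells on lower planes, so each plane is a unit of parallel work. The sketch below is a hypothetical illustration (the function names and the lower-neighbour dependency stencil are assumptions, not taken from the paper):

```python
# Hypothetical sketch of hyperplane (wavefront) ordering for a Gauss-Seidel
# sweep on a structured n x n x n grid. Cells on the oblique plane
# i + j + k = level depend only on cells with a smaller level, so all cells
# within one hyperplane may be updated concurrently.
from collections import defaultdict

def hyperplanes(n):
    """Group grid indices (i, j, k) by the hyperplane level i + j + k."""
    planes = defaultdict(list)
    for i in range(n):
        for j in range(n):
            for k in range(n):
                planes[i + j + k].append((i, j, k))
    # Sweep order: level 0 first, then 1, ..., up to 3 * (n - 1).
    return [planes[level] for level in sorted(planes)]

def check_dependencies(planes):
    """Verify each cell's lower neighbours were updated on earlier planes."""
    done = set()
    for plane in planes:
        for (i, j, k) in plane:
            for dep in ((i - 1, j, k), (i, j - 1, k), (i, j, k - 1)):
                assert min(dep) < 0 or dep in done  # boundary or already swept
        done.update(plane)  # the whole plane finishes before the next starts
    return True
```

Under OpenMP, the loop over a single hyperplane would carry the parallel-for directive while the outer sweep over levels stays sequential; the pipeline alternative mentioned in the abstract instead staggers processors across ordinary grid planes.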
Evaluation of Job Queuing/Scheduling Software: Phase I Report
NASA Technical Reports Server (NTRS)
Jones, James Patton
1996-01-01
The recent proliferation of high performance workstations and the increased reliability of parallel systems have illustrated the need for robust job management systems to support parallel applications. To address this issue, the Numerical Aerodynamic Simulation (NAS) supercomputer facility compiled a requirements checklist for job queuing/scheduling software. Next, NAS began an evaluation of the leading job management system (JMS) software packages against the checklist. This report describes the three-phase evaluation process and presents the results of Phase 1: Capabilities versus Requirements. We show that JMS support for running parallel applications on clusters of workstations and parallel systems is still insufficient, even in the leading JMSs. However, by ranking each evaluated JMS against the requirements, we provide data that will be useful to other sites in selecting a JMS.
Parallel Computational Fluid Dynamics: Current Status and Future Requirements
NASA Technical Reports Server (NTRS)
Simon, Horst D.; VanDalsem, William R.; Dagum, Leonardo; Kutler, Paul (Technical Monitor)
1994-01-01
One of the key objectives of the Applied Research Branch in the Numerical Aerodynamic Simulation (NAS) Systems Division at NASA Ames Research Center is the accelerated introduction of highly parallel machines into a full operational environment. In this report we discuss the performance results obtained from the implementation of some computational fluid dynamics (CFD) applications on the Connection Machine CM-2 and the Intel iPSC/860. We summarize the experience gained so far with the parallel testbed machines at the NAS Applied Research Branch. We then discuss the long-term computational requirements for accomplishing some of the grand challenge problems in computational aerosciences. We argue that only massively parallel machines will be able to meet these grand challenge requirements, and we outline the computer science and algorithm research challenges ahead.
High Performance Programming Using Explicit Shared Memory Model on the Cray T3D
NASA Technical Reports Server (NTRS)
Simon, Horst D.; Saini, Subhash; Grassi, Charles
1994-01-01
The Cray T3D system is the first-phase system in Cray Research, Inc.'s (CRI) three-phase massively parallel processing (MPP) program. This system features a heterogeneous architecture that closely couples DEC's Alpha microprocessors with CRI's parallel-vector technology, i.e., the Cray Y-MP and Cray C90. An overview of the Cray T3D hardware and available programming models is presented. Under the Cray Research adaptive Fortran (CRAFT) model, four programming methods (data parallel, work sharing, message passing using PVM, and the explicit shared memory model) are available to users. However, at this time the data parallel and work sharing programming models are not available to the user community. The differences between standard PVM and CRI's PVM are highlighted with performance measurements such as latencies and communication bandwidths. We have found that neither standard PVM nor CRI's PVM exploits the hardware capabilities of the T3D. The reasons for the poor performance of PVM as a native message-passing library are presented. This is illustrated by the performance of the NAS Parallel Benchmarks (NPB) programmed in the explicit shared memory model on the Cray T3D. In general, the performance of standard PVM is about 4 to 5 times lower than that obtained using the explicit shared memory model. A similar degradation is seen on the CM-5, where the performance of applications using the native message-passing library CMMD is also about 4 to 5 times lower than that of data parallel methods. The issues involved in programming in the explicit shared memory model (such as barriers, synchronization, invalidating the data cache, aligning the data cache, etc.) are discussed. Comparative performance of the NPB using the explicit shared memory programming model on the Cray T3D and other highly parallel systems, such as the TMC CM-5, Intel Paragon, Cray C90, and IBM SP-1, is presented.
Shadow Mode Assessment Using Realistic Technologies for the National Airspace (SMART NAS)
NASA Technical Reports Server (NTRS)
Kopardekar, Parimal H.
2014-01-01
Develop a simulation and modeling capability that: (a) assesses multiple parallel universes; (b) accepts data feeds; (c) allows for a live, virtual, constructive, distributed environment; and (d) enables integrated examination of concepts, algorithms, technologies, and National Airspace System (NAS) architectures.
A Debugger for Computational Grid Applications
NASA Technical Reports Server (NTRS)
Hood, Robert; Jost, Gabriele; Biegel, Bryan (Technical Monitor)
2001-01-01
This viewgraph presentation gives an overview of a debugger for computational grid applications. Details are given on NAS parallel tools groups (including parallelization support tools, evaluation of various parallelization strategies, and distributed and aggregated computing), debugger dependencies, scalability, initial implementation, the process grid, and information on Globus.
Affinity-aware checkpoint restart
Saini, Ajay; Rezaei, Arash; Mueller, Frank; ...
2014-12-08
Current checkpointing techniques employed to overcome faults in HPC applications result in inferior application performance after restart from a checkpoint for a number of applications. This is due to a lack of page and core affinity awareness in the checkpoint/restart (C/R) mechanism: application tasks originally pinned to cores may be restarted on different cores, and on non-uniform memory architectures (NUMA), quite common today, memory pages associated with tasks on a NUMA node may be associated with a different NUMA node after restart. Here, this work contributes a novel design technique for C/R mechanisms to preserve task-to-core maps and NUMA-node-specific page affinities across restarts. Experimental results with BLCR, a C/R mechanism enhanced with affinity awareness, demonstrate significant performance benefits of 37%-73% for the NAS Parallel Benchmark codes and 6-12% for NAMD, with negligible overheads, versus up to nearly four-times-longer execution times without affinity-aware restarts on 16 cores.
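The core idea, recording affinities at checkpoint time and re-applying them at restart, can be shown with a minimal sketch. This is not BLCR's actual mechanism: the helper names and the eight-cores-per-NUMA-node layout are assumptions, and the real pinning call (os.sched_setaffinity on Linux) is left as a comment.

```python
# Illustrative sketch (not BLCR's implementation): save each task's core
# pinning, and hence its NUMA node, in the checkpoint metadata, then re-apply
# it on restart so memory pages stay local to the node that owns them.

def checkpoint_affinity(task_to_core, cores_per_numa_node=8):
    """Capture task -> (core, numa_node) alongside the memory image."""
    return {t: (c, c // cores_per_numa_node) for t, c in task_to_core.items()}

def restart_with_affinity(meta):
    """Re-pin every task to its original core (and hence NUMA node)."""
    placement = {}
    for task, (core, node) in meta.items():
        # os.sched_setaffinity(task_pid, {core})  # real pinning on Linux
        placement[task] = (core, node)
    return placement
```

Without the metadata, the OS is free to place restarted tasks anywhere, which is exactly the cross-node page traffic the paper measures as a slowdown of up to nearly four times.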
Benchmarking Ada tasking on tightly coupled multiprocessor architectures
NASA Technical Reports Server (NTRS)
Collard, Philippe; Goforth, Andre; Marquardt, Matthew
1989-01-01
The development of benchmarks and performance measures for parallel Ada tasking is reported with emphasis on the macroscopic behavior of the benchmark across a set of load parameters. The application chosen for the study was the NASREM model for telerobot control, relevant to many NASA missions. The results of the study demonstrate the potential of parallel Ada in accomplishing the task of developing a control system for a system such as the Flight Telerobotic Servicer using the NASREM framework.
Pandya, Tara M.; Johnson, Seth R.; Evans, Thomas M.; ...
2015-12-21
This paper discusses the implementation, capabilities, and validation of Shift, a massively parallel Monte Carlo radiation transport package developed and maintained at Oak Ridge National Laboratory. It has been developed to scale well from laptops to small computing clusters to advanced supercomputers. Special features of Shift include hybrid capabilities for variance reduction, such as CADIS and FW-CADIS, and advanced parallel decomposition and tally methods optimized for scalability on supercomputing architectures. Shift has been validated and verified against various reactor physics benchmarks and compares well to other state-of-the-art Monte Carlo radiation transport codes such as MCNP5, CE KENO-VI, and OpenMC. Some specific benchmarks used for verification and validation include the CASL VERA criticality test suite and several Westinghouse AP1000® problems. These benchmark and scaling studies show promising results.
Parallel 3D Mortar Element Method for Adaptive Nonconforming Meshes
NASA Technical Reports Server (NTRS)
Feng, Huiyu; Mavriplis, Catherine; VanderWijngaart, Rob; Biswas, Rupak
2004-01-01
High order methods are frequently used in computational simulation because of their high accuracy. An efficient way to avoid unnecessary computation in smooth regions of the solution is to use adaptive meshes, which employ fine grids only in areas where they are needed. Nonconforming spectral elements allow the grid to be flexibly adjusted to satisfy the computational accuracy requirements. The method is suitable for computational simulations of unsteady problems with very disparate length scales or unsteady moving features, such as heat transfer, fluid dynamics, or flame combustion. In this work, we select the Mortar Element Method (MEM) to handle the nonconforming interfaces between elements. A new technique is introduced to efficiently implement MEM on 3-D nonconforming meshes. By introducing an "intermediate mortar", the proposed method decomposes the projection between 3-D elements and mortars into two steps. In each step, projection matrices derived in 2-D are used. The two-step method avoids explicitly forming or deriving large projection matrices for 3-D meshes, and also helps to simplify the implementation. This new technique can be used for both h- and p-type adaptation. The method is applied to an unsteady 3-D moving heat source problem. With our new MEM implementation, mesh adaptation is able to efficiently refine the grid near the heat source and coarsen the grid once the heat source passes. The savings in computational work resulting from dynamic mesh adaptation are demonstrated by the reduction in the number of elements used and in the CPU time spent. MEM and mesh adaptation, respectively, bring irregularity and dynamics to the computer memory access pattern. Hence, they provide a good way to gauge the performance of computer systems when running scientific applications whose memory access patterns are irregular and unpredictable.
We select a 3-D moving heat source problem as the Unstructured Adaptive (UA) grid benchmark, a new component of the NAS Parallel Benchmarks (NPB). In this paper, we present some interesting performance results of our OpenMP parallel implementation on different architectures such as the SGI Origin2000, SGI Altix, and Cray MTA-2.
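The two-step projection can be imitated in miniature: when a higher-dimensional projection factors as a Kronecker product of lower-dimensional operators, it can be applied one factor at a time without ever forming the large matrix. The operators below are random stand-ins, not the paper's spectral-element mortar matrices; the sketch only demonstrates the algebraic trick the intermediate mortar exploits.

```python
# Sketch of the two-step idea: applying kron(Pu, Pv) to face data F in two
# small matrix products, versus forming the large Kronecker-product matrix
# explicitly. Pu and Pv are illustrative stand-ins for 2-D projection
# operators; F has shape (Pu columns, Pv columns).
import numpy as np

def project_two_step(Pu, Pv, F):
    """Apply kron(Pu, Pv) to F via two cheap 2-D projections."""
    step1 = F @ Pv.T   # project along the second face direction
    return Pu @ step1  # project along the first face direction

def project_full(Pu, Pv, F):
    """Reference: form the large matrix explicitly (what the trick avoids)."""
    big = np.kron(Pu, Pv)
    return (big @ F.ravel()).reshape(Pu.shape[0], Pv.shape[0])
```

For operators of size m x n, the two-step form costs O(m n^2 + m^2 n) per face instead of the O(m^2 n^2) of the assembled matrix, which is why the paper's method never needs the full 3-D projection matrices.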
A Look at the Impact of High-End Computing Technologies on NASA Missions
NASA Technical Reports Server (NTRS)
Biswas, Rupak; Dunbar, Jill; Hardman, John; Bailey, F. Ron; Wheeler, Lorien; Rogers, Stuart
2012-01-01
From its bold start nearly 30 years ago and continuing today, the NASA Advanced Supercomputing (NAS) facility at Ames Research Center has enabled remarkable breakthroughs in the space agency's science and engineering missions. Throughout this time, NAS experts have influenced the state of the art in high-performance computing (HPC) and related technologies such as scientific visualization, system benchmarking, batch scheduling, and grid environments. We highlight the pioneering achievements and innovations originating from and made possible by NAS resources and know-how: from early supercomputing environment design and software development, to long-term simulation and analyses critical to the design of safe Space Shuttle operations and associated spinoff technologies, to the highly successful Kepler Mission's discovery of new planets now capturing the world's imagination.
NAS Applications and Advanced Algorithms
NASA Technical Reports Server (NTRS)
Bailey, David H.; Biswas, Rupak; VanDerWijngaart, Rob; Kutler, Paul (Technical Monitor)
1997-01-01
This paper examines the applications most commonly run on the supercomputers at the Numerical Aerospace Simulation (NAS) facility. It analyzes the extent to which such applications are fundamentally oriented to vector computers, and whether or not they can be efficiently implemented on hierarchical memory machines, such as systems with cache memories and highly parallel, distributed memory systems.
PDS: A Performance Database Server
Berry, Michael W.; Dongarra, Jack J.; Larose, Brian H.; ...
1994-01-01
The process of gathering, archiving, and distributing computer benchmark data is a cumbersome task usually performed by computer users and vendors with little coordination. Most important, there is no publicly available central depository of performance data for all ranges of machines from personal computers to supercomputers. We present an Internet-accessible performance database server (PDS) that can be used to extract current benchmark data and literature. As an extension to the X-Windows-based user interface (Xnetlib) to the Netlib archival system, PDS provides an on-line catalog of public domain computer benchmarks such as the LINPACK benchmark, Perfect benchmarks, and the NAS parallel benchmarks. PDS does not reformat or present the benchmark data in any way that conflicts with the original methodology of any particular benchmark; it is thereby devoid of any subjective interpretations of machine performance. We believe that all branches (research laboratories, academia, and industry) of the general computing community can use this facility to archive performance metrics and make them readily available to the public. PDS can provide a more manageable approach to the development and support of a large dynamic database of published performance metrics.
Comparing the OpenMP, MPI, and Hybrid Programming Paradigm on an SMP Cluster
NASA Technical Reports Server (NTRS)
Jost, Gabriele; Jin, Haoqiang; anMey, Dieter; Hatay, Ferhat F.
2003-01-01
With the advent of parallel hardware and software technologies, users are faced with the challenge of choosing a programming paradigm best suited for the underlying computer architecture. With the current trend in parallel computer architectures towards clusters of shared memory symmetric multi-processors (SMP), parallel programming techniques have evolved to support parallelism beyond a single level. Which programming paradigm is best depends on the nature of the given problem, the hardware architecture, and the available software. In this study we compare different programming paradigms for the parallelization of a selected benchmark application on a cluster of SMP nodes. We compare the timings of different implementations of the same CFD benchmark application employing the same numerical algorithm on a cluster of Sun Fire SMP nodes. The rest of the paper is structured as follows: in section 2 we briefly discuss the programming models under consideration; we describe our compute platform in section 3; the different implementations of our benchmark code are described in section 4; and the performance results are presented in section 5. We conclude our study in section 6.
Di Tommaso, Paolo; Orobitg, Miquel; Guirado, Fernando; Cores, Fernado; Espinosa, Toni; Notredame, Cedric
2010-08-01
We present the first parallel implementation of the T-Coffee consistency-based multiple aligner. We benchmark it on the Amazon Elastic Cloud (EC2) and show that the parallelization procedure is reasonably effective. We also conclude that for a web server with moderate usage (10K hits/month) the cloud provides a cost-effective alternative to in-house deployment. T-Coffee is a freeware open source package available from http://www.tcoffee.org/homepage.html
Automatic Energy Schemes for High Performance Applications
DOE Office of Scientific and Technical Information (OSTI.GOV)
Sundriyal, Vaibhav
Although high-performance computing traditionally focuses on the efficient execution of large-scale applications, both energy and power have become critical concerns when approaching exascale. Drastic increases in the power consumption of supercomputers affect their operating costs and failure rates significantly. In modern microprocessor architectures, equipped with dynamic voltage and frequency scaling (DVFS) and CPU clock modulation (throttling), the power consumption may be controlled in software. Additionally, a network interconnect such as Infiniband may be exploited to maximize energy savings, while the application performance loss and frequency switching overheads must be carefully balanced. This work first studies two important collective communication operations, all-to-all and allgather, and proposes energy saving strategies on a per-call basis. Next, it targets point-to-point communications, grouping them into phases and applying frequency scaling to them to save energy by exploiting architectural and communication stalls. Finally, it proposes an automatic runtime system which combines both collective and point-to-point communications into phases, and applies throttling to them in addition to DVFS to maximize energy savings. The experimental results are presented for NAS parallel benchmark problems as well as for realistic parallel electronic structure calculations performed by the widely used quantum chemistry package GAMESS. Close to the maximum energy savings were obtained with a substantially low performance loss on the given platform.
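The per-call decision can be caricatured with a toy model: during a network-bound collective, dynamic power falls roughly with frequency while the communication time barely grows, so a lower frequency can minimize the energy of that call. All constants and function names below are invented for illustration and are not measurements or APIs from the dissertation.

```python
# Toy energy model for one network-bound communication phase (illustrative
# numbers only): time is nearly frequency-insensitive, while dynamic power
# scales with frequency on top of a fixed static floor.

def phase_energy(freq_ghz, base_freq=2.4, comm_time=1.0,
                 dynamic_power=40.0, static_power=20.0, slowdown=0.05):
    """Energy (J) of one communication phase at a given CPU frequency."""
    # A 5% penalty factor models the small slowdown of running comm code
    # at a reduced clock; a compute-bound phase would scale much worse.
    time = comm_time * (1.0 + slowdown * (base_freq / freq_ghz - 1.0))
    power = static_power + dynamic_power * (freq_ghz / base_freq)
    return power * time

def best_gear(freqs):
    """Per-call choice: pick the frequency with minimal phase energy."""
    return min(freqs, key=phase_energy)
```

Under this model the lowest gear wins for communication phases; a runtime system of the kind described above makes this choice per call, then restores the nominal frequency for compute phases, where the slowdown term would dominate.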
Machine characterization and benchmark performance prediction
NASA Technical Reports Server (NTRS)
Saavedra-Barrera, Rafael H.
1988-01-01
From runs of standard benchmarks or benchmark suites, it is possible neither to characterize a machine nor to predict the run time of other benchmarks which have not been run. A new approach to benchmarking and machine characterization is reported. The creation and use of a machine analyzer is described, which measures the performance of a given machine on FORTRAN source language constructs. The machine analyzer yields a set of parameters which characterize the machine and spotlight its strong and weak points. Also described is a program analyzer, which analyzes FORTRAN programs and determines the frequency of execution of each of the same set of source language operations. It is then shown that by combining a machine characterization and a program characterization, we are able to predict with good accuracy the run time of a given benchmark on a given machine. Characterizations are provided for the Cray X-MP/48, Cyber 205, IBM 3090/200, Amdahl 5840, Convex C-1, VAX 8600, VAX 11/785, VAX 11/780, SUN 3/50, and IBM RT-PC/125, and for the following benchmark programs or suites: Los Alamos (BMK8A1), Baskett, Linpack, Livermore Loops, Mandelbrot Set, NAS Kernels, Shell Sort, Smith, Whetstone, and Sieve of Eratosthenes.
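The combination step, merging a machine characterization with a program characterization, is essentially a dot product: predicted run time equals the sum over source-language operations of execution count times per-operation cost. The operation names and costs below are invented for illustration, not taken from the paper's tables.

```python
# The prediction step in miniature: a machine characterization gives the
# unit cost of each source-language construct; a program characterization
# gives how often each construct executes; their dot product predicts the
# run time. All numbers below are hypothetical.

def predict_runtime(op_counts, op_times_ns):
    """Predicted run time (seconds) = sum over ops of count * unit cost."""
    return sum(op_counts[op] * op_times_ns[op] for op in op_counts) * 1e-9

machine = {"fadd": 50.0, "fmul": 60.0, "fdiv": 300.0, "branch": 20.0}
program = {"fadd": 4_000_000, "fmul": 3_000_000, "fdiv": 100_000,
           "branch": 1_000_000}
```

With these invented figures the model predicts 0.43 s; measuring the machine vector once lets the same program vector be re-priced on any characterized machine, which is the paper's central point.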
Scheduling for Parallel Supercomputing: A Historical Perspective of Achievable Utilization
NASA Technical Reports Server (NTRS)
Jones, James Patton; Nitzberg, Bill
1999-01-01
The NAS facility has operated parallel supercomputers for the past 11 years, including the Intel iPSC/860, Intel Paragon, Thinking Machines CM-5, IBM SP-2, and Cray Origin 2000. Across this wide variety of machine architectures, across a span of 10 years, across a large number of different users, and through thousands of minor configuration and policy changes, the utilization of these machines shows three general trends: (1) scheduling using a naive FIFO first-fit policy results in 40-60% utilization; (2) switching to the more sophisticated dynamic backfilling scheduling algorithm improves utilization by about 15 percentage points (yielding about 70% utilization); and (3) reducing the maximum allowable job size further increases utilization. Most surprising is the consistency of these trends. Over the lifetime of the NAS parallel systems, we made hundreds, perhaps thousands, of small changes to hardware, software, and policy, yet utilization was affected little. In particular, these results show that the goal of achieving near 100% utilization while supporting a real parallel supercomputing workload is unrealistic.
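The gap between trends (1) and (2) can be illustrated with a minimal scheduler simulation. The sketch below implements strict FIFO first-fit and a conservative EASY-style backfill, in which a later job may start early only if it cannot delay the job at the head of the queue; the job mix is invented, and production schedulers are far more elaborate.

```python
# Minimal scheduler sketch contrasting FIFO first-fit with conservative
# EASY-style backfilling on a machine with a fixed number of nodes.
import heapq

def simulate(jobs, total_nodes, backfill):
    """jobs: list of (nodes, runtime), all queued at t=0, in arrival order.
    Returns the makespan. With backfill=True, a later job may start early
    only if it fits in idle nodes and finishes before the head job's
    guaranteed start time (its 'shadow' time)."""
    queue = list(range(len(jobs)))
    running = []                     # heap of (finish_time, nodes)
    free, t = total_nodes, 0.0
    while queue or running:
        while queue and jobs[queue[0]][0] <= free:   # start head if it fits
            n, r = jobs[queue.pop(0)]
            heapq.heappush(running, (t + r, n)); free -= n
        if backfill and queue:
            # shadow time: earliest moment enough nodes free up for the head
            need, avail, shadow = jobs[queue[0]][0], free, None
            for finish, n in sorted(running):
                avail += n
                if avail >= need:
                    shadow = finish; break
            for idx in queue[1:]:                    # try to fill the hole
                n, r = jobs[idx]
                if n <= free and shadow is not None and t + r <= shadow:
                    queue.remove(idx)
                    heapq.heappush(running, (t + r, n)); free -= n
        if running:                                  # advance to next finish
            finish, n = heapq.heappop(running)
            t, free = finish, free + n
    return t
```

On four nodes with jobs of (2 nodes, 10 h), (4, 10), (1, 5), and (1, 5), FIFO leaves two nodes idle for ten hours while the 4-node job waits, finishing at t = 25; backfilling slots the two small jobs into that hole without delaying the big job and finishes at t = 20.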
I/O performance evaluation of a Linux-based network-attached storage device
NASA Astrophysics Data System (ADS)
Sun, Zhaoyan; Dong, Yonggui; Wu, Jinglian; Jia, Huibo; Feng, Guanping
2002-09-01
In a Local Area Network (LAN), clients are permitted to access the files on high-density optical disks via a network server. However, the quality of read service offered by a conventional server is unsatisfactory, because the server performs many other functions and must handle too many callers. This paper develops a Linux-based Network-Attached Storage (NAS) server. The operating system (OS), composed of an optimized kernel and a miniaturized file system, is stored in flash memory. After initialization, the NAS device is connected to the LAN. The administrator and users can configure and access the server through web pages, respectively. In order to enhance the quality of access, buffer cache management in the file system is optimized. Several benchmark programs are run to evaluate the I/O performance of the NAS device. Since data recorded on optical disks are usually for read access, our attention is focused on the read throughput of the device. The experimental results indicate that the I/O performance of our NAS device is excellent.
Multitasking and microtasking experience on the NAS Cray-2 and ACF Cray X-MP
NASA Technical Reports Server (NTRS)
Raiszadeh, Farhad
1987-01-01
The fast Fourier transform (FFT) kernel of the NAS benchmark program has been utilized to experiment with the multitasking library on the Cray-2 and Cray X-MP/48, and microtasking directives on the Cray X-MP. Some performance figures are shown, and the state of multitasking software is described.
2017-04-13
modelling code, a parallel benchmark, and a communication-avoiding version of the QR algorithm. Further, several improvements to the OmpSs model were...movement; and a port of the dynamic load balancing library to OmpSs. Finally, several updates to the tools infrastructure were accomplished, including: an...OmpSs: a basic algorithm on image processing applications, a mini application representative of an ocean modelling code, a parallel benchmark, and a
NASA Astrophysics Data System (ADS)
Moon, Hongsik
What is the impact of multicore and associated advanced technologies on computational software for science? Most researchers and students have multicore laptops or desktops for their research, and they need computing power to run computational software packages. Computing power was initially derived from Central Processing Unit (CPU) clock speed. That changed when increases in clock speed became constrained by power requirements. Chip manufacturers turned to multicore CPU architectures and associated technological advancements to create the CPUs for the future. Most software applications benefited from the increased computing power the same way that increases in clock speed helped applications run faster. However, for Computational ElectroMagnetics (CEM) software developers this change was not an obvious benefit; it appeared to be a detriment. Developers were challenged to find a way to correctly utilize the advancements in hardware so that their codes could benefit. The solution was parallelization, and this dissertation details the investigation to address these challenges. Prior to multicore CPUs, advanced computer technologies were compared on performance using benchmark software, and the metric was FLoating-point Operations Per Second (FLOPS), which indicates system performance for scientific applications that make heavy use of floating-point calculations. Is FLOPS an effective metric for parallelized CEM simulation tools on new multicore systems? Parallel CEM software needs to be benchmarked not only by FLOPS but also by the performance of other parameters related to the type and utilization of the hardware, such as the CPU, Random Access Memory (RAM), hard disk, network, etc. The codes need to be optimized for more than just FLOPS, and new parameters must be included in benchmarking. In this dissertation, the parallel CEM software named High Order Basis Based Integral Equation Solver (HOBBIES) is introduced.
This code was developed to address the needs of the changing computer hardware platforms in order to provide fast, accurate and efficient solutions to large, complex electromagnetic problems. The research in this dissertation proves that the performance of parallel code is intimately related to the configuration of the computer hardware and can be maximized for different hardware platforms. To benchmark and optimize the performance of parallel CEM software, a variety of large, complex projects are created and executed on a variety of computer platforms. The computer platforms used in this research are detailed in this dissertation. The projects run as benchmarks are also described in detail and results are presented. The parameters that affect parallel CEM software on High Performance Computing Clusters (HPCC) are investigated. This research demonstrates methods to maximize the performance of parallel CEM software code.
Initial Performance Results on IBM POWER6
NASA Technical Reports Server (NTRS)
Saini, Subbash; Talcott, Dale; Jespersen, Dennis; Djomehri, Jahed; Jin, Haoqiang; Mehrotra, Piysuh
2008-01-01
The POWER5+ processor has a faster memory bus than that of the previous-generation POWER5 processor (533 MHz vs. 400 MHz), but the measured per-core memory bandwidth of the latter is better than that of the former (5.7 GB/s vs. 4.3 GB/s). The reason is that in the POWER5+, the two cores on the chip share the L2 cache, L3 cache, and memory bus. The memory controller is also on the chip and is shared by the two cores. This serializes the path to memory. For consistently good performance on a wide range of applications, the performance of the processor, the memory subsystem, and the interconnects (both latency and bandwidth) should be balanced. Recognizing this, IBM has designed the POWER6 processor to avoid the bottlenecks due to the L2 cache, memory controller, and buffer chips of the POWER5+. Unlike the POWER5+, each core in the POWER6 has its own L2 cache (4 MB, double that of the POWER5+), memory controller, and buffer chips. Each core in the POWER6 runs at 4.7 GHz instead of the 1.9 GHz of the POWER5+. In this paper, we evaluate the performance of a dual-core POWER6-based IBM p6-570 system, and we compare its performance with that of a dual-core POWER5+-based IBM p575+ system. In this evaluation, we have used the High-Performance Computing Challenge (HPCC) benchmarks, the NAS Parallel Benchmarks (NPB), and four real-world applications: three from computational fluid dynamics and one from climate modeling.
Automatic Thread-Level Parallelization in the Chombo AMR Library
DOE Office of Scientific and Technical Information (OSTI.GOV)
Christen, Matthias; Keen, Noel; Ligocki, Terry
2011-05-26
The increasing on-chip parallelism has substantial implications for HPC applications. Currently, hybrid programming models (typically MPI+OpenMP) are employed for mapping software to the hardware in order to leverage the hardware's architectural features. In this paper, we present an approach that automatically introduces thread-level parallelism into Chombo, a parallel adaptive mesh refinement framework for finite-difference-type PDE solvers. In Chombo, core algorithms are specified in ChomboFortran, a macro language extension to F77 that is part of the Chombo framework. This domain-specific language forms an already used target language for an automatic migration of the large number of existing algorithms into a hybrid MPI+OpenMP implementation. It also provides access to the auto-tuning methodology that enables tuning certain aspects of an algorithm to hardware characteristics. Performance measurements are presented for a few of the most relevant kernels with respect to a specific application benchmark using this technique, as well as benchmark results for the entire application. The kernel benchmarks show that, using auto-tuning, up to a factor of 11 in performance was gained with 4 threads with respect to the serial reference implementation.
NASA Technical Reports Server (NTRS)
1988-01-01
Final report to NASA LeRC on the development of gallium arsenide (GaAs) high-speed, low-power serial/parallel interface modules. The report discusses the development and test of a family of 16-, 32-, and 64-bit parallel-to-serial and serial-to-parallel integrated circuits using a self-aligned-gate MESFET technology developed at the Honeywell Sensors and Signal Processing Laboratory. Lab testing demonstrated 1.3 GHz clock rates at a power of 300 mW. This work was accomplished under contract number NAS3-24676.
Data Race Benchmark Collection
DOE Office of Scientific and Technical Information (OSTI.GOV)
Liao, Chunhua; Lin, Pei-Hung; Asplund, Joshua
2017-03-21
This project is a benchmark suite of OpenMP parallel codes that have been checked for data races. The programs are marked to show which do and which do not have races. This allows them to be leveraged while testing and developing race detection tools.
A comparison of five benchmarks
NASA Technical Reports Server (NTRS)
Huss, Janice E.; Pennline, James A.
1987-01-01
Five benchmark programs were obtained and run on the NASA Lewis CRAY X-MP/24. A comparison was made among the programs' codes and among the methods for calculating performance figures. Several multitasking jobs were run to gain experience in how parallel performance is measured.
NAS Technical Summaries, March 1993 - February 1994
NASA Technical Reports Server (NTRS)
1995-01-01
NASA created the Numerical Aerodynamic Simulation (NAS) Program in 1987 to focus resources on solving critical problems in aeroscience and related disciplines by utilizing the power of the most advanced supercomputers available. The NAS Program provides scientists with the necessary computing power to solve today's most demanding computational fluid dynamics problems and serves as a pathfinder in integrating leading-edge supercomputing technologies, thus benefitting other supercomputer centers in government and industry. The 1993-94 operational year concluded with 448 high-speed processor projects and 95 parallel projects representing NASA, the Department of Defense, other government agencies, private industry, and universities. This document provides a glimpse at some of the significant scientific results for the year.
NAS technical summaries. Numerical aerodynamic simulation program, March 1992 - February 1993
NASA Technical Reports Server (NTRS)
1994-01-01
NASA created the Numerical Aerodynamic Simulation (NAS) Program in 1987 to focus resources on solving critical problems in aeroscience and related disciplines by utilizing the power of the most advanced supercomputers available. The NAS Program provides scientists with the necessary computing power to solve today's most demanding computational fluid dynamics problems and serves as a pathfinder in integrating leading-edge supercomputing technologies, thus benefitting other supercomputer centers in government and industry. The 1992-93 operational year concluded with 399 high-speed processor projects and 91 parallel projects representing NASA, the Department of Defense, other government agencies, private industry, and universities. This document provides a glimpse at some of the significant scientific results for the year.
Automatic Multilevel Parallelization Using OpenMP
NASA Technical Reports Server (NTRS)
Jin, Hao-Qiang; Jost, Gabriele; Yan, Jerry; Ayguade, Eduard; Gonzalez, Marc; Martorell, Xavier; Biegel, Bryan (Technical Monitor)
2002-01-01
In this paper we describe the extension of the CAPO (CAPtools (Computer Aided Parallelization Toolkit) OpenMP) parallelization support tool to support multilevel parallelism based on OpenMP directives. CAPO generates OpenMP directives with extensions supported by the NanosCompiler to allow for directive nesting and definition of thread groups. We report some results for several benchmark codes and one full application that have been parallelized using our system.
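Multilevel parallelism of the kind CAPO generates can be sketched in miniature. The following Python uses nested thread pools as a stand-in for nested OpenMP thread groups; the two-level zone/row decomposition and all function names are illustrative assumptions, not taken from CAPO or the NanosCompiler:

```python
from concurrent.futures import ThreadPoolExecutor

def process_row(row):
    # inner-level work item: fine-grained loop parallelism
    return sum(x * x for x in row)

def process_zone(zone, inner_workers=2):
    # inner parallel region, analogous to a nested OpenMP team
    with ThreadPoolExecutor(max_workers=inner_workers) as inner:
        return sum(inner.map(process_row, zone))

def process_grid(zones, outer_workers=2):
    # outer parallel region: one "thread group" per zone
    with ThreadPoolExecutor(max_workers=outer_workers) as outer:
        return sum(outer.map(process_zone, zones))
```

The point of directive nesting is exactly this shape: coarse-grained parallelism across zones combined with fine-grained parallelism within each zone.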
Spherical harmonic results for the 3D Kobayashi Benchmark suite
DOE Office of Scientific and Technical Information (OSTI.GOV)
Brown, P N; Chang, B; Hanebutte, U R
1999-03-02
Spherical harmonic solutions are presented for the Kobayashi benchmark suite. The results were obtained with Ardra, a scalable, parallel neutron transport code developed at Lawrence Livermore National Laboratory (LLNL). The calculations were performed on the IBM ASCI Blue-Pacific computer at LLNL.
Benchmarking hypercube hardware and software
NASA Technical Reports Server (NTRS)
Grunwald, Dirk C.; Reed, Daniel A.
1986-01-01
It was long a truism in computer systems design that balanced systems achieve the best performance. Message passing parallel processors are no different. To quantify the balance of a hypercube design, an experimental methodology was developed and the associated suite of benchmarks was applied to several existing hypercubes. The benchmark suite includes tests of both processor speed in the absence of internode communication and message transmission speed as a function of communication patterns.
NASA Astrophysics Data System (ADS)
Favata, Antonino; Micheletti, Andrea; Ryu, Seunghwa; Pugno, Nicola M.
2016-10-01
An analytical benchmark and a simple consistent Mathematica program are proposed for graphene and carbon nanotubes, which may serve to test any molecular dynamics code implemented with REBO potentials. Using the benchmark, we checked results produced by LAMMPS (Large-scale Atomic/Molecular Massively Parallel Simulator) when adopting the second-generation Brenner potential, made evident that this code in its current implementation produces results which are offset from those of the benchmark by a significant amount, and provide evidence of the reason.
Parallel Ada benchmarks for the SVMS
NASA Technical Reports Server (NTRS)
Collard, Philippe E.
1990-01-01
The use of the parallel processing paradigm to design and develop faster and more reliable computers appears to clearly mark the future of information processing. NASA started the development of such an architecture: the Spaceborne VHSIC Multi-processor System (SVMS). Ada will be one of the languages used to program the SVMS. One of the unique characteristics of Ada is that it supports parallel processing at the language level through its tasking constructs. It is important for the SVMS project team to assess how efficiently the SVMS architecture will be implemented, as well as how efficiently the Ada environment will be ported to the SVMS. AUTOCLASS II, a Bayesian classifier written in Common Lisp, was selected as one of the benchmarks for SVMS configurations. The purpose of the R&D effort was to provide the SVMS project team with a version of AUTOCLASS II, written in Ada, that would make use of Ada tasking constructs as much as possible so as to constitute a suitable benchmark. Additionally, a set of programs was developed to measure Ada tasking efficiency on parallel architectures and to determine the critical parameters influencing tasking efficiency. All this was designed to provide the SVMS project team with a set of suitable tools for the development of the SVMS architecture.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Madduri, Kamesh; Ediger, David; Jiang, Karl
2009-02-15
We present a new lock-free parallel algorithm for computing betweenness centrality of massive small-world networks. With minor changes to the data structures, our algorithm also achieves better spatial cache locality compared to previous approaches. Betweenness centrality is a key algorithm kernel in HPCS SSCA#2, a benchmark extensively used to evaluate the performance of emerging high-performance computing architectures for graph-theoretic computations. We design optimized implementations of betweenness centrality and the SSCA#2 benchmark for two hardware multithreaded systems: a Cray XMT system with the Threadstorm processor, and a single-socket Sun multicore server with the UltraSPARC T2 processor. For a small-world network of 134 million vertices and 1.073 billion edges, the 16-processor XMT system and the 8-core Sun Fire T5120 server achieve TEPS scores (an algorithmic performance count for the SSCA#2 benchmark) of 160 million and 90 million respectively, which corresponds to more than a 2X performance improvement over the previous parallel implementations. To better characterize the performance of these multithreaded systems, we correlate the SSCA#2 performance results with data from the memory-intensive STREAM and RandomAccess benchmarks. Finally, we demonstrate the applicability of our implementation to analyze massive real-world datasets by computing approximate betweenness centrality for a large-scale IMDb movie-actor network.
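Betweenness centrality for unweighted graphs is conventionally computed with Brandes' algorithm; the sketch below is a serial, unoptimized Python version of that kernel (not the lock-free implementation described above), included only to make concrete what the SSCA#2 benchmark stresses:

```python
from collections import deque

def betweenness(adj):
    """Brandes' algorithm for unweighted betweenness centrality.
    adj: dict mapping vertex -> list of neighbors.
    Values count ordered source/target pairs; halve them for the
    usual undirected-graph convention."""
    bc = {v: 0.0 for v in adj}
    for s in adj:
        # BFS from s, recording shortest-path counts and predecessors
        stack, pred = [], {v: [] for v in adj}
        sigma = {v: 0 for v in adj}; sigma[s] = 1
        dist = {v: -1 for v in adj}; dist[s] = 0
        q = deque([s])
        while q:
            v = q.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    q.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    pred[w].append(v)
        # accumulate pair dependencies in reverse BFS order
        delta = {v: 0.0 for v in adj}
        while stack:
            w = stack.pop()
            for v in pred[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return bc
```

The TEPS metric reported above is simply the number of edges traversed by such a kernel divided by wall-clock time, which is why it rewards exactly the memory-latency tolerance that the XMT and UltraSPARC T2 provide.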
Enabling the High Level Synthesis of Data Analytics Accelerators
DOE Office of Scientific and Technical Information (OSTI.GOV)
Minutoli, Marco; Castellana, Vito G.; Tumeo, Antonino
Conventional High Level Synthesis (HLS) tools mainly target compute-intensive kernels typical of digital signal processing applications. We are developing techniques and architectural templates to enable HLS of data analytics applications. These applications are memory intensive, present fine-grained, unpredictable data accesses, and exhibit irregular, dynamic task parallelism. We discuss an architectural template based around a distributed controller to efficiently exploit thread-level parallelism. We present a memory interface that supports parallel memory subsystems and enables implementing atomic memory operations. We introduce a dynamic task scheduling approach to efficiently execute heavily unbalanced workloads. The templates are validated by synthesizing queries from the Lehigh University Benchmark (LUBM), a well-known SPARQL benchmark.
Performance of OVERFLOW-D Applications based on Hybrid and MPI Paradigms on IBM Power4 System
NASA Technical Reports Server (NTRS)
Djomehri, M. Jahed; Biegel, Bryan (Technical Monitor)
2002-01-01
This report briefly discusses our preliminary performance experiments with parallel versions of OVERFLOW-D applications. These applications are based on MPI and hybrid paradigms on the IBM Power4 system here at the NAS Division. This work is part of an effort to determine the suitability of the system and its parallel libraries (MPI/OpenMP) for specific scientific computing objectives.
Kan, Guangyuan; He, Xiaoyan; Ding, Liuqian; Li, Jiren; Liang, Ke; Hong, Yang
2017-10-01
The shuffled complex evolution optimization developed at the University of Arizona (SCE-UA) has been successfully applied for many years in various kinds of scientific and engineering optimization applications, such as hydrological model parameter calibration. The algorithm possesses good global optimality, convergence stability and robustness. However, benchmark and real-world applications reveal the poor computational efficiency of the SCE-UA. This research aims at the parallelization and acceleration of the SCE-UA method based on powerful heterogeneous computing technology. The parallel SCE-UA is implemented on an Intel Xeon multi-core CPU (by using OpenMP and OpenCL) and an NVIDIA Tesla many-core GPU (by using OpenCL, CUDA, and OpenACC). The serial and parallel SCE-UA were tested on the Griewank benchmark function. Comparison results indicate that the parallel SCE-UA significantly improves computational efficiency compared to the original serial version. The OpenCL implementation obtains the best overall acceleration, albeit with the most complex source code. The parallel SCE-UA has bright prospects for application to real-world problems.
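The Griewank test function, and the independence of fitness evaluations that makes population-based methods like SCE-UA amenable to parallelization, can be sketched as follows (illustrative Python only; the implementations cited above use OpenMP, OpenCL, CUDA, and OpenACC):

```python
import math
from multiprocessing.pool import ThreadPool

def griewank(x):
    """Griewank benchmark function; global minimum f(0,...,0) = 0."""
    s = sum(xi * xi for xi in x) / 4000.0
    p = math.prod(math.cos(xi / math.sqrt(i + 1)) for i, xi in enumerate(x))
    return 1.0 + s - p

def evaluate_population(points, workers=4):
    # The fitness evaluations of a population are mutually independent,
    # which is what makes this step embarrassingly parallel.
    with ThreadPool(workers) as pool:
        return pool.map(griewank, points)
```

In the GPU versions described above, the same independence lets each evaluation map to one device thread.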
Development and Application of a Parallel LCAO Cluster Method
NASA Astrophysics Data System (ADS)
Patton, David C.
1997-08-01
CPU intensive steps in the SCF electronic structure calculations of clusters and molecules with a first-principles LCAO method have been fully parallelized via a message passing paradigm. Identification of the parts of the code that are composed of many independent compute-intensive steps is discussed in detail, as they are the most readily parallelized. Most of the parallelization involves spatially decomposing numerical operations on a mesh. One exception is the solution of Poisson's equation, which relies on distribution of the charge density and multipole methods. The method we use to parallelize this part of the calculation is quite novel and is covered in detail. We present a general method for dynamically load-balancing a parallel calculation and discuss how we use this method in our code. The results of benchmark calculations of the IR and Raman spectra of PAH molecules such as anthracene (C_14H_10) and tetracene (C_18H_12) are presented. These benchmark calculations were performed on an IBM SP2 and a Sun Ultra HPC server with both MPI and PVM. Scalability and speedup for these calculations are analyzed to determine the efficiency of the code. In addition, performance and usage issues for MPI and PVM are presented.
Rethinking key–value store for parallel I/O optimization
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kougkas, Anthony; Eslami, Hassan; Sun, Xian-He
2015-01-26
Key-value stores are being widely used as the storage system for large-scale internet services and cloud storage systems. However, they are rarely used in HPC systems, where parallel file systems are the dominant storage solution. In this study, we examine the architecture differences and performance characteristics of parallel file systems and key-value stores. We propose using key-value stores to optimize overall Input/Output (I/O) performance, especially for workloads that parallel file systems cannot handle well, such as the cases with intense data synchronization or heavy metadata operations. We conducted experiments with several synthetic benchmarks, an I/O benchmark, and a real application. We modeled the performance of these two systems using collected data from our experiments, and we provide a predictive method to identify which system offers better I/O performance given a specific workload. The results show that we can optimize the I/O performance in HPC systems by utilizing key-value stores.
Construction of a parallel processor for simulating manipulators and other mechanical systems
NASA Technical Reports Server (NTRS)
Hannauer, George
1991-01-01
This report summarizes the results of NASA Contract NAS5-30905, awarded under phase 2 of the SBIR Program, for a demonstration of the feasibility of a new high-speed parallel simulation processor, called the Real-Time Accelerator (RTA). The principal goals were met, and EAI is now proceeding with phase 3: development of a commercial product. This product is scheduled for commercial introduction in the second quarter of 1992.
1994-05-01
PARALLEL DISTRIBUTED MEMORY ARCHITECTURE. T. M. Eidson, High Technology Corporation, Hampton, VA 23665; G. Erlebacher, Institute for Computer Applications in Science and Engineering. Contract NAS1-19480, May 1994. [...] developed and evaluated. Simple model calculations as well as timing results are presented to evaluate the various strategies.
Hot Chips and Hot Interconnects for High End Computing Systems
NASA Technical Reports Server (NTRS)
Saini, Subhash
2005-01-01
I will discuss several processors: 1. The Cray proprietary processor used in the Cray X1; 2. The IBM Power 3 and Power 4 used in IBM SP 3 and IBM SP 4 systems; 3. The Intel Itanium and Xeon, used in the SGI Altix systems and clusters respectively; 4. The IBM System-on-a-Chip used in the IBM BlueGene/L; 5. The HP Alpha EV68 processor used in the DOE ASCI Q cluster; 6. The SPARC64 V processor, used in the Fujitsu PRIMEPOWER HPC2500; 7. An NEC proprietary processor, used in the NEC SX-6/7; 8. The Power 4+ processor, used in the Hitachi SR11000; 9. An NEC proprietary processor, used in the Earth Simulator. The IBM POWER5 and Red Storm computing systems will also be discussed. The architectures of these processors will first be presented, followed by interconnection networks and a description of high-end computer systems based on these processors and networks. The performance of various hardware/programming model combinations will then be compared, based on the latest NAS Parallel Benchmark results (MPI, OpenMP/HPF, and hybrid MPI+OpenMP). The tutorial will conclude with a discussion of general trends in the field of high performance computing (quantum computing, DNA computing, cellular engineering, and neural networks).
Secure Network-Centric Aviation Communication (SNAC)
NASA Technical Reports Server (NTRS)
Nelson, Paul H.; Muha, Mark A.; Sheehe, Charles J.
2017-01-01
The existing National Airspace System (NAS) communications capabilities are largely unsecured, are not designed for efficient use of spectrum, and collectively are not capable of servicing the future needs of the NAS with the inclusion of new operators in Unmanned Aviation Systems (UAS) or On Demand Mobility (ODM). SNAC will provide a ubiquitous, secure, network-based communications architecture that will provide new service capabilities and allow for the migration of current communications to SNAC over time. The necessary change in communication technologies to digital domains will allow for the adoption of security mechanisms, sharing of link technologies, a large increase in spectrum utilization, new forms of resilience and redundancy, and the possibility of spectrum reuse. SNAC consists of a long-term open architectural approach with increasingly capable designs used to steer research and development and enable operating capabilities that run in parallel with current NAS systems.
libvdwxc: a library for exchange-correlation functionals in the vdW-DF family
NASA Astrophysics Data System (ADS)
Hjorth Larsen, Ask; Kuisma, Mikael; Löfgren, Joakim; Pouillon, Yann; Erhart, Paul; Hyldgaard, Per
2017-09-01
We present libvdwxc, a general library for evaluating the energy and potential for the family of vdW-DF exchange-correlation functionals. libvdwxc is written in C and provides an efficient implementation of the vdW-DF method and can be interfaced with various general-purpose DFT codes. Currently, the Gpaw and Octopus codes implement interfaces to libvdwxc. The present implementation emphasizes scalability and parallel performance, and thereby enables ab initio calculations of nanometer-scale complexes. The numerical accuracy is benchmarked on the S22 test set, whereas parallel performance is benchmarked on ligand-protected gold nanoparticles (Au144(SC11NH25)60) of up to 9696 atoms.
Memory-Intensive Benchmarks: IRAM vs. Cache-Based Machines
NASA Technical Reports Server (NTRS)
Biswas, Rupak; Gaeke, Brian R.; Husbands, Parry; Li, Xiaoye S.; Oliker, Leonid; Yelick, Katherine A.; Biegel, Bryan (Technical Monitor)
2002-01-01
The increasing gap between processor and memory performance has led to new architectural models for memory-intensive applications. In this paper, we explore the performance of a set of memory-intensive benchmarks and use them to compare the performance of conventional cache-based microprocessors to a mixed logic and DRAM processor called VIRAM. The benchmarks are based on problem statements, rather than specific implementations, and in each case we explore the fundamental hardware requirements of the problem, as well as alternative algorithms and data structures that can help expose fine-grained parallelism or simplify memory access patterns. The benchmarks are characterized by their memory access patterns, their basic control structures, and the ratio of computation to memory operation.
Roofline model toolkit: A practical tool for architectural and program analysis
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lo, Yu Jung; Williams, Samuel; Van Straalen, Brian
We present preliminary results of the Roofline Toolkit for multicore, manycore, and accelerated architectures. This paper focuses on the processor architecture characterization engine, a collection of portable instrumented microbenchmarks implemented with the Message Passing Interface (MPI) and OpenMP (used to express thread-level parallelism). These benchmarks are specialized to quantify the behavior of different architectural features. Compared to previous work on performance characterization, these microbenchmarks focus on capturing the performance of each level of the memory hierarchy, along with thread-level parallelism, instruction-level parallelism and explicit SIMD parallelism, measured in the context of the compilers and run-time environments. We also measure sustained PCIe throughput with four GPU memory managed mechanisms. By combining results from the architecture characterization with the Roofline model based solely on architectural specifications, this work offers insights for performance prediction of current and future architectures and their software systems. To that end, we instrument three applications and plot their resultant performance on the corresponding Roofline model when run on a Blue Gene/Q architecture.
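The Roofline model underlying the toolkit reduces to a simple formula: attainable performance is the lesser of peak compute throughput and peak memory bandwidth times arithmetic intensity. A minimal sketch (illustrative Python; function names are ours, not the toolkit's):

```python
def roofline(peak_gflops, peak_bw_gbs, arithmetic_intensity):
    """Attainable performance (GFLOP/s) under the Roofline model:
    a kernel is bound by either compute or memory bandwidth."""
    return min(peak_gflops, peak_bw_gbs * arithmetic_intensity)

def machine_balance(peak_gflops, peak_bw_gbs):
    # arithmetic intensity (flops/byte) at the ridge point, where the
    # bandwidth roof meets the compute roof
    return peak_gflops / peak_bw_gbs
```

Kernels with intensity below the machine balance sit on the sloped bandwidth roof; above it, on the flat compute roof. The microbenchmarks described above measure the empirical heights of those roofs rather than trusting vendor specifications.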
Implementation and performance of parallel Prolog interpreter
DOE Office of Scientific and Technical Information (OSTI.GOV)
Wei, S.; Kale, L.V.; Balkrishna, R.
1988-01-01
In this paper, the authors discuss the implementation of a parallel Prolog interpreter on different parallel machines. The implementation is based on the REDUCE-OR process model, which exploits both AND and OR parallelism in logic programs. It is machine independent, as it runs on top of the chare kernel, a machine-independent parallel programming system. The authors also give the performance of the interpreter running a diverse set of benchmark programs on parallel machines, including shared memory systems (an Alliant FX/8, a Sequent, and a Multimax) and a non-shared memory system (an Intel iPSC/32 hypercube), in addition to its performance on a multiprocessor simulation system.
NASA Technical Reports Server (NTRS)
Kramer, Williams T. C.; Simon, Horst D.
1994-01-01
This tutorial proposes to be a practical guide for the uninitiated to the main topics and themes of high-performance computing (HPC), with particular emphasis on distributed computing. The intent is first to provide some guidance and direction in the rapidly expanding field of scientific computing using both massively parallel and traditional supercomputers. Because of their considerable potential computational power, loosely or tightly coupled clusters of workstations are increasingly considered a third alternative to both the more conventional supercomputers based on a small number of powerful vector processors and the massively parallel processors. Even though many research issues concerning the effective use of workstation clusters and their integration into a large-scale production facility are still unresolved, such clusters are already used for production computing. In this tutorial we will draw on the unique experience gained at the NAS facility at NASA Ames Research Center. Over the last five years at NAS, massively parallel supercomputers such as the Connection Machines CM-2 and CM-5 from Thinking Machines Corporation and the iPSC/860 (Touchstone Gamma Machine) and Paragon machines from Intel were used in a production supercomputer center alongside traditional vector supercomputers such as the Cray Y-MP and C90.
Present Status and Extensions of the Monte Carlo Performance Benchmark
NASA Astrophysics Data System (ADS)
Hoogenboom, J. Eduard; Petrovic, Bojan; Martin, William R.
2014-06-01
The NEA Monte Carlo Performance benchmark started in 2011, aiming to monitor over the years the ability to perform a full-size Monte Carlo reactor core calculation with detailed power production for each fuel pin with axial distribution. This paper gives an overview of the results contributed thus far. It shows that reaching a statistical accuracy of 1% for most of the small fuel zones requires about 100 billion neutron histories. The efficiency of parallel execution of Monte Carlo codes on a large number of processor cores shows clear limitations for computer clusters with common-type compute nodes; however, on true supercomputers the speedup of parallel calculations continues to increase up to large numbers of processor cores. More experience is needed with calculations on true supercomputers using large numbers of processors in order to predict whether the requested calculations can be done in a short time. As the specifications of the reactor geometry for this benchmark test are well suited for further investigations of full-core Monte Carlo calculations, and a need is felt to test issues other than computational performance, proposals are presented for extending the benchmark to a suite of benchmark problems: for evaluating fission source convergence for a system with a high dominance ratio, for coupling with thermal-hydraulics calculations to evaluate the use of different temperatures and coolant densities, and for studying the correctness and effectiveness of burnup calculations. Moreover, other contemporary proposals for a full-core calculation with realistic geometry and material composition will be discussed.
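Figures like the 100 billion histories quoted above follow from the 1/sqrt(N) scaling of Monte Carlo statistical error: halving the relative error in a tally requires four times as many histories. A small helper makes the scaling concrete (illustrative Python, not part of the benchmark specification):

```python
def histories_for_accuracy(rel_err_target, rel_err_observed, n_observed):
    """1/sqrt(N) scaling of Monte Carlo statistical error:
    sigma_rel ~ c / sqrt(N), so
    N_target = N_observed * (err_observed / err_target)**2."""
    return n_observed * (rel_err_observed / rel_err_target) ** 2
```

For example, a pin tally that shows 2% relative error after some number of histories needs four times that many to reach 1%, which is why the smallest fuel zones dominate the total history count.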
Accelerating the Pace of Protein Functional Annotation With Intel Xeon Phi Coprocessors.
Feinstein, Wei P; Moreno, Juana; Jarrell, Mark; Brylinski, Michal
2015-06-01
Intel Xeon Phi is a new addition to the family of powerful parallel accelerators. The range of its potential applications in computationally driven research is broad; however, at present, the repository of scientific codes is still relatively limited. In this study, we describe the development and benchmarking of a parallel version of eFindSite, a structural bioinformatics algorithm for the prediction of ligand-binding sites in proteins. Implemented for the Intel Xeon Phi platform, the parallelization of the structure alignment portion of eFindSite using pragma-based OpenMP brings about the desired performance improvements, which scale well with the number of computing cores. Compared to a serial version, the parallel code runs 11.8 and 10.1 times faster on the CPU and the coprocessor, respectively; when both resources are utilized simultaneously, the speedup is 17.6. For example, ligand-binding predictions for 501 benchmarking proteins are completed in 2.1 hours on a single Stampede node equipped with the Intel Xeon Phi card compared to 3.1 hours without the accelerator and 36.8 hours required by a serial version. In addition to the satisfactory parallel performance, porting existing scientific codes to the Intel Xeon Phi architecture is relatively straightforward with a short development time due to the support of common parallel programming models by the coprocessor. The parallel version of eFindSite is freely available to the academic community at www.brylinski.org/efindsite.
Nadkarni, P M; Miller, P L
1991-01-01
A parallel program for inter-database sequence comparison was developed on the Intel Hypercube using two models of parallel programming. One version was built using machine-specific Hypercube parallel programming commands. The other version was built using Linda, a machine-independent parallel programming language. The two versions of the program provide a case study comparing these two approaches to parallelization in an important biological application area. Benchmark tests with both programs gave comparable results with a small number of processors. As the number of processors was increased, the Linda version was somewhat less efficient. The Linda version was also run without change on Network Linda, a virtual parallel machine running on a network of desktop workstations.
MoMaS reactive transport benchmark using PFLOTRAN
NASA Astrophysics Data System (ADS)
Park, H.
2017-12-01
The MoMaS benchmark was developed to enhance numerical simulation capability for reactive transport modeling in porous media. The benchmark was published in late September 2009; it is not taken from a real chemical system, but comprises realistic and numerically challenging tests. PFLOTRAN is a state-of-the-art massively parallel subsurface flow and reactive transport code that is being used in multiple nuclear waste repository projects at Sandia National Laboratories, including the Waste Isolation Pilot Plant and Used Fuel Disposition. The MoMaS benchmark has three independent tests with easy, medium, and hard chemical complexity. This paper demonstrates how PFLOTRAN is applied to this benchmark exercise and shows results of the easy benchmark test case, which includes mixing of aqueous components and surface complexation. The surface complexation reactions consist of monodentate and bidentate reactions, which introduce difficulty in defining the selectivity coefficient if the reaction applies to a bulk reference volume. The selectivity coefficient becomes porosity dependent for bidentate reactions in heterogeneous porous media. The benchmark is solved by PFLOTRAN with minimal modification to address this issue, and unit conversions were made properly to suit PFLOTRAN.
MPI_XSTAR: MPI-based Parallelization of the XSTAR Photoionization Program
NASA Astrophysics Data System (ADS)
Danehkar, Ashkbiz; Nowak, Michael A.; Lee, Julia C.; Smith, Randall K.
2018-02-01
We describe a program for the parallel implementation of multiple runs of XSTAR, a photoionization code that is used to predict the physical properties of an ionized gas from its emission and/or absorption lines. The parallelization program, called MPI_XSTAR, has been developed and implemented in the C++ language by using the Message Passing Interface (MPI) protocol, a conventional standard of parallel computing. We have benchmarked parallel multiprocessing executions of XSTAR, using MPI_XSTAR, against a serial execution of XSTAR, in terms of the parallelization speedup and the computing resource efficiency. Our experience indicates that the parallel execution runs significantly faster than the serial execution; however, the efficiency in terms of computing resource usage decreases as the number of processors used in the parallel computing increases.
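The speedup and efficiency metrics used in such benchmarking are worth stating precisely. The sketch below (illustrative Python, not part of MPI_XSTAR) also includes Amdahl's law, which explains the pattern reported above: once any serial fraction remains, efficiency must fall as processors are added:

```python
def speedup(t_serial, t_parallel):
    """Ratio of serial to parallel wall-clock time."""
    return t_serial / t_parallel

def efficiency(t_serial, t_parallel, n_procs):
    # fraction of ideal linear speedup actually achieved
    return speedup(t_serial, t_parallel) / n_procs

def amdahl_speedup(serial_fraction, n_procs):
    """Upper bound on speedup when a fraction of the work is serial."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / n_procs)
```

For instance, a run that drops from 100 to 25 time units on 8 processors has a speedup of 4 but an efficiency of only 50%, consistent with the diminishing returns the abstract describes.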
An OpenACC-Based Unified Programming Model for Multi-accelerator Systems
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kim, Jungwon; Lee, Seyong; Vetter, Jeffrey S
2015-01-01
This paper proposes a novel SPMD programming model of OpenACC. Our model integrates the different granularities of parallelism from vector-level parallelism to node-level parallelism into a single, unified model based on OpenACC. It allows programmers to write programs for multiple accelerators using a uniform programming model whether they are in shared or distributed memory systems. We implement a prototype of our model and evaluate its performance with a GPU-based supercomputer using three benchmark applications.
A conservative approach to parallelizing the Sharks World simulation
NASA Technical Reports Server (NTRS)
Nicol, David M.; Riffe, Scott E.
1990-01-01
Parallelizing a benchmark problem for parallel simulation, the Sharks World, is described. The solution described is conservative, in the sense that no state information is saved and no 'rollbacks' occur. The approach used illustrates both the principal advantage and the principal disadvantage of conservative parallel simulation. The advantage is that, by exploiting lookahead, an approach was found that dramatically improves the serial execution time and also achieves excellent speedups. The disadvantage is that if the model rules are changed in such a way that the lookahead is destroyed, it is difficult to modify the solution to accommodate the changes.
Incremental Parallelization of Non-Data-Parallel Programs Using the Charon Message-Passing Library
NASA Technical Reports Server (NTRS)
VanderWijngaart, Rob F.
2000-01-01
Message passing is among the most popular techniques for parallelizing scientific programs on distributed-memory architectures. The reasons for its success are wide availability (MPI), efficiency, and full tuning control provided to the programmer. A major drawback, however, is that incremental parallelization, as offered by compiler directives, is not generally possible, because all data structures have to be changed throughout the program simultaneously. Charon remedies this situation through mappings between distributed and non-distributed data. It allows breaking up the parallelization into small steps, guaranteeing correctness at every stage. Several tools are available to help convert legacy codes into high-performance message-passing programs. They usually target data-parallel applications, whose loops carrying most of the work can be distributed among all processors without much dependency analysis. Others do a full dependency analysis and then convert the code virtually automatically. Even more toolkits are available that aid construction from scratch of message passing programs. None, however, allows piecemeal translation of codes with complex data dependencies (i.e. non-data-parallel programs) into message passing codes. The Charon library (available in both C and Fortran) provides incremental parallelization capabilities by linking legacy code arrays with distributed arrays. During the conversion process, non-distributed and distributed arrays exist side by side, and simple mapping functions allow the programmer to switch between the two in any location in the program. Charon also provides wrapper functions that leave the structure of the legacy code intact, but that allow execution on truly distributed data. 
Finally, the library provides a rich set of communication functions that support virtually all patterns of remote data demands in realistic structured grid scientific programs, including transposition, nearest-neighbor communication, pipelining, gather/scatter, and redistribution. At the end of the conversion process most intermediate Charon function calls will have been removed, the non-distributed arrays will have been deleted, and virtually the only remaining Charon function calls are the high-level, highly optimized communications. Distribution of the data is under complete control of the programmer, although a wide range of useful distributions is easily available through predefined functions. A crucial aspect of the library is that it does not allocate space for distributed arrays, but accepts programmer-specified memory. This has two major consequences. First, codes parallelized using Charon do not suffer from encapsulation; user data is always directly accessible. This provides high efficiency, and also retains the possibility of using message passing directly for highly irregular communications. Second, non-distributed arrays can be interpreted as (trivial) distributions in the Charon sense, which allows them to be mapped to truly distributed arrays, and vice versa. This is the mechanism that enables incremental parallelization. In this paper we provide a brief introduction to the library and then focus on the actual steps in the parallelization process, using some representative examples from, among others, the NAS Parallel Benchmarks. We show how a complicated two-dimensional pipeline (the prototypical non-data-parallel algorithm) can be constructed with ease. To demonstrate the flexibility of the library, we give examples of the stepwise, efficient parallel implementation of nonlocal boundary conditions common in aircraft simulations, as well as the construction of the sequence of grids required for multigrid.
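The central mechanism, simple mapping functions that let non-distributed and distributed views of the same data coexist, can be illustrated with a toy block distribution. All names below are hypothetical; they sketch the idea, not the Charon API.

```python
# Toy illustration of the Charon idea: a non-distributed array and a
# block-distributed view coexist, with mapping functions converting between
# them. Function names are hypothetical, not the Charon library interface.
def owner_and_local(i, n, nprocs):
    """Map global index i of an n-element array to (owner rank, local index)
    under a block distribution with ceiling-sized blocks."""
    block = (n + nprocs - 1) // nprocs
    return i // block, i % block

def distribute(a, nprocs):
    """Split the non-distributed array into per-rank blocks."""
    n = len(a)
    block = (n + nprocs - 1) // nprocs
    return [a[r * block:(r + 1) * block] for r in range(nprocs)]

def collect(blocks):
    """Inverse mapping: rebuild the non-distributed array from the blocks."""
    return [x for b in blocks for x in b]
```

During an incremental conversion in this style, both representations exist side by side: a loop can be moved to the distributed view while the rest of the program still reads the non-distributed array rebuilt by the inverse mapping.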
Parallel tempering for the traveling salesman problem
DOE Office of Scientific and Technical Information (OSTI.GOV)
Percus, Allon; Wang, Richard; Hyman, Jeffrey
We explore the potential of parallel tempering as a combinatorial optimization method, applying it to the traveling salesman problem. We compare simulation results of parallel tempering with a benchmark implementation of simulated annealing, and study how different choices of parameters affect the relative performance of the two methods. We find that a straightforward implementation of parallel tempering can outperform simulated annealing in several crucial respects. When parameters are chosen appropriately, both methods yield close approximation to the actual minimum distance for an instance with 200 nodes. However, parallel tempering yields more consistently accurate results when a series of independent simulations are performed. Our results suggest that parallel tempering might offer a simple but powerful alternative to simulated annealing for combinatorial optimization problems.
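As a rough illustration of the method the paper studies, the following Python sketch runs several TSP replicas at different temperatures with 2-opt Metropolis moves and occasional neighbor swaps. The temperatures, move set, and swap schedule are illustrative choices, not the authors' parameters.

```python
import math, random

# Minimal parallel-tempering sketch for the TSP. Each replica anneals at its
# own temperature; periodically, adjacent replicas attempt a state swap with
# the standard replica-exchange acceptance probability.
def tour_len(tour, d):
    return sum(d[tour[i]][tour[(i + 1) % len(tour)]] for i in range(len(tour)))

def metropolis_step(tour, d, T, rng):
    i, j = sorted(rng.sample(range(len(tour)), 2))
    cand = tour[:i] + tour[i:j + 1][::-1] + tour[j + 1:]   # 2-opt reversal
    dE = tour_len(cand, d) - tour_len(tour, d)
    return cand if dE < 0 or rng.random() < math.exp(-dE / T) else tour

def parallel_tempering(d, temps, sweeps, seed=0):
    rng = random.Random(seed)
    n = len(d)
    reps = [list(range(n)) for _ in temps]                 # one replica per T
    best = min(reps, key=lambda t: tour_len(t, d))
    for _ in range(sweeps):
        reps = [metropolis_step(t, d, T, rng) for t, T in zip(reps, temps)]
        k = rng.randrange(len(temps) - 1)                  # try one swap
        arg = (1.0 / temps[k] - 1.0 / temps[k + 1]) * (
            tour_len(reps[k], d) - tour_len(reps[k + 1], d))
        if arg >= 0 or rng.random() < math.exp(arg):
            reps[k], reps[k + 1] = reps[k + 1], reps[k]
        best = min(reps + [best], key=lambda t: tour_len(t, d))
    return best
```

The high-temperature replicas explore broadly while the low-temperature ones refine; the swaps let good configurations migrate down the temperature ladder, which is the mechanism behind the consistency advantage the abstract reports.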
The serotonin-N-acetylserotonin-melatonin pathway as a biomarker for autism spectrum disorders.
Pagan, C; Delorme, R; Callebert, J; Goubran-Botros, H; Amsellem, F; Drouot, X; Boudebesse, C; Le Dudal, K; Ngo-Nguyen, N; Laouamri, H; Gillberg, C; Leboyer, M; Bourgeron, T; Launay, J-M
2014-11-11
Elevated whole-blood serotonin and decreased plasma melatonin (a circadian synchronizer hormone that derives from serotonin) have been reported independently in patients with autism spectrum disorders (ASDs). Here, we explored, in parallel, serotonin, melatonin and the intermediate N-acetylserotonin (NAS) in a large cohort of patients with ASD and their relatives. We then investigated the clinical correlates of these biochemical parameters. Whole-blood serotonin, platelet NAS and plasma melatonin were assessed in 278 patients with ASD, their 506 first-degree relatives (129 unaffected siblings, 199 mothers and 178 fathers) and 416 sex- and age-matched controls. We confirmed the previously reported hyperserotonemia in ASD (40% (35-46%) of patients), as well as the deficit in melatonin (51% (45-57%)), taking as a threshold the 95th or 5th percentile of the control group, respectively. In addition, this study reveals an increase of NAS (47% (41-54%) of patients) in platelets, pointing to a disruption of the serotonin-NAS-melatonin pathway in ASD. Biochemical impairments were also observed in the first-degree relatives of patients. A score combining impairments of serotonin, NAS and melatonin distinguished between patients and controls with a sensitivity of 80% and a specificity of 85%. In patients the melatonin deficit was only significantly associated with insomnia. Impairments of melatonin synthesis in ASD may be linked with decreased 14-3-3 proteins. Although ASDs are highly heterogeneous, disruption of the serotonin-NAS-melatonin pathway is a very frequent trait in patients and may represent a useful biomarker for a large subgroup of individuals with ASD.
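The combined score can be pictured as a simple threshold count against the control percentiles named in the abstract. The sketch below is a plain illustration of that idea only; the abstract does not specify the actual scoring rule, so this counting scheme is an assumption.

```python
# Illustrative threshold-count score over the three analytes: serotonin and
# NAS flagged above the controls' 95th percentile, melatonin below the 5th.
# The counting rule is an assumption for illustration, not the study's score.
def percentile(values, p):
    """Nearest-rank style percentile of a list of control values."""
    s = sorted(values)
    idx = min(len(s) - 1, max(0, int(round(p / 100.0 * (len(s) - 1)))))
    return s[idx]

def impairment_score(patient, controls):
    hi_5ht = percentile(controls["serotonin"], 95)
    hi_nas = percentile(controls["nas"], 95)
    lo_mel = percentile(controls["melatonin"], 5)
    return (int(patient["serotonin"] > hi_5ht)
            + int(patient["nas"] > hi_nas)
            + int(patient["melatonin"] < lo_mel))
```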
Nadkarni, P. M.; Miller, P. L.
1991-01-01
A parallel program for inter-database sequence comparison was developed on the Intel Hypercube using two models of parallel programming. One version was built using machine-specific Hypercube parallel programming commands. The other version was built using Linda, a machine-independent parallel programming language. The two versions of the program provide a case study comparing these two approaches to parallelization in an important biological application area. Benchmark tests with both programs gave comparable results with a small number of processors. As the number of processors was increased, the Linda version was somewhat less efficient. The Linda version was also run without change on Network Linda, a virtual parallel machine running on a network of desktop workstations. PMID:1807632
Rubus: A compiler for seamless and extensible parallelism.
Adnan, Muhammad; Aslam, Faisal; Nawaz, Zubair; Sarwar, Syed Mansoor
2017-01-01
Nowadays, a typical processor may have multiple processing cores on a single chip. Furthermore, a special purpose processing unit called the Graphic Processing Unit (GPU), originally designed for 2D/3D games, is now available for general purpose use in computers and mobile devices. However, the traditional programming languages, which were designed to work with machines having single-core CPUs, cannot utilize the parallelism available on multi-core processors efficiently. Therefore, to exploit the extraordinary processing power of multi-core processors, researchers are working on new tools and techniques to facilitate parallel programming. To this end, languages like CUDA and OpenCL have been introduced, which can be used to write code with parallelism. The main shortcoming of these languages is that the programmer needs to specify all the complex details manually in order to parallelize the code across multiple cores. Therefore, the code written in these languages is difficult to understand, debug and maintain. Furthermore, parallelizing legacy code can require rewriting a significant portion of it in CUDA or OpenCL, which can consume significant time and resources. Thus, the amount of parallelism achieved is proportional to the skills of the programmer and the time spent on code optimization. This paper proposes a new open source compiler, Rubus, to achieve seamless parallelism. The Rubus compiler relieves the programmer from manually specifying the low-level details. It analyses and transforms a sequential program into a parallel program automatically, without any user intervention. This achieves massive speedup and better utilization of the underlying hardware without requiring a programmer's expertise in parallel programming. For five different benchmarks, Rubus achieved an average speedup of 34.54 times over Java on a basic GPU having only 96 cores. For a matrix multiplication benchmark, Rubus achieved an average execution speedup of 84 times on the same GPU. Moreover, Rubus achieves this performance without drastically increasing the memory footprint of a program.
Application of computational physics within Northrop
NASA Technical Reports Server (NTRS)
George, M. W.; Ling, R. T.; Mangus, J. F.; Thompkins, W. T.
1987-01-01
An overview of Northrop programs in computational physics is presented. These programs depend on access to today's supercomputers, such as the Numerical Aerodynamic Simulator (NAS), and future growth on the continuing evolution of computational engines. Descriptions here are concentrated on the following areas: computational fluid dynamics (CFD), computational electromagnetics (CEM), computer architectures, and expert systems. Current efforts and future directions in these areas are presented. The impact of advances in the CFD area is described, and parallels are drawn to analogous developments in CEM. The relationship between advances in these areas and the development of advanced (parallel) architectures and expert systems is also presented.
AdiosStMan: Parallelizing Casacore Table Data System using Adaptive IO System
NASA Astrophysics Data System (ADS)
Wang, R.; Harris, C.; Wicenec, A.
2016-07-01
In this paper, we investigate the Casacore Table Data System (CTDS) used in the casacore and CASA libraries, and methods to parallelize it. CTDS provides a storage manager plugin mechanism for third-party developers to design and implement their own CTDS storage managers. With this in mind, we looked into various storage backend techniques that could enable parallel I/O for CTDS by implementing new storage managers. After carrying out benchmarks showing the excellent parallel I/O throughput of the Adaptive IO System (ADIOS), we implemented an ADIOS based parallel CTDS storage manager. We then applied the CASA MSTransform frequency split task to verify the ADIOS Storage Manager. We also ran a series of performance tests to examine the I/O throughput in a massively parallel scenario.
NASA Technical Reports Server (NTRS)
Hall, Lawrence O.; Bennett, Bonnie H.; Tello, Ivan
1994-01-01
A parallel version of CLIPS 5.1 has been developed to run on Intel Hypercubes. The user interface is the same as that for CLIPS, with some added commands to allow for parallel calls. A complete version of CLIPS runs on each node of the hypercube. The system has been instrumented to display the time spent in the match, recognize, and act cycles on each node. Only rule-level parallelism is supported. Parallel commands enable the assertion and retraction of facts to/from remote nodes' working memory. Parallel CLIPS was used to implement a knowledge-based command, control, communications, and intelligence (C(sup 3)I) system to demonstrate the fusion of high-level, disparate sources. We discuss the nature of the information fusion problem, our approach, and implementation. Parallel CLIPS has also been used to run several benchmark parallel knowledge bases, such as one to set up a cafeteria. Results from running Parallel CLIPS with parallel knowledge base partitions indicate that significant speed increases, including superlinear in some cases, are possible.
Nonlinear viscoplasticity in ASPECT: benchmarking and applications to subduction
NASA Astrophysics Data System (ADS)
Glerum, Anne; Thieulot, Cedric; Fraters, Menno; Blom, Constantijn; Spakman, Wim
2018-03-01
ASPECT (Advanced Solver for Problems in Earth's ConvecTion) is a massively parallel finite element code originally designed for modeling thermal convection in the mantle with a Newtonian rheology. The code is characterized by modern numerical methods, high-performance parallelism and extensibility. This last characteristic is illustrated in this work: we have extended the use of ASPECT from global thermal convection modeling to upper-mantle-scale applications of subduction. Subduction modeling generally requires the tracking of multiple materials with different properties and with nonlinear viscous and viscoplastic rheologies. To this end, we implemented a frictional plasticity criterion that is combined with a viscous diffusion and dislocation creep rheology. Because ASPECT uses compositional fields to represent different materials, all material parameters are made dependent on a user-specified number of fields. The goal of this paper is primarily to describe and verify our implementations of complex, multi-material rheology by reproducing the results of four well-known two-dimensional benchmarks: the indentor benchmark, the brick experiment, the sandbox experiment and the slab detachment benchmark. Furthermore, we aim to provide hands-on examples for prospective users by demonstrating the use of multi-material viscoplasticity with three-dimensional, thermomechanical models of oceanic subduction, putting ASPECT on the map as a community code for high-resolution, nonlinear rheology subduction modeling.
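The kind of composite rheology described, creep viscosities combined and capped by a plastic yield criterion, can be sketched compactly. The harmonic averaging of the two creep mechanisms and the Drucker-Prager-style cap are standard modeling choices; the parameter names and values below are illustrative and do not reflect ASPECT's input format.

```python
import math

# Sketch of a composite viscoplastic rheology: diffusion and dislocation
# creep combined harmonically, then capped by a pressure-dependent yield
# stress. Values and signatures are illustrative, not the ASPECT interface.
def creep_viscosity(A, n, E, T, edot_II, R=8.314):
    """Generic power-law creep viscosity:
    eta = 0.5 * A^(-1/n) * edot_II^((1-n)/n) * exp(E / (n R T))."""
    return 0.5 * A ** (-1.0 / n) * edot_II ** ((1.0 - n) / n) * math.exp(E / (n * R * T))

def effective_viscosity(eta_diff, eta_disl, pressure, cohesion, phi, edot_II):
    """Harmonic average of the creep viscosities, capped so that the stress
    stays on or below a Drucker-Prager-style yield surface."""
    eta_visc = 1.0 / (1.0 / eta_diff + 1.0 / eta_disl)
    yield_stress = cohesion * math.cos(phi) + pressure * math.sin(phi)
    eta_plast = yield_stress / (2.0 * edot_II)   # viscosity that puts stress at yield
    return min(eta_visc, eta_plast)

# Example: at this (hypothetical) state the creep branch is weaker, so the
# plastic cap is inactive and the harmonic creep viscosity is returned.
eta = effective_viscosity(1e22, 1e22, pressure=1e8, cohesion=2e7,
                          phi=math.radians(30), edot_II=1e-15)
```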
ComprehensiveBench: a Benchmark for the Extensive Evaluation of Global Scheduling Algorithms
NASA Astrophysics Data System (ADS)
Pilla, Laércio L.; Bozzetti, Tiago C.; Castro, Márcio; Navaux, Philippe O. A.; Méhaut, Jean-François
2015-10-01
Parallel applications that present tasks with imbalanced loads or complex communication behavior usually do not exploit the underlying resources of parallel platforms to their full potential. In order to mitigate this issue, global scheduling algorithms are employed. As finding the optimal task distribution is an NP-Hard problem, identifying the most suitable algorithm for a specific scenario and comparing algorithms are not trivial tasks. In this context, this paper presents ComprehensiveBench, a benchmark for global scheduling algorithms that enables the variation of a vast range of parameters that affect performance. ComprehensiveBench can be used to assist in the development and evaluation of new scheduling algorithms, to help choose a specific algorithm for an arbitrary application, to emulate other applications, and to enable statistical tests. We illustrate its use in this paper with an evaluation of Charm++ periodic load balancers that stresses their characteristics.
NASA Technical Reports Server (NTRS)
1991-01-01
Various papers on supercomputing are presented. The general topics addressed include: program analysis/data dependence, memory access, distributed memory code generation, numerical algorithms, supercomputer benchmarks, latency tolerance, parallel programming, applications, processor design, networks, performance tools, mapping and scheduling, characterization affecting performance, parallelism packaging, computing climate change, combinatorial algorithms, hardware and software performance issues, system issues. (No individual items are abstracted in this volume)
Comparing the OpenMP, MPI, and Hybrid Programming Paradigm on an SMP Cluster
NASA Technical Reports Server (NTRS)
Jost, Gabriele; Jin, Hao-Qiang; an Mey, Dieter; Hatay, Ferhat F.
2003-01-01
Clusters of SMP (Symmetric Multi-Processors) nodes provide support for a wide range of parallel programming paradigms. The shared address space within each node is suitable for OpenMP parallelization. Message passing can be employed within and across the nodes of a cluster. Multiple levels of parallelism can be achieved by combining message passing and OpenMP parallelization. Which programming paradigm is the best will depend on the nature of the given problem, the hardware components of the cluster, the network, and the available software. In this study we compare the performance of different implementations of the same CFD benchmark application, using the same numerical algorithm but employing different programming paradigms.
Genetic Parallel Programming: design and implementation.
Cheang, Sin Man; Leung, Kwong Sak; Lee, Kin Hong
2006-01-01
This paper presents a novel Genetic Parallel Programming (GPP) paradigm for evolving parallel programs running on a Multi-Arithmetic-Logic-Unit (Multi-ALU) Processor (MAP). The MAP is a Multiple Instruction-streams, Multiple Data-streams (MIMD), general-purpose register machine that can be implemented on modern Very Large-Scale Integrated Circuits (VLSIs) in order to evaluate genetic programs at high speed. For human programmers, writing parallel programs is more difficult than writing sequential programs. However, experimental results show that GPP evolves parallel programs with less computational effort than that of their sequential counterparts. It creates a new approach to evolving a feasible problem solution in parallel program form and then serializes it into a sequential program if required. The effectiveness and efficiency of GPP are investigated using a suite of 14 well-studied benchmark problems. Experimental results show that GPP speeds up evolution substantially.
Research on computer systems benchmarking
NASA Technical Reports Server (NTRS)
Smith, Alan Jay (Principal Investigator)
1996-01-01
This grant addresses the topic of research on computer systems benchmarking and is more generally concerned with performance issues in computer systems. This report reviews work in those areas during the period of NASA support under this grant. The bulk of the work performed concerned benchmarking and analysis of CPUs, compilers, caches, and benchmark programs. The first part of this work concerned the issue of benchmark performance prediction. A new approach to benchmarking and machine characterization was reported, using a machine characterizer that measures the performance of a given system in terms of a Fortran abstract machine. Another report focused on analyzing compiler performance; the performance impact of optimization was studied in the context of our abstract-machine-based methodology for CPU performance characterization. Benchmark programs are analyzed in another paper: a machine-independent model of program execution was developed to characterize both machine performance and program execution. By merging these machine and program characterizations, execution time can be estimated for arbitrary machine/program combinations. The work was continued into the domain of parallel and vector machines, including the issue of caches in vector processors and multiprocessors. All of the aforementioned accomplishments are summarized more specifically in this report, as well as those smaller in magnitude supported by this grant.
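The merged machine/program characterization reduces to a dot product: the machine is a vector of per-operation times, the program a vector of operation counts. A minimal sketch, with hypothetical abstract-machine operation names:

```python
# Sketch of runtime prediction by merging a machine characterization
# (seconds per abstract-machine operation) with a program characterization
# (operation counts): T = sum_i n_i * t_i. Operation names are placeholders.
def predict_runtime(op_counts, op_times):
    """Estimated execution time for a program on a machine."""
    return sum(n * op_times[op] for op, n in op_counts.items())

machine = {"fadd": 2e-9, "fmul": 3e-9, "load": 5e-9}          # hypothetical timings
program = {"fadd": 1_000_000, "fmul": 1_000_000, "load": 2_000_000}
t = predict_runtime(program, machine)   # 0.002 + 0.003 + 0.010 seconds
```

The appeal of this factorization is that characterizing M machines and P programs costs M + P measurements rather than M x P benchmark runs.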
Interfacing Computer Aided Parallelization and Performance Analysis
NASA Technical Reports Server (NTRS)
Jost, Gabriele; Jin, Haoqiang; Labarta, Jesus; Gimenez, Judit; Biegel, Bryan A. (Technical Monitor)
2003-01-01
When porting sequential applications to parallel computer architectures, the program developer will typically go through several cycles of source code optimization and performance analysis. We have started a project to develop an environment where the user can jointly navigate through program structure and performance data information in order to make efficient optimization decisions. In a prototype implementation we have interfaced the CAPO computer aided parallelization tool with the Paraver performance analysis tool. We describe both tools and their interface and give an example for how the interface helps within the program development cycle of a benchmark code.
Parareal in time 3D numerical solver for the LWR Benchmark neutron diffusion transient model
DOE Office of Scientific and Technical Information (OSTI.GOV)
Baudron, Anne-Marie; Lautard, Jean-Jacques
2014-12-15
In this paper we present a time-parallel algorithm for the 3D neutron calculation of a transient model in a nuclear reactor core. The neutron calculation consists in numerically solving the time-dependent diffusion approximation equation, which is a simplified transport equation. The numerical resolution is done with the finite element method based on a tetrahedral meshing of the computational domain, representing the reactor core, and time discretization is achieved using a θ-scheme. The transient model presents moving control rods during the time of the reaction. Therefore, cross-sections (piecewise constant) are taken into account by interpolation with respect to the velocity of the control rods. Parallelism across time is achieved by applying the parareal-in-time algorithm to the problem at hand. This parallel method is a predictor-corrector scheme that iteratively combines the use of two kinds of numerical propagators, one coarse and one fine. Our method is made efficient by means of a coarse solver defined with a large time step and a fixed-position control rod model, while the fine propagator is a high-order numerical approximation of the full model. The parallel implementation of our method provides good scalability of the algorithm. Numerical results show the efficiency of the parareal method on a large light water reactor transient model corresponding to the Langenbuch-Maurer-Werner benchmark.
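The predictor-corrector structure of parareal can be shown on a scalar linear ODE y' = a y, standing in for the neutron diffusion system. In this generic sketch (not the paper's solver) both propagators are backward Euler, the coarse one with a single step per time slice and the fine one with many small steps; the fine runs over all slices are independent, which is where the time parallelism comes from.

```python
# Generic parareal sketch on y' = a*y. The correction iteration is
#   U_{k+1}[n+1] = G(U_{k+1}[n]) + F(U_k[n]) - G(U_k[n]),
# with coarse propagator G (one backward-Euler step per slice) and fine
# propagator F (many backward-Euler steps per slice).
def parareal(a, y0, t_end, n_slices, n_fine, n_iter):
    dt = t_end / n_slices
    G = lambda y: y / (1.0 - a * dt)        # coarse: one implicit Euler step

    def F(y):                                # fine: n_fine implicit Euler steps
        h = dt / n_fine
        for _ in range(n_fine):
            y = y / (1.0 - a * h)
        return y

    U = [y0]                                 # predictor: serial coarse sweep
    for _ in range(n_slices):
        U.append(G(U[-1]))
    for _ in range(n_iter):                  # corrector iterations
        F_old = [F(u) for u in U[:-1]]       # independent per slice: the
        G_old = [G(u) for u in U[:-1]]       # parallelizable part
        V = [y0]
        for n in range(n_slices):
            V.append(G(V[-1]) + F_old[n] - G_old[n])
        U = V
    return U[-1]
```

After at most n_slices corrections the iteration reproduces the serial fine solution exactly; the speedup comes from needing far fewer iterations than slices in practice.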
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ramos-Mendez, J; Faddegon, B; Perl, J
2015-06-15
Purpose: To develop and verify an extension to TOPAS for calculation of dose response models (TCP/NTCP). TOPAS wraps and extends Geant4. Methods: The TOPAS DICOM interface was extended to include structure contours, for subsequent calculation of DVHs and TCP/NTCP. The following dose response models were implemented: Lyman-Kutcher-Burman (LKB), critical element (CE), population-based critical volume (CV), parallel-serial, a sigmoid-based model of Niemierko for NTCP and TCP, and a Poisson-based model for TCP. For verification, results for the parallel-serial and Poisson models, with 6 MV x-ray dose distributions calculated with TOPAS and Pinnacle v9.2, were compared to data from the benchmark configuration of the AAPM Task Group 166 (TG166). We provide a benchmark configuration suitable for proton therapy along with results for the implementation of the Niemierko, CV and CE models. Results: The maximum difference in DVH calculated with Pinnacle and TOPAS was 2%. Differences between TG166 data and Monte Carlo calculations of up to 4.2%±6.1% were found for the parallel-serial model and up to 1.0%±0.7% for the Poisson model (including the uncertainty due to lack of knowledge of the point spacing in TG166). For the CE, CV and Niemierko models, the discrepancies between the Pinnacle and TOPAS results are 74.5%, 34.8% and 52.1% when using 29.7 cGy point spacing, the differences being highly sensitive to dose spacing. With our proposed benchmark configuration, the largest differences were 12.05%±0.38%, 3.74%±1.6%, 1.57%±4.9% and 1.97%±4.6% for the CE, CV, Niemierko and LKB models, respectively. Conclusion: Several dose response models were successfully implemented with the extension module. Reference data were calculated for future benchmarking. Dose response calculated for the different models varied much more widely for the TG166 benchmark than for the proposed benchmark, which had much lower sensitivity to the choice of DVH dose points.
This work was supported by National Cancer Institute Grant R01CA140735.
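Of the models listed, the LKB NTCP model has a compact closed form: a generalized equivalent uniform dose (gEUD) computed from the DVH, mapped through a standard normal CDF. A minimal sketch follows; the parameter values in the test are illustrative, not TG-166 data.

```python
import math

# Lyman-Kutcher-Burman (LKB) NTCP model from a differential DVH given as
# (dose in Gy, volume fraction) pairs. TD50 is the uniform dose giving 50%
# complication probability, m the slope parameter, n the volume effect.
def gEUD(dvh, n):
    """Generalized equivalent uniform dose: (sum_i v_i * D_i^(1/n))^n."""
    return sum(v * d ** (1.0 / n) for d, v in dvh) ** n

def lkb_ntcp(dvh, TD50, m, n):
    """NTCP = Phi(t), t = (gEUD - TD50) / (m * TD50), Phi the normal CDF."""
    t = (gEUD(dvh, n) - TD50) / (m * TD50)
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))
```

A quick sanity property of the model: a uniform dose exactly equal to TD50 gives NTCP = 0.5, regardless of m and n.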
NASA Astrophysics Data System (ADS)
Stone, Christopher P.; Alferman, Andrew T.; Niemeyer, Kyle E.
2018-05-01
Accurate and efficient methods for solving stiff ordinary differential equations (ODEs) are a critical component of turbulent combustion simulations with finite-rate chemistry. The ODEs governing the chemical kinetics at each mesh point are decoupled by operator-splitting allowing each to be solved concurrently. An efficient ODE solver must then take into account the available thread and instruction-level parallelism of the underlying hardware, especially on many-core coprocessors, as well as the numerical efficiency. A stiff Rosenbrock and a nonstiff Runge-Kutta ODE solver are both implemented using the single instruction, multiple thread (SIMT) and single instruction, multiple data (SIMD) paradigms within OpenCL. Both methods solve multiple ODEs concurrently within the same instruction stream. The performance of these parallel implementations was measured on three chemical kinetic models of increasing size across several multicore and many-core platforms. Two separate benchmarks were conducted to clearly determine any performance advantage offered by either method. The first benchmark measured the run-time of evaluating the right-hand-side source terms in parallel and the second benchmark integrated a series of constant-pressure, homogeneous reactors using the Rosenbrock and Runge-Kutta solvers. The right-hand-side evaluations with SIMD parallelism on the host multicore Xeon CPU and many-core Xeon Phi co-processor performed approximately three times faster than the baseline multithreaded C++ code. The SIMT parallel model on the host and Phi was 13%-35% slower than the baseline while the SIMT model on the NVIDIA Kepler GPU provided approximately the same performance as the SIMD model on the Phi. The runtimes for both ODE solvers decreased significantly with the SIMD implementations on the host CPU (2.5-2.7 ×) and Xeon Phi coprocessor (4.7-4.9 ×) compared to the baseline parallel code. 
The SIMT implementations on the GPU ran 1.5-1.6 times faster than the baseline multithreaded CPU code; however, this was significantly slower than the SIMD versions on the host CPU or the Xeon Phi. The performance difference between the three platforms was attributed to thread divergence caused by the adaptive step-sizes within the ODE integrators. Analysis showed that the wider vector width of the GPU incurs a higher level of divergence than the narrower Sandy Bridge or Xeon Phi. The significant performance improvement provided by the SIMD parallel strategy motivates further research into more ODE solver methods that are both SIMD-friendly and computationally efficient.
FX-87 performance measurements: data-flow implementation. Technical report
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hammel, R.T.; Gifford, D.K.
1988-11-01
This report documents a series of experiments performed to explore the thesis that the FX-87 effect system permits a compiler to schedule imperative programs (i.e., programs that may contain side-effects) for execution on a parallel computer. The authors analyze how much the FX-87 static effect system can improve the execution times of five benchmark programs on a parallel graph interpreter. Three of their benchmark programs do not use side-effects (factorial, fibonacci, and polynomial division) and thus did not have any effect-induced constraints. Their FX-87 performance was comparable to their performance in a purely functional language. Two of the benchmark programs use side effects (DNA sequence matching and Scheme interpretation) and the compiler was able to use effect information to reduce their execution times by factors of 1.7 to 5.4 when compared with sequential execution times. These results support the thesis that a static effect system is a powerful tool for compilation to multiprocessor computers. However, the graph interpreter we used was based on unrealistic assumptions, and thus our results may not accurately reflect the performance of a practical FX-87 implementation. The results also suggest that conventional loop analysis would complement the FX-87 effect system.
Parallelization of PANDA discrete ordinates code using spatial decomposition
DOE Office of Scientific and Technical Information (OSTI.GOV)
Humbert, P.
2006-07-01
We present the parallel method, based on spatial domain decomposition, implemented in the 2D and 3D versions of the discrete ordinates code PANDA. The spatial mesh is orthogonal and the spatial domain decomposition is Cartesian. For 3D problems a 3D Cartesian domain topology is created, and the parallel method is based on a domain diagonal-plane ordered sweep algorithm. The parallel efficiency of the method is improved by pipelining over directions and octants. The implementation of the algorithm is straightforward using MPI blocking point-to-point communications. The efficiency of the method is illustrated by an application to the 3D-Ext C5G7 benchmark of the OECD/NEA.
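The diagonal-plane ordering can be sketched as follows (a serial illustration of the dependency structure with hypothetical names, not PANDA's implementation): for a sweep from one corner, all cells on a plane i + j + k = const are mutually independent, so each plane can be processed in parallel once the previous plane is done.

```python
def diagonal_sweep(nx, ny, nz):
    """Return cell coordinates in diagonal-plane order for a corner sweep."""
    order = []
    for d in range(nx + ny + nz - 2):      # plane index: i + j + k = d
        for i in range(nx):
            for j in range(ny):
                k = d - i - j
                if 0 <= k < nz:
                    order.append((i, j, k))  # upstream planes already done
    return order

visited = diagonal_sweep(2, 2, 2)
```

Each cell's upstream neighbors (i-1, j, k), (i, j-1, k) and (i, j, k-1) lie on an earlier plane, so the ordering respects the sweep dependencies; pipelining over directions and octants overlays further concurrency on top of this.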
PCTDSE: A parallel Cartesian-grid-based TDSE solver for modeling laser-atom interactions
NASA Astrophysics Data System (ADS)
Fu, Yongsheng; Zeng, Jiaolong; Yuan, Jianmin
2017-01-01
We present a parallel Cartesian-grid-based time-dependent Schrödinger equation (TDSE) solver for modeling laser-atom interactions. It can simulate the single-electron dynamics of atoms in arbitrary time-dependent vector potentials. We use a split-operator method combined with fast Fourier transforms (FFT) on a three-dimensional (3D) Cartesian grid. Parallelization is realized using a 2D decomposition strategy based on the Message Passing Interface (MPI) library, which results in good parallel scaling on modern supercomputers. We present simple applications to the hydrogen atom using benchmark problems from the literature and obtain reproducible results. Extensions to other laser-atom systems are straightforward, requiring only minimal modifications of the source code.
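A one-dimensional, serial sketch of the split-operator/FFT scheme conveys the core of the method (the paper's solver is 3D and MPI-parallel; the grid, time step and free-particle potential below are illustrative assumptions, not the paper's setup):

```python
import numpy as np

def split_step(psi, x, dt, potential):
    """One step of exp(-iV dt/2) exp(-iT dt) exp(-iV dt/2) (atomic units)."""
    n, dx = len(x), x[1] - x[0]
    k = 2 * np.pi * np.fft.fftfreq(n, d=dx)        # momentum grid
    half_kick = np.exp(-0.5j * dt * potential(x))  # potential half-step
    psi = half_kick * psi
    psi = np.fft.ifft(np.exp(-0.5j * dt * k**2) * np.fft.fft(psi))  # kinetic
    return half_kick * psi

# Propagate a normalized Gaussian freely; the unitary steps preserve norm.
x = np.linspace(-20, 20, 512, endpoint=False)
psi = np.exp(-x**2).astype(complex)
psi /= np.sqrt(np.sum(np.abs(psi)**2) * (x[1] - x[0]))
for _ in range(100):
    psi = split_step(psi, x, 0.01, np.zeros_like)
norm = np.sum(np.abs(psi)**2) * (x[1] - x[0])
```

In the 3D parallel setting, the FFTs along directions not local to a process require the data transposes that the 2D decomposition strategy is designed to handle.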
Job Management Requirements for NAS Parallel Systems and Clusters
NASA Technical Reports Server (NTRS)
Saphir, William; Tanner, Leigh Ann; Traversat, Bernard
1995-01-01
A job management system is a critical component of a production supercomputing environment, permitting oversubscribed resources to be shared fairly and efficiently. Job management systems that were originally designed for traditional vector supercomputers are not appropriate for the distributed-memory parallel supercomputers that are becoming increasingly important in the high performance computing industry. Newer job management systems offer new functionality but do not solve fundamental problems. We address some of the main issues in resource allocation and job scheduling we have encountered on two parallel computers - a 160-node IBM SP2 and a cluster of 20 high performance workstations located at the Numerical Aerodynamic Simulation facility. We describe the requirements for resource allocation and job management that are necessary to provide a production supercomputing environment on these machines, prioritizing according to difficulty and importance, and advocating a return to fundamental issues.
A high performance linear equation solver on the VPP500 parallel supercomputer
DOE Office of Scientific and Technical Information (OSTI.GOV)
Nakanishi, Makoto; Ina, Hiroshi; Miura, Kenichi
1994-12-31
This paper describes the implementation of two high performance linear equation solvers developed for the Fujitsu VPP500, a distributed memory parallel supercomputer system. The solvers take advantage of the key architectural features of the VPP500: (1) scalability for an arbitrary number of processors up to 222 processors, (2) flexible data transfer among processors provided by a crossbar interconnection network, (3) vector processing capability on each processor, and (4) overlapped computation and transfer. The general linear equation solver based on the blocked LU decomposition method achieves 120.0 GFLOPS performance with 100 processors in the LINPACK Highly Parallel Computing benchmark.
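The structure of a blocked (right-looking) LU factorization, the algorithm family named above, can be sketched as follows. This is a generic textbook version without pivoting, not Fujitsu's implementation; the test matrix is made diagonally dominant so that pivoting is unnecessary.

```python
import numpy as np

def blocked_lu(a, nb=2):
    """Factor a in place so that a = L @ U (unit lower-triangular L)."""
    a = a.copy().astype(float)
    n = a.shape[0]
    for s in range(0, n, nb):
        e = min(s + nb, n)
        # 1) Unblocked LU of the panel A[s:, s:e].
        for j in range(s, e):
            a[j+1:, j] /= a[j, j]
            a[j+1:, j+1:e] -= np.outer(a[j+1:, j], a[j, j+1:e])
        # 2) Forward substitution for the block row U12 = inv(L11) @ A12.
        for i in range(s, e):
            a[i, e:] -= a[i, s:i] @ a[s:i, e:]
        # 3) Rank-nb update of the trailing submatrix: the BLAS-3 kernel
        #    that vector processors execute at near-peak rates.
        a[e:, e:] -= a[e:, s:e] @ a[s:e, e:]
    return a

rng = np.random.default_rng(0)
n = 8
m = rng.random((n, n)) + n * np.eye(n)   # diagonally dominant: no pivoting
f = blocked_lu(m, nb=3)
l = np.tril(f, -1) + np.eye(n)
u = np.triu(f)
err = np.max(np.abs(l @ u - m))
```

The trailing update in step 3 dominates the flop count, which is what allows a machine combining per-node vector units with a fast crossbar to sustain high aggregate performance.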
Lightweight Specifications for Parallel Correctness
2012-12-05
Hierarchically Parallelized Constrained Nonlinear Solvers with Automated Substructuring
NASA Technical Reports Server (NTRS)
Padovan, Joe; Kwang, Abel
1994-01-01
This paper develops a parallelizable multilevel multiple-constrained nonlinear equation solver. The substructuring process is automated to yield appropriately balanced partitioning of each succeeding level. Due to the generality of the procedure, sequential as well as partially and fully parallel environments can be handled. This includes both single and multiprocessor assignment per individual partition. Several benchmark examples are presented. These illustrate the robustness of the procedure as well as its capability to yield significant reductions in memory utilization and calculational effort, due both to updating and inversion.
Interactive visual optimization and analysis for RFID benchmarking.
Wu, Yingcai; Chung, Ka-Kei; Qu, Huamin; Yuan, Xiaoru; Cheung, S C
2009-01-01
Radio frequency identification (RFID) is a powerful automatic remote identification technique that has wide applications. To facilitate RFID deployment, an RFID benchmarking instrument called aGate has been invented to identify the strengths and weaknesses of different RFID technologies in various environments. However, the data acquired by aGate are usually complex time varying multidimensional 3D volumetric data, which are extremely challenging for engineers to analyze. In this paper, we introduce a set of visualization techniques, namely, parallel coordinate plots, orientation plots, a visual history mechanism, and a 3D spatial viewer, to help RFID engineers analyze benchmark data visually and intuitively. With the techniques, we further introduce two workflow procedures (a visual optimization procedure for finding the optimum reader antenna configuration and a visual analysis procedure for comparing the performance and identifying the flaws of RFID devices) for the RFID benchmarking, with focus on the performance analysis of the aGate system. The usefulness and usability of the system are demonstrated in the user evaluation.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Cohen, J; Dossa, D; Gokhale, M
Critical data science applications requiring frequent access to storage perform poorly on today's computing architectures. This project addresses efficient computation of data-intensive problems in national security and basic science by exploring, advancing, and applying a new form of computing called storage-intensive supercomputing (SISC). Our goal is to enable applications that simply cannot run on current systems, and, for a broad range of data-intensive problems, to deliver an order of magnitude improvement in price/performance over today's data-intensive architectures. This technical report documents much of the work done under LDRD 07-ERD-063 Storage Intensive Supercomputing during the period 05/07-09/07. The following chapters describe: (1) a new file I/O monitoring tool iotrace developed to capture the dynamic I/O profiles of Linux processes; (2) an out-of-core graph benchmark for level-set expansion of scale-free graphs; (3) an entity extraction benchmark consisting of a pipeline of eight components; and (4) an image resampling benchmark drawn from the SWarp program in the LSST data processing pipeline. The performance of the graph and entity extraction benchmarks was measured in three different scenarios: data sets residing on the NFS file server and accessed over the network; data sets stored on local disk; and data sets stored on the Fusion I/O parallel NAND Flash array. The image resampling benchmark compared performance of software-only to GPU-accelerated implementations. In addition to the work reported here, an additional text processing application was developed that used an FPGA to accelerate n-gram profiling for language classification. The n-gram application will be presented at SC07 at the High Performance Reconfigurable Computing Technologies and Applications Workshop. The graph and entity extraction benchmarks were run on a Supermicro server housing the NAND Flash 40GB parallel disk array, the Fusion-io.
The Fusion system specs are as follows: SuperMicro X7DBE Xeon dual-socket Blackford server motherboard; 2 Intel Xeon dual-core 2.66 GHz processors; 1 GB DDR2 PC2-5300 RAM (2 x 512); 80 GB hard drive (Seagate SATA II Barracuda). The Fusion board is presently capable of 4X operation in a PCIe slot. The image resampling benchmark was run on a dual-Xeon workstation with an NVIDIA graphics card (see Chapter 5 for the full specification). An XtremeData Opteron+FPGA was used for the language classification application. We observed that these benchmarks are not uniformly I/O intensive. The only benchmark that spent greater than 50% of its time in I/O was the graph algorithm when it accessed data files over NFS. When local disk was used, the graph benchmark spent at most 40% of its time in I/O. The other benchmarks were CPU dominated. The image resampling and language classification benchmarks showed order-of-magnitude speedups over software by using co-processor technology to offload the CPU-intensive kernels. Our experiments to date suggest that emerging hardware technologies offer significant benefit in boosting the performance of data-intensive algorithms. Using GPU and FPGA co-processors, we were able to improve performance by more than an order of magnitude on the benchmark algorithms, eliminating the processor bottleneck of CPU-bound tasks. Experiments with a prototype solid-state nonvolatile memory available today show 10X better throughput on random reads than disk, with a 2X speedup on a graph processing benchmark when compared to the use of local SATA disk.
Natural Assurance Scheme: A level playing field framework for Green-Grey infrastructure development.
Denjean, Benjamin; Altamirano, Mónica A; Graveline, Nina; Giordano, Raffaele; van der Keur, Peter; Moncoulon, David; Weinberg, Josh; Máñez Costa, María; Kozinc, Zdravko; Mulligan, Mark; Pengal, Polona; Matthews, John; van Cauwenbergh, Nora; López Gunn, Elena; Bresch, David N
2017-11-01
This paper proposes a conceptual framework to systematize the use of Nature-based Solutions (NBS) by integrating their resilience potential into a Natural Assurance Scheme (NAS), focusing on insurance value as a cornerstone for both awareness-raising and valuation. As such, one of its core goals is to align research and pilot projects with infrastructure development constraints and priorities. Under NAS, the integrated contribution of natural infrastructure to Disaster Risk Reduction is valued in the context of an identified growing need for climate-robust infrastructure. The potential benefits and trade-offs of NAS are explored through the alternative lens of Disaster Resilience Enhancement (DRE). Such a system requires a joint effort of specific knowledge transfer from research groups and stakeholders to potential future NAS developers and investors. We therefore match the knowledge gaps with operational stages of the development of NAS from a project designer's perspective. We start by highlighting the key role of the insurance industry in incentivizing and assessing disaster and slow-onset resilience enhancement strategies. In parallel, we place the public sector as a potential kick-starter of DRE initiatives through the existing initiatives and constraints of infrastructure procurement. Under this perspective the paper explores the required alignment of integrated water resources planning and public investment systems. Ultimately this will provide the possibility for both planners and investors to design no-regret NBS and mixed Grey-Green infrastructure systems. As resources and constraints differ widely between infrastructure development contexts, the framework does not provide explicit methodological choices but presents the current limits of knowledge and know-how.
In conclusion, the paper underlines the potential of NAS to ease the global water infrastructure gap by stressing the advantages of investment in the protection, enhancement and restoration of natural capital as an effective climate change adaptation investment. Copyright © 2017. Published by Elsevier Inc.
Use Computer-Aided Tools to Parallelize Large CFD Applications
NASA Technical Reports Server (NTRS)
Jin, H.; Frumkin, M.; Yan, J.
2000-01-01
Porting applications to high performance parallel computers is always a challenging task. It is time consuming and costly. With rapid progress in hardware architectures and increasing complexity of real applications in recent years, the problem becomes even more severe. Today, scalability and high performance mostly require handwritten parallel programs using message-passing libraries (e.g., MPI). However, this process is very difficult and often error-prone. The recent reemergence of shared memory parallel (SMP) architectures, such as the cache coherent Non-Uniform Memory Access (ccNUMA) architecture used in the SGI Origin 2000, shows good prospects for scaling beyond hundreds of processors. Programming on an SMP is simplified by working in a globally accessible address space. The user can supply compiler directives, such as OpenMP, to parallelize the code. As an industry standard for portable implementation of parallel programs for SMPs, OpenMP is a set of compiler directives and callable runtime library routines that extend Fortran, C and C++ to express shared memory parallelism. It promises an incremental path for parallel conversion of existing software, as well as scalability and performance for a complete rewrite or an entirely new development. Perhaps the main disadvantage of programming with directives is that inserted directives may not necessarily enhance performance; in the worst cases, they can create erroneous results. While vendors have provided tools to perform error-checking and profiling, automation in directive insertion is very limited and often fails on large programs, primarily due to the lack of a thorough enough data dependence analysis. To overcome this deficiency, we have developed a toolkit, CAPO, to automatically insert OpenMP directives in Fortran programs and apply certain degrees of optimization.
CAPO is aimed at taking advantage of the detailed interprocedural dependence analysis provided by CAPTools, developed at the University of Greenwich, to reduce potential errors made by users. Earlier tests on the NAS Benchmarks and ARC3D have demonstrated the good success of this tool. In this study, we have applied CAPO to parallelize three large applications in the area of computational fluid dynamics (CFD): OVERFLOW, TLNS3D and INS3D. These codes are widely used for solving Navier-Stokes equations with complicated boundary conditions and turbulence models in multiple zones. Each comprises from 50K to 100K lines of FORTRAN 77. As an example, CAPO took 77 hours to complete the data dependence analysis of OVERFLOW on a workstation (SGI, 175 MHz R10K processor). A fair amount of effort was spent on correcting false dependencies due to a lack of necessary knowledge during the analysis. Even so, CAPO provides an easy way for the user to interact with the parallelization process, and the OpenMP version was generated within a day after the analysis was completed. Due to the sequential algorithms involved, code sections in TLNS3D and INS3D needed to be restructured by hand to produce more efficient parallel codes. An included figure shows preliminary test results of the generated OVERFLOW with several test cases in a single zone. The MPI data points for the small test case were taken from a hand-coded MPI version. As we can see, CAPO's version achieved an 18-fold speedup on 32 nodes of the SGI O2K; for the small test case, it outperformed the MPI version. These results are very encouraging, but further work is needed. For example, although CAPO attempts to place directives on the outermost parallel loops in an interprocedural framework, it does not insert directives based on the best manual strategy. In particular, it lacks support for parallelization at the multi-zone level.
Future work will emphasize the development of a methodology to work at the multi-zone level and with a hybrid approach. Development of tools to perform more complicated code transformations is also needed.
Spherical Harmonic Solutions to the 3D Kobayashi Benchmark Suite
DOE Office of Scientific and Technical Information (OSTI.GOV)
Brown, P.N.; Chang, B.; Hanebutte, U.R.
1999-12-29
Spherical harmonic solutions of order 5, 9 and 21 on spatial grids containing up to 3.3 million cells are presented for the Kobayashi benchmark suite. This suite of three problems with simple geometry of a pure absorber with a large void region was proposed by Professor Kobayashi at an OECD/NEA meeting in 1996. Each of the three problems contains a source, a void and a shield region. Problem 1 can best be described as a box-in-a-box problem, where a source region is surrounded by a square void region which itself is embedded in a square shield region. Problems 2 and 3 represent a shield with a void duct: Problem 2 has a straight duct and Problem 3 a dog-leg-shaped duct. A pure absorber and a 50% scattering case are considered for each of the three problems. The solutions have been obtained with Ardra, a scalable, parallel neutron transport code developed at Lawrence Livermore National Laboratory (LLNL). The Ardra code takes advantage of a two-level parallelization strategy, which combines message passing between processing nodes with thread-based parallelism among the processors on each node. All calculations were performed on the IBM ASCI Blue-Pacific computer at LLNL.
A Parallel Particle Swarm Optimization Algorithm Accelerated by Asynchronous Evaluations
NASA Technical Reports Server (NTRS)
Venter, Gerhard; Sobieszczanski-Sobieski, Jaroslaw
2005-01-01
A parallel Particle Swarm Optimization (PSO) algorithm is presented. Particle swarm optimization is a fairly recent addition to the family of non-gradient based, probabilistic search algorithms that is based on a simplified social model and is closely tied to swarming theory. Although PSO algorithms present several attractive properties to the designer, they are plagued by high computational cost as measured by elapsed time. One approach to reduce the elapsed time is to make use of coarse-grained parallelization to evaluate the design points. Previous parallel PSO algorithms were mostly implemented in a synchronous manner, where all design points within a design iteration are evaluated before the next iteration is started. This approach leads to poor parallel speedup in cases where a heterogeneous parallel environment is used and/or where the analysis time depends on the design point being analyzed. This paper introduces an asynchronous parallel PSO algorithm that greatly improves the parallel efficiency. The asynchronous algorithm is benchmarked on a cluster assembled from Apple Macintosh G5 desktop computers, using the multi-disciplinary optimization of a typical transport aircraft wing as an example.
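The asynchronous idea can be sketched with a master-worker loop in which each particle is updated and resubmitted as soon as its own evaluation returns, rather than waiting for the whole swarm. All parameter values and the toy objective below are illustrative assumptions, not the paper's configuration.

```python
import random
from concurrent.futures import ThreadPoolExecutor, wait, FIRST_COMPLETED

def sphere(x):                      # toy objective; minimum 0 at the origin
    return sum(v * v for v in x)

def async_pso(f, dim=2, particles=8, evals=400, w=0.5, c1=1.5, c2=1.5):
    rng = random.Random(42)
    pos = [[rng.uniform(-5, 5) for _ in range(dim)] for _ in range(particles)]
    vel = [[0.0] * dim for _ in range(particles)]
    best = [p[:] for p in pos]
    best_f = [float("inf")] * particles
    gbest, gbest_f = pos[0][:], float("inf")
    with ThreadPoolExecutor(max_workers=4) as ex:
        pending = {ex.submit(f, pos[i]): i for i in range(particles)}
        done = 0
        while done < evals and pending:
            finished, _ = wait(list(pending), return_when=FIRST_COMPLETED)
            for fut in finished:
                i = pending.pop(fut)
                fi = fut.result()
                done += 1
                if fi < best_f[i]:
                    best_f[i], best[i] = fi, pos[i][:]
                if fi < gbest_f:
                    gbest_f, gbest = fi, pos[i][:]
                # Update this particle immediately using the current gbest,
                # then resubmit it without waiting for the rest of the swarm.
                for d in range(dim):
                    vel[i][d] = (w * vel[i][d]
                                 + c1 * rng.random() * (best[i][d] - pos[i][d])
                                 + c2 * rng.random() * (gbest[d] - pos[i][d]))
                    pos[i][d] += vel[i][d]
                if done < evals:
                    pending[ex.submit(f, pos[i])] = i
    return gbest_f

result = async_pso(sphere)
```

Because no particle ever waits on a slower evaluation elsewhere in the swarm, heterogeneous workers or design-point-dependent analysis times no longer idle the pool, which is the source of the improved parallel efficiency.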
MPF: A portable message passing facility for shared memory multiprocessors
NASA Technical Reports Server (NTRS)
Malony, Allen D.; Reed, Daniel A.; Mcguire, Patrick J.
1987-01-01
The design, implementation, and performance evaluation of a message passing facility (MPF) for shared memory multiprocessors are presented. The MPF is based on a message passing model conceptually similar to conversations. Participants (parallel processors) can enter or leave a conversation at any time. The message passing primitives for this model are implemented as a portable library of C function calls. The MPF is currently operational on a Sequent Balance 21000, and several parallel applications were developed and tested. Several simple benchmark programs are presented to establish interprocess communication performance for common patterns of interprocess communication. Finally, performance figures are presented for two parallel applications: linear system solution and iterative solution of partial differential equations.
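A toy sketch of the "conversation" model on shared memory follows; the class name and semantics are assumptions inspired by the abstract, not the MPF C API. Participants may enter or leave at any time, and a message is delivered to everyone currently enrolled.

```python
import queue
import threading

class Conversation:
    """Minimal conversation-style message passing over shared memory."""

    def __init__(self):
        self._lock = threading.Lock()
        self._members = {}                  # participant id -> mailbox

    def enter(self, pid):
        with self._lock:
            self._members[pid] = queue.Queue()

    def leave(self, pid):
        with self._lock:
            self._members.pop(pid, None)

    def send(self, sender, msg):
        # Deliver to every participant currently in the conversation.
        with self._lock:
            for pid, box in self._members.items():
                if pid != sender:
                    box.put((sender, msg))

    def receive(self, pid, timeout=None):
        with self._lock:
            box = self._members[pid]
        return box.get(timeout=timeout)

conv = Conversation()
conv.enter("a")
conv.enter("b")
conv.send("a", "hello")
sender, msg = conv.receive("b")
```

On a shared memory machine the mailboxes are ordinary in-memory queues, which is what makes such a facility cheap to implement portably as a library.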
Performance Evaluation of Remote Memory Access (RMA) Programming on Shared Memory Parallel Computers
NASA Technical Reports Server (NTRS)
Jin, Hao-Qiang; Jost, Gabriele; Biegel, Bryan A. (Technical Monitor)
2002-01-01
The purpose of this study is to evaluate the feasibility of remote memory access (RMA) programming on shared memory parallel computers. We discuss different RMA based implementations of selected CFD application benchmark kernels and compare them to corresponding message passing based codes. For the message-passing implementation we use MPI point-to-point and global communication routines. For the RMA based approach we consider two different libraries supporting this programming model. One is a shared memory parallelization library (SMPlib) developed at NASA Ames, the other is the MPI-2 extensions to the MPI Standard. We give timing comparisons for the different implementation strategies and discuss the performance.
Massively parallel implementation of 3D-RISM calculation with volumetric 3D-FFT.
Maruyama, Yutaka; Yoshida, Norio; Tadano, Hiroto; Takahashi, Daisuke; Sato, Mitsuhisa; Hirata, Fumio
2014-07-05
A new three-dimensional reference interaction site model (3D-RISM) program for massively parallel machines, combined with the volumetric 3D fast Fourier transform (3D-FFT), was developed and tested on the RIKEN K supercomputer. The ordinary parallel 3D-RISM program has a limitation on the degree of parallelization because of the limitations of the slab-type 3D-FFT; the volumetric 3D-FFT relieves this limitation drastically. We tested the 3D-RISM calculation on a large, fine calculation cell (2048³ grid points) on 16,384 nodes, each having eight CPU cores. The new 3D-RISM program achieved excellent parallel scalability on the RIKEN K supercomputer. As a benchmark application, we employed the program, combined with molecular dynamics simulation, to analyze the oligomerization process of a chymotrypsin inhibitor 2 mutant. The results demonstrate that the massively parallel 3D-RISM program is effective for analyzing the hydration properties of large biomolecular systems. Copyright © 2014 Wiley Periodicals, Inc.
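The slab limitation mentioned above is quantitative: a slab (one-dimensional) decomposition of an N³ grid assigns at most one plane per process, capping concurrency at N processes, whereas pencil and volumetric decompositions raise that cap. The helper below is a back-of-the-envelope illustration, not the paper's code, and the "pencil" bound is a conservative stand-in for the volumetric scheme.

```python
def max_processes(n, decomposition):
    """Rough upper bound on processes for an n^3 FFT grid."""
    if decomposition == "slab":      # at most one xy-plane per process
        return n
    if decomposition == "pencil":    # at most one x-line per process
        return n * n
    raise ValueError(decomposition)

slab_cap = max_processes(2048, "slab")
pencil_cap = max_processes(2048, "pencil")
cores_used = 16384 * 8               # nodes x cores in the reported run
```

The reported run's 131,072 cores exceed the slab cap of 2,048 but fit comfortably under the pencil-style bound, which is why relieving the slab limitation was essential at this scale.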
Performance Analysis of Multilevel Parallel Applications on Shared Memory Architectures
NASA Technical Reports Server (NTRS)
Jost, Gabriele; Jin, Haoqiang; Labarta, Jesus; Gimenez, Judit; Caubet, Jordi; Biegel, Bryan A. (Technical Monitor)
2002-01-01
In this paper we describe how to apply powerful performance analysis techniques to understand the behavior of multilevel parallel applications. We use the Paraver/OMPItrace performance analysis system for our study. This system consists of two major components: the OMPItrace dynamic instrumentation mechanism, which allows the tracing of processes and threads, and the Paraver graphical user interface for inspection and analysis of the generated traces. We describe how to use the system to conduct a detailed comparative study of a benchmark code implemented in five different programming paradigms applicable for shared memory architectures.
Supercomputing 2002: NAS Demo Abstracts
NASA Technical Reports Server (NTRS)
Parks, John (Technical Monitor)
2002-01-01
The hyperwall is a new concept in visual supercomputing, conceived and developed by the NAS Exploratory Computing Group. The hyperwall will allow simultaneous and coordinated visualization and interaction of an array of processes, such as the computations of a parameter study or the parallel evolutions of a genetic algorithm population. Making over 65 million pixels available to the user, the hyperwall will enable and elicit qualitatively new ways of leveraging computers to accomplish science. It is currently still unclear whether we will be able to transport the hyperwall to SC02. The crucial display frame still has not been completed by the metal fabrication shop, although they promised an August delivery. Also, we are still working the fragile node issue, which may require transplantation of the compute nodes from the present 2U cases into 3U cases. This modification will increase the present 3-rack configuration to 5 racks.
Benchmarking Memory Performance with the Data Cube Operator
NASA Technical Reports Server (NTRS)
Frumkin, Michael A.; Shabanov, Leonid V.
2004-01-01
Data movement across a computer memory hierarchy and across computational grids is known to be a limiting factor for applications processing large data sets. We use the Data Cube Operator on an Arithmetic Data Set, called ADC, to benchmark the capabilities of computers and of computational grids to handle large distributed data sets. We present a prototype implementation of a parallel algorithm for computation of the operator. The algorithm follows a known approach for computing views from the smallest parent. The ADC stresses all levels of grid memory and storage by producing some of the 2^d views of an Arithmetic Data Set of d-tuples described by a small number of integers. We control the data intensity of the ADC by selecting the tuple parameters, the sizes of the views, and the number of realized views. Benchmarking results of memory performance for a number of computer architectures and for a small computational grid are presented.
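The "smallest parent" strategy can be sketched in a few lines (a serial toy with hypothetical names, not the benchmark's implementation): each view over a subset of attributes is aggregated from the smallest already-computed view over a superset of those attributes, rather than from the raw tuples.

```python
from itertools import combinations
from collections import Counter

def data_cube(tuples, attrs):
    """Return {attribute-index subset: Counter of projected tuples}."""
    full = tuple(range(len(attrs)))
    views = {full: Counter(tuples)}
    # Enumerate subsets from largest to smallest so parents exist first.
    for size in range(len(attrs) - 1, -1, -1):
        for subset in combinations(full, size):
            parents = [v for v in views if set(subset) < set(v)]
            parent = min(parents, key=lambda v: len(views[v]))  # smallest
            idx = [parent.index(a) for a in subset]
            view = Counter()
            for row, count in views[parent].items():
                view[tuple(row[i] for i in idx)] += count
            views[subset] = view
    return views

rows = [("a", 1, "x"), ("a", 1, "y"), ("b", 2, "x"), ("a", 2, "x")]
cube = data_cube(rows, ["p", "q", "r"])
```

Choosing the smallest parent minimizes the data read per view, so the working set, and hence the stress on each level of the memory hierarchy, is governed by the view sizes, exactly the knobs the ADC exposes.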
Parallel ALLSPD-3D: Speeding Up Combustor Analysis Via Parallel Processing
NASA Technical Reports Server (NTRS)
Fricker, David M.
1997-01-01
The ALLSPD-3D Computational Fluid Dynamics code for reacting flow simulation was run on a set of benchmark test cases to determine its parallel efficiency. These test cases included non-reacting and reacting flow simulations with varying numbers of processors. The tests also explored the effects of scaling the simulation with the number of processors, in addition to distributing a constant-size problem over an increasing number of processors. The test cases were run on a cluster of IBM RS/6000 Model 590 workstations with Ethernet and ATM networking, plus a shared memory SGI Power Challenge L workstation. The results indicate that the network capabilities significantly influence the parallel efficiency, i.e., a shared memory machine is fastest and ATM networking provides acceptable performance. The limitations of Ethernet greatly hamper the rapid calculation of flows using ALLSPD-3D.
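For reference, parallel efficiency as used in such studies is simply speedup per processor; the timings below are made-up numbers for illustration, not ALLSPD-3D measurements.

```python
def speedup(t_serial, t_parallel):
    """Classic strong-scaling speedup S = T1 / Tp."""
    return t_serial / t_parallel

def efficiency(t_serial, t_parallel, procs):
    """Parallel efficiency E = S / p; 1.0 means ideal scaling."""
    return speedup(t_serial, t_parallel) / procs

s = speedup(100.0, 30.0)          # hypothetical 1-process vs. 4-process time
e = efficiency(100.0, 30.0, 4)
```

Slower interconnects inflate t_parallel through communication time, which is why the same code shows different efficiencies on shared memory, ATM, and Ethernet.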
Efficient parallelization of analytic bond-order potentials for large-scale atomistic simulations
NASA Astrophysics Data System (ADS)
Teijeiro, C.; Hammerschmidt, T.; Drautz, R.; Sutmann, G.
2016-07-01
Analytic bond-order potentials (BOPs) provide a way to compute atomistic properties with controllable accuracy. For large-scale computations of heterogeneous compounds at the atomistic level, both the computational efficiency and memory demand of BOP implementations have to be optimized. Since the evaluation of BOPs is a local operation within a finite environment, the parallelization concepts known from short-range interacting particle simulations can be applied to improve the performance of these simulations. In this work, several efficient parallelization methods for BOPs that use three-dimensional domain decomposition schemes are described. The schemes are implemented into the bond-order potential code BOPfox, and their performance is measured in a series of benchmarks. Systems of up to several millions of atoms are simulated on a high performance computing system, and parallel scaling is demonstrated for up to thousands of processors.
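The short-range locality that makes such domain decomposition effective is commonly exploited with a linked-cell scheme: binning atoms into cells no smaller than the interaction cutoff restricts the partner search to the 27 neighboring cells. The sketch below (serial, periodic box, hypothetical parameters) illustrates that general scheme, not BOPfox's implementation.

```python
import random
from itertools import product

def min_image_dist2(a, b, box):
    """Squared distance under the periodic minimum-image convention."""
    s = 0.0
    for x, y in zip(a, b):
        d = abs(x - y)
        d = min(d, box - d)
        s += d * d
    return s

def neighbor_pairs(positions, box, cutoff):
    """All index pairs closer than `cutoff`, found via linked cells."""
    ncell = max(1, int(box // cutoff))       # cell edge >= cutoff
    size = box / ncell
    cells = {}
    for idx, p in enumerate(positions):
        key = tuple(int(c // size) % ncell for c in p)
        cells.setdefault(key, []).append(idx)
    pairs = set()
    for key, members in cells.items():
        for off in product((-1, 0, 1), repeat=3):    # 27 neighbor cells
            nkey = tuple((k + o) % ncell for k, o in zip(key, off))
            for i in members:
                for j in cells.get(nkey, ()):
                    if i < j and min_image_dist2(
                            positions[i], positions[j], box) < cutoff ** 2:
                        pairs.add((i, j))
    return pairs

random.seed(1)
box, cutoff = 10.0, 2.5
atoms = [tuple(random.uniform(0, box) for _ in range(3)) for _ in range(40)]
found = neighbor_pairs(atoms, box, cutoff)
```

A 3D domain decomposition assigns contiguous blocks of cells to processes, so each process only needs ghost copies of atoms within one cutoff of its boundary, which is what keeps both communication volume and memory demand bounded as the system grows.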
NASA Technical Reports Server (NTRS)
Shay, Rick; Swieringa, Kurt A.; Baxley, Brian T.
2012-01-01
Flight deck based Interval Management (FIM) applications using ADS-B are being developed to improve both the safety and capacity of the National Airspace System (NAS). FIM is expected to improve the safety and efficiency of the NAS by giving pilots the technology and procedures to precisely achieve an interval behind the preceding aircraft by a specific point. Concurrently but independently, Optimized Profile Descents (OPD) are being developed to help reduce fuel consumption and noise; however, the range of speeds available when flying an OPD results in a decrease in the delivery precision of aircraft to the runway. This requires the addition of a spacing buffer between aircraft, reducing system throughput. FIM addresses this problem by providing pilots with speed guidance to achieve a precise interval behind another aircraft, even while flying optimized descents. The Interval Management with Spacing to Parallel Dependent Runways (IMSPiDR) human-in-the-loop experiment employed 24 commercial pilots to explore the use of FIM equipment to conduct spacing operations behind two aircraft arriving to parallel runways, while flying an OPD during high-density operations. This paper describes the impact of variations in pilot operations; in particular configuring the aircraft, their compliance with FIM operating procedures, and their response to changes of the FIM speed. An example of the displayed FIM speeds used incorrectly by a pilot is also discussed. Finally, this paper examines the relationship between achieving airline operational goals for individual aircraft and the need for ATC to deliver aircraft to the runway with greater precision. The results show that aircraft can fly an OPD and conduct FIM operations to dependent parallel runways, enabling operational goals to be achieved efficiently while maintaining system throughput.
Munguia, Lluis-Miquel; Oxberry, Geoffrey; Rajan, Deepak
2016-05-01
Stochastic mixed-integer programs (SMIPs) deal with optimization under uncertainty at many levels of the decision-making process. When solved as extensive-formulation mixed-integer programs, problem instances can exceed available memory on a single workstation. In order to overcome this limitation, we present PIPS-SBB: a distributed-memory parallel stochastic MIP solver that takes advantage of parallelism at multiple levels of the optimization process. We also show promising results on the SIPLIB benchmark by combining methods known for accelerating Branch and Bound (B&B) methods with new ideas that leverage the structure of SMIPs. Finally, we expect the performance of PIPS-SBB to improve further as more functionality is added in the future.
Gust Acoustics Computation with a Space-Time CE/SE Parallel 3D Solver
NASA Technical Reports Server (NTRS)
Wang, X. Y.; Himansu, A.; Chang, S. C.; Jorgenson, P. C. E.; Reddy, D. R. (Technical Monitor)
2002-01-01
The benchmark Problem 2 in Category 3 of the Third Computational Aero-Acoustics (CAA) Workshop is solved using the space-time conservation element and solution element (CE/SE) method. This problem concerns the unsteady response of an isolated finite-span swept flat-plate airfoil bounded by two parallel walls to an incident gust. The acoustic field generated by the interaction of the gust with the flat-plate airfoil is computed by solving the 3D (three-dimensional) Euler equations in the time domain using a parallel version of a 3D CE/SE solver. The effect of the gust orientation on the far-field directivity is studied. Numerical solutions are presented and compared with analytical solutions, showing a reasonable agreement.
Maximal clique enumeration with data-parallel primitives
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lessley, Brenton; Perciano, Talita; Mathai, Manish
The enumeration of all maximal cliques in an undirected graph is a fundamental problem arising in several research areas. We consider maximal clique enumeration on shared-memory, multi-core architectures and introduce an approach consisting entirely of data-parallel operations, in an effort to achieve efficient and portable performance across different architectures. We study the performance of the algorithm via experiments varying over benchmark graphs and architectures. Overall, we observe that our algorithm achieves up to a 33-fold speedup and a 9-fold speedup over state-of-the-art distributed and serial algorithms, respectively, for graphs with higher ratios of maximal cliques to total cliques. Further, we attain additional speedups on a GPU architecture, demonstrating the portable performance of our data-parallel design.
Debugging and Analysis of Large-Scale Parallel Programs
1989-09-01
Code Parallelization with CAPO: A User Manual
NASA Technical Reports Server (NTRS)
Jin, Hao-Qiang; Frumkin, Michael; Yan, Jerry; Biegel, Bryan (Technical Monitor)
2001-01-01
A software tool has been developed to assist the parallelization of scientific codes. This tool, CAPO, extends an existing parallelization toolkit, CAPTools, developed at the University of Greenwich, to generate OpenMP parallel codes for shared memory architectures. It is an interactive toolkit that transforms a serial Fortran application code into an equivalent parallel version of the software, in a small fraction of the time normally required for a manual parallelization. We first discuss the way in which loop types are categorized and how efficient OpenMP directives can be defined and inserted into the existing code using the in-depth interprocedural analysis. The use of the toolkit on a number of application codes, ranging from benchmarks to real-world applications, is presented. This demonstrates the great potential of using the toolkit to quickly parallelize serial programs, as well as the good performance achievable on a large number of processors. The second part of the document gives references to the parameters and the graphic user interface implemented in the toolkit. Finally, a set of tutorials is included for hands-on experience with this toolkit.
The serotonin-N-acetylserotonin–melatonin pathway as a biomarker for autism spectrum disorders
Pagan, C; Delorme, R; Callebert, J; Goubran-Botros, H; Amsellem, F; Drouot, X; Boudebesse, C; Le Dudal, K; Ngo-Nguyen, N; Laouamri, H; Gillberg, C; Leboyer, M; Bourgeron, T; Launay, J-M
2014-01-01
Elevated whole-blood serotonin and decreased plasma melatonin (a circadian synchronizer hormone that derives from serotonin) have been reported independently in patients with autism spectrum disorders (ASDs). Here, we explored, in parallel, serotonin, melatonin and the intermediate N-acetylserotonin (NAS) in a large cohort of patients with ASD and their relatives. We then investigated the clinical correlates of these biochemical parameters. Whole-blood serotonin, platelet NAS and plasma melatonin were assessed in 278 patients with ASD, their 506 first-degree relatives (129 unaffected siblings, 199 mothers and 178 fathers) and 416 sex- and age-matched controls. We confirmed the previously reported hyperserotonemia in ASD (40% (35–46%) of patients), as well as the deficit in melatonin (51% (45–57%)), taking as a threshold the 95th or 5th percentile of the control group, respectively. In addition, this study reveals an increase of NAS (47% (41–54%) of patients) in platelets, pointing to a disruption of the serotonin-NAS–melatonin pathway in ASD. Biochemical impairments were also observed in the first-degree relatives of patients. A score combining impairments of serotonin, NAS and melatonin distinguished between patients and controls with a sensitivity of 80% and a specificity of 85%. In patients the melatonin deficit was only significantly associated with insomnia. Impairments of melatonin synthesis in ASD may be linked with decreased 14-3-3 proteins. Although ASDs are highly heterogeneous, disruption of the serotonin-NAS–melatonin pathway is a very frequent trait in patients and may represent a useful biomarker for a large subgroup of individuals with ASD. PMID:25386956
NASA Astrophysics Data System (ADS)
Quan, Zhe; Wu, Lei
2017-09-01
This article investigates the use of parallel computing for solving the disjunctively constrained knapsack problem. The proposed parallel computing model can be viewed as a cooperative algorithm based on a multi-neighbourhood search. The cooperation system is composed of a team manager and a crowd of team members. The team members aim at applying their own search strategies to explore the solution space. The team manager collects the solutions from the members and shares the best one with them. The performance of the proposed method is evaluated on a group of benchmark data sets. The results obtained are compared to those reached by the best methods from the literature. The results show that the proposed method is able to provide the best solutions in most cases. In order to highlight the robustness of the proposed parallel computing model, a new set of large-scale instances is introduced. Encouraging results have been obtained.
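The manager/member cooperation described above can be sketched in a few lines. The real neighbourhood-search strategies are not given in the abstract, so the sketch below substitutes a random one-flip local search over a toy disjunctively constrained knapsack (items in a `conflicts` pair cannot both be packed); the manager collects the members' solutions and re-shares the best incumbent. All names and the instance are illustrative.

```python
# Hedged sketch of a cooperative manager/member search for a toy
# disjunctively constrained knapsack. Not the authors' implementation.
import random

def value(sol, profit):
    return sum(p for i, p in enumerate(profit) if sol[i])

def feasible(sol, weight, cap, conflicts):
    if sum(w for i, w in enumerate(weight) if sol[i]) > cap:
        return False
    return not any(sol[a] and sol[b] for a, b in conflicts)

def member_search(start, profit, weight, cap, conflicts, steps=200):
    """One team member: random single-item flips, keeping improvements."""
    best = start[:]
    for _ in range(steps):
        cand = best[:]
        cand[random.randrange(len(cand))] ^= 1    # flip one item in or out
        if (feasible(cand, weight, cap, conflicts)
                and value(cand, profit) >= value(best, profit)):
            best = cand
    return best

def cooperative_solve(profit, weight, cap, conflicts, members=4, rounds=5):
    incumbent = [0] * len(profit)                 # manager's shared best
    for _ in range(rounds):
        results = [member_search(incumbent, profit, weight, cap, conflicts)
                   for _ in range(members)]       # members explore on their own
        best = max(results, key=lambda s: value(s, profit))
        if value(best, profit) > value(incumbent, profit):
            incumbent = best                      # manager shares improvement
    return incumbent

random.seed(1)
sol = cooperative_solve(profit=[10, 8, 6, 4], weight=[5, 4, 3, 2], cap=9,
                        conflicts=[(0, 1)])
print(sol, value(sol, [10, 8, 6, 4]))
```

In the parallel version the members run concurrently and only exchange solutions through the manager, which is what makes the scheme naturally suited to parallel computing.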
Time-Dependent Simulations of Turbopump Flows
NASA Technical Reports Server (NTRS)
Kiris, Cetin; Kwak, Dochan; Chan, William; Williams, Robert
2002-01-01
Unsteady flow simulations for the RLV (Reusable Launch Vehicle) 2nd Generation baseline turbopump, covering one and a half impeller rotations, have been completed using a 34.3-million-grid-point model. MLP (Multi-Level Parallelism) shared-memory parallelism has been implemented in INS3D and benchmarked. Code optimization for cache-based platforms will be completed by the end of September 2001. Moving boundary capability is obtained by using the DCF module. Scripting capability from CAD (computer-aided design) geometry to solution has been developed. Data compression is applied to reduce data size in post-processing. Fluid/structure coupling has been initiated.
Microscope mode secondary ion mass spectrometry imaging with a Timepix detector.
Kiss, Andras; Jungmann, Julia H; Smith, Donald F; Heeren, Ron M A
2013-01-01
In-vacuum active pixel detectors enable high sensitivity, highly parallel time- and space-resolved detection of ions from complex surfaces. For the first time, a Timepix detector assembly was combined with a secondary ion mass spectrometer for microscope mode secondary ion mass spectrometry (SIMS) imaging. Time resolved images from various benchmark samples demonstrate the imaging capabilities of the detector system. The main advantages of the active pixel detector are the higher signal-to-noise ratio and parallel acquisition of arrival time and position. Microscope mode SIMS imaging of biomolecules is demonstrated from tissue sections with the Timepix detector.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Thrower, A.W.; Patric, J.; Keister, M.
2008-07-01
The purpose of the Office of Civilian Radioactive Waste Management's (OCRWM) Logistics Benchmarking Project is to identify established government and industry practices for the safe transportation of hazardous materials which can serve as a yardstick for design and operation of OCRWM's national transportation system for shipping spent nuclear fuel and high-level radioactive waste to the proposed repository at Yucca Mountain, Nevada. The project will present logistics and transportation practices and develop implementation recommendations for adaptation by the national transportation system. This paper will describe the process used to perform the initial benchmarking study, highlight interim findings, and explain how these findings are being implemented. It will also provide an overview of the next phase of benchmarking studies. The benchmarking effort will remain a high-priority activity throughout the planning and operational phases of the transportation system. The initial phase of the project focused on government transportation programs to identify those practices which are most clearly applicable to OCRWM. These Federal programs have decades of safe transportation experience, strive for excellence in operations, and implement effective stakeholder involvement, all of which parallel OCRWM's transportation mission and vision. The initial benchmarking project focused on four business processes that are critical to OCRWM's mission success, and can be incorporated into OCRWM planning and preparation in the near term. The processes examined were: transportation business model, contract management/out-sourcing, stakeholder relations, and contingency planning. More recently, OCRWM examined logistics operations of AREVA NC's Business Unit Logistics in France. The next phase of benchmarking will focus on integrated domestic and international commercial radioactive logistic operations. The prospective companies represent large-scale shippers and have vast experience in safely and efficiently shipping spent nuclear fuel and other radioactive materials. Additional business processes may be examined in this phase. The findings of these benchmarking efforts will help determine the organizational structure and requirements of the national transportation system. (authors)
Performance effects of irregular communications patterns on massively parallel multiprocessors
NASA Technical Reports Server (NTRS)
Saltz, Joel; Petiton, Serge; Berryman, Harry; Rifkin, Adam
1991-01-01
A detailed study of the performance effects of irregular communications patterns on the CM-2 was conducted. The communications capabilities of the CM-2 were characterized under a variety of controlled conditions. In the process of carrying out the performance evaluation, extensive use was made of a parameterized synthetic mesh. In addition, timings with unstructured meshes generated for aerodynamic codes and a set of sparse matrices with banded patterns of non-zeroes were performed. This benchmarking suite stresses the communications capabilities of the CM-2 in a range of different ways. Benchmark results demonstrate that it is possible to make effective use of much of the massive concurrency available in the communications network.
Benchmarking and performance analysis of the CM-2. [SIMD computer
NASA Technical Reports Server (NTRS)
Myers, David W.; Adams, George B., II
1988-01-01
A suite of benchmarking routines testing communication, basic arithmetic operations, and selected kernel algorithms written in LISP and PARIS was developed for the CM-2. Experiment runs are automated via a software framework that sequences individual tests, allowing for unattended overnight operation. Multiple measurements are made and treated statistically to generate well-characterized results from the noisy values given by cm:time. The results obtained provide a comparison with similar, but less extensive, testing done on a CM-1. Tests were chosen to aid the algorithmist in constructing fast, efficient, and correct code on the CM-2, as well as to gain insight into what performance criteria are needed when evaluating parallel processing machines.
Role of HPC in Advancing Computational Aeroelasticity
NASA Technical Reports Server (NTRS)
Guruswamy, Guru P.
2004-01-01
On behalf of the High Performance Computing and Modernization Program (HPCMP) and NASA Advanced Supercomputing Division (NAS) a study is conducted to assess the role of supercomputers on computational aeroelasticity of aerospace vehicles. The study is mostly based on the responses to a web based questionnaire that was designed to capture the nuances of high performance computational aeroelasticity, particularly on parallel computers. A procedure is presented to assign a fidelity-complexity index to each application. Case studies based on major applications using HPCMP resources are presented.
Execution models for mapping programs onto distributed memory parallel computers
NASA Technical Reports Server (NTRS)
Sussman, Alan
1992-01-01
The problem of exploiting the parallelism available in a program to efficiently employ the resources of the target machine is addressed. The problem is discussed in the context of building a mapping compiler for a distributed memory parallel machine. The paper describes using execution models to drive the process of mapping a program in the most efficient way onto a particular machine. Through analysis of the execution models for several mapping techniques for one class of programs, we show that the selection of the best technique for a particular program instance can make a significant difference in performance. On the other hand, the results of benchmarks from an implementation of a mapping compiler show that our execution models are accurate enough to select the best mapping technique for a given program.
Massively parallel quantum computer simulator
NASA Astrophysics Data System (ADS)
De Raedt, K.; Michielsen, K.; De Raedt, H.; Trieu, B.; Arnold, G.; Richter, M.; Lippert, Th.; Watanabe, H.; Ito, N.
2007-01-01
We describe portable software to simulate universal quantum computers on massively parallel computers. We illustrate the use of the simulation software by running various quantum algorithms on different computer architectures, such as an IBM BlueGene/L, an IBM Regatta p690+, a Hitachi SR11000/J1, a Cray X1E, an SGI Altix 3700 and clusters of PCs running Windows XP. We study the performance of the software by simulating quantum computers containing up to 36 qubits, using up to 4096 processors and up to 1 TB of memory. Our results demonstrate that the simulator exhibits nearly ideal scaling as a function of the number of processors and suggest that the simulation software described in this paper may also serve as a benchmark for testing high-end parallel computers.
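The core operation such a simulator distributes over processors is the state-vector update for a gate: with n qubits the state has 2^n amplitudes, which is why memory, not arithmetic, limits the qubit count. A serial toy version of that update (assumed for illustration; the actual MPI/OpenMP code is not shown in the abstract) looks like this:

```python
# Toy state-vector simulator kernel: apply a 2x2 single-qubit gate to an
# n-qubit register. Illustrative only; no parallel decomposition is shown.
import math

def apply_1q_gate(state, gate, target, n):
    """Apply a 2x2 gate to qubit `target` of an n-qubit state vector."""
    out = state[:]
    step = 1 << target
    for i in range(1 << n):
        if i & step:
            continue                       # visit each amplitude pair once
        a, b = state[i], state[i | step]
        out[i] = gate[0][0] * a + gate[0][1] * b
        out[i | step] = gate[1][0] * a + gate[1][1] * b
    return out

H = [[1 / math.sqrt(2), 1 / math.sqrt(2)],
     [1 / math.sqrt(2), -1 / math.sqrt(2)]]   # Hadamard gate

n = 2
state = [0.0] * (1 << n)
state[0] = 1.0                             # start in |00>
state = apply_1q_gate(state, H, 0, n)
state = apply_1q_gate(state, H, 1, n)      # uniform superposition
print([round(abs(a) ** 2, 3) for a in state])   # -> [0.25, 0.25, 0.25, 0.25]
```

A 36-qubit register needs 2^36 complex amplitudes (about 1 TB in double precision), matching the memory figure quoted above; the parallel simulator partitions this vector across nodes and communicates amplitude pairs when `target` crosses the partition boundary.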
Performance Evaluation and Modeling Techniques for Parallel Processors. Ph.D. Thesis
NASA Technical Reports Server (NTRS)
Dimpsey, Robert Tod
1992-01-01
In practice, the performance evaluation of supercomputers is still substantially driven by single-point estimates of metrics (e.g., MFLOPS) obtained by running characteristic benchmarks or workloads. With the rapid increase in the use of time-shared multiprogramming in these systems, such measurements are clearly inadequate. This is because multiprogramming and system overhead, as well as other degradations in performance due to time-varying characteristics of workloads, are not taken into account. In multiprogrammed environments, multiple jobs and users can dramatically increase the amount of system overhead and degrade the performance of the machine. Performance techniques, such as benchmarking, which characterize performance on a dedicated machine ignore this major component of true computer performance. Due to the complexity of analysis, there has been little work done in analyzing, modeling, and predicting the performance of applications in multiprogrammed environments. This is especially true for parallel processors, where the costs and benefits of multi-user workloads are exacerbated. While some may claim that the issue of multiprogramming is not a viable one in the supercomputer market, experience shows otherwise. Even in recent massively parallel machines, multiprogramming is a key component. It has even been claimed that a partial cause of the demise of the CM-2 was the fact that it did not efficiently support time-sharing. In the same paper, Gordon Bell postulates that multicomputers will evolve to multiprocessors in order to support efficient multiprogramming. Therefore, it is clear that parallel processors of the future will be required to offer the user a time-shared environment with reasonable response times for the applications. In this type of environment, the most important performance metric is the completion or response time of a given application. However, there are few evaluation efforts addressing this issue.
The Metropolis Monte Carlo method with CUDA enabled Graphic Processing Units
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hall, Clifford; School of Physics, Astronomy, and Computational Sciences, George Mason University, 4400 University Dr., Fairfax, VA 22030; Ji, Weixiao
2014-02-01
We present a CPU–GPU system for runtime acceleration of large molecular simulations using GPU computation and memory swaps. The memory architecture of the GPU can be used both as container for simulation data stored on the graphics card and as floating-point code target, providing an effective means for the manipulation of atomistic or molecular data on the GPU. To fully take advantage of this mechanism, efficient GPU realizations of algorithms used to perform atomistic and molecular simulations are essential. Our system implements a versatile molecular engine, including inter-molecule interactions and orientational variables for performing the Metropolis Monte Carlo (MMC) algorithm, which is one type of Markov chain Monte Carlo. By combining memory objects with floating-point code fragments we have implemented an MMC parallel engine that entirely avoids the communication time of molecular data at runtime. Our runtime acceleration system is a forerunner of a new class of CPU–GPU algorithms exploiting memory concepts combined with threading for avoiding bus bandwidth and communication. The testbed molecular system used here is a condensed phase system of oligopyrrole chains. A benchmark shows a size scaling speedup of 60 for systems with 210,000 pyrrole monomers. Our implementation can easily be combined with MPI to connect in parallel several CPU–GPU duets. Highlights: We parallelize the Metropolis Monte Carlo (MMC) algorithm on one CPU–GPU duet. The Adaptive Tempering Monte Carlo employs MMC and profits from this CPU–GPU implementation. Our benchmark shows a size scaling-up speedup of 62 for systems with 225,000 particles. The testbed involves a polymeric system of oligopyrroles in the condensed phase. The CPU–GPU parallelization includes dipole–dipole and Mie–Jones classic potentials.
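The Metropolis acceptance rule at the heart of MMC is compact enough to show directly. The sketch below (illustrative; the GPU engine, molecular moves, and potentials above are not reproduced) samples a single particle in a 1-D harmonic well, E(x) = x², at inverse temperature beta:

```python
# Metropolis Monte Carlo on a toy 1-D system: accept downhill moves always,
# uphill moves with probability exp(-beta * dE). Illustrative sketch only.
import math
import random

def metropolis_step(x, beta, step_size, energy):
    trial = x + random.uniform(-step_size, step_size)   # propose a move
    dE = energy(trial) - energy(x)
    if dE <= 0 or random.random() < math.exp(-beta * dE):
        return trial                                    # accept
    return x                                            # reject, keep state

random.seed(0)
energy = lambda x: x * x
x, samples = 5.0, []
for i in range(20000):
    x = metropolis_step(x, beta=1.0, step_size=1.0, energy=energy)
    if i > 1000:
        samples.append(x * x)          # discard burn-in, record the energy
mean_E = sum(samples) / len(samples)
print(round(mean_E, 2))                # expect roughly 0.5 (kT/2 for beta=1)
```

The GPU parallelization described above amounts to evaluating the (expensive) `energy` difference for many particles or trial moves concurrently on the device, so only accept/reject decisions, not molecular data, cross the bus.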
Sankaran, Ramanan; Angel, Jordan; Brown, W. Michael
2015-04-08
The growth in size of networked high performance computers along with novel accelerator-based node architectures has further emphasized the importance of communication efficiency in high performance computing. The world's largest high performance computers are usually operated as shared user facilities due to the costs of acquisition and operation. Applications are scheduled for execution in a shared environment and are placed on nodes that are not necessarily contiguous on the interconnect. Furthermore, the placement of tasks on the nodes allocated by the scheduler is sub-optimal, leading to performance loss and variability. Here, we investigate the impact of task placement on the performance of two massively parallel application codes on the Titan supercomputer, a turbulent combustion flow solver (S3D) and a molecular dynamics code (LAMMPS). Benchmark studies show a significant deviation from ideal weak scaling and variability in performance. The inter-task communication distance was determined to be one of the significant contributors to the performance degradation and variability. A genetic algorithm-based parallel optimization technique was used to optimize the task ordering. This technique provides an improved placement of the tasks on the nodes, taking into account the application's communication topology and the system interconnect topology. As a result, application benchmarks after task reordering through genetic algorithm show a significant improvement in performance and reduction in variability, therefore enabling the applications to achieve better time to solution and scalability on Titan during production.
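A genetic algorithm over task orderings can be sketched on a toy model. The assumptions here are not from the paper: tasks form a ring communication topology, allocated "nodes" are slots on a line, and cost is total hop distance between communicating tasks (Titan's real interconnect is a 3-D torus). The GA evolves permutations with elitist selection and swap mutation:

```python
# Toy GA for task reordering: minimize total communication distance of a
# ring of tasks placed on a line of node slots. Illustrative sketch only.
import random

def cost(perm, n):
    # perm[t] = node slot of task t; task t communicates with task t+1 (ring).
    return sum(abs(perm[t] - perm[(t + 1) % n]) for t in range(n))

def mutate(perm):
    child = perm[:]
    i, j = random.sample(range(len(perm)), 2)
    child[i], child[j] = child[j], child[i]      # swap two task placements
    return child

def ga_reorder(n, pop_size=30, generations=300):
    pop = [random.sample(range(n), n) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda p: cost(p, n))
        survivors = pop[:pop_size // 2]          # elitist truncation selection
        pop = survivors + [mutate(random.choice(survivors))
                           for _ in range(pop_size - len(survivors))]
    return min(pop, key=lambda p: cost(p, n))

random.seed(0)
best = ga_reorder(8)
print(cost(best, 8))    # near-optimal; the optimum for this toy ring is 14
```

The production version replaces `cost` with a model built from the application's measured communication graph and the machine's actual node allocation, which is exactly where the technique's benefit comes from.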
Benchmarking NWP Kernels on Multi- and Many-core Processors
NASA Astrophysics Data System (ADS)
Michalakes, J.; Vachharajani, M.
2008-12-01
Increased computing power for weather, climate, and atmospheric science has provided direct benefits for defense, agriculture, the economy, the environment, and public welfare and convenience. Today, very large clusters with many thousands of processors are allowing scientists to move forward with simulations of unprecedented size. But time-critical applications such as real-time forecasting or climate prediction need strong scaling: faster nodes and processors, not more of them. Moreover, the need for good cost-performance has never been greater, both in terms of performance per watt and per dollar. For these reasons, the new generations of multi- and many-core processors being mass produced for commercial IT and "graphical computing" (video games) are being scrutinized for their ability to exploit the abundant fine-grain parallelism in atmospheric models. We present results of our work to date identifying key computational kernels within the dynamics and physics of a large community NWP model, the Weather Research and Forecast (WRF) model. We benchmark and optimize these kernels on several different multi- and many-core processors. The goals are to (1) characterize and model performance of the kernels in terms of computational intensity, data parallelism, memory bandwidth pressure, memory footprint, etc., (2) enumerate and classify effective strategies for coding and optimizing for these new processors, (3) assess difficulties and opportunities for tool or higher-level language support, and (4) establish a continuing set of kernel benchmarks that can be used to measure and compare effectiveness of current and future designs of multi- and many-core processors for weather and climate applications.
Parallelization of Unsteady Adaptive Mesh Refinement for Unstructured Navier-Stokes Solvers
NASA Technical Reports Server (NTRS)
Schwing, Alan M.; Nompelis, Ioannis; Candler, Graham V.
2014-01-01
This paper explores the implementation of the MPI parallelization in a Navier-Stokes solver using adaptive mesh refinement. Viscous and inviscid test problems are considered for the purpose of benchmarking, as are implicit and explicit time advancement methods. The main test problem for comparison includes effects from boundary layers and other viscous features and requires a large number of grid points for accurate computation. Experimental validation against double cone experiments in hypersonic flow are shown. The adaptive mesh refinement shows promise for a staple test problem in the hypersonic community. Extension to more advanced techniques for more complicated flows is described.
NASA Astrophysics Data System (ADS)
Stupakov, Gennady; Zhou, Demin
2016-04-01
We develop a general model of coherent synchrotron radiation (CSR) impedance with shielding provided by two parallel conducting plates. This model allows us to easily reproduce all previously known analytical CSR wakes and to expand the analysis to situations not explored before. It reduces calculations of the impedance to taking integrals along the trajectory of the beam. New analytical results are derived for the radiation impedance with shielding for the following orbits: a kink, a bending magnet, a wiggler of finite length, and an infinitely long wiggler. All our formulas are benchmarked against numerical simulations with the CSRZ computer code.
ERIC Educational Resources Information Center
Janc, Helen; Appelbaum, Deborah
2004-01-01
This newsletter discusses how in many ways the development of the school reform concept parallels with the evolution of thinking about principal leadership over the past decade. Under Comprehensive School Reform (CSR), schools are increasingly being asked to use data to plan for improvement and to fortify instruction and professional development…
2012-02-09
...benchmark problem we contacted Bertrand LeCun, who in their project CHOC from 2005-2008 had applied their parallel B&B framework BOB++ to the RLT1
Advances in molecular quantum chemistry contained in the Q-Chem 4 program package
NASA Astrophysics Data System (ADS)
Shao, Yihan; Gan, Zhengting; Epifanovsky, Evgeny; Gilbert, Andrew T. B.; Wormit, Michael; Kussmann, Joerg; Lange, Adrian W.; Behn, Andrew; Deng, Jia; Feng, Xintian; Ghosh, Debashree; Goldey, Matthew; Horn, Paul R.; Jacobson, Leif D.; Kaliman, Ilya; Khaliullin, Rustam Z.; Kuś, Tomasz; Landau, Arie; Liu, Jie; Proynov, Emil I.; Rhee, Young Min; Richard, Ryan M.; Rohrdanz, Mary A.; Steele, Ryan P.; Sundstrom, Eric J.; Woodcock, H. Lee, III; Zimmerman, Paul M.; Zuev, Dmitry; Albrecht, Ben; Alguire, Ethan; Austin, Brian; Beran, Gregory J. O.; Bernard, Yves A.; Berquist, Eric; Brandhorst, Kai; Bravaya, Ksenia B.; Brown, Shawn T.; Casanova, David; Chang, Chun-Min; Chen, Yunqing; Chien, Siu Hung; Closser, Kristina D.; Crittenden, Deborah L.; Diedenhofen, Michael; DiStasio, Robert A., Jr.; Do, Hainam; Dutoi, Anthony D.; Edgar, Richard G.; Fatehi, Shervin; Fusti-Molnar, Laszlo; Ghysels, An; Golubeva-Zadorozhnaya, Anna; Gomes, Joseph; Hanson-Heine, Magnus W. D.; Harbach, Philipp H. P.; Hauser, Andreas W.; Hohenstein, Edward G.; Holden, Zachary C.; Jagau, Thomas-C.; Ji, Hyunjun; Kaduk, Benjamin; Khistyaev, Kirill; Kim, Jaehoon; Kim, Jihan; King, Rollin A.; Klunzinger, Phil; Kosenkov, Dmytro; Kowalczyk, Tim; Krauter, Caroline M.; Lao, Ka Un; Laurent, Adèle D.; Lawler, Keith V.; Levchenko, Sergey V.; Lin, Ching Yeh; Liu, Fenglai; Livshits, Ester; Lochan, Rohini C.; Luenser, Arne; Manohar, Prashant; Manzer, Samuel F.; Mao, Shan-Ping; Mardirossian, Narbe; Marenich, Aleksandr V.; Maurer, Simon A.; Mayhall, Nicholas J.; Neuscamman, Eric; Oana, C. Melania; Olivares-Amaya, Roberto; O'Neill, Darragh P.; Parkhill, John A.; Perrine, Trilisa M.; Peverati, Roberto; Prociuk, Alexander; Rehn, Dirk R.; Rosta, Edina; Russ, Nicholas J.; Sharada, Shaama M.; Sharma, Sandeep; Small, David W.; Sodt, Alexander; Stein, Tamar; Stück, David; Su, Yu-Chuan; Thom, Alex J. 
W.; Tsuchimochi, Takashi; Vanovschi, Vitalii; Vogt, Leslie; Vydrov, Oleg; Wang, Tao; Watson, Mark A.; Wenzel, Jan; White, Alec; Williams, Christopher F.; Yang, Jun; Yeganeh, Sina; Yost, Shane R.; You, Zhi-Qiang; Zhang, Igor Ying; Zhang, Xing; Zhao, Yan; Brooks, Bernard R.; Chan, Garnet K. L.; Chipman, Daniel M.; Cramer, Christopher J.; Goddard, William A., III; Gordon, Mark S.; Hehre, Warren J.; Klamt, Andreas; Schaefer, Henry F., III; Schmidt, Michael W.; Sherrill, C. David; Truhlar, Donald G.; Warshel, Arieh; Xu, Xin; Aspuru-Guzik, Alán; Baer, Roi; Bell, Alexis T.; Besley, Nicholas A.; Chai, Jeng-Da; Dreuw, Andreas; Dunietz, Barry D.; Furlani, Thomas R.; Gwaltney, Steven R.; Hsu, Chao-Ping; Jung, Yousung; Kong, Jing; Lambrecht, Daniel S.; Liang, WanZhen; Ochsenfeld, Christian; Rassolov, Vitaly A.; Slipchenko, Lyudmila V.; Subotnik, Joseph E.; Van Voorhis, Troy; Herbert, John M.; Krylov, Anna I.; Gill, Peter M. W.; Head-Gordon, Martin
2015-01-01
A summary of the technical advances that are incorporated in the fourth major release of the Q-Chem quantum chemistry program is provided, covering approximately the last seven years. These include developments in density functional theory methods and algorithms, nuclear magnetic resonance (NMR) property evaluation, coupled cluster and perturbation theories, methods for electronically excited and open-shell species, tools for treating extended environments, algorithms for walking on potential surfaces, analysis tools, energy and electron transfer modelling, parallel computing capabilities, and graphical user interfaces. In addition, a selection of example case studies that illustrate these capabilities is given. These include extensive benchmarks of the comparative accuracy of modern density functionals for bonded and non-bonded interactions, tests of attenuated second order Møller-Plesset (MP2) methods for intermolecular interactions, a variety of parallel performance benchmarks, and tests of the accuracy of implicit solvation models. Some specific chemical examples include calculations on the strongly correlated Cr2 dimer, exploring zeolite-catalysed ethane dehydrogenation, energy decomposition analysis of a charged ter-molecular complex arising from glycerol photoionisation, and natural transition orbitals for a Frenkel exciton state in a nine-unit model of a self-assembling nanotube.
Block-Parallel Data Analysis with DIY2
DOE Office of Scientific and Technical Information (OSTI.GOV)
Morozov, Dmitriy; Peterka, Tom
DIY2 is a programming model and runtime for block-parallel analytics on distributed-memory machines. Its main abstraction is block-structured data parallelism: data are decomposed into blocks; blocks are assigned to processing elements (processes or threads); computation is described as iterations over these blocks, and communication between blocks is defined by reusable patterns. By expressing computation in this general form, the DIY2 runtime is free to optimize the movement of blocks between slow and fast memories (disk and flash vs. DRAM) and to concurrently execute blocks residing in memory with multiple threads. This enables the same program to execute in-core, out-of-core, serial, parallel, single-threaded, multithreaded, or combinations thereof. This paper describes the implementation of the main features of the DIY2 programming model and optimizations to improve performance. DIY2 is evaluated on benchmark test cases to establish baseline performance for several common patterns and on larger complete analysis codes running on large-scale HPC machines.
Processing large remote sensing image data sets on Beowulf clusters
Steinwand, Daniel R.; Maddox, Brian; Beckmann, Tim; Schmidt, Gail
2003-01-01
High-performance computing is often concerned with the speed at which floating-point calculations can be performed. The architectures of many parallel computers and/or their network topologies are based on these investigations. Often, benchmarks resulting from these investigations are compiled with little regard to how a large dataset would move about in these systems. This part of the Beowulf study addresses that concern by looking at specific applications software and system-level modifications. Applications include an implementation of a smoothing filter for time-series data, a parallel implementation of the decision tree algorithm used in the Landcover Characterization project, a parallel Kriging algorithm used to fit point data collected in the field on invasive species to a regular grid, and modifications to the Beowulf project's resampling algorithm to handle larger, higher resolution datasets at a national scale. Systems-level investigations include a feasibility study on Flat Neighborhood Networks and modifications of that concept with Parallel File Systems.
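The report's smoothing-filter implementation is not given in the abstract; as a stand-in for the serial kernel such a cluster code would partition across nodes, a centered moving average over a time series looks like this:

```python
# Centered moving average for time-series smoothing (illustrative serial
# kernel; a Beowulf version would split the series across cluster nodes,
# with each node needing window//2 overlapping samples from its neighbours).

def moving_average(series, window):
    """Centered moving average; edges average over the samples that exist."""
    half = window // 2
    out = []
    for i in range(len(series)):
        lo, hi = max(0, i - half), min(len(series), i + half + 1)
        out.append(sum(series[lo:hi]) / (hi - lo))
    return out

print(moving_average([1, 2, 3, 4, 5], 3))   # -> [1.5, 2.0, 3.0, 4.0, 4.5]
```

The overlap requirement at partition boundaries is precisely the kind of data movement the study argues conventional floating-point benchmarks ignore.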
A Systems Approach to Scalable Transportation Network Modeling
DOE Office of Scientific and Technical Information (OSTI.GOV)
Perumalla, Kalyan S
2006-01-01
Emerging needs in transportation network modeling and simulation are raising new challenges with respect to scalability of network size and vehicular traffic intensity, speed of simulation for simulation-based optimization, and fidelity of vehicular behavior for accurate capture of event phenomena. Parallel execution is warranted to sustain the required detail, size and speed. However, few parallel simulators exist for such applications, partly due to the challenges underlying their development. Moreover, many simulators are based on time-stepped models, which can be computationally inefficient for the purposes of modeling evacuation traffic. Here an approach is presented to designing a simulator with memory and speed efficiency as the goals from the outset, and, specifically, scalability via parallel execution. The design makes use of discrete event modeling techniques as well as parallel simulation methods. Our simulator, called SCATTER, is being developed, incorporating such design considerations. Preliminary performance results are presented on benchmark road networks, showing scalability to one million vehicles simulated on one processor.
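The discrete-event style the abstract favors over time-stepping can be sketched with a priority queue: the clock jumps directly between vehicle link-arrival events instead of advancing in fixed increments. SCATTER itself and its road model are not reproduced; routes and travel times below are illustrative.

```python
# Minimal discrete-event traffic sketch: vehicles traverse a list of links,
# and the simulation clock jumps between link-arrival events in a heap.
import heapq

def simulate(routes, travel_time):
    """routes: per-vehicle list of links; returns each vehicle's finish time."""
    events = [(0.0, vid, 0) for vid in range(len(routes))]  # (time, vehicle, leg)
    heapq.heapify(events)
    finish = {}
    while events:
        now, vid, leg = heapq.heappop(events)   # advance clock to next event
        if leg == len(routes[vid]):
            finish[vid] = now                   # vehicle reached destination
            continue
        link = routes[vid][leg]
        heapq.heappush(events, (now + travel_time[link], vid, leg + 1))
    return finish

tt = {"a": 2.0, "b": 3.0, "c": 1.0}
print(simulate([["a", "b"], ["c", "a"]], tt))   # vehicle 0 at 5.0, vehicle 1 at 3.0
```

Because nothing happens between events, sparse traffic costs almost nothing to simulate, which is the efficiency argument made above against time-stepped models; the parallel version partitions the network and synchronizes event timestamps across processors.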
Sequential Feedback Scheme Outperforms the Parallel Scheme for Hamiltonian Parameter Estimation.
Yuan, Haidong
2016-10-14
Measurement and estimation of parameters are essential for science and engineering, where the main quest is to find the highest achievable precision with the given resources and design schemes to attain it. Two schemes, the sequential feedback scheme and the parallel scheme, are usually studied in quantum parameter estimation. While the sequential feedback scheme represents the most general scheme, it remains unknown whether it can outperform the parallel scheme for any quantum estimation tasks. In this Letter, we show that the sequential feedback scheme has a threefold improvement over the parallel scheme for Hamiltonian parameter estimation on two-dimensional systems, and an O(d+1) improvement for Hamiltonian parameter estimation on d-dimensional systems. We also show that, contrary to the conventional belief, it is possible to simultaneously achieve the highest precision for estimating all three components of a magnetic field, which sets a benchmark on the local precision limit for the estimation of a magnetic field.
Scaling NASA Applications to 1024 CPUs on Origin 3K
NASA Technical Reports Server (NTRS)
Taft, Jim
2002-01-01
The long and highly successful joint SGI-NASA research effort in ever larger SSI systems was to a large degree the result of the successful development of the MLP scalable parallel programming paradigm developed at ARC: 1) MLP scaling in real production codes justified ever larger systems at NAS; 2) MLP scaling on the 256p Origin 2000 gave SGI impetus to productize 256p; 3) MLP scaling on 512p gave SGI courage to build the 1024p O3K; and 4) the history of MLP success resulted in an IBM Star Cluster based MLP effort.
Tankam, Patrice; Santhanam, Anand P.; Lee, Kye-Sung; Won, Jungeun; Canavesi, Cristina; Rolland, Jannick P.
2014-01-01
Gabor-domain optical coherence microscopy (GD-OCM) is a volumetric high-resolution technique capable of acquiring three-dimensional (3-D) skin images with histological resolution. Real-time image processing is needed to enable GD-OCM imaging in a clinical setting. We present a parallelized and scalable multi-graphics processing unit (GPU) computing framework for real-time GD-OCM image processing. A parallelized control mechanism was developed to individually assign computation tasks to each of the GPUs. For each GPU, the optimal number of amplitude-scans (A-scans) to be processed in parallel was selected to maximize GPU memory usage and core throughput. We investigated five computing architectures for computational speed-up in processing 1000×1000 A-scans. The proposed parallelized multi-GPU computing framework enables processing at a computational speed faster than the GD-OCM image acquisition, thereby facilitating high-speed GD-OCM imaging in a clinical setting. Using two parallelized GPUs, the image processing of a 1×1×0.6 mm3 skin sample was performed in about 13 s, and the performance was benchmarked at 6.5 s with four GPUs. This work thus demonstrates that 3-D GD-OCM data may be displayed in real-time to the examiner using parallelized GPU processing. PMID:24695868
Tankam, Patrice; Santhanam, Anand P; Lee, Kye-Sung; Won, Jungeun; Canavesi, Cristina; Rolland, Jannick P
2014-07-01
Gabor-domain optical coherence microscopy (GD-OCM) is a volumetric high-resolution technique capable of acquiring three-dimensional (3-D) skin images with histological resolution. Real-time image processing is needed to enable GD-OCM imaging in a clinical setting. We present a parallelized and scalable multi-graphics processing unit (GPU) computing framework for real-time GD-OCM image processing. A parallelized control mechanism was developed to individually assign computation tasks to each of the GPUs. For each GPU, the optimal number of amplitude-scans (A-scans) to be processed in parallel was selected to maximize GPU memory usage and core throughput. We investigated five computing architectures for computational speed-up in processing 1000×1000 A-scans. The proposed parallelized multi-GPU computing framework enables processing at a computational speed faster than the GD-OCM image acquisition, thereby facilitating high-speed GD-OCM imaging in a clinical setting. Using two parallelized GPUs, the image processing of a 1×1×0.6 mm3 skin sample was performed in about 13 s, and the performance was benchmarked at 6.5 s with four GPUs. This work thus demonstrates that 3-D GD-OCM data may be displayed in real-time to the examiner using parallelized GPU processing.
ERIC Educational Resources Information Center
Lin, Zeng; Gardner, Dianne
2006-01-01
The purpose of this study is to demonstrate the usefulness of the Schools and Staffing Survey (SASS), for the comparative analysis of alumni teachers. This article shows how SASS can be used as an evaluative tool by any institution that wants to appraise its alumni in comparison to those of its parallel institutions for the purposes of…
ERIC Educational Resources Information Center
Rice, Mabel L.; Redmond, Sean M.; Hoffman, Lesa
2006-01-01
Purpose: Although mean length of utterance (MLU) is a useful benchmark in studies of children with specific language impairment (SLI), some empirical and interpretive issues are unresolved. The authors report on 2 studies examining, respectively, the concurrent validity and temporal stability of MLU equivalency between children with SLI and…
Hagen, Espen; Ness, Torbjørn V; Khosrowshahi, Amir; Sørensen, Christina; Fyhn, Marianne; Hafting, Torkel; Franke, Felix; Einevoll, Gaute T
2015-04-30
New, silicon-based multielectrodes comprising hundreds or more electrode contacts offer the possibility to record spike trains from thousands of neurons simultaneously. This potential cannot be realized unless accurate, reliable automated methods for spike sorting are developed, in turn requiring benchmarking data sets with known ground-truth spike times. We here present a general simulation tool for computing benchmarking data for evaluation of spike-sorting algorithms entitled ViSAPy (Virtual Spiking Activity in Python). The tool is based on a well-established biophysical forward-modeling scheme and is implemented as a Python package built on top of the neuronal simulator NEURON and the Python tool LFPy. ViSAPy allows for arbitrary combinations of multicompartmental neuron models and geometries of recording multielectrodes. Three example benchmarking data sets are generated, i.e., tetrode and polytrode data mimicking in vivo cortical recordings and microelectrode array (MEA) recordings of in vitro activity in salamander retinas. The synthesized example benchmarking data mimics salient features of typical experimental recordings, for example, spike waveforms depending on interspike interval. ViSAPy goes beyond existing methods as it includes biologically realistic model noise, synaptic activation by recurrent spiking networks, finite-sized electrode contacts, and allows for inhomogeneous electrical conductivities. ViSAPy is optimized to allow for generation of long time series of benchmarking data, spanning minutes of biological time, by parallel execution on multi-core computers. ViSAPy is an open-ended tool as it can be generalized to produce benchmarking data or arbitrary recording-electrode geometries and with various levels of complexity. Copyright © 2015 The Authors. Published by Elsevier B.V. All rights reserved.
NASA Technical Reports Server (NTRS)
Davis, G. J.
1994-01-01
One area of research of the Information Sciences Division at NASA Ames Research Center is devoted to the analysis and enhancement of processors and advanced computer architectures, specifically in support of automation and robotic systems. To compare systems' abilities to efficiently process Lisp and Ada, scientists at Ames Research Center have developed a suite of non-parallel benchmarks called ELAPSE. The benchmark suite was designed to test a single computer's efficiency as well as to support comparisons of alternate machines on the Lisp and/or Ada languages. ELAPSE tests the efficiency with which a machine can execute the various routines in each environment. The sample routines are based on numeric and symbolic manipulations and include two-dimensional fast Fourier transformations, Cholesky decomposition and substitution, Gaussian elimination, high-level data processing, and symbol-list references. Also included is a routine based on a Bayesian classification program sorting data into optimized groups. The ELAPSE benchmarks are available for any computer with a validated Ada compiler and/or Common Lisp system. Of the 18 routines that comprise ELAPSE, 14, developed or translated at Ames, are provided within this package; the others are readily available in the literature. The benchmark that requires the most memory is CHOLESKY.ADA. Under VAX/VMS, CHOLESKY.ADA requires 760K of main memory. ELAPSE is available on either two 5.25 inch 360K MS-DOS format diskettes (standard distribution) or a 9-track 1600 BPI ASCII CARD IMAGE format magnetic tape. The contents of the diskettes are compressed using the PKWARE archiving tools. The utility to unarchive the files, PKUNZIP.EXE, is included. The ELAPSE benchmarks were written in 1990. VAX and VMS are trademarks of Digital Equipment Corporation. MS-DOS is a registered trademark of Microsoft Corporation.
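One of the kernels the ELAPSE suite exercises is Cholesky decomposition (the CHOLESKY.ADA routine noted above). As an illustration only, and not the Ada benchmark code itself, a minimal Python sketch of the factorization of a symmetric positive-definite matrix:

```python
import math

def cholesky(a):
    """Return the lower-triangular L with L * L^T = A,
    for a symmetric positive-definite matrix A given as a list of rows."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                # Diagonal entry: sqrt of the remaining pivot
                L[i][i] = math.sqrt(a[i][i] - s)
            else:
                # Off-diagonal entry: forward substitution against row j
                L[i][j] = (a[i][j] - s) / L[j][j]
    return L
```

For example, `cholesky([[4.0, 2.0], [2.0, 3.0]])` yields L with L[0][0] = 2, L[1][0] = 1, and L[1][1] = sqrt(2).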
PFLOTRAN Verification: Development of a Testing Suite to Ensure Software Quality
NASA Astrophysics Data System (ADS)
Hammond, G. E.; Frederick, J. M.
2016-12-01
In scientific computing, code verification ensures the reliability and numerical accuracy of a model simulation by comparing the simulation results to experimental data or known analytical solutions. The model is typically defined by a set of partial differential equations with initial and boundary conditions, and verification establishes whether the mathematical model is solved correctly by the software. Code verification is especially important if the software is used to model high-consequence systems that cannot be physically tested in a fully representative environment [Oberkampf and Trucano (2007)]. Justified confidence in a particular computational tool requires clarity in the exercised physics and transparency in its verification process with proper documentation. We present a quality assurance (QA) testing suite developed by Sandia National Laboratories that performs code verification for PFLOTRAN, an open source, massively parallel subsurface simulator. PFLOTRAN solves systems of generally nonlinear partial differential equations describing multiphase, multicomponent and multiscale reactive flow and transport processes in porous media. PFLOTRAN's QA test suite compares the numerical solutions of benchmark problems in heat and mass transport against known, closed-form, analytical solutions, including documentation of the exercised physical process models implemented in each PFLOTRAN benchmark simulation. The QA test suite development strives to follow the recommendations given by Oberkampf and Trucano (2007), which describes four essential elements in high-quality verification benchmark construction: (1) conceptual description, (2) mathematical description, (3) accuracy assessment, and (4) additional documentation and user information.
Several QA tests within the suite will be presented, including details of the benchmark problems and their closed-form analytical solutions, implementation of benchmark problems in PFLOTRAN simulations, and the criteria used to assess PFLOTRAN's performance in the code verification procedure. References Oberkampf, W. L., and T. G. Trucano (2007), Verification and Validation Benchmarks, SAND2007-0853, 67 pgs., Sandia National Laboratories, Albuquerque, NM.
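The verification pattern described above, comparing a numerical solution against a closed-form analytical solution and checking an error criterion, can be illustrated with a toy example. The sketch below is illustrative only and is not a PFLOTRAN benchmark: it solves the 1-D heat equation u_t = D u_xx with an explicit finite-difference scheme and measures the error against the exact single-mode solution u(x,t) = exp(-D pi^2 t) sin(pi x). The function name and parameters are invented for this sketch.

```python
import math

def max_error_heat_fd(nx=51, D=1.0, t_end=0.05):
    """Solve u_t = D u_xx on [0,1], u(0)=u(1)=0, u(x,0)=sin(pi x), with an
    explicit finite-difference scheme, and return the maximum pointwise error
    against the analytical solution u(x,t) = exp(-D pi^2 t) sin(pi x)."""
    dx = 1.0 / (nx - 1)
    dt = 0.25 * dx * dx / D                 # well inside the explicit stability limit
    steps = round(t_end / dt)
    x = [i * dx for i in range(nx)]
    u = [math.sin(math.pi * xi) for xi in x]
    for _ in range(steps):
        # Interior update; Dirichlet boundaries held at zero
        u = ([0.0]
             + [u[i] + D * dt / dx**2 * (u[i + 1] - 2.0 * u[i] + u[i - 1])
                for i in range(1, nx - 1)]
             + [0.0])
    t = steps * dt
    exact = [math.exp(-D * math.pi**2 * t) * math.sin(math.pi * xi) for xi in x]
    return max(abs(a - b) for a, b in zip(u, exact))
```

A QA test in this spirit would simply assert that the returned error falls below a documented tolerance for the chosen discretization.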
Benchmarking hardware architecture candidates for the NFIRAOS real-time controller
NASA Astrophysics Data System (ADS)
Smith, Malcolm; Kerley, Dan; Herriot, Glen; Véran, Jean-Pierre
2014-07-01
As a part of the trade study for the Narrow Field Infrared Adaptive Optics System, the adaptive optics system for the Thirty Meter Telescope, we investigated the feasibility of performing real-time control computation using a Linux operating system and Intel Xeon E5 CPUs. We also investigated a Xeon Phi based architecture which allows higher levels of parallelism. This paper summarizes both the CPU based real-time controller architecture and the Xeon Phi based RTC. The Intel Xeon E5 CPU solution meets the requirements and performs the computation for one AO cycle in an average of 767 microseconds. The Xeon Phi solution did not meet the 1200 microsecond time requirement and also suffered from unpredictable execution times. More detailed benchmark results are reported for both architectures.
Characterizing Task-Based OpenMP Programs
Muddukrishna, Ananya; Jonsson, Peter A.; Brorsson, Mats
2015-01-01
Programmers struggle to understand performance of task-based OpenMP programs since profiling tools only report thread-based performance. Performance tuning also requires task-based performance in order to balance per-task memory hierarchy utilization against exposed task parallelism. We provide a cost-effective method to extract detailed task-based performance information from OpenMP programs. We demonstrate the utility of our method by quickly diagnosing performance problems and characterizing exposed task parallelism and per-task instruction profiles of benchmarks in the widely-used Barcelona OpenMP Tasks Suite. Programmers can tune performance faster and understand performance tradeoffs more effectively than existing tools by using our method to characterize task-based performance. PMID:25860023
Synergia: an accelerator modeling tool with 3-D space charge
DOE Office of Scientific and Technical Information (OSTI.GOV)
Amundson, James F.; Spentzouris, P.; /Fermilab
2004-07-01
High precision modeling of space-charge effects, together with accurate treatment of single-particle dynamics, is essential for designing future accelerators as well as optimizing the performance of existing machines. We describe Synergia, a high-fidelity parallel beam dynamics simulation package with fully three dimensional space-charge capabilities and a higher order optics implementation. We describe the computational techniques, the advanced human interface, and the parallel performance obtained using large numbers of macroparticles. We also perform code benchmarks comparing to semi-analytic results and other codes. Finally, we present initial results on particle tune spread, beam halo creation, and emittance growth in the Fermilab booster accelerator.
The development of a revised version of multi-center molecular Ornstein-Zernike equation
NASA Astrophysics Data System (ADS)
Kido, Kentaro; Yokogawa, Daisuke; Sato, Hirofumi
2012-04-01
Ornstein-Zernike (OZ)-type theory is a powerful tool to obtain the 3-dimensional solvent distribution around a solute molecule. Recently, we proposed the multi-center molecular OZ method, which is suitable for parallel computing of 3D solvation structure. The distribution function in this method consists of two components, namely reference and residue parts. Several types of the function were examined as the reference part to investigate the numerical robustness of the method. As a benchmark, the method is applied to water, benzene in aqueous solution, and a single-walled carbon nanotube in chloroform solution. The results indicate that full parallelization is achieved by utilizing the newly proposed reference functions.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Stupakov, Gennady; Zhou, Demin
2016-04-21
We develop a general model of coherent synchrotron radiation (CSR) impedance with shielding provided by two parallel conducting plates. This model allows us to easily reproduce all previously known analytical CSR wakes and to expand the analysis to situations not explored before. It reduces calculations of the impedance to taking integrals along the trajectory of the beam. New analytical results are derived for the radiation impedance with shielding for the following orbits: a kink, a bending magnet, a wiggler of finite length, and an infinitely long wiggler. Furthermore, all our formulas are benchmarked against numerical simulations with the CSRZ computer code.
A GaAs vector processor based on parallel RISC microprocessors
NASA Astrophysics Data System (ADS)
Misko, Tim A.; Rasset, Terry L.
A vector processor architecture has been developed around a 32-bit microprocessor implemented in gallium arsenide (GaAs) technology. The McDonnell Douglas vector processor (MVP) will be fabricated completely from GaAs digital integrated circuits. The MVP architecture includes a vector memory of 1 megabyte, a parallel bus architecture with eight processing elements connected in parallel, and a control processor. The processing elements consist of a reduced instruction set CPU (RISC) with four floating-point coprocessor units and the necessary memory interface functions. This architecture has been simulated for several benchmark programs including complex fast Fourier transform (FFT), complex inner product, trigonometric functions, and a sort-merge routine. The results of this study indicate that the MVP can process a 1024-point complex FFT in 112 microsec (389 megaflops) while consuming approximately 618 W of power in a volume of approximately 0.1 cubic feet.
Computational Performance of a Parallelized Three-Dimensional High-Order Spectral Element Toolbox
NASA Astrophysics Data System (ADS)
Bosshard, Christoph; Bouffanais, Roland; Clémençon, Christian; Deville, Michel O.; Fiétier, Nicolas; Gruber, Ralf; Kehtari, Sohrab; Keller, Vincent; Latt, Jonas
In this paper, a comprehensive performance review of an MPI-based high-order three-dimensional spectral element method C++ toolbox is presented. The focus is on evaluating several performance aspects, with particular emphasis on parallel efficiency. The performance evaluation is analyzed with the help of a time prediction model based on a parameterization of the application and the hardware resources. A tailor-made CFD computation benchmark case is introduced and used to carry out this review, stressing the particular interest for clusters with up to 8192 cores. Some problems in the parallel implementation have been detected and corrected. The theoretical complexities with respect to the number of elements, to the polynomial degree, and to communication needs are correctly reproduced. It is concluded that this type of code has a nearly perfect speedup on machines with thousands of cores, and is ready to make the step to next-generation petaflop machines.
Experimental Mapping and Benchmarking of Magnetic Field Codes on the LHD Ion Accelerator
NASA Astrophysics Data System (ADS)
Chitarin, G.; Agostinetti, P.; Gallo, A.; Marconato, N.; Nakano, H.; Serianni, G.; Takeiri, Y.; Tsumori, K.
2011-09-01
For the validation of the numerical models used for the design of the Neutral Beam Test Facility for ITER in Padua [1], an experimental benchmark against a full-size device has been sought. The LHD BL2 injector [2] has been chosen as a first benchmark, because the BL2 Negative Ion Source and Beam Accelerator are geometrically similar to SPIDER, even though BL2 does not include current bars and ferromagnetic materials. A comprehensive 3D magnetic field model of the LHD BL2 device has been developed based on the same assumptions used for SPIDER. In parallel, a detailed experimental magnetic map of the BL2 device has been obtained using a suitably designed 3D adjustable structure for the fine positioning of the magnetic sensors inside 27 of the 770 beamlet apertures. The calculated values have been compared to the experimental data. The work has confirmed the quality of the numerical model, and has also provided useful information on the magnetic non-uniformities due to the edge effects and to the tolerance on permanent magnet remanence.
Experimental Mapping and Benchmarking of Magnetic Field Codes on the LHD Ion Accelerator
DOE Office of Scientific and Technical Information (OSTI.GOV)
Chitarin, G.; University of Padova, Dept. of Management and Engineering, strad. S. Nicola, 36100 Vicenza; Agostinetti, P.
2011-09-26
For the validation of the numerical models used for the design of the Neutral Beam Test Facility for ITER in Padua [1], an experimental benchmark against a full-size device has been sought. The LHD BL2 injector [2] has been chosen as a first benchmark, because the BL2 Negative Ion Source and Beam Accelerator are geometrically similar to SPIDER, even though BL2 does not include current bars and ferromagnetic materials. A comprehensive 3D magnetic field model of the LHD BL2 device has been developed based on the same assumptions used for SPIDER. In parallel, a detailed experimental magnetic map of the BL2 device has been obtained using a suitably designed 3D adjustable structure for the fine positioning of the magnetic sensors inside 27 of the 770 beamlet apertures. The calculated values have been compared to the experimental data. The work has confirmed the quality of the numerical model, and has also provided useful information on the magnetic non-uniformities due to the edge effects and to the tolerance on permanent magnet remanence.
The International Conference on Vector and Parallel Computing (2nd)
1989-01-17
Topics include "Computation of the SVD of Bidiagonal Matrices" and "Lattice QCD as a Large-Scale Scientific Computation." Codes were vectorized for the IBM 3090 Vector Facility, with elapsed times reduced on the 3090. Lattice QCD was benchmarked on a large number of computers, including the Cray X-MP and Cray 2, with much of the time coming from the wavefront solver routine.
High-Order Methods for Computational Physics
1999-03-01
From a chapter by Ronald D. Henderson: connectivity and communications for the parallel computation are established by building a voxel database (VDB) of geometric positions in the mesh, which maps each position to a processor (Fig. 4.19). Highly accurate stability computations help expand the database for this benchmark problem.
Hybrid parallel code acceleration methods in full-core reactor physics calculations
DOE Office of Scientific and Technical Information (OSTI.GOV)
Courau, T.; Plagne, L.; Ponicot, A.
2012-07-01
When dealing with nuclear reactor calculation schemes, the need for three dimensional (3D) transport-based reference solutions is essential for both validation and optimization purposes. Considering a benchmark problem, this work investigates the potential of discrete ordinates (Sn) transport methods applied to 3D pressurized water reactor (PWR) full-core calculations. First, the benchmark problem is described. It involves a pin-by-pin description of a 3D PWR first core, and uses an 8-group cross-section library prepared with the DRAGON cell code. Then, a convergence analysis is performed using the PENTRAN parallel Sn Cartesian code. It discusses the spatial refinement and the associated angular quadrature required to properly describe the problem physics. It also shows that initializing the Sn solution with the EDF SPN solver COCAGNE reduces the number of iterations required to converge by nearly a factor of 6. Using a best estimate model, PENTRAN results are then compared to multigroup Monte Carlo results obtained with the MCNP5 code. Good consistency is observed between the two methods (Sn and Monte Carlo), with discrepancies that are less than 25 pcm for the k-eff, and less than 2.1% and 1.6% for the flux at the pin-cell level and for the pin-power distribution, respectively. (authors)
Scalable Metropolis Monte Carlo for simulation of hard shapes
NASA Astrophysics Data System (ADS)
Anderson, Joshua A.; Eric Irrgang, M.; Glotzer, Sharon C.
2016-07-01
We design and implement a scalable hard particle Monte Carlo simulation toolkit (HPMC), and release it open source as part of HOOMD-blue. HPMC runs in parallel on many CPUs and many GPUs using domain decomposition. We employ BVH trees instead of cell lists on the CPU for fast performance, especially with large particle size disparity, and optimize inner loops with SIMD vector intrinsics on the CPU. Our GPU kernel proposes many trial moves in parallel on a checkerboard and uses a block-level queue to redistribute work among threads and avoid divergence. HPMC supports a wide variety of shape classes, including spheres/disks, unions of spheres, convex polygons, convex spheropolygons, concave polygons, ellipsoids/ellipses, convex polyhedra, convex spheropolyhedra, spheres cut by planes, and concave polyhedra. NVT and NPT ensembles can be run in 2D or 3D triclinic boxes. Additional integration schemes permit Frenkel-Ladd free energy computations and implicit depletant simulations. In a benchmark system of a fluid of 4096 pentagons, HPMC performs 10 million sweeps in 10 min on 96 CPU cores on XSEDE Comet. The same simulation would take 7.6 h in serial. HPMC also scales to large system sizes, and the same benchmark with 16.8 million particles runs in 1.4 h on 2048 GPUs on OLCF Titan.
Multiprocessing the Sieve of Eratosthenes
NASA Technical Reports Server (NTRS)
Bokhari, S.
1986-01-01
In recent years, the Sieve of Eratosthenes for finding prime numbers has seen much use as a benchmark algorithm for serial computers, while its intrinsically parallel nature has gone largely unnoticed. The implementation of a parallel version of this algorithm for a real parallel computer, the Flex/32, is described and its performance discussed. It is shown that the algorithm is sensitive to several fundamental performance parameters of parallel machines, such as spawning time, signaling time, memory access, and overhead of process switching. Because of the nature of the algorithm, it is impossible to get any speedup beyond 4 or 5 processors unless some form of dynamic load balancing is employed. We describe the performance of our algorithm with and without load balancing and compare it with theoretical lower bounds and simulated results. It is straightforward to understand this algorithm and to check the final results. However, its efficient implementation on a real parallel machine requires thoughtful design, especially if dynamic load balancing is desired. The fundamental operations required by the algorithm are very simple: this means that the slightest overhead appears prominently in performance data. The Sieve thus serves not only as a very severe test of the capabilities of a parallel processor but is also an interesting challenge for the programmer.
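For reference, the serial form of the algorithm benchmarked above is short; a parallel version partitions the candidate range among processors, which is where the load-balancing issues discussed in the abstract arise. A minimal Python sketch (illustrative; not the Flex/32 implementation):

```python
def sieve(n):
    """Sieve of Eratosthenes: return all primes <= n by marking composites."""
    is_prime = [True] * (n + 1)
    is_prime[0:2] = [False, False]          # 0 and 1 are not prime
    p = 2
    while p * p <= n:
        if is_prime[p]:
            # Mark multiples of p, starting at p*p (smaller ones already marked)
            for multiple in range(p * p, n + 1, p):
                is_prime[multiple] = False
        p += 1
    return [i for i, prime in enumerate(is_prime) if prime]

print(sieve(30))  # → [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
```

The marking loop over multiples is the work that a parallel implementation distributes; the uneven cost of marking different primes is what makes dynamic load balancing necessary beyond a few processors.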
Komarov, Ivan; D'Souza, Roshan M
2012-01-01
The Gillespie Stochastic Simulation Algorithm (GSSA) and its variants are cornerstone techniques to simulate reaction kinetics in situations where the concentration of the reactant is too low to allow deterministic techniques such as differential equations. The inherent limitations of the GSSA include the time required for executing a single run and the need for multiple runs for parameter sweep exercises due to the stochastic nature of the simulation. Even very efficient variants of GSSA are prohibitively expensive to compute and perform parameter sweeps. Here we present a novel variant of the exact GSSA that is amenable to acceleration by using graphics processing units (GPUs). We parallelize the execution of a single realization across threads in a warp (fine-grained parallelism). A warp is a collection of threads that are executed synchronously on a single multi-processor. Warps executing in parallel on different multi-processors (coarse-grained parallelism) simultaneously generate multiple trajectories. Novel data-structures and algorithms reduce memory traffic, which is the bottleneck in computing the GSSA. Our benchmarks show an 8×-120× performance gain over various state-of-the-art serial algorithms when simulating different types of models.
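The exact direct-method GSSA that the GPU variant above accelerates can be sketched compactly; the warp-level fine-grained parallelization is not shown. A minimal Python version, with an illustrative first-order decay reaction A → ∅ (function and parameter names are invented for this sketch):

```python
import math
import random

def gillespie_direct(propensities, stoichiometry, x0, t_max, seed=0):
    """Exact direct-method GSSA: draw an exponential waiting time from the
    total propensity, then pick the firing reaction in proportion to its
    propensity, and apply its stoichiometric state change."""
    rng = random.Random(seed)
    t, x = 0.0, list(x0)
    trajectory = [(t, tuple(x))]
    while t < t_max:
        a = [f(x) for f in propensities]
        a0 = sum(a)
        if a0 == 0.0:                              # no reaction can fire
            break
        t += -math.log(1.0 - rng.random()) / a0    # time to next reaction
        u = rng.random() * a0                      # select reaction index j
        j, acc = 0, a[0]
        while acc < u:
            j += 1
            acc += a[j]
        for species, change in stoichiometry[j]:
            x[species] += change
        trajectory.append((t, tuple(x)))
    return trajectory

# Illustrative model: decay A -> 0 with rate constant 0.5, 100 molecules
traj = gillespie_direct(
    propensities=[lambda x: 0.5 * x[0]],
    stoichiometry=[[(0, -1)]],
    x0=[100],
    t_max=20.0,
)
```

In the GPU scheme described above, each such realization is spread across the threads of a warp, and many warps run independent realizations concurrently.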
Characterization of robotics parallel algorithms and mapping onto a reconfigurable SIMD machine
NASA Technical Reports Server (NTRS)
Lee, C. S. G.; Lin, C. T.
1989-01-01
The kinematics, dynamics, Jacobian, and their corresponding inverse computations are six essential problems in the control of robot manipulators. Efficient parallel algorithms for these computations are discussed and analyzed. Their characteristics are identified and a scheme on the mapping of these algorithms to a reconfigurable parallel architecture is presented. Based on the characteristics including type of parallelism, degree of parallelism, uniformity of the operations, fundamental operations, data dependencies, and communication requirement, it is shown that most of the algorithms for robotic computations possess highly regular properties and some common structures, especially the linear recursive structure. Moreover, they are well-suited to be implemented on a single-instruction-stream multiple-data-stream (SIMD) computer with a reconfigurable interconnection network. The model of a reconfigurable dual-network SIMD machine with internal direct feedback is introduced, and a systematic procedure to map these computations to the proposed machine is presented. A new scheduling problem for SIMD machines is investigated and a heuristic algorithm, called neighborhood scheduling, that reorders the processing sequence of subtasks to reduce the communication time is described. Mapping results of a benchmark algorithm are illustrated and discussed.
Parallel computation with molecular-motor-propelled agents in nanofabricated networks.
Nicolau, Dan V; Lard, Mercy; Korten, Till; van Delft, Falco C M J M; Persson, Malin; Bengtsson, Elina; Månsson, Alf; Diez, Stefan; Linke, Heiner; Nicolau, Dan V
2016-03-08
The combinatorial nature of many important mathematical problems, including nondeterministic-polynomial-time (NP)-complete problems, places a severe limitation on the problem size that can be solved with conventional, sequentially operating electronic computers. There have been significant efforts in conceiving parallel-computation approaches in the past, for example: DNA computation, quantum computation, and microfluidics-based computation. However, these approaches have not proven, so far, to be scalable and practical from a fabrication and operational perspective. Here, we report the foundations of an alternative parallel-computation system in which a given combinatorial problem is encoded into a graphical, modular network that is embedded in a nanofabricated planar device. Exploring the network in a parallel fashion using a large number of independent, molecular-motor-propelled agents then solves the mathematical problem. This approach uses orders of magnitude less energy than conventional computers, thus addressing issues related to power consumption and heat dissipation. We provide a proof-of-concept demonstration of such a device by solving, in a parallel fashion, the small instance {2, 5, 9} of the subset sum problem, which is a benchmark NP-complete problem. Finally, we discuss the technical advances necessary to make our system scalable with presently available technology.
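The benchmark instance mentioned above, the subset sum problem on {2, 5, 9}, is small enough to check exhaustively on a conventional computer; each subset corresponds to one path an agent can explore through the network. A minimal Python sketch of the exhaustive check (names are illustrative, not from the paper):

```python
from itertools import chain, combinations

def subset_sums(values):
    """Map every achievable sum to one witness subset, by exhaustive
    enumeration (the sequential analogue of exploring all network paths)."""
    all_subsets = chain.from_iterable(
        combinations(values, r) for r in range(len(values) + 1))
    witness = {}
    for s in all_subsets:
        witness.setdefault(sum(s), s)   # keep the first subset seen per sum
    return witness

# The {2, 5, 9} instance solved by the proof-of-concept device:
sums = subset_sums((2, 5, 9))
print(sorted(sums))  # → [0, 2, 5, 7, 9, 11, 14, 16]
```

The 2^n subsets enumerated here are exactly what makes the problem intractable at scale for sequential machines, and what the massively parallel agent exploration targets.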
NASA Astrophysics Data System (ADS)
Mills, R. T.; Rupp, K.; Smith, B. F.; Brown, J.; Knepley, M.; Zhang, H.; Adams, M.; Hammond, G. E.
2017-12-01
As the high-performance computing community pushes towards the exascale horizon, power and heat considerations have driven the increasing importance and prevalence of fine-grained parallelism in new computer architectures. High-performance computing centers have become increasingly reliant on GPGPU accelerators and "manycore" processors such as the Intel Xeon Phi line, and 512-bit SIMD registers have even been introduced in the latest generation of Intel's mainstream Xeon server processors. The high degree of fine-grained parallelism and more complicated memory hierarchy considerations of such "manycore" processors present several challenges to existing scientific software. Here, we consider how the massively parallel, open-source hydrologic flow and reactive transport code PFLOTRAN - and the underlying Portable, Extensible Toolkit for Scientific Computation (PETSc) library on which it is built - can best take advantage of such architectures. We will discuss some key features of these novel architectures and our code optimizations and algorithmic developments targeted at them, and present experiences drawn from working with a wide range of PFLOTRAN benchmark problems on these architectures.
Serial vs. parallel models of attention in visual search: accounting for benchmark RT-distributions.
Moran, Rani; Zehetleitner, Michael; Liesefeld, Heinrich René; Müller, Hermann J; Usher, Marius
2016-10-01
Visual search is central to the investigation of selective visual attention. Classical theories propose that items are identified by serially deploying focal attention to their locations. While this accounts for set-size effects over a continuum of task difficulties, it has been suggested that parallel models can account for such effects equally well. We compared the serial Competitive Guided Search model with a parallel model in their ability to account for RT distributions and error rates from a large visual search data-set featuring three classical search tasks: 1) a spatial configuration search (2 vs. 5); 2) a feature-conjunction search; and 3) a unique feature search (Wolfe, Palmer & Horowitz Vision Research, 50(14), 1304-1311, 2010). In the parallel model, each item is represented by a diffusion to two boundaries (target-present/absent); the search corresponds to a parallel race between these diffusors. The parallel model was highly flexible in that it allowed both for a parametric range of capacity-limitation and for set-size adjustments of identification boundaries. Furthermore, a quit unit allowed for a continuum of search-quitting policies when the target is not found, with "single-item inspection" and exhaustive searches comprising its extremes. The serial model was found to be superior to the parallel model, even before penalizing the parallel model for its increased complexity. We discuss the implications of the results and the need for future studies to resolve the debate.
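The parallel model described here can be caricatured in a few lines of code: each item runs an independent random walk ("diffusor") between a target-present and a target-absent boundary. This is our own simplified illustration (parameter names and values are invented; the fitted model also includes capacity limits and a quit unit, which are omitted):

```python
import random

def parallel_race_trial(n_items, target_index=None, drift=0.15,
                        bound=10.0, dt=1.0, sigma=1.0, rng=random):
    """One simulated trial: each item diffuses toward a target (+bound)
    or distractor (-bound) boundary. The trial ends when any item hits
    the target boundary (respond 'present') or all items have hit the
    distractor boundary (respond 'absent'). Returns (decision, RT)."""
    evidence = [0.0] * n_items
    finished = [False] * n_items
    t = 0.0
    while True:
        t += dt
        for i in range(n_items):
            if finished[i]:
                continue
            mu = drift if i == target_index else -drift
            evidence[i] += mu * dt + rng.gauss(0.0, sigma)
            if evidence[i] >= bound:
                return "present", t
            if evidence[i] <= -bound:
                finished[i] = True
        if all(finished):
            return "absent", t

rng = random.Random(1)
decision, rt = parallel_race_trial(8, target_index=0, rng=rng)
```

Set-size effects fall out naturally: "absent" responses require an exhaustive race, so RT grows with `n_items`.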
DOE Office of Scientific and Technical Information (OSTI.GOV)
Fischer, G.A.
2011-07-01
Document available in abstract form only, full text of document follows: The dosimetry from the H. B. Robinson Unit 2 Pressure Vessel Benchmark is analyzed with a suite of Westinghouse-developed codes and data libraries. The radiation transport from the reactor core to the surveillance capsule and ex-vessel locations is performed by RAPTOR-M3G, a parallel deterministic radiation transport code that calculates high-resolution neutron flux information in three dimensions. The cross-section library used in this analysis is the ALPAN library, an Evaluated Nuclear Data File (ENDF)/B-VII.0-based library designed for reactor dosimetry and fluence analysis applications. Dosimetry is evaluated with the industry-standard SNLRML reactor dosimetry cross-section data library. (authors)
Progress in Unsteady Turbopump Flow Simulations Using Overset Grid Systems
NASA Technical Reports Server (NTRS)
Kiris, Cetin C.; Chan, William; Kwak, Dochan
2002-01-01
This viewgraph presentation provides information on unsteady flow simulations for the Second Generation RLV (Reusable Launch Vehicle) baseline turbopump. Three impeller rotations were simulated using a 34.3 million grid point model. MPI/OpenMP hybrid parallelism and MLP shared-memory parallelism have been implemented and benchmarked in INS3D, an incompressible Navier-Stokes solver. For RLV turbopump simulations a speed-up of more than 30 times has been obtained. Moving boundary capability is obtained by using the DCF module. Scripting capability from CAD geometry to solution has been developed. Unsteady flow simulations for the advanced consortium impeller/diffuser using a 39 million grid point model are currently underway. 1.2 impeller rotations have been completed. The fluid/structure coupling has been initiated.
Support of Multidimensional Parallelism in the OpenMP Programming Model
NASA Technical Reports Server (NTRS)
Jin, Hao-Qiang; Jost, Gabriele
2003-01-01
OpenMP is the current standard for shared-memory programming. While providing ease of parallel programming, the OpenMP programming model also has limitations which often affect the scalability of applications. Examples of these limitations are work distribution and point-to-point synchronization among threads. We propose extensions to the OpenMP programming model which allow the user to easily distribute the work in multiple dimensions and synchronize the workflow among the threads. The proposed extensions include four new constructs and the associated runtime library. They do not require changes to the source code and can be implemented based on the existing OpenMP standard. We illustrate the concept in a prototype translator and test it with benchmark codes and a cloud modeling code.
Multi-Purpose, Application-Centric, Scalable I/O Proxy Application
DOE Office of Scientific and Technical Information (OSTI.GOV)
Miller, M. C.
2015-06-15
MACSio is a Multi-purpose, Application-Centric, Scalable I/O proxy application. It is designed to support a number of goals with respect to parallel I/O performance testing and benchmarking including the ability to test and compare various I/O libraries and I/O paradigms, to predict scalable performance of real applications and to help identify where improvements in I/O performance can be made within the HPC I/O software stack.
PeakRanger: A cloud-enabled peak caller for ChIP-seq data
2011-01-01
Background Chromatin immunoprecipitation (ChIP), coupled with massively parallel short-read sequencing (seq), is used to probe chromatin dynamics. Although there are many algorithms to call peaks from ChIP-seq datasets, most are tuned either to handle punctate sites, such as transcription factor binding sites, or broad regions, such as histone modification marks; few can do both. Other algorithms are limited in their configurability, performance on large data sets, and ability to distinguish closely-spaced peaks. Results In this paper, we introduce PeakRanger, a peak caller software package that works equally well on punctate and broad sites, can resolve closely-spaced peaks, has excellent performance, and is easily customized. In addition, PeakRanger can be run in a parallel cloud computing environment to obtain extremely high performance on very large data sets. We present a series of benchmarks to evaluate PeakRanger against 10 other peak callers, and demonstrate the performance of PeakRanger on both real and synthetic data sets. We also present real world usages of PeakRanger, including peak-calling in the modENCODE project. Conclusions Compared to other peak callers tested, PeakRanger offers improved resolution in distinguishing extremely closely-spaced peaks. PeakRanger has above-average spatial accuracy in terms of identifying the precise location of binding events. PeakRanger also has excellent sensitivity and specificity in all benchmarks evaluated. In addition, PeakRanger offers significant improvements in run time when running on a single processor system, and very marked improvements when allowed to take advantage of the MapReduce parallel environment offered by a cloud computing resource. PeakRanger can be downloaded at the official site of the modENCODE project: http://www.modencode.org/software/ranger/ PMID:21554709
Analysis of Wake VAS Benefits Using ACES Build 3.2.1: VAMS Type 1 Assessment
NASA Technical Reports Server (NTRS)
Smith, Jeremy C.
2005-01-01
The FAA and NASA are currently engaged in a Wake Turbulence Research Program to revise wake turbulence separation standards, procedures, and criteria to increase airport capacity while maintaining or increasing safety. The research program is divided into three phases: Phase I near term procedural enhancements; Phase II wind dependent Wake Vortex Advisory System (WakeVAS) Concepts of Operations (ConOps); and Phase III farther term ConOps based on wake prediction and sensing. The Phase III Wake VAS ConOps is one element of the Virtual Airspace Modelling and Simulation (VAMS) program blended concepts for enhancing the total system wide capacity of the National Airspace System (NAS). This report contains a VAMS Program Type 1 (stand-alone) assessment of the expected capacity benefits of Wake VAS at the 35 FAA Benchmark Airports and determines the consequent reduction in delay using the Airspace Concepts Evaluation System (ACES) Build 3.2.1 simulator.
Visualization of Unsteady Computational Fluid Dynamics
NASA Technical Reports Server (NTRS)
Haimes, Robert
1997-01-01
The current compute environment that most researchers are using for the calculation of 3D unsteady Computational Fluid Dynamics (CFD) results is a supercomputer-class machine. The Massively Parallel Processors (MPPs) such as the 160-node IBM SP2 at NAS and clusters of workstations acting as a single MPP (like NAS's SGI Power-Challenge array and the J90 cluster) provide the required computation bandwidth for CFD calculations of transient problems. If we follow the traditional computational analysis steps for CFD (and we wish to construct an interactive visualizer) we need to be aware of the following: (1) Disk space requirements. A single snapshot must contain at least the values (primitive variables) stored at the appropriate locations within the mesh. For most simple 3D Euler solvers that means 5 floating point words. Navier-Stokes solutions with turbulence models may contain 7 state variables. (2) Disk speed vs. computational speeds. The time required to read the complete solution of a saved time frame from disk is now longer than the compute time for a set number of iterations from an explicit solver. Depending on the hardware and solver, an iteration of an implicit code may also take less time than reading the solution from disk. If one examines the performance improvements of the last decade or two, it is easy to see that relying on disk performance (vs. CPU improvement) may not be the best method for enhancing interactivity. (3) Cluster and parallel machine I/O problems. Disk access time is much worse within current parallel machines and clusters of workstations that are acting in concert to solve a single problem. In this case we are not trying to read the volume of data; we are running the solver, and the solver outputs the solution. These traditional network interfaces must be used for the file system. (4) Numerics of particle traces. 
Most visualization tools can work on a single snapshot of the data, but some visualization tools for transient problems require dealing with time.
Renner, Franziska
2016-09-01
Monte Carlo simulations are regarded as the most accurate method of solving complex problems in the field of dosimetry and radiation transport. In (external) radiation therapy they are increasingly used for the calculation of dose distributions during treatment planning. In comparison to other algorithms for the calculation of dose distributions, Monte Carlo methods have the capability of improving the accuracy of dose calculations - especially under complex circumstances (e.g. consideration of inhomogeneities). However, there is a lack of knowledge of how accurate the results of Monte Carlo calculations are on an absolute basis. A practical verification of the calculations can be performed by direct comparison with the results of a benchmark experiment. This work presents such a benchmark experiment and compares its results (with detailed consideration of measurement uncertainty) with the results of Monte Carlo calculations using the well-established Monte Carlo code EGSnrc. The experiment was designed to have parallels to external beam radiation therapy with respect to the type and energy of the radiation, the materials used and the kind of dose measurement. Because the properties of the beam have to be well known in order to compare the results of the experiment and the simulation on an absolute basis, the benchmark experiment was performed using the research electron accelerator of the Physikalisch-Technische Bundesanstalt (PTB), whose beam was accurately characterized in advance. The benchmark experiment and the corresponding Monte Carlo simulations were carried out for two different types of ionization chambers and the results were compared. Considering the uncertainty, which is about 0.7 % for the experimental values and about 1.0 % for the Monte Carlo simulation, the results of the simulation and the experiment coincide. Copyright © 2015. Published by Elsevier GmbH.
Terminal Area Procedures for Paired Runways
NASA Technical Reports Server (NTRS)
Lozito, Sandy
2011-01-01
Parallel runway operations have been found to increase capacity within the National Airspace System (NAS); however, poor visibility conditions reduce this capacity [1]. Much research has been conducted to examine the concepts and procedures related to parallel runways; however, there has been no investigation of the procedures associated with the strategic and tactical pairing of aircraft for these operations. This study developed and examined the pilot and controller procedures and information requirements for creating aircraft pairs for parallel runway operations. The goal was to achieve aircraft pairing with a temporal separation of 15 s (+/- 10 s error) at a coupling point that is about 12 nmi from the runway threshold. Two variables were explored for the pilot participants: two levels of flight deck automation (current-day flight deck automation and a prototype future automation), as well as two flight deck displays that assisted in pilot conformance monitoring. The controllers were also provided with automation to help create and maintain aircraft pairs. Data showed that the operations in this study were found to be acceptable and safe. Workload when using the pairing procedures and tools was generally low for both controllers and pilots, and situation awareness (SA) was typically moderate to high. There were some differences based upon the display and automation conditions for the pilots. Future research should consider the refinement of the concepts and tools for pilot and controller displays and automation for parallel runway concepts.
A Fast MHD Code for Gravitationally Stratified Media using Graphical Processing Units: SMAUG
NASA Astrophysics Data System (ADS)
Griffiths, M. K.; Fedun, V.; Erdélyi, R.
2015-03-01
Parallelization techniques have been exploited most successfully by the gaming/graphics industry with the adoption of graphical processing units (GPUs), possessing hundreds of processor cores. The opportunity has been recognized by the computational sciences and engineering communities, who have recently harnessed successfully the numerical performance of GPUs. For example, parallel magnetohydrodynamic (MHD) algorithms are important for numerical modelling of highly inhomogeneous solar, astrophysical and geophysical plasmas. Here, we describe the implementation of SMAUG, the Sheffield Magnetohydrodynamics Algorithm Using GPUs. SMAUG is a 1-3D MHD code capable of modelling magnetized and gravitationally stratified plasma. The objective of this paper is to present the numerical methods and techniques used for porting the code to this novel and highly parallel compute architecture. The methods employed are justified by the performance benchmarks and validation results demonstrating that the code successfully simulates the physics for a range of test scenarios including a full 3D realistic model of wave propagation in the solar atmosphere.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Satake, Shin-ichi; Kanamori, Hiroyuki; Kunugi, Tomoaki
2007-02-01
We have developed a parallel algorithm for micro digital-holographic particle-tracking velocimetry. The algorithm is used for (1) numerical reconstruction of a particle image from a digital hologram, and (2) searching for particles. The numerical reconstruction from the digital hologram makes use of the Fresnel diffraction equation and the FFT (fast Fourier transform), whereas the particle search algorithm looks for local maxima of gradation in a reconstruction field represented by a 3D matrix. To achieve high-performance computing for both calculations (reconstruction and particle search), two memory partitions are allocated to the 3D matrix. In this matrix, the reconstruction part consists of horizontally placed 2D memory partitions on the x-y plane for the FFT, whereas the particle search part consists of vertically placed 2D memory partitions set along the z axis. Consequently, scalability is obtained in proportion to the number of processor elements; benchmarks were carried out for parallel computation on an SGI Altix machine.
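The reconstruction step described above pairs the Fresnel diffraction equation with an FFT: transform the hologram, multiply by a unit-modulus propagation kernel, and transform back. A 1-D pure-Python sketch (our own illustration, with a naive DFT standing in for the FFT and a generic phase array in place of the true Fresnel kernel):

```python
import cmath

def dft(x, sign=-1):
    """Naive O(n^2) discrete Fourier transform (stand-in for an FFT)."""
    n = len(x)
    return [sum(x[j] * cmath.exp(sign * 2j * cmath.pi * j * k / n)
                for j in range(n)) for k in range(n)]

def propagate(field, phase):
    """Angular-spectrum-style propagation: multiply the spectrum by a
    unit-modulus kernel exp(i*phase[k]) and transform back."""
    n = len(field)
    spec = dft(field, sign=-1)
    spec = [s * cmath.exp(1j * p) for s, p in zip(spec, phase)]
    return [v / n for v in dft(spec, sign=+1)]
```

Because the kernel has unit modulus, propagating and then propagating with the negated phase recovers the original field, which is a convenient sanity check for an implementation.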
NASA Astrophysics Data System (ADS)
Wang, Ting; Plecháč, Petr
2017-12-01
Stochastic reaction networks that exhibit bistable behavior are common in systems biology, materials science, and catalysis. Sampling of stationary distributions is crucial for understanding and characterizing the long-time dynamics of bistable stochastic dynamical systems. However, simulations are often hindered by the insufficient sampling of rare transitions between the two metastable regions. In this paper, we apply the parallel replica method for a continuous time Markov chain in order to improve sampling of the stationary distribution in bistable stochastic reaction networks. The proposed method uses parallel computing to accelerate the sampling of rare transitions. Furthermore, it can be combined with the path-space information bounds for parametric sensitivity analysis. With the proposed methodology, we study three bistable biological networks: the Schlögl model, the genetic switch network, and the enzymatic futile cycle network. We demonstrate the algorithmic speedup achieved in these numerical benchmarks. More significant acceleration is expected when multi-core or graphics processing unit computer architectures and programming tools such as CUDA are employed.
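The underlying simulation engine is the Gillespie stochastic simulation algorithm, and the parallel-replica idea can be sketched compactly. The following is our own toy illustration on a one-species birth-death chain (the rates, the well boundary, and the omission of the method's dephasing/decorrelation stages are simplifications, not the paper's setup):

```python
import random

def ssa_step(state, rates, rng):
    """One Gillespie step for a birth-death chain: returns (new_state, dt)."""
    birth, death = rates(state)
    total = birth + death
    dt = rng.expovariate(total)
    state += 1 if rng.random() < birth / total else -1
    return state, dt

def parallel_replica_escape(state0, in_well, rates, n_replicas, rng):
    """Parallel-replica sketch: run n_replicas independent copies from the
    same (assumed decorrelated) state until the first copy escapes the
    metastable set; the physical escape time is the *sum* of the times
    accumulated by all replicas up to that event."""
    states = [state0] * n_replicas
    times = [0.0] * n_replicas
    while True:
        for i in range(n_replicas):
            states[i], dt = ssa_step(states[i], rates, rng)
            times[i] += dt
            if not in_well(states[i]):
                return states[i], sum(times)
```

Because the replicas evolve independently, the inner loop is what a real implementation would distribute over cores or GPU threads; summing the replica clocks is what preserves the exponential escape-time statistics.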
INL Results for Phases I and III of the OECD/NEA MHTGR-350 Benchmark
DOE Office of Scientific and Technical Information (OSTI.GOV)
Gerhard Strydom; Javier Ortensi; Sonat Sen
2013-09-01
The Idaho National Laboratory (INL) Very High Temperature Reactor (VHTR) Technology Development Office (TDO) Methods Core Simulation group led the construction of the Organization for Economic Cooperation and Development (OECD) Modular High Temperature Reactor (MHTGR) 350 MW benchmark for comparing and evaluating prismatic VHTR analysis codes. The benchmark is sponsored by the OECD's Nuclear Energy Agency (NEA), and the project will yield a set of reference steady-state, transient, and lattice depletion problems that can be used by the Department of Energy (DOE), the Nuclear Regulatory Commission (NRC), and vendors to assess their code suites. The Methods group is responsible for defining the benchmark specifications, leading the data collection and comparison activities, and chairing the annual technical workshops. This report summarizes the latest INL results for Phase I (steady state) and Phase III (lattice depletion) of the benchmark. The INSTANT, Pronghorn and RattleSnake codes were used for the standalone core neutronics modeling of Exercise 1, and the results obtained from these codes are compared in Section 4. Exercise 2 of Phase I requires the standalone steady-state thermal fluids modeling of the MHTGR-350 design, and the results for the systems code RELAP5-3D are discussed in Section 5. The coupled neutronics and thermal fluids steady-state solution for Exercise 3 is reported in Section 6, utilizing the newly developed Parallel and Highly Innovative Simulation for INL Code System (PHISICS)/RELAP5-3D code suite. Finally, the lattice depletion models and results obtained for Phase III are compared in Section 7. The MHTGR-350 benchmark proved to be a challenging set of problems to model accurately, and even with the simplifications introduced in the benchmark specification this activity is an important step in the code-to-code verification of modern prismatic VHTR codes. 
A final OECD/NEA comparison report will compare the Phase I and III results of all other international participants in 2014, while the remaining Phase II transient case results will be reported in 2015.
Timme, Ruth E.; Rand, Hugh; Shumway, Martin; Trees, Eija K.; Simmons, Mustafa; Agarwala, Richa; Davis, Steven; Tillman, Glenn E.; Defibaugh-Chavez, Stephanie; Carleton, Heather A.; Klimke, William A.; Katz, Lee S.
2017-01-01
Background As next generation sequence technology has advanced, there have been parallel advances in genome-scale analysis programs for determining evolutionary relationships as proxies for epidemiological relationship in public health. Most new programs skip traditional steps of ortholog determination and multi-gene alignment, instead identifying variants across a set of genomes, then summarizing results in a matrix of single-nucleotide polymorphisms or alleles for standard phylogenetic analysis. However, public health authorities need to document the performance of these methods with appropriate and comprehensive datasets so they can be validated for specific purposes, e.g., outbreak surveillance. Here we propose a set of benchmark datasets to be used for comparison and validation of phylogenomic pipelines. Methods We identified four well-documented foodborne pathogen events in which the epidemiology was concordant with routine phylogenomic analyses (reference-based SNP and wgMLST approaches). These are ideal benchmark datasets, as the trees, WGS data, and epidemiological data for each are all in agreement. We have placed these sequence data, sample metadata, and “known” phylogenetic trees in publicly-accessible databases and developed a standard descriptive spreadsheet format describing each dataset. To facilitate easy downloading of these benchmarks, we developed an automated script that uses the standard descriptive spreadsheet format. Results Our “outbreak” benchmark datasets represent the four major foodborne bacterial pathogens (Listeria monocytogenes, Salmonella enterica, Escherichia coli, and Campylobacter jejuni) and one simulated dataset where the “known tree” can be accurately called the “true tree”. The downloading script and associated table files are available on GitHub: https://github.com/WGS-standards-and-analysis/datasets. 
Discussion These five benchmark datasets will help standardize comparison of current and future phylogenomic pipelines, and facilitate important cross-institutional collaborations. Our work is part of a global effort to provide collaborative infrastructure for sequence data and analytic tools—we welcome additional benchmark datasets in our recommended format, and, if relevant, we will add these on our GitHub site. Together, these datasets, dataset format, and the underlying GitHub infrastructure present a recommended path for worldwide standardization of phylogenomic pipelines. PMID:29372115
A Job Pause Service under LAM/MPI+BLCR for Transparent Fault Tolerance
DOE Office of Scientific and Technical Information (OSTI.GOV)
Wang, Chao; Mueller, Frank; Engelmann, Christian
2007-01-01
Checkpoint/restart (C/R) has become a requirement for long-running jobs in large-scale clusters due to a mean-time-to-failure (MTTF) on the order of hours. After a failure, C/R mechanisms generally require a complete restart of an MPI job from the last checkpoint. A complete restart, however, is unnecessary since all but one node are typically still alive. Furthermore, a restart may result in lengthy job requeuing even though the original job had not exceeded its time quantum. In this paper, we overcome these shortcomings. Instead of job restart, we have developed a transparent mechanism for job pause within LAM/MPI+BLCR. This mechanism allows live nodes to remain active and roll back to the last checkpoint while failed nodes are dynamically replaced by spares before resuming from the last checkpoint. Our methodology includes LAM/MPI enhancements in support of scalable group communication with a fluctuating number of nodes, reuse of network connections, transparent coordinated checkpoint scheduling, and a BLCR enhancement for job pause. Experiments in a cluster with the NAS Parallel Benchmark suite show that our overhead for job pause is comparable to that of a complete job restart. A minimal overhead of 5.6% is incurred only when migration takes place, while the regular checkpoint overhead remains unchanged. Yet, our approach alleviates the need to reboot the LAM run-time environment, which accounts for considerable overhead, resulting in net savings of our scheme in the experiments. Our solution further provides full transparency and automation with the additional benefit of reusing existing resources. Execution continues after failures within the scheduled job, i.e., the application staging overhead is not incurred again, in contrast to a restart. Our scheme offers additional potential for savings through incremental checkpointing and proactive diskless live migration, which we are currently working on.
NASA Astrophysics Data System (ADS)
Kim, Jae Wook
2013-05-01
This paper proposes a novel systematic approach for the parallelization of pentadiagonal compact finite-difference schemes and filters based on domain decomposition. The proposed approach allows a pentadiagonal banded matrix system to be split into quasi-disjoint subsystems by using a linear-algebraic transformation technique. As a result the inversion of pentadiagonal matrices can be implemented within each subdomain in an independent manner subject to a conventional halo-exchange process. The proposed matrix transformation leads to new subdomain boundary (SB) compact schemes and filters that require three halo terms to exchange with neighboring subdomains. The internode communication overhead in the present approach is equivalent to that of standard explicit schemes and filters based on seven-point discretization stencils. The new SB compact schemes and filters demand additional arithmetic operations compared to the original serial ones. However, it is shown that the additional cost becomes sufficiently low by choosing optimal sizes of their discretization stencils. Compared to earlier published results, the proposed SB compact schemes and filters successfully reduce parallelization artifacts arising from subdomain boundaries to a level sufficiently negligible for sophisticated aeroacoustic simulations without degrading parallel efficiency. The overall performance and parallel efficiency of the proposed approach are demonstrated by stringent benchmark tests.
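For reference, the kernel that each subdomain ultimately inverts in such schemes is a pentadiagonal solve. A sketch of banded Gaussian elimination for one such system (our own illustration of the serial baseline, assuming diagonal dominance so no pivoting is needed; this is not the paper's quasi-disjoint splitting):

```python
def solve_pentadiagonal(e, c, d, a, b, rhs):
    """Solve A x = rhs for pentadiagonal A by banded Gaussian elimination
    (no pivoting; assumes diagonal dominance). Row i holds, by column:
    e[i] at i-2, c[i] at i-1, d[i] at i, a[i] at i+1, b[i] at i+2;
    all diagonals are zero-padded to length n."""
    n = len(d)
    e, c, d = list(e), list(c), list(d)
    a, b, rhs = list(a), list(b), list(rhs)
    for i in range(n - 1):
        # Eliminate the first-subdiagonal entry c[i+1] using pivot row i.
        m = c[i + 1] / d[i]
        d[i + 1] -= m * a[i]
        a[i + 1] -= m * b[i]
        rhs[i + 1] -= m * rhs[i]
        if i + 2 < n:
            # Eliminate the second-subdiagonal entry e[i+2].
            m2 = e[i + 2] / d[i]
            c[i + 2] -= m2 * a[i]
            d[i + 2] -= m2 * b[i]
            rhs[i + 2] -= m2 * rhs[i]
    # Back substitution on the remaining upper-triangular band (d, a, b).
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = rhs[i]
        if i + 1 < n:
            s -= a[i] * x[i + 1]
        if i + 2 < n:
            s -= b[i] * x[i + 2]
        x[i] = s / d[i]
    return x
```

This serial recurrence is exactly the data dependency that makes naive domain decomposition hard, motivating the transformation to quasi-disjoint subsystems described in the abstract.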
Cavitation, Flow Structure and Turbulence in the Tip Region of a Rotor Blade
NASA Technical Reports Server (NTRS)
Wu, H.; Miorini, R.; Soranna, F.; Katz, J.; Michael, T.; Jessup, S.
2010-01-01
Objectives: Measure the flow structure and turbulence within a naval axial waterjet pump. Create a database for benchmarking and validation of parallel computational efforts. Address flow and turbulence modeling issues that are unique to this complex environment. Measure and model flow phenomena affecting cavitation within the pump and its effect on pump performance. This presentation focuses on cavitation phenomena and associated flow structure in the tip region of a rotor blade.
Time-Dependent Simulations of Incompressible Flow in a Turbopump Using Overset Grid Approach
NASA Technical Reports Server (NTRS)
Kiris, Cetin; Kwak, Dochan
2001-01-01
This viewgraph presentation provides information on mathematical modelling of the SSME (space shuttle main engine). The unsteady SSME-rig1 start-up procedure from the pump at rest has been initiated by using 34.3 million grid points. The computational model for the SSME-rig1 has been completed. Moving boundary capability is obtained by using DCF module in OVERFLOW-D. MPI (Message Passing Interface)/OpenMP hybrid parallel code has been benchmarked.
Quantitative phenotyping via deep barcode sequencing.
Smith, Andrew M; Heisler, Lawrence E; Mellor, Joseph; Kaper, Fiona; Thompson, Michael J; Chee, Mark; Roth, Frederick P; Giaever, Guri; Nislow, Corey
2009-10-01
Next-generation DNA sequencing technologies have revolutionized diverse genomics applications, including de novo genome sequencing, SNP detection, chromatin immunoprecipitation, and transcriptome analysis. Here we apply deep sequencing to genome-scale fitness profiling to evaluate yeast strain collections in parallel. This method, Barcode analysis by Sequencing, or "Bar-seq," outperforms the current benchmark barcode microarray assay in terms of both dynamic range and throughput. When applied to a complex chemogenomic assay, Bar-seq quantitatively identifies drug targets, with performance superior to the benchmark microarray assay. We also show that Bar-seq is well-suited for a multiplex format. We completely re-sequenced and re-annotated the yeast deletion collection using deep sequencing, found that approximately 20% of the barcodes and common priming sequences varied from expectation, and used this revised list of barcode sequences to improve data quality. Together, this new assay and analysis routine provide a deep-sequencing-based toolkit for identifying gene-environment interactions on a genome-wide scale.
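At its core, Bar-seq reduces to counting barcode occurrences across sequencing reads to estimate per-strain fitness. A toy sketch (our own simplification; the primer sequence, barcode length, and exact-match lookup here are invented for illustration and do not reflect the published protocol):

```python
from collections import Counter

def count_barcodes(reads, barcode_to_strain, primer="GTCGAC", bc_len=5):
    """Tally reads per strain: locate the common priming sequence in each
    read and take the next bc_len bases as the strain barcode. Reads with
    no primer, or with an unrecognized barcode, are skipped."""
    counts = Counter()
    for read in reads:
        pos = read.find(primer)
        if pos < 0:
            continue
        start = pos + len(primer)
        strain = barcode_to_strain.get(read[start:start + bc_len])
        if strain:
            counts[strain] += 1
    return counts

reads = ["AAGTCGACCATTA", "TTGTCGACGGGTTC", "GTCGACCATTA", "CCCCCC"]
mapping = {"CATTA": "strainA", "GGGTT": "strainB"}
print(count_barcodes(reads, mapping))
```

A production pipeline would additionally tolerate sequencing errors (e.g. near-match barcode assignment), which is one reason re-annotating the true barcode sequences, as the authors did, improves data quality.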
Hamdy, M; Hamdan, I
2015-07-01
In this paper, a robust H∞ fuzzy output feedback controller is designed for a class of affine nonlinear systems with disturbance via a Takagi-Sugeno (T-S) fuzzy bilinear model. The parallel distributed compensation (PDC) technique is utilized to design a fuzzy controller. The stability conditions of the overall closed-loop T-S fuzzy bilinear model are formulated in terms of a Lyapunov function via linear matrix inequalities (LMIs). The control law is robustified in the H∞ sense to attenuate external disturbance. Moreover, the desired controller gains can be obtained by solving a set of LMIs. A continuous stirred tank reactor (CSTR), which is a benchmark problem in nonlinear process control, is discussed in detail to verify the effectiveness of the proposed approach with a comparative study. Copyright © 2014 ISA. Published by Elsevier Ltd. All rights reserved.
Running Neuroimaging Applications on Amazon Web Services: How, When, and at What Cost?
Madhyastha, Tara M; Koh, Natalie; Day, Trevor K M; Hernández-Fernández, Moises; Kelley, Austin; Peterson, Daniel J; Rajan, Sabreena; Woelfer, Karl A; Wolf, Jonathan; Grabowski, Thomas J
2017-01-01
The contribution of this paper is to identify and describe current best practices for using Amazon Web Services (AWS) to execute neuroimaging workflows "in the cloud." Neuroimaging offers a vast set of techniques by which to interrogate the structure and function of the living brain. However, many of the scientists for whom neuroimaging is an extremely important tool have limited training in parallel computation. At the same time, the field is experiencing a surge in computational demands, driven by a combination of data-sharing efforts, improvements in scanner technology that allow acquisition of images with higher image resolution, and by the desire to use statistical techniques that stress processing requirements. Most neuroimaging workflows can be executed as independent parallel jobs and are therefore excellent candidates for running on AWS, but the overhead of learning to do so and determining whether it is worth the cost can be prohibitive. In this paper we describe how to identify neuroimaging workloads that are appropriate for running on AWS, how to benchmark execution time, and how to estimate cost of running on AWS. By benchmarking common neuroimaging applications, we show that cloud computing can be a viable alternative to on-premises hardware. We present guidelines that neuroimaging labs can use to provide a cluster-on-demand type of service that should be familiar to users, and scripts to estimate cost and create such a cluster.
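The benchmark-then-estimate workflow described here reduces to simple arithmetic once per-application wall time has been measured: multiply instance-hours by an hourly price and add storage. A hedged sketch (the function and its price parameters are placeholders; a real estimate should use current AWS pricing, which varies by instance type and region):

```python
def estimate_aws_cost(wall_hours_per_job, n_jobs, instance_price_per_hour,
                      jobs_per_instance=1, storage_gb=0.0,
                      storage_price_gb_month=0.023):
    """Rough monthly cost for an embarrassingly parallel neuroimaging workload.

    instance_price_per_hour and storage_price_gb_month are assumed example
    rates, not authoritative AWS figures.
    """
    instance_hours = wall_hours_per_job * n_jobs / jobs_per_instance
    compute = instance_hours * instance_price_per_hour
    storage = storage_gb * storage_price_gb_month  # e.g. object storage, one month
    return compute + storage
```

For example, 100 independent 2-hour jobs on a hypothetical $0.10/hour instance come to $20 of compute, which can then be weighed against on-premises hardware amortization.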
Dockins, James; Abuzahrieh, Ramzi; Stack, Martin
2015-01-01
To translate and adapt an effective, validated, benchmarked, and widely used patient satisfaction measurement tool for use with an Arabic-speaking population. The study comprised translation of the survey's items, development of the survey administration process, evaluation of reliability, and international benchmarking, conducted at a three-hundred-bed tertiary care hospital in Jeddah, Saudi Arabia, with 645 patients discharged during 2011 from the hospital's inpatient care units. The Hospital Consumer Assessment of Healthcare Providers and Systems (HCAHPS) instrument was translated into Arabic, a randomized weekly sample of patients was selected, and the survey was administered via telephone during 2011 to patients or their relatives. Scores were compiled for each of the HCAHPS questions and then for each of the six HCAHPS clinical composites, two non-clinical items, and two global items. Clinical composite scores, as well as the two non-clinical and two global items, were analyzed for the 645 respondents. Spearman's correlation coefficients and Cronbach's alpha demonstrated acceptable internal consistency for the clinical composites (Spearman's correlation coefficient = 0.327-0.750, P < 0.01; Cronbach's alpha = 0.516-0.851). All ten HCAHPS measures were compared quarterly to US national averages, with results that closely paralleled the US benchmarks. The Arabic translation and adaptation of the HCAHPS is a valid, reliable, and feasible tool for evaluation and benchmarking of inpatient satisfaction in Arabic-speaking populations.
The Scalable Checkpoint/Restart Library
DOE Office of Scientific and Technical Information (OSTI.GOV)
Moody, A.
The Scalable Checkpoint/Restart (SCR) library provides an interface that codes may use to write out and read in application-level checkpoints in a scalable fashion. In the current implementation, checkpoint files are cached in local storage (hard disk or RAM disk) on the compute nodes. This technique provides scalable aggregate bandwidth and uses storage resources that are fully dedicated to the job. This approach addresses the two common drawbacks of checkpointing a large-scale application to a shared parallel file system, namely, limited bandwidth and file system contention. In fact, on current platforms, SCR scales linearly with the number of compute nodes. It has been benchmarked as high as 720 GB/s on 1094 nodes of Atlas, which is nearly two orders of magnitude faster than the parallel file system.
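Because each compute node writes its checkpoint to storage dedicated to the job, aggregate bandwidth grows with node count instead of contending for a shared file system. A back-of-the-envelope model of that linear scaling (the per-node rate is an assumed parameter for illustration, not a figure from the report):

```python
def aggregate_bandwidth_gb_s(nodes, per_node_gb_s):
    """Node-local checkpointing: aggregate bandwidth scales linearly with nodes."""
    return nodes * per_node_gb_s

def checkpoint_seconds(total_gb, nodes, per_node_gb_s):
    """Time to write one application-level checkpoint of total_gb across the job."""
    return total_gb / aggregate_bandwidth_gb_s(nodes, per_node_gb_s)
```

Under this model, doubling the node count doubles write bandwidth, whereas a shared parallel file system's bandwidth stays fixed, which is the gap behind the reported two-orders-of-magnitude difference.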
GLAD: a system for developing and deploying large-scale bioinformatics grid.
Teo, Yong-Meng; Wang, Xianbing; Ng, Yew-Kwong
2005-03-01
Grid computing is used to solve large-scale bioinformatics problems with gigabyte-scale databases by distributing the computation across multiple platforms. To date, developing bioinformatics grid applications has been extremely tedious: one must design and implement the component algorithms and parallelization techniques for different classes of problems, and access remotely located sequence database files of varying formats across the grid. In this study, we propose a grid programming toolkit, GLAD (Grid Life sciences Applications Developer), which facilitates the development and deployment of bioinformatics applications on a grid. GLAD has been developed using ALiCE (Adaptive scaLable Internet-based Computing Engine), a Java-based grid middleware which exploits task-based parallelism. Two bioinformatics benchmark applications, namely distributed sequence comparison and distributed progressive multiple sequence alignment, have been developed using GLAD.
Data Acquisition and Linguistic Resources
NASA Astrophysics Data System (ADS)
Strassel, Stephanie; Christianson, Caitlin; McCary, John; Staderman, William; Olive, Joseph
All human language technology demands substantial quantities of data for system training and development, plus stable benchmark data to measure ongoing progress. While creation of high quality linguistic resources is both costly and time consuming, such data has the potential to profoundly impact not just a single evaluation program but language technology research in general. GALE's challenging performance targets demand linguistic data on a scale and complexity never before encountered. Resources cover multiple languages (Arabic, Chinese, and English) and multiple genres -- both structured (newswire and broadcast news) and unstructured (web text, including blogs and newsgroups, and broadcast conversation). These resources include significant volumes of monolingual text and speech, parallel text, and transcribed audio combined with multiple layers of linguistic annotation, ranging from word aligned parallel text and Treebanks to rich semantic annotation.
Parallel 3D-TLM algorithm for simulation of the Earth-ionosphere cavity
NASA Astrophysics Data System (ADS)
Toledo-Redondo, Sergio; Salinas, Alfonso; Morente-Molinera, Juan Antonio; Méndez, Antonio; Fornieles, Jesús; Portí, Jorge; Morente, Juan Antonio
2013-03-01
A parallel 3D algorithm for solving time-domain electromagnetic problems with arbitrary geometries is presented. The technique employed is the Transmission Line Modeling (TLM) method implemented in Shared Memory (SM) environments. The benchmarking performed reveals that the maximum speedup depends on the memory size of the problem as well as multiple hardware factors, such as the arrangement of CPUs, cache, and memory. A maximum speedup of 15 has been measured for the largest problem. In certain circumstances of low memory requirements, superlinear speedup is achieved using our algorithm. The algorithm is employed to model the Earth-ionosphere cavity, thus enabling a study of the natural electromagnetic phenomena that occur in it. It allows complete 3D simulations of the cavity with a resolution of 10 km within a reasonable timescale.
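Speedup figures like those reported here follow from wall-clock times, S(p) = T(1)/T(p), and "superlinear" means S(p) > p, typically a cache effect when the per-CPU working set shrinks. A small helper illustrating the bookkeeping (the timings in the usage below are made-up numbers, not the paper's measurements):

```python
def speedup(times):
    """times: {processor_count: wall_time_seconds}; returns {p: T(1)/T(p)}."""
    t1 = times[1]
    return {p: t1 / t for p, t in times.items()}

def is_superlinear(times, p):
    """Superlinear speedup: S(p) exceeds the processor count p."""
    return speedup(times)[p] > p
```

For example, a run taking 100 s serially and 20 s on 4 CPUs gives S(4) = 5, i.e. superlinear.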
Coordinated Parallel Runway Approaches
NASA Technical Reports Server (NTRS)
Koczo, Steve
1996-01-01
The current air traffic environment in airport terminal areas experiences substantial delays when weather conditions deteriorate to Instrument Meteorological Conditions (IMC). Expected future increases in air traffic will put additional pressures on the National Airspace System (NAS) and will further compound the high costs associated with airport delays. To address this problem, NASA has embarked on a program to address Terminal Area Productivity (TAP). The goals of the TAP program are to provide increased efficiencies in air traffic during the approach, landing, and surface operations in low-visibility conditions. The ultimate goal is to achieve efficiencies of terminal area flight operations commensurate with Visual Meteorological Conditions (VMC) at current or improved levels of safety.
Long-range interactions and parallel scalability in molecular simulations
NASA Astrophysics Data System (ADS)
Patra, Michael; Hyvönen, Marja T.; Falck, Emma; Sabouri-Ghomi, Mohsen; Vattulainen, Ilpo; Karttunen, Mikko
2007-01-01
Typical biomolecular systems such as cellular membranes, DNA, and protein complexes are highly charged. Thus, efficient and accurate treatment of electrostatic interactions is of great importance in computational modeling of such systems. We have employed the GROMACS simulation package to perform extensive benchmarking of different commonly used electrostatic schemes on a range of computer architectures (Pentium-4, IBM Power 4, and Apple/IBM G5) for single-processor and parallel performance up to 8 nodes. We have also tested the scalability on four different networks: Infiniband, Gigabit Ethernet, Fast Ethernet, and a nearly uniform memory architecture, i.e., one in which communication between CPUs is possible by directly reading from or writing to other CPUs' local memory. It turns out that the particle-mesh Ewald method (PME) performs surprisingly well and offers competitive performance unless parallel runs on PC hardware with older network infrastructure are needed. Lipid bilayers of 128, 512 and 2048 lipid molecules were used as the test systems, representing typical cases encountered in biomolecular simulations. Our results enable an accurate prediction of computational speed on most current computing systems, both for serial and parallel runs. These results should be helpful in, for example, choosing the most suitable configuration for a small departmental computer cluster.
High-performance computational fluid dynamics: a custom-code approach
NASA Astrophysics Data System (ADS)
Fannon, James; Loiseau, Jean-Christophe; Valluri, Prashant; Bethune, Iain; Náraigh, Lennon Ó.
2016-07-01
We introduce a modified and simplified version of the pre-existing, fully parallelized three-dimensional Navier-Stokes flow solver known as TPLS. We demonstrate how the simplified version can be used as a pedagogical tool for the study of computational fluid dynamics (CFD) and parallel computing. TPLS is at its heart a two-phase flow solver, and uses calls to a range of external libraries to accelerate its performance. However, in the present context we narrow the focus of the study to basic hydrodynamics and parallel computing techniques, and the code is therefore simplified and modified to simulate pressure-driven single-phase flow in a channel, using only relatively simple Fortran 90 code with MPI parallelization, but no calls to any other external libraries. The modified code is analysed in order to both validate its accuracy and investigate its scalability up to 1000 CPU cores. Simulations are performed for several benchmark cases in pressure-driven channel flow, including a turbulent simulation, wherein the turbulence is incorporated via the large-eddy simulation technique. The work may be of use to advanced undergraduate and graduate students as an introductory study in CFD, while also providing insight for those interested in more general aspects of high-performance computing.
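Pressure-driven single-phase laminar channel flow is a convenient validation case because it has the exact Poiseuille solution u(y) = G y (H - y) / (2μ) for pressure gradient G, viscosity μ, and channel height H. A serial NumPy sketch of such an accuracy check (a stand-in for the Fortran 90/MPI code, with arbitrary grid size and parameters):

```python
import numpy as np

def solve_channel(n=65, G=1.0, mu=1.0, H=1.0):
    """Second-order finite-difference solve of mu * u'' = -G with u(0) = u(H) = 0."""
    y = np.linspace(0.0, H, n)
    dy = y[1] - y[0]
    m = n - 2  # interior unknowns; walls are held at u = 0
    A = (np.diag(-2.0 * np.ones(m))
         + np.diag(np.ones(m - 1), 1)
         + np.diag(np.ones(m - 1), -1))
    b = -G / mu * dy**2 * np.ones(m)
    u = np.zeros(n)
    u[1:-1] = np.linalg.solve(A, b)
    return y, u

def poiseuille(y, G=1.0, mu=1.0, H=1.0):
    """Exact laminar profile for pressure-driven channel flow."""
    return G / (2.0 * mu) * y * (H - y)
```

Because the exact solution is quadratic, the second-order scheme reproduces it to machine precision, with peak velocity G H²/(8μ) at mid-channel.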
Scott, Angela C; MacKinnon, Michael D; Fedorak, Phillip M
2005-11-01
Naphthenic acids (NAs) are natural constituents in many petroleum sources, including bitumen in the oil sands of Northern Alberta, Canada. Bitumen extraction processes produce tailings waters that cannot be discharged to the environment because NAs are acutely toxic to aquatic species. However, aerobic biodegradation reduces the toxic character of NAs. In this study, four commercial NAs and the NAs in two oil sands tailings waters were characterized by gas chromatography-mass spectrometry. These NAs were also incubated with microorganisms in the tailings waters under aerobic, laboratory conditions. The NAs in the commercial preparations had lower molecular masses than the NAs in the tailings waters. The commercial NAs were biodegraded within 14 days, but only about 25% of the NAs native to the tailings waters were removed after 40-49 days. These results show that low molecular mass NAs (C < or =17) are more readily biodegraded than high molecular mass NAs (C > or =18). Moreover, the results indicate that biodegradation studies using commercial NAs alone will not accurately reflect the potential biodegradability of NAs in the oil sands tailings waters.
GRAMM-X public web server for protein–protein docking
Tovchigrechko, Andrey; Vakser, Ilya A.
2006-01-01
Protein docking software GRAMM-X and its web interface () extend the original GRAMM Fast Fourier Transformation methodology by employing smoothed potentials, refinement stage, and knowledge-based scoring. The web server frees users from complex installation of database-dependent parallel software and maintaining large hardware resources needed for protein docking simulations. Docking problems submitted to GRAMM-X server are processed by a 320 processor Linux cluster. The server was extensively tested by benchmarking, several months of public use, and participation in the CAPRI server track. PMID:16845016
NASA Technical Reports Server (NTRS)
Mattmann, Chris A.; Kelly, Sean; Crichton, Daniel J.; Hughes, J. Steven; Hardman, Sean; Ramirez, Paul; Joyner, Ron
2006-01-01
In this paper, we present a preliminary study of several different electronic data movement technologies. We detail our approach to classifying the technologies included in our study and present the preliminary results of some initial performance benchmarking. Our studies suggest that highly parallel TCP/IP streaming technologies, such as GridFTP and bbFTP, outperform commercial and open-source UDP-bursting technologies in several of the key data movement dimensions that we studied.
Personal supercomputing by using transputer and Intel 80860 in plasma engineering
NASA Astrophysics Data System (ADS)
Ido, S.; Aoki, K.; Ishine, M.; Kubota, M.
1992-09-01
A transputer (T800) or a 64-bit RISC Intel 80860 (i860) added to a personal computer can be used as an accelerator. When 32-bit T800s in a parallel system or 64-bit i860s are used, scientific calculations are carried out several tens of times faster than on commonly used 32-bit personal computers or UNIX workstations. Benchmark tests and examples of physical simulations using T800s and i860s are reported.
1996-01-01
Figure 3-1 shows pseudo-code for a test bench with two application nodes. The outer test bench wrapper consists of three functions: pipeline_init, pipeline, and exit_func. The application wrapper is contained in the pipeline routine and similarly consists of a set of wrapper functions.
Behavioural abnormalities of the hyposulphataemic Nas1 knock-out mouse.
Dawson, Paul Anthony; Steane, Sarah Elizabeth; Markovich, Daniel
2004-10-05
We recently generated a sodium sulphate cotransporter knock-out mouse (Nas1-/-) which has increased urinary sulphate excretion and hyposulphataemia. To examine the consequences of disturbed sulphate homeostasis in the modulation of mouse behavioural characteristics, Nas1-/- mice were compared with Nas1+/- and Nas1+/+ littermates in a series of behavioural tests. The Nas1-/- mice displayed significantly (P < 0.001) decreased marble burying behaviour (4.33 +/- 0.82 buried) when compared to Nas1+/+ (7.86 +/- 0.44) and Nas1+/- (8.40 +/- 0.37) animals, suggesting that Nas1-/- mice may have decreased object-induced anxiety. The Nas1-/- mice also displayed decreased locomotor activity by moving less distance (1.53 +/- 0.27 m, P < 0.05) in an open-field test when compared to Nas1+/+ (2.31 +/- 0.24 m) and Nas1+/- (2.15 +/- 0.19 m) mice. The three genotypes displayed similar spatiotemporal and ethological behaviours in the elevated-plus maze and open-field test, with the exception of a decreased defecation frequency by the Nas1-/- mice (40% reduction, P < 0.01). There were no significant differences between Nas1-/- and Nas1+/+ mice in a rotarod performance test of motor coordination and in the forced swim test assessing (anti-)depressant-like behaviours. This is the first study to demonstrate behavioural abnormalities in the hyposulphataemic Nas1-/- mice.
Parallelization of sequential Gaussian, indicator and direct simulation algorithms
NASA Astrophysics Data System (ADS)
Nunes, Ruben; Almeida, José A.
2010-08-01
Improving the performance and robustness of algorithms on new high-performance parallel computing architectures is a key issue in efficiently performing 2D and 3D studies with large amounts of data. In geostatistics, sequential simulation algorithms are good candidates for parallelization. When compared with other computational applications in geosciences (such as fluid flow simulators), sequential simulation software is not extremely computationally intensive, but parallelization can make it more efficient and creates alternatives for its integration in inverse modelling approaches. This paper describes the implementation and benchmarking of a parallel version of the three classic sequential simulation algorithms: direct sequential simulation (DSS), sequential indicator simulation (SIS) and sequential Gaussian simulation (SGS). For this purpose, the source used was GSLIB, but the entire code was extensively modified to take into account the parallelization approach and was also rewritten in the C programming language. The paper also explains in detail the parallelization strategy and the main modifications. Regarding the integration of secondary information, the DSS algorithm is able to perform simple kriging with local means, kriging with an external drift and collocated cokriging with both local and global correlations. SIS includes a local correction of probabilities. Finally, a brief comparison is presented of simulation results using one, two and four processors. All performance tests were carried out on 2D soil data samples. The source code is completely open source and easy to read. It should be noted that the code is only fully compatible with Microsoft Visual C and should be adapted for other systems/compilers.
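For readers unfamiliar with the serial kernel being parallelized: sequential Gaussian simulation visits grid nodes along a random path, krige-estimates a conditional mean and variance at each node from previously simulated values, and draws from that normal distribution. A minimal 1D illustration using simple kriging with an exponential covariance (the parameters and covariance model are arbitrary choices for illustration, not GSLIB defaults, and the sketch omits search neighborhoods and data conditioning):

```python
import numpy as np

def sgs_1d(xs, var_range=2.0, sill=1.0, seed=0):
    """Sequential Gaussian simulation on 1D coordinates with simple kriging."""
    rng = np.random.default_rng(seed)
    cov = lambda h: sill * np.exp(-3.0 * np.abs(h) / var_range)  # exponential model
    values = np.empty(len(xs))
    known_x, known_v = [], []
    for i in rng.permutation(len(xs)):          # random simulation path
        if not known_x:
            mean, var = 0.0, sill               # unconditional first draw
        else:
            kx = np.asarray(known_x)
            K = cov(kx[:, None] - kx[None, :])  # data-to-data covariances
            k = cov(kx - xs[i])                 # data-to-target covariances
            w = np.linalg.solve(K + 1e-10 * np.eye(len(kx)), k)
            mean = float(w @ np.asarray(known_v))
            var = max(float(sill - w @ k), 0.0)  # simple-kriging variance
        values[i] = rng.normal(mean, np.sqrt(var))
        known_x.append(xs[i])
        known_v.append(values[i])
    return values
```

The loop-carried dependence on previously simulated nodes is exactly what makes parallelizing these algorithms non-trivial, as the paper discusses.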
A Next-Generation Parallel File System Environment for the OLCF
DOE Office of Scientific and Technical Information (OSTI.GOV)
Dillow, David A; Fuller, Douglas; Gunasekaran, Raghul
2012-01-01
When deployed in 2008/2009, the Spider system at the Oak Ridge National Laboratory's Leadership Computing Facility (OLCF) was the world's largest-scale Lustre parallel file system. Envisioned as a shared parallel file system capable of delivering both the bandwidth and capacity requirements of the OLCF's diverse computational environment, Spider has since become a blueprint for shared Lustre environments deployed worldwide. Designed to support the parallel I/O requirements of the Jaguar XT5 system and other smaller-scale platforms at the OLCF, the upgrade to the Titan XK6 heterogeneous system will begin to push the limits of Spider's original design by mid 2013. With a doubling in total system memory and a 10x increase in FLOPS, Titan will require both higher bandwidth and larger total capacity. Our goal is to provide a 4x increase in total I/O bandwidth, from over 240 GB/sec today to 1 TB/sec, and a doubling in total capacity. While aggregate bandwidth and total capacity remain important capabilities, an equally important goal in our efforts is dramatically increasing metadata performance, currently the Achilles heel of parallel file systems at leadership scale. We present in this paper an analysis of our current I/O workloads, our operational experiences with the Spider parallel file systems, the high-level design of our Spider upgrade, and our efforts in developing benchmarks that synthesize our performance requirements based on our workload characterization studies.
Marentette, Julie R; Frank, Richard A; Bartlett, Adrienne J; Gillis, Patricia L; Hewitt, L Mark; Peru, Kerry M; Headley, John V; Brunswick, Pamela; Shang, Dayue; Parrott, Joanne L
2015-07-01
Naphthenic acids (NAs) are constituents of oil sands process-affected water (OSPW). These compounds can be both toxic and persistent and thus are a primary concern for the ultimate remediation of tailings ponds in northern Alberta's oil sands regions. Recent research has focused on the toxicity of NAs to the highly vulnerable early life-stages of fish. Here we examined fathead minnow embryonic survival, growth and deformities after exposure to extracted NA fraction components (NAFCs), from fresh and aged oil sands process-affected water (OSPW), as well as commercially available NA mixtures. Commercial NA mixtures were dominated by acyclic O2 species, while NAFCs from OSPW were dominated by bi- and tricyclic O2 species. Fathead minnow embryos less than 24h old were reared in tissue culture plates terminating at hatch. Both NAFC and commercial NA mixtures reduced hatch success, although NAFCs from OSPW were less toxic (EC50=5-12mg/L, nominal concentrations) than commercial NAs (2mg/L, nominal concentrations). The toxicities of NAFCs from aged and fresh OSPW were similar. Embryonic heart rates at 2 days post-fertilization (dpf) declined with increasing NAFC exposure, paralleling patterns of hatch success and rates of cardiovascular abnormalities (e.g., pericardial edemas) at hatch. Finfold deformities increased in exposures to commercial NA mixtures, not NAFCs. Thus, commercial NA mixtures are not appropriate surrogates for NAFC toxicity. Further work clarifying the mechanisms of action of NAFCs in OSPW, as well as comparisons with additional aged sources of OSPW, is merited. Crown Copyright © 2015. Published by Elsevier B.V. All rights reserved.
OpenMP performance for benchmark 2D shallow water equations using LBM
NASA Astrophysics Data System (ADS)
Sabri, Khairul; Rabbani, Hasbi; Gunawan, Putu Harry
2018-03-01
Shallow water equations, commonly referred to as the Saint-Venant equations, are used to model fluid phenomena. These equations can be solved numerically using several methods, such as the Lattice Boltzmann method (LBM), SIMPLE-like methods, the finite difference method, Godunov-type methods, and the finite volume method. In this paper, the shallow water equations are approximated using the LBM (known as LABSWE) and simulated in parallel using OpenMP. To evaluate the performance of the parallel algorithm with 2 and 4 threads, ten different grid sizes Lx and Ly are examined. The results show that on the OpenMP platform, the computational time for solving LABSWE can be decreased. For instance, for a grid size of 1000 × 500, the computation times with 2 and 4 threads were observed to be 93.54 s and 333.243 s, respectively.
Laszlo, Sarah; Plaut, David C
2012-03-01
The Parallel Distributed Processing (PDP) framework has significant potential for producing models of cognitive tasks that approximate how the brain performs the same tasks. To date, however, there has been relatively little contact between PDP modeling and data from cognitive neuroscience. In an attempt to advance the relationship between explicit, computational models and physiological data collected during the performance of cognitive tasks, we developed a PDP model of visual word recognition which simulates key results from the ERP reading literature, while simultaneously being able to successfully perform lexical decision, a benchmark task for reading models. Simulations reveal that the model's success depends on the implementation of several neurally plausible features in its architecture which are sufficiently domain-general to be relevant to cognitive modeling more generally. Copyright © 2011 Elsevier Inc. All rights reserved.
A CPU/MIC Collaborated Parallel Framework for GROMACS on Tianhe-2 Supercomputer.
Peng, Shaoliang; Yang, Shunyun; Su, Wenhe; Zhang, Xiaoyu; Zhang, Tenglilang; Liu, Weiguo; Zhao, Xingming
2017-06-16
Molecular Dynamics (MD) is the simulation of the dynamic behavior of atoms and molecules. As the most popular software for molecular dynamics, GROMACS cannot work on large-scale data because of limited computing resources. In this paper, we propose a CPU and Intel® Xeon Phi Many Integrated Core (MIC) collaborated parallel framework to accelerate GROMACS using the offload mode on a MIC coprocessor, with which the performance of GROMACS is improved significantly, especially on the Tianhe-2 supercomputer. Furthermore, we optimize GROMACS so that it can run on both the CPU and MIC at the same time. In addition, we accelerate multi-node GROMACS so that it can be used in practice. Benchmarking on real data, our accelerated GROMACS performs very well and reduces computation time significantly. Source code: https://github.com/tianhe2/gromacs-mic.
Porting AMG2013 to Heterogeneous CPU+GPU Nodes
DOE Office of Scientific and Technical Information (OSTI.GOV)
Samfass, Philipp
LLNL's future advanced technology system SIERRA will feature heterogeneous compute nodes that consist of IBM POWER9 CPUs and NVIDIA Volta GPUs. Conceptually, the motivation for such an architecture is quite straightforward: while GPUs are optimized for throughput on massively parallel workloads, CPUs strive to minimize latency for rather sequential operations. Yet, making optimal use of heterogeneous architectures raises new challenges for the development of scalable parallel software, e.g., with respect to work distribution. Porting LLNL's parallel numerical libraries to upcoming heterogeneous CPU+GPU architectures is therefore a critical factor for ensuring LLNL's future success in fulfilling its national mission. One of these libraries, called HYPRE, provides parallel solvers and preconditioners for large, sparse linear systems of equations. In the context of this internship project, I consider AMG2013, a proxy application for major parts of HYPRE that implements a benchmark for setting up and solving different systems of linear equations. In the following, I describe in detail how I ported multiple parts of AMG2013 to the GPU (Section 2) and present results for different experiments that demonstrate a successful parallel implementation on the heterogeneous machines Surface and Ray (Section 3). In Section 4, I give guidelines on how my code should be used. Finally, I conclude and give an outlook for future work (Section 5).
Occurrences and behaviors of naphthenic acids in a petroleum refinery wastewater treatment plant.
Wang, Beili; Wan, Yi; Gao, Yingxin; Zheng, Guomao; Yang, Min; Wu, Song; Hu, Jianying
2015-05-05
Naphthenic acids (NAs) are one class of compounds in wastewaters from petroleum industries that are known to cause toxic effects, and their removal from oilfield wastewater is an important challenge for remediation of large volumes of petrochemical effluents. The present study investigated occurrences and behaviors of total NAs and aromatic NAs in a refinery wastewater treatment plant, located in north China, which combined physicochemical and biological processes. Concentrations of total NAs were semiquantified to be 113-392 μg/L in wastewater from all the treatment units, and the percentage of aromatic NAs in total NAs was estimated to be 2.1-8.8%. The mass reduction for total NAs and aromatic NAs was 15±16% and 7.5±24%, respectively, after the physicochemical treatment. A much greater mass reduction (total NAs: 65±11%, aromatic NAs: 86±5%) was observed in the biological treatment units, and antiestrogenic activities observed in wastewater from physicochemical treatment units disappeared in the effluent of the activated sludge system. The distributions of mass fractions of NAs demonstrated that biodegradation via activated sludge was the major mechanism for removing alicyclic NAs, aromatic NAs, and related toxicities in the plant, and the polycyclic NA congener classes were relatively recalcitrant to biodegradation, which is a complete contrast to the preferential adsorption of NAs with higher cyclicity (low Z value). Removal efficiencies of total NAs were 73±17% in summer, higher than those in winter (53±15%), and the seasonal variation was possibly due to the relatively high microbial biotransformation activities in the activated sludge system in summer (indexed by O3-NAs/NAs).
The results of the investigations indicated that biotransformation of NA mixtures by the activated sludge system were largely affected by temperature, and employing an efficient adsorbent together with biodegradation processes would help cost-effectively remove NAs in petroleum effluents.
NASA Astrophysics Data System (ADS)
Helm, Anton; Vieira, Jorge; Silva, Luis; Fonseca, Ricardo
2016-10-01
Laser-driven accelerators have gained increased attention over the past decades. Typical modeling techniques for laser wakefield acceleration (LWFA) are based on particle-in-cell (PIC) simulations. PIC simulations, however, are very computationally expensive due to the disparity of the relevant scales, ranging from the laser wavelength, in the micrometer range, to the acceleration length, currently beyond the ten-centimeter range. To bridge these disparate scales, the ponderomotive guiding center (PGC) algorithm is a promising approach. By describing the evolution of the laser pulse envelope separately, only the scales larger than the plasma wavelength need to be resolved in the PGC algorithm, leading to speedups of several orders of magnitude. Previous work was limited to two dimensions. Here we present the implementation of the 3D version of a PGC solver in the massively parallel, fully relativistic PIC code OSIRIS. We extended the solver to include periodic boundary conditions and parallelization in all spatial dimensions. We present benchmarks for distributed and shared memory parallelization. We also discuss the stability of the PGC solver.
Vectorial finite elements for solving the radiative transfer equation
NASA Astrophysics Data System (ADS)
Badri, M. A.; Jolivet, P.; Rousseau, B.; Le Corre, S.; Digonnet, H.; Favennec, Y.
2018-06-01
The discrete ordinate method coupled with the finite element method is often used for the spatio-angular discretization of the radiative transfer equation. In this paper we attempt to improve upon such a discretization technique. Instead of using standard finite elements, we reformulate the radiative transfer equation using vectorial finite elements. In comparison to standard finite elements, this reformulation yields faster timings for the linear system assemblies, as well as for the solution phase when using scattering media. The proposed vectorial finite element discretization for solving the radiative transfer equation is cross-validated against a benchmark problem available in the literature. In addition, we have used the method of manufactured solutions to verify the order of accuracy of our discretization technique within different absorbing, scattering, and emitting media. For solving large problems of radiation on parallel computers, the vectorial finite element method is parallelized using domain decomposition. The proposed domain decomposition method scales to a large number of processes, and its performance is unaffected by changes in the optical thickness of the medium. Our parallel solver is used to solve a large-scale radiative transfer problem of the Kelvin-cell radiation.
Katouda, Michio; Naruse, Akira; Hirano, Yukihiko; Nakajima, Takahito
2016-11-15
A new parallel algorithm and its implementation for the RI-MP2 energy calculation utilizing peta-flop-class many-core supercomputers are presented. Two improvements over the previous algorithm (J. Chem. Theory Comput. 2013, 9, 5373) have been made: (1) a dual-level hierarchical parallelization scheme that enables the use of more than 10,000 Message Passing Interface (MPI) processes and (2) a new data communication scheme that reduces network communication overhead. A multi-node and multi-GPU implementation of the present algorithm is presented for calculations on a central processing unit (CPU)/graphics processing unit (GPU) hybrid supercomputer. Benchmark results of the new algorithm and its implementation using the K computer (CPU clustering system) and TSUBAME 2.5 (CPU/GPU hybrid system) demonstrate high efficiency. The peak performance of 3.1 PFLOPS is attained using 80,199 nodes of the K computer. The peak performance of the multi-node and multi-GPU implementation is 514 TFLOPS using 1349 nodes and 4047 GPUs of TSUBAME 2.5. © 2016 Wiley Periodicals, Inc.
Empirical study of parallel LRU simulation algorithms
NASA Technical Reports Server (NTRS)
Carr, Eric; Nicol, David M.
1994-01-01
This paper reports on the performance of five parallel algorithms for simulating a fully associative cache operating under the LRU (Least-Recently-Used) replacement policy. Three of the algorithms are SIMD, and are implemented on the MasPar MP-2 architecture. Two other algorithms are parallelizations of an efficient serial algorithm on the Intel Paragon. One SIMD algorithm is quite simple, but its cost is linear in the cache size. The two other SIMD algorithms are more complex, but have costs that are independent of the cache size. Both the second and third SIMD algorithms compute all stack distances; the second SIMD algorithm is completely general, whereas the third SIMD algorithm presumes and takes advantage of bounds on the range of reference tags. Both MIMD algorithms implemented on the Paragon are general and compute all stack distances; they differ in one step that may affect their respective scalability. We assess the strengths and weaknesses of these algorithms as a function of problem size and characteristics, and compare their performance on traces derived from execution of three SPEC benchmark programs.
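The stack-distance computation that all five parallel algorithms build on can be sketched serially. The following is a minimal illustration (not the MasPar or Paragon code) of why a single pass over a reference trace characterizes hits for every cache size at once:

```python
def stack_distances(trace):
    """Compute the LRU stack distance of each reference in a trace.

    The stack distance is the depth of the address in the LRU stack
    (1 = most recently used), or infinity on first touch.  A reference
    hits in a fully associative LRU cache of capacity C if and only if
    its stack distance is <= C, so one pass over the trace yields hit
    ratios for all cache sizes simultaneously.
    """
    stack = []   # stack[0] is the most-recently-used address
    dists = []
    for addr in trace:
        if addr in stack:
            d = stack.index(addr) + 1   # 1-based depth in the stack
            stack.remove(addr)
        else:
            d = float("inf")            # compulsory miss
        stack.insert(0, addr)           # addr becomes most recent
        dists.append(d)
    return dists
```

This serial version costs O(cache size) per reference in the worst case, which is exactly the cost profile the simple SIMD algorithm in the paper exhibits; the more complex algorithms avoid that dependence.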
Towards implementation of cellular automata in Microbial Fuel Cells.
Tsompanas, Michail-Antisthenis I; Adamatzky, Andrew; Sirakoulis, Georgios Ch; Greenman, John; Ieropoulos, Ioannis
2017-01-01
The Microbial Fuel Cell (MFC) is a bio-electrochemical transducer converting waste products into electricity using microbial communities. Cellular Automaton (CA) is a uniform array of finite-state machines that update their states in discrete time depending on states of their closest neighbors by the same rule. Arrays of MFCs could, in principle, act as massive-parallel computing devices with local connectivity between elementary processors. We provide a theoretical design of such a parallel processor by implementing CA in MFCs. We have chosen Conway's Game of Life as the 'benchmark' CA because this is the most popular CA, which also exhibits an enormously rich spectrum of patterns. Each cell of the Game of Life CA is realized using two MFCs. The MFCs are linked electrically and hydraulically. The model is verified via simulation of an electrical circuit demonstrating equivalent behaviours. The design is a first step towards future implementations of fully autonomous biological computing devices with massive parallelism. The energy independence of such devices counteracts their somewhat slow transitions, compared to silicon circuitry, between the different states during computation.
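As a point of reference for the MFC realization, the benchmark rule that each pair of MFCs must emulate can be stated in a few lines. This is a plain software sketch of Conway's Game of Life on a torus, not the electrochemical circuit model used in the paper:

```python
from collections import Counter

def life_step(alive, width, height):
    """One synchronous Game of Life update on a width x height torus.

    `alive` is a set of (x, y) live cells.  The rule: a live cell
    survives with 2 or 3 live neighbours; a dead cell is born with
    exactly 3.  Counting neighbour contributions from live cells only
    keeps the step proportional to the population, not the grid.
    """
    counts = Counter(
        ((x + dx) % width, (y + dy) % height)
        for (x, y) in alive
        for dx in (-1, 0, 1) for dy in (-1, 0, 1)
        if (dx, dy) != (0, 0)
    )
    return {cell for cell, n in counts.items()
            if n == 3 or (n == 2 and cell in alive)}
```

A blinker (three cells in a row) oscillates with period two under this rule, a convenient sanity check for any physical implementation.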
NASA Astrophysics Data System (ADS)
Romano, Paul Kollath
Monte Carlo particle transport methods are being considered as a viable option for high-fidelity simulation of nuclear reactors. While Monte Carlo methods offer several potential advantages over deterministic methods, there are a number of algorithmic shortcomings that would prevent their immediate adoption for full-core analyses. In this thesis, algorithms are proposed both to ameliorate the degradation in parallel efficiency typically observed for large numbers of processors and to offer a means of decomposing the large tally data that will be needed for reactor analysis. A nearest-neighbor fission bank algorithm was proposed and subsequently implemented in the OpenMC Monte Carlo code. A theoretical analysis of the communication pattern shows that the expected cost is O(√N), whereas traditional fission bank algorithms are O(N) at best. The algorithm was tested on two supercomputers, the Intrepid Blue Gene/P and the Titan Cray XK7, and demonstrated nearly linear parallel scaling up to 163,840 processor cores on a full-core benchmark problem. An algorithm for reducing the network communication arising from tally reduction was analyzed and implemented in OpenMC. The proposed algorithm groups only particle histories on a single processor into batches for tally purposes; in doing so, it prevents all network communication for tallies until the very end of the simulation. The algorithm was tested, again on a full-core benchmark, and shown to reduce network communication substantially. A model was developed to predict the impact of load imbalances on the performance of domain-decomposed simulations. The analysis demonstrated that load imbalances in domain-decomposed simulations arise from two distinct phenomena: non-uniform particle densities and non-uniform spatial leakage. The dominant performance penalty for domain decomposition was shown to come from these physical effects rather than from insufficient network bandwidth or high latency.
The model predictions were verified with measured data from simulations in OpenMC on a full-core benchmark problem. Finally, a novel algorithm for decomposing large tally data was proposed, analyzed, and implemented/tested in OpenMC. The algorithm relies on disjoint sets of compute processes and tally servers. The analysis showed that for a range of parameters relevant to LWR analysis, the tally server algorithm should perform with minimal overhead. Tests were performed on Intrepid and Titan and demonstrated that the algorithm did indeed perform well over a wide range of parameters. (Copies available exclusively from MIT Libraries, libraries.mit.edu/docs - docs mit.edu)
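The nearest-neighbor fission bank idea can be illustrated with a serial sketch: if the globally ordered bank is to be repartitioned evenly, prefix sums determine how many sites must cross each processor boundary, and each crossing is a send to an adjacent rank only. Under random sampling the per-boundary deviations behave like a random walk, which is where the O(√N) expected cost comes from. The function below is an illustrative reconstruction, not OpenMC's MPI implementation:

```python
def boundary_transfers(counts):
    """Sites that must cross each boundary of a 1-D chain of ranks.

    `counts[i]` is the number of fission sites rank i sampled this
    generation.  Returns a list t of length p-1 where t[i-1] > 0 means
    rank i-1 sends that many sites right to rank i, and t[i-1] < 0
    means rank i sends |t[i-1]| sites left.  After the transfers every
    rank holds (close to) total/p sites, and only neighbours talk.
    """
    p = len(counts)
    total = sum(counts)
    transfers = []
    prefix = 0
    for i in range(1, p):
        prefix += counts[i - 1]
        ideal_start = (i * total) // p   # ideal first global index of rank i
        transfers.append(prefix - ideal_start)
    return transfers
```

For example, with sampled counts [3, 5, 2, 2] on four ranks, the transfers [0, 2, 1] rebalance every rank to exactly three sites using nearest-neighbour sends only.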
Luque-Almagro, Victor M.; Manso, Isabel; Sullivan, Matthew J.; Rowley, Gary; Ferguson, Stuart J.; Moreno-Vivián, Conrado; Richardson, David J.; Gates, Andrew J.
2017-01-01
Transcriptional adaptation to nitrate-dependent anabolism by Paracoccus denitrificans PD1222 was studied. A total of 74 genes were induced in cells grown with nitrate as N-source compared with ammonium, including nasTSABGHC and ntrBC genes. The nasT and nasS genes were cotranscribed, although nasT was more strongly induced by nitrate than nasS. The nasABGHC genes constituted a transcriptional unit, which is preceded by a non-coding region containing hairpin structures involved in transcription termination. The nasTS and nasABGHC transcripts were detected at similar levels with nitrate or glutamate as N-source, but nasABGHC transcript was undetectable in ammonium-grown cells. The nitrite reductase NasG subunit was detected by two-dimensional polyacrylamide gel electrophoresis in cytoplasmic fractions from nitrate-grown cells, but it was not observed when either ammonium or glutamate was used as the N-source. The nasT mutant lacked both nasABGHC transcript and nicotinamide adenine dinucleotide (NADH)-dependent nitrate reductase activity. On the contrary, the nasS mutant showed similar levels of the nasABGHC transcript to the wild-type strain and displayed NasG protein and NADH–nitrate reductase activity with all N-sources tested, except with ammonium. Ammonium repression of nasABGHC was dependent on the Ntr system. The ntrBC and ntrYX genes were expressed at low levels regardless of the nitrogen source supporting growth. Mutational analysis of the ntrBCYX genes indicated that while ntrBC genes are required for nitrate assimilation, ntrYX genes can only partially restore growth on nitrate in the absence of ntrBC genes. The existence of a regulation mechanism for nitrate assimilation in P. denitrificans, by which nitrate induction operates at both transcriptional and translational levels, is proposed. PMID:28385879
Artificial sweeteners induce glucose intolerance by altering the gut microbiota.
Suez, Jotham; Korem, Tal; Zeevi, David; Zilberman-Schapira, Gili; Thaiss, Christoph A; Maza, Ori; Israeli, David; Zmora, Niv; Gilad, Shlomit; Weinberger, Adina; Kuperman, Yael; Harmelin, Alon; Kolodkin-Gal, Ilana; Shapiro, Hagit; Halpern, Zamir; Segal, Eran; Elinav, Eran
2014-10-09
Non-caloric artificial sweeteners (NAS) are among the most widely used food additives worldwide, regularly consumed by lean and obese individuals alike. NAS consumption is considered safe and beneficial owing to their low caloric content, yet supporting scientific data remain sparse and controversial. Here we demonstrate that consumption of commonly used NAS formulations drives the development of glucose intolerance through induction of compositional and functional alterations to the intestinal microbiota. These NAS-mediated deleterious metabolic effects are abrogated by antibiotic treatment, and are fully transferrable to germ-free mice upon faecal transplantation of microbiota configurations from NAS-consuming mice, or of microbiota anaerobically incubated in the presence of NAS. We identify NAS-altered microbial metabolic pathways that are linked to host susceptibility to metabolic disease, and demonstrate similar NAS-induced dysbiosis and glucose intolerance in healthy human subjects. Collectively, our results link NAS consumption, dysbiosis and metabolic abnormalities, thereby calling for a reassessment of massive NAS usage.
Quantitative phenotyping via deep barcode sequencing
Smith, Andrew M.; Heisler, Lawrence E.; Mellor, Joseph; Kaper, Fiona; Thompson, Michael J.; Chee, Mark; Roth, Frederick P.; Giaever, Guri; Nislow, Corey
2009-01-01
Next-generation DNA sequencing technologies have revolutionized diverse genomics applications, including de novo genome sequencing, SNP detection, chromatin immunoprecipitation, and transcriptome analysis. Here we apply deep sequencing to genome-scale fitness profiling to evaluate yeast strain collections in parallel. This method, Barcode analysis by Sequencing, or “Bar-seq,” outperforms the current benchmark barcode microarray assay in terms of both dynamic range and throughput. When applied to a complex chemogenomic assay, Bar-seq quantitatively identifies drug targets, with performance superior to the benchmark microarray assay. We also show that Bar-seq is well-suited for a multiplex format. We completely re-sequenced and re-annotated the yeast deletion collection using deep sequencing, found that ∼20% of the barcodes and common priming sequences varied from expectation, and used this revised list of barcode sequences to improve data quality. Together, this new assay and analysis routine provide a deep-sequencing-based toolkit for identifying gene–environment interactions on a genome-wide scale. PMID:19622793
[QUIPS: quality improvement in postoperative pain management].
Meissner, Winfried
2011-01-01
Despite the availability of high-quality guidelines and advanced pain management techniques, acute postoperative pain management is still far from being satisfactory. The QUIPS (Quality Improvement in Postoperative Pain Management) project aims to improve treatment quality by means of standardised data acquisition, analysis of quality and process indicators, and feedback and benchmarking. During a pilot phase funded by the German Ministry of Health (BMG), a total of 12,389 data sets were collected from six participating hospitals. Outcome improved in four of the six hospitals. Process indicators, such as routine pain documentation, were only poorly correlated with outcomes. To date, more than 130 German hospitals use QUIPS as a routine quality management tool. An EC-funded parallel project disseminates the concept internationally. QUIPS demonstrates that patient-reported outcomes in postoperative pain management can be benchmarked in routine clinical practice. Quality improvement initiatives should use outcome instead of structural and process parameters. The concept is transferable to other fields of medicine. Copyright © 2011. Published by Elsevier GmbH.
Hierarchical Artificial Bee Colony Algorithm for RFID Network Planning Optimization
Ma, Lianbo; Chen, Hanning; Hu, Kunyuan; Zhu, Yunlong
2014-01-01
This paper presents a novel optimization algorithm, namely, hierarchical artificial bee colony optimization, called HABC, to tackle the radio frequency identification network planning (RNP) problem. In the proposed multilevel model, the higher-level species can be aggregated by the subpopulations from lower level. In the bottom level, each subpopulation employing the canonical ABC method searches the part-dimensional optimum in parallel, which can be constructed into a complete solution for the upper level. At the same time, the comprehensive learning method with crossover and mutation operators is applied to enhance the global search ability between species. Experiments are conducted on a set of 10 benchmark optimization problems. The results demonstrate that the proposed HABC obtains remarkable performance on most chosen benchmark functions when compared to several successful swarm intelligence and evolutionary algorithms. Then HABC is used for solving the real-world RNP problem on two instances with different scales. Simulation results show that the proposed algorithm is superior for solving RNP, in terms of optimization accuracy and computation robustness. PMID:24592200
Two-fluid dusty shocks: simple benchmarking problems and applications to protoplanetary discs
NASA Astrophysics Data System (ADS)
Lehmann, Andrew; Wardle, Mark
2018-05-01
The key role that dust plays in the interstellar medium has motivated the development of numerical codes designed to study the coupled evolution of dust and gas in systems such as turbulent molecular clouds and protoplanetary discs. Drift between dust and gas has proven to be important as well as numerically challenging. We provide simple benchmarking problems for dusty gas codes by numerically solving the two-fluid dust-gas equations for steady, plane-parallel shock waves. The two distinct shock solutions to these equations allow a numerical code to test different forms of drag between the two fluids, the strength of that drag, and the dust-to-gas ratio. We also provide an astrophysical application of J-type dust-gas shocks to studying the structure of accretion shocks on to protoplanetary discs. We find that two-fluid effects are most important for grains larger than 1 μm, and that the peak dust temperature within an accretion shock provides a signature of the dust-to-gas ratio of the infalling material.
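For orientation, the steady, plane-parallel two-fluid equations being solved take the following generic form. The isothermal gas closure and the linear drag coefficient K written here are illustrative assumptions; the paper itself considers several drag prescriptions:

```latex
% Steady, plane-parallel two-fluid (gas g, dust d) conservation laws
% with an isothermal gas, p = c_s^2 \rho_g, and a linear drag closure.
\begin{aligned}
\frac{d}{dx}\!\left(\rho_g v_g\right) &= 0, &
\frac{d}{dx}\!\left(\rho_d v_d\right) &= 0, \\
\rho_g v_g \frac{dv_g}{dx} &= -\frac{dp}{dx} + K\,\rho_g \rho_d \left(v_d - v_g\right), &
\rho_d v_d \frac{dv_d}{dx} &= -K\,\rho_g \rho_d \left(v_d - v_g\right),
\end{aligned}
```

The momentum exchanged through the drag term cancels between the two fluids, so total momentum flux is conserved across the shock, which is what makes these solutions usable as code benchmarks.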
NASA Astrophysics Data System (ADS)
Rocha, João Vicente; Camerini, Cesar; Pereira, Gabriela
2016-02-01
The 2015 World Federation of NDE Centers (WFNDEC) eddy current benchmark problem involves the inspection of two EDM notches placed at the edge of a conducting plate with a pancake coil that runs parallel to the plate's edge line. Experimental data consist of impedance variations measured with a precision LCR bridge as an XY scanner moves the coil. The authors are pleased to present the numerical results obtained with a commercial FEM package (OPERA 3-D). Values of electrical resistance and inductive reactance variation between the base material and the region around the notch are plotted as a function of the coil displacement over the plate. The calculations were made for frequencies of 1 kHz and 10 kHz, and agreement between experimental and numerical results is excellent for all inspection conditions. Explanations are given of how the impedance is calculated, as well as the pros and cons of the presented methods.
Verification of ARES transport code system with TAKEDA benchmarks
NASA Astrophysics Data System (ADS)
Zhang, Liang; Zhang, Bin; Zhang, Penghe; Chen, Mengteng; Zhao, Jingchang; Zhang, Shun; Chen, Yixue
2015-10-01
Neutron transport modeling and simulation are central to many areas of nuclear technology, including reactor core analysis, radiation shielding and radiation detection. In this paper the series of TAKEDA benchmarks are modeled to verify the critical calculation capability of ARES, a discrete ordinates neutral particle transport code system. The SALOME platform is coupled with ARES to provide geometry modeling and mesh generation functions. The Koch-Baker-Alcouffe parallel sweep algorithm is applied to accelerate the traditional transport calculation process. The results show that the eigenvalues calculated by ARES are in excellent agreement with the reference values presented in NEACRP-L-330, with a difference of less than 30 pcm except for the first case of model 3. Additionally, ARES provides accurate flux distributions compared to reference values, with a deviation of less than 2% for region-averaged fluxes in all cases. All of these results confirm the feasibility of the ARES-SALOME coupling and demonstrate that ARES performs well in critical calculations.
Scaling of Multimillion-Atom Biological Molecular Dynamics Simulation on a Petascale Supercomputer
DOE Office of Scientific and Technical Information (OSTI.GOV)
Schulz, Roland; Lindner, Benjamin; Petridis, Loukas
2009-01-01
A strategy is described for a fast all-atom molecular dynamics simulation of multimillion-atom biological systems on massively parallel supercomputers. The strategy is developed using benchmark systems of particular interest to bioenergy research, comprising models of cellulose and lignocellulosic biomass in an aqueous solution. The approach involves using the reaction field (RF) method for the computation of long-range electrostatic interactions, which permits efficient scaling on many thousands of cores. Although the range of applicability of the RF method for biomolecular systems remains to be demonstrated, for the benchmark systems the use of the RF produces molecular dipole moments, Kirkwood G factors, other structural properties, and mean-square fluctuations in excellent agreement with those obtained with the commonly used Particle Mesh Ewald method. With RF, three million- and five million-atom biological systems scale well up to 30k cores, producing 30 ns/day. Atomistic simulations of very large systems for time scales approaching the microsecond would, therefore, appear now to be within reach.
Scaling of Multimillion-Atom Biological Molecular Dynamics Simulation on a Petascale Supercomputer.
Schulz, Roland; Lindner, Benjamin; Petridis, Loukas; Smith, Jeremy C
2009-10-13
A strategy is described for a fast all-atom molecular dynamics simulation of multimillion-atom biological systems on massively parallel supercomputers. The strategy is developed using benchmark systems of particular interest to bioenergy research, comprising models of cellulose and lignocellulosic biomass in an aqueous solution. The approach involves using the reaction field (RF) method for the computation of long-range electrostatic interactions, which permits efficient scaling on many thousands of cores. Although the range of applicability of the RF method for biomolecular systems remains to be demonstrated, for the benchmark systems the use of the RF produces molecular dipole moments, Kirkwood G factors, other structural properties, and mean-square fluctuations in excellent agreement with those obtained with the commonly used Particle Mesh Ewald method. With RF, three million- and five million-atom biological systems scale well up to ∼30k cores, producing ∼30 ns/day. Atomistic simulations of very large systems for time scales approaching the microsecond would, therefore, appear now to be within reach.
NASA Astrophysics Data System (ADS)
Shen, Yanfeng; Cesnik, Carlos E. S.
2016-04-01
This paper presents a parallelized modeling technique for the efficient simulation of nonlinear ultrasonics introduced by the wave interaction with fatigue cracks. The elastodynamic wave equations with contact effects are formulated using an explicit Local Interaction Simulation Approach (LISA). The LISA formulation is extended to capture the contact-impact phenomena during the wave damage interaction based on the penalty method. A Coulomb friction model is integrated into the computation procedure to capture the stick-slip contact shear motion. The LISA procedure is coded using the Compute Unified Device Architecture (CUDA), which enables highly parallelized supercomputing on powerful graphic cards. Both the explicit contact formulation and the parallel feature facilitate LISA's superb computational efficiency over the conventional finite element method (FEM). The theoretical formulation based on the penalty method is introduced, and a guideline for the proper choice of the contact stiffness is given. The convergence behavior of the solution under various contact stiffness values is examined. A numerical benchmark problem is used to investigate the new LISA formulation, and results are compared with a conventional contact finite element solution. Various nonlinear ultrasonic phenomena are successfully captured using this contact LISA formulation, including the generation of nonlinear higher harmonic responses. Nonlinear mode conversion of guided waves at fatigue cracks is also studied.
StrAuto: automation and parallelization of STRUCTURE analysis.
Chhatre, Vikram E; Emerson, Kevin J
2017-03-24
Population structure inference using the software STRUCTURE has become an integral part of population genetic studies covering a broad spectrum of taxa including humans. The ever-expanding size of genetic data sets poses computational challenges for this analysis. Although at least one tool currently implements parallel computing to reduce the computational overload of this analysis, it does not fully automate the use of replicate STRUCTURE analysis runs required for downstream inference of optimal K. There is a pressing need for a tool that can deploy population structure analysis on high performance computing clusters. We present an updated version of the popular Python program StrAuto, to streamline population structure analysis using parallel computing. StrAuto implements a pipeline that combines STRUCTURE analysis with the Evanno ΔK analysis and visualization of results using STRUCTURE HARVESTER. Using benchmarking tests, we demonstrate that StrAuto significantly reduces the computational time needed to perform iterative STRUCTURE analysis by distributing runs over two or more processors. StrAuto is the first tool to integrate STRUCTURE analysis with post-processing using a pipeline approach, in addition to implementing parallel computation, a setup ideal for deployment on computing clusters. StrAuto is distributed under the GNU GPL (General Public License) and available to download from http://strauto.popgen.org.
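The replicate grid that StrAuto farms out is a cross-product of K values and replicate indices. The sketch below builds hypothetical STRUCTURE command lines, which a multiprocessing pool could then dispatch concurrently; the binary name, flags, and output paths are assumptions for illustration, since StrAuto generates the real invocations from its own configuration file:

```python
import itertools

def structure_commands(ks, reps, mainparams="mainparams"):
    """Build one STRUCTURE command line per (K, replicate) pair.

    The flags and paths here are placeholders, not StrAuto's actual
    invocation; the point is the cross-product of runs that must be
    executed before Evanno Delta-K inference can proceed.
    """
    return [
        ["structure", "-K", str(k),
         "-o", f"results/K{k}_rep{r}",
         "-m", mainparams]
        for k, r in itertools.product(ks, reps)
    ]

# Distributing these over processors, as StrAuto does on a cluster,
# could then be e.g.:
#   with multiprocessing.Pool(8) as pool:
#       pool.map(subprocess.call, structure_commands(range(1, 11), range(5)))
```

Because each replicate is an independent process, the speedup is close to linear in the number of processors until the run count is exhausted.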
Parallel Agent-Based Simulations on Clusters of GPUs and Multi-Core Processors
DOE Office of Scientific and Technical Information (OSTI.GOV)
Aaby, Brandon G; Perumalla, Kalyan S; Seal, Sudip K
2010-01-01
An effective latency-hiding mechanism is presented in the parallelization of agent-based model simulations (ABMS) with millions of agents. The mechanism is designed to accommodate the hierarchical organization as well as heterogeneity of current state-of-the-art parallel computing platforms. We use it to explore the computation vs. communication trade-off continuum available with the deep computational and memory hierarchies of extant platforms and present a novel analytical model of the tradeoff. We describe our implementation and report preliminary performance results on two distinct parallel platforms suitable for ABMS: CUDA threads on multiple, networked graphical processing units (GPUs), and pthreads on multi-core processors. Message Passing Interface (MPI) is used for inter-GPU as well as inter-socket communication on a cluster of multiple GPUs and multi-core processors. Results indicate the benefits of our latency-hiding scheme, delivering as much as over 100-fold improvement in runtime for certain benchmark ABMS application scenarios with several million agents. This speed improvement is obtained on our system that is already two to three orders of magnitude faster on one GPU than an equivalent CPU-based execution in a popular simulator in Java. Thus, the overall execution of our current work is over four orders of magnitude faster when executed on multiple GPUs.
Nucleotide-dependent switch in proteasome assembly mediated by the Nas6 chaperone
Li, Frances; Tian, Geng; Langager, Deanna; Sokolova, Vladyslava; Finley, Daniel; Park, Soyeon
2017-01-01
The proteasome is assembled via the nine-subunit lid, nine-subunit base, and 28-subunit core particle (CP). Previous work has shown that the chaperones Rpn14, Nas6, Hsm3, and Nas2 each bind a specific ATPase subunit of the base and antagonize base–CP interaction. Here, we show that the Nas6 chaperone also obstructs base–lid association. Nas6 alternates between these two inhibitory modes according to the nucleotide state of the base. When ATP cannot be hydrolyzed, Nas6 interferes with base–lid, but not base–CP, association. In contrast, under conditions of ATP hydrolysis, Nas6 obstructs base–CP, but not base–lid, association. Modeling of Nas6 into cryoelectron microscopy structures of the proteasome suggests that Nas6 controls both base–lid affinity and base–CP affinity through steric hindrance; Nas6 clashes with the lid in the ATP-hydrolysis–blocked proteasome, but clashes instead with the CP in the ATP-hydrolysis–competent proteasome. Thus, Nas6 provides a dual mechanism to control assembly at both major interfaces of the proteasome. PMID:28137839
Benchmarking GPU and CPU codes for Heisenberg spin glass over-relaxation
NASA Astrophysics Data System (ADS)
Bernaschi, M.; Parisi, G.; Parisi, L.
2011-06-01
We present a set of possible implementations for Graphics Processing Units (GPU) of the Over-relaxation technique applied to the 3D Heisenberg spin glass model. The results show that a carefully tuned code can achieve more than 100 GFlops/s of sustained performance and update a single spin in about 0.6 nanoseconds. A multi-hit technique that exploits the GPU shared memory further reduces this time. Such results are compared with those obtained by means of a highly-tuned vector-parallel code on latest generation multi-core CPUs.
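The over-relaxation move itself is compact: each unit spin is reflected about its local molecular field, which conserves the energy exactly and serves purely to decorrelate configurations. The sketch below shows the update for a generic neighbour list; the couplings J_ij of the spin glass are omitted for brevity (the GPU and vector codes in the paper apply the same reflection with weighted fields):

```python
import numpy as np

def overrelax(spins, neighbors):
    """One sequential over-relaxation sweep for classical Heisenberg spins.

    `spins` is an (n, 3) array of unit vectors; `neighbors[i]` lists the
    sites coupled to i.  Each spin s is reflected about its local field
    h = sum of neighbour spins:  s' = 2 (s.h) h / |h|^2 - s.
    Since s'.h = s.h, the energy -s.h is unchanged and |s'| = |s|.
    """
    for i in range(len(spins)):
        h = spins[neighbors[i]].sum(axis=0)   # local molecular field
        h2 = h @ h
        if h2 > 0.0:                          # skip sites with zero field
            spins[i] = 2.0 * (spins[i] @ h) / h2 * h - spins[i]
    return spins
```

The reflection involves no random numbers and no rejection step, which is why it maps so well onto both GPUs and vector units: the only obstacle is ordering the sweep (e.g. checkerboard decomposition) so that concurrently updated spins are not neighbours.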
Performance of a carbon nanotube field emission electron gun
NASA Astrophysics Data System (ADS)
Getty, Stephanie A.; King, Todd T.; Bis, Rachael A.; Jones, Hollis H.; Herrero, Federico; Lynch, Bernard A.; Roman, Patrick; Mahaffy, Paul
2007-04-01
A cold cathode field emission electron gun (e-gun) based on a patterned carbon nanotube (CNT) film has been fabricated for use in a miniaturized reflectron time-of-flight mass spectrometer (RTOF MS), with future applications in other charged particle spectrometers, and performance of the CNT e-gun has been evaluated. A thermionic electron gun has also been fabricated and evaluated in parallel and its performance is used as a benchmark in the evaluation of our CNT e-gun. Implications for future improvements and integration into the RTOF MS are discussed.
The Automated Instrumentation and Monitoring System (AIMS): Design and Architecture. 3.2
NASA Technical Reports Server (NTRS)
Yan, Jerry C.; Schmidt, Melisa; Schulbach, Cathy; Bailey, David (Technical Monitor)
1997-01-01
Whether a researcher is designing the 'next parallel programming paradigm', another 'scalable multiprocessor' or investigating resource allocation algorithms for multiprocessors, a facility that enables parallel program execution to be captured and displayed is invaluable. Careful analysis of such information can help computer and software architects to capture, and therefore, exploit behavioral variations among/within various parallel programs to take advantage of specific hardware characteristics. A software tool-set that facilitates performance evaluation of parallel applications on multiprocessors has been put together at NASA Ames Research Center under the sponsorship of NASA's High Performance Computing and Communications Program over the past five years. The Automated Instrumentation and Monitoring System (AIMS) has three major software components: a source code instrumentor which automatically inserts active event recorders into program source code before compilation; a run-time performance monitoring library which collects performance data; and a visualization tool-set which reconstructs program execution based on the data collected. Besides being used as a prototype for developing new techniques for instrumenting, monitoring and presenting parallel program execution, AIMS is also being incorporated into the run-time environments of various hardware testbeds to evaluate their impact on user productivity. Currently, the execution of FORTRAN and C programs on the Intel Paragon and PALM workstations can be automatically instrumented and monitored. Performance data thus collected can be displayed graphically on various workstations. The process of performance tuning with AIMS will be illustrated using various NAS Parallel Benchmarks. This report includes a description of the internal architecture of AIMS and a listing of the source code.
Luque-Almagro, Victor M; Manso, Isabel; Sullivan, Matthew J; Rowley, Gary; Ferguson, Stuart J; Moreno-Vivián, Conrado; Richardson, David J; Gates, Andrew J; Roldán, M Dolores
2017-05-10
Transcriptional adaptation to nitrate-dependent anabolism by Paracoccus denitrificans PD1222 was studied. A total of 74 genes were induced in cells grown with nitrate as the N-source compared with ammonium, including the nasTSABGHC and ntrBC genes. The nasT and nasS genes were cotranscribed, although nasT was more strongly induced by nitrate than nasS. The nasABGHC genes constituted a transcriptional unit, which is preceded by a non-coding region containing hairpin structures involved in transcription termination. The nasTS and nasABGHC transcripts were detected at similar levels with nitrate or glutamate as the N-source, but the nasABGHC transcript was undetectable in ammonium-grown cells. The nitrite reductase NasG subunit was detected by two-dimensional polyacrylamide gel electrophoresis in cytoplasmic fractions from nitrate-grown cells, but it was not observed when either ammonium or glutamate was used as the N-source. The nasT mutant lacked both the nasABGHC transcript and nicotinamide adenine dinucleotide (NADH)-dependent nitrate reductase activity. In contrast, the nasS mutant showed levels of the nasABGHC transcript similar to the wild-type strain and displayed NasG protein and NADH-nitrate reductase activity with all N-sources tested, except with ammonium. Ammonium repression of nasABGHC was dependent on the Ntr system. The ntrBC and ntrYX genes were expressed at low levels regardless of the nitrogen source supporting growth. Mutational analysis of the ntrBCYX genes indicated that while ntrBC genes are required for nitrate assimilation, ntrYX genes can only partially restore growth on nitrate in the absence of ntrBC genes. The existence of a regulation mechanism for nitrate assimilation in P. denitrificans, by which nitrate induction operates at both transcriptional and translational levels, is proposed. © 2017 The Author(s).
Accelerating atomistic calculations of quantum energy eigenstates on graphic cards
NASA Astrophysics Data System (ADS)
Rodrigues, Walter; Pecchia, A.; Lopez, M.; Auf der Maur, M.; Di Carlo, A.
2014-10-01
Electronic properties of nanoscale materials require the calculation of eigenvalues and eigenvectors of large matrices. This bottleneck can be overcome by parallel computing techniques or the introduction of faster algorithms. In this paper we report a custom implementation of the Lanczos algorithm with simple restart, optimized for graphical processing units (GPUs). The whole algorithm has been developed using CUDA and runs entirely on the GPU, with a specialized implementation that spares memory and minimizes host-to-device data transfers. Furthermore, parallel distribution over several GPUs has been attained using the standard Message Passing Interface (MPI). Benchmark calculations performed on a GaN/AlGaN wurtzite quantum dot with up to 600,000 atoms are presented. The empirical tight-binding (ETB) model with an sp3d5s∗+spin-orbit parametrization has been used to build the system Hamiltonian (H).
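As a rough illustration of the method (not the authors' CUDA code), here is a minimal NumPy sketch of plain Lanczos tridiagonalization; the simple restart, spin-orbit terms, and GPU/MPI distribution described above are omitted, and a random symmetric matrix stands in for the ETB Hamiltonian.

```python
import numpy as np

def lanczos(H, v0, m):
    """m-step Lanczos: build a Krylov basis for symmetric H.

    Returns the tridiagonal coefficients (alpha, beta) and the basis V;
    eigenvalues of the tridiagonal matrix (Ritz values) converge quickly
    to the extremal eigenvalues of H.
    """
    n = H.shape[0]
    V = np.zeros((m, n))
    alpha = np.zeros(m)
    beta = np.zeros(m - 1)
    v = v0 / np.linalg.norm(v0)
    V[0] = v
    w = H @ v
    alpha[0] = v @ w
    w -= alpha[0] * v
    for j in range(1, m):
        beta[j - 1] = np.linalg.norm(w)     # (no reorthogonalization here)
        v = w / beta[j - 1]
        V[j] = v
        w = H @ v
        alpha[j] = v @ w
        w -= alpha[j] * v + beta[j - 1] * V[j - 1]
    return alpha, beta, V

rng = np.random.default_rng(0)
A = rng.standard_normal((200, 200))
H = (A + A.T) / 2                           # toy symmetric "Hamiltonian"
alpha, beta, _ = lanczos(H, rng.standard_normal(200), 60)
ritz = np.linalg.eigvalsh(np.diag(alpha) + np.diag(beta, 1) + np.diag(beta, -1))
exact = np.linalg.eigvalsh(H)
```

On a GPU, the dominant cost is the matvec `H @ v`, which is why the whole loop is kept device-resident in the paper's implementation.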
Injector Design Tool Improvements: User's manual for FDNS V.4.5
NASA Technical Reports Server (NTRS)
Chen, Yen-Sen; Shang, Huan-Min; Wei, Hong; Liu, Jiwen
1998-01-01
The major emphasis of the current effort is the development and validation of an efficient parallel-machine computational model, based on the FDNS code, to analyze the fluid dynamics of a wide variety of liquid jet configurations for general liquid rocket engine injection system applications. This model includes physical models for droplet atomization, breakup/coalescence, evaporation, turbulence mixing, and gas-phase combustion. Benchmark validation cases for liquid rocket engine chamber combustion conditions will be performed for model validation purposes. Test cases may include shear coaxial, swirl coaxial, and impinging injection systems with combinations of LOX/H2 or LOX/RP-1 propellant injector elements used in rocket engine designs. As a final goal of this project, a well-tested parallel CFD performance methodology, together with a user's operation description, will be reported in a final technical report at the end of the proposed research effort.
Amplitude analysis of four-body decays using a massively-parallel fitting framework
NASA Astrophysics Data System (ADS)
Hasse, C.; Albrecht, J.; Alves, A. A., Jr.; d'Argent, P.; Evans, T. D.; Rademacker, J.; Sokoloff, M. D.
2017-10-01
The GooFit Framework is designed to perform maximum-likelihood fits for arbitrary functions on various parallel back ends, for example a GPU. We present an extension to GooFit which adds the functionality to perform time-dependent amplitude analyses of pseudoscalar mesons decaying into four pseudoscalar final states. Benchmarks of this functionality show a significant performance increase when utilizing a GPU compared to a CPU. Furthermore, this extension is employed to study the sensitivity to the D0-D̄0 mixing parameters x and y in a time-dependent amplitude analysis of the decay D0 → K+π-π+π-. Studying a sample of 50 000 events and setting the central values to the world average of x = (0.49 ± 0.15)% and y = (0.61 ± 0.08)%, the statistical sensitivities of x and y are determined to be σ(x) = 0.019% and σ(y) = 0.019%.
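The core of such a framework is evaluating a negative log-likelihood over a large event sample, which parallelizes naturally over events. A toy sketch (not GooFit's API): a one-parameter exponential decay-time model stands in for the four-body amplitude model, and a grid scan stands in for the minimizer.

```python
import numpy as np

def nll(tau, t):
    # negative log-likelihood for exponential decay p(t) = exp(-t/tau)/tau;
    # the per-event terms are independent, so a GPU can sum them in parallel
    return np.sum(t / tau + np.log(tau))

rng = np.random.default_rng(1)
t = rng.exponential(scale=2.0, size=50_000)   # toy "decay-time" sample

# scan the lifetime parameter; a real fit would use a MINUIT-style minimizer
taus = np.linspace(1.5, 2.5, 1001)
tau_hat = taus[np.argmin([nll(tau, t) for tau in taus])]
```

For the exponential model the analytic maximum-likelihood estimate is the sample mean, so the scan result can be checked directly against `t.mean()`.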
NASA Technical Reports Server (NTRS)
Bailey, F. R.; Kutler, Paul
1988-01-01
Discussed are the capabilities of NASA's Numerical Aerodynamic Simulation (NAS) Program and its application as an advanced supercomputing system for computational fluid dynamics (CFD) research. First, the paper describes the NAS computational system, called the NAS Processing System Network, and the advanced computational capabilities it offers as a consequence of carrying out the NAS pathfinder objective. Second, it presents examples of pioneering CFD research accomplished during NAS's first operational year. Examples are included which illustrate CFD applications for predicting fluid phenomena, complementing and supplementing experimentation, and aiding in design. Finally, pacing elements and future directions for CFD and NAS are discussed.
NASA Astrophysics Data System (ADS)
Hadade, Ioan; di Mare, Luca
2016-08-01
Modern multicore and manycore processors exhibit multiple levels of parallelism through a wide range of architectural features such as SIMD for data-parallel execution or threads for core parallelism. The exploitation of multi-level parallelism is therefore crucial for achieving superior performance on current and future processors. This paper presents the performance tuning of a multiblock CFD solver on Intel Sandy Bridge and Haswell multicore CPUs and the Intel Xeon Phi Knights Corner coprocessor. Code optimisations have been applied to two computational kernels exhibiting different computational patterns: the update of flow variables and the evaluation of the Roe numerical fluxes. We discuss at length the code transformations required for achieving efficient SIMD computations for both kernels across the selected devices, including SIMD shuffles and transpositions for flux stencil computations and global memory transformations. Core parallelism is expressed through threading based on a number of domain decomposition techniques, together with optimisations for alleviating the NUMA effects found in multi-socket compute nodes. Results are correlated with the Roofline performance model in order to assess their efficiency for each distinct architecture. We report significant speedups for single-thread execution across both kernels: 2-5X on the multicore CPUs and 14-23X on the Xeon Phi coprocessor. Computations at full node and chip concurrency deliver a factor of three speedup on the multicore processors and up to 24X on the Xeon Phi manycore coprocessor.
Multiprocessor smalltalk: Implementation, performance, and analysis
DOE Office of Scientific and Technical Information (OSTI.GOV)
Pallas, J.I.
1990-01-01
Multiprocessor Smalltalk demonstrates the value of object-oriented programming on a multiprocessor. Its implementation and analysis shed light on three areas: concurrent programming in an object-oriented language without special extensions, implementation techniques for adapting to multiprocessors, and performance factors in the resulting system. Adding parallelism to Smalltalk code is easy, because programs already use control abstractions like iterators. Smalltalk's basic control and concurrency primitives (lambda expressions, processes, and semaphores) can be used to build parallel control abstractions, including parallel iterators, parallel objects, atomic objects, and futures; language extensions for concurrency are not required. This implementation demonstrates that it is possible to build an efficient parallel object-oriented programming system and illustrates techniques for doing so. Three modification tools (serialization, replication, and reorganization) adapted the Berkeley Smalltalk interpreter to the Firefly multiprocessor. Multiprocessor Smalltalk's performance shows that the combination of multiprocessing and object-oriented programming can be effective: speedups (relative to the original serial version) exceed 2.0 for five processors on all the benchmarks; the median efficiency is 48%. Analysis shows both where performance is lost and how to improve and generalize the experimental results. Changes in the interpreter to support concurrency add at most 12% overhead; better access to per-process variables could eliminate much of that. Changes in the user code to express concurrency add as much as 70% overhead; this overhead could be reduced to 54% if blocks (lambda expressions) were reentrant. Performance is also lost when the program cannot keep all five processors busy.
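The parallel control abstractions described (parallel iterators built from futures) translate directly to other languages. A sketch in Python, with a thread pool standing in for Smalltalk processes:

```python
from concurrent.futures import ThreadPoolExecutor

def parallel_map(fn, items, workers=5):
    # a parallel iterator built from futures: submit every task eagerly,
    # then block on each result in input order
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(fn, x) for x in items]
        return [f.result() for f in futures]   # .result() waits until ready

squares = parallel_map(lambda x: x * x, range(10))
```

The abstraction hides the synchronization: callers see an ordinary map, while the futures overlap execution across workers, just as the thesis builds parallel iterators from processes and semaphores.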
A Parallel Processing Algorithm for Remote Sensing Classification
NASA Technical Reports Server (NTRS)
Gualtieri, J. Anthony
2005-01-01
A current thread in parallel computation is the use of cluster computers created by networking a few to thousands of commodity general-purpose workstation-level computers running the Linux operating system. For example, the Medusa cluster at NASA/GSFC provides supercomputing performance, 130 Gflops (Linpack benchmark), at moderate cost ($370K). However, to be useful for scientific computing in the area of Earth science, issues of ease of programming, access to existing scientific libraries, and portability of existing code need to be considered. In this paper, I address these issues in the context of tools for rendering Earth science remote sensing data into useful products. In particular, I focus on a problem that can be decomposed into a set of independent tasks, which on a serial computer would be performed sequentially, but which with a cluster computer can be performed in parallel, giving an obvious speedup. To make the ideas concrete, I consider the problem of classifying hyperspectral imagery where some ground truth is available to train the classifier, using the Support Vector Machine (SVM) approach as applied to hyperspectral imagery. The approach will be to introduce notions about parallel computation and then to restrict the development to the SVM problem. Pseudocode (an outline of the computation) will be described and then details specific to the implementation will be given. Timing results will then be reported to show what speedups are possible using parallel computation. The paper will close with a discussion of the results.
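The decomposition described (independent tasks that a serial machine would run sequentially) can be sketched as follows. Nearest-centroid labeling is a hypothetical stand-in for the trained SVM, and a thread pool stands in for cluster nodes; the point is that chunks of pixels are classified independently, so parallel and serial runs must agree exactly.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in classifier: nearest centroid in band space
# (2 classes, 3 spectral bands) instead of a trained SVM.
CENTROIDS = np.array([[0.2, 0.2, 0.2], [0.8, 0.8, 0.8]])

def classify_chunk(pixels):
    # one independent task: label a chunk of (pixels x bands) spectra
    d = np.linalg.norm(pixels[:, None, :] - CENTROIDS[None, :, :], axis=2)
    return d.argmin(axis=1)

rng = np.random.default_rng(0)
image = rng.random((10_000, 3))        # flattened hyperspectral image
chunks = np.array_split(image, 16)     # 16 independent tasks

with ThreadPoolExecutor(max_workers=4) as pool:
    parallel = np.concatenate(list(pool.map(classify_chunk, chunks)))

serial = np.concatenate([classify_chunk(c) for c in chunks])
```

On a real cluster the chunks would be scattered to worker nodes (e.g. over MPI) rather than threads, but the task structure is identical.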
DOE Office of Scientific and Technical Information (OSTI.GOV)
Alan Black; Arnis Judzis
2003-10-01
This document details the progress to date on the OPTIMIZATION OF DEEP DRILLING PERFORMANCE--DEVELOPMENT AND BENCHMARK TESTING OF ADVANCED DIAMOND PRODUCT DRILL BITS AND HP/HT FLUIDS TO SIGNIFICANTLY IMPROVE RATES OF PENETRATION contract for the year starting October 2002 through September 2003. The industry cost-shared program aims to benchmark drilling rates of penetration in selected simulated deep formations and to significantly improve ROP through a team development of aggressive diamond product drill bit--fluid system technologies. Overall the objectives are as follows: Phase 1--Benchmark ''best in class'' diamond and other product drilling bits and fluids and develop concepts for a next level of deep drilling performance; Phase 2--Develop advanced smart bit--fluid prototypes and test at large scale; and Phase 3--Field trial smart bit--fluid concepts, modify as necessary and commercialize products. Accomplishments to date include the following: 4Q 2002--Project started; Industry Team was assembled; Kick-off meeting was held at DOE Morgantown; 1Q 2003--Engineering meeting was held at Hughes Christensen, The Woodlands, Texas to prepare preliminary plans for development and testing and review equipment needs; Operators started sending information regarding their needs for deep drilling challenges and priorities for large-scale testing experimental matrix; Aramco joined the Industry Team as DEA 148 objectives paralleled the DOE project; 2Q 2003--Engineering and planning for high pressure drilling at TerraTek commenced; 3Q 2003--Continuation of engineering and design work for high pressure drilling at TerraTek; Baker Hughes INTEQ Drilling Fluids and Hughes Christensen commenced planning for Phase 1 testing--recommendations for bits and fluids.
Subedi, Lochan; Huang, Hong; Pant, Amrita; Westgate, Philip M; Bada, Henrietta S; Bauer, John A; Giannone, Peter J; Sithisarn, Thitinart
2017-01-01
Brain-derived neurotrophic factor (BDNF) is a growth factor that promotes the growth and survival of neurons. Fetal exposure to opiates can lead to postnatal withdrawal, referred to as neonatal abstinence syndrome (NAS). Preclinical and clinical studies have shown an association between opiate exposure and altered BDNF expression in the brain and serum levels in adults. However, to date, there are no data available on the effects of opiate exposure on BDNF levels in infants exposed to opiates in utero, or on whether BDNF levels may correlate with the severity of NAS. Our objectives were to compare plasma BDNF levels between NAS and non-NAS infants and to determine the correlation of BDNF levels with the severity of NAS. This is a prospective cohort study with no intervention involved. Infants ≥35 weeks of gestation were enrolled. BDNF level was measured using an enzyme-linked immunosorbent assay from blood samples drawn within 48 h of life. The severity of NAS was determined by the length of hospital stay and the number of medications required to treat NAS. 67 infants were enrolled, 34 NAS and 33 non-NAS. Mean gestational age did not differ between the two groups. Mean birth weight of NAS infants was significantly lower than that of non-NAS infants (3,070 ± 523 vs. 3,340 ± 459 g, p = 0.028). Mean BDNF level in the NAS group was 252.2 ± 91.6 ng/ml, significantly higher than 211.3 ± 66.3 ng/ml in the non-NAS group (p = 0.04). There were no differences in BDNF levels between NAS infants who required one medication vs. more than one medication (254 ± 91 vs. 218 ± 106 ng/ml, p = 0.47). There was no correlation between BDNF levels and length of hospital stay (p = 0.68) among NAS infants. Overall, there were no significant correlations between BDNF levels and NAS scores except at around 15 h after admission (correlation 0.35, p = 0.045). Plasma BDNF level was significantly increased in NAS infants during the first 48 h when compared to non-NAS infants.
The correlations between plasma BDNF levels and the severity of NAS warrant further study. These results suggest that BDNF may play a neuromodulatory role during withdrawal after in utero opiate exposure.
Benchmark of the local drift-kinetic models for neoclassical transport simulation in helical plasmas
NASA Astrophysics Data System (ADS)
Huang, B.; Satake, S.; Kanno, R.; Sugama, H.; Matsuoka, S.
2017-02-01
Benchmarks of neoclassical transport codes based on several local drift-kinetic models are reported here. The drift-kinetic models are zero-orbit-width (ZOW), zero-magnetic-drift, DKES-like, and global, as classified in Matsuoka et al. [Phys. Plasmas 22, 072511 (2015)]. The magnetic geometries of the Helically Symmetric Experiment, the Large Helical Device (LHD), and Wendelstein 7-X are employed in the benchmarks. It is found that the assumption of E×B incompressibility causes discrepancies in neoclassical radial flux and parallel flow among the models when E×B is sufficiently large compared to the magnetic drift velocities, for example for poloidal Mach numbers Mp ≤ 0.4. On the other hand, when E×B and the magnetic drift velocities are comparable, the tangential magnetic drift, which is included in both the global and ZOW models, fills the role of suppressing the unphysical peaking of neoclassical radial fluxes found in the other local models at Er ≃ 0. In low-collisionality plasmas in particular, the tangential drift effect works well to suppress such unphysical behavior of the radial transport in the simulations. It is demonstrated that the ZOW model has the advantage of mitigating this unphysical behavior in the several magnetic geometries, and that it also allows evaluation of the bootstrap current in LHD at low computational cost compared to the global model.
Performance Comparison of Big Data Analytics With NEXUS and Giovanni
NASA Astrophysics Data System (ADS)
Jacob, J. C.; Huang, T.; Lynnes, C.
2016-12-01
NEXUS is an emerging data-intensive analysis framework, available as open source, developed with a new approach for handling science data that enables large-scale data analysis. We compare the performance of NEXUS and Giovanni for three statistics algorithms applied to NASA datasets. Giovanni is a statistics web service at NASA Distributed Active Archive Centers (DAACs). NEXUS is a cloud-computing environment developed at JPL and built on Apache Solr, Cassandra, and Spark. We compute a global time-averaged map, a correlation map, and an area-averaged time series. The first two algorithms average over time to produce a value for each pixel in a 2-D map; the third averages spatially to produce a single value for each time step. We report benchmark comparison findings that indicate a 15x speedup with NEXUS over Giovanni when computing an area-averaged time series of daily precipitation rate for the Tropical Rainfall Measuring Mission (TRMM, 0.25 degree spatial resolution) over the Continental United States across 14 years (2000-2014), using 64-way parallelism and 545 tiles per granule. 16-way parallelism with 16 tiles per granule worked best with NEXUS for computing an 18-year (1998-2015) TRMM daily precipitation global time-averaged map (2.5x speedup) and an 18-year global map of correlation between TRMM daily precipitation and TRMM real-time daily precipitation (7x speedup). These and other benchmark results will be presented along with key lessons learned in applying the NEXUS tiling approach to big data analytics in the cloud.
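The three statistics algorithms can be sketched with NumPy on a toy (time, lat, lon) grid; the array shapes and variable names below are illustrative, not the NEXUS or Giovanni APIs.

```python
import numpy as np

rng = np.random.default_rng(0)
precip = rng.random((365, 18, 36))              # toy daily grid: (time, lat, lon)
precip_rt = precip + 0.1 * rng.random((365, 18, 36))   # correlated second field

# 1) time-averaged map: average over time, one value per pixel
time_avg_map = precip.mean(axis=0)              # shape (18, 36)

# 2) correlation map: per-pixel correlation of two variables across time
a = precip - precip.mean(axis=0)
b = precip_rt - precip_rt.mean(axis=0)
corr_map = (a * b).sum(axis=0) / np.sqrt((a**2).sum(axis=0) * (b**2).sum(axis=0))

# 3) area-averaged time series: average over space, one value per time step
area_avg_ts = precip.mean(axis=(1, 2))          # shape (365,)
```

In NEXUS these reductions run over spatial tiles in parallel (Spark) rather than over one in-memory array, but the per-pixel and per-timestep arithmetic is the same.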
Perera, Ajith; Gauss, Jürgen; Verma, Prakash; Morales, Jorge A
2017-04-28
We present a parallel implementation to compute electron spin resonance g-tensors at the coupled-cluster singles and doubles (CCSD) level which employs the ACES III domain-specific software tools for scalable parallel programming, i.e., the super instruction architecture language and processor (SIAL and SIP), respectively. A unique feature of the present implementation is the exact (not approximated) inclusion of the five one- and two-particle contributions to the g-tensor [i.e., the mass correction, one- and two-particle paramagnetic spin-orbit, and one- and two-particle diamagnetic spin-orbit terms]. Like a previous implementation with effective one-electron operators [J. Gauss et al., J. Phys. Chem. A 113, 11541-11549 (2009)], our implementation utilizes analytic CC second derivatives and, therefore, classifies as a true CC linear-response treatment. Our implementation can therefore unambiguously appraise the accuracy of less costly effective one-particle schemes and provide a rationale for their widespread use. We have considered a large selection of radicals used previously for benchmarking purposes, including those studied in earlier work, and conclude that at the CCSD level, the effective one-particle scheme satisfactorily captures the two-particle effects at a lower cost than the rigorous two-particle scheme. With respect to the performance of density functional theory (DFT), we note that results obtained with the B3LYP functional exhibit the best agreement with our CCSD results. However, in general, the CCSD results agree better with the experimental data than the best DFT/B3LYP results, although in most cases within the rather large experimental error bars.
The Novaco Anger Scale-Provocation Inventory (1994 version) in Dutch forensic psychiatric patients.
Hornsveld, Ruud H J; Muris, Peter; Kraaimaat, Floris W
2011-12-01
We examined the psychometric properties of the Novaco Anger Scale-Provocation Inventory (NAS-PI, 1994 version) in Dutch violent forensic psychiatric patients and secondary vocational students. A confirmatory factor analysis of the subscale structure of the NAS was carried out, reliability was investigated, and relations were calculated between NAS-PI scores and other measures of personality traits and problem behaviors. The 3-subscale structure of the original NAS could not be confirmed. However, the internal consistency of the NAS and the PI was excellent, and the test-retest reliability of the NAS was good. The validity of the NAS and the PI was supported by a meaningful pattern of correlations with alternative measures of anger and personality traits. Forensic psychiatric outpatients displayed higher NAS scores than secondary vocational students, but inpatients scored even lower than this nonclinical control group. Our preliminary conclusion is that the NAS-PI is a valuable instrument for the assessment of anger in Dutch violent forensic psychiatric patients.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Sreepathi, Sarat; Sripathi, Vamsi; Mills, Richard T
2013-01-01
Inefficient parallel I/O is known to be a major bottleneck among scientific applications employed on supercomputers as the number of processor cores grows into the thousands. Our prior experience indicated that parallel I/O libraries such as HDF5 that rely on MPI-IO do not scale well beyond 10K processor cores, especially on parallel file systems (like Lustre) with a single point of resource contention. Our previous optimization efforts for a massively parallel multi-phase and multi-component subsurface simulator (PFLOTRAN) led to a two-phase I/O approach at the application level where a set of designated processes participate in the I/O process by splitting the I/O operation into a communication phase and a disk I/O phase. The designated I/O processes are created by splitting the MPI global communicator into multiple sub-communicators. The root process in each sub-communicator is responsible for performing the I/O operations for the entire group and then distributing the data to the rest of the group. This approach resulted in over 25X speedup in HDF I/O read performance and 3X speedup in write performance for PFLOTRAN at over 100K processor cores on the ORNL Jaguar supercomputer. This research describes the design and development of a general purpose parallel I/O library, SCORPIO (SCalable block-ORiented Parallel I/O) that incorporates our optimized two-phase I/O approach. The library provides a simplified higher level abstraction to the user, sitting atop existing parallel I/O libraries (such as HDF5) and implements optimized I/O access patterns that can scale on larger numbers of processors. Performance results with standard benchmark problems and PFLOTRAN indicate that our library is able to maintain the same speedups as before with the added flexibility of being applicable to a wider range of I/O intensive applications.
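The rank-to-aggregator mapping behind the two-phase approach can be sketched as follows. `io_group` is a hypothetical helper, not SCORPIO's API; real code would split the MPI global communicator (e.g. with MPI_Comm_split) and only the group roots would call the HDF5 writer.

```python
def io_group(rank, nranks, n_groups):
    """Map an MPI rank to its I/O sub-communicator.

    Contiguous blocks of ranks share one designated I/O process (the group
    root). Two-phase I/O then proceeds as:
      phase 1 (communication): every rank sends its block to the root;
      phase 2 (disk I/O):      only the root performs the actual write.
    """
    size = -(-nranks // n_groups)   # ceiling division: ranks per group
    group = rank // size
    root = group * size             # designated I/O process for this group
    return group, root

assignments = [io_group(r, 16, 4) for r in range(16)]
```

Concentrating the file-system traffic on a few roots is what avoids the single point of contention on Lustre at 100K+ cores.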
Neonatal abstinence syndrome: Historical perspective, current focus, future directions.
Jones, Hendrée E; Fielder, Andrea
2015-11-01
Neonatal abstinence syndrome (NAS) occurs following prenatal opioid exposure. It is characterized by signs and symptoms indicating central nervous system hyperirritability and autonomic nervous system, gastrointestinal tract, and respiratory system dysfunction. This article: (1) briefly reviews NAS history, including initial identification, assessment, and treatment efforts; (2) summarizes the current status of and issues surrounding recent NAS assessment and treatment; and (3) details future directions in NAS conceptualization, measurement, and treatment. Mortality rate estimates in neonates treated for NAS exceeded 33%, and surpassed 90% for untreated infants, from the late 1800s until the mid-1900s. The focus on both assessment and treatment over the past 50 years is predominantly due to two forces. The first was methadone pharmacotherapy for "heroin addiction", which led to women in methadone maintenance programs who were, or became, pregnant. The second was the definition of NAS and the development of a measure of neonatal withdrawal, the Neonatal Abstinence Scoring System (NASS). Various NAS treatment protocols were based on the NASS as well as other NAS measures. Future research must focus on psychometrically sound screening and assessment measures of neonatal opioid withdrawal for premature, term, and older infants; measuring and treating possible withdrawal from non-opioids, particularly benzodiazepines; integrated non-pharmacological treatment of NAS; weight-based versus symptom-based treatment of NAS; and second-line treatment for NAS. Copyright © 2015 Elsevier Inc. All rights reserved.
Scaling Semantic Graph Databases in Size and Performance
DOE Office of Scientific and Technical Information (OSTI.GOV)
Morari, Alessandro; Castellana, Vito G.; Villa, Oreste
In this paper we present SGEM, a full software system for accelerating large-scale semantic graph databases on commodity clusters. Unlike current approaches, SGEM addresses semantic graph databases by employing only graph methods at all levels of the stack. On one hand, this allows exploiting the space efficiency of graph data structures and the inherent parallelism of graph algorithms. These features adapt well to the increasing system memory and core counts of modern commodity clusters. On the other hand, however, these systems are optimized for regular computation and batched data transfers, while graph methods usually are irregular and generate fine-grained data accesses with poor spatial and temporal locality. Our framework comprises a SPARQL to data-parallel C compiler, a library of parallel graph methods, and a custom, multithreaded runtime system. We introduce our stack, motivate its advantages with respect to other solutions, and show how we solved the challenges posed by irregular behaviors. We present the results of our software stack on the Berlin SPARQL benchmarks with datasets up to 10 billion triples (a triple corresponds to a graph edge), demonstrating scaling in dataset size and in performance as more nodes are added to the cluster.
Parallel algorithms for quantum chemistry. I. Integral transformations on a hypercube multiprocessor
DOE Office of Scientific and Technical Information (OSTI.GOV)
Whiteside, R.A.; Binkley, J.S.; Colvin, M.E.
1987-02-15
For many years it has been recognized that fundamental physical constraints such as the speed of light will limit the ultimate speed of single processor computers to less than about three billion floating point operations per second (3 GFLOPS). This limitation is becoming increasingly restrictive as commercially available machines are now within an order of magnitude of this asymptotic limit. A natural way to avoid this limit is to harness together many processors to work on a single computational problem. In principle, these parallel processing computers have speeds limited only by the number of processors one chooses to acquire. The usefulness of potentially unlimited processing speed to a computationally intensive field such as quantum chemistry is obvious. If these methods are to be applied to significantly larger chemical systems, parallel schemes will have to be employed. For this reason we have developed distributed-memory algorithms for a number of standard quantum chemical methods. We are currently implementing these on a 32 processor Intel hypercube. In this paper we present our algorithm and benchmark results for one of the bottleneck steps in quantum chemical calculations: the four index integral transformation.
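The four-index transformation itself is easy to state: transforming the two-electron integrals over all four indices at once scales as O(N^8), while performing four sequential quarter transformations reduces the cost to O(N^5) per step, which is the form worth distributing across processors. A small NumPy check on toy matrices (ignoring permutational symmetry; not the hypercube algorithm itself):

```python
import numpy as np

n = 6
rng = np.random.default_rng(0)
I = rng.random((n, n, n, n))   # AO-basis two-electron integrals (toy values)
C = rng.random((n, n))         # MO coefficient matrix

# naive form: contract all four indices at once, O(N^8)
M_naive = np.einsum('pi,qj,rk,sl,ijkl->pqrs', C, C, C, C, I)

# four sequential quarter transformations, O(N^5) each
T = np.einsum('pi,ijkl->pjkl', C, I)
T = np.einsum('qj,pjkl->pqkl', C, T)
T = np.einsum('rk,pqkl->pqrl', C, T)
M = np.einsum('sl,pqrl->pqrs', C, T)
```

Each quarter transformation is a batched matrix multiply, which is what makes the step amenable to distributed-memory parallelism.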
NASA Astrophysics Data System (ADS)
Nishiura, Daisuke; Furuichi, Mikito; Sakaguchi, Hide
2015-09-01
The computational performance of a smoothed particle hydrodynamics (SPH) simulation is investigated for three types of current shared-memory parallel computer devices: many integrated core (MIC) processors, graphics processing units (GPUs), and multi-core CPUs. We are especially interested in efficient shared-memory allocation methods for each chipset, because the efficient data access patterns differ between compute unified device architecture (CUDA) programming for GPUs and OpenMP programming for MIC processors and multi-core CPUs. We first introduce several parallel implementation techniques for the SPH code, and then examine these on our target computer architectures to determine the most effective algorithms for each processor unit. In addition, we evaluate the effective computing performance and power efficiency of the SPH simulation on each architecture, as these are critical metrics for overall performance in a multi-device environment. In our benchmark test, the GPU is found to produce the best arithmetic performance as a standalone device unit, and gives the most efficient power consumption. The multi-core CPU obtains the most effective computing performance. The computational speed of the MIC processor on Xeon Phi approached that of two Xeon CPUs. This indicates that using MICs is an attractive choice for existing SPH codes on multi-core CPUs parallelized by OpenMP, as it gains computational acceleration without the need for significant changes to the source code.
Echegaray, Sebastian; Bakr, Shaimaa; Rubin, Daniel L; Napel, Sandy
2017-10-06
The aim of this study was to develop an open-source, modular, locally run or server-based system for 3D radiomics feature computation that can be used on any computer system and included in existing workflows for understanding associations and building predictive models between image features and clinical data, such as survival. The QIFE exploits various levels of parallelization for use on multiprocessor systems. It consists of a managing framework and four stages: input, pre-processing, feature computation, and output. Each stage contains one or more swappable components, allowing run-time customization. We benchmarked the engine using various levels of parallelization on a cohort of CT scans presenting 108 lung tumors. Two versions of the QIFE have been released: (1) the open-source MATLAB code posted to GitHub, and (2) a compiled version loaded in a Docker container, posted to DockerHub, which can be easily deployed on any computer. The QIFE processed 108 objects (tumors) in 2:12 (h:mm) using one core, and in 1:04 (h:mm) using four cores with object-level parallelization. We developed the Quantitative Image Feature Engine (QIFE), an open-source feature-extraction framework that focuses on modularity, standards, parallelism, provenance, and integration. Researchers can easily integrate it with their existing segmentation and imaging workflows by creating input and output components that implement their existing interfaces. Computational efficiency can be improved by parallelizing execution at the cost of memory usage. Different parallelization levels provide different trade-offs, and the optimal setting will depend on the size and composition of the dataset to be processed.
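The four-stage, swappable-component design can be sketched generically; the stage functions below are hypothetical placeholders, not QIFE's actual interfaces.

```python
class Pipeline:
    """Minimal four-stage engine: each stage is a swappable callable,
    and the managing framework just threads data through them."""
    def __init__(self, stages):
        self.stages = stages   # input -> pre-processing -> features -> output
    def run(self, x):
        for stage in self.stages:
            x = stage(x)       # object-level parallelism would map run()
        return x               # over many tumors concurrently

features = Pipeline([
    lambda path: [1.0, 2.0, 4.0],                 # input: load a segmented volume (toy)
    lambda vol: [v / max(vol) for v in vol],      # pre-processing: normalize intensities
    lambda vol: {"mean": sum(vol) / len(vol)},    # feature computation
    lambda feats: feats,                          # output: pass-through writer
]).run("tumor_001.nii")
```

Swapping a component (say, a different normalizer) means replacing one callable; the object-level parallelization reported above corresponds to running `Pipeline.run` over tumors in a worker pool.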
Heller, Nicole A; Logan, Beth A; Morrison, Deborah G; Paul, Jonathan A; Brown, Mark S; Hayes, Marie J
2017-07-01
Use and abuse of prescription opioids and the concomitant increase in Neonatal Abstinence Syndrome (NAS), a condition that may lead to protracted pharmacological treatment in more than 60% of infants, has tripled since 2000. This study assessed neurobehavioral development using the NICU Network Neurobehavioral Scale in 6-week-old infants with prenatal methadone exposure who did (NAS+; n = 23) or did not (NAS-; n = 16) require pharmacological treatment for NAS, with severity determined by the Finnegan Scale. An unexposed, demographically similar group of infants matched for age served as comparison (COMP; n = 21). The NAS+ group, but not the NAS- group, had significantly lower scores on the regulation (p < .01) and quality of movement (p < .01) summary scales than the COMP group. The NAS+ and NAS- groups had higher scores on the stress-abstinence scale than the COMP group (p < .05). NAS diagnosis (NAS+) was associated with poorer regulation and quality of movement at 6 weeks of age compared to infants without prenatal methadone exposure from the same demographic. © 2017 Wiley Periodicals, Inc.
Condas, Larissa A Z; De Buck, Jeroen; Nobrega, Diego B; Carson, Domonique A; Roy, Jean-Philippe; Keefe, Greg P; DeVries, Trevor J; Middleton, John R; Dufour, Simon; Barkema, Herman W
2017-07-01
The effect of non-aureus staphylococci (NAS) in bovine mammary health is controversial. Overall, NAS intramammary infections (IMI) increase somatic cell count (SCC), with an effect categorized as mild, mostly causing subclinical or mild to moderate clinical mastitis. However, based on recent studies, specific NAS may affect the udder more severely. Some of these apparent discrepancies could be attributed to the large number of species that compose the NAS group. The objectives of this study were to determine (1) the SCC of quarters infected by individual NAS species compared with NAS as a group, culture-negative, and major pathogen-infected quarters; (2) the distribution of NAS species isolated from quarters with low SCC (<200,000 cells/mL) and high SCC (≥200,000 cells/mL), and clinical mastitis; and (3) the prevalence of NAS species across quarters with low and high SCC. A total of 5,507 NAS isolates, 3,561 from low SCC quarters, 1,873 from high SCC quarters, and 73 from clinical mastitis cases, were obtained from the National Cohort of Dairy Farms of the Canadian Bovine Mastitis Research Network. Of quarters with low SCC, high SCC, or clinical mastitis, 7.6, 18.5, and 4.3% were NAS positive, respectively. The effect of NAS IMI on SCC was estimated using mixed-effect linear regression; prevalence of NAS IMI was estimated using Bayesian analyses. Mean SCC of NAS-positive quarters was 70,000 cells/mL, which was higher than culture-negative quarters (32,000 cells/mL) and lower than major pathogen-positive quarters (129,000 to 183,000 cells/mL). Compared with other NAS species, SCC was highest in quarters positive for Staphylococcus capitis, Staphylococcus gallinarum, Staphylococcus hyicus, Staphylococcus agnetis, or Staphylococcus simulans. 
In NAS-positive quarters, Staphylococcus xylosus (12.6%), Staphylococcus cohnii (3.1%), and Staphylococcus equorum (0.6%) were more frequently isolated from quarters with low SCC than other NAS species, whereas Staphylococcus sciuri (14%) was most frequently isolated from clinical mastitis cases. Finally, in NAS-positive quarters, Staphylococcus chromogenes, S. simulans, Staphylococcus epidermidis, and Staphylococcus haemolyticus were isolated with similar frequency from among low SCC and high SCC quarters and clinical mastitis cases. Staphylococcus chromogenes, S. simulans, S. xylosus, S. haemolyticus, S. epidermidis, S. agnetis, Staphylococcus arlettae, S. capitis, S. gallinarum, S. sciuri, and Staphylococcus warneri were more prevalent in high than in low SCC quarters. Because the NAS are a large, heterogeneous group, considering them as a single group rather than at the species, or even subspecies level, has undoubtedly contributed to apparent discrepancies among studies as to their distribution and importance in IMI and mastitis. Copyright © 2017 American Dairy Science Association. Published by Elsevier Inc. All rights reserved.
Simple techniques for improving deep neural network outcomes on commodity hardware
NASA Astrophysics Data System (ADS)
Colina, Nicholas Christopher A.; Perez, Carlos E.; Paraan, Francis N. C.
2017-08-01
We benchmark improvements in the performance of deep neural networks (DNN) on the MNIST data set upon implementing two simple modifications to the algorithm that have little computational overhead. The first is GPU parallelization on a commodity graphics card, and the second is initializing the DNN with random orthogonal weight matrices prior to optimization. Eigenspectra analysis of the weight matrices reveals that the initially orthogonal matrices remain nearly orthogonal after training. The probability distributions from which these orthogonal matrices are drawn are also shown to significantly affect the performance of these deep neural networks.
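The orthogonal-initialization idea can be sketched as: draw a random Gaussian matrix, orthonormalize it, and use the result as the initial weight matrix. A pure-Python sketch using Gram-Schmidt (a library QR routine would normally be used; the seed and matrix size here are arbitrary, and the Gaussian draw is only one of the distributions the paper compares):

```python
import random

def random_orthogonal(n, seed=0):
    """Draw an n x n matrix with i.i.d. Gaussian entries, then orthonormalize
    its rows with Gram-Schmidt, yielding a random orthogonal matrix."""
    rng = random.Random(seed)
    rows = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(n)]
    ortho = []
    for v in rows:
        # Subtract projections onto the rows already orthonormalized.
        for u in ortho:
            dot = sum(a * b for a, b in zip(v, u))
            v = [a - dot * b for a, b in zip(v, u)]
        norm = sum(a * a for a in v) ** 0.5
        ortho.append([a / norm for a in v])
    return ortho

def gram(w):
    """W W^T; close to the identity when the rows of W are orthonormal."""
    return [[sum(a * b for a, b in zip(r1, r2)) for r2 in w] for r1 in w]
```

The `gram` check mirrors the eigenspectra analysis in spirit: it verifies how close a trained or initial matrix is to orthogonality.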
GAPD: a GPU-accelerated atom-based polychromatic diffraction simulation code.
E, J C; Wang, L; Chen, S; Zhang, Y Y; Luo, S N
2018-03-01
GAPD, a graphics-processing-unit (GPU)-accelerated atom-based polychromatic diffraction simulation code for direct, kinematics-based, simulations of X-ray/electron diffraction of large-scale atomic systems with mono-/polychromatic beams and arbitrary plane detector geometries, is presented. This code implements GPU parallel computation via both real- and reciprocal-space decompositions. With GAPD, direct simulations are performed of the reciprocal lattice node of ultralarge systems (∼5 billion atoms) and diffraction patterns of single-crystal and polycrystalline configurations with mono- and polychromatic X-ray beams (including synchrotron undulator sources), and validation, benchmark and application cases are presented.
Vector radiative transfer code SORD: Performance analysis and quick start guide
NASA Astrophysics Data System (ADS)
Korkin, Sergey; Lyapustin, Alexei; Sinyuk, Alexander; Holben, Brent; Kokhanovsky, Alexander
2017-10-01
We present a new open source polarized radiative transfer code SORD written in Fortran 90/95. SORD numerically simulates propagation of monochromatic solar radiation in a plane-parallel atmosphere over a reflecting surface using the method of successive orders of scattering (hence the name). Thermal emission is ignored. We did not improve the method in any way, but report the accuracy and runtime in 52 benchmark scenarios. This paper also serves as a quick start user's guide for the code available from ftp://maiac.gsfc.nasa.gov/pub/skorkin, from the JQSRT website, or from the corresponding (first) author.
Time Dependent Simulation of Turbopump Flows
NASA Technical Reports Server (NTRS)
Kiris, Cetin C.; Kwak, Dochan; Chan, William; Williams, Robert
2001-01-01
The objective of this viewgraph presentation is to enhance incompressible flow simulation capability for developing aerospace vehicle components, especially unsteady flow phenomena associated with high speed turbo pumps. For the unsteady Space Shuttle Main Engine (SSME) rig computation, 1-1/2 rotations have been completed on the 34.3-million-grid-point model. The moving boundary capability is obtained by using the DCF module. MLP shared-memory parallelism has been implemented and benchmarked in INS3D. A scripting capability from CAD geometry to solution has been developed. Data compression is applied to reduce data size in post-processing, and fluid/structure coupling has been initiated.
Gratale, Stefanie K; Maloney, Erin K; Sangalang, Angeline; Cappella, Joseph N
2017-10-21
This study sought to demonstrate causal effects of exposure to Natural American Spirit (NAS) advertising content on misinformed beliefs of current and former smokers, and to empirically establish these beliefs as a mechanism driving intentions to use NAS. Our study employed a randomised experimental design with 1128 adult daily, intermittent and former smokers. We compared participants who were exposed to NAS advertisements or claims made in the advertisements with those in a no-message control group to test the effects of NAS advertising content on inaccurate beliefs about NAS and attitudes and intentions towards the product. One-way analysis of variance revealed that exposure to NAS advertisements produced inaccurate beliefs about the composition of NAS cigarettes among current and former smokers (p<0.05). Planned contrasts indicated a compilation of arguments taken directly from NAS advertisements resulted in significantly greater beliefs that NAS cigarettes are healthier/safer than other cigarettes (for former smokers, t(472)=3.63, p<0.001; for current smokers, t(644)=2.86, p=0.004), demonstrating that suggestive claims used in the brand's marketing have effects on beliefs not directly addressed in the advertisements. Regression and mediation analyses showed that health-related beliefs predict attitudes towards NAS for current and former smokers, and mediate intentions to use NAS. The findings of this study provide causal support for the need for further regulatory action to address the potentially harmful ramifications of claims used in NAS advertising. © Article author(s) (or their employer(s) unless otherwise stated in the text of the article) 2017. All rights reserved. No commercial use is permitted unless otherwise expressly granted.
A quantum physical design flow using ILP and graph drawing
NASA Astrophysics Data System (ADS)
Yazdani, Maryam; Saheb Zamani, Morteza; Sedighi, Mehdi
2013-10-01
Implementing large-scale quantum circuits is one of the challenges of quantum computing. A central challenge in accurately modeling the architecture of these circuits is to schedule a quantum application and generate the layout while taking into account the cost of communications and classical resources as well as the maximum exploitable parallelism. In this paper, we present and evaluate a design flow for arbitrary quantum circuits in ion trap technology. Our design flow consists of two parts. First, a scheduler takes a description of a circuit and finds the best order for the execution of its quantum gates using integer linear programming, subject to the classical resources (qubits) and instruction dependencies. Then a layout generator receives the schedule produced by the scheduler and generates a layout for this circuit using a graph-drawing algorithm. Our experimental results show that the proposed flow decreases the average latency of quantum circuits by about 11% for one set of benchmarks and by about 9% for another set, compared with the best results in the literature.
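As a sketch of the scheduling sub-problem (a greedy stand-in, not the paper's ILP formulation), an ASAP pass assigns each gate the earliest time step at which all of its dependencies have completed, which exposes the maximum exploitable parallelism; gates sharing a level could run concurrently:

```python
def asap_schedule(deps):
    """As-soon-as-possible gate levels.

    deps maps each gate name to the set of gates that must finish first.
    Returns gate -> time step (1-based); gates on the same step are
    mutually independent and may execute in parallel."""
    level = {}
    pending = set(deps)
    while pending:
        # A gate is ready once every prerequisite already has a level.
        ready = [g for g in pending if deps[g] <= set(level)]
        if not ready:
            raise ValueError("cyclic dependency in gate graph")
        for g in ready:
            level[g] = 1 + max((level[p] for p in deps[g]), default=0)
        pending -= set(ready)
    return level
```

The gate names in the test below (Hadamard, X, CNOT, measure) are hypothetical placeholders.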
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lutsker, V.; Niehaus, T. A., E-mail: thomas.niehaus@physik.uni-regensburg.de; Aradi, B.
2015-11-14
Bridging the gap between first principles methods and empirical schemes, the density functional based tight-binding method (DFTB) has become a versatile tool in predictive atomistic simulations over the past years. One of the major restrictions of this method is the limitation to local or gradient corrected exchange-correlation functionals. This excludes the important class of hybrid or long-range corrected functionals, which are advantageous in thermochemistry, as well as in the computation of vibrational, photoelectron, and optical spectra. The present work provides a detailed account of the implementation of DFTB for a long-range corrected functional in generalized Kohn-Sham theory. We apply the method to a set of organic molecules and compare ionization potentials and electron affinities with the original DFTB method and higher level theory. The new scheme cures the significant overpolarization in electric fields found for local DFTB, which parallels the functional dependence in first principles density functional theory (DFT). At the same time, the computational savings with respect to full DFT calculations are not compromised as evidenced by numerical benchmark data.
Model development for naphthenic acids ozonation process.
Al Jibouri, Ali Kamel H; Wu, Jiangning
2015-02-01
Naphthenic acids (NAs) are toxic constituents of oil sands process-affected water (OSPW), which is generated during the extraction of bitumen from oil sands. NAs consist mainly of carboxylic acids, which are generally biorefractory. For the treatment of OSPW, ozonation is a very beneficial method: it can significantly reduce the concentration of NAs, and it can also convert NAs from biorefractory to biodegradable. In this study, a 2^4 factorial design was used for the ozonation of OSPW to study the influences of the operating parameters (ozone concentration, oxygen/ozone flow rate, pH, and mixing) on the removal of model NAs in a semi-batch reactor. It was found that ozone concentration had the most significant effect on the NAs concentration compared to the other parameters. An empirical model was developed to correlate the concentration of NAs with ozone concentration, oxygen/ozone flow rate, and pH. In addition, a theoretical analysis was conducted to gain insight into the relationship between the removal of NAs and the operating parameters.
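A 2^4 full factorial design enumerates all 16 combinations of four two-level factors; a factor's main effect is then the mean response at its high level minus the mean at its low level. A minimal sketch with coded -1/+1 levels (factor names follow the abstract; the response used in the test is synthetic, not the paper's data):

```python
from itertools import product

FACTORS = ["ozone_conc", "flow_rate", "pH", "mixing"]  # names from the abstract

def full_factorial(k):
    """All 2^k runs at coded levels -1/+1, one tuple per run."""
    return list(product([-1, 1], repeat=k))

def main_effects(runs, y):
    """Main effect of each factor: mean response over runs at +1
    minus mean response over runs at -1."""
    k = len(runs[0])
    effects = []
    for j in range(k):
        hi = [yi for run, yi in zip(runs, y) if run[j] == 1]
        lo = [yi for run, yi in zip(runs, y) if run[j] == -1]
        effects.append(sum(hi) / len(hi) - sum(lo) / len(lo))
    return effects
```

For a response y = 5 + 3*x1 + x3 the design recovers effects of 6 and 2 (twice the coefficients, since the levels are two units apart) and zero for the inert factors.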
Morishita, Tetsuya; Yonezawa, Yasushige; Ito, Atsushi M
2017-07-11
Efficient and reliable estimation of the mean force (MF), the derivatives of the free energy with respect to a set of collective variables (CVs), has been a challenging problem because free energy differences are often computed by integrating the MF. Among various methods for computing free energy differences, logarithmic mean-force dynamics (LogMFD) [Morishita et al., Phys. Rev. E 2012, 85, 066702] invokes the conservation law in classical mechanics to integrate the MF, which allows us to estimate the free energy profile along the CVs on-the-fly. Here, we present a method called parallel dynamics, which improves the estimation of the MF by employing multiple replicas of the system and is straightforwardly incorporated in LogMFD or a related method. In the parallel dynamics, the MF is evaluated by a nonequilibrium path-ensemble using the multiple replicas based on the Crooks-Jarzynski nonequilibrium work relation. Thanks to the Crooks relation, realizing full-equilibrium states is no longer mandatory for estimating the MF. Additionally, sampling in the hidden subspace orthogonal to the CV space is highly improved with appropriate weights for each metastable state (if any), which is hardly achievable by typical free energy computational methods. We illustrate how to implement parallel dynamics by combining it with LogMFD, which we call logarithmic parallel dynamics (LogPD). Biosystems of alanine dipeptide and adenylate kinase in explicit water are employed as benchmark systems to which LogPD is applied to demonstrate the effect of multiple replicas on the accuracy and efficiency in estimating the free energy profiles using parallel dynamics.
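The work-averaging step behind the Crooks-Jarzynski route can be sketched with the Jarzynski estimator, which turns nonequilibrium work values W collected from independent replicas into a free energy difference via ΔF = -(1/β) ln⟨exp(-βW)⟩. This is a generic sketch of that identity, not the LogPD implementation (which additionally feeds the resulting mean force into the LogMFD integrator); β is the inverse temperature.

```python
import math

def jarzynski_free_energy(works, beta=1.0):
    """Jarzynski estimator: DF = -(1/beta) * ln( mean(exp(-beta * W)) )
    over nonequilibrium work samples from independent replicas."""
    n = len(works)
    # Log-sum-exp trick for numerical stability with large |beta * W|.
    m = max(-beta * w for w in works)
    s = sum(math.exp(-beta * w - m) for w in works)
    return -(m + math.log(s / n)) / beta
```

By Jensen's inequality the estimate never exceeds the mean work, consistent with the second law.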
PARALLEL HOP: A SCALABLE HALO FINDER FOR MASSIVE COSMOLOGICAL DATA SETS
DOE Office of Scientific and Technical Information (OSTI.GOV)
Skory, Stephen; Turk, Matthew J.; Norman, Michael L.
2010-11-15
Modern N-body cosmological simulations contain billions (10^9) of dark matter particles. These simulations require hundreds to thousands of gigabytes of memory and employ hundreds to tens of thousands of processing cores on many compute nodes. In order to study the distribution of dark matter in a cosmological simulation, the dark matter halos must be identified using a halo finder, which establishes the halo membership of every particle in the simulation. The resources required for halo finding are similar to the requirements for the simulation itself. In particular, simulations have become too extensive to use commonly employed halo finders, such that the computational requirements to identify halos must now be spread across multiple nodes and cores. Here, we present a scalable parallel halo finding method called Parallel HOP for large-scale cosmological simulation data. Based on the halo finder HOP, it utilizes message passing interface and domain decomposition to distribute the halo finding workload across multiple compute nodes, enabling analysis of much larger data sets than is possible with the strictly serial or previous parallel implementations of HOP. We provide a reference implementation of this method as a part of the toolkit yt, an analysis toolkit for adaptive mesh refinement data that includes complementary analysis modules. Additionally, we discuss a suite of benchmarks that demonstrate that this method scales well up to several hundred tasks and data sets in excess of 2000^3 particles. The Parallel HOP method and our implementation can be readily applied to any kind of N-body simulation data and is therefore widely applicable.
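The domain-decomposition idea can be sketched in 1D: each task owns an equal slab of the domain plus a padded "ghost" region around it, so a particle group straddling a slab boundary is still seen whole by at least one task. (Parallel HOP does this in 3D over MPI with the padding tied to the HOP linking length; this toy is illustrative only.)

```python
def decompose(particles, n_tasks, lo, hi, pad):
    """Split the 1D domain [lo, hi) into n_tasks equal slabs.

    Returns, per task, (owned, ghosts): the particles inside its slab and
    the 'ghost' particles within pad of either slab edge. Ghosts let a task
    see groups that cross its boundary without extra communication."""
    width = (hi - lo) / n_tasks
    slabs = []
    for t in range(n_tasks):
        a, b = lo + t * width, lo + (t + 1) * width
        owned = [x for x in particles if a <= x < b]
        ghosts = [x for x in particles
                  if (a - pad <= x < a) or (b <= x < b + pad)]
        slabs.append((owned, ghosts))
    return slabs
```

Every particle is owned by exactly one task, while particles near a boundary additionally appear as ghosts of the neighboring task.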
Constructing Neuronal Network Models in Massively Parallel Environments.
Ippen, Tammo; Eppler, Jochen M; Plesser, Hans E; Diesmann, Markus
2017-01-01
Recent advances in the development of data structures to represent spiking neuron network models enable us to exploit the complete memory of petascale computers for a single brain-scale network simulation. In this work, we investigate how well we can exploit the computing power of such supercomputers for the creation of neuronal networks. Using an established benchmark, we divide the runtime of simulation code into the phase of network construction and the phase during which the dynamical state is advanced in time. We find that on multi-core compute nodes network creation scales well with process-parallel code but exhibits a prohibitively large memory consumption. Thread-parallel network creation, in contrast, exhibits speedup only up to a small number of threads but has little overhead in terms of memory. We further observe that the algorithms creating instances of model neurons and their connections scale well for networks of ten thousand neurons, but do not show the same speedup for networks of millions of neurons. Our work uncovers that the lack of scaling of thread-parallel network creation is due to inadequate memory allocation strategies and demonstrates that thread-optimized memory allocators recover excellent scaling. An analysis of the loop order used for network construction reveals that more complex tests on the locality of operations significantly improve scaling and reduce runtime by allowing construction algorithms to step through large networks more efficiently than in existing code. The combination of these techniques increases performance by an order of magnitude and harnesses the increasingly parallel compute power of the compute nodes in high-performance clusters and supercomputers.
PHISICS/RELAP5-3D RESULTS FOR EXERCISES II-1 AND II-2 OF THE OECD/NEA MHTGR-350 BENCHMARK
DOE Office of Scientific and Technical Information (OSTI.GOV)
Strydom, Gerhard
2016-03-01
The Idaho National Laboratory (INL) Advanced Reactor Technologies (ART) High-Temperature Gas-Cooled Reactor (HTGR) Methods group currently leads the Modular High-Temperature Gas-Cooled Reactor (MHTGR) 350 benchmark. The benchmark consists of a set of lattice-depletion, steady-state, and transient problems that can be used by HTGR simulation groups to assess the performance of their code suites. The paper summarizes the results obtained for the first two transient exercises defined for Phase II of the benchmark. The Parallel and Highly Innovative Simulation for INL Code System (PHISICS), coupled with the INL system code RELAP5-3D, was used to generate the results for the Depressurized Conduction Cooldown (DCC) (exercise II-1a) and Pressurized Conduction Cooldown (PCC) (exercise II-2) transients. These exercises require the time-dependent simulation of coupled neutronics and thermal-hydraulics phenomena, and utilize the steady-state solution previously obtained for exercise I-3 of Phase I. This paper also includes a comparison of the benchmark results obtained with a traditional system code “ring” model against a more detailed “block” model that includes kinetics feedback on an individual block level and thermal feedbacks on a triangular sub-mesh. The higher spatial fidelity that can be obtained by the block model is illustrated with comparisons of the maximum fuel temperatures, especially in the case of natural convection conditions that dominate the DCC and PCC events. Differences up to 125 K (or 10%) were observed between the ring and block model predictions of the DCC transient, mostly due to the block model’s capability of tracking individual block decay powers and more detailed helium flow distributions.
In general, the block model only required DCC and PCC calculation times twice as long as the ring models, and it therefore seems that the additional development and calculation time required for the block model could be worth the gain in spatial resolution.
VVER-440 and VVER-1000 reactor dosimetry benchmark - BUGLE-96 versus ALPAN VII.0
DOE Office of Scientific and Technical Information (OSTI.GOV)
Duo, J. I.
2011-07-01
Document available in abstract form only, full text of document follows: Analytical results of the vodo-vodyanoi energetichesky reactor (VVER)-440 and VVER-1000 reactor dosimetry benchmarks developed from engineering mockups at the Nuclear Research Inst. Rez LR-0 reactor are discussed. These benchmarks provide accurate determination of radiation field parameters in the vicinity and over the thickness of the reactor pressure vessel. Measurements are compared to calculated results with two sets of tools: the TORT discrete ordinates code and BUGLE-96 cross-section library versus the newly Westinghouse-developed RAPTOR-M3G and ALPAN VII.0. The parallel code RAPTOR-M3G enables detailed neutron distributions in energy and space in reduced computational time. The ALPAN VII.0 cross-section library is based on ENDF/B-VII.0 and is designed for reactor dosimetry applications. It uses a unique broad group structure to enhance resolution in the thermal-neutron-energy range compared to other analogous libraries. The comparison of fast neutron (E > 0.5 MeV) results shows good agreement (within 10%) between the BUGLE-96 and ALPAN VII.0 libraries. Furthermore, the results compare well with analogous results of participants of the REDOS program (2005). Finally, the analytical results for fast neutrons agree within 15% with the measurements for most locations in all three mockups. In general, however, the analytical results underestimate the attenuation through the reactor pressure vessel thickness compared to the measurements. (authors)
NASA Astrophysics Data System (ADS)
Emde, Claudia; Barlakas, Vasileios; Cornet, Céline; Evans, Frank; Wang, Zhen; Labonotte, Laurent C.; Macke, Andreas; Mayer, Bernhard; Wendisch, Manfred
2018-04-01
Initially unpolarized solar radiation becomes polarized by scattering in the Earth's atmosphere. Molecular (Rayleigh) scattering in particular polarizes electromagnetic radiation, but radiation scattered by aerosols, cloud droplets (Mie scattering), and ice crystals also becomes polarized. Each atmospheric constituent produces a characteristic polarization signal, so spectro-polarimetric measurements are frequently employed for remote sensing of aerosol and cloud properties. Retrieval algorithms require efficient radiative transfer models. Usually, these apply the plane-parallel approximation (PPA), assuming that the atmosphere consists of horizontally homogeneous layers. This makes it possible to solve the vector radiative transfer equation (VRTE) efficiently. For remote sensing applications, the radiance is considered constant over the instantaneous field-of-view of the instrument, and each sensor element is treated independently in the plane-parallel approximation, neglecting horizontal radiation transport between adjacent pixels (Independent Pixel Approximation, IPA). In order to estimate the errors due to the IPA, three-dimensional (3D) vector radiative transfer models are required. So far, only a few such models exist. Therefore, the International Polarized Radiative Transfer (IPRT) working group of the International Radiation Commission (IRC) has initiated a model intercomparison project in order to provide benchmark results for polarized radiative transfer. The group has already performed an intercomparison for one-dimensional (1D) multi-layer test cases [phase A, 1]. This paper presents the continuation of the intercomparison project (phase B) for 2D and 3D test cases: a step cloud, a cubic cloud, and a more realistic scenario including a 3D cloud field generated by a Large Eddy Simulation (LES) model and typical background aerosols.
The commonly established benchmark results for 3D polarized radiative transfer are available at the IPRT website (http://www.meteo.physik.uni-muenchen.de/~iprt).
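The Independent Pixel Approximation itself is simple to sketch: run a 1D plane-parallel solver on every column of the domain separately, ignoring horizontal transport between pixels. In the toy below, the "1D solver" is just Beer-Lambert direct-beam transmission, a stand-in for a full VRTE solver (the field values are made up):

```python
import math

def transmit(column_extinction, dz=1.0):
    """Toy 1D 'solver': direct-beam transmission T = exp(-sum(k * dz))
    through one vertical column (Beer-Lambert law)."""
    tau = sum(k * dz for k in column_extinction)
    return math.exp(-tau)

def independent_pixel_approx(field, solve_1d=transmit):
    """IPA: apply the 1D solver to each (i, j) column of a 3D extinction
    field independently, neglecting horizontal photon transport."""
    return [[solve_1d(field[i][j]) for j in range(len(field[i]))]
            for i in range(len(field))]
```

A 3D solver would couple neighboring columns; the difference between its output and this per-pixel result is exactly the IPA error the benchmark cases quantify.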
DOE Office of Scientific and Technical Information (OSTI.GOV)
Suzuoka, Daiki; Takahashi, Hideaki, E-mail: hideaki@m.tohoku.ac.jp; Morita, Akihiro
2014-04-07
We developed a perturbation approach to compute solvation free energy Δμ within the framework of the QM (quantum mechanical)/MM (molecular mechanical) method combined with a theory of energy representation (QM/MM-ER). The energy shift η of the whole system due to the electronic polarization of the solute is evaluated using the second-order perturbation theory (PT2), where the electric field formed by surrounding solvent molecules is treated as the perturbation to the electronic Hamiltonian of the isolated solute. The point of our approach is that the energy shift η, thus obtained, is to be adopted for a novel energy coordinate of the distribution functions which serve as fundamental variables in the free energy functional developed in our previous work. The most time-consuming part in the QM/MM-ER simulation can, thus, be avoided without serious loss of accuracy. For our benchmark set of molecules, it is demonstrated that the PT2 approach coupled with QM/MM-ER gives hydration free energies in excellent agreement with those given by the conventional method utilizing the Kohn-Sham SCF procedure, except for a few molecules in the benchmark set. A variant of the approach is also proposed to deal with the difficulties associated with these problematic systems. The present approach is also advantageous for parallel implementation. We examined the parallel efficiency of our PT2 code on multi-core processors and found that the speedup increases almost linearly with respect to the number of cores. Thus, it was demonstrated that QM/MM-ER coupled with PT2 deserves practical applications to systems of interest.
UAS-NAS Stakeholder Feedback Report
NASA Technical Reports Server (NTRS)
Randall, Debra; Murphy, Jim; Grindle, Laurie
2016-01-01
The need to fly UAS in the NAS to perform missions of vital importance to national security and defense, emergency management, science, and to enable commercial applications has been continually increasing over the past few years. To address this need, the NASA Aeronautics Research Mission Directorate (ARMD) Integrated Aviation Systems Program (IASP) formulated and funded the Unmanned Aircraft Systems (UAS) Integration in the National Airspace System (NAS) Project (hereafter referred to as UAS-NAS Project) from 2011 to 2016. The UAS-NAS Project identified the following need statement: The UAS community needs routine access to the global airspace for all classes of UAS. The Project identified the following goal: To provide research findings to reduce technical barriers associated with integrating UAS into the NAS utilizing integrated system level tests in a relevant environment. This report provides a summary of the collaborations between the UAS-NAS Project and its primary stakeholders and how the Project applied and incorporated the feedback.
HACC: Extreme Scaling and Performance Across Diverse Architectures
NASA Astrophysics Data System (ADS)
Habib, Salman; Morozov, Vitali; Frontiere, Nicholas; Finkel, Hal; Pope, Adrian; Heitmann, Katrin
2013-11-01
Supercomputing is evolving towards hybrid and accelerator-based architectures with millions of cores. The HACC (Hardware/Hybrid Accelerated Cosmology Code) framework exploits this diverse landscape at the largest scales of problem size, obtaining high scalability and sustained performance. Developed to satisfy the science requirements of cosmological surveys, HACC melds particle and grid methods using a novel algorithmic structure that flexibly maps across architectures, including CPU/GPU, multi/many-core, and Blue Gene systems. We demonstrate the success of HACC on two very different machines, the CPU/GPU system Titan and the BG/Q systems Sequoia and Mira, attaining unprecedented levels of scalable performance. We demonstrate strong and weak scaling on Titan, obtaining up to 99.2% parallel efficiency, evolving 1.1 trillion particles. On Sequoia, we reach 13.94 PFlops (69.2% of peak) and 90% parallel efficiency on 1,572,864 cores, with 3.6 trillion particles, the largest cosmological benchmark yet performed. HACC design concepts are applicable to several other supercomputer applications.
Static analysis techniques for semiautomatic synthesis of message passing software skeletons
Sottile, Matthew; Dagit, Jason; Zhang, Deli; ...
2015-06-29
The design of high-performance computing architectures demands performance analysis of large-scale parallel applications to derive various parameters concerning hardware design and software development. The process of performance analysis and benchmarking an application can be done in several ways with varying degrees of fidelity. One of the most cost-effective ways is to do a coarse-grained study of large-scale parallel applications through the use of program skeletons. The concept of a “program skeleton” that we discuss in this article is an abstracted program that is derived from a larger program, where source code that is determined to be irrelevant for the purposes of the skeleton is removed. In this work, we develop a semiautomatic approach for extracting program skeletons based on compiler program analysis. Finally, we demonstrate the correctness of our skeleton extraction process by comparing details from communication traces, as well as show the performance speedup of using skeletons by running simulations in the SST/macro simulator.
Parallel Online Temporal Difference Learning for Motor Control.
Caarls, Wouter; Schuitema, Erik
2016-07-01
Temporal difference (TD) learning, a key concept in reinforcement learning, is a popular method for solving simulated control problems. However, in real systems, this method is often avoided in favor of policy search methods because of its long learning time. But policy search suffers from its own drawbacks, such as the necessity of informed policy parameterization and initialization. In this paper, we show that TD learning can work effectively in real robotic systems as well, using parallel model learning and planning. Using locally weighted linear regression and trajectory sampled planning with 14 concurrent threads, we can achieve a speedup of almost two orders of magnitude over regular TD control on simulated control benchmarks. For a real-world pendulum swing-up task and a two-link manipulator movement task, we report a speedup of 20× to 60× , with a real-time learning speed of less than half a minute. The results are competitive with state-of-the-art policy search.
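The paper accelerates TD learning with parallel model learning and planning; the tabular TD(0) value update at its core can be sketched as follows (illustrative, not the authors' code):

```python
def td0_update(V, s, r, s_next, alpha=0.1, gamma=0.95):
    """One temporal-difference (TD(0)) update: nudge the value estimate
    V[s] toward the bootstrapped target r + gamma * V[s_next]."""
    td_error = r + gamma * V[s_next] - V[s]
    V[s] += alpha * td_error
    return td_error

# Two-state example: state 0 transitions to state 1 with reward 0.5.
V = {0: 0.0, 1: 1.0}
err = td0_update(V, 0, 0.5, 1, alpha=0.5, gamma=1.0)
print(V[0], err)  # prints 0.75 1.5
```

The long learning times the authors address come from repeating this incremental update over many real interactions; planning with a learned model lets many such updates run between physical time steps.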
NASA Astrophysics Data System (ADS)
Huang, X.; Hu, K.; Ling, X.; Zhang, Y.; Lu, Z.; Zhou, G.
2017-09-01
This paper introduces a novel global patch matching method that focuses on removing fronto-parallel bias and obtaining continuous smooth surfaces, under the assumption that the scenes covered by the stereos are piecewise continuous. Firstly, the simple linear iterative clustering (SLIC) method is used to segment the base image into a series of patches. Then, a global energy function, which consists of a data term and a smoothness term, is built on the patches. The data term is the second-order Taylor expansion of correlation coefficients, and the smoothness term is built by combining connectivity constraints and coplanarity constraints. Finally, the global energy function is built by combining the data term and the smoothness term. We rewrite the global energy function as a quadratic matrix function and use least squares methods to obtain the optimal solution. Experiments on the Adirondack and Motorcycle stereo pairs of the Middlebury benchmark show that the proposed method can remove fronto-parallel bias effectively and produce continuous smooth surfaces.
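The final optimization step, minimizing a quadratic (least-squares) energy built from data and smoothness residuals, can be sketched generically; the two-unknown solver below is an illustration with hypothetical helper names, not the paper's actual patch construction:

```python
def solve_quadratic_energy_2d(A, b):
    """Minimize ||A x - b||^2 for two unknowns by solving the normal
    equations (A^T A) x = A^T b with Cramer's rule.
    A: list of residual rows [a0, a1]; b: list of targets."""
    m00 = sum(r[0] * r[0] for r in A)   # normal matrix M = A^T A
    m01 = sum(r[0] * r[1] for r in A)
    m11 = sum(r[1] * r[1] for r in A)
    v0 = sum(r[0] * t for r, t in zip(A, b))  # v = A^T b
    v1 = sum(r[1] * t for r, t in zip(A, b))
    det = m00 * m11 - m01 * m01
    return ((v0 * m11 - v1 * m01) / det, (m00 * v1 - m01 * v0) / det)

# Data rows pin x0=1 and x1=3; a heavily weighted smoothness row
# (weight w) penalizes x0 - x1, pulling both toward their mean.
w = 1000.0
A = [[1.0, 0.0], [0.0, 1.0], [w, -w]]
b = [1.0, 3.0, 0.0]
x0, x1 = solve_quadratic_energy_2d(A, b)  # both close to 2.0
```

Stacking data and smoothness residuals into one linear system is what makes the global energy a quadratic matrix function with a closed-form least-squares minimizer.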
Mulder, Samuel A; Wunsch, Donald C
2003-01-01
The Traveling Salesman Problem (TSP) is a very hard optimization problem in the field of operations research. It has been shown to be NP-complete, and is an often-used benchmark for new optimization techniques. One of the main challenges with this problem is that standard, non-AI heuristic approaches such as the Lin-Kernighan algorithm (LK) and the chained LK variant are currently very effective and in wide use for the common fully connected, Euclidean variant that is considered here. This paper presents an algorithm that uses adaptive resonance theory (ART) in combination with a variation of the Lin-Kernighan local optimization algorithm to solve very large instances of the TSP. The primary advantage of this algorithm over traditional LK and chained-LK approaches is the increased scalability and parallelism allowed by the divide-and-conquer clustering paradigm. Tours obtained by the algorithm are lower quality, but scaling is much better and there is a high potential for increasing performance using parallel hardware.
Liu, Yu; Zhang, Yanfang; Jiang, Wei; Wang, Jing; Pan, Xiaoming; Wu, Wei; Cao, Minjie; Dong, Ping; Liang, Xingguo
2017-02-01
Dietary nucleic acids (NAs) are important nutrients; however, the digestion of NAs in the stomach has not been studied. In this study, the digestion of NAs by enzymes from fish stomach was investigated. Snakehead pepsins (SP), the main enzymes in the stomach, were extracted and purified. The purity of SP was evaluated by SDS-PAGE and HPLC. Snakehead pepsin 2 (SP2), the main component of the extracts, was used to investigate protein and NA digestion activity. SP2 could digest NAs, including λ DNA and salmon sperm DNA. Interestingly, the digestion could be inhibited by treatment with alkaline solution at pH 8.0 and by pepstatin A, and it occurred either in the presence or absence of hemoglobin (Hb) and BSA as protein substrates. Similarly, the stomach enzymes of banded grouper also showed NA digestion activity. Thus, NAs can be digested by the stomach enzymes of snakehead and banded grouper. These findings may be helpful for understanding both animal nutrition and the NA metabolic pathway.
Kang, Chun-Mei; Chiu, Hsiao-Ting; Hu, Yi-Chun; Chen, Hsiao-Lien; Lee, Pi-Hsia; Chang, Wen-Yin
2012-10-01
To assess the level of and the differences in managerial competencies, research capability, time management, executive power, workload and work-stress ratings among nurse administrators (NAs), and to determine the best predictors of managerial competencies for NAs. Although NAs require multifaceted managerial competencies, research related to NAs' managerial competencies is limited. A cross-sectional survey was conducted with 330 NAs from 16 acute care hospitals. Managerial competencies were determined through a self-developed questionnaire. Data were collected in 2011. All NAs gave themselves the highest rating on integrity and the lowest on both financial/budgeting and business acumen. All scores for managerial competencies, research capability, time management and executive power showed statistically significant correlations. The stepwise regression analysis revealed that age; having received NA training; having completed a nursing project independently; and scores for research capability, executive power and workload could explain 63.2% of the total variance in managerial competencies. The present study provides recommendations for future administrative training programmes to increase NAs' managerial competency in fulfilling their management roles and functions. The findings inform hospital leaders about where NAs need to develop additional competencies and the type of training NAs need to function proficiently. © 2012 Blackwell Publishing Ltd.
Nanoassemblies from amphiphilic cytarabine prodrug for leukemia targeted therapy.
Liu, Jing; Zhao, Dujuan; He, Wenxiu; Zhang, Huiyuan; Li, Zhonghao; Luan, Yuxia
2017-02-01
The anti-leukemia effect of cytarabine (Ara-C) is severely restricted by its high hydrophilicity and rapid plasma degradation. Herein, a novel amphiphilic small-molecule prodrug of Ara-C was developed by coupling a short aliphatic chain, hexanoic acid (HA), to the 4-NH2 of the parent drug. Owing to its amphiphilic nature, the resulting bioconjugate (HA-Ara) could spontaneously self-assemble into stable spherical nanoassemblies (NAs) with an extremely high drug loading (∼71 wt%). Moreover, folate receptor (FR)-targeting NAs were prepared by immobilizing a folic acid-bovine serum albumin (FA-BSA) conjugate on the surface with high grafting efficiency (NAs/FA-BSA). The results of MTT assays on FR-positive K562 cells and FR-negative A549 cells demonstrated higher cytotoxicity of HA-Ara NAs than the native drug. In particular, the IC50 values revealed that NAs/FA-BSA was 3- and 2-fold more effective than non-targeted NAs after 24 and 48 h of treatment with K562 cells, respectively, indicating FR-mediated enhanced anti-tumor efficacy. In in vitro cellular uptake studies, greater accumulation of HA-Ara NAs was observed compared with free FITC, and the results further confirmed the selective uptake of NAs/FA-BSA in folate receptor-enriched cancer cells. Overall, self-assembled HA-Ara NAs exhibited potential superiority for Ara-C delivery, and FA-modified NAs would be an excellent candidate for targeted leukemia therapy. Copyright © 2016 Elsevier Inc. All rights reserved.
Electro-physiological changes in the brain induced by caffeine or glucose nasal spray.
De Pauw, K; Roelands, B; Van Cutsem, J; Marusic, U; Torbeyns, T; Meeusen, R
2017-01-01
A direct link between the mouth cavity and the brain for glucose (GLUC) and caffeine (CAF) has been established. The aim of this study is to determine whether a direct link for both substrates also exists between the nasal cavity and the brain. Ten healthy male subjects (age 22 ± 1 years) performed three experimental trials, separated by at least 2 days. Each trial included a 20-s nasal spray (NAS) period in which placebo (PLAC), GLUC, or CAF solutions were provided in a double-blind, randomized order. During each trial, four cognitive Stroop tasks were performed: two familiarization trials and one pre- and one post-NAS trial. Reaction times and accuracy for different stimuli (neutral, NEUTR; congruent, CON; incongruent, INCON) were determined. Electroencephalography was measured continuously throughout the trials. During the Stroop tasks pre- and post-NAS, the P300 was assessed, and during NAS, source localization was performed using standardized low-resolution brain electromagnetic tomography (sLORETA). NAS activated the anterior cingulate cortex (ACC). CAF-NAS also increased θ and β activity in frontal cortices. Furthermore, GLUC-NAS increased β activity within the insula. GLUC-NAS also increased the P300 amplitude with INCON (P = 0.046) and reduced the P300 amplitude at F3-F4 and the P300 latency at CP1-CP2-Cz with NEUTR (P = 0.001 and P = 0.016, respectively). Nasal bitter and sweet taste receptors possibly mediate these brain responses. Greater cognitive efficiency was observed with GLUC-NAS. CAF-NAS activated cingulate, insular, and sensorimotor cortices, whereas GLUC-NAS activated sensory, cingulate, and insular cortices. However, no effect on the Stroop task was found.
Armstrong, Sarah A; Headley, John V; Peru, Kerry M; Germida, James J
2009-10-01
Naphthenic acids (NAs) are composed of alkyl-substituted acyclic and cycloaliphatic carboxylic acids and, because they are acutely toxic to fish, are of toxicological concern. During the caustic hot-water extraction of oil from the bitumen in oil sands deposits, NAs become concentrated in the resulting tailings pond water. The present study investigated if dissipation of NAs occurs in the presence of hydroponically grown emergent macrophytes (Typha latifolia, Phragmites australis, and Scirpus acutus) to determine the potential for phytoremediation of these compounds. Plants were grown with oil sands NAs (pKa approximately 5-6) in medium at pH 7.8 (predominantly ionized NAs) and pH 5.0 (predominantly nonionized NAs) to determine if, by altering their chemical form, NAs may be more accessible to plants and, thus, undergo increased dissipation. Whereas the oil sands NA mixture in its nonionized form was more toxic to wetland plants than its ionized form, neither form appeared to be sequestered by wetland plants. The present study demonstrated that plants may selectively enhance the dissipation of individual nonionized NA compounds, which contributes to toxicity reduction but does not translate into detectable total NA dissipation within experimental error and natural variation. Plants were able to reduce the toxicity of a NA system over 30 d, increasing the median lethal concentration (LC50; % of hydroponic solution) of the medium for Daphnia magna by 23.3% +/- 8.1% (mean +/- standard error; nonionized NAs) and 37.0% +/- 2.7% (ionized NAs) as determined by acute toxicity bioassays. This reduction in toxicity was 7.3% +/- 2.6% (nonionized NAs) and 45.0% +/- 6.8% (ionized NAs) greater than that in unplanted systems.
Holowenko, Fervone M; MacKinnon, Michael D; Fedorak, Phillip M
2002-06-01
The water produced during the extraction of bitumen from oil sands is toxic to aquatic organisms due largely to a group of naturally occurring organic acids, naphthenic acids (NAs), that are solubilized from the bitumen during processing. NAs are a complex mixture of alkyl-substituted acyclic and cycloaliphatic carboxylic acids, with the general chemical formula CnH(2n + Z)O2, where n is the carbon number and Z specifies a homologous family. Gas chromatography-electron impact mass spectrometry was used to characterize NAs in nine water samples derived from oil sands extraction processes. For each sample, the analysis provided the relative abundances for up to 156 base peaks, with each representing at least one NA structure. Plotting the relative abundances of NAs as three-dimensional bar graphs showed differences among samples. The relative abundance of NAs with carbon numbers < or = 21 to those in the "C22 + cluster" (sum of all NAs with carbon numbers > or = 22 in Z families 0 to -12) proved useful for comparing the water samples that had a range of toxicities. A decrease in toxicity of process-affected waters accompanied an increase in the proportion of NAs in the "C22 + cluster", likely caused by biodegradation of NAs with carbon numbers of < or = 21. In addition, an increase in the proportion of NAs in the "C22 + cluster" accompanied a decrease in the total NAs in the process-affected waters, again suggesting the selective removal of NAs with carbon numbers of < or = 21. This is the first investigation in which changes in the fingerprint of the NA fraction of process-affected waters from the oil sands operations have corresponded with measured toxicity in these waters.
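The general formula CnH(2n + Z)O2 and the "C22 + cluster" grouping described above are easy to make concrete; a small illustrative sketch (hypothetical helper names):

```python
def na_formula(n, z):
    """Chemical formula CnH(2n+Z)O2 for a naphthenic acid with carbon
    number n and homologous family Z (Z is 0 or a negative even integer)."""
    return "C{}H{}O2".format(n, 2 * n + z)

def in_c22_plus_cluster(n, z):
    """Membership in the 'C22 + cluster': carbon number >= 22,
    Z family between 0 and -12."""
    return n >= 22 and -12 <= z <= 0

print(na_formula(22, -4))          # prints C22H40O2
print(in_c22_plus_cluster(21, 0))  # prints False
```

Summing the relative abundances of peaks for which `in_c22_plus_cluster` is true gives the cluster proportion the authors compare against toxicity.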
Biodegradation and detoxification of naphthenic acids in oil sands process affected waters.
Yue, Siqing; Ramsay, Bruce A; Wang, Jiaxi; Ramsay, Juliana A
2016-12-01
After oil sands process affected water (OSPW) was treated in a continuous flow biofilm reactor, about 40% of the organic compounds in the acid extractable fraction (AEF), including naphthenic acids (NAs), were degraded, resulting in a reduction of 73% in the Microtox acute toxicity and of 22% in the yeast estrogenic assay. Using effect directed analysis, treated and untreated OSPW were fractionated by solid phase extraction, and the fractions with the largest decrease in toxicity and estrogenicity were selected for analysis by electrospray ionization combined with a linear ion trap and a high-resolution Orbitrap mass spectrometer (negative ion mode). The aim of this study was to determine whether compositional changes between the untreated and treated fractions provide insight related to biodegradation and detoxification of NAs. The O2S, O3S and O4S compounds were either not major contributors to toxicity or estrogenicity, or the more toxic or estrogenic ones were biodegraded. The O3- and O4-NAs seem to be more readily metabolized than O2-NAs, and their degradation would contribute to detoxification. The decrease in acute toxicity may be associated with the degradation of C12 and C13 bicyclic and C12-C14 tricyclic NAs, while the decrease in estrogenicity may be linked to the degradation of C16 O2-NAs with double bond equivalents (DBE)=5 and 6, C16 and C17 O2-NAs with DBE=7, and C19 O2-NAs with DBE=8. The residual acute toxicity may be caused by recalcitrant components and/or degradation products such as the O2 bicyclic and tricyclic NAs, particularly the C14 and C15 bicyclic and C14-C16 tricyclic NAs, as well as the polycyclic aromatic NAs (DBE≥5 compounds). The decrease in estrogenicity may be linked to the degradation of the O3 and O4 oxidized NAs, while much of the residual estrogenicity may be due to the recalcitrant polycyclic aromatic O2-NAs. Hence, treatment to further detoxify OSPW should target these compounds. Copyright © 2016 Elsevier B.V. All rights reserved.
Mahmmod, Yasser S; Klaas, Ilka Christine; Svennesen, Line; Pedersen, Karl; Ingmer, Hanne
2018-05-16
The role of non-aureus staphylococci (NAS) in the risk of acquisition of intramammary infections with Staphylococcus aureus is vague and still under debate. The objectives of this study were to (1) investigate the distribution patterns of NAS species from milk and teat skin in dairy herds with automatic milking systems, and (2) examine whether the isolated NAS influence the expression of S. aureus virulence factors controlled by the accessory gene regulator (agr) quorum sensing system. In 8 herds, 14 to 20 cows with elevated somatic cell count were randomly selected for teat skin swabbing and aseptic quarter foremilk samples from the right hind and left front quarters. Teat skin swabs were collected using the modified wet-dry method, and milk samples were taken aseptically for bacterial culture. Colonies from quarters suspected of having NAS in milk or teat skin samples (or both) were subjected to MALDI-TOF assay for species identification. To investigate the interaction between S. aureus and NAS, 81 NAS isolates were subjected to a qualitative β-galactosidase reporter plate assay. In total, 373 NAS isolates were identified, representing 105 from milk and 268 from teat skin of 284 quarters (142 cows). Sixteen different NAS species were identified, 15 species from teat skin and 10 species from milk. The most prevalent NAS species identified from milk were Staphylococcus epidermidis (50%), Staphylococcus haemolyticus (15%), and Staphylococcus chromogenes (11%), accounting for 76%. Meanwhile, the most prevalent NAS species from teat skin were Staphylococcus equorum (43%), S. haemolyticus (16%), and Staphylococcus cohnii (14%), accounting for 73%. 
Using reporter gene fusions to monitor the transcriptional activity of key virulence factors and regulators, we found that, of 81 NAS isolate supernatants, 77% reduced expression of hla (encoding α-hemolysin), 70% reduced expression of RNAIII (the key effector molecule of agr), and 61% reduced expression of spa (encoding protein A of S. aureus). Our NAS isolates showed 3 main patterns: (1) a downregulating effect, such as S. chromogenes (milk) and Staphylococcus xylosus (milk and teat); (2) no effect, such as Staphylococcus sciuri (teat) and S. vitulinus (teat); and (3) a variable effect, such as S. epidermidis (milk and teat) and S. equorum (milk and teat). The pattern of cross-talk between NAS species and S. aureus virulence genes varied according to the involved NAS species, habitat type, and herd factors. Knowledge of how NAS influence S. aureus virulence factor expression could explain the varying protective effect of NAS on S. aureus intramammary infections. Copyright © 2018 American Dairy Science Association. Published by Elsevier Inc. All rights reserved.
The Need for Vendor Source Code at NAS. Revised
NASA Technical Reports Server (NTRS)
Carter, Russell; Acheson, Steve; Blaylock, Bruce; Brock, David; Cardo, Nick; Ciotti, Bob; Poston, Alan; Wong, Parkson; Chancellor, Marisa K. (Technical Monitor)
1997-01-01
The Numerical Aerodynamic Simulation (NAS) Facility has a long standing practice of maintaining buildable source code for installed hardware. There are two reasons for this: NAS's designated pathfinding role, and the need to maintain a smoothly running operational capacity given the widely diversified nature of the vendor installations. NAS has a need to maintain support capabilities when vendors are not able; diagnose and remedy hardware or software problems where applicable; and to support ongoing system software development activities whether or not the relevant vendors feel support is justified. This note provides an informal history of these activities at NAS, and brings together the general principles that drive the requirement that systems integrated into the NAS environment run binaries built from source code, onsite.
Grindon, Christina; Harris, Sarah; Evans, Tom; Novik, Keir; Coveney, Peter; Laughton, Charles
2004-07-15
Molecular modelling played a central role in the discovery of the structure of DNA by Watson and Crick. Today, such modelling is done on computers: the more powerful these computers are, the more detailed and extensive can be the study of the dynamics of such biological macromolecules. To fully harness the power of modern massively parallel computers, however, we need to develop and deploy algorithms which can exploit the structure of such hardware. The Large-scale Atomic/Molecular Massively Parallel Simulator (LAMMPS) is a scalable molecular dynamics code including long-range Coulomb interactions, which has been specifically designed to function efficiently on parallel platforms. Here we describe the implementation of the AMBER98 force field in LAMMPS and its validation for molecular dynamics investigations of DNA structure and flexibility against the benchmark of results obtained with the long-established code AMBER6 (Assisted Model Building with Energy Refinement, version 6). Extended molecular dynamics simulations on the hydrated DNA dodecamer d(CTTTTGCAAAAG)(2), which has previously been the subject of extensive dynamical analysis using AMBER6, show that it is possible to obtain excellent agreement in terms of static, dynamic and thermodynamic parameters between AMBER6 and LAMMPS. In comparison with AMBER6, LAMMPS shows greatly improved scalability in massively parallel environments, opening up the possibility of efficient simulations of order-of-magnitude larger systems and/or for order-of-magnitude greater simulation times.
Neonatal Abstinence Syndrome and Maternal Substance Use in Wisconsin, 2009-2014.
Atwell, Karina A; Weiss, Harold B; Gibson, Crystal; Miller, Richard; Corden, Timothy E
2016-12-01
Increasing rates of neonatal abstinence syndrome (NAS), most commonly linked to maternal opioid use, are a growing concern within clinical and public health domains. The study aims to describe the statewide burden of NAS and maternal substance use, focusing on opioids in Wisconsin from 2009 to 2014. Trends in NAS and maternal substance use diagnosis rates were calculated using Wisconsin’s Hospital Discharge Data. Demographic and payer characteristics, health service utilization, and clinical outcomes were compared for newborns with and without NAS. Demographic and payer characteristics were compared between women with and without substance use identified at time of delivery. Rates of NAS and maternal substance use, most notably opioid use, increased significantly between 2009 and 2014. The majority of newborns diagnosed with NAS, and women identified with substance use, were non-Hispanic, white, and Medicaid-insured. Disproportionate rates of NAS and maternal opioid use were observed in American Indian/Alaska Native and Medicaid populations compared to white and privately insured groups, respectively. Women age 20-29 years had the highest rates of opioid use compared to the reference group (10-19 years). Odds of adverse clinical outcomes and levels of health service utilization were significantly higher for newborns with NAS. Similar to trends nationally, our findings show an increase in maternal opioid use and NAS rates in Wisconsin over time, with disproportionate effects in certain demographic groups. These findings support the need for targeted interventions in clinical and public health settings aimed at prevention and burden reduction of NAS and maternal substance use in Wisconsin.
Zhang, Huanxin; Tang, Xuexi; Shang, Jiagen; Zhao, Xinyu; Qu, Tongfei; Wang, Ying
2018-05-11
Naphthenic acids (NAs) account for 1-2% of crude oil and represent its main acidic component. However, the toxic effects of NAs on marine phytoplankton and their ecological risks have remained largely unknown. Using the marine microalgae Phaeodactylum tricornutum and Platymonas helgolandica var. tsingtaoensis as targets, we studied the effects of NAs on their growth, cell morphology and physiological characteristics. Cell density decreased as the concentration of NAs increased, indicating an adverse, concentration-dependent effect on the growth of the investigated algae. Moreover, scanning electron microscopy revealed that NA exposure caused damage such as deformed cells, shrunken surfaces and ruptured cell structures. Exposure to NAs at higher concentrations for 48 h significantly increased the content of chlorophyll (Chl) a and b in P. tricornutum, but decreased their levels in P. helgolandica var. tsingtaoensis. NAs at concentrations no higher than 4 mg/L gradually enhanced the Chl fluorescence (ChlF) parameters, whereas higher concentrations decreased the ChlF parameters for the two marine microalgae. Additionally, NAs induced hormesis in the photosynthetic efficiency of the two microalgae, and their aquatic toxicity differed between species. Overall, the results of this study provide a better understanding of the physiological responses of phytoplankton and will enable better risk assessments of NAs. Copyright © 2018 Elsevier Ltd. All rights reserved.
Nurse's Aides in Nursing Homes: The Relationship between Organization and Quality.
ERIC Educational Resources Information Center
Bowers, Barbara; Becker, Marion
1992-01-01
Examined work of the nurse's aides (NAs) through observation and in-depth interviews with 30 NAs. Found efforts of NAs were clearly focused on getting work done well enough to stay out of trouble and decisions with implications for quality of care were often driven by whether NAs would be reprimanded by supervisors or ostracized by peers.…
Coupled-Flow Simulation of HP-LP Turbines Has Resulted in Significant Fuel Savings
NASA Technical Reports Server (NTRS)
Veres, Joseph P.
2001-01-01
Our objective was to create a high-fidelity Navier-Stokes computer simulation of the flow through the turbines of a modern high-bypass-ratio turbofan engine. The simulation would have to capture the aerodynamic interactions between closely coupled high- and low-pressure turbines. A computer simulation of the flow in the GE90 turbofan engine's high-pressure (HP) and low-pressure (LP) turbines was created at GE Aircraft Engines under contract with the NASA Glenn Research Center. The three-dimensional steady-state computer simulation was performed using Glenn's average-passage approach, named APNASA. The areas upstream and downstream of each blade row mutually interact with each other during engine operation. The embedded blade row operating conditions are modeled, since the average-passage equations in APNASA actively include the effects of the adjacent blade rows. The turbine airfoils, platforms, and casing are actively cooled by compressor bleed air. Hot gas leaks around the tips of rotors through labyrinth seals. The flow exiting the high-work HP turbine is partially transonic and, therefore, has a strong shock system in the transition region. The simulation was done using 121 processors of a Silicon Graphics Origin 2000 (NAS O2K) cluster at the NASA Ames Research Center, with a parallel efficiency of 87 percent in 15 hr. The typical average-passage analysis mesh size per blade row was 280 by 45 by 55, or approximately 700,000 grid points. The total number of blade rows was 18 for a combined HP and LP turbine system, including the struts in the transition duct and exit guide vane, which contain 12.6 million grid points. Design cycle turnaround time requirements ran typically from 24 to 48 hr of wall clock time. The number of iterations for convergence was 10,000, at 8.03×10^-5 sec/iteration/grid point (NAS O2K). Parallel processing by up to 40 processors is required to meet the design cycle time constraints. 
This is the first-ever flow simulation of an HP and LP turbine. In addition, it includes the struts in the transition duct and exit guide vanes.
SDA 7: A modular and parallel implementation of the simulation of diffusional association software
Martinez, Michael; Romanowska, Julia; Kokh, Daria B.; Ozboyaci, Musa; Yu, Xiaofeng; Öztürk, Mehmet Ali; Richter, Stefan
2015-01-01
The simulation of diffusional association (SDA) Brownian dynamics software package has been widely used in the study of biomacromolecular association. Initially developed to calculate bimolecular protein–protein association rate constants, it has since been extended to study electron transfer rates, to predict the structures of biomacromolecular complexes, to investigate the adsorption of proteins to inorganic surfaces, and to simulate the dynamics of large systems containing many biomacromolecular solutes, allowing the study of concentration‐dependent effects. These extensions have led to a number of divergent versions of the software. In this article, we report the development of the latest version of the software (SDA 7). This release was developed to consolidate the existing codes into a single framework, while improving the parallelization of the code to better exploit modern multicore shared memory computer architectures. It is built using a modular object‐oriented programming scheme, to allow for easy maintenance and extension of the software, and includes new features, such as adding flexible solute representations. We discuss a number of application examples, which describe some of the methods available in the release, and provide benchmarking data to demonstrate the parallel performance. © 2015 The Authors. Journal of Computational Chemistry Published by Wiley Periodicals, Inc. PMID:26123630
Parallelized modelling and solution scheme for hierarchically scaled simulations
NASA Technical Reports Server (NTRS)
Padovan, Joe
1995-01-01
This two-part paper presents the results of a benchmarked analytical-numerical investigation into the operational characteristics of a unified parallel processing strategy for implicit fluid mechanics formulations. This hierarchical poly tree (HPT) strategy is based on multilevel substructural decomposition. The tree morphology is chosen to minimize memory, communications and computational effort. The methodology is general enough to apply to existing finite difference (FD), finite element (FEM), finite volume (FV) or spectral element (SE) based computer programs without an extensive rewrite of code. In addition to finding large reductions in memory, communications, and computational effort associated with a parallel computing environment, substantial reductions are generated in the sequential mode of application. Such improvements grow with increasing problem size. Along with a theoretical development of general 2-D and 3-D HPT, several techniques for expanding the problem size that the current generation of computers is capable of solving are presented and discussed. Among these techniques are several interpolative reduction methods. It was found that, by combining several of these techniques, a relatively small interpolative reduction resulted in substantial performance gains. Several other unique features/benefits are discussed in this paper. Along with Part 1's theoretical development, Part 2 presents a numerical approach to the HPT along with four prototype CFD applications. These demonstrate the potential of the HPT strategy.
Homemade Buckeye-Pi: A Learning Many-Node Platform for High-Performance Parallel Computing
NASA Astrophysics Data System (ADS)
Amooie, M. A.; Moortgat, J.
2017-12-01
We report on the "Buckeye-Pi" cluster, the supercomputer developed in The Ohio State University School of Earth Sciences from 128 inexpensive Raspberry Pi (RPi) 3 Model B single-board computers. Each RPi is equipped with a fast quad-core 1.2 GHz 64-bit ARMv8 processor, 1 GB of RAM, and a 32 GB microSD card for local storage. Therefore, the cluster has a total RAM of 128 GB distributed over the individual nodes and a flash capacity of 4 TB with 512 processors, while benefiting from low power consumption, easy portability, and low total cost. The cluster uses the Message Passing Interface protocol to manage the communications between nodes. These features render our platform the most powerful RPi supercomputer to date and suitable for educational applications in high-performance computing (HPC) and the handling of large datasets. In particular, we use the Buckeye-Pi to implement optimized parallel codes in our in-house simulator for subsurface media flows, with the goal of achieving a massively parallelized scalable code. We present benchmarking results for the computational performance across various numbers of RPi nodes. We believe our project could inspire scientists and students to consider the proposed unconventional cluster architecture as a mainstream and feasible learning platform for challenging engineering and scientific problems.
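The cluster's aggregate figures follow directly from the per-node specifications; a quick illustrative check (defaults match the numbers reported above):

```python
def cluster_totals(nodes=128, cores_per_node=4, ram_gb_per_node=1,
                   sd_gb_per_node=32):
    """Aggregate resources of a many-node cluster from per-node specs
    (defaults match the Buckeye-Pi as described)."""
    return {
        "processors": nodes * cores_per_node,
        "ram_gb": nodes * ram_gb_per_node,
        "flash_tb": nodes * sd_gb_per_node / 1024,
    }

print(cluster_totals())  # {'processors': 512, 'ram_gb': 128, 'flash_tb': 4.0}
```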
Xia, Fei; Jin, Guoqing
2014-06-01
PKNOTS is a well-known benchmark program that has been widely used to predict RNA secondary structure including pseudoknots. It adopts the standard four-dimensional (4D) dynamic programming (DP) method and is the basis of many variants and improved algorithms. Unfortunately, its O(N^6) computing requirements and complicated data dependencies greatly limit the usefulness of the PKNOTS package given the explosion in gene database size. In this paper, we present a fine-grained parallel PKNOTS package and a prototype system for accelerating the RNA folding application based on an FPGA chip. We adopted a series of storage optimization strategies to resolve the "memory wall" problem and aggressively exploited parallel computing strategies to improve computational efficiency. We also propose several methods that collectively reduce the storage requirements for FPGA on-chip memory. To the best of our knowledge, our design is the first FPGA implementation for accelerating the 4D DP problem of RNA folding including pseudoknots. The experimental results show an average speedup of more than 50x over the PKNOTS-1.08 software running on a PC platform with an Intel Core2 Q9400 quad-core CPU. Moreover, the power consumption of our FPGA accelerator is only about 50% of that of the general-purpose microprocessor.
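The 4D pseudoknot DP referred to above is far too involved to reproduce here, but the flavour of DP-based RNA folding can be shown with the classic O(N^3) Nussinov base-pair maximisation, a much simpler relative of the algorithm PKNOTS implements (this sketch is ours, not code from the paper):

```python
# Simplified O(N^3) Nussinov base-pair maximisation -- a toy stand-in
# for the O(N^6) pseudoknot-capable DP used by PKNOTS.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}

def nussinov(seq, min_loop=3):
    """Return the maximum number of nested base pairs for `seq`,
    requiring at least `min_loop` unpaired bases inside a hairpin."""
    n = len(seq)
    dp = [[0] * n for _ in range(n)]            # dp[i][j]: best for seq[i..j]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = dp[i + 1][j]                 # case: base i is unpaired
            for k in range(i + min_loop + 1, j + 1):
                if (seq[i], seq[k]) in PAIRS:   # case: i pairs with k
                    right = dp[k + 1][j] if k + 1 <= j else 0
                    best = max(best, 1 + dp[i + 1][k - 1] + right)
            dp[i][j] = best
    return dp[0][n - 1]
```

The pseudoknot-aware DP of PKNOTS generalises this recursion to gapped fragments, which is where the two extra dimensions and the O(N^6) cost come from.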
Lu, Zhao; Sun, Jing; Butts, Kenneth
2014-05-01
Support vector regression for approximating nonlinear dynamic systems is more delicate than the approximation of indicator functions in support vector classification, particularly for systems that involve multitudes of time scales in their sampled data. The kernel used for support vector learning determines the class of functions from which a support vector machine can draw its solution, and the choice of kernel significantly influences the performance of a support vector machine. In this paper, to bridge the gap between wavelet multiresolution analysis and kernel learning, the closed-form orthogonal wavelet is exploited to construct new multiscale asymmetric orthogonal wavelet kernels for linear programming support vector learning. The closed-form multiscale orthogonal wavelet kernel provides a systematic framework to implement multiscale kernel learning via dyadic dilations and also enables us to represent complex nonlinear dynamics effectively. To demonstrate the superiority of the proposed multiscale wavelet kernel in identifying complex nonlinear dynamic systems, two case studies are presented that aim at building parallel models on benchmark datasets. The development of parallel models that address the long-term/mid-term prediction issue is more intricate and challenging than the identification of series-parallel models where only one-step ahead prediction is required. Simulation results illustrate the effectiveness of the proposed multiscale kernel learning.
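The closed-form orthogonal multiscale kernels of the paper are not reproduced in the abstract; as a rough illustration of the general idea, a translation-invariant wavelet kernel with a Morlet-type mother wavelet (a standard construction in the wavelet-kernel literature, not necessarily the one used here) can be sketched as:

```python
import math

def wavelet_kernel(x, z, a=1.0):
    """Translation-invariant wavelet kernel k(x, z) = prod_i h((x_i - z_i)/a),
    with the Morlet-type mother wavelet h(u) = cos(1.75*u) * exp(-u^2/2).
    `a` is the dilation (scale) parameter."""
    k = 1.0
    for xi, zi in zip(x, z):
        u = (xi - zi) / a
        k *= math.cos(1.75 * u) * math.exp(-u * u / 2.0)
    return k
```

A multiscale family in the spirit of the paper would then be obtained by dyadic dilations, e.g. evaluating `wavelet_kernel(x, z, a=2.0**m)` for several integer scales `m` and letting the linear-programming learner select among them.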
Flexbar 3.0 - SIMD and multicore parallelization.
Roehr, Johannes T; Dieterich, Christoph; Reinert, Knut
2017-09-15
High-throughput sequencing machines can process many samples in a single run. For Illumina systems, sequencing reads are barcoded with an additional DNA tag that is contained in the respective sequencing adapters. The recognition of barcode and adapter sequences is hence commonly needed for the analysis of next-generation sequencing data. Flexbar performs demultiplexing based on barcodes and adapter trimming for such data. The massive amounts of data generated on modern sequencing machines demand that this preprocessing be done as efficiently as possible. We present Flexbar 3.0, the successor of the popular program Flexbar. It now employs twofold parallelism: multi-threading and additionally SIMD vectorization. Both types of parallelism are used to speed up the computation of pairwise sequence alignments, which are used for the detection of barcodes and adapters. Furthermore, new features were included to cover a wide range of applications. We evaluated the performance of Flexbar based on a simulated sequencing dataset. Our program outcompetes other tools in terms of speed and is among the best tools in the presented quality benchmark. Availability: https://github.com/seqan/flexbar. Contact: johannes.roehr@fu-berlin.de or knut.reinert@fu-berlin.de.
2D imaging of helium ion velocity in the DIII-D divertor
NASA Astrophysics Data System (ADS)
Samuell, C. M.; Porter, G. D.; Meyer, W. H.; Rognlien, T. D.; Allen, S. L.; Briesemeister, A.; Mclean, A. G.; Zeng, L.; Jaervinen, A. E.; Howard, J.
2018-05-01
Two-dimensional imaging of parallel ion velocities is compared to fluid modeling simulations to understand the role of ions in determining divertor conditions and to benchmark the UEDGE fluid modeling code. Pure helium discharges are used so that spectroscopic He+ measurements represent the main-ion population at low electron temperatures. Electron temperatures and densities in the divertor match simulated values to within about 20%-30%, establishing the experiment/model match as being at least as good as those normally obtained in the more regularly simulated deuterium plasmas. The He+ brightness (HeII) comparison indicates that the degree of detachment is captured well by UEDGE, principally due to the inclusion of E×B drifts. Tomographically inverted Coherence Imaging Spectroscopy measurements are used to determine the He+ parallel velocities, which display excellent agreement between the model and the experiment near the divertor target, where He+ is predicted to be the main-ion species and where electron-dominated physics dictates the parallel momentum balance. Upstream near the X-point, where He+ is a minority species and ion-dominated physics plays a more important role, the flow velocity magnitude is underestimated by a factor of 2-3. These results indicate that more effort is required to correctly predict ion momentum in these challenging regimes.
Fast quantum Monte Carlo on a GPU
NASA Astrophysics Data System (ADS)
Lutsyshyn, Y.
2015-02-01
We present a scheme for the parallelization of the quantum Monte Carlo method on graphical processing units, focusing on variational Monte Carlo simulation of bosonic systems. We use asynchronous execution schemes with shared memory persistence, and obtain excellent utilization of the accelerator. The CUDA code is provided along with a package that simulates liquid helium-4. The program was benchmarked on several models of Nvidia GPUs, including the Fermi GTX560 and M2090, and the Kepler-architecture K20 GPU. Special optimizations were developed for the Kepler cards, including placement of data structures in the register space of the Kepler GPUs. Kepler-specific optimization is discussed.
Electro-osmotic transport in wet processing of textiles
Cooper, John F.
1998-01-01
Electro-osmotic (or electrokinetic) transport is used to efficiently force a solution (or water) through the interior of the fibers or yarns of textile materials for wet processing of textiles. The textile material is passed between electrodes that apply an electric field across the fabric. Used alone or in parallel with conventional hydraulic washing (forced convection), electro-osmotic transport greatly reduces the amount of water used in wet processing. The amount of water required to achieve a fixed level of rinsing of tint can be reduced, for example, to 1-5 lbs water per pound of fabric from an industry benchmark of 20 lbs water/lb fabric.
Does the Intel Xeon Phi processor fit HEP workloads?
NASA Astrophysics Data System (ADS)
Nowak, A.; Bitzes, G.; Dotti, A.; Lazzaro, A.; Jarp, S.; Szostek, P.; Valsan, L.; Botezatu, M.; Leduc, J.
2014-06-01
This paper summarizes five years of CERN openlab's efforts focused on the Intel Xeon Phi co-processor, from the time of its inception to its public release. We consider the architecture of the device vis-à-vis the characteristics of HEP software and identify key opportunities for HEP processing, as well as scaling limitations. We report on improvements and speedups linked to parallelization and vectorization on benchmarks involving software frameworks such as Geant4 and ROOT. Finally, we extrapolate current software and hardware trends and project them onto accelerators of the future, with the specifics of offline and online HEP processing in mind.
Electro-osmotic transport in wet processing of textiles
Cooper, J.F.
1998-09-22
Electro-osmotic (or electrokinetic) transport is used to efficiently force a solution (or water) through the interior of the fibers or yarns of textile materials for wet processing of textiles. The textile material is passed between electrodes that apply an electric field across the fabric. Used alone or in parallel with conventional hydraulic washing (forced convection), electro-osmotic transport greatly reduces the amount of water used in wet processing. The amount of water required to achieve a fixed level of rinsing of tint can be reduced, for example, to 1-5 lbs water per pound of fabric from an industry benchmark of 20 lbs water/lb fabric. 5 figs.
Yu, Dongjun; Wu, Xiaowei; Shen, Hongbin; Yang, Jian; Tang, Zhenmin; Qi, Yong; Yang, Jingyu
2012-12-01
Membrane proteins are encoded by ~30% of the genes in a genome and play important roles in living organisms. Previous studies have revealed that membrane proteins' structures and functions show obvious cell organelle-specific properties. Hence, it is highly desirable to predict a membrane protein's subcellular location from the primary sequence, considering the extreme difficulty of membrane protein wet-lab studies. Although many models have been developed for predicting protein subcellular locations, only a few are specific to membrane proteins. Existing prediction approaches were constructed based on statistical machine learning algorithms with serial combination of multi-view features, i.e., different feature vectors are simply concatenated to form a super feature vector. However, such simple combination of features simultaneously increases information redundancy, which can in turn deteriorate the final prediction accuracy; indeed, prediction success rates in the serial super space were often found to be even lower than those in a single-view space. The purpose of this paper is to investigate a proper method for fusing multiple multi-view protein sequential features for subcellular location prediction. Instead of the serial strategy, we propose a novel parallel framework for fusing multiple membrane protein multi-view attributes that represents protein samples in complex spaces. We also propose generalized principal component analysis (GPCA) for feature reduction in the complex geometry. Experimental results with different machine learning algorithms on benchmark membrane protein subcellular localization datasets demonstrate that the newly proposed parallel strategy outperforms the traditional serial approach. We also demonstrate the efficacy of the parallel strategy on a soluble protein subcellular localization dataset, indicating that the parallel technique is flexible enough to suit other computational biology problems.
The software and datasets are available at: http://www.csbio.sjtu.edu.cn/bioinf/mpsp.
High-Performance Parallel Analysis of Coupled Problems for Aircraft Propulsion
NASA Technical Reports Server (NTRS)
Felippa, C. A.; Farhat, C.; Park, K. C.; Gumaste, U.; Chen, P.-S.; Lesoinne, M.; Stern, P.
1996-01-01
This research program dealt with the application of high-performance computing methods to the numerical simulation of complete jet engines. The program was initiated in January 1993 by applying two-dimensional parallel aeroelastic codes to the interior gas flow problem of a bypass jet engine. The fluid mesh generation, domain decomposition and solution capabilities were successfully tested. Attention was then focused on methodology for the partitioned analysis of the interaction of the gas flow with a flexible structure and with the fluid mesh motion driven by these structural displacements. The latter is treated by an ALE technique that models the fluid mesh motion as that of a fictitious mechanical network laid along the edges of near-field fluid elements. New partitioned analysis procedures to treat this coupled three-component problem were developed during 1994 and 1995. These procedures involved delayed corrections and subcycling, and have been successfully tested on several massively parallel computers, including the iPSC-860, Paragon XP/S and the IBM SP2. For the global steady-state axisymmetric analysis of a complete engine we decided to use the NASA-sponsored ENG10 program, which uses a regular FV-multiblock-grid discretization in conjunction with circumferential averaging to include effects of blade forces, loss, combustor heat addition, blockage, bleeds and convective mixing. A load-balancing preprocessor for parallel versions of ENG10 was developed. During 1995 and 1996 we developed the capability for the first full 3D aeroelastic simulation of a multirow engine stage. This capability was tested on the IBM SP2 parallel supercomputer at NASA Ames. Benchmark results were presented at the 1996 Computational Aerosciences meeting.
Hierarchical fractional-step approximations and parallel kinetic Monte Carlo algorithms
DOE Office of Scientific and Technical Information (OSTI.GOV)
Arampatzis, Giorgos, E-mail: garab@math.uoc.gr; Katsoulakis, Markos A., E-mail: markos@math.umass.edu; Plechac, Petr, E-mail: plechac@math.udel.edu
2012-10-01
We present a mathematical framework for constructing and analyzing parallel algorithms for lattice kinetic Monte Carlo (KMC) simulations. The resulting algorithms have the capacity to simulate a wide range of spatio-temporal scales in spatially distributed, non-equilibrium physicochemical processes with complex chemistry and transport micro-mechanisms. Rather than focusing on constructing the stochastic trajectories exactly, our approach relies on approximating the evolution of observables, such as density, coverage, correlations and so on. More specifically, we develop a spatial domain decomposition of the Markov operator (generator) that describes the evolution of all observables according to the kinetic Monte Carlo algorithm. This domain decomposition corresponds to a decomposition of the Markov generator into a hierarchy of operators and can be tailored to specific hierarchical parallel architectures such as multi-core processors or clusters of Graphical Processing Units (GPUs). Based on this operator decomposition, we formulate parallel fractional-step kinetic Monte Carlo algorithms by employing the Trotter theorem and its randomized variants; these schemes (a) are partially asynchronous on each fractional-step time window, and (b) are characterized by their communication schedule between processors. The proposed mathematical framework allows us to rigorously justify the numerical and statistical consistency of the proposed algorithms, showing the convergence of our approximating schemes to the original serial KMC. The approach also provides a systematic evaluation of different processor communication schedules. We carry out a detailed benchmarking of the parallel KMC schemes using available exact solutions, for example in Ising-type systems, and we demonstrate the capabilities of the method to simulate complex spatially distributed reactions at very large scales on GPUs. Finally, we discuss work-load balancing between processors and propose a re-balancing scheme based on probabilistic mass transport methods.
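A toy illustration of the fractional-step idea described above, assuming a 1D lattice and a made-up flip rate (none of this is from the paper): the generator is split into block-local operators, and within each Trotter time window the blocks of one group are evolved independently, so each block could be assigned to its own processor:

```python
import random

def fractional_step_kmc(lattice, steps, window=5, block=4, seed=0):
    """Toy 1D fractional-step KMC sketch: the lattice is split into blocks;
    on alternating steps the even- and odd-indexed blocks are updated for a
    short time window with their surroundings frozen, approximating the
    serial dynamics via Lie-Trotter splitting of the generator.  Each block
    update is independent of the others, so they could run in parallel."""
    rng = random.Random(seed)
    n = len(lattice)
    nblocks = n // block
    for step in range(steps):
        parity = step % 2                  # alternate the two operator groups
        for b in range(parity, nblocks, 2):
            for _ in range(window):        # fractional-step time window
                i = b * block + rng.randrange(block)
                left = lattice[(i - 1) % n]
                right = lattice[(i + 1) % n]
                # invented toy rate favouring agreement with neighbours
                if rng.random() < 0.25 * (1 + (left + right) / 2.0):
                    lattice[i] ^= 1
    return lattice
```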
Optimized Hypervisor Scheduler for Parallel Discrete Event Simulations on Virtual Machine Platforms
DOE Office of Scientific and Technical Information (OSTI.GOV)
Yoginath, Srikanth B; Perumalla, Kalyan S
2013-01-01
With the advent of virtual machine (VM)-based platforms for parallel computing, it is now possible to execute parallel discrete event simulations (PDES) over multiple virtual machines, in contrast to executing in native mode directly over hardware as has traditionally been done over the past decades. While mature VM-based parallel systems now offer new, compelling benefits such as serviceability, dynamic reconfigurability and overall cost effectiveness, the runtime performance of parallel applications can be significantly affected. In particular, most VM-based platforms are optimized for general workloads, but PDES execution exhibits unique dynamics significantly different from other workloads. Here we first present results from experiments that highlight the gross deterioration of the runtime performance of VM-based PDES simulations when executed using traditional VM schedulers, quantitatively showing the poor scaling properties of the scheduler as the number of VMs is increased. The mismatch is fundamental in nature, in the sense that any fairness-based VM scheduler implementation would exhibit it with PDES runs. We also present a new scheduler optimized specifically for PDES applications, and describe its design and implementation. Experimental results obtained from running PDES benchmarks (PHOLD and vehicular traffic simulations) over VMs show over an order of magnitude improvement in the run time of the PDES-optimized scheduler relative to the regular VM scheduler, with over a 20x reduction in the run time of simulations using up to 64 VMs. The observations and results are timely in the context of emerging systems such as cloud platforms and VM-based high performance computing installations, highlighting to the community the need for PDES-specific support, and the feasibility of significantly reducing the runtime overhead for scalable PDES on VM platforms.
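For readers unfamiliar with the PHOLD benchmark mentioned above, its event-driven core can be sketched in a few lines (a sequential toy with parameter names of our choosing; a real PDES engine distributes the logical processes across processors or VMs, which is exactly what makes it sensitive to the VM scheduler):

```python
import heapq
import random

def phold(n_lps=8, init_events=4, horizon=100.0, seed=1):
    """Minimal sequential PHOLD sketch: each event, when processed, schedules
    one successor at t + exponential delay on a randomly chosen logical
    process (LP).  Returns the number of events processed up to `horizon`."""
    rng = random.Random(seed)
    events = []                      # (timestamp, sequence, lp); seq breaks ties
    seq = 0
    for lp in range(n_lps):
        for _ in range(init_events):
            heapq.heappush(events, (rng.expovariate(1.0), seq, lp))
            seq += 1
    processed = 0
    last_t = 0.0
    while events:
        t, _, lp = heapq.heappop(events)
        if t > horizon:
            break
        assert t >= last_t           # causality: non-decreasing timestamps
        last_t = t
        processed += 1
        target = rng.randrange(n_lps)
        heapq.heappush(events, (t + rng.expovariate(1.0), seq, target))
        seq += 1
    return processed
```

The global timestamp order enforced in this loop is the property that a fairness-based VM scheduler, unaware of simulation time, can stall on: an LP holding the next event may sit on a descheduled VM while others idle.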
Diversity and abundance of nitrate assimilation genes in the northern South China Sea.
Cai, Haiyuan; Jiao, Nianzhi
2008-11-01
Marine heterotrophic microorganisms that assimilate nitrate play an important role in nitrogen and carbon cycling in the water column. The nasA gene, encoding the nitrate assimilation enzyme, was selected as a functional marker to examine the nitrate assimilation community in the South China Sea (SCS). PCR amplification, restriction fragment length polymorphism (RFLP) screening, and phylogenetic analysis of nasA gene sequences were performed to characterize in situ nitrate assimilatory bacteria. Furthermore, the effects of nutrients and other environmental factors on the genetic heterogeneity of nasA fragments from the SCS were evaluated at the surface in three stations, and at two other depths in one of these stations. The diversity indices and rarefaction curves indicated that the nasA gene was more diverse in offshore waters than in the Pearl River estuary. The phylotype rank abundance curve showed an abundant and unique RFLP pattern in all five libraries, indicating that a high diversity but low abundance of nasA existed in the study areas. Phylogenetic analysis of environmental nasA gene sequences further revealed that the nasA gene fragments came from several common aquatic microbial groups, including the Proteobacteria, Cytophaga-Flavobacteria (CF), and Cyanobacteria. In addition to the direct PCR/sequence analysis of environmental samples, we also cultured a number of nitrate assimilatory bacteria isolated from the field. Comparison of nasA genes from these isolates and from the field samples indicated the existence of horizontal nasA gene transfer. Application of real-time quantitative PCR to these nasA genes revealed a great variation in their abundance at different investigation sites and water depths.
Nieri, Alexandra-Stavroula; Manousaki, Kalliopi; Kalafati, Maria; Padilha, Katia Grilio; Stafseth, Siv K; Katsoulas, Theodoros; Matziou, Vasiliki; Giannakopoulou, Margarita
2018-04-11
To assess the reliability and validity of the Greek versions of the Nursing Activities Score (NAS) and the Therapeutic Intervention Scoring System for Critically Ill Children (TISS-C) in a Greek Paediatric Intensive Care Unit (PICU). A methodological study was performed in one PICU of the largest paediatric hospital in Athens, Greece. The culturally adapted and validated Greek NAS version, enriched according to the Norwegian paediatric one (P-NAS), was used. TISS-C and the Norwegian paediatric interventions were translated into Greek and back-translated. The Therapeutic Intervention Scoring System (TISS-28) was used as a gold standard. Two independent observers simultaneously recorded 30 daily P-NAS and TISS-C records. In total, 188 daily P-NAS, TISS-C and TISS-28 reports were obtained from a sample of 29 patients during 5 weeks. Descriptive statistics and reliability and validity measures were applied using SPSS (ver. 22.0) (p ≤ 0.05). Kappa was 0.963 for P-NAS and 0.9895 for TISS-C (p < 0.001), and the Intraclass Correlation Coefficient for all scale items of TISS-C was 1.00 (p < 0.001). P-NAS, TISS-28 and TISS-C measurements were significantly correlated (0.680 ≤ rho ≤ 0.743, p < 0.001). The mean scores (±SD) for TISS-28, P-NAS and TISS-C were 23.05 (±5.72), 58.14 (±13.98) and 20.21 (±9.66), respectively. These results support the validity of the P-NAS and TISS-C scales for use in Greek PICUs.
Development of an Attitude Scale to Assess K-12 Teachers' Attitudes toward Nanotechnology
NASA Astrophysics Data System (ADS)
Lan, Yu-Ling
2012-05-01
To maximize the contributions of nanotechnology to society, at least 60 countries have put efforts into this field. In Taiwan, a government-funded K-12 Nanotechnology Programme was established to train K-12 teachers with adequate nanotechnology literacy to foster the next generation of Taiwanese people with sufficient knowledge in nanotechnology. In the present study, the Nanotechnology Attitude Scale for K-12 teachers (NAS-T) was developed to assess K-12 teachers' attitudes toward nanotechnology. The NAS-T included 23 Likert-scale items that can be grouped into three components: importance of nanotechnology, affective tendencies in science teaching, and behavioural tendencies to teach nanotechnology. A sample of 233 K-12 teachers who had participated in the K-12 Nanotechnology Programme was included in the present study to investigate the psychometric properties of the NAS-T. The exploratory factor analysis of this teacher sample suggested that the NAS-T was a three-factor model that explained 64.11% of the total variance. This model was also confirmed by confirmatory factor analysis to validate the factor structure of the NAS-T. The Cronbach's alpha values of the three NAS-T subscales ranged from 0.89 to 0.95. Moderate to strong correlations among teachers' NAS-T domain scores, self-perception of their own nanoscience knowledge, and their science-teaching efficacy demonstrated good convergent validity of the NAS-T. As a whole, the psychometric properties of the NAS-T indicated that it is an effective instrument for assessing K-12 teachers' attitudes toward nanotechnology. The NAS-T will serve as a valuable tool to evaluate teachers' attitude changes after participating in the K-12 Nanotechnology Programme.
Ben Shahar, Yoav; Sukhotnik, Igor; Bitterman, Nir; Pollak, Yulia; Bejar, Jacob; Chepurov, Dmitriy; Coran, Arnold; Bitterman, Arie
2016-02-01
N-acetylserotonin (NAS) is a naturally occurring chemical intermediate in the biosynthesis of melatonin. Extensive studies in various experimental models have established that treatment with NAS significantly protects the heart and kidney from ischemia-reperfusion (IR) injury. The purpose of the present study was to examine the effect of NAS on intestinal recovery and enterocyte turnover after intestinal IR injury in rats. Male Sprague-Dawley rats were divided into four experimental groups: (1) sham rats underwent laparotomy; (2) sham-NAS rats underwent laparotomy and were treated with intraperitoneal (IP) NAS (20 mg/kg); (3) IR rats underwent occlusion of both the superior mesenteric artery and the portal vein for 30 minutes, followed by 48 hours of reperfusion; and (4) IR-NAS rats underwent IR and were treated with IP NAS (20 mg/kg) immediately before abdominal closure. Intestinal structural changes, Park injury score, enterocyte proliferation, and enterocyte apoptosis were determined 24 hours following IR. The expression of Bax, Bcl-2, p-ERK, and caspase-3 in the intestinal mucosa was determined using real-time polymerase chain reaction, Western blot, and immunohistochemistry. A nonparametric Kruskal-Wallis analysis of variance test was used for statistical analysis, with p less than 0.05 considered statistically significant. Treatment with NAS resulted in a significant increase in mucosal weight in the jejunum and ileum, villus height in the ileum, and crypt depth in the jejunum and ileum compared with IR animals. IR-NAS rats also had significantly higher proliferation rates as well as a lower apoptotic index in the jejunum and ileum, which was accompanied by higher Bcl-2 levels compared with IR animals. Treatment with NAS prevents gut mucosal damage and inhibits programmed cell death following intestinal IR in rats.
Male Sex Associated With Increased Risk of Neonatal Abstinence Syndrome.
Charles, M Katherine; Cooper, William O; Jansson, Lauren M; Dudley, Judith; Slaughter, James C; Patrick, Stephen W
2017-06-01
Neonatal abstinence syndrome (NAS) is a postnatal opioid withdrawal syndrome. Factors associated with development of the syndrome are poorly understood; however, infant sex may influence the risk of NAS. Our objective was to determine if infant sex was associated with the development or severity of the syndrome in a large population-based cohort. This retrospective cohort study used vital statistics and prescription, outpatient, and inpatient administrative data for mothers and infants enrolled in the Tennessee Medicaid program between 2009 and 2011. Multivariable logistic regression models were used to evaluate the association between male sex and diagnosis of NAS, accounting for potential demographic and clinical confounders. NAS severity, as evidenced by hospital length of stay, was modeled by using negative binomial regression. Of 102 695 infants, 927 infants were diagnosed with NAS (484 male subjects and 443 female subjects). Adjustments were made for the following: maternal age, race, and education; maternal hepatitis C infection, anxiety, or depression; in utero exposure to selective serotonin reuptake inhibitors and cigarettes; infant birth weight, small for gestational age, and year; and the interaction between opioid type and opioid amount. Male infants were more likely than female infants to be diagnosed with NAS (adjusted odds ratio, 1.18 [95% confidence interval, 1.05-1.33]) and NAS requiring treatment (adjusted odds ratio, 1.24 [95% confidence interval, 1.04-1.47]). However, there was no sex-based difference in severity for those diagnosed with NAS. Treatment of NAS should be tailored to an infant's individual risk for the syndrome. Clinicians should be mindful that male sex is an important risk factor in the diagnosis of NAS.
Determination of composition of non-homogeneous GaInNAs layers
NASA Astrophysics Data System (ADS)
Pucicki, D.; Bielak, K.; Ściana, B.; Radziewicz, D.; Latkowska-Baranowska, M.; Kováč, J.; Vincze, A.; Tłaczała, M.
2016-01-01
Dilute nitride GaInNAs alloys grown on GaAs have become prospective materials for so-called low-cost GaAs-based devices working in the optical wavelength range up to 1.6 μm. Multilayer GaInNAs/GaAs multi-quantum well (MQW) samples are usually analyzed using high-resolution X-ray diffraction (HRXRD) measurements. However, precise structural characterization of GaInNAs-containing heterostructures requires taking into consideration all inhomogeneities of such structures. This paper describes some of the material challenges and progress in the structural characterization of GaInNAs layers. A new algorithm for the structural characterization of dilute nitrides is presented, which combines contactless electro-reflectance (CER) or photo-reflectance (PR) measurements and HRXRD analysis results with GaInNAs quantum well band diagram calculations. Triple quantum well (3QW) GaInNAs/GaAs structures grown by atmospheric-pressure metalorganic vapor-phase epitaxy (AP-MOVPE) were investigated according to the proposed algorithm. Thanks to the presented algorithm, more precise structural data, including the nonuniformity of the GaInNAs/GaAs QWs in the growth direction, were obtained. The proposed algorithm therefore serves as a nondestructive method for the characterization of multicomponent inhomogeneous semiconductor structures with quantum wells.
Improved packing of protein side chains with parallel ant colonies.
Quan, Lijun; Lü, Qiang; Li, Haiou; Xia, Xiaoyan; Wu, Hongjie
2014-01-01
The accurate packing of protein side chains is important for many computational biology problems, such as ab initio protein structure prediction, homology modelling, protein design and ligand docking applications. Many existing solutions are modelled as a computational optimisation problem. Besides the design of the search algorithm, most solutions suffer from an inaccurate energy function for judging whether a prediction is good or bad: even if the search has found the lowest energy, there is no certainty of obtaining protein structures with correct side chains. We present a side-chain modelling method, pacoPacker, which uses a parallel ant colony optimisation strategy based on sharing a single pheromone matrix. This parallel approach combines different sources of energy functions and generates protein side-chain conformations with the lowest energies jointly determined by the various energy functions. We further optimised the selected rotamers to construct subrotamers by rotamer minimisation, which reasonably improved the discreteness of the rotamer library. We focused on improving the accuracy of side-chain conformation prediction. For a testing set of 442 proteins, 87.19% of X1 and 77.11% of X12 angles were predicted correctly within 40° of the X-ray positions. We compared the accuracy of pacoPacker with state-of-the-art methods such as CIS-RR and SCWRL4, and analysed the results from different perspectives, in terms of whole protein chains and individual residues. In this comprehensive benchmark testing, for 51.5% of proteins within a length of 400 amino acids, the predictions of pacoPacker were superior to the results of CIS-RR and SCWRL4 simultaneously. Finally, we also showed the advantage of using the subrotamer strategy. All results confirmed that our parallel approach is competitive with state-of-the-art solutions for packing side chains. This parallel approach combines various sources of searching intelligence and energy functions to pack protein side chains, and provides a framework for combining different inaccuracy/usefulness objective functions by designing parallel heuristic search algorithms.
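The essential mechanism described above, multiple ants constructing assignments guided by one shared pheromone matrix, can be sketched on a toy discrete packing problem (the energy model and all parameters here are invented for illustration and are far simpler than pacoPacker's):

```python
import random

def aco_pack(energy, n_pos, n_rot, ants=20, iters=30, rho=0.1, seed=7):
    """Toy ant-colony packing: each 'position' picks one of n_rot 'rotamers';
    all ants share a single pheromone matrix (as the parallel colonies share
    one matrix in pacoPacker), reinforcing low-energy assignments.  `energy`
    maps a full assignment (tuple of rotamer indices) to a scalar to minimise."""
    rng = random.Random(seed)
    tau = [[1.0] * n_rot for _ in range(n_pos)]   # shared pheromone matrix
    best, best_e = None, float("inf")
    for _ in range(iters):
        for _ in range(ants):
            # each ant samples one rotamer per position, biased by pheromone
            sol = tuple(rng.choices(range(n_rot), weights=tau[p])[0]
                        for p in range(n_pos))
            e = energy(sol)
            if e < best_e:
                best, best_e = sol, e
        # evaporate, then let the best-so-far assignment deposit pheromone
        for p in range(n_pos):
            for r in range(n_rot):
                tau[p][r] *= (1.0 - rho)
            tau[p][best[p]] += 1.0
    return best, best_e
```

In a parallel setting, the inner ant loop is the natural unit of concurrency, with the shared matrix updated after each iteration; different colonies could also score ants with different energy functions, in the spirit of the combination described above.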
Benchmark Comparison of Cloud Analytics Methods Applied to Earth Observations
NASA Technical Reports Server (NTRS)
Lynnes, Chris; Little, Mike; Huang, Thomas; Jacob, Joseph; Yang, Phil; Kuo, Kwo-Sen
2016-01-01
Cloud computing has the potential to bring high performance computing capabilities to the average science researcher. However, in order to take full advantage of cloud capabilities, the science data used in the analysis must often be reorganized. This typically involves sharding the data across multiple nodes to enable relatively fine-grained parallelism. This can be either via cloud-based file systems or cloud-enabled databases such as Cassandra, Rasdaman or SciDB. Since storing an extra copy of data leads to increased cost and data management complexity, NASA is interested in determining the benefits and costs of various cloud analytics methods for real Earth Observation cases. Accordingly, NASA's Earth Science Technology Office and Earth Science Data and Information Systems project have teamed with cloud analytics practitioners to run a benchmark comparison on cloud analytics methods using the same input data and analysis algorithms. We have particularly looked at analysis algorithms that work over long time series, because these are particularly intractable for many Earth Observation datasets which typically store data with one or just a few time steps per file. This post will present side-by-side cost and performance results for several common Earth observation analysis operations.
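The sharding step mentioned above can be illustrated with a minimal sketch (our own, assuming contiguous time-chunk sharding; real systems such as Cassandra, Rasdaman or SciDB use far richer partitioning schemes):

```python
def shard_time_series(values, n_nodes):
    """Split a long time series into contiguous, near-equal time chunks,
    one per node, so each node can run its part of a long-time-series
    analysis in parallel.  `values` is the full series in time order."""
    n = len(values)
    base, extra = divmod(n, n_nodes)
    shards, start = [], 0
    for node in range(n_nodes):
        size = base + (1 if node < extra else 0)  # spread the remainder
        shards.append(values[start:start + size])
        start += size
    return shards
```

Concatenating the shards in node order recovers the original series, which is the invariant any resharding of archived Earth Observation files must preserve.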
Application configuration selection for energy-efficient execution on multicore systems
Wang, Shinan; Luo, Bing; Shi, Weisong; ...
2015-09-21
Balanced performance and energy consumption are incorporated in the design of modern computer systems. Several run-time factors, such as concurrency levels, thread mapping strategies, and dynamic voltage and frequency scaling (DVFS), should be considered in order to achieve optimal energy efficiency for a workload. Selecting appropriate run-time factors, however, is one of the most challenging tasks because they are both architecture-specific and workload-specific. While most existing work concentrates on either static analysis of the workload or run-time prediction, we present a hybrid two-step method that utilizes concurrency levels and DVFS settings to reach the energy-efficient configuration for a workload. Experimental results based on a Xeon E5620 server with the NPB and PARSEC benchmark suites show that the model is able to predict the energy-efficient configuration accurately. On average, an additional 10% EDP (Energy Delay Product) saving is obtained by using run-time DVFS for the entire system. An off-line optimal solution is used for comparison with the proposed scheme; the experimental results show that the average extra EDP saved by the optimal solution is within 5% on selected parallel benchmarks.
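The selection criterion the abstract optimizes, the Energy Delay Product, is easy to make concrete. The sketch below picks the best configuration from hypothetical measurements; note the paper's actual contribution is *predicting* the best configuration rather than exhaustively measuring all of them, so this shows only the metric, not the method:

```python
def best_configuration(measurements):
    """Pick the configuration minimizing the Energy Delay Product
    (EDP = energy * delay). measurements is a list of
    (config_label, energy_joules, delay_seconds) tuples."""
    return min(measurements, key=lambda m: m[1] * m[2])[0]

# Hypothetical per-configuration measurements for one workload, combining
# a concurrency level with a DVFS (frequency) setting:
measured = [
    ("4 threads @ 2.4 GHz", 100.0, 2.0),  # EDP = 200
    ("8 threads @ 1.6 GHz",  90.0, 2.1),  # EDP = 189
    ("8 threads @ 2.4 GHz", 120.0, 1.5),  # EDP = 180 (lowest)
]
```

EDP rewards configurations that save energy without slowing the workload down proportionally, which is why a faster, slightly hungrier setting can still win.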
Benchmark Comparison of Cloud Analytics Methods Applied to Earth Observations
NASA Astrophysics Data System (ADS)
Lynnes, C.; Little, M. M.; Huang, T.; Jacob, J. C.; Yang, C. P.; Kuo, K. S.
2016-12-01
Cloud computing has the potential to bring high performance computing capabilities to the average science researcher. However, in order to take full advantage of cloud capabilities, the science data used in the analysis must often be reorganized. This typically involves sharding the data across multiple nodes to enable relatively fine-grained parallelism. This can be either via cloud-based filesystems or cloud-enabled databases such as Cassandra, Rasdaman or SciDB. Since storing an extra copy of data leads to increased cost and data management complexity, NASA is interested in determining the benefits and costs of various cloud analytics methods for real Earth Observation cases. Accordingly, NASA's Earth Science Technology Office and Earth Science Data and Information Systems project have teamed with cloud analytics practitioners to run a benchmark comparison on cloud analytics methods using the same input data and analysis algorithms. We have particularly looked at analysis algorithms that work over long time series, because these are particularly intractable for many Earth Observation datasets which typically store data with one or just a few time steps per file. This post will present side-by-side cost and performance results for several common Earth observation analysis operations.
Nonlinear 3D visco-resistive MHD modeling of fusion plasmas: a comparison between numerical codes
NASA Astrophysics Data System (ADS)
Bonfiglio, D.; Chacon, L.; Cappello, S.
2008-11-01
Fluid plasma models (and, in particular, the MHD model) are extensively used in the theoretical description of laboratory and astrophysical plasmas. We present here a successful benchmark between two nonlinear, three-dimensional, compressible visco-resistive MHD codes. One is the fully implicit, finite volume code PIXIE3D [1,2], which is characterized by many attractive features, notably its generalized curvilinear formulation (which makes the code applicable to different geometries) and the possibility of including in the computation the energy transport equation and the extended MHD version of Ohm's law. In addition, the parallel version of the code features excellent scalability properties. Results from this code, obtained in cylindrical geometry, are compared with those produced by the semi-implicit cylindrical code SpeCyl, which uses finite differences radially and a spectral formulation in the other coordinates [3]. Both single-mode and multi-mode simulations are benchmarked, for both reversed-field pinch (RFP) and ohmic tokamak magnetic configurations. [1] L. Chacon, Computer Physics Communications 163, 143 (2004). [2] L. Chacon, Phys. Plasmas 15, 056103 (2008). [3] S. Cappello, Plasma Phys. Control. Fusion 46, B313 (2004) and references therein.
Neonatal abstinence syndrome and associated health care expenditures: United States, 2000-2009.
Patrick, Stephen W; Schumacher, Robert E; Benneyworth, Brian D; Krans, Elizabeth E; McAllister, Jennifer M; Davis, Matthew M
2012-05-09
Neonatal abstinence syndrome (NAS) is a postnatal drug withdrawal syndrome primarily caused by maternal opiate use. No national estimates are available for the incidence of maternal opiate use at the time of delivery or NAS. To determine the national incidence of NAS and antepartum maternal opiate use and to characterize trends in national health care expenditures associated with NAS between 2000 and 2009. A retrospective, serial, cross-sectional analysis of a nationally representative sample of newborns with NAS. The Kids' Inpatient Database (KID) was used to identify newborns with NAS by International Classification of Diseases, Ninth Revision, Clinical Modification (ICD-9-CM) code. The Nationwide Inpatient Sample (NIS) was used to identify mothers using diagnosis related groups for vaginal and cesarean deliveries. Clinical conditions were identified using ICD-9-CM diagnosis codes. NAS and maternal opiate use were described as an annual frequency per 1000 hospital births. Missing hospital charges (<5% of cases) were estimated using multiple imputation. Trends in health care utilization outcomes over time were evaluated using variance-weighted regression. All hospital charges were adjusted for inflation to 2009 US dollars. Incidence of NAS and maternal opiate use, and related hospital charges. The separate years (2000, 2003, 2006, and 2009) of national discharge data included 2920 to 9674 unweighted discharges with NAS and 987 to 4563 unweighted discharges for mothers diagnosed with antepartum opiate use, within data sets including 784,191 to 1.1 million discharges for children (KID) and 816,554 to 879,910 discharges for all ages of delivering mothers (NIS). Between 2000 and 2009, the incidence of NAS among newborns increased from 1.20 (95% CI, 1.04-1.37) to 3.39 (95% CI, 3.12-3.67) per 1000 hospital births per year (P for trend < .001). 
Antepartum maternal opiate use also increased, from 1.19 (95% CI, 1.01-1.35) to 5.63 (95% CI, 4.40-6.71) per 1000 hospital births per year (P for trend < .001). In 2009, newborns with NAS were more likely than all other hospital births to have low birth weight (19.1%; SE, 0.5%; vs 7.0%; SE, 0.2%), to have respiratory complications (30.9%; SE, 0.7%; vs 8.9%; SE, 0.1%), and to be covered by Medicaid (78.1%; SE, 0.8%; vs 45.5%; SE, 0.7%; all P < .001). Mean hospital charges for discharges with NAS increased from $39,400 (95% CI, $33,400-$45,400) in 2000 to $53,400 (95% CI, $49,000-$57,700) in 2009 (P for trend < .001). By 2009, 77.6% of charges for NAS were attributed to state Medicaid programs. Between 2000 and 2009, a substantial increase in the incidence of NAS and maternal opiate use in the United States was observed, as well as a substantial increase in hospital charges related to NAS.
NAS infrastructure management system build 1.5 computer-human interface
DOT National Transportation Integrated Search
2001-01-01
Human factors engineers from the National Airspace System (NAS) Human Factors Branch (ACT-530) of the Federal Aviation Administration William J. Hughes Technical Center conducted an evaluation of the NAS Infrastructure Management System (NIMS) Build ...
National Airspace System (NAS) open system architecture and protocols
DOT National Transportation Integrated Search
2003-08-14
This standard establishes the open systems data communications architecture and authorized protocol standards for the National Airspace System (NAS). The NAS will consist of various types of processors and communications networks procured from a vari...
Dai, Wen-Jan; Chao, Yi-Fang; Kuo, Chien-Ju; Liang, Kuei-Min; Chen, Ta-Liang
2009-12-01
Manpower and the quality of nurse anesthetists (NA) have become critical concerns in Taiwan over the past few decades because of increasing clinical demand and the supervision of NAs by anesthesiologists. To understand manpower distribution, clinical load, job description and limitations, and job satisfaction of NAs, we conducted a cross-sectional survey in Taiwan in 2005. The structure of the questionnaire was initially developed by a drafting group that included members of the Taiwan Society of Anesthesiologists and the Taiwan Association of Nurse Anesthetists. The validity and reliability of the questionnaire was evaluated by specialists. The survey contained questions regarding the demographic characteristics of manpower (anesthesiologist/NA ratio), clinical work load, present job roles, professional expectations, job satisfaction, and reasons for career transfer. The questionnaires were mailed to the superintendents or matrons of NAs, and the administrators of anesthesiology departments across 228 institutions with different accreditation levels and 1953 NA staff between February 1 and December 31, 2005. The validity and reliability of the questionnaire for the department chief and anesthesiology nursing staff was 0.8 and 0.7, respectively. Questionnaires were returned by 113 executives (39 anesthesiology department directors, 74 NA superintendents or matrons) with a response rate of 49.6%, and from 1452 NAs with a response rate of 74.3%. The average clinical load (2002-2004) for the anesthesiologists was 1500-1700 cases/year and 350-380 cases/year for the NAs. The manpower ratio of attending anesthesiologists to NAs was 1:4.3, while the medical centers held the highest ratio. The job stipulation for NAs in Taiwan was compatible with that in the United States and there was a high consistency of opinions between the directors and NA superintendents or matrons. The average rate of career transfer was relatively low (5.5%). 
From the executives' view, the concerns regarding management of NAs included limited staff capacity, recruiting difficulty, staff shortages, and undefined roles for NAs. On the other hand, NAs showed relatively high job satisfaction and work acceptance and a lower turnover rate in comparison with the general nursing staff. This study represents the first large-scale assessment of the distribution, clinical load, and job satisfaction of NAs in Taiwan. The roles of NAs, which include preoperative preparation and postoperative intensive care, need to be better defined. To improve the quality of NAs and anesthetic care in Taiwan, it is vital to establish an official accreditation system and formal education programs, to institute well-defined and standardized job descriptions, and to improve resource allocation for NAs.
Hussaini, Khaleel S; Garcia Saavedra, Luigi F
2018-03-23
Introduction Neonatal abstinence syndrome (NAS) is a withdrawal syndrome in newborns following birth and is primarily caused by maternal drug use during pregnancy. This study examines trends, population correlates, and policy implications of NAS in two Southwest border states. Materials and Methods A cross-sectional analysis of Hospital Inpatient Discharge Data (HIDD) was utilized to examine the incidence of NAS in the Southwest border states of Arizona (AZ) and New Mexico (NM). All inpatient hospital births in AZ and NM from January 1, 2008 through December 31, 2013 with ICD-9-CM codes for NAS (779.5), cocaine (760.72), or narcotics (760.75) were extracted. Results During 2008-2013 there were 1472 NAS cases in AZ and 888 in NM. The overall NAS rate during this period was 2.83 per 1000 births (95% CI 2.68-2.97) in AZ and 5.31 (95% CI 4.96-5.66) in NM. NAS rates increased 157% in AZ and 174% in NM. NAS newborns were more likely to have low birth weight, respiratory distress, and feeding difficulties, and more likely to be on state Medicaid insurance. The AZ border region (bordering Mexico) had NAS rates significantly higher than the state rate (4.06 per 1000 births [95% CI 3.68-4.44] vs. 2.83 [95% CI 2.68-2.97], respectively). In NM, the border region rate (2.09 per 1000 births [95% CI 1.48-2.69]) was significantly lower than the state rate (5.31 [95% CI 4.96-5.66]). Conclusions Despite a dramatic increase in the incidence of NAS in the U.S. and, in particular, the Southwest border states of AZ and NM, there is still scant research on the overall incidence of NAS, its assessment in the Southwest border region, and associated long-term outcomes.
The Healthy Border (HB) 2020 binational initiative of the U.S.-Mexico Border Health Commission is an initiative that addresses several public health priorities that not only include chronic and degenerative diseases, infectious diseases, injury prevention, maternal and child health but also mental health and addiction. The growing opioid epidemic and rise in NAS cases in the Southwest border, as partially shown in this study, provides another opportunity to track health illnesses and outcomes in the Southwest border, especially because there are targeted resources through High Intensity Drug Trafficking Areas (HIDTA) funding.
Neonatal Abstinence Syndrome: Trend and Expenditure in Louisiana Medicaid, 2003-2013.
Okoroh, Ekwutosi M; Gee, Rebekah E; Jiang, Baogong; McNeil, Melissa B; Hardy-Decuir, Beverly A; Zapata, Amy L
2017-07-01
Objectives Determine trends in incidence and expenditure for perinatal drug exposure and neonatal abstinence syndrome (NAS) among Louisiana's Medicaid population, and describe the maternal characteristics of NAS-affected infants. Methods Retrospective cohort analysis using linked Medicaid and vital records data from 2003 to 2013. We examined incidence and cost trends for drug-exposed infants with and without NAS, and performed comparison statistics among drug-exposed infants with and without NAS and those not drug exposed. Results As the rate of perinatal drug exposure increased, the NAS rate per 1000 live Medicaid births also increased, from 2.1 (2003) to 3.6 (2007) to 8.0 (2013) (P for trend <0.0001). Total medical cost paid by Medicaid also increased, from $1.3 million to $3.6 million to $8.7 million (P for trend <0.0001). Compared with drug-exposed infants without NAS and those not drug exposed, infants with NAS were more likely to be white and to have feeding difficulties, respiratory distress syndrome, sepsis, and seizures, all of which had an association at P < 0.0001. Over one-third (33.2%) of the mothers of infants with NAS had an opioid dependency in combination with a mental illness, with depression being most common. Conclusions for Practice Over an 11-year period, the NAS rate among Louisiana's Medicaid infants quadrupled and the cost of caring for the affected infants increased six-fold. Medicaid, as the predominant payer for pregnant women and children affected by substance use disorders, must play a more active role in expanding access to comprehensive substance abuse treatment programs.
Ackelsberg, Joel; Leykam, Frederic M; Hazi, Yair; Madsen, Larry C; West, Todd H; Faltesek, Anthony; Henderson, Gavin D; Henderson, Christopher L; Leighton, Terrance
2011-09-01
Native air sampling (NAS) is distinguished from dedicated air sampling (DAS) devices (e.g., BioWatch) that are deployed to detect aerosol disseminations of biological threat agents. NAS uses filter samples from heating, ventilation, and air conditioning (HVAC) systems in commercial properties for environmental sampling after DAS detection of biological threat agent incidents. It represents an untapped, scientifically sound, efficient, widely distributed, and comparatively inexpensive resource for postevent environmental sampling. Calculations predict that postevent NAS would be more efficient than environmental surface sampling by orders of magnitude. HVAC filter samples could be collected from pre-identified surrounding NAS facilities to corroborate the DAS alarm and delineate the path taken by the bioaerosol plume. The New York City (NYC) Native Air Sampling Pilot Project explored whether native air sampling would be acceptable to private sector stakeholders and could be implemented successfully in NYC. Building trade associations facilitated outreach to and discussions with property owners and managers, who expedited contact with building managers of candidate NAS properties that they managed or owned. Nominal NAS building requirements were determined; procedures to identify and evaluate candidate NAS facilities were developed; data collection tools and other resources were designed and used to expedite candidate NAS building selection and evaluation in Manhattan; and exemplar environmental sampling playbooks for emergency responders were completed. In this sample, modern buildings with single or few corporate tenants were the best NAS candidate facilities. The Pilot Project successfully demonstrated that in one urban setting a native air sampling strategy could be implemented with effective public-private collaboration.
Lung, Chi-Chi; Liu, Justina Yat Wa
2016-01-14
Good support from and positive relations with institutional staff can enhance the psychosocial wellbeing of residents admitted to a nursing home. Nursing assistants (NAs) interact most frequently with residents and play an important role in developing good rapport with them. Most studies have described the daily interactions between NAs and residents as task oriented. Only a few have attempted to explore the perspectives of NAs and residents on their daily interactions. Therefore, the aim of this study was to identify the types of daily interactions perceived by NAs and residents. We also investigated those intentions/beliefs held by NAs and residents that might direct their interactive behaviors. A descriptive, exploratory, qualitative approach was used to explore the perspectives of 18 NAs (mean age: 51) and 15 residents (mean age: 84.4) on their daily interactions. Unstructured in-depth interviews were used to collect data. All of the interviews were conducted from July to December 2013. The collected data were transcribed verbatim and analyzed by content analysis. Three types of interactions were found that described the NAs' and residents' perspectives on their daily interactions: (1) physiologically-oriented daily interactions; (2) cordial interactions intended to maintain a harmonious atmosphere; and (3) reciprocal social interactions intended to develop closer rapport. One or more themes reflecting the participants' intentions or beliefs were identified from each group to support each type of interaction. An over-emphasis on the formal caring relationship and over-concern about maintaining a harmonious atmosphere contributed to a superficial and distant relationship between the two parties. Building close rapport takes time and involves repeated reciprocal social interactions. The findings showed that with good intentions to establish closer rapport, both NAs and residents did favors for each other.
All of those favors were easily integrated in the care provided to the residents without increasing the workload of the NAs. Modifying the training given to NAs and adjusting institutional policies are crucial to raising the competence of the NAs in building good relationships with residents. Positive interactions improve the psychosocial wellbeing of the residents and encourage them to cooperate during the delivery of care, thereby improving their overall health and contributing to the NAs' job satisfaction.
A Method for Suppressing Line Overload Phenomena Using NAS Battery Systems
NASA Astrophysics Data System (ADS)
Ohtaka, Toshiya; Iwamoto, Shinichi
In this paper, we focus on the superior operating control functions and instantaneous discharge characteristics of NAS battery systems, and propose a method for determining installation plans and operating control schemes of NAS battery systems for suppressing line overload phenomena. In the planning stage, a target contingency is identified, and an optimal allocation and capacity of NAS battery systems and an amount of generation changes are determined for that contingency. In the operation stage, the control strategy of the NAS battery systems is determined. Simulations are carried out to verify the validity of the proposed method, using the IEEJ 1 machine V system model and an example 2-machine 16-bus system model.
Relationship between Weather, Traffic and Delay Based on Empirical Methods
NASA Technical Reports Server (NTRS)
Sridhar, Banavar; Swei, Sean S. M.
2006-01-01
The steady rise in demand for air transportation over the years has put much emphasis on the need for sophisticated air traffic flow management (TFM) within the National Airspace System (NAS). The NAS refers to hardware, software and people, including runways, radars, networks, FAA, airlines, etc., involved in air traffic management (ATM) in the US. One of the metrics that has been used to assess the performance of NAS is the actual delays provided through FAA's Air Traffic Operations Network (OPSNET). The OPSNET delay data includes those reportable delays, i.e. delays of 15 minutes or more experienced by Instrument Flight Rule (IFR) flights, submitted by the FAA facilities. These OPSNET delays are caused by the application of TFM initiatives in response to, for instance, weather conditions, increased traffic volume, equipment outages, airline operations, and runway conditions. TFM initiatives such as, ground stops, ground delay programs, rerouting, airborne holding, and miles-in-trail restrictions, are actions which are needed to control the air traffic demand to mitigate the demand-capacity imbalance due to the reduction in capacity. Consequently, TFM initiatives result in NAS delays. Of all the causes, weather has been identified as the most important causal factor for NAS delays. Therefore, in order to accurately assess the NAS performance, it has become necessary to create a baseline for NAS performance and establish a model which characterizes the relation between weather and NAS delays.
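The baseline modeling the abstract calls for can be made concrete with the simplest possible example: an ordinary least-squares fit of delay against a single weather-impact index. This is a deliberately generic sketch under that assumption, not the authors' model, and `fit_line` and its inputs are hypothetical:

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit y ~ a + b*x. Here x stands for a weather
    severity index and y for reported OPSNET delay minutes; any real
    weather/delay model would use richer features, this only illustrates
    the baseline-fitting idea."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    a = my - b * mx
    return a, b
```

Given such a baseline, the residual (actual delay minus weather-predicted delay) is what would isolate the contribution of the non-weather causal factors listed above.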
Does Maternal Buprenorphine Dose Affect Severity or Incidence of Neonatal Abstinence Syndrome?
Wong, Jacqueline; Saver, Barry; Scanlan, James M; Gianutsos, Louis Paul; Bhakta, Yachana; Walsh, James; Plawman, Abigail; Sapienza, David; Rudolf, Vania
2018-06-13
To measure the incidence, onset, duration, and severity of neonatal abstinence syndrome (NAS) in infants born to mothers receiving buprenorphine and to assess the association between buprenorphine dose and NAS outcomes. We reviewed charts of all mother-infant pairs maintained on buprenorphine who delivered in our hospital from January 1, 2000 to April 1, 2016. In 89 infants, NAS incidence requiring morphine was 43.8%. Means for morphine-treated infants included: 55.2 hours to morphine start, 15.9 days on morphine, and 20 days hospital stay. NAS requiring morphine treatment occurred in 48.5% and 41.4% of infants of mothers receiving ≤8 mg/d buprenorphine versus >8 mg/d, respectively (P = 0.39). We found no significant associations of maternal buprenorphine dose with peak NAS score, NAS severity requiring morphine, time to morphine start, peak morphine dose, or days on morphine. Among the other factors examined, only exclusive breastfeeding was significantly associated with neonatal outcomes, specifically lower odds of morphine treatment (odds ratio 0.24, P = 0.003). These findings suggest higher buprenorphine doses can be prescribed to pregnant women receiving medication therapy for addiction without increasing NAS severity. Our finding of reduced risk of NAS requiring morphine treatment also suggests breastfeeding is both safe and beneficial for these infants and should be encouraged.
Cell cycle effect on the activity of deoxynucleoside analogue metabolising enzymes
DOE Office of Scientific and Technical Information (OSTI.GOV)
Fyrberg, Anna; Albertioni, Freidoun; Lotfi, Kourosh
Deoxynucleoside analogues (dNAs) are cytotoxic towards both replicating and indolent malignancies. The impact of fluctuations in the metabolism of dNAs in relation to the cell cycle could have strong implications for the activity of dNAs. Deoxycytidine kinase (dCK) and deoxyguanosine kinase (dGK) are important enzymes for the phosphorylation/activation of dNAs. These drugs can be dephosphorylated/deactivated by 5'-nucleotidases (5'-NTs); elevated activities of 5'-NTs and decreased dCK and/or dGK activities represent resistance mechanisms towards dNAs. The activities of dCK, dGK, and three 5'-NTs were investigated in four human leukemic cell lines in relation to cell cycle progression and the cytotoxicity of dNAs. Synchronization of cell cultures to arrest in G0/G1 by serum deprivation was performed, followed by serum supplementation for cell cycle progression. The activities of dCK and dGK increased up to 3-fold in CEM, HL60, and MOLT-4 cells as they started to proliferate, while the activity of cytosolic nucleotidase I (cN-I) was reduced in proliferating cells. CEM, HL60, and MOLT-4 cells were also more sensitive to cladribine, cytarabine, 9-β-D-arabinofuranosylguanine, and clofarabine than K562 cells, which demonstrated lower levels and less alteration of these enzymes and were least susceptible to the cytotoxic effects of most dNAs. The results suggest that, in the cell lines studied, the proliferation process is associated with a general shift in the direction of activation of dNAs, through induced activities of dCK/dGK and reduced activity of cN-I, which is favourable for the cytotoxic effects of cladribine, cytarabine, and 9-β-D-arabinofuranosylguanine. These results emphasize the importance of cellular proliferation and dNA metabolism, by both phosphorylation and dephosphorylation, for susceptibility to dNAs. This underscores the need to understand the mechanisms of action of, and resistance to, dNAs in order to increase the efficacy of dNA treatment through new rational approaches.
[Activities of Bay Area Research Corporation
NASA Technical Reports Server (NTRS)
2003-01-01
During the final year of this effort, the HALFSHEL code was converted from its parallel configuration to work on a fast single-processor workstation. This was done because the NASA Ames NAS facility stopped supporting space science and we no longer had access to parallel computer time. The single-processor version of HALFSHEL was upgraded to address low-density cells by using a 3-D SOR solver to solve the equation ∇·E = 0. We then upgraded the ionospheric load packages to provide a multiple-species load of the ionosphere out to 1.4 Rm. With these new tools we began to perform a series of simulations to address the major topic of this research effort: determining the loss rate of O+ and O2+ from Mars. The simulations used the nominal Parker spiral field and, in one case, a field perpendicular to the solar wind flow. The simulations were performed for three different solar EUV fluxes consistent with the different solar evolutionary states believed to have existed before today. The 1 EUV case is the nominal flux of today. The 3 EUV case, called Epoch 2, has three times today's flux. The 6 EUV case, Epoch 3, has six times the EUV flux of today.
Wilkinson, Karl A; Hine, Nicholas D M; Skylaris, Chris-Kriton
2014-11-11
We present a hybrid MPI-OpenMP implementation of Linear-Scaling Density Functional Theory within the ONETEP code. We illustrate its performance on a range of high performance computing (HPC) platforms comprising shared-memory nodes with fast interconnect. Our work has focused on applying OpenMP parallelism to the routines which dominate the computational load, attempting where possible to parallelize different loops from those already parallelized within MPI. This includes 3D FFT box operations, sparse matrix algebra operations, calculation of integrals, and Ewald summation. While the underlying numerical methods are unchanged, these developments represent significant changes to the algorithms used within ONETEP to distribute the workload across CPU cores. The new hybrid code exhibits much-improved strong scaling relative to the MPI-only code and permits calculations with a much higher ratio of cores to atoms. These developments result in a significantly shorter time to solution than was possible using MPI alone and facilitate the application of the ONETEP code to systems larger than previously feasible. We illustrate this with benchmark calculations from an amyloid fibril trimer containing 41,907 atoms. We use the code to study the mechanism of delamination of cellulose nanofibrils when undergoing sonication, a process which is controlled by a large number of interactions that collectively determine the structural properties of the fibrils. Many energy evaluations were needed for these simulations, and as these systems comprise up to 21,276 atoms this would not have been feasible without the developments described here.
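The two-level structure, distributed-memory MPI ranks on the outside and OpenMP threads inside each rank, can be sketched with a thread pool standing in for the OpenMP level. This is purely illustrative: `process_rank_workload` and its placeholder kernel are hypothetical, not ONETEP routines.

```python
from concurrent.futures import ThreadPoolExecutor

def process_rank_workload(boxes, n_threads=4):
    """Inner, OpenMP-like level of a hybrid scheme: one MPI rank owns a
    list of FFT-box tasks, and a thread pool works through them within
    the rank's shared memory."""
    def box_kernel(box):
        return sum(x * x for x in box)  # stand-in for a real 3D FFT box op
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        # pool.map preserves task order, like a statically scheduled loop
        return list(pool.map(box_kernel, boxes))
```

In the real hybrid code the outer distribution of boxes across ranks is handled by MPI; threading the inner loops is what raises the usable ratio of cores to atoms, since thread count multiplies rank count without further domain decomposition.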
FaCSI: A block parallel preconditioner for fluid-structure interaction in hemodynamics
NASA Astrophysics Data System (ADS)
Deparis, Simone; Forti, Davide; Grandperrin, Gwenol; Quarteroni, Alfio
2016-12-01
Modeling Fluid-Structure Interaction (FSI) in the vascular system is mandatory to reliably compute mechanical indicators in vessels undergoing large deformations. In order to cope with the computational complexity of the coupled 3D FSI problem after discretizations in space and time, a parallel solution is often mandatory. In this paper we propose a new block parallel preconditioner for the coupled linearized FSI system obtained after space and time discretization. We name it FaCSI to indicate that it exploits the Factorized form of the linearized FSI matrix, the use of static Condensation to formally eliminate the interface degrees of freedom of the fluid equations, and the use of a SIMPLE preconditioner for saddle-point problems. FaCSI is built upon a block Gauss-Seidel factorization of the FSI Jacobian matrix and it uses ad-hoc preconditioners for each physical component of the coupled problem, namely the fluid, the structure and the geometry. In the fluid subproblem, after operating static condensation of the interface fluid variables, we use a SIMPLE preconditioner on the reduced fluid matrix. Moreover, to efficiently deal with a large number of processes, FaCSI exploits efficient single field preconditioners, e.g., based on domain decomposition or the multigrid method. We measure the parallel performances of FaCSI on a benchmark cylindrical geometry and on a problem of physiological interest, namely the blood flow through a patient-specific femoropopliteal bypass. We analyze the dependence of the number of linear solver iterations on the cores count (scalability of the preconditioner) and on the mesh size (optimality).
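The block Gauss-Seidel factorization underlying FaCSI can be illustrated on a toy two-field system: apply the first block's solver, subtract the coupling contribution, then apply the second block's solver, each field preconditioned independently. This is a hedged two-field sketch with hypothetical names, not the FaCSI implementation (which has three physical blocks plus static condensation and SIMPLE).

```python
def block_gauss_seidel_apply(A_ff, A_sf, A_ss, r_f, r_s, solve_f, solve_s):
    """One application of a lower block-triangular preconditioner for a
    coupled 2-field residual (r_f, r_s):
        [A_ff  0   ] [z_f]   [r_f]
        [A_sf  A_ss] [z_s] = [r_s]
    solve_f / solve_s are the per-field ('single physics') solvers."""
    z_f = solve_f(A_ff, r_f)
    # subtract the coupling contribution before the second block solve
    r_s2 = [r_s[i] - sum(A_sf[i][j] * z_f[j] for j in range(len(z_f)))
            for i in range(len(r_s))]
    z_s = solve_s(A_ss, r_s2)
    return z_f, z_s
```

The key design point mirrored here is that each diagonal block can use whatever preconditioner suits its physics (e.g., domain decomposition or multigrid), while the off-diagonal block only ever appears in a matrix-vector product.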
Use of general purpose graphics processing units with MODFLOW
Hughes, Joseph D.; White, Jeremy T.
2013-01-01
To evaluate the use of general-purpose graphics processing units (GPGPUs) to improve the performance of MODFLOW, an unstructured preconditioned conjugate gradient (UPCG) solver has been developed. The UPCG solver uses a compressed sparse row storage scheme and includes Jacobi, zero fill-in incomplete LU, modified incomplete LU, and generalized least-squares polynomial preconditioners. The UPCG solver also includes options for sequential and parallel solution on the central processing unit (CPU) using OpenMP. For simulations utilizing the GPGPU, all basic linear algebra operations are performed on the GPGPU; memory copies between the CPU and GPGPU occur prior to the first iteration of the UPCG solver and after satisfying head and flow criteria or exceeding a maximum number of iterations. The efficiency of the UPCG solver for GPGPU and CPU solutions is benchmarked using simulations of a synthetic, heterogeneous unconfined aquifer with tens of thousands to millions of active grid cells. Testing indicates GPGPU speedups on the order of 2 to 8, relative to the standard MODFLOW preconditioned conjugate gradient (PCG) solver, can be achieved when (1) memory copies between the CPU and GPGPU are optimized, (2) the percentage of time performing memory copies between the CPU and GPGPU is small relative to the calculation time, (3) high-performance GPGPU cards are utilized, and (4) CPU-GPGPU combinations are used to execute sequential operations that are difficult to parallelize. Furthermore, UPCG solver testing indicates GPGPU speedups exceed parallel CPU speedups achieved using OpenMP on multicore CPUs for preconditioners that can be easily parallelized.
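Of the preconditioner options listed, the Jacobi (diagonal) variant is simple enough to sketch in full. The following is a dense pure-Python toy of preconditioned conjugate gradient for illustration, not the MODFLOW UPCG code; it shows why Jacobi suits GPUs: applying the preconditioner is one elementwise multiply, trivially parallel.

```python
def jacobi_pcg(A, b, tol=1e-10, max_iter=200):
    """Preconditioned conjugate gradient with a Jacobi (diagonal)
    preconditioner M = diag(A), for symmetric positive-definite A."""
    n = len(b)
    x = [0.0] * n
    minv = [1.0 / A[i][i] for i in range(n)]   # M^-1 applied elementwise
    r = b[:]                                   # residual for x = 0
    z = [minv[i] * r[i] for i in range(n)]
    p = z[:]
    rz = sum(ri * zi for ri, zi in zip(r, z))
    for _ in range(max_iter):
        Ap = [sum(A[i][j] * p[j] for j in range(n)) for i in range(n)]
        alpha = rz / sum(pi * api for pi, api in zip(p, Ap))
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        if sum(ri * ri for ri in r) ** 0.5 < tol:
            break
        z = [minv[i] * r[i] for i in range(n)]
        rz_new = sum(ri * zi for ri, zi in zip(r, z))
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x
```

In the GPGPU setting described above, every operation inside the loop (matrix-vector product, dot products, vector updates, the diagonal preconditioner apply) maps to basic linear algebra kernels that stay resident on the device, which is why the CPU-GPU memory copies can be confined to before the first iteration and after convergence.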
Meshref, Mohamed N A; Klamerth, Nikolaus; Islam, Md Shahinoor; McPhedran, Kerry N; Gamal El-Din, Mohamed
2017-08-01
Ozonation at high doses is a costly treatment for oil sands process-affected water (OSPW) naphthenic acids (NAs) degradation. To decrease costs and limit doses, different peroxone (hydrogen peroxide/ozone; H2O2:O3) processes using mild ozone doses of 30 and 50 mg/L were investigated. The degradation efficiency of Ox-NAs (classical (O2-NAs) + oxidized NAs) improved from 58% at 30 mg/L ozone to 59%, 63%, and 76% at peroxone (1:1), 50 mg/L ozone, and peroxone (1:2), respectively. Suppressing the hydroxyl radical (•OH) pathway by adding tert-butyl alcohol significantly reduced the degradation in all treatments, while the molecular ozone contribution was around 50% and 34% for O2-NAs and Ox-NAs, respectively. Structure reactivity toward degradation was observed, with degradation increasing for both O2-NAs and Ox-NAs as both the carbon number (n) and the hydrogen deficiency (|-Z|) increased in all treatments. However, the combined effect of n and Z showed specific insights and differences between ozone and peroxone treatments. The degradation pathway for |-Z| ≥ 10 isomers in ozone treatments through molecular ozone was significant compared to •OH. Though peroxone (1:2) greatly reduced the fluorophore organics and the toxicity to Vibrio fischeri, the best oxidant utilization in the degradation of O2-NAs (mg/L) per ozone dose (mg/L) was observed in the peroxone (1:1) (0.91) and 30 mg/L ozone (0.92) treatments. At n = 9-11, peroxone (1:1) had a similar or enhanced effect on O2-NAs degradation compared to 50 mg/L ozone. Enhancing the •OH pathway through peroxone rather than ozone alone may be an effective OSPW treatment that would allow safe release into receiving environments at marginal added cost. Copyright © 2017 Elsevier Ltd. All rights reserved.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ohi, J M
1992-09-01
This report is the first of four volumes that identify and assess the environmental, health, and safety issues involved in using sodium-sulfur (Na/S) battery technology as the energy source in electric and hybrid vehicles that may affect the commercialization of Na/S batteries. This and the other reports on recycling, shipping, and vehicle safety are intended to help the Electric and Hybrid Propulsion Division of the Office of Transportation Technologies in the US Department of Energy (DOE/EHP) determine the direction of its research, development, and demonstration (RD&D) program for Na/S battery technology. The reports review the status of Na/S battery RD&D and identify potential hazards and risks that may require additional research or that may affect the design and use of Na/S batteries. This volume covers cell design and engineering as the basis of safety for Na/S batteries and describes and assesses the potential chemical, electrical, and thermal hazards and risks of Na/S cells and batteries, as well as the RD&D performed, under way, or planned to address these hazards and risks. The report is based on a review of the literature and on discussions with experts at DOE, national laboratories and agencies, universities, and private industry. Subsequent volumes will address environmental, health, and safety issues involved in shipping cells and batteries, using batteries to propel electric vehicles, and recycling and disposing of spent batteries. The remainder of this volume is divided into two major sections on safety at the cell and battery levels. The section on Na/S cells describes major components and potential failure modes, design, life testing and failure testing, thermal cycling, and the safety status of Na/S cells. The section on batteries describes battery design, testing, and safety status. Additional EH&S information on Na/S batteries is provided in the appendices.
Wang, Chengjin; Klamerth, Nikolaus; Messele, Selamawit Ashagre; Singh, Arvinder; Belosevic, Miodrag; Gamal El-Din, Mohamed
2016-09-01
The efficiency of three different oxidation processes, UV/H2O2 oxidation, ferrate(VI) oxidation, and ozonation with and without the hydroxyl radical (•OH) scavenger tert-butyl alcohol (TBA), on the removal of organic compounds from oil sands process-affected water (OSPW) was investigated and compared. The removal of aromatics and naphthenic acids (NAs) was explored by synchronous fluorescence spectra (SFS), ion mobility spectra (IMS), proton and carbon nuclear magnetic resonance (¹H and ¹³C NMR), and ultra-performance liquid chromatography coupled with time-of-flight mass spectrometry (UPLC TOF-MS). UV/H2O2 oxidation occurred through radical reaction and photolysis, transforming one-ring, two-ring, and three-ring fluorescing aromatics simultaneously and achieving 42.4% removal of classical NAs at 2.0 mM H2O2 and a 950 mJ/cm² UV dose provided by a medium-pressure mercury lamp. Ferrate(VI) oxidation exhibited high selectivity, preferentially removing two-ring and three-ring fluorescing aromatics, sulfur-containing NAs (NAs + S), and NAs with high carbon number and high hydrogen deficiency. At 2.0 mM Fe(VI), 46.7% of classical NAs was removed. Ozonation achieved almost complete removal of fluorescing aromatics, NAs + S, and classical NAs (NAs with two oxygen atoms) at a dose of 2.0 mM O3. Both the molecular ozone reaction and the •OH reaction were important pathways in transforming the organics in OSPW, as supported by ozonation performance with and without TBA. ¹H NMR analyses further confirmed the removal of aromatics and NAs both qualitatively and quantitatively. All three oxidation processes reduced the acute toxicity towards Vibrio fischeri and goldfish primary kidney macrophages (PKMs), with ozonation being the most efficient. Copyright © 2016 Elsevier Ltd. All rights reserved.
Liu, CY; Sun, ZZ; Yew, KC
2006-01-01
Self-assembled GaInNAs quantum dots (QDs) were grown on GaAs (001) substrate using solid-source molecular-beam epitaxy (SSMBE) equipped with a radio-frequency nitrogen plasma source. The GaInNAs QD growth characteristics were extensively investigated using atomic-force microscopy (AFM), photoluminescence (PL), and transmission electron microscopy (TEM) measurements. Self-assembled GaInNAs/GaAsN single-layer QD lasers grown using SSMBE have been fabricated and characterized. The laser worked under continuous-wave (CW) operation at room temperature (RT) with an emission wavelength of 1175.86 nm. Temperature-dependent measurements have been carried out on the GaInNAs QD lasers. The lowest threshold current density obtained in this work is ∼1.05 kA/cm² from a GaInNAs QD laser (50 × 1,700 µm²) at 10 °C. High-temperature operation up to 65 °C was demonstrated from an unbonded GaInNAs QD laser (50 × 1,060 µm²), with a high characteristic temperature of 79.4 K in the temperature range of 10-60 °C.
NASA Astrophysics Data System (ADS)
Bowman, D. T.; Arriaga, D.; Morris, P.; Risacher, F.; Warren, L. A.; McCarry, B. E.; Slater, G.
2016-12-01
Naphthenic acids (NAs) are naturally occurring in Athabasca oil sands and accumulate in tailings as a result of water-based extraction processes. NAs contribute to the toxicity of tailings and oil sands process-affected water (OSPW). NAs exist as a complex mixture, so the development of an analytical technique to characterize them has been an on-going challenge. The monitoring of individual NAs and their associated isomers through multidimensional chromatography has the potential to provide greater insight into the behavior of NAs in the environment. For NAs whose proportions do not change during environmental processing, NA ratios may provide a means to develop fingerprints characteristic of specific sources. Alternatively, relative changes in the proportions of NAs may provide a tracer of their occurrence and extent of removal. As yet, only a few studies have begun to explore these possibilities. In this study, comprehensive two dimensional gas chromatography/time-of-flight mass spectrometry was used to monitor individual naphthenic acids in an end pit lake in Alberta, Canada. NA profiles from different depths and sampling locations were compared to evaluate the spatial variations at the site.
Natural attenuation software (NAS): Assessing remedial strategies and estimating timeframes
Mendez, E.; Widdowson, M.; Chapelle, F.; Casey, C.
2005-01-01
Natural Attenuation Software (NAS) is a screening tool to estimate remediation timeframes for monitored natural attenuation (MNA) and to assist in decision-making on the level of source zone treatment in conjunction with MNA using site-specific remediation objectives. The natural attenuation processes that NAS models include advection, dispersion, sorption, non-aqueous phase liquid (NAPL) dissolution, and biodegradation of either petroleum hydrocarbons or chlorinated ethylenes. Newly implemented enhancements to NAS, designed to maximize its utility for site managers, are presented. NAS has expanded source contaminant specification options to include chlorinated ethanes and chlorinated methanes, and to allow for the analysis of any other user-defined contaminants that may be subject to microbially-mediated transformations (heavy metals, radioisotopes, etc.). Included is the capability to model co-mingled plumes, with constituents from multiple contaminant categories. To enable comparison of remediation timeframe estimates between MNA and specific engineered remedial actions, NAS was modified to incorporate an estimation technique for timeframes associated with pump-and-treat remediation technology. This is an abstract of a paper presented at the 8th International In Situ and On-Site Bioremediation Symposium (Baltimore, MD, 6/6-9/2005).
An Implementation Plan for NFS at NASA's NAS Facility
NASA Technical Reports Server (NTRS)
Lam, Terance L.; Kutler, Paul (Technical Monitor)
1998-01-01
This document discusses how NASA's NAS can benefit from the Sun Microsystems' Network File System (NFS). A case study is presented to demonstrate the effects of NFS on the NAS supercomputing environment. Potential problems are addressed and an implementation strategy is proposed.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Elbert, Stephen T.; Kalsi, Karanjit; Vlachopoulou, Maria
Financial Transmission Rights (FTRs) help power market participants reduce price risks associated with transmission congestion. FTRs are issued based on a process of solving a constrained optimization problem with the objective to maximize the FTR social welfare under power flow security constraints. Security constraints for different FTR categories (monthly, seasonal, or annual) are usually coupled, and the number of constraints increases exponentially with the number of categories. Commercial software for FTR calculation can only provide limited categories of FTRs due to the inherent computational challenges mentioned above. In this paper, a novel non-linear dynamical system (NDS) approach is proposed to solve the optimization problem. The new formulation and performance of the NDS solver is benchmarked against widely used linear programming (LP) solvers like CPLEX™ and tested on large-scale systems using data from the Western Electricity Coordinating Council (WECC). The NDS is demonstrated to outperform the widely used CPLEX algorithms while exhibiting superior scalability. Furthermore, the NDS-based solver can be easily parallelized, which results in significant computational improvement.
Particle-in-Cell laser-plasma simulation on Xeon Phi coprocessors
NASA Astrophysics Data System (ADS)
Surmin, I. A.; Bastrakov, S. I.; Efimenko, E. S.; Gonoskov, A. A.; Korzhimanov, A. V.; Meyerov, I. B.
2016-05-01
This paper concerns the development of a high-performance implementation of the Particle-in-Cell method for plasma simulation on Intel Xeon Phi coprocessors. We discuss the suitability of the method for the Xeon Phi architecture and present our experience in the porting and optimization of the existing parallel Particle-in-Cell code PICADOR. Direct porting without code modification gives performance on Xeon Phi close to that of an 8-core CPU on a benchmark problem with 50 particles per cell. We demonstrate step-by-step optimization techniques, such as improving data locality, enhancing parallelization efficiency and vectorization, leading to an overall 4.2× speedup on CPU and 7.5× on Xeon Phi compared to the baseline version. The optimized version achieves 16.9 ns per particle update on an Intel Xeon E5-2660 CPU and 9.3 ns per particle update on an Intel Xeon Phi 5110P. For a real problem of laser ion acceleration in targets with surface grating, where a large number of macroparticles per cell is required, the speedup of Xeon Phi compared to CPU is 1.6×.
Taming parallel I/O complexity with auto-tuning
Behzad, Babak; Luu, Huong Vu Thanh; Huchette, Joseph; ...
2013-11-17
We present an auto-tuning system for optimizing I/O performance of HDF5 applications and demonstrate its value across platforms, applications, and at scale. The system uses a genetic algorithm to search a large space of tunable parameters and to identify effective settings at all layers of the parallel I/O stack. The parameter settings are applied transparently by the auto-tuning system via dynamically intercepted HDF5 calls. To validate our auto-tuning system, we applied it to three I/O benchmarks (VPIC, VORPAL, and GCRM) that replicate the I/O activity of their respective applications. We tested the system with different weak-scaling configurations (128, 2048, and 4096 CPU cores) that generate 30 GB to 1 TB of data, and executed these configurations on diverse HPC platforms (Cray XE6, IBM BG/P, and Dell Cluster). In all cases, the auto-tuning framework identified tunable parameters that substantially improved write performance over default system settings. In conclusion, we consistently demonstrate I/O write speedups between 2x and 100x for test configurations.
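The genetic-algorithm search over a discrete space of tunable I/O parameters can be sketched as below. The parameter names (stripe_count, stripe_size_mb, cb_nodes) and the synthetic cost function are illustrative stand-ins, not the paper's actual tunables; a real run would execute the benchmark with each configuration and time its writes.

```python
import random

# Hypothetical tunable parameters at different layers of the I/O stack.
PARAM_SPACE = {
    "stripe_count": [4, 8, 16, 32, 64],
    "stripe_size_mb": [1, 4, 16, 64],
    "cb_nodes": [1, 2, 4, 8],
}

def measure_write_time(cfg):
    # Stand-in for running the benchmark: a synthetic cost with a
    # single best configuration (32, 16, 4). Lower is better.
    return (abs(cfg["stripe_count"] - 32) + abs(cfg["stripe_size_mb"] - 16)
            + abs(cfg["cb_nodes"] - 4) + 1)

def random_config():
    return {k: random.choice(v) for k, v in PARAM_SPACE.items()}

def crossover(a, b):
    # Child inherits each gene from one parent at random.
    return {k: random.choice([a[k], b[k]]) for k in PARAM_SPACE}

def mutate(cfg, rate=0.2):
    # Occasionally resample a gene to keep exploring the space.
    return {k: (random.choice(v) if random.random() < rate else cfg[k])
            for k, v in PARAM_SPACE.items()}

def tune(pop_size=12, generations=20):
    pop = [random_config() for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=measure_write_time)      # rank by measured time
        parents = pop[: pop_size // 2]        # truncation selection
        children = [mutate(crossover(random.choice(parents),
                                     random.choice(parents)))
                    for _ in range(pop_size - len(parents))]
        pop = parents + children
    return min(pop, key=measure_write_time)
```

The same loop shape applies whatever the real parameter layers are; only the evaluation step (here a toy formula) is expensive in practice.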
Using video-oriented instructions to speed up sequence comparison.
Wozniak, A
1997-04-01
This document presents an implementation of the well-known Smith-Waterman algorithm for comparison of protein and nucleic acid sequences, using specialized video instructions. These instructions, SIMD-like in their design, make possible parallelization of the algorithm at the instruction level. Benchmarks on an UltraSPARC running at 167 MHz show a speed-up factor of two compared to the same algorithm implemented with integer instructions on the same machine. Performance reaches over 18 million matrix cells per second on a single processor, giving, to our knowledge, the fastest implementation of the Smith-Waterman algorithm on a workstation. The accelerated procedure was introduced in LASSAP--a LArge Scale Sequence compArison Package developed at INRIA--which handles parallelism at a higher level. On a SUN Enterprise 6000 server with 12 processors, a speed of nearly 200 million matrix cells per second has been obtained. A sequence of length 300 amino acids is scanned against SWISSPROT R33 (18,531,385 residues) in 29 s. This procedure is not restricted to databank scanning; it applies to all cases handled by LASSAP (intra- and inter-bank comparisons, Z-score computation, etc.).
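For reference, the dynamic-programming recurrence that the video instructions parallelize is the standard Smith-Waterman local-alignment score; a minimal scalar sketch follows (the scoring parameters are illustrative defaults, not those used in LASSAP).

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-1):
    """Plain integer Smith-Waterman local alignment score.
    The SIMD approach in the text evaluates many cells of this matrix
    per instruction; here each cell is computed one at a time."""
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1]
                                      else mismatch)
            # Local alignment: scores are clamped at zero.
            H[i][j] = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            best = max(best, H[i][j])
    return best
```

Each cell depends only on its upper, left, and upper-left neighbors, which is why anti-diagonals of the matrix can be computed in parallel with SIMD-style instructions.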
Kockmann, Tobias; Trachsel, Christian; Panse, Christian; Wahlander, Asa; Selevsek, Nathalie; Grossmann, Jonas; Wolski, Witold E; Schlapbach, Ralph
2016-08-01
Quantitative mass spectrometry is a rapidly evolving methodology applied in a large number of omics-type research projects. During the past years, new designs of mass spectrometers have been developed and launched as commercial systems while in parallel new data acquisition schemes and data analysis paradigms have been introduced. Core facilities provide access to such technologies, but also actively support the researchers in finding and applying the best-suited analytical approach. In order to implement a solid fundament for this decision making process, core facilities need to constantly compare and benchmark the various approaches. In this article we compare the quantitative accuracy and precision of current state of the art targeted proteomics approaches single reaction monitoring (SRM), parallel reaction monitoring (PRM) and data independent acquisition (DIA) across multiple liquid chromatography mass spectrometry (LC-MS) platforms, using a readily available commercial standard sample. All workflows are able to reproducibly generate accurate quantitative data. However, SRM and PRM workflows show higher accuracy and precision compared to DIA approaches, especially when analyzing low concentrated analytes. © 2016 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Communication Studies of DMP and SMP Machines
NASA Technical Reports Server (NTRS)
Sohn, Andrew; Biswas, Rupak; Chancellor, Marisa K. (Technical Monitor)
1997-01-01
Understanding the interplay between machines and problems is key to obtaining high performance on parallel machines. This paper investigates the interplay between programming paradigms and communication capabilities of parallel machines. In particular, we explicate the communication capabilities of the IBM SP-2 distributed-memory multiprocessor and the SGI PowerCHALLENGEarray symmetric multiprocessor. Two benchmark problems, bitonic sorting and the Fast Fourier Transform, are selected for experiments. Communication-efficient algorithms are developed to exploit the overlapping capabilities of the machines. Programs are written in Message-Passing Interface for portability, and identical codes are used for both machines. Various data sizes and message sizes are used to test the machines' communication capabilities. Experimental results indicate that the communication performance of the multiprocessors is consistent with the size of messages. The SP-2 is sensitive to message size but yields much higher communication overlapping because of its communication co-processor. The PowerCHALLENGEarray is not highly sensitive to message size and yields low communication overlapping. Bitonic sorting yields lower performance compared to FFT due to a smaller computation-to-communication ratio.
NASA Astrophysics Data System (ADS)
Fellman, Ronald D.; Kaneshiro, Ronald T.; Konstantinides, Konstantinos
1990-03-01
The authors present the design and evaluation of an architecture for a monolithic, programmable, floating-point digital signal processor (DSP) for instrumentation applications. An investigation of the most commonly used algorithms in instrumentation led to a design that satisfies the requirements for high computational and I/O (input/output) throughput. In the arithmetic unit, a 16 × 16-bit multiplier and a 32-bit accumulator provide the capability for single-cycle multiply/accumulate operations, and three format adjusters automatically adjust the data format for increased accuracy and dynamic range. An on-chip I/O unit is capable of handling data block transfers through a direct memory access port and real-time data streams through a pair of parallel I/O ports. I/O operations and program execution are performed in parallel. In addition, the processor includes two data memories with independent addressing units, a microsequencer with instruction RAM, and multiplexers for internal data redirection. The authors also present the structure and implementation of a design environment suitable for the algorithmic, behavioral, and timing simulation of a complete DSP system. Various benchmarking results are reported.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Corbus, D
1992-09-01
Recycling and disposal of spent sodium-sulfur (Na/S) batteries are important issues that must be addressed as part of the commercialization process of Na/S battery-powered electric vehicles. The use of Na/S batteries in electric vehicles will result in significant environmental benefits, and the disposal of spent batteries should not detract from those benefits. In the United States, waste disposal is regulated under the Resource Conservation and Recovery Act (RCRA). Understanding these regulations will help in selecting recycling and disposal processes for Na/S batteries that are environmentally acceptable and cost effective. Treatment processes for spent Na/S battery wastes are in the beginning stages of development, so a final evaluation of the impact of RCRA regulations on these treatment processes is not possible. The objectives of this report on battery recycling and disposal are as follows: Provide an overview of RCRA regulations and requirements as they apply to Na/S battery recycling and disposal so that battery developers can understand what is required of them to comply with these regulations; Analyze existing RCRA regulations for recycling and disposal and anticipated trends in these regulations, and perform a preliminary regulatory analysis for potential battery disposal and recycling processes. This report assumes that long-term Na/S battery disposal processes will be capable of handling large quantities of spent batteries. The term disposal includes treatment processes that may incorporate recycling of battery constituents. The environmental regulations analyzed in this report are limited to US regulations. This report gives an overview of RCRA and discusses RCRA regulations governing Na/S battery disposal and a preliminary regulatory analysis for Na/S battery disposal.
Supercomputer simulations of structure formation in the Universe
NASA Astrophysics Data System (ADS)
Ishiyama, Tomoaki
2017-06-01
We describe the implementation and performance results of our massively parallel MPI/OpenMP hybrid TreePM code for large-scale cosmological N-body simulations. For domain decomposition, a recursive multi-section algorithm is used, and the sizes of the domains are automatically set so that the total calculation time is the same for all processes. We developed a highly tuned gravity kernel for short-range forces and a novel communication algorithm for long-range forces. For a two-trillion-particle benchmark simulation, the average performance on the full system of the K computer (82,944 nodes, 663,552 cores in total) is 5.8 Pflops, which corresponds to 55% of the peak speed.
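The recursive multi-section idea, cutting the particle set along cycling axes so that each section carries an equal share of the work, can be sketched as follows. Per-item weights standing in for measured calculation time are an assumption of this sketch; the production code instead adjusts domain boundaries dynamically from the timing of previous steps.

```python
def multisection(items, splits, axis=0, ndim=3):
    """Recursively partition items (each a (coords, weight) pair) into
    domains of roughly equal total weight. `splits` gives the number of
    sections cut at each level, e.g. [4, 4, 2] for a 4 x 4 x 2 process
    grid; the cutting axis cycles with recursion depth."""
    if not splits:
        return [items]
    n_sec = splits[0]
    # Sort along the current axis, then close a section each time its
    # share of the total weight is reached.
    items = sorted(items, key=lambda it: it[0][axis])
    total = sum(w for _, w in items)
    domains, current, acc, k = [], [], 0.0, 1
    for it in items:
        current.append(it)
        acc += it[1]
        if acc >= total * k / n_sec and k < n_sec:
            domains.append(current)
            current, k = [], k + 1
    domains.append(current)
    # Recurse into each section along the next axis.
    out = []
    for d in domains:
        out.extend(multisection(d, splits[1:], (axis + 1) % ndim, ndim))
    return out
```

With uniform weights this reduces to equal particle counts per domain; with nonuniform weights the cut positions shift so slower regions get fewer particles.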
Low-frequency quadrupole impedance of undulators and wigglers
Blednykh, A.; Bassi, G.; Hidaka, Y.; ...
2016-10-25
An analytical expression of the low-frequency quadrupole impedance for undulators and wigglers is derived and benchmarked against beam-based impedance measurements done at the 3 GeV NSLS-II storage ring. The adopted theoretical model, valid for an arbitrary number of electromagnetic layers with parallel geometry, allows calculation of the quadrupole impedance for arbitrary values of the magnetic permeability μr. Here, in the comparison of the analytical results with the measurements for variable magnet gaps, two limiting cases of the permeability have been studied: the case of perfect magnets (μr → ∞), and the case in which the magnets are fully saturated (μr = 1).
Fairness of QoS supporting in optical burst switching
NASA Astrophysics Data System (ADS)
Xuan, Xuelei; Liu, Hua; Chen, Chunfeng; Zhang, Zhizhong
2004-04-01
In this paper we investigate the fairness problem of the offset-time-based quality of service (QoS) scheme proposed by Qiao and Dixit for optical burst switching (OBS) networks. In that scheme, QoS relies on higher-priority classes making their reservation requests further into the future; in practice, however, the benchmark offset-time of data bursts at the intermediate nodes is not equal for all bursts. Here, a new offset-time-based QoS scheme is introduced, in which data bursts are classified according to their offset-time and isolated in the wavelength domain or time domain to achieve parallel reservation. Through simulation, it is found that this scheme achieves fairness among data bursts with different priorities.
A New Code SORD for Simulation of Polarized Light Scattering in the Earth Atmosphere
NASA Technical Reports Server (NTRS)
Korkin, Sergey; Lyapustin, Alexei; Sinyuk, Aliaksandr; Holben, Brent
2016-01-01
We report a new publicly available radiative transfer (RT) code for numerical simulation of polarized light scattering in plane-parallel atmosphere of the Earth. Using 44 benchmark tests, we prove high accuracy of the new RT code, SORD (Successive ORDers of scattering). We describe capabilities of SORD and show run time for each test on two different machines. At present, SORD is supposed to work as part of the Aerosol Robotic NETwork (AERONET) inversion algorithm. For natural integration with the AERONET software, SORD is coded in Fortran 90/95. The code is available by email request from the corresponding (first) author or from ftp://climate1.gsfc.nasa.gov/skorkin/SORD/.
High-order finite difference formulations for the incompressible Navier-Stokes equations on the CM-5
DOE Office of Scientific and Technical Information (OSTI.GOV)
Tafti, D.
1995-12-01
The paper describes the features and implementation of a general purpose high-order accurate finite difference computer program for direct and large-eddy simulations of turbulence on the CM-5 in the data parallel mode. Benchmarking studies for a direct simulation of turbulent channel flow are discussed. Performance of up to 8.8 GFLOPS is obtained for the high-order formulations on 512 processing nodes of the CM-5. The execution time for a simulation with 24 million nodes in a domain with two periodic directions is in the range of 0.2 μs/time-step/degree of freedom on 512 processing nodes of the CM-5.
Turbulent shear layers in confining channels
NASA Astrophysics Data System (ADS)
Benham, Graham P.; Castrejon-Pita, Alfonso A.; Hewitt, Ian J.; Please, Colin P.; Style, Rob W.; Bird, Paul A. D.
2018-06-01
We present a simple model for the development of shear layers between parallel flows in confining channels. Such flows are important across a wide range of topics from diffusers, nozzles and ducts to urban air flow and geophysical fluid dynamics. The model approximates the flow in the shear layer as a linear profile separating uniform-velocity streams. Both the channel geometry and wall drag affect the development of the flow. The model shows good agreement with both particle image velocimetry experiments and computational turbulence modelling. The simplicity and low computational cost of the model allows it to be used for benchmark predictions and design purposes, which we demonstrate by investigating optimal pressure recovery in diffusers with non-uniform inflow.
Abdellah, Marwan; Eldeib, Ayman; Owis, Mohamed I
2015-01-01
This paper features an advanced implementation of the X-ray rendering algorithm that harnesses the giant computing power of current commodity graphics processors to accelerate the generation of high resolution digitally reconstructed radiographs (DRRs). The presented pipeline exploits the latest features of NVIDIA Graphics Processing Unit (GPU) architectures, mainly bindless texture objects and dynamic parallelism. The rendering throughput is substantially improved by exploiting the interoperability mechanisms between CUDA and OpenGL. The benchmarks of our optimized rendering pipeline reflect its capability of generating DRRs with resolutions of 2048² and 4096² at interactive and semi-interactive frame rates using an NVIDIA GeForce GTX 970 device.
Beyond core count: a look at new mainstream computing platforms for HEP workloads
NASA Astrophysics Data System (ADS)
Szostek, P.; Nowak, A.; Bitzes, G.; Valsan, L.; Jarp, S.; Dotti, A.
2014-06-01
As Moore's Law continues to deliver more and more transistors, the mainstream processor industry is preparing to expand its investments in areas other than simple core count. These new interests include deep integration of on-chip components, advanced vector units, memory, cache and interconnect technologies. We examine these moving trends with parallelized and vectorized High Energy Physics workloads in mind. In particular, we report on practical experience resulting from experiments with scalable HEP benchmarks on the Intel "Ivy Bridge-EP" and "Haswell" processor families. In addition, we examine the benefits of the new "Haswell" microarchitecture and its impact on multiple facets of HEP software. Finally, we report on the power efficiency of new systems.
DOT National Transportation Integrated Search
2014-12-01
The intent of this report is to provide (1) an initial assessment of National Airspace System (NAS) infrastructure affected by continuing development and deployment of unmanned aircraft systems into the NAS, and (2) a description of process challenge...
Co-observation of sporadic K layer and sporadic Na layer at Beijing, China (40.6°N, 116.2°E)
NASA Astrophysics Data System (ADS)
Jiao, Jing
2016-07-01
A double-laser beam lidar was successfully developed to simultaneously measure K and Na layers at Beijing (40.6°N, 116.2°E) in 2010. Statistical analysis of the parameters of sporadic K (Ks) and sporadic Na (Nas) layers was performed over two years of lidar data, and different characteristics were found. The average Ks occurrence (2.9%) was lower than that of Nas (5.9%); the Nas occurrence had a maximum (19.3%) in May-June and a minimum (1.6%) in January-February, while the Ks occurrence had a maximum (4.9%) in January-February and a minimum (1.0%) in September-October; most Ks peaks tended to occur around 93 km, which was ~2 km lower than that of Nas (~95 km); the Ks peak density was often at least one order of magnitude lower than that of Nas; notably, two Ks with high peak densities (> 1000 cm⁻³) were observed, much higher than the K densities (15-300 cm⁻³) reported before. The ascending time of Ks was often longer than its descending time, but an opposite trend occurred for Nas. During the 152 cases of joint observation of the K and Na layers, 21% (32/152) were cases in which Ks and Nas events occurred simultaneously, while 79% (120/152) were cases in which only one layer (K or Na) exhibited a strong Ks or Nas.
Better than $1/Mflops sustained: a scalable PC-based parallel computer for lattice QCD
NASA Astrophysics Data System (ADS)
Fodor, Zoltán; Katz, Sándor D.; Papp, Gábor
2003-05-01
We study the feasibility of a PC-based parallel computer for medium to large scale lattice QCD simulations. The Eötvös Univ., Inst. Theor. Phys. cluster consists of 137 Intel P4-1.7GHz nodes with 512 MB RDRAM. The 32-bit, single precision sustained performance for dynamical QCD without communication is 1510 Mflops/node with Wilson and 970 Mflops/node with staggered fermions. This gives a total performance of 208 Gflops for Wilson and 133 Gflops for staggered QCD, respectively (for 64-bit applications the performance is approximately halved). The novel feature of our system is its communication architecture. In order to have a scalable, cost-effective machine we use Gigabit Ethernet cards for nearest-neighbor communications in a two-dimensional mesh. This type of communication is cost effective (only 30% of the hardware costs is spent on the communication). According to our benchmark measurements this type of communication results in around 40% communication time fraction for lattices up to 48³·96 in full QCD simulations. The price/sustained-performance ratio for full QCD is better than $1/Mflops for Wilson (and around $1.5/Mflops for staggered) quarks for practically any lattice size that can fit in our parallel computer. The communication software is freely available upon request for non-profit organizations.
Large-scale virtual screening on public cloud resources with Apache Spark.
Capuccini, Marco; Ahmed, Laeeq; Schaal, Wesley; Laure, Erwin; Spjuth, Ola
2017-01-01
Structure-based virtual screening is an in-silico method to screen a target receptor against a virtual molecular library. Applying docking-based screening to large molecular libraries can be computationally expensive; however, it constitutes a trivially parallelizable task. Most of the available parallel implementations are based on the message passing interface, relying on low failure rate hardware and fast network connections. Google's MapReduce revolutionized large-scale analysis, enabling the processing of massive datasets on commodity hardware and cloud resources, providing transparent scalability and fault tolerance at the software level. Open source implementations of MapReduce include Apache Hadoop and the more recent Apache Spark. We developed a method to run existing docking-based screening software on distributed cloud resources, utilizing the MapReduce approach. We benchmarked our method, which is implemented in Apache Spark, docking a publicly available target receptor against approximately 2.2 M compounds. The performance experiments show a good parallel efficiency (87%) when running in a public cloud environment. Our method enables parallel structure-based virtual screening on public cloud resources or commodity computer clusters. The degree of scalability that we achieve allows for trying out our method on relatively small libraries first and then scaling to larger libraries. Our implementation is named Spark-VS and it is freely available as open source from GitHub (https://github.com/mcapuccini/spark-vs).
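The map/reduce shape of such a screening pipeline can be sketched in plain Python; here a thread pool and a toy scoring function stand in for Spark and the external docking binary (threads are a reasonable proxy since real docking work happens in external processes). The function names and the scoring rule are illustrative assumptions, not Spark-VS code.

```python
from concurrent.futures import ThreadPoolExecutor

def dock(ligand):
    """Stand-in for invoking a docking program on one ligand; a real
    pipeline would shell out to a docking binary and parse its score.
    Toy rule: longer SMILES strings 'dock' better (more negative)."""
    return (ligand, -len(ligand))

def screen(library, top_n=3, workers=4):
    """Map the docking function over the library in parallel, then
    reduce to the best-scoring ligands -- the same map/reduce shape
    the MapReduce-based implementation uses on cloud resources."""
    with ThreadPoolExecutor(max_workers=workers) as ex:
        scored = list(ex.map(dock, library))  # map: dock each ligand
    scored.sort(key=lambda t: t[1])           # reduce: rank by score
    return scored[:top_n]
```

Because each docking call is independent, the map phase scales out trivially, and the framework (Spark in the paper, a thread pool here) supplies scheduling and fault tolerance.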
NASA Technical Reports Server (NTRS)
Sakahara, Robert; Hackenberg, Davis; Johnson, William
2017-01-01
This presentation was given to the Integrated Aviation Systems Program at the FY17 Annual Review of the UAS-NAS project. It captures an overview of the work completed by the UAS-NAS project and its subprojects.
ERIC Educational Resources Information Center
Balch, Stephen H.
2007-01-01
The National Association of Scholars (NAS) is now entering its twentieth year. The organization has grown more than ten-fold since its national launch. When NAS made its debut, today's extensive infrastructure for higher education reform did not yet exist. In this article, the author discusses the contributions of NAS to academe. The author also…
NASA Technical Reports Server (NTRS)
Wolfe, Jean; Bauer, Jeff; Bixby, C.J.; Lauderdale, Todd; Shively, Jay; Griner, James; Hayhurst, Kelly
2010-01-01
Topics discussed include: Aeronautics Research Mission Directorate Integrated Systems Research Program (ISRP) and UAS Integration in the NAS Project; UAS Integration into the NAS Project; Separation Assurance and Collision Avoidance; Pilot Aircraft Interface Objectives/Rationale; Communication; Certification; and Integrated Tests and Evaluations.
Low temperature grown GaNAsSb: A promising material for photoconductive switch application
DOE Office of Scientific and Technical Information (OSTI.GOV)
Tan, K. H.; Yoon, S. F.; Wicaksono, S.
2013-09-09
We report a photoconductive switch using low-temperature-grown GaNAsSb as the active material. The GaNAsSb layer was grown at 200 °C by molecular beam epitaxy in conjunction with a radio-frequency plasma-assisted nitrogen source and a valved antimony cracker source. The low-temperature growth of the GaNAsSb layer increased the dark resistivity of the switch and shortened the carrier lifetime. The switch exhibited a dark resistivity of 10^7 Ω cm, photo-absorption up to 2.1 μm, and a carrier lifetime of ∼1.3 ps. These results strongly support the suitability of low-temperature-grown GaNAsSb for the photoconductive switch application.
Effect of antimony on the deep-level traps in GaInNAsSb thin films
DOE Office of Scientific and Technical Information (OSTI.GOV)
Islam, Muhammad Monirul, E-mail: islam.monir.ke@u.tsukuba.ac.jp; Miyashita, Naoya; Ahsan, Nazmul
2014-09-15
Admittance spectroscopy has been performed to investigate the effect of antimony (Sb) on GaInNAs material in relation to the deep-level defects in this material. Two electron traps, E1 and E2, at energy levels of 0.12 and 0.41 eV below the conduction band (E_C), respectively, were found in undoped GaInNAs. Bias-voltage-dependent admittance confirmed that E1 is an interface-type defect spatially localized at the GaInNAs/GaAs interface, while E2 is a bulk-type defect located around the mid-gap of the GaInNAs layer. Introduction of Sb improved the material quality, as evident from the reduction of both the interface- and bulk-type defects.
Tak, SangWoo; Alterman, Toni; Baron, Sherry; Calvert, Geoffrey M
2010-10-01
We aimed to estimate the proportion of nursing assistants (NAs) in the US with work-related injuries and insufficient socio-economic resources by race/ethnicity. Data from the 2004 National Nursing Assistant Survey (NNAS), a nationally representative sample survey of NAs employed in United States nursing homes, were analyzed accounting for the complex survey design. Among 2,880 participants, 44% reported "scratch, open wounds, or cuts" followed by "back injuries" (17%), "black eyes or other types of bruising" (16%), and "human bites" (12%). When compared to non-Hispanic white NAs, the adjusted rate ratio (RR) for wound/cut was 0.74 for non-Hispanic black NAs (95% confidence interval [CI]: 0.65-0.85). RRs for black eyes/bruises were 0.18 for non-Hispanic black NAs (95% CI: 0.12-0.26), and 0.55 for Hispanic NAs (95% CI: 0.37-0.82). Minority racial and ethnic groups were less likely to report having experienced injuries compared with non-Hispanic white NAs. Future research should focus on identifying preventable risk factors, such as differences by race and ethnicity in the nature of NA jobs and the extent of their engagement in assisting patients with activities of daily living. © 2010 Wiley-Liss, Inc.
Yan, Yuan; Shan, Hangyong; Li, Min; Chen, Shu; Liu, Jianyu; Cheng, Yanfang; Ye, Cui; Yang, Zhilin; Lai, Xuandi; Hu, Jianqiang
2015-01-01
In this work, a hierarchical DNA-directed self-assembly strategy to construct structure-controlled Au nanoassemblies (NAs) has been demonstrated by conjugating Au nanoparticles (NPs) with internally modified dithiol single-strand DNA (ssDNA) (Au-B-A or A-B-Au-B-A). It is found that the dithiol-ssDNA-modified Au NPs and the number of thiol-modified ssDNA molecules grafted to each Au NP play critical roles in the assembly of geometrically controlled Au NAs. By matching Au-DNA self-assembly units, the geometrical structures of the Au NAs can be tailored from one-dimensional (1D) to quasi-2D and 2D. Au-B-A conjugates readily give 1D and quasi-2D Au NAs, while 2D Au NAs can be formed by A-B-Au-B-A building blocks. Surface-enhanced Raman scattering (SERS) measurements and 3D finite-difference time-domain (3D-FDTD) calculations indicate that the geometrically controllable Au NAs have regular SERS properties that depend linearly on the number of "hot spots". For a given number of NPs, the number of "hot spots", and accordingly the enhancement factor of the Au NAs, can be quantitatively evaluated, which opens a new avenue for quantitative analysis based on the SERS technique. PMID:26581251
Chen, Zhe; Ma, Xiao; Zhao, Yanling; Wang, Jiabo; Zhang, Yaming; Zhu, Yun; Wang, Lifu; Chen, Chang; Wei, Shizhang; Yang, Zhirui; Gong, Man; Shen, Honghui; Bai, Zhaofang; Guo, Yuming; Niu, Ming; Xiao, Xiaohe
2015-01-01
Objective. To evaluate the efficacy and safety of Kushenin (KS) combined with nucleoside analogues (NAs) for chronic hepatitis B (CHB). Methods. Randomized controlled trials (RCTs) of KS combined with NAs for CHB were identified through 7 databases. Frequencies of loss of serum HBeAg, HBeAg seroconversion, undetectable serum HBV-DNA, ALT normalization, and adverse events at 48 weeks were abstracted by two reviewers. The Cochrane tool was used to assess the risk of bias in the included trials. Data were analyzed with Review Manager 5.3 software. Results. 18 RCTs involving 1684 subjects with CHB were included in the analysis. KS combined with NAs, including lamivudine (LAM), entecavir (ETV), adefovir dipivoxil (ADV), and telbivudine (TLV), showed different degrees of improvement in CHB indices. KS combined with NAs increased the frequency of loss of serum HBeAg, HBeAg seroconversion, undetectable HBV-DNA levels, and ALT normalization compared with single agents. It also decreased serum ALT and AST levels after one year of treatment. However, KS combined with TLV did not show a significant difference in CHB indices. The side effects of KS combined with NAs were mild and infrequent. Conclusion. KS combined with NAs improves the efficacy of NAs in CHB. PMID:26347789
Wang, Nan; Chelme-Ayala, Pamela; Perez-Estrada, Leonidas; Garcia-Garcia, Erick; Pun, Jonathan; Martin, Jonathan W; Belosevic, Miodrag; Gamal El-Din, Mohamed
2013-06-18
Oil sands process-affected water (OSPW) is the water contained in tailings impoundment structures in oil sands operations. There are concerns about the environmental impacts of the release of OSPW because of its toxicity. In this study, ozonation followed by biodegradation was used to remediate OSPW. The impacts of the ozone process evolution on naphthenic acids (NAs) speciation and acute toxicity were evaluated. Ion-mobility spectrometry (IMS) was used to preliminarily separate isomeric and homologous species. The results showed limited effects of the ozone reactor size on the treatment performance in terms of contaminant removal. In terms of NAs speciation, higher reactivity of NAs with larger numbers of carbons and rings was observed only in the high-reactivity region (i.e., utilized ozone doses lower than 50 mg/L). It was also found that nearly 0.5 mg/L of total NAs was oxidized per mg/L of utilized ozone dose, at utilized ozone doses lower than 50 mg/L. IMS showed that ozonation was able to degrade NAs, oxidized NAs, and sulfur/nitrogen-containing NAs. Complete removal of toxicity toward Vibrio fischeri was achieved after ozonation followed by a 28-day biodegradation period. In vitro and in vivo assays indicated that ozonation reduced the OSPW toxicity to mice.
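The linear dose-removal relation reported above can be captured in a small helper. The 0.5 yield ratio and the 50 mg/L cutoff are the values from the abstract; the function deliberately refuses to extrapolate beyond the region in which the relation was observed.

```python
# Linear relation reported above: roughly 0.5 mg/L of total NAs oxidized per
# mg/L of utilized ozone, valid only below a utilized dose of 50 mg/L.
def nas_oxidized(utilized_ozone_mg_l: float, yield_ratio: float = 0.5) -> float:
    """Estimated total NAs oxidized (mg/L) for a given utilized ozone dose."""
    if not 0.0 <= utilized_ozone_mg_l <= 50.0:
        raise ValueError("linear relation reported only for 0-50 mg/L utilized ozone")
    return yield_ratio * utilized_ozone_mg_l
```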
Ozonation of oil sands process-affected water accelerates microbial bioremediation.
Martin, Jonathan W; Barri, Thaer; Han, Xiumei; Fedorak, Phillip M; El-Din, Mohamed Gamal; Perez, Leonidas; Scott, Angela C; Jiang, Jason Tiange
2010-11-01
Ozonation can degrade toxic naphthenic acids (NAs) in oil sands process-affected water (OSPW), but even after extensive treatment a residual NA fraction remains. Here we hypothesized that mild ozonation would selectively oxidize the most biopersistent NA fraction, thereby accelerating subsequent NA biodegradation and toxicity removal by indigenous microbes. OSPW was ozonated to achieve approximately 50% and 75% NA degradation, and the major ozonation byproducts included oxidized NAs (i.e., hydroxy- or keto-NAs). However, oxidized NAs are already present in untreated OSPW and were shown to be formed during the microbial biodegradation of NAs. Ozonation alone did not affect OSPW toxicity, based on Microtox; however, there was a significant acceleration of toxicity removal in ozonated OSPW following inoculation with native microbes. Furthermore, all residual NAs biodegraded significantly faster in ozonated OSPW. The opposite trend was found for ozonated commercial NAs, which are known to contain no significant biopersistent fraction. Thus, we suggest that ozonation preferentially degraded the most biopersistent OSPW NA fraction, and that ozonation is complementary to the biodegradation capacity of microbial populations in OSPW. The toxicity of ozonated OSPW to higher organisms needs to be assessed, but there is promise that this technique could be applied to accelerate the bioremediation of large volumes of OSPW in Northern Alberta, Canada.
NASA Astrophysics Data System (ADS)
Tinti, S.; Tonini, R.
2013-07-01
Nowadays numerical models are a powerful tool in tsunami research since they can be used (i) to reconstruct modern and historical events, (ii) to cast new light on tsunami sources by inverting tsunami data and observations, (iii) to build scenarios in the frame of tsunami mitigation plans, and (iv) to produce forecasts of tsunami impact and inundation in early-warning systems. In parallel with the general recognition of the importance of numerical tsunami simulations, the demand has grown for reliable tsunami codes, validated through tests agreed upon by the tsunami community. This paper presents the tsunami code UBO-TSUFD, developed at the University of Bologna, Italy, which solves the non-linear shallow water (NSW) equations in a Cartesian frame, with inclusion of bottom friction and exclusion of the Coriolis force, by means of a leapfrog (LF) finite-difference scheme on a staggered grid, and which accounts for moving boundaries to compute sea inundation and withdrawal at the coast. Results of UBO-TSUFD applied to four classical benchmark problems are shown: two benchmarks are based on analytical solutions, one on a plane wave propagating in a flat channel with a constant-slope beach, and one on a laboratory experiment. The code performs very satisfactorily, reproducing the theoretical and experimental benchmark data quite well. Further, the code is applied to a realistic tsunami case: a scenario of a tsunami threatening the coasts of eastern Sicily, Italy, is defined and discussed based on the historical tsunami of 11 January 1693, i.e. one of the most severe events in Italian history.
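The staggered-grid leapfrog idea behind codes of this class can be sketched in one dimension. The sketch below is a linearized toy with constant depth, closed reflective boundaries, and illustrative parameter values; it is not UBO-TSUFD itself, which solves the nonlinear 2-D equations with bottom friction and moving shoreline boundaries.

```python
import numpy as np

# 1-D linearized shallow-water toy: eta (surface elevation) at cell centers,
# u (velocity) at cell faces, alternating leapfrog-style updates.
g = 9.81     # gravity, m/s^2
h = 100.0    # constant water depth, m (assumed for the sketch)
dx = 1000.0  # grid spacing, m
dt = 5.0     # time step, s; well under the CFL limit dx / sqrt(g*h) ~ 32 s

def step(eta, u):
    """Advance one time step; u[i] sits between eta[i-1] and eta[i]."""
    eta = eta - dt / dx * h * (u[1:] - u[:-1])        # continuity equation
    u = u.copy()
    u[1:-1] -= g * dt / dx * (eta[1:] - eta[:-1])     # momentum, using updated eta
    return eta, u

n = 200
x = np.arange(n)
eta = np.exp(-((x - n // 2) ** 2) / 50.0)  # Gaussian initial sea-surface hump
u = np.zeros(n + 1)                        # closed ends: u stays 0 at both walls
mass0 = eta.sum()
for _ in range(500):
    eta, u = step(eta, u)
# With closed boundaries the discrete mass sum(eta) is conserved.
```

Using the already-updated eta in the momentum step is what makes the scheme temporally staggered (leapfrog-like) and neutrally stable under the CFL condition.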
Demeter, Marc A.; Lemire, Joseph A.; Yue, Gordon; Ceri, Howard; Turner, Raymond J.
2015-01-01
Oil sands surface mining for bitumen results in the formation of oil sands process water (OSPW), containing acutely toxic naphthenic acids (NAs). Potential exists for OSPW toxicity to be mitigated by aerobic degradation of the NAs by microorganisms indigenous to the oil sands tailings ponds, the success of which depends on the methods used to exploit the metabolism of the environmental microbial community. Having hypothesized that the xenobiotic-tolerant biofilm mode of life may represent a feasible way to harness environmental microbes for ex situ treatment of OSPW NAs, we aerobically grew OSPW microbes as single- and mixed-species biofilm and planktonic cultures under various conditions for the purpose of assaying their ability to tolerate and degrade NAs. The NAs evaluated were a diverse mixture of eight commercially available model compounds. Confocal microscopy confirmed the ability of mixed- and single-species OSPW cultures to grow as biofilms in the presence of the NAs evaluated. qPCR enumeration demonstrated that supplemental nutrients at 1 g L-1 yielded a population approximately one order of magnitude more numerous than supplementation at 0.001 g L-1. GC-FID analysis revealed that mixed-species cultures (regardless of the mode of growth) were the most effective at degrading the NAs tested. All constituent NAs evaluated were degraded below detectable limits with the exception of 1-adamantane carboxylic acid (ACA); subsequent experimentation with ACA as the sole NA also failed to exhibit degradation of this compound. Single-species cultures degraded only a select few NA compounds. The degradation trends highlighted many structure-persistence relationships among the eight NAs tested, demonstrating the effect of side-chain configuration and alkyl branching on compound recalcitrance. Of all the isolates, the Rhodococcus spp. degraded the greatest number of NA compounds, although still fewer than the mixed-species cultures.
Overall, these observations lend support to the notion that harnessing a community of microorganisms as opposed to targeted isolates can enhance NA degradation ex situ. Moreover, the variable success caused by NA structure related persistence emphasized the difficulties associated with employing bioremediation to treat complex, undefined mixtures of toxicants such as OSPW NAs. PMID:26388865
Maier, Andrew; Vincent, Melissa J; Parker, Ann; Gadagbui, Bernard K; Jayjock, Michael
2015-12-01
Asthma is a complex syndrome with significant consequences for those affected. The number of individuals affected is growing, although the reasons for the increase are uncertain. Ensuring the effective management of potential exposures follows from substantial evidence that exposure to some chemicals can increase the likelihood of asthma responses. We have developed a safety assessment approach tailored to the screening of asthma risks from residential consumer product ingredients as a proactive risk management tool. Several key features of the proposed approach advance the assessment resources often used for asthma issues. First, a quantitative health benchmark for asthma or related endpoints (irritation and sensitization) is provided that extends qualitative hazard classification methods. Second, a parallel structure is employed to include dose-response methods for asthma endpoints and methods for scenario specific exposure estimation. The two parallel tracks are integrated in a risk characterization step. Third, a tiered assessment structure is provided to accommodate different amounts of data for both the dose-response assessment (i.e., use of existing benchmarks, hazard banding, or the threshold of toxicological concern) and exposure estimation (i.e., use of empirical data, model estimates, or exposure categories). Tools building from traditional methods and resources have been adapted to address specific issues pertinent to asthma toxicology (e.g., mode-of-action and dose-response features) and the nature of residential consumer product use scenarios (e.g., product use patterns and exposure durations). A case study for acetic acid as used in various sentinel products and residential cleaning scenarios was developed to test the safety assessment methodology. 
In particular, the results were used to refine and verify relationships among tiered approaches such that each lower data tier in the approach provides a similar or greater margin of safety for a given scenario. Copyright © 2015 The Authors. Published by Elsevier Inc. All rights reserved.
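The risk-characterization step described above reduces, at its simplest, to comparing a scenario's exposure estimate against the health benchmark. A minimal sketch follows, with all names and numeric values hypothetical rather than taken from the acetic acid case study:

```python
# Sketch of the risk-characterization step: a screening tier passes when the
# margin of safety (benchmark / exposure estimate) meets a required minimum.
def margin_of_safety(benchmark: float, exposure: float) -> float:
    """Health benchmark divided by the scenario exposure estimate."""
    return benchmark / exposure

def tier_passes(benchmark: float, exposure: float, min_mos: float = 1.0) -> bool:
    """True if the scenario passes this screening tier."""
    return margin_of_safety(benchmark, exposure) >= min_mos

# Hypothetical example: a 2.5 mg/m^3 benchmark against a 0.05 mg/m^3
# exposure estimate gives a margin of safety of 50.
mos = margin_of_safety(2.5, 0.05)
```

A lower data tier achieves the "similar or greater margin of safety" property by using a more protective (smaller) benchmark or a more conservative (larger) exposure estimate, so passing the lower tier implies the refined tier would also pass.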
NASA Astrophysics Data System (ADS)
Rahnamay Naeini, M.; Sadegh, M.; AghaKouchak, A.; Hsu, K. L.; Sorooshian, S.; Yang, T.
2017-12-01
Meta-heuristic optimization algorithms have gained a great deal of attention in a wide variety of fields. The simplicity and flexibility of these algorithms, along with their robustness, make them attractive tools for solving optimization problems. Different optimization methods, however, have algorithm-specific strengths and limitations. The performance of each individual algorithm obeys the "No-Free-Lunch" theorem, which means a single algorithm cannot consistently outperform all others across all possible optimization problems. From the user's perspective, it is a tedious process to compare, validate, and select the best-performing algorithm for a specific problem or set of test cases. In this study, we introduce a new hybrid optimization framework, entitled Shuffled Complex-Self Adaptive Hybrid EvoLution (SC-SAHEL), which combines the strengths of different evolutionary algorithms (EAs) in a parallel computing scheme and allows users to select the algorithm most suitable for the problem at hand. The concept of SC-SAHEL is to execute different EAs as separate parallel search cores and let all participating EAs compete during the course of the search. The newly developed SC-SAHEL algorithm is designed to automatically select the best-performing algorithm for the given optimization problem. The algorithm is effective in finding the global optimum for several strenuous benchmark test functions, and computationally efficient compared to individual EAs. We benchmark the proposed SC-SAHEL algorithm over 29 conceptual test functions and two real-world case studies: one hydropower reservoir model and one hydrological model (SAC-SMA). Results show that the proposed framework outperforms individual EAs in an absolute majority of the test problems, and can provide results competitive with the fittest EA while yielding more comprehensive information during the search.
The proposed framework is also flexible enough to accommodate additional EAs, boundary-handling techniques, and sampling schemes, and has good potential for use in the optimal operation and management of water-energy systems.
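The competition among parallel search cores described above can be caricatured in a few lines. The two cores below (uniform random search and Gaussian hill-climbing) are illustrative stand-ins for the EAs in the framework, and `sphere` stands in for the benchmark test functions; this is a toy of the competition idea, not the SC-SAHEL algorithm itself.

```python
import random

# Several search "cores" run side by side on the same objective; the
# framework tracks which core produced each improvement of the best solution.
def sphere(x):
    """Classic benchmark function; global minimum 0 at the origin."""
    return sum(v * v for v in x)

def random_core(best, dim=3, lo=-5.0, hi=5.0):
    """Core 1: uniform random search, ignores the incumbent."""
    return [random.uniform(lo, hi) for _ in range(dim)]

def gaussian_core(best, sigma=0.3):
    """Core 2: Gaussian perturbation of the incumbent (hill-climbing)."""
    return [v + random.gauss(0.0, sigma) for v in best]

def compete(cores, dim=3, iters=2000, seed=1):
    random.seed(seed)
    best = [random.uniform(-5.0, 5.0) for _ in range(dim)]
    best_f = sphere(best)
    wins = {core.__name__: 0 for core in cores}  # improvements per core
    for _ in range(iters):
        for core in cores:
            candidate = core(best)
            f = sphere(candidate)
            if f < best_f:
                best, best_f = candidate, f
                wins[core.__name__] += 1
    return best_f, wins

best_f, wins = compete([random_core, gaussian_core])
```

The `wins` tally is the toy analogue of SC-SAHEL's runtime bookkeeping on which participating algorithm is currently fittest.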
[Methods in neonatal abstinence syndrome (NAS): results of a nationwide survey in Austria].
Bauchinger, S; Sapetschnig, I; Danda, M; Sommer, C; Resch, B; Urlesberger, B; Raith, W
2015-08-01
Neonatal abstinence syndrome (NAS) occurs in neonates whose mothers have taken addictive drugs or were under substitution therapy during pregnancy. The incidence of NAS is on the rise globally, and even in Austria NAS is no longer rare. The aim of our survey was to reveal the status quo of dealing with NAS in Austria. A questionnaire was sent to 20 neonatology departments all over Austria; items included questions on scoring, therapy, breast-feeding, and follow-up procedures. The response rate was 95%, and 94.7% of responding departments had written guidelines concerning NAS. The median number of children treated per year for NAS was 4. The Finnegan scoring system is used in 100% of the responding departments. Morphine is used most often, for opiate abuse (100%) as well as for multiple substance abuse (44.4%). The most frequent forms of morphine preparation are morphine and diluted tincture of opium. Frequency as well as dosage of medication vary broadly. 61.1% of the departments supported breast-feeding; regulations concerned participation in a substitution programme and general contraindications (HIV, HCV, HBV). Our results revealed a pronounced west-east gradient in patients treated per year. NAS is no longer a rare entity in Austria (up to 50 cases per year in Vienna). Our survey showed that most neonatology departments in Austria treat their patients following written guidelines. Although all of them base these guidelines on international recommendations, there is no national consensus. © Georg Thieme Verlag KG Stuttgart · New York.
Quantum Chemical Investigation on Photochemical Reactions of Nonanoic Acids at Air-Water Interface.
Xiao, Pin; Wang, Qian; Fang, Wei-Hai; Cui, Ganglong
2017-06-08
Photoinduced chemical reactions of organic compounds at the marine boundary layer have recently attracted significant experimental attention because this kind of photoreaction has been proposed to have a substantial impact on local new particle formation, and the photoproducts could be a source of secondary organic aerosols. In this work, we have employed first-principles density functional theory combined with cluster models to systematically explore photochemical reaction pathways of nonanoic acids (NAs) to form volatile saturated and unsaturated C9 and C8 aldehydes at air-water interfaces. On the basis of the results, we have found that the formation of C9 aldehydes is not initiated by an intermolecular Norrish type II reaction between two NAs but by intramolecular T1 C-O bond fission of NA, generating acyl and hydroxyl radicals. Subsequently, saturated C9 aldehydes are formed through a hydrogenation reaction of the acyl radical by another intact NA. Following two dehydrogenation reactions, unsaturated C9 aldehydes are generated. In parallel, the pathway to C8 aldehydes is initiated by T1 C-C bond fission of NA, which generates octyl and carboxyl radicals; then, an octanol is formed through recombination of the octyl with a hydroxyl radical. Subsequently, two dehydrogenation reactions result in an enol intermediate from which saturated C8 aldehydes are produced via NA-assisted intermolecular hydrogen transfer. Finally, two dehydrogenation reactions generate unsaturated C8 aldehydes. In these reactions, water and NA molecules are found to play important roles: they significantly reduce the relevant reaction barriers. Our work has also explored oxygenation reactions of NA with molecular oxygen and radical-radical dimerization reactions.
Burnout, psychosomatic symptoms and job satisfaction among Dutch nurse anaesthetists: a survey.
Meeusen, V; VAN Dam, K; Brown-Mahoney, C; VAN Zundert, A; Knape, H
2010-05-01
To meet the increasing demand for healthcare providers, it is crucial to recruit and retain more nurse anaesthetists (NAs). The majority of NAs in the Netherlands are >45 years old, and retaining them in their jobs is very important. This study investigates the relationships among burnout, physical health and job satisfaction among Dutch NAs. Two thousand NAs working in Dutch hospitals were invited to complete this online questionnaire. We tested the relationships among burnout, psychosomatic symptoms, sickness absence, perceived general health and job satisfaction. Nine hundred and twenty-three questionnaires were completed and analysed (46% response rate). Burnout and psychosomatic symptoms were negatively associated with job satisfaction, and together predicted 27% of the variance in job satisfaction. Perceived general health was positively, and sickness absence negatively, related to job satisfaction. Older NAs had a higher incidence of burnout than their younger counterparts. The results confirm the importance of a healthy psychosocial work environment for promoting job satisfaction. To prevent burnout, further research is necessary to determine the factors causing stress. These findings may also apply to anaesthesiologists, who share many tasks and work in close cooperation with NAs.
Islam, Md Shahinoor; Zhang, Yanyan; McPhedran, Kerry N; Liu, Yang; Gamal El-Din, Mohamed
2016-01-15
Naphthenic acids (NAs) found in oil sands process-affected waters (OSPW) have known environmental toxicity and are resistant to conventional wastewater treatments. The granular activated carbon (GAC) biofilm treatment process has been shown to effectively treat OSPW NAs via combined adsorption/biodegradation processes, despite the lack of research investigating their individual contributions. In the present study, the NA removals due to the individual processes of adsorption and biodegradation in OSPW bioreactors were determined using sodium azide to inhibit biodegradation. For raw OSPW, after 28 days biodegradation and adsorption contributed 14% and 63% of NA removal, respectively. For ozonated OSPW, biodegradation removed 18% of NAs while adsorption reduced NAs by 73%. Microbial community 454-pyrosequencing of bioreactor matrices indicated the importance of biodegradation, given the diverse carbon-degrading families including Acidobacteriaceae, Ectothiorhodospiraceae, and Comamonadaceae. Overall, the results highlight the ability to determine the specific processes of NA removal in the combined treatment process in the presence of the diverse bacterial metabolic groups found in GAC bioreactors. Copyright © 2015 Elsevier B.V. All rights reserved.
Lee, Doohee; Coustasse, Alberto; Sikula, Andrew
2011-01-01
Transformational leadership (TL) has long been popular among management scholars and health services researchers, but no research studies have empirically tested the association of TL with workplace injuries and absenteeism among nursing assistants (NAs). This cross-sectional study explores whether TL is associated with workplace injuries and absenteeism among NAs. We analyzed the 2004 National Nursing Assistant Survey data (n = 2,882). A multivariate logistic regression analysis was performed to test the role of TL in the context of workplace performance. Results reveal that the TL model was positively linked to workplace injury among NAs. Injury-related absenteeism was also associated with the TL style, indicating that TL behaviors may help address workplace absence among NAs. Findings suggest that introducing TL practices may benefit NAs by improving workplace performance.
Reflux patterns in the saphenous veins of men with chronic venous insufficiency
Engelhorn, Carlos Alberto; Coral, Francisco Eduardo; Soares, Isabela Chaves Monteiro; Corrêa, Gabriel Fernando de Araújo; Ogeda, Jaqueline Pozzolo; Hara, Larissa Yuri; Murasse, Luisa Saemi
2016-01-01
Abstract Background: Chronic venous insufficiency (CVI) is frequent and predominates in women, but there is still little information on reflux in the saphenous veins in the male population. Objectives: To identify the different patterns of reflux in the great saphenous veins (GSVs) and small saphenous veins (SSVs) in men, correlating these data with the clinical presentation according to the Clinical, Etiological, Anatomical and Pathophysiological (CEAP) classification. Methods: 369 lower limbs of 207 men with a clinical diagnosis of primary CVI were assessed by vascular ultrasonography. The variables analyzed were the CEAP classification, the reflux pattern in the GSVs and SSVs, and the correlation between the two. Results: In the 369 limbs assessed, 72.9% of the GSVs exhibited reflux, predominantly in the segmental pattern (33.8%). In the SSVs, 16% of the lower limbs analyzed exhibited reflux, most frequently in the distal pattern (33.9%). Of the limbs classified as C4, C5, or C6, 100% exhibited GSV reflux, predominantly proximal (25.64%), and 38.46% exhibited SSV reflux, with the distal and proximal patterns equally frequent (33.3% each). Reflux at the saphenofemoral junction (SFJ) was detected in 7.1% of limbs in classes C0 and C1, 35.6% in classes C2 and C3, and 64.1% in classes C4 to C6. Conclusions: The segmental reflux pattern predominates in the GSV, and the distal reflux pattern predominates in the SSV. SFJ reflux is more frequent in patients with more advanced CVI. PMID:29930603
Pearson, Jennifer L; Johnson, Amanda; Villanti, Andrea; Glasser, Allison M; Collins, Lauren; Cohn, Amy; Rose, Shyanika W; Niaura, Raymond; Stanton, Cassandra A
2017-03-01
This study estimated differences in cigarette harm perceptions among smokers of the Natural American Spirit (NAS) brand, marketed as 'natural', 'organic' and 'additive-free', compared to other smokers, and examined correlates of NAS use. Data were drawn from wave 1 of the Population Assessment of Tobacco and Health (PATH) study, a nationally representative study of US adults (2013-2014). Weighted analyses using a subset of current adult smokers (n=10 565) estimated the prevalence of NAS use (vs all other brands) and examined associations between NAS use and sociodemographics, tobacco/substance use, tobacco harm perceptions, quit intentions, quit attempts and mental/behavioural health. Overall, 2.3% of adult smokers (920 000 people in the USA) reported NAS as their usual brand. Nearly 64% of NAS smokers inaccurately believed that their brand is less harmful than other brands, compared to 8.3% of smokers of other brands, after controlling for potential confounders (aOR 22.82). Younger age (18-34 vs 35+; aOR 1.54), frequent thinking about tobacco harms (aOR 1.84), past-30-day alcohol use (aOR 1.57), past-30-day marijuana use (aOR 1.87) and sexual orientation (lesbian, gay, bisexual, 'other' or 'questioning' vs heterosexual; aOR 2.07) were also associated with increased odds of smoking NAS. The majority of NAS smokers inaccurately believe that their cigarettes are less harmful than other brands. Given the brand's rapid growth and its more common use in vulnerable groups (eg, young adults and lesbian, gay, bisexual, 'other' or 'questioning' adults), corrective messaging and enforcement action are necessary to correct harm misperceptions of NAS cigarettes. Published by the BMJ Publishing Group Limited.
Sukhotnik, Igor; Ben Shahar, Yoav; Halabi, Salim; Bitterman, Nir; Dorfman, Tatiana; Pollak, Yulia; Coran, Arnold; Bitterman, Arie
2018-01-05
Accumulating evidence indicates that changes in intestinal toll-like receptors (TLRs) precede histological injury in a rodent model of necrotizing enterocolitis. N-acetylserotonin (NAS) is a naturally occurring chemical intermediate in the biosynthesis of melatonin. A recent study has shown that treatment with NAS prevents gut mucosal damage and inhibits programmed cell death following intestinal ischemia-reperfusion (IR). The objective of this study was to determine the effects of NAS on TLR-4, myeloid differentiation factor 88 (MyD88), and TNF receptor-associated factor 6 (TRAF6) expression in intestinal mucosa following intestinal IR in a rat. Male Sprague-Dawley rats were randomly assigned to one of four experimental groups: 1) Sham rats underwent laparotomy; 2) Sham-NAS rats underwent laparotomy and were treated with intraperitoneal (IP) NAS (20 mg/kg); 3) IR rats underwent occlusion of both the superior mesenteric artery and portal vein for 20 minutes followed by 48 hours of reperfusion; and 4) IR-NAS rats underwent IR and were treated with IP NAS immediately before abdominal closure. Intestinal structural changes and mucosal TLR-4, MyD88, and TRAF6 gene and protein expression were examined using real-time PCR, Western blot, and immunohistochemistry. Significant mucosal damage in IR rats was accompanied by a significant upregulation of TLR-4, MyD88, and TRAF6 gene and protein expression in intestinal mucosa compared with control animals. The administration of NAS decreased the intestinal injury score, inhibited cell apoptosis, and significantly reduced the expression of TLR-4, MyD88, and TRAF6. Treatment with NAS is associated with downregulation of TLR-4, MyD88, and TRAF6 expression along with a concomitant decrease in intestinal mucosal injury caused by intestinal IR in a rat. Georg Thieme Verlag KG Stuttgart · New York.
Sarac, M; Bakal, U; Tartar, T; Kuloglu, T; Yardim, M; Artas, G; Aydin, S; Kazez, A
2017-08-15
Testicular torsion (TT) is a common urological problem in the field of pediatric surgery. The degree and duration of torsion determine the degree of testicular damage; however, its effects on the expression of octanoylated ghrelin and nucleobindin 2 (NUCB2)/nesfatin-1 synthesized in testicular tissue remain unclear. We explored the effects of experimentally induced unilateral TT on serum and contralateral testicular tissue ghrelin and NUCB2/nesfatin-1 levels, and determined whether N-acetyl cysteine (NAS) treatment had any effects on their expression. A total of 42 Wistar Albino strain rats were divided into 7 groups: Group (G) I control, GII sham, GIII 12-hour torsion, GIV 12-hour torsion + detorsion + 100 mg/kg NAS, GV 24-hour torsion, GVI 24-hour torsion + detorsion + 100 mg/kg NAS, and GVII 100 mg/kg NAS. Octanoylated ghrelin and NUCB2/nesfatin-1 concentrations were evaluated in serum using the ELISA method and in testicular tissue with immunohistochemical methods. Immunoreactivity of octanoylated ghrelin significantly increased in GI compared to GIII, GV, and GVI (p<0.05). NUCB2/nesfatin-1 immunoreactivity increased in GV and GVIII relative to GI (p<0.05). In the 12-hour torsion group, a significant decrease in octanoylated ghrelin levels with NAS treatment was observed; however, in the 24-hour torsion group, a significant decrease was not observed. In the 12-hour torsion + NAS treatment group, a significant change was not observed in NUCB2/nesfatin-1 expression. Following 24-hour torsion, an increase in NUCB2/nesfatin-1 levels was observed, and NAS treatment did not reverse this increase. It was determined that increases in the expression of octanoylated ghrelin and NUCB2/nesfatin-1, the latter of which was a result of TT, reflect damage in this tissue. Importantly, NAS treatment could prevent this damage. Thus, there may be a clinical application for the combined use of NAS and octanoylated ghrelin in preventing TT-related infertility.
NAS: current status and future plans
NASA Technical Reports Server (NTRS)
Bailey, F. R.
1987-01-01
The Numerical Aerodynamic Simulation (NAS) Program has met its first major milestone, the NAS Processing System Network (NPSN) Initial Operating Configuration (IOC). The program has met its goal of providing a national supercomputer facility capable of greatly enhancing the Nation's research and development efforts. Furthermore, the program is fulfilling its pathfinder role by defining and implementing a paradigm for supercomputing system environments. The IOC is only the beginning, and the NAS Program will aggressively continue to develop and implement emerging supercomputer, communications, storage, and software technologies to strengthen computations as a critical element in supporting the Nation's leadership role in aeronautics.
Misiti, Teresa M; Tezel, Ulas; Pavlostathis, Spyros G
2014-07-15
Aerobic biodegradation of naphthenic acids is of importance to the oil industry for the long-term management and environmental impact of process water and wastewater. The effect of structure, particularly the location of the alkyl side chain as well as cyclicity, on the aerobic biotransformation of 10 model naphthenic acids (NAs) was investigated. Using an aerobic mixed culture enriched with a commercial NA mixture (NA sodium salt; TCI Chemicals), batch biotransformation assays were conducted with individual model NAs, including eight 8-carbon isomers. It was shown that NAs with a quaternary carbon at the α- or β-position or a tertiary carbon at the β- and/or β'-position are recalcitrant or have limited biodegradability. In addition, branched NAs exhibited lag periods and lower degradation rates than nonbranched or simple cyclic NAs. Two NA isomers used in a closed-bottle aerobic biodegradation assay were mineralized, with 21 and 35% of the parent compound carbon incorporated into the biomass. The NA biodegradation probability estimated by two widely used models (BIOWIN 2 and 6) and a recently developed model (OCHEM) was compared to the biodegradability of the 10 model NAs tested in this study as well as other related NAs. The biodegradation probability estimated by the OCHEM model agreed best with the experimental data and was best correlated with the measured NA biodegradation rate.
Management of neonatal abstinence syndrome in neonates born to opioid maintained women.
Ebner, Nina; Rohrmeister, Klaudia; Winklbaur, Bernadette; Baewert, Andjela; Jagsch, Reinhold; Peternell, Alexandra; Thau, Kenneth; Fischer, Gabriele
2007-03-16
Neonates born to opioid-maintained mothers are at risk of developing neonatal abstinence syndrome (NAS), which often requires pharmacological treatment. This study examined the effect of opioid maintenance treatment on the incidence and timing of NAS, and compared two different NAS treatments (phenobarbital versus morphine hydrochloride). Fifty-three neonates born to opioid-maintained mothers were included in this study. The mothers received methadone (n=22), slow-release oral morphine (n=17) or buprenorphine (n=14) throughout pregnancy. Irrespective of maintenance treatment, all neonates showed APGAR scores comparable to infants of non-opioid dependent mothers. No difference was found between the three maintenance groups regarding neonatal weight, length or head circumference. Sixty percent (n=32) of neonates required treatment for NAS [68% in the methadone-maintained group (n=15), 82% in the morphine-maintained group (n=14), and 21% in the buprenorphine-maintained group (n=3)]. The mean duration from birth to requirement of NAS treatment was 33 h for the morphine-maintained group, 34 h for the buprenorphine-maintained group and 58 h for the methadone-maintained group. In neonates requiring NAS treatment, those receiving morphine required a significantly shorter mean duration of treatment (9.9 days) versus those treated with phenobarbital (17.7 days). Results suggest that morphine hydrochloride is preferable for neonates suffering NAS due to opioid withdrawal.
Carmona-Monge, Francisco Javier; Rollán Rodríguez, Gloria Ma; Quirós Herranz, Cristina; García Gómez, Sonia; Marín-Morales, Dolores
2013-08-01
To determine the relationship between nursing workload measured through the nine equivalents of nursing manpower use (NEMS) scale and that measured through the nursing activities score (NAS) scale, and to analyse staff needs as determined through each of the scales. The study used a descriptive prospective correlational design to collect data between October 2007 and July 2009. Nursing workload data for 730 ICU patients were collected daily using the NAS and NEMS scales. Both scales were then correlated and used to estimate staff needs. A total of 6815 score pairs were collected, which reflected the nursing workload for each patient as calculated daily using both scales. Pearson's correlation coefficient was .672 for individual measurements obtained through the NAS and the NEMS, and .932 for the daily total workload in the unit. The staffing requirements based on the NAS scale scores were significantly higher than those based on the NEMS scale. A high correlation existed for individual measurements using both scales and for the total workload measurement in the unit. The main difference was found when analysing staffing requirements, with higher staff numbers needed for the NAS scale. Both NAS and NEMS can be used to measure the nursing workload in the ICU. Staffing requirements using NAS were higher than those using NEMS. Copyright © 2013 Elsevier Ltd. All rights reserved.
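The comparison above hinges on Pearson's correlation between paired NEMS and NAS scores. As an illustration of how such a coefficient is computed (the scores below are hypothetical, not the study's data):

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient for paired workload scores."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical per-patient NEMS and NAS scores (illustration only).
nems = [27, 32, 18, 45, 39, 24]
nas = [55.1, 63.4, 40.2, 88.0, 71.5, 50.3]
r = pearson_r(nems, nas)
```

In the study, this coefficient was computed both per individual measurement and per daily unit total, which is why two values (.672 and .932) are reported.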
The NAS Alert System: A look at the first eight years
Fuller, Pamela L.; Neilson, Matt; Huge, Dane H.
2013-01-01
The U.S. Geological Survey's Nonindigenous Aquatic Species (NAS) database program (http://nas.er.usgs.gov) tracks the distribution of introduced aquatic organisms across the United States. Awareness of, and timely response to, novel species introductions by those involved in nonindigenous aquatic species management and research requires a framework for rapid dissemination of occurrence data as it is incorporated into the NAS database. In May 2004, the NAS program developed an alert system to notify registered users of new introductions as part of a national early detection/rapid response system. This article summarizes information on system users and dispatched alerts from the system's inception through the end of 2011. The NAS alert system has registered over 1,700 users, with approximately 800 current subscribers. A total of 1,189 alerts had been transmitted through 2011. More alerts were sent for Florida (134 alerts) than for any other state. Fishes comprise the largest taxonomic group of alerts (440), with mollusks, plants, and crustaceans each containing over 100 alerts. Most alerts were for organisms that were intentionally released (414 alerts), with shipping, escape from captivity, and hitchhiking also representing major vectors. To explore the archive of sent alerts and to register, the search and signup page for the alert system can be found online at http://nas.er.usgs.gov/AlertSystem/default.aspx.
IPRT polarized radiative transfer model intercomparison project - Phase A
NASA Astrophysics Data System (ADS)
Emde, Claudia; Barlakas, Vasileios; Cornet, Céline; Evans, Frank; Korkin, Sergey; Ota, Yoshifumi; Labonnote, Laurent C.; Lyapustin, Alexei; Macke, Andreas; Mayer, Bernhard; Wendisch, Manfred
2015-10-01
The polarization state of electromagnetic radiation scattered by atmospheric particles such as aerosols, cloud droplets, or ice crystals contains much more information about the optical and microphysical properties than the total intensity alone. For this reason an increasing number of polarimetric observations are performed from space, from the ground and from aircraft. Polarized radiative transfer models are required to interpret and analyse these measurements and to develop retrieval algorithms exploiting polarimetric observations. In recent years a large number of new codes have been developed, mostly for specific applications. Benchmark results are available for specific cases, but not for more sophisticated scenarios including polarized surface reflection and multi-layer atmospheres. The International Polarized Radiative Transfer (IPRT) working group of the International Radiation Commission (IRC) has initiated a model intercomparison project in order to fill this gap. This paper presents the results of the first phase (phase A) of the IPRT project, which includes ten test cases, from simple setups with only one layer and Rayleigh scattering to rather sophisticated setups with a cloud embedded in a standard atmosphere above an ocean surface. All scenarios in phase A of the intercomparison project are for a one-dimensional plane-parallel model geometry. The commonly established benchmark results are available at the IPRT website.
Status of BOUT fluid turbulence code: improvements and verification
NASA Astrophysics Data System (ADS)
Umansky, M. V.; Lodestro, L. L.; Xu, X. Q.
2006-10-01
BOUT is an electromagnetic fluid turbulence code for tokamak edge plasma [1]. BOUT performs time integration of reduced Braginskii plasma fluid equations, using spatial discretization in realistic geometry and employing the standard ODE integration package PVODE. BOUT has been applied to several tokamak experiments, and in some cases the calculated spectra of turbulent fluctuations compared favorably to experimental data. On the other hand, the desire to understand the code results better and to gain more confidence in them motivated investing effort in rigorous verification of BOUT. In parallel with the testing, the code underwent substantial modification, mainly to improve its readability and the tractability of physical terms, with some algorithmic improvements as well. In the verification process, a series of linear and nonlinear test problems was applied to BOUT, targeting different subgroups of physical terms. The tests include reproducing basic electrostatic and electromagnetic plasma modes in simplified geometry, axisymmetric benchmarks against the 2D edge code UEDGE in real divertor geometry, and neutral fluid benchmarks against the hydrodynamic code LCPFCT. After completion of the testing, the new version of the code is being applied to actual tokamak edge turbulence problems, and the results will be presented. [1] X. Q. Xu et al., Contr. Plas. Phys., 36, 158 (1998). *Work performed for USDOE by Univ. Calif. LLNL under contract W-7405-ENG-48.
NASA Astrophysics Data System (ADS)
Brinkerhoff, D. J.; Johnson, J. V.
2013-07-01
We introduce a novel, higher order, finite element ice sheet model called VarGlaS (Variational Glacier Simulator), which is built on the finite element framework FEniCS. Contrary to standard procedure in ice sheet modelling, VarGlaS formulates ice sheet motion as the minimization of an energy functional, conferring advantages such as a consistent platform for making numerical approximations, a coherent relationship between motion and heat generation, and implicit boundary treatment. VarGlaS also solves the equations of enthalpy rather than temperature, avoiding the solution of a contact problem. Rather than include a lengthy model spin-up procedure, VarGlaS possesses an automated framework for model inversion. These capabilities are brought to bear on several benchmark problems in ice sheet modelling, as well as a 500 yr simulation of the Greenland ice sheet at high resolution. VarGlaS performs well in benchmarking experiments and, given a constant climate and a 100 yr relaxation period, predicts a mass evolution of the Greenland ice sheet that matches present-day observations of mass loss. VarGlaS predicts a thinning in the interior and thickening of the margins of the ice sheet.
A hybrid interface tracking - level set technique for multiphase flow with soluble surfactant
NASA Astrophysics Data System (ADS)
Shin, Seungwon; Chergui, Jalel; Juric, Damir; Kahouadji, Lyes; Matar, Omar K.; Craster, Richard V.
2018-04-01
A formulation for soluble surfactant transport in multiphase flows recently presented by Muradoglu and Tryggvason (JCP 274 (2014) 737-757) [17] is adapted to the context of the Level Contour Reconstruction Method, LCRM, (Shin et al. IJNMF 60 (2009) 753-778, [8]) which is a hybrid method that combines the advantages of the Front-tracking and Level Set methods. Particularly close attention is paid to the formulation and numerical implementation of the surface gradients of surfactant concentration and surface tension. Various benchmark tests are performed to demonstrate the accuracy of different elements of the algorithm. To verify surfactant mass conservation, values for surfactant diffusion along the interface are compared with the exact solution for the problem of uniform expansion of a sphere. The numerical implementation of the discontinuous boundary condition for the source term in the bulk concentration is compared with the approximate solution. Surface tension forces are tested for Marangoni drop translation. Our numerical results for drop deformation in simple shear are compared with experiments and results from previous simulations. All benchmarking tests compare well with existing data, thus providing confidence that the adapted LCRM formulation for surfactant advection and diffusion is accurate and effective in three-dimensional multiphase flows with a structured mesh. We also demonstrate that this approach applies easily to massively parallel simulations.
First Applications of the New Parallel Krylov Solver for MODFLOW on a National and Global Scale
NASA Astrophysics Data System (ADS)
Verkaik, J.; Hughes, J. D.; Sutanudjaja, E.; van Walsum, P.
2016-12-01
Integrated high-resolution hydrologic models are increasingly being used for evaluating water management measures at field scale. Their drawbacks are large memory requirements and long run times. Examples of such models are The Netherlands Hydrological Instrument (NHI) model and the PCRaster Global Water Balance (PCR-GLOBWB) model. Typical simulation periods are 30-100 years with daily timesteps. The NHI model predicts water demands in periods of drought, supporting operational and long-term water-supply decisions. The NHI is a state-of-the-art coupling of several models: a 7-layer MODFLOW groundwater model (~6.5M 250-m cells), a MetaSWAP model for the unsaturated zone (a Richards emulator of 0.5M cells), and a surface water model (MOZART-DM). The PCR-GLOBWB model provides a grid-based representation of global terrestrial hydrology, and this work uses the version that includes a 2-layer MODFLOW groundwater model (~4.5M 10-km cells). The Parallel Krylov Solver (PKS) speeds up computation by both distributed memory parallelization (Message Passing Interface) and shared memory parallelization (Open Multi-Processing). PKS includes conjugate gradient, bi-conjugate gradient stabilized, and generalized minimal residual linear accelerators that use an overlapping additive Schwarz domain decomposition preconditioner. PKS can be used for both structured and unstructured grids and has been fully integrated in MODFLOW-USG using METIS partitioning and in iMODFLOW using RCB partitioning. iMODFLOW is an accelerated version of MODFLOW-2005 that is implicitly and online coupled to MetaSWAP. Results for benchmarks carried out on the Cartesius Dutch supercomputer (https://userinfo.surfsara.nl/systems/cartesius) for the PCR-GLOBWB model and on a 2x16 core Windows machine for the NHI model show speedups of up to ~10-20 and ~5-10, respectively.
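The PKS accelerators described above pair a Krylov method with an additive Schwarz domain decomposition preconditioner. A minimal serial sketch of that idea: conjugate gradient preconditioned by a non-overlapping (block-Jacobi) Schwarz method on a toy 1-D Poisson system. This is illustrative only; PKS itself is distributed over MPI/OpenMP and uses overlapping subdomains.

```python
import numpy as np

def block_jacobi_precond(A, blocks):
    """One-level additive Schwarz without overlap (block Jacobi):
    apply the inverse of each diagonal subdomain block independently."""
    invs = [np.linalg.inv(A[np.ix_(b, b)]) for b in blocks]
    def apply(r):
        z = np.zeros_like(r)
        for b, Ainv in zip(blocks, invs):
            z[b] = Ainv @ r[b]
        return z
    return apply

def pcg(A, b, M, tol=1e-10, maxit=200):
    """Preconditioned conjugate gradient for a symmetric positive
    definite matrix A; M is the preconditioner application."""
    x = np.zeros_like(b)
    r = b - A @ x
    z = M(r)
    p = z.copy()
    rz = r @ z
    for _ in range(maxit):
        Ap = A @ p
        alpha = rz / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        if np.linalg.norm(r) < tol:
            break
        z = M(r)
        rz_new = r @ z
        p = z + (rz_new / rz) * p
        rz = rz_new
    return x

# 1-D Poisson matrix split over two "subdomains".
n = 8
A = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
b = np.ones(n)
M = block_jacobi_precond(A, [np.arange(4), np.arange(4, 8)])
x = pcg(A, b, M)
```

In PKS, each subdomain solve runs on its own MPI rank, and overlap between subdomains improves the preconditioner's convergence rate.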
Visualization of Octree Adaptive Mesh Refinement (AMR) in Astrophysical Simulations
NASA Astrophysics Data System (ADS)
Labadens, M.; Chapon, D.; Pomaréde, D.; Teyssier, R.
2012-09-01
Computer simulations are important in current cosmological research. Those simulations run in parallel on thousands of processors and produce huge amounts of data. Adaptive mesh refinement is used to reduce the computing cost while keeping good numerical accuracy in regions of interest. RAMSES is a cosmological code developed by the Commissariat à l'énergie atomique et aux énergies alternatives (English: Atomic Energy and Alternative Energies Commission) which uses octree adaptive mesh refinement. Compared to grid-based AMR, octree AMR has the advantage of fitting the adaptive resolution of the grid very precisely to the local problem complexity. However, this specific octree data type needs specific software to be visualized, as generic visualization tools work on Cartesian grid data types. This is why our team also developed the PYMSES software. It relies on the Python scripting language to ensure modular and easy access for exploring these specific data. In order to take advantage of the high-performance computer which runs the RAMSES simulation, it also uses MPI and multiprocessing to run some parallel code. We present our PYMSES software in more detail, with some performance benchmarks. PYMSES currently has two visualization techniques which work directly on the AMR. The first one is a splatting technique, and the second one is a custom ray-tracing technique. Both have their own advantages and drawbacks. We have also compared two parallel programming techniques: the Python multiprocessing library versus the use of MPI. The load-balancing strategy has to be defined carefully in order to achieve a good speedup in our computation. Results obtained with this software are illustrated in the context of a massive, 9000-processor parallel simulation of a Milky Way-like galaxy.
Scalable direct Vlasov solver with discontinuous Galerkin method on unstructured mesh.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Xu, J.; Ostroumov, P. N.; Mustapha, B.
2010-12-01
This paper presents the development of parallel direct Vlasov solvers with the discontinuous Galerkin (DG) method for beam and plasma simulations in four dimensions. Both physical and velocity spaces are in two dimensions (2P2V) with unstructured mesh. Contrary to the standard particle-in-cell (PIC) approach for kinetic space plasma simulations, i.e., solving the Vlasov-Maxwell equations, a direct method has been used in this paper. There are several benefits to solving a Vlasov equation directly, such as avoiding noise associated with a finite number of particles and the capability to capture fine structure in the plasma. The most challenging part of a direct Vlasov solver comes from the higher dimensionality, as the computational cost increases as N^(2d), where d is the dimension of the physical space. Recently, due to the fast development of supercomputers, the possibility has become more realistic. Many efforts have been made to solve Vlasov equations in low dimensions before; now more interest has focused on higher dimensions. Different numerical methods have been tried so far, such as the finite difference method, Fourier spectral method, finite volume method, and spectral element method. This paper is based on our previous efforts to use the DG method. The DG method has been proven to be very successful in solving Maxwell equations, and this paper is our first effort in applying the DG method to Vlasov equations. DG has shown several advantages, such as a local mass matrix, strong stability, and easy parallelization. These are particularly suitable for Vlasov equations. Domain decomposition in high dimensions has been used for parallelization; this includes a highly scalable parallel two-dimensional Poisson solver. Benchmark results have been shown and simulation results will be reported.
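To illustrate the "direct" (grid-based) approach contrasted with PIC above, the following sketch advances the free-streaming term of a 1D1V Vlasov equation, df/dt + v df/dx = 0, on a phase-space grid with first-order upwind differences. This is a minimal stand-in showing noise-free grid-based evolution, not the paper's discontinuous Galerkin scheme.

```python
import numpy as np

def stream_step(f, v, dx, dt):
    """Advance f(x, v) one time step in x with first-order upwind
    differences and periodic boundaries; rows index x, columns index v."""
    fn = np.empty_like(f)
    for j, vj in enumerate(v):
        c = vj * dt / dx  # CFL number for this velocity column
        if vj >= 0:
            fn[:, j] = f[:, j] - c * (f[:, j] - np.roll(f[:, j], 1))
        else:
            fn[:, j] = f[:, j] - c * (np.roll(f[:, j], -1) - f[:, j])
    return fn

# Gaussian pulse in x times a Maxwellian-like profile in v.
nx, nv = 64, 16
x = np.linspace(0, 1, nx, endpoint=False)
v = np.linspace(-1, 1, nv)
f = np.exp(-100 * (x[:, None] - 0.5) ** 2) * np.exp(-v[None, :] ** 2)
mass0 = f.sum()
for _ in range(50):
    f = stream_step(f, v, dx=1 / nx, dt=0.005)  # |v| dt/dx < 1
```

Because the distribution is stored on a grid rather than sampled by particles, there is no statistical noise; the cost, as the abstract notes, grows steeply with dimension.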
Real-time processing of radar return on a parallel computer
NASA Technical Reports Server (NTRS)
Aalfs, David D.
1992-01-01
NASA is working with the FAA to demonstrate the feasibility of pulse Doppler radar as a candidate airborne sensor to detect low altitude windshears. The need to provide the pilot with timely information about possible hazards has motivated a demand for real-time processing of a radar return. Investigated here is parallel processing as a means of accommodating the high data rates required. A PC-based parallel computer, called the transputer, is used to investigate issues in real-time concurrent processing of radar signals. A transputer network is made up of an array of single instruction stream processors that can be networked in a variety of ways. They are easily reconfigured, and software development is largely independent of the particular network topology. The performance of the transputer is evaluated in light of the computational requirements. A number of algorithms have been implemented on the transputers in OCCAM, a language specially designed for parallel processing. These include signal processing algorithms such as the fast Fourier transform (FFT), pulse-pair, and autoregressive modeling, as well as routing software to support concurrency. The most computationally intensive task is estimating the spectrum. Two approaches have been taken on this problem, the first and most conventional of which is to use the FFT. By using table look-ups for the basis functions and other optimizing techniques, an algorithm has been developed that is sufficient for real time. The other approach is to model the signal as an autoregressive process and estimate the spectrum based on the model coefficients. This technique is attractive because it does not suffer from the spectral leakage problem inherent in the FFT. Benchmark tests indicate that autoregressive modeling is feasible in real time.
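As an illustration of the autoregressive approach to spectrum estimation mentioned above, the following sketch fits AR coefficients by the Yule-Walker method and evaluates the model spectrum on a synthetic Doppler-like signal. This is a generic AR estimator, not the OCCAM/transputer implementation described in the abstract.

```python
import numpy as np

def yule_walker_ar(x, order):
    """Estimate AR coefficients a[1..p] from the biased sample
    autocorrelation by solving the Yule-Walker equations."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    n = len(x)
    r = np.array([np.dot(x[: n - k], x[k:]) / n for k in range(order + 1)])
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    return np.linalg.solve(R, r[1 : order + 1])

def ar_spectrum(a, sigma2, freqs):
    """AR power spectrum sigma2 / |1 - sum_k a_k e^{-2 pi i f k}|^2
    (sigma2 only scales the curve; peak locations are unaffected)."""
    k = np.arange(1, len(a) + 1)
    denom = 1 - np.exp(-2j * np.pi * np.outer(freqs, k)) @ a
    return sigma2 / np.abs(denom) ** 2

# Synthetic return: one Doppler line at normalized frequency 0.2 plus noise.
rng = np.random.default_rng(0)
t = np.arange(256)
x = np.cos(2 * np.pi * 0.2 * t) + 0.1 * rng.standard_normal(256)
a = yule_walker_ar(x, order=4)
freqs = np.linspace(0, 0.5, 501)
psd = ar_spectrum(a, 1.0, freqs)
peak = freqs[np.argmax(psd)]
```

The sharp pole-based peak, rather than a windowed transform, is why the AR approach avoids the FFT's spectral leakage.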
Automatic data partitioning on distributed memory multicomputers. Ph.D. Thesis
NASA Technical Reports Server (NTRS)
Gupta, Manish
1992-01-01
Distributed-memory parallel computers are increasingly being used to provide high levels of performance for scientific applications. Unfortunately, such machines are not very easy to program. A number of research efforts seek to alleviate this problem by developing compilers that take over the task of generating communication. The communication overheads and the extent of parallelism exploited in the resulting target program are determined largely by the manner in which data is partitioned across different processors of the machine. Most of the compilers provide no assistance to the programmer in the crucial task of determining a good data partitioning scheme. A novel approach to the problem of automatic data partitioning for numeric programs, the constraints-based approach, is presented. In this approach, the compiler identifies some desirable requirements on the distribution of various arrays being referenced in each statement, based on performance considerations. These desirable requirements are referred to as constraints. For each constraint, the compiler determines a quality measure that captures its importance with respect to the performance of the program. The quality measure is obtained through static performance estimation, without actually generating the target data-parallel program with explicit communication. Each data distribution decision is taken by combining all the relevant constraints. The compiler attempts to resolve any conflicts between constraints such that the overall execution time of the parallel program is minimized. This approach has been implemented as part of a compiler called Paradigm, which accepts Fortran 77 programs and specifies the partitioning scheme to be used for each array in the program. We have obtained results on some programs taken from the Linpack and Eispack libraries, and the Perfect Benchmarks. These results are quite promising, and demonstrate the feasibility of automatic data partitioning for a significant class of scientific application programs with regular computations.
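The constraints-based idea above can be caricatured in a few lines: each statement contributes a preferred distribution (a constraint) with a quality measure, and conflicting constraints are resolved by maximizing total quality. The statement strings and cost numbers below are hypothetical placeholders, not Paradigm's actual model.

```python
# Each entry: (statement, preferred distribution, quality measure).
# The quality measure stands in for the communication cost avoided,
# which Paradigm derives by static performance estimation.
constraints = [
    ("A[i][j] = B[i][j] + B[i][j-1]", "rows", 40.0),  # row dist. avoids comm
    ("A[i][j] = A[i][j] + B[i-1][j]", "cols", 25.0),  # col dist. avoids comm
    ("C[i] = C[i] + A[i][j]", "rows", 10.0),
]

def choose_distribution(constraints):
    """Combine all constraints and resolve conflicts by picking the
    candidate distribution with the highest total quality."""
    totals = {}
    for _, dist, quality in constraints:
        totals[dist] = totals.get(dist, 0.0) + quality
    return max(totals, key=totals.get), totals

best, totals = choose_distribution(constraints)
```

The real compiler reasons per array and per dimension, but the conflict-resolution step has this flavor: sum the quality measures behind each candidate and keep the winner.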
Federal Register 2010, 2011, 2012, 2013, 2014
2012-07-03
... via the U.S. Postal Service to Naval Facilities Engineering Command Southeast, NAS Key West Air... the project Web site ( http://www.keywesteis.com ). All statements, oral or written, submitted during... Engineering Command Southeast, NAS Key West Air Operations EIS Project Manager, P.O. Box 30, Building 903, NAS...
Federal Register 2010, 2011, 2012, 2013, 2014
2012-07-24
... Facilities Engineering Command Southeast, NAS Key West Air Operations EIS Project Manager, P.O. Box 30... Facilities Engineering Command Southeast, NAS Key West Air Operations EIS Project Manager, P.O. Box 30, Building 903, NAS Jacksonville, FL 32212 or electronically via the project Web site ( http://www.keywesteis...
Federal Register 2010, 2011, 2012, 2013, 2014
2013-02-26
... There are no anticipated budgetary health care cost increases. Requiring a NAS for a relatively few non... Inpatient Mental Health Care AGENCY: Office of the Secretary, Department of Defense. ACTION: Final rule... inpatient mental health care in order for a TRICARE Standard beneficiary's claim to be paid. Currently, NAS...
Assessing the Progress of New American Schools: A Status Report.
ERIC Educational Resources Information Center
Berends, Mark
This report presents a data-collection plan for addressing these questions: (1) What were the New American Schools (NAS) like before they implemented their restructuring design efforts? (2) How have these evolved over time? (3) Are the critical components of the NAS designs being implemented across a wide array of schools? (4) Do the NAS designs…
Estimating the in situ biodegradation of naphthenic acids in oil sands process waters by HPLC/HRMS.
Han, Xiumei; MacKinnon, Michael D; Martin, Jonathan W
2009-06-01
The oil sands industry in Northern Alberta produces large volumes of oil sands process water (OSPW) containing high concentrations of persistent naphthenic acids (NAs; C(n)H(2n+Z)O(2)). Due to the growing volumes of OSPW that need to be reclaimed, it is important to understand the fate of NAs in aquatic systems. A recent laboratory study revealed several potential markers of microbial biodegradation for NAs; thus, we examined field-aged OSPW from the Syncrude Canada Ltd. site (Fort McMurray, AB) for these signatures. NA concentrations were lower in older OSPW; however, parent NA signatures were remarkably similar among all OSPW samples examined, with no discernible enrichment of the highly cyclic fraction as was observed in the laboratory. Comparison of NA signatures in fresh oil sands ore extracts to OSPW in active settling basins, however, suggested that the least cyclic fraction (i.e. Z=0 and Z=-2 homologues) may undergo relatively rapid biodegradation in active settling basins. Further evidence for biodegradation of NAs came from a significantly higher proportion of oxidized NAs (i.e. C(n)H(2n+Z)O(3)+C(n)H(2n+Z)O(4)) in the oldest OSPW from experimental reclamation ponds. Taken together, there is indirect evidence for rapid biodegradation of relatively labile Z=0 and Z=-2 NAs in active settling basins, but the remaining steady-state fraction of NAs in OSPW appears to be very recalcitrant, with half-lives on the order of 12.8-13.6 years. Alternative fate mechanisms to explain the slow disappearance of parent NAs from OSPW are discussed, including adsorption and atmospheric partitioning.
Outpatient Management of Neonatal Abstinence Syndrome: A Quality Improvement Project.
Chau, Kim T; Nguyen, Jacqueline; Miladinovic, Branko; Lilly, Carol M; Ashmeade, Terri L; Balakrishnan, Maya
2016-11-01
An increasing number of infants are diagnosed with neonatal abstinence syndrome (NAS). The study's primary objectives were to describe an academic medical center's level IV neonatal ICU's (NICU's) comprehensive outpatient NAS management effort, measure guideline compliance, and assess its safety. Secondary objectives were to describe the duration and cumulative methadone exposure, and to improve parent and provider knowledge of NAS. The study included 22 infants having a gestational age of 35-41 weeks, diagnosed with NAS, and discharged for outpatient methadone management. Discharges spanned 10 months and included 3 improvement periods. The outpatient program includes comprehensive discharge planning, a focused electronic health record (EHR) template, management guidelines, and parent and provider education. Providers complied with using the outpatient management guideline and EHR template, and assessed weight, NAS symptoms, and methadone dose during appointments. Two infants required NAS-related hospital readmission in the study period. From improvement period 1 to period 3 there was no difference in total outpatient days on methadone (58, 53, 74 days, respectively) or cumulative methadone dose (2.7, 2.6, 3.1 mg/kg, respectively). A downward trend pattern in cumulative methadone exposure was noted in improvement period 2. Pre- and postimplementation surveys revealed that after implementation, parents had better understanding of NAS before delivery (71% vs. 100%, p = 0.009), while providers had increased comfort with outpatient management (24% vs. 67%, p < 0.001) and educating parents (48% vs. 82%, p = 0.001). This preliminary study suggests that outpatient NAS management can be safe when a comprehensive management program is implemented and can result in provider compliance with the program. Copyright 2016 The Joint Commission.
Colloidal properties of single component naphthenic acids and complex naphthenic acid mixtures.
Mohamed, Mohamed H; Wilson, Lee D; Peru, Kerry M; Headley, John V
2013-04-01
Tensiometry was used to provide estimates of the critical micelle concentration (cmc) values for three sources of naphthenic acids (NAs) and three examples of single component NAs (S1-S3) in aqueous solution at pH 10.5 and 295 K. Two commercially available mixtures of NAs and an industrially derived mixture of NAs obtained from Alberta oil sands process water (OSPW) were investigated. The three examples of single component NAs (C(n)H(2n+z)O2) were chosen with variable z-series to represent chemical structures with 0-2 rings, as follows: 2-hexyldecanoic acid (z=0; S1), trans-4-pentylcyclohexanecarboxylic acid (z=-2; S2) and dicyclohexylacetic acid (z=-4; S3). The estimated cmc values for S1 (35.6 μM), S2 (0.545 mM), and S3 (4.71 mM) vary over a wide range according to the relative lipophilic character of each carboxylate anion. The cmc values for the three complex mixtures of NAs were evaluated. Two distinct cmc values were observed (the second listed in parentheses) as follows: Commercial sample 1, 50.9 μM (109 μM); Commercial sample 2, 22.3 μM (52.2 μM); and Alberta-derived OSPW, 154 μM (417 μM). These results provide strong support favouring two general classes of NAs in the mixtures investigated with distinct cmc values. We propose that the two groups may be linked to a recalcitrant fraction with a relatively large range of cmc values (52.2-417 μM) and a readily biodegradable fraction with a relatively low range of cmc values (22.3-154 μM) depending on the source of NAs in a given mixture. Copyright © 2013 Elsevier Inc. All rights reserved.
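A cmc estimate from tensiometry data is commonly obtained as the intersection of two line fits: the descending branch of surface tension versus log concentration and the post-cmc plateau. A sketch under that assumption, using synthetic data rather than the paper's measurements:

```python
import math

def fit_line(pts):
    """Least-squares slope and intercept for (x, y) pairs."""
    n = len(pts)
    sx = sum(x for x, _ in pts); sy = sum(y for _, y in pts)
    sxx = sum(x * x for x, _ in pts); sxy = sum(x * y for x, y in pts)
    m = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    b = (sy - m * sx) / n
    return m, b

def estimate_cmc(conc, tension, split):
    """cmc = concentration where the descending branch of surface
    tension vs log10(concentration) meets the post-cmc plateau."""
    pts = [(math.log10(c), t) for c, t in zip(conc, tension)]
    m1, b1 = fit_line(pts[:split])   # pre-cmc descending branch
    m2, b2 = fit_line(pts[split:])   # post-cmc plateau
    log_cmc = (b2 - b1) / (m1 - m2)  # intersection of the two lines
    return 10 ** log_cmc

# Synthetic data (illustration only): surface tension in mN/m falls
# until a cmc of 1e-4 M, then plateaus at 40 mN/m.
conc = [1e-6, 1e-5, 1e-4, 1e-3, 1e-2]
tension = [70.0, 55.0, 40.0, 40.0, 40.0]
cmc = estimate_cmc(conc, tension, split=3)
```

Mixtures with two distinct breakpoints, as reported above, would be handled by fitting three segments instead of two, yielding two intersection concentrations.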
Low-Energy Water Recovery from Subsurface Brines
DOE Office of Scientific and Technical Information (OSTI.GOV)
Choi, Young Chul; Kim, Gyu Dong; Hendren, Zachary
A novel non-aqueous phase solvent (NAS) desalination process was proposed and developed in this research project. The NAS desalination process uses less energy than thermal processes, doesn’t require any additional chemicals for precipitation, and can be utilized to treat high TDS brine. In this project, our experimental work determined that water solubility changes and selective absorption are the key characteristics of NAS technology for successful desalination. Three NAS desalination mechanisms were investigated: (1) CO2 switchable, (2) high-temp absorption to low-temp desorption (thermally switchable), and (3) low-temp absorption to high-temp desorption (thermally switchable). Among these mechanisms, thermally switchable (low-temp absorption to high-temp desorption) showed the highest water recovery and relatively high salt rejection. A test procedure for a semi-continuous, bench scale NAS desalination process was also developed and used to assess performance under a range of conditions.
Nurse Assistant Communication Strategies About Pressure Ulcers in Nursing Homes.
Alexander, Gregory L
2015-07-01
There is growing recognition of benefits of sophisticated information technology (IT) in nursing homes (NHs). In this research, we explore strategies nursing assistants (NAs) use to communicate pressure ulcer prevention practices in NHs with variable IT sophistication measures. Primary qualitative data were collected during focus groups with NAs in 16 NHs located across Missouri. NAs (n = 213) participated in 31 focus groups. Three major themes referencing communication strategies for pressure ulcer prevention were identified, including Passing on Information, Keeping Track of Needs and Information Access. NAs use a variety of strategies to prioritize care, and strategies are different based on IT sophistication level. NA work is an important part of patient care. However, little information about their work is included in communication, leaving patient records incomplete. NAs' communication is becoming increasingly important in the care of the millions of chronically ill elders in NHs. © The Author(s) 2014.
A new code SORD for simulation of polarized light scattering in the Earth atmosphere
NASA Astrophysics Data System (ADS)
Korkin, Sergey; Lyapustin, Alexei; Sinyuk, Aliaksandr; Holben, Brent
2016-05-01
We report a new publicly available radiative transfer (RT) code for numerical simulation of polarized light scattering in plane-parallel Earth atmosphere. Using 44 benchmark tests, we prove high accuracy of the new RT code, SORD (Successive ORDers of scattering). We describe capabilities of SORD and show run time for each test on two different machines. At present, SORD is supposed to work as part of the Aerosol Robotic NETwork (AERONET) inversion algorithm. For natural integration with the AERONET software, SORD is coded in Fortran 90/95. The code is available by email request from the corresponding (first) author or from ftp://climate1.gsfc.nasa.gov/skorkin/SORD/ or ftp://maiac.gsfc.nasa.gov/pub/SORD.zip
Katome: de novo DNA assembler implemented in rust
NASA Astrophysics Data System (ADS)
Neumann, Łukasz; Nowak, Robert M.; Kuśmirek, Wiktor
2017-08-01
Katome is a new de novo sequence assembler written in the Rust programming language, designed with respect to future parallelization of the algorithms, run time and memory usage optimization. The application uses new algorithms for the correct assembly of repetitive sequences. Performance and quality tests were performed on various data, comparing the new application to `dnaasm', `ABySS' and `Velvet' genome assemblers. Quality tests indicate that the new assembler creates more contigs than well-established solutions, but the contigs have better quality with regard to mismatches per 100kbp and indels per 100kbp. Additionally, benchmarks indicate that the Rust-based implementation outperforms `dnaasm', `ABySS' and `Velvet' assemblers, written in C++, in terms of assembly time. Lower memory usage in comparison to `dnaasm' is observed.
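Katome itself is written in Rust; purely to illustrate what contig building in a de novo assembler involves, here is a toy de Bruijn-graph sketch (maximal unbranched paths become contigs; cycles, sequencing errors, and Katome's actual repeat-handling algorithms are all ignored):

```python
from collections import defaultdict

def contigs_from_reads(reads, k):
    """Toy de Bruijn assembly: nodes are (k-1)-mers, edges are k-mers;
    contigs are maximal unbranched paths through the graph."""
    succ, pred = defaultdict(set), defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            a, b = read[i:i + k - 1], read[i + 1:i + k]
            succ[a].add(b)
            pred[b].add(a)
    def branching(node):  # a contig must stop at any non-1-in-1-out node
        return len(succ[node]) != 1 or len(pred[node]) != 1
    contigs = []
    for node in list(succ):
        if branching(node) or not pred[node]:
            for nxt in succ[node]:
                path = [node, nxt]
                while not branching(path[-1]):
                    path.append(next(iter(succ[path[-1]])))
                contigs.append(path[0] + "".join(p[-1] for p in path[1:]))
    return contigs

# Three overlapping reads from the 12-mer ATGGCGTGCAAT, assembled with k = 4.
contigs = contigs_from_reads(["ATGGCGT", "GGCGTGC", "GTGCAAT"], k=4)
```

The three reads tile one linear chain of 3-mers, so the walk reconstructs the original 12-mer as a single contig.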
Verification of TEMPEST with neoclassical transport theory
NASA Astrophysics Data System (ADS)
Xiong, Z.; Cohen, B. I.; Cohen, R. H.; Dorr, M.; Hittinger, J.; Kerbel, G.; Nevins, W. M.; Rognlien, T.; Umansky, M.; Xu, X.
2006-10-01
TEMPEST is an edge gyro-kinetic continuum code developed to study boundary plasma transport over the region extending from the H-mode pedestal across the separatrix to the divertor plates. For benchmark purposes, we present results from the 4D (2r,2v) TEMPEST for both steady-state transport and time-dependent Geodesic Acoustic Modes (GAMs). We focus on an annular region inside the separatrix of a circular cross-section tokamak where analytical and numerical results are available. The parallel flow velocity and radial particle flux are obtained for different collisional regimes and compared with previous neoclassical results. The effect of radial electric field and the transition to steep edge gradients is emphasized. The dynamical response of GAMs is also shown and compared to recent theory.
Genetically improved BarraCUDA.
Langdon, W B; Lam, Brian Yee Hong
2017-01-01
BarraCUDA is an open source C program which uses the BWA algorithm in parallel with nVidia CUDA to align short next generation DNA sequences against a reference genome. Recently its source code was optimised using "Genetic Improvement". The genetically improved (GI) code is up to three times faster on short paired end reads from The 1000 Genomes Project and 60% more accurate on a short BioPlanet.com GCAT alignment benchmark. GPGPU BarraCUDA running on a single K80 Tesla GPU can align short paired end nextGen sequences up to ten times faster than bwa on a 12 core server. The speed up was such that the GI version was adopted and has been regularly downloaded from SourceForge for more than 12 months.
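Genetic Improvement searches over variants of an existing program, discarding variants that fail the test suite and preferring faster survivors. A toy selection step with hand-written "variants" standing in for mutated source code (none of this is BarraCUDA's actual GI machinery, which mutates CUDA source):

```python
import timeit

# Four candidate "variants" of the same task (sum of squares 0..n-1).
def v_loop(n):
    s = 0
    for i in range(n):
        s += i * i
    return s

def v_builtin(n):
    return sum(i * i for i in range(n))

def v_closed(n):
    return (n - 1) * n * (2 * n - 1) // 6   # closed form

def v_buggy(n):
    return n * n                            # fast but wrong: must be rejected

def genetic_improvement(variants, reference, tests):
    """Keep variants that agree with the reference on all tests,
    then pick the fastest survivor by measurement."""
    correct = [v for v in variants
               if all(v(t) == reference(t) for t in tests)]
    return min(correct, key=lambda v: timeit.timeit(lambda: v(500), number=200))

best = genetic_improvement([v_loop, v_builtin, v_closed, v_buggy],
                           reference=v_loop, tests=[0, 1, 7, 100])
```

Real GI applies many mutation/crossover rounds to source text and validates against a large test suite; the correctness filter is what lets it report "three times faster and still right".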
GRADSPMHD: A parallel MHD code based on the SPH formalism
NASA Astrophysics Data System (ADS)
Vanaverbeke, S.; Keppens, R.; Poedts, S.
2014-03-01
We present GRADSPMHD, a completely Lagrangian parallel magnetohydrodynamics code based on the SPH formalism. The implementation of the equations of SPMHD in the “GRAD-h” formalism assembles known results, including the derivation of the discretized MHD equations from a variational principle, the inclusion of time-dependent artificial viscosity, resistivity and conductivity terms, as well as the inclusion of a mixed hyperbolic/parabolic correction scheme for satisfying the ∇·B=0 constraint on the magnetic field. The code uses a tree-based formalism for neighbor finding and can optionally use the tree code for computing the self-gravity of the plasma. The structure of the code closely follows the framework of our parallel GRADSPH FORTRAN 90 code which we added previously to the CPC program library. We demonstrate the capabilities of GRADSPMHD by running 1, 2, and 3 dimensional standard benchmark tests and we find good agreement with previous work done by other researchers. The code is also applied to the problem of simulating the magnetorotational instability in 2.5D shearing box tests as well as in global simulations of magnetized accretion disks. We find good agreement with available results on this subject in the literature. Finally, we discuss the performance of the code on a parallel supercomputer with distributed memory architecture. Catalogue identifier: AERP_v1_0 Program summary URL:http://cpc.cs.qub.ac.uk/summaries/AERP_v1_0.html Program obtainable from: CPC Program Library, Queen’s University, Belfast, N. Ireland Licensing provisions: Standard CPC licence, http://cpc.cs.qub.ac.uk/licence/licence.html No. of lines in distributed program, including test data, etc.: 620503 No. of bytes in distributed program, including test data, etc.: 19837671 Distribution format: tar.gz Programming language: FORTRAN 90/MPI. Computer: HPC cluster. Operating system: Unix. Has the code been vectorized or parallelized?: Yes, parallelized using MPI.
RAM: ˜30 MB for a Sedov test including 15625 particles on a single CPU. Classification: 12. Nature of problem: Evolution of a plasma in the ideal MHD approximation. Solution method: The equations of magnetohydrodynamics are solved using the SPH method. Running time: The test provided takes approximately 20 min using 4 processors.
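SPH codes such as GRADSPMHD interpolate all fields through a compactly supported, normalised smoothing kernel. As a sketch, here is the widely used M4 cubic-spline kernel in 3D with a numerical check of its normalisation (the standard textbook kernel; not a claim about GRADSPMHD's specific kernel choice):

```python
import numpy as np

def w_cubic_spline_3d(r, h):
    """Standard M4 cubic-spline SPH kernel in 3D (normalisation 1/pi),
    with compact support r < 2h."""
    q = np.asarray(r, float) / h
    w = np.where(q < 1.0, 1.0 - 1.5 * q**2 + 0.75 * q**3,
                 np.where(q < 2.0, 0.25 * (2.0 - q)**3, 0.0))
    return w / (np.pi * h**3)

# Numerical check: the kernel must integrate to 1 over its support.
h = 1.3
r = np.linspace(0.0, 2.0 * h, 100001)
f = w_cubic_spline_3d(r, h) * 4.0 * np.pi * r**2               # radial integrand
integral = float(np.sum(0.5 * (f[1:] + f[:-1]) * np.diff(r)))  # trapezoid rule
```

The unit normalisation is what makes SPH sums consistent density estimates; in a "GRAD-h" scheme h itself varies per particle, but the kernel shape is unchanged.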
Improved packing of protein side chains with parallel ant colonies
2014-01-01
Introduction The accurate packing of protein side chains is important for many computational biology problems, such as ab initio protein structure prediction, homology modelling, and protein design and ligand docking applications. Many existing solutions model it as a computational optimisation problem. Besides the design of search algorithms, most solutions suffer from an inaccurate energy function for judging whether a prediction is good or bad. Even if the search has found the lowest energy, there is no certainty of obtaining the protein structures with correct side chains. Methods We present a side-chain modelling method, pacoPacker, which uses a parallel ant colony optimisation strategy based on sharing a single pheromone matrix. This parallel approach combines different sources of energy functions and generates protein side-chain conformations with the lowest energies jointly determined by the various energy functions. We further optimised the selected rotamers to construct subrotamers by rotamer minimisation, which helped compensate for the discreteness of the rotamer library. Results We focused on improving the accuracy of side-chain conformation prediction. For a testing set of 442 proteins, 87.19% of χ1 and 77.11% of χ1+2 angles were predicted correctly within 40° of the X-ray positions. We compared the accuracy of pacoPacker with state-of-the-art methods, such as CIS-RR and SCWRL4. We analysed the results from different perspectives, in terms of protein chains and individual residues. In this comprehensive benchmark testing, 51.5% of proteins within a length of 400 amino acids predicted by pacoPacker were superior to the results of CIS-RR and SCWRL4 simultaneously. Finally, we also showed the advantage of using the subrotamers strategy. All results confirmed that our parallel approach is competitive with state-of-the-art solutions for packing side chains. 
Conclusions This parallel approach combines various sources of searching intelligence and energy functions to pack protein side chains. It provides a framework for combining different objective functions of varying accuracy and usefulness by designing parallel heuristic search algorithms. PMID:25474164
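The shared-pheromone-matrix scheme can be sketched on a toy packing problem: pick one "rotamer" per residue to minimise a made-up energy, with several colonies all reading and depositing into one pheromone matrix. Illustrative only; pacoPacker's energy functions, rotamer library, and actual parallelism are far richer:

```python
import random

def aco_pack(energy, n_res, n_rot, colonies=4, ants=10, iters=60, rho=0.1, seed=1):
    """Toy ant-colony search for one 'rotamer' per residue. Every colony reads
    and deposits into the same shared pheromone matrix `tau` (simulated
    serially here; in pacoPacker the colonies run in parallel)."""
    rng = random.Random(seed)
    tau = [[1.0] * n_rot for _ in range(n_res)]          # shared pheromone matrix
    best, best_e = None, float("inf")
    for _ in range(iters):
        for _colony in range(colonies):
            for _ant in range(ants):
                sol = [rng.choices(range(n_rot), weights=tau[i])[0]
                       for i in range(n_res)]
                e = energy(sol)
                if e < best_e:
                    best, best_e = sol, e
                for i, rot in enumerate(sol):            # deposit: better -> more
                    tau[i][rot] += 1.0 / (1.0 + e)
        for i in range(n_res):                           # evaporation
            tau[i] = [max(t * (1.0 - rho), 1e-3) for t in tau[i]]
    return best, best_e

# Made-up energy: per-residue preference plus an adjacent-identical "clash".
pref = [2, 0, 1, 3, 2]
def energy(sol):
    return (sum((s - p) ** 2 for s, p in zip(sol, pref))
            + sum(1.0 for a, b in zip(sol, sol[1:]) if a == b))

best, best_e = aco_pack(energy, n_res=5, n_rot=4)
```

Because all colonies share one matrix, good choices found by any colony immediately bias the sampling of every other colony, which is the point of the shared-matrix design.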
Parallelization Issues and Particle-In-Cell Codes.
NASA Astrophysics Data System (ADS)
Elster, Anne Cathrine
1994-01-01
"Everything should be made as simple as possible, but not simpler." Albert Einstein. The field of parallel scientific computing has concentrated on parallelization of individual modules such as matrix solvers and factorizers. However, many applications involve several interacting modules. Our analyses of a particle-in-cell code modeling charged particles in an electric field show that these accompanying dependencies affect data partitioning and lead to new parallelization strategies concerning processor, memory and cache utilization. Our test-bed, a KSR1, is a distributed memory machine with a globally shared addressing space. However, most of the new methods presented hold generally for hierarchical and/or distributed memory systems. We introduce a novel approach that uses dual pointers on the local particle arrays to keep the particle locations automatically partially sorted. Complexity and performance analyses with accompanying KSR benchmarks have been included for both this scheme and for the traditional replicated grids approach. The latter approach maintains load-balance with respect to particles. However, our results demonstrate it fails to scale properly for problems with large grids (say, greater than 128-by-128) running on as few as 15 KSR nodes, since the extra storage and computation time associated with adding the grid copies becomes significant. Our grid partitioning scheme, although harder to implement, does not need to replicate the whole grid. Consequently, it scales well for large problems on highly parallel systems. It may, however, require load balancing schemes for non-uniform particle distributions. Our dual pointer approach may facilitate this through dynamically partitioned grids. We also introduce hierarchical data structures that store neighboring grid-points within the same cache-line by reordering the grid indexing. This alignment produces a 25% savings in cache-hits for a 4-by-4 cache. 
A consideration of the input data's effect on the simulation may lead to further improvements. For example, in the case of mean particle drift, it is often advantageous to partition the grid primarily along the direction of the drift. The particle-in-cell codes for this study were tested using physical parameters, which lead to predictable phenomena including plasma oscillations and two-stream instabilities. An overview of the most central references related to parallel particle codes is also given.
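The payoff of keeping particle arrays partially sorted by cell is that after one timestep each particle has drifted by at most one cell, so a single insertion pass restores sorted order in time proportional to the (small) number of inversions rather than N log N. An illustrative sketch of that cost argument (not Elster's actual dual-pointer bookkeeping):

```python
import random

def insertion_pass(a):
    """Insertion sort; its cost is O(N + inversions), so it is near-linear
    when the array is already almost sorted."""
    swaps = 0
    for i in range(1, len(a)):
        j = i
        while j > 0 and a[j - 1] > a[j]:
            a[j - 1], a[j] = a[j], a[j - 1]
            j -= 1
            swaps += 1
    return swaps

rng = random.Random(42)
n = 2000
cells = sorted(rng.randrange(50) for _ in range(n))      # particles sorted by cell
moved = [c + rng.choice((-1, 0, 1)) for c in cells]      # one step: drift <= 1 cell
swaps = insertion_pass(moved)                            # cheap re-sort
```

A full sort from scratch every step would cost O(N log N); here the work stays a small constant per particle because the array never strays far from sorted order.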
NASA Astrophysics Data System (ADS)
Barker, H. W.; Stephens, G. L.; Partain, P. T.; Bergman, J. W.; Bonnel, B.; Campana, K.; Clothiaux, E. E.; Clough, S.; Cusack, S.; Delamere, J.; Edwards, J.; Evans, K. F.; Fouquart, Y.; Freidenreich, S.; Galin, V.; Hou, Y.; Kato, S.; Li, J.; Mlawer, E.; Morcrette, J.-J.; O'Hirok, W.; Räisänen, P.; Ramaswamy, V.; Ritter, B.; Rozanov, E.; Schlesinger, M.; Shibata, K.; Sporyshev, P.; Sun, Z.; Wendisch, M.; Wood, N.; Yang, F.
2003-08-01
The primary purpose of this study is to assess the performance of 1D solar radiative transfer codes that are used currently both for research and in weather and climate models. Emphasis is on interpretation and handling of unresolved clouds. Answers are sought to the following questions: (i) How well do 1D solar codes interpret and handle columns of information pertaining to partly cloudy atmospheres? (ii) Regardless of the adequacy of their assumptions about unresolved clouds, do 1D solar codes perform as intended? One clear-sky and two plane-parallel, homogeneous (PPH) overcast cloud cases serve to elucidate 1D model differences due to varying treatments of gaseous transmittances, cloud optical properties, and basic radiative transfer. The remaining four cases involve 3D distributions of cloud water and water vapor as simulated by cloud-resolving models. Results for 25 1D codes, which included two line-by-line (LBL) models (clear and overcast only) and four 3D Monte Carlo (MC) photon transport algorithms, were submitted by 22 groups. Benchmark, domain-averaged irradiance profiles were computed by the MC codes. For the clear and overcast cases, all MC estimates of top-of-atmosphere albedo, atmospheric absorptance, and surface absorptance agree with one of the LBL codes to within ±2%. Most 1D codes underestimate atmospheric absorptance by typically 15-25 W m-2 at overhead sun for the standard tropical atmosphere regardless of clouds. Depending on assumptions about unresolved clouds, the 1D codes were partitioned into four genres: (i) horizontal variability, (ii) exact overlap of PPH clouds, (iii) maximum/random overlap of PPH clouds, and (iv) random overlap of PPH clouds. A single MC code was used to establish conditional benchmarks applicable to each genre, and all MC codes were used to establish the full 3D benchmarks. 
There is a tendency for 1D codes to cluster near their respective conditional benchmarks, though intragenre variances typically exceed those for the clear and overcast cases. The majority of 1D codes fall into the extreme category of maximum/random overlap of PPH clouds and thus generally disagree with full 3D benchmark values. The fairly limited scope of these tests, and the inability of any one code to perform extremely well for all cases, suggest that a paradigm shift is due in the modeling of 1D solar fluxes for cloudy atmospheres.
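The overlap genres differ only in how the individual layer cloud fractions are combined into a total cover. A sketch of the three PPH combinations, using the standard maximum/random formula often attributed to Geleyn and Hollingsworth (layer values are illustrative, not the study's cases):

```python
import math

def total_cloud_cover(c, overlap="max-random"):
    """Total cloud fraction seen from above for a column of layer cloud
    fractions `c`, under three PPH overlap assumptions."""
    if overlap == "max":
        return max(c, default=0.0)
    if overlap == "random":
        return 1.0 - math.prod(1.0 - ck for ck in c)
    # Maximum/random overlap: adjacent cloudy layers overlap maximally,
    # layers separated by clear air overlap randomly.
    clear, prev = 1.0, 0.0
    for ck in c:
        clear *= (1.0 - max(ck, prev)) / (1.0 - prev)   # assumes prev < 1
        prev = ck
    return 1.0 - clear

layers = [0.2, 0.5, 0.0, 0.3]
cover = {m: total_cloud_cover(layers, m) for m in ("max", "max-random", "random")}
```

For any column, maximum overlap gives the smallest total cover, random the largest, and maximum/random falls in between, which is why the choice of genre systematically shifts 1D fluxes.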
Looking Backward: Parting Reflections on Higher Education Reform from NAS's Founding President
ERIC Educational Resources Information Center
Balch, Stephen H.
2012-01-01
Twenty-five years at the helm of the National Association of Scholars (NAS) have left the author with vivid memories: of knocks and bruises, peaks of exhilaration and, especially, unforgettable characters. But as for lessons learned, that's a very different story. In this article, the author shares some of the successes that happened in NAS for…
High Latitude Electromagnetic Plasma Wave Emissions.
1982-05-01
Supported by NASA through contracts NAS5-26819 and NAS5-25690 with Goddard Space Flight Center, and grants NGL-16-001-002 and NGL-16-001-043 from NASA Headquarters.
UAS in the NAS: Survey Responses by ATC, Manned Aircraft Pilots, and UAS Pilots
NASA Technical Reports Server (NTRS)
Comstock, James R., Jr.; McAdaragh, Raymon; Ghatas, Rania W.; Burdette, Daniel W.; Trujillo, Anna C.
2014-01-01
NASA currently is working with industry and the Federal Aviation Administration (FAA) to establish future requirements for Unmanned Aircraft Systems (UAS) flying in the National Airspace System (NAS). To work these issues NASA has established a multi-center "UAS Integration in the NAS" project. In order to establish Ground Control Station requirements for UAS, the perspective of each of the major players in NAS operations was desired. Three on-line surveys were administered that focused on Air Traffic Controllers (ATC), pilots of manned aircraft, and pilots of UAS. Follow-up telephone interviews were conducted with some survey respondents. The survey questions addressed UAS control, navigation, and communications from the perspective of small and large unmanned aircraft. Questions also addressed issues of UAS equipage, especially with regard to sense and avoid capabilities. From the civilian ATC and military ATC perspectives, of particular interest are how mixed operations (manned / UAS) have worked in the past and the role of aircraft equipage. Knowledge gained from this information is expected to assist the NASA UAS Integration in the NAS project in directing research foci thus assisting the FAA in the development of rules, regulations, and policies related to UAS in the NAS.
NAS Experiences of Porting CM Fortran Codes to HPF on IBM SP2 and SGI Power Challenge
NASA Technical Reports Server (NTRS)
Saini, Subhash
1995-01-01
Current Connection Machine (CM) Fortran codes developed for the CM-2 and the CM-5 represent an important class of parallel applications. Several users have employed CM Fortran codes in production mode on the CM-2 and the CM-5 for the last five to six years, constituting a heavy investment in terms of cost and time. With Thinking Machines Corporation's decision to withdraw from the hardware business and with the decommissioning of many CM-2 and CM-5 machines, the best way to protect the substantial investment in CM Fortran codes is to port the codes to High Performance Fortran (HPF) on highly parallel systems. HPF is very similar to CM Fortran and thus represents a natural transition. Conversion issues involved in porting CM Fortran codes on the CM-5 to HPF are presented. In particular, the differences between data distribution directives and the CM Fortran Utility Routines Library, as well as the equivalent functionality in the HPF Library, are discussed. Several CM Fortran codes (Cannon algorithm for matrix-matrix multiplication, Linear solver Ax=b, 1-D convolution for 2-D datasets, Laplace's Equation solver, and Direct Simulation Monte Carlo (DSMC)) have been ported to Subset HPF on the IBM SP2 and the SGI Power Challenge. Speedup ratios versus number of processors for the Linear solver and DSMC code are presented.
Naushad, Sohail; Barkema, Herman W.; Luby, Christopher; Condas, Larissa A. Z.; Nobrega, Diego B.; Carson, Domonique A.; De Buck, Jeroen
2016-01-01
Non-aureus staphylococci (NAS), a heterogeneous group of a large number of species and subspecies, are the most frequently isolated pathogens from intramammary infections in dairy cattle. Phylogenetic relationships among bovine NAS species are controversial and have mostly been determined based on single-gene trees. Herein, we analyzed phylogeny of bovine NAS species using whole-genome sequencing (WGS) of 441 distinct isolates. In addition, evolutionary relationships among bovine NAS were estimated from multilocus data of 16S rRNA, hsp60, rpoB, sodA, and tuf genes and sequences from these and numerous other single genes/proteins. All phylogenies were created with FastTree, Maximum-Likelihood, Maximum-Parsimony, and Neighbor-Joining methods. Regardless of methodology, WGS-trees clearly separated bovine NAS species into five monophyletic coherent clades. Furthermore, there were consistent interspecies relationships within clades in all WGS phylogenetic reconstructions. Except for the Maximum-Parsimony tree, multilocus data analysis similarly produced five clades. There were large variations in determining clades and interspecies relationships in single gene/protein trees, under different methods of tree constructions, highlighting limitations of using single genes for determining bovine NAS phylogeny. However, based on WGS data, we established a robust phylogeny of bovine NAS species, unaffected by method or model of evolutionary reconstructions. Therefore, it is now possible to determine associations between phylogeny and many biological traits, such as virulence, antimicrobial resistance, environmental niche, geographical distribution, and host specificity. PMID:28066335
Gu, Yunlan; Wang, Yanqing; Zhang, Hongmei
2018-05-05
Being exogenous environmental pollutants, nitroanilines (NAs) are highly toxic and have mutagenic and carcinogenic activity. Given the lack of molecular-level studies on interactions between NAs and lysozyme, the binding interactions of lysozyme with o-nitroaniline (oNA), m-nitroaniline (mNA) and p-nitroaniline (pNA) were investigated by means of steady-state fluorescence, synchronous fluorescence, UV-vis absorption spectroscopy, as well as molecular modeling. The experimental results revealed that the fluorescence of lysozyme is quenched by oNA and mNA through a static quenching, while the fluorescence quenching triggered by pNA is a combined dynamic and static quenching. The number of binding sites (n), the binding constant (Kb), and the corresponding thermodynamic parameters ΔH⊖, ΔS⊖, and ΔG⊖ at different temperatures were calculated. The reactions between NAs and lysozyme were spontaneous and entropy driven, and the binding of NAs to lysozyme induced conformational changes of lysozyme. The position of the -NO2 group affected the binding, and the binding constants decreased in the following pattern: Kb(pNA) > Kb(mNA) > Kb(oNA). Molecular docking studies were performed to reveal the most favorable binding sites of NAs on lysozyme. Our recent results could offer mechanistic insights into the nature of the binding interactions between NAs and lysozyme and provide information about the toxicity risk of NAs to human health. Copyright © 2018 Elsevier B.V. All rights reserved.
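Binding constants and thermodynamic parameters of this kind are conventionally extracted from two linear fits: a double-log (modified Stern-Volmer) plot giving Kb and n, and a van't Hoff plot giving ΔH⊖ and ΔS⊖. A sketch on synthetic single-site data (all numbers are made up for illustration, not the paper's measurements):

```python
import numpy as np

R = 8.314  # gas constant, J mol^-1 K^-1

def binding_params(Q, F0, F):
    """Kb and n from the double-log plot: log((F0-F)/F) = log Kb + n log[Q]."""
    n, log_kb = np.polyfit(np.log10(Q), np.log10((F0 - F) / F), 1)
    return 10 ** log_kb, n

def vant_hoff(T, Kb):
    """dH and dS from the linear form ln Kb = -dH/(R T) + dS/R."""
    slope, intercept = np.polyfit(1.0 / T, np.log(Kb), 1)
    return -R * slope, R * intercept

# Synthetic 1:1 quenching data with Kb = 2.0e4 M^-1, so (F0-F)/F = Kb*[Q].
Q = np.array([2e-5, 5e-5, 1e-4, 2e-4, 5e-4])   # quencher concentration, M
F0 = 100.0
F = F0 / (1.0 + 2.0e4 * Q)
Kb, n = binding_params(Q, F0, F)

# Entropy-driven case: dH = +20 kJ/mol, dS = +150 J/(mol K), hence dG < 0.
T = np.array([298.0, 304.0, 310.0])
Kb_T = np.exp(-2.0e4 / (R * T) + 150.0 / R)
dH, dS = vant_hoff(T, Kb_T)
dG = dH - 298.0 * dS
```

On this synthetic data the fits recover the built-in parameters exactly; note that a positive ΔH with a larger positive TΔS is precisely the "spontaneous and entropy driven" signature reported above.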
Cohen, S M; Ellwein, L B; Okamura, T; Masui, T; Johansson, S L; Smith, R A; Wehner, J M; Khachab, M; Chappel, C I; Schoenig, G P
1991-04-01
Sodium saccharin and sodium ascorbate are known to promote urinary bladder carcinogenesis in rats following initiation with N-[4-(5-nitro-2-furyl)-2-thiazolyl]formamide (FANFT) or N-butyl-N-(4-hydroxybutyl) nitrosamine. Sodium salts of other organic acids have also been shown to be bladder tumor promoters. In addition, these substances increase urothelial proliferation in short term assays in rats when fed at high doses. When they have been tested, the acid forms of these salts are without either promoting or cell proliferative inducing activity. The following experiment was designed to compare the tumor promoting activity of various forms of saccharin and to evaluate the role in promotion of urinary sodium, calcium, and pH as well as other factors. Twenty groups of 40 male F344 rats, 5 weeks of age, were fed either FANFT or control diet during a 6-week initiation phase followed by feeding of a test compound for 72 weeks in the second phase. The chemicals were administered to the first 18 groups in Agway Prolab 3200 diet and the last 2 groups were fed NIH-07 diet. The treatments were as follows: (a) FANFT----5% sodium saccharin (NaS); (b) FANFT----3% NaS; (c) FANFT----5.2% calcium saccharin (CaS); (d) FANFT----3.12% CaS; (e) FANFT----4.21% acid saccharin (S); (f) FANFT----2.53% S; (g) FANFT----5% sodium ascorbate; (h) FANFT----4.44% ascorbic acid; (i) FANFT----5% NaS plus 1.15% CaCO3; (j) FANFT----5.2% CaS plus 1.34% NaCl; (k) FANFT----5% NaS plus 1.23% NH4Cl; (l) FANFT----1.15% CaCO3; (m) FANFT----1.34% NaCl; (n) FANFT----control; (o) control----5% NaS; (p) control----5.2% CaS; (q) control----4.21% S; (r) Control----control; (s) FANFT----5% NaS (NIH-07 diet); (t) FANFT----control (NIH-07 diet). NaS, CaS and S without prior FANFT administration were without tumorigenic activity. NaS was found to have tumor promoting activity, showing a positive response at the 5 and 3% dose levels, with significantly greater activity at the higher dose. 
CaS had slight tumor promoting activity but without a dose response, and S showed no tumor promoting activity. In addition, NaCl showed weak tumor promoting activity, but CaCO3 was without activity. NH4Cl completely inhibited the tumor promoting activity of NaS when concurrently administered with it. NaCl administered with CaS or CaCO3 administered with NaS showed activity similar to that of NaS. Sodium ascorbate was also shown to have tumor promoting activity, with slightly less activity than NaS. Ascorbic acid showed no tumor promoting activity.(ABSTRACT TRUNCATED AT 400 WORDS)
NASA Astrophysics Data System (ADS)
Laloy, Eric; Linde, Niklas; Jacques, Diederik; Mariethoz, Grégoire
2016-04-01
The sequential geostatistical resampling (SGR) algorithm is a Markov chain Monte Carlo (MCMC) scheme for sampling from possibly non-Gaussian, complex spatially-distributed prior models such as geologic facies or categorical fields. In this work, we highlight the limits of standard SGR for posterior inference of high-dimensional categorical fields with realistically complex likelihood landscapes and benchmark a parallel tempering implementation (PT-SGR). Our proposed PT-SGR approach is demonstrated using synthetic (error-corrupted) data from steady-state flow and transport experiments in categorical 7575- and 10,000-dimensional 2D conductivity fields. In both case studies, every SGR trial gets trapped in local optima while PT-SGR maintains a higher diversity in the sampled model states. The advantage of PT-SGR is most apparent in an inverse transport problem where the posterior distribution is made bimodal by construction. PT-SGR then converges towards the appropriate data misfit much faster than SGR and partly recovers the two modes. In contrast, for the same computational resources SGR does not fit the data to the appropriate error level and hardly produces a locally optimal solution that looks visually similar to one of the two reference modes. Although PT-SGR clearly surpasses SGR in performance, our results also indicate that using a small number (16-24) of temperatures (and thus parallel cores) may not permit complete sampling of the posterior distribution by PT-SGR within a reasonable computational time (less than 1-2 weeks).
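Parallel tempering runs replicas of the chain at several temperatures and swaps states between neighbouring temperatures, so that hot, freely mixing chains ferry the cold chain across likelihood barriers. A minimal serial sketch on a bimodal 1D target (a stand-in for the high-dimensional categorical posterior; the SGR proposal itself is not reproduced here):

```python
import math, random

def parallel_tempering(logp, temps, steps, step=1.0, seed=7):
    """Minimal PT sketch: one Metropolis chain per temperature plus neighbour
    swaps. Serial stand-in for what PT-SGR distributes over parallel cores."""
    rng = random.Random(seed)
    x = [0.0] * len(temps)
    cold = []
    for _ in range(steps):
        for i, T in enumerate(temps):                    # within-chain moves
            prop = x[i] + rng.gauss(0.0, step * math.sqrt(T))
            if rng.random() < math.exp(min(0.0, (logp(prop) - logp(x[i])) / T)):
                x[i] = prop
        for i in range(len(temps) - 1):                  # neighbour swaps
            a = (1.0 / temps[i] - 1.0 / temps[i + 1]) * (logp(x[i + 1]) - logp(x[i]))
            if rng.random() < math.exp(min(0.0, a)):
                x[i], x[i + 1] = x[i + 1], x[i]
        cold.append(x[0])                                # target-temperature chain
    return cold

def logp(x):  # bimodal target with well-separated modes at -3 and +3
    a, b = -0.5 * (x - 3.0) ** 2, -0.5 * (x + 3.0) ** 2
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))

samples = parallel_tempering(logp, temps=[1.0, 4.0, 16.0], steps=5000)
```

A single chain at temperature 1 essentially never crosses between these modes, whereas the tempered ladder visits both, which is the behaviour the abstract reports for PT-SGR on its constructed bimodal posterior.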
Exploiting Parallel R in the Cloud with SPRINT
Piotrowski, M.; McGilvary, G.A.; Sloan, T. M.; Mewissen, M.; Lloyd, A.D.; Forster, T.; Mitchell, L.; Ghazal, P.; Hill, J.
2012-01-01
Background Advances in DNA Microarray devices and next-generation massively parallel DNA sequencing platforms have led to an exponential growth in data availability but the arising opportunities require adequate computing resources. High Performance Computing (HPC) in the Cloud offers an affordable way of meeting this need. Objectives Bioconductor, a popular tool for high-throughput genomic data analysis, is distributed as add-on modules for the R statistical programming language but R has no native capabilities for exploiting multi-processor architectures. SPRINT is an R package that enables easy access to HPC for genomics researchers. This paper investigates: setting up and running SPRINT-enabled genomic analyses on Amazon’s Elastic Compute Cloud (EC2), the advantages of submitting applications to EC2 from different parts of the world and, if resource underutilization can improve application performance. Methods The SPRINT parallel implementations of correlation, permutation testing, partitioning around medoids and the multi-purpose papply have been benchmarked on data sets of various size on Amazon EC2. Jobs have been submitted from both the UK and Thailand to investigate monetary differences. Results It is possible to obtain good, scalable performance but the level of improvement is dependent upon the nature of algorithm. Resource underutilization can further improve the time to result. End-user’s location impacts on costs due to factors such as local taxation. Conclusions: Although not designed to satisfy HPC requirements, Amazon EC2 and cloud computing in general provides an interesting alternative and provides new possibilities for smaller organisations with limited funds. PMID:23223611
Massively Parallel Processing for Fast and Accurate Stamping Simulations
NASA Astrophysics Data System (ADS)
Gress, Jeffrey J.; Xu, Siguang; Joshi, Ramesh; Wang, Chuan-tao; Paul, Sabu
2005-08-01
The competitive automotive market drives automotive manufacturers to speed up vehicle development cycles and reduce lead-time. Fast tooling development is one of the key areas supporting fast and short vehicle development programs (VDP). In the past ten years, stamping simulation has become the most effective validation tool in predicting and resolving all potential formability and quality problems before the dies are physically made. Stamping simulation and formability analysis has become a critical business segment in the GM math-based die engineering process. As simulation becomes one of the major production tools in the engineering factory, simulation speed and accuracy are two of the most important measures for stamping simulation technology. The speed and time-in-system of forming analysis become even more critical to supporting fast VDP and tooling readiness. Since 1997, General Motors Die Center has been working jointly with our software vendor to develop and implement a parallel version of simulation software for mass production analysis applications. By 2001, this technology had matured in the form of distributed memory processing (DMP) of draw die simulations in a networked distributed memory computing environment. In 2004, this technology was refined to massively parallel processing (MPP) and extended to line die forming analysis (draw, trim, flange, and associated spring-back) running on a dedicated computing environment. The evolution of this technology and the insight gained through the implementation of DMP/MPP technology as well as performance benchmarks are discussed in this publication.
Efficiently modeling neural networks on massively parallel computers
NASA Technical Reports Server (NTRS)
Farber, Robert M.
1993-01-01
Neural networks are a very useful tool for analyzing and modeling complex real-world systems. Applying neural network simulations to real-world problems generally involves large amounts of data and massive amounts of computation. To efficiently handle the computational requirements of large problems, we have implemented at Los Alamos a highly efficient neural network compiler for serial computers, vector computers, vector parallel computers, and fine-grain SIMD computers such as the CM-2 connection machine. This paper describes the mapping used by the compiler to implement feed-forward backpropagation neural networks for a SIMD (Single Instruction Multiple Data) architecture parallel computer. Thinking Machines Corporation has benchmarked our code at 1.3 billion interconnects per second (approximately 3 gigaflops) on a 64,000-processor CM-2 connection machine (Singer 1990). This mapping is applicable to other SIMD computers and can be implemented on MIMD computers such as the CM-5 connection machine. Our mapping has virtually no communications overhead, with the exception of the communications required for a global summation across the processors (which has a sub-linear runtime growth on the order of O(log(number of processors))). We can efficiently model very large neural networks which have many neurons and interconnects, and our mapping can extend to arbitrarily large networks (within memory limitations) by merging the memory space of separate processors with fast adjacent-processor interprocessor communications. This paper considers the simulation of only feed-forward neural networks, although this method is extendable to recurrent networks.
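The only communication step in the mapping above is a global summation whose cost grows as O(log P) in the number of processors. A minimal sketch of that tree-structured reduction, with plain Python lists standing in for the CM-2's processor grid (the function name and structure are illustrative, not from the paper):

```python
def tree_sum(partial_sums):
    """Combine per-processor partial sums pairwise; with P values the
    result is reached in ceil(log2(P)) rounds, hence O(log P) growth."""
    values = list(partial_sums)
    rounds = 0
    while len(values) > 1:
        # each round, value i absorbs its neighbour i+1 (if present)
        values = [values[i] + values[i + 1] if i + 1 < len(values) else values[i]
                  for i in range(0, len(values), 2)]
        rounds += 1
    return values[0], rounds

total, rounds = tree_sum(range(64))  # 64 "processors" holding 0..63
# total == 2016, and rounds == 6 == log2(64)
```

Doubling the processor count adds only one more combining round, which is why the summation stays cheap even at CM-2 scale.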
Exploiting parallel R in the cloud with SPRINT.
Piotrowski, M; McGilvary, G A; Sloan, T M; Mewissen, M; Lloyd, A D; Forster, T; Mitchell, L; Ghazal, P; Hill, J
2013-01-01
Advances in DNA Microarray devices and next-generation massively parallel DNA sequencing platforms have led to an exponential growth in data availability, but the arising opportunities require adequate computing resources. High Performance Computing (HPC) in the Cloud offers an affordable way of meeting this need. Bioconductor, a popular tool for high-throughput genomic data analysis, is distributed as add-on modules for the R statistical programming language, but R has no native capabilities for exploiting multi-processor architectures. SPRINT is an R package that enables easy access to HPC for genomics researchers. This paper investigates: setting up and running SPRINT-enabled genomic analyses on Amazon's Elastic Compute Cloud (EC2), the advantages of submitting applications to EC2 from different parts of the world, and whether resource underutilization can improve application performance. The SPRINT parallel implementations of correlation, permutation testing, partitioning around medoids and the multi-purpose papply have been benchmarked on data sets of various sizes on Amazon EC2. Jobs have been submitted from both the UK and Thailand to investigate monetary differences. It is possible to obtain good, scalable performance, but the level of improvement is dependent upon the nature of the algorithm. Resource underutilization can further improve the time to result. The end-user's location impacts costs due to factors such as local taxation. Although not designed to satisfy HPC requirements, Amazon EC2 and cloud computing in general provide an interesting alternative and open new possibilities for smaller organisations with limited funds.
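The papply pattern mentioned above is an embarrassingly parallel apply: split the rows of a data matrix across workers, apply a function to each, and gather results in order. A minimal Python sketch of the idea, with the standard library's thread pool standing in for SPRINT's MPI backend (the function name `papply` is borrowed from SPRINT, but this is an illustration, not its implementation):

```python
from concurrent.futures import ThreadPoolExecutor

def papply(rows, fn, workers=4):
    """Apply fn to each row in parallel, preserving row order
    like R's apply over a matrix."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(fn, rows))

# e.g. per-gene means of a small expression-like matrix
matrix = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
means = papply(matrix, lambda r: sum(r) / len(r))
# means == [2.0, 5.0]
```

Because each row is independent, such workloads scale almost linearly with worker count, which is consistent with the paper's observation that the achievable improvement depends on the nature of the algorithm.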
Performance Improvements of the CYCOFOS Flow Model
NASA Astrophysics Data System (ADS)
Radhakrishnan, Hari; Moulitsas, Irene; Syrakos, Alexandros; Zodiatis, George; Nikolaides, Andreas; Hayes, Daniel; Georgiou, Georgios C.
2013-04-01
The CYCOFOS-Cyprus Coastal Ocean Forecasting and Observing System has been operational since early 2002, providing daily sea current, temperature, salinity and sea level forecasting data for the next 4 and 10 days to end-users in the Levantine Basin, necessary for operational applications in marine safety, particularly concerning oil spill and floating object predictions. The CYCOFOS flow model, similar to most of the coastal and sub-regional operational hydrodynamic forecasting systems of the MONGOOS-Mediterranean Oceanographic Network for Global Ocean Observing System, is based on the POM-Princeton Ocean Model. CYCOFOS is nested with the MyOcean Mediterranean regional forecasting data and with SKIRON and ECMWF for surface forcing. The increasing demand for ever higher resolution data to meet coastal and offshore downstream applications motivated the parallelization of the CYCOFOS POM model. This development was carried out in the frame of the IPcycofos project, funded by the Cyprus Research Promotion Foundation. Parallel processing provides a viable solution to satisfy these demands without sacrificing accuracy or omitting any physical phenomena. Prior to the IPcycofos project, there had been several attempts to parallelise the POM, for example the MP-POM. These existing parallel codes rely on specific outdated hardware architectures and associated software. The objective of the IPcycofos project is to produce an operational parallel version of the CYCOFOS POM code that can replicate the results of the serial version of the POM code used in CYCOFOS. The parallelization of the CYCOFOS POM model uses the Message Passing Interface-MPI, implemented on commodity computing clusters running open source software and not depending on any specialized vendor hardware. The parallel CYCOFOS POM code is constructed in a modular fashion, allowing a fast re-locatable downscaled implementation.
The MPI implementation takes advantage of the Cartesian nature of the POM mesh and uses the built-in functionality of MPI routines to split the mesh, using a weighting scheme, along longitude and latitude among the processors. Each processor works on its portion of the model based on domain decomposition techniques. The new parallel CYCOFOS POM code has been benchmarked against the serial POM version of CYCOFOS for speed, accuracy, and resolution, and the results are more than satisfactory. Even with the higher resolution CYCOFOS Levantine model domain, the forecasts require much less time than the coarser serial CYCOFOS POM version, with identical accuracy.
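The Cartesian split described above can be sketched in a few lines: choose a near-square 2D process grid (as MPI's `MPI_Dims_create` and `MPI_Cart_create` would) and assign each rank a rectangular longitude/latitude block. This is an illustrative sketch of the decomposition arithmetic only; the function names, the even block weighting, and the 360x180 grid are assumptions, not details from the paper:

```python
import math

def dims_create(nprocs):
    """Pick a near-square 2D process grid, in the spirit of
    MPI_Dims_create(nprocs, 2, dims)."""
    px = math.isqrt(nprocs)
    while nprocs % px:
        px -= 1
    return px, nprocs // px

def block_bounds(n, parts, p):
    """Half-open [lo, hi) range of grid cells owned by part p when n
    cells are split into `parts` near-equal blocks."""
    base, extra = divmod(n, parts)
    lo = p * base + min(p, extra)
    return lo, lo + base + (1 if p < extra else 0)

# a 360 x 180 (lon x lat) mesh on 6 ranks -> a 2 x 3 process grid
px, py = dims_create(6)
for rank in range(6):
    i, j = rank % px, rank // px
    lon_lo, lon_hi = block_bounds(360, px, i)
    lat_lo, lat_hi = block_bounds(180, py, j)
    # each rank then time-steps POM on its own (lon, lat) block,
    # exchanging halo rows/columns with Cartesian neighbours
```

A weighting scheme like the one the abstract mentions would replace the even `divmod` split with per-column work estimates, so that ranks covering land-heavy regions receive larger blocks.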