Sample records for parallel benchmarks npb

  1. New NAS Parallel Benchmarks Results

    NASA Technical Reports Server (NTRS)

    Yarrow, Maurice; Saphir, William; VanderWijngaart, Rob; Woo, Alex; Kutler, Paul (Technical Monitor)

    1997-01-01

    NPB2 (NAS (NASA Advanced Supercomputing) Parallel Benchmarks 2) is an implementation, based on Fortran and the MPI (Message Passing Interface) standard, of the original NAS Parallel Benchmark specifications. NPB2 programs are run with little or no tuning, in contrast to NPB vendor implementations, which are highly optimized for specific architectures. NPB2 results complement, rather than replace, NPB results. Because they have not been optimized by vendors, NPB2 implementations approximate the performance a typical user can expect for a portable parallel program on distributed memory parallel computers. Together these results provide an insightful comparison of the real-world performance of high-performance computers. New NPB2 features: new implementation (CG), new workstation-class problem sizes, new serial sample versions, and more performance statistics.
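
    Since NPB2 codes are plain Fortran plus MPI with a common timing harness, the flavor is easy to sketch. Below is a minimal, hypothetical C/MPI skeleton in that style (run_kernel and the reported quantities are illustrative placeholders of mine, not NPB code): initialize MPI, synchronize, time an untuned kernel, and report the slowest rank's time.

      /* Hypothetical sketch of an NPB2-style measurement harness. */
      #include <mpi.h>
      #include <stdio.h>

      static double run_kernel(void) {          /* stand-in for a benchmark kernel */
          double s = 0.0;
          for (long i = 1; i <= 10000000L; i++) s += 1.0 / ((double)i * (double)i);
          return s;
      }

      int main(int argc, char **argv) {
          MPI_Init(&argc, &argv);
          int rank, nprocs;
          MPI_Comm_rank(MPI_COMM_WORLD, &rank);
          MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

          MPI_Barrier(MPI_COMM_WORLD);          /* start all ranks together */
          double t0 = MPI_Wtime();
          double checksum = run_kernel();
          double t = MPI_Wtime() - t0;

          double tmax;                          /* benchmark time = slowest rank */
          MPI_Reduce(&t, &tmax, 1, MPI_DOUBLE, MPI_MAX, 0, MPI_COMM_WORLD);
          if (rank == 0)
              printf("ranks=%d time=%.3f s checksum=%.6f\n", nprocs, tmax, checksum);
          MPI_Finalize();
          return 0;
      }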

  2. NAS Grid Benchmarks: A Tool for Grid Space Exploration

    NASA Technical Reports Server (NTRS)

    Frumkin, Michael; VanderWijngaart, Rob F.; Biegel, Bryan (Technical Monitor)

    2001-01-01

    We present an approach for benchmarking services provided by computational Grids. It is based on the NAS Parallel Benchmarks (NPB) and is called the NAS Grid Benchmark (NGB) in this paper. We present NGB as a data flow graph encapsulating an instance of an NPB code in each graph node, which communicates with other nodes by sending/receiving initialization data. These nodes may be mapped to the same or different Grid machines. Like NPB, NGB will specify several different classes (problem sizes). NGB also specifies the generic Grid services sufficient for running the benchmark. The implementor has the freedom to choose any specific Grid environment. However, we describe a reference implementation in Java, and present some scenarios for using NGB.

  3. NAS Grid Benchmarks. 1.0

    NASA Technical Reports Server (NTRS)

    VanderWijngaart, Rob; Frumkin, Michael; Biegel, Bryan A. (Technical Monitor)

    2002-01-01

    We provide a paper-and-pencil specification of a benchmark suite for computational grids. It is based on the NAS (NASA Advanced Supercomputing) Parallel Benchmarks (NPB) and is called the NAS Grid Benchmarks (NGB). NGB problems are presented as data flow graphs encapsulating an instance of a slightly modified NPB task in each graph node, which communicates with other nodes by sending/receiving initialization data. Like NPB, NGB specifies several different classes (problem sizes). In this report we describe classes S, W, and A, and provide verification values for each. The implementor has the freedom to choose any language, grid environment, security model, fault tolerance/error correction mechanism, etc., as long as the resulting implementation passes the verification test and reports the turnaround time of the benchmark.

  4. NAS Parallel Benchmarks. 2.4

    NASA Technical Reports Server (NTRS)

    VanderWijngaart, Rob; Biegel, Bryan A. (Technical Monitor)

    2002-01-01

    We describe a new problem size, called Class D, for the NAS Parallel Benchmarks (NPB), whose MPI source code implementation is being released as NPB 2.4. A brief rationale is given for how the new class is derived. We also describe the modifications made to the MPI (Message Passing Interface) implementation to allow the new class to be run on systems with 32-bit integers, and with moderate amounts of memory. Finally, we give the verification values for the new problem size.
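
    The 32-bit integer issue mentioned above is easy to illustrate: a Class-D-scale grid has more points than a signed 32-bit integer can count, so global index arithmetic must be done in 64 bits even when each rank's slice still fits 32-bit indexing. A small hedged C sketch (the grid dimensions are invented, not the actual NPB Class D sizes):

      /* Illustrative only: 64-bit global sizes, 32-bit-safe per-rank slices. */
      #include <stdint.h>
      #include <stdio.h>

      int main(void) {
          int64_t nx = 2048, ny = 1024, nz = 1024;   /* hypothetical grid */
          int64_t total = nx * ny * nz;              /* 2^31: overflows int32 */
          int     nranks = 1024;
          int64_t per_rank = total / nranks;

          printf("global points   = %lld (needs 64-bit)\n", (long long)total);
          printf("points per rank = %lld (32-bit safe: %s)\n",
                 (long long)per_rank, per_rank <= INT32_MAX ? "yes" : "no");
          return 0;
      }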

  5. Statistical Analysis of NAS Parallel Benchmarks and LINPACK Results

    NASA Technical Reports Server (NTRS)

    Meuer, Hans-Werner; Simon, Horst D.; Strohmeier, Erich; Lasinski, T. A. (Technical Monitor)

    1994-01-01

    In the last three years extensive performance data have been reported for parallel machines, based both on the NAS Parallel Benchmarks and on LINPACK. In this study we have used the reported benchmark results and performed a number of statistical experiments using factor, cluster, and regression analyses. In addition to the performance results of LINPACK and the eight NAS Parallel Benchmarks, we have also included the peak performance of each machine and the LINPACK n and n_1/2 values. Some of the results and observations can be summarized as follows: 1) All benchmarks are strongly correlated with peak performance. 2) LINPACK and EP each have a unique signature. 3) The remaining NPB can be grouped into three groups as follows: (CG and IS), (LU and SP), and (MG, FT, and BT). Hence three (or four with EP) benchmarks are sufficient to characterize overall NPB performance. Our poster presentation will follow a standard poster format, and will present the data of our statistical analysis in detail.
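
    The basic ingredient of such a correlation/cluster analysis is the pairwise Pearson correlation between benchmark result vectors. A self-contained C sketch with invented data (the machines and Mflop/s figures are hypothetical, not from the study):

      /* Pearson correlation between two benchmark result vectors. */
      #include <math.h>
      #include <stdio.h>

      double pearson(const double *x, const double *y, int n) {
          double sx = 0, sy = 0, sxx = 0, syy = 0, sxy = 0;
          for (int i = 0; i < n; i++) {
              sx += x[i]; sy += y[i];
              sxx += x[i] * x[i]; syy += y[i] * y[i]; sxy += x[i] * y[i];
          }
          double cov = sxy - sx * sy / n;   /* n * covariance */
          return cov / sqrt((sxx - sx * sx / n) * (syy - sy * sy / n));
      }

      int main(void) {
          /* hypothetical Mflop/s on five machines: LINPACK vs. one NPB kernel */
          double linpack[] = {120, 340, 560, 910, 1400};
          double npb_cg[]  = { 80, 200, 390, 600, 1000};
          printf("r = %.3f\n", pearson(linpack, npb_cg, 5));
          return 0;
      }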

  6. Performance Comparison of HPF and MPI Based NAS Parallel Benchmarks

    NASA Technical Reports Server (NTRS)

    Saini, Subhash

    1997-01-01

    Compilers supporting High Performance Fortran (HPF) features first appeared in late 1994 and early 1995 from Applied Parallel Research (APR), Digital Equipment Corporation, and The Portland Group (PGI). IBM introduced an HPF compiler for the IBM RS/6000 SP2 in April of 1996. Over the past two years, these implementations have shown steady improvement in terms of both features and performance. The performance of various hardware/programming model (HPF and MPI) combinations will be compared, based on the latest NAS Parallel Benchmark results, thus providing a cross-machine and cross-model comparison. Specifically, HPF-based NPB results will be compared with MPI-based NPB results to provide perspective on the performance currently obtainable using HPF versus MPI or versus hand-tuned implementations such as those supplied by the hardware vendors. In addition, we also present NPB (Version 1.0) performance results for the following systems: DEC Alpha Server 8400 5/440, Fujitsu VPP Series (VX, VPP300, and VPP700), HP/Convex Exemplar SPP2000, IBM RS/6000 SP P2SC node (120 MHz), NEC SX-4/32, SGI/CRAY T3E, and SGI Origin2000. We also present sustained performance per dollar for the Class B LU, SP, and BT benchmarks.

  7. Comparison of Origin 2000 and Origin 3000 Using NAS Parallel Benchmarks

    NASA Technical Reports Server (NTRS)

    Turney, Raymond D.

    2001-01-01

    This report describes results of benchmark tests on the Origin 3000 system currently being installed at the NASA Ames National Advanced Supercomputing facility. This machine will ultimately contain 1024 R14K processors. The first part of the system, installed in November 2000 and named mendel, is an Origin 3000 with 128 R12K processors. For comparison purposes, the tests were also run on lomax, an Origin 2000 with R12K processors. The BT, LU, and SP application benchmarks in the NAS Parallel Benchmark suite and the kernel benchmark FT were chosen to determine system performance and measure the impact of changes on the machine as it evolves. Having been written to measure performance of Computational Fluid Dynamics applications, these benchmarks are assumed appropriate to represent the NAS workload. Since NAS runs both message passing (MPI) codes and shared-memory, compiler-directive codes, both MPI and OpenMP versions of the benchmarks were used. The MPI versions used were the latest official release of the NAS Parallel Benchmarks, version 2.3. The OpenMP versions used were PBN3b2, a beta version that is in the process of being released. NPB 2.3 and PBN3b2 are technically different benchmarks, and NPB results are not directly comparable to PBN results.

  8. NAS Parallel Benchmark Results 11-96. 1.0

    NASA Technical Reports Server (NTRS)

    Bailey, David H.; Bailey, David; Chancellor, Marisa K. (Technical Monitor)

    1997-01-01

    The NAS Parallel Benchmarks have been developed at NASA Ames Research Center to study the performance of parallel supercomputers. The eight benchmark problems are specified in a "pencil and paper" fashion. In other words, the complete details of the problem to be solved are given in a technical document, and except for a few restrictions, benchmarkers are free to select the language constructs and implementation techniques best suited for a particular system. These results represent the best results that have been reported to us by the vendors for the specific systems listed. In this report, we present new NPB (Version 1.0) performance results for the following systems: DEC Alpha Server 8400 5/440, Fujitsu VPP Series (VX, VPP300, and VPP700), HP/Convex Exemplar SPP2000, IBM RS/6000 SP P2SC node (120 MHz), NEC SX-4/32, SGI/CRAY T3E, SGI Origin200, and SGI Origin2000. We also report High Performance Fortran (HPF) based NPB results for IBM SP2 Wide Nodes, HP/Convex Exemplar SPP2000, and SGI/CRAY T3D. These results have been submitted by Applied Parallel Research (APR) and Portland Group Inc. (PGI). We also present sustained performance per dollar for the Class B LU, SP, and BT benchmarks.

  9. NAS Parallel Benchmark. Results 11-96: Performance Comparison of HPF and MPI Based NAS Parallel Benchmarks. 1.0

    NASA Technical Reports Server (NTRS)

    Saini, Subhash; Bailey, David; Chancellor, Marisa K. (Technical Monitor)

    1997-01-01

    High Performance Fortran (HPF), the high-level language for parallel Fortran programming, is based on Fortran 90. HPF was defined by an informal standards committee known as the High Performance Fortran Forum (HPFF) in 1993, and modeled on TMC's CM Fortran language. Several HPF features have since been incorporated into the draft ANSI/ISO Fortran 95, the next formal revision of the Fortran standard. HPF allows users to write a single parallel program that can execute on a serial machine, a shared-memory parallel machine, or a distributed-memory parallel machine. HPF eliminates the complex, error-prone task of explicitly specifying how, where, and when to pass messages between processors on distributed-memory machines, or when to synchronize processors on shared-memory machines. HPF is designed in a way that allows the programmer to code an application at a high level, and then selectively optimize portions of the code by dropping into message passing or calling tuned library routines as 'extrinsics'. Compilers supporting High Performance Fortran features first appeared in late 1994 and early 1995 from Applied Parallel Research (APR), Digital Equipment Corporation, and The Portland Group (PGI). IBM introduced an HPF compiler for the IBM RS/6000 SP2 in April of 1996. Over the past two years, these implementations have shown steady improvement in terms of both features and performance. The performance of various hardware/programming model (HPF and MPI (Message Passing Interface)) combinations will be compared, based on the latest NAS (NASA Advanced Supercomputing) Parallel Benchmark (NPB) results, thus providing a cross-machine and cross-model comparison. Specifically, HPF-based NPB results will be compared with MPI-based NPB results to provide perspective on the performance currently obtainable using HPF versus MPI or versus hand-tuned implementations such as those supplied by the hardware vendors. In addition, we also present NPB (Version 1.0) performance results for the following systems: DEC Alpha Server 8400 5/440, Fujitsu VPP Series (VX, VPP300, and VPP700), HP/Convex Exemplar SPP2000, IBM RS/6000 SP P2SC node (120 MHz), NEC SX-4/32, SGI/CRAY T3E, and SGI Origin2000.

  10. Combining Phase Identification and Statistic Modeling for Automated Parallel Benchmark Generation

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Jin, Ye; Ma, Xiaosong; Liu, Qing Gary

    2015-01-01

    Parallel application benchmarks are indispensable for evaluating/optimizing HPC software and hardware. However, it is very challenging and costly to obtain high-fidelity benchmarks reflecting the scale and complexity of state-of-the-art parallel applications. Hand-extracted synthetic benchmarks are time- and labor-intensive to create. Real applications themselves, while offering the most accurate performance evaluation, are expensive to compile, port, reconfigure, and often plainly inaccessible due to security or ownership concerns. This work contributes APPRIME, a novel tool for trace-based automatic parallel benchmark generation. Taking as input standard communication-I/O traces of an application's execution, it couples accurate automatic phase identification with statistical regeneration of event parameters to create compact, portable, and to some degree reconfigurable parallel application benchmarks. Experiments with four NAS Parallel Benchmarks (NPB) and three real scientific simulation codes confirm the fidelity of APPRIME benchmarks. They retain the original applications' performance characteristics, in particular the relative performance across platforms.

  11. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Bailey, David H.

    The NAS Parallel Benchmarks (NPB) are a suite of parallel computer performance benchmarks. They were originally developed at the NASA Ames Research Center in 1991 to assess high-end parallel supercomputers. Although they are no longer used as widely as they once were for comparing high-end system performance, they continue to be studied and analyzed a great deal in the high-performance computing community. The acronym 'NAS' originally stood for the Numerical Aerodynamic Simulation Program at NASA Ames. The name of this organization was subsequently changed to the Numerical Aerospace Simulation Program, and more recently to the NASA Advanced Supercomputing Center, although the acronym remains 'NAS.' The developers of the original NPB suite were David H. Bailey, Eric Barszcz, John Barton, David Browning, Russell Carter, Leo Dagum, Rod Fatoohi, Samuel Fineberg, Paul Frederickson, Thomas Lasinski, Rob Schreiber, Horst Simon, V. Venkatakrishnan and Sisira Weeratunga. The original NAS Parallel Benchmarks consisted of eight individual benchmark problems, each of which focused on some aspect of scientific computing. The principal focus was on computational aerophysics, although most of these benchmarks have much broader relevance, since in a much larger sense they are typical of many real-world scientific computing applications. The NPB suite grew out of the need for a more rational procedure to select new supercomputers for acquisition by NASA. The emergence of commercially available highly parallel computer systems in the late 1980s offered an attractive alternative to the parallel vector supercomputers that had been the mainstay of high-end scientific computing. However, the introduction of highly parallel systems was accompanied by a regrettable level of hype, not only on the part of the commercial vendors but even, in some cases, by scientists using the systems. As a result, it was difficult to discern whether the new systems offered any fundamental performance advantage over vector supercomputers, and, if so, which of the parallel offerings would be most useful in real-world scientific computation. In part to draw attention to some of the performance reporting abuses prevalent at the time, the present author wrote a humorous essay 'Twelve Ways to Fool the Masses,' which described in a light-hearted way a number of the questionable ways in which both vendor marketing people and scientists were inflating and distorting their performance results. All of this underscored the need for an objective and scientifically defensible measure to compare performance on these systems.

  12. Adding Fault Tolerance to NPB Benchmarks Using ULFM

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Parchman, Zachary W; Vallee, Geoffroy R; Naughton III, Thomas J

    2016-01-01

    In the world of high-performance computing, fault tolerance and application resilience are becoming some of the primary concerns because of increasing hardware failures and memory corruptions. While the research community has been investigating various options, from system-level solutions to application-level solutions, standards such as the Message Passing Interface (MPI) are also starting to include such capabilities. The current proposal for MPI fault tolerance is centered around the User-Level Failure Mitigation (ULFM) concept, which provides means for fault detection and recovery of the MPI layer. This approach does not address application-level recovery, which is currently left to application developers. In this work, we present a modification of some of the benchmarks of the NAS Parallel Benchmarks (NPB) to include support for the ULFM capabilities as well as application-level strategies and mechanisms for application-level failure recovery. As such, we present: (i) an application-level library to checkpoint and restore data, (ii) extensions of NPB benchmarks for fault tolerance based on different strategies, (iii) a fault injection tool, and (iv) some preliminary results that show the impact of such fault tolerance strategies on application execution.
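
    A hedged sketch of what such an application-level checkpoint/restore helper might look like (the function names are mine, not the paper's). In a ULFM-enabled MPI, the recovery path would additionally revoke and shrink the failed communicator via the MPIX extensions; here only the standard-MPI checkpointing skeleton is shown.

      /* Sketch: per-rank checkpoint/restore to files, plus a restart path. */
      #include <mpi.h>
      #include <stdio.h>

      /* write this rank's state to a per-rank checkpoint file */
      int checkpoint_save(int rank, const double *data, size_t n) {
          char path[64];
          snprintf(path, sizeof path, "ckpt.%d", rank);
          FILE *f = fopen(path, "wb");
          if (!f) return -1;
          size_t w = fwrite(data, sizeof *data, n, f);
          fclose(f);
          return w == n ? 0 : -1;
      }

      /* reload state after a restart; returns 0 if a checkpoint existed */
      int checkpoint_load(int rank, double *data, size_t n) {
          char path[64];
          snprintf(path, sizeof path, "ckpt.%d", rank);
          FILE *f = fopen(path, "rb");
          if (!f) return -1;
          size_t r = fread(data, sizeof *data, n, f);
          fclose(f);
          return r == n ? 0 : -1;
      }

      int main(int argc, char **argv) {
          MPI_Init(&argc, &argv);
          int rank;
          MPI_Comm_rank(MPI_COMM_WORLD, &rank);
          /* A ULFM-style code would also install MPI_ERRORS_RETURN and test
           * for process-failure error classes here; omitted for brevity. */
          enum { N = 4 };
          double state[N] = {0};
          if (checkpoint_load(rank, state, N) != 0)
              for (int i = 0; i < N; i++) state[i] = rank;   /* cold start */
          state[0] += 1.0;                                   /* one "iteration" */
          checkpoint_save(rank, state, N);
          MPI_Finalize();
          return 0;
      }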

  13. Performance Characteristics of the Multi-Zone NAS Parallel Benchmarks

    NASA Technical Reports Server (NTRS)

    Jin, Haoqiang; VanderWijngaart, Rob F.

    2003-01-01

    We describe a new suite of computational benchmarks that models applications featuring multiple levels of parallelism. Such parallelism is often available in realistic flow computations on systems of grids, but had not previously been captured in benchmarks. The new suite, named NPB Multi-Zone, is extended from the NAS Parallel Benchmarks suite, and involves solving the application benchmarks LU, BT and SP on collections of loosely coupled discretization meshes. The solutions on the meshes are updated independently, but after each time step they exchange boundary value information. This strategy provides relatively easily exploitable coarse-grain parallelism between meshes. Three reference implementations are available: one serial, one hybrid using the Message Passing Interface (MPI) and OpenMP, and another hybrid using a shared memory multi-level programming model (SMP+OpenMP). We examine the effectiveness of hybrid parallelization paradigms in these implementations on three different parallel computers. We also use an empirical formula to investigate the performance characteristics of the multi-zone benchmarks.
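
    A hedged sketch of the hybrid structure described above: zones distributed across MPI ranks for the coarse grain, OpenMP threading inside each zone for the fine grain. The zone counts, the update, and the boundary exchange are placeholders of mine, not NPB-MZ source.

      /* Sketch of the multi-zone hybrid MPI+OpenMP pattern. */
      #include <mpi.h>
      #include <omp.h>
      #include <stdio.h>

      #define NZONES   16
      #define ZONE_PTS 4096

      static double zone[NZONES][ZONE_PTS];

      int main(int argc, char **argv) {
          MPI_Init(&argc, &argv);
          int rank, nprocs;
          MPI_Comm_rank(MPI_COMM_WORLD, &rank);
          MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

          for (int step = 0; step < 10; step++) {
              /* coarse grain: each rank owns a subset of zones */
              for (int z = rank; z < NZONES; z += nprocs) {
                  #pragma omp parallel for        /* fine grain: intra-zone */
                  for (int i = 0; i < ZONE_PTS; i++)
                      zone[z][i] += 1.0;          /* placeholder update */
              }
              /* boundary-value exchange between neighboring zones would go
               * here, e.g. MPI_Sendrecv of face data; omitted in the sketch */
              MPI_Barrier(MPI_COMM_WORLD);
          }
          if (rank == 0) printf("done: %d steps\n", 10);
          MPI_Finalize();
          return 0;
      }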

  14. RISC Processors and High Performance Computing

    NASA Technical Reports Server (NTRS)

    Bailey, David H.; Saini, Subhash; Craw, James M. (Technical Monitor)

    1995-01-01

    This tutorial will discuss the top five RISC microprocessors and the parallel systems in which they are used. It will provide a unique cross-machine comparison not available elsewhere. The effective performance of these processors will be compared by citing standard benchmarks in the context of real applications. The latest NAS Parallel Benchmarks, both absolute performance and performance per dollar, will be listed. The next generation of the NPB will be described. The tutorial will conclude with a discussion of future directions in the field. Technology Transfer Considerations: All of these computer systems are commercially available internationally. Information about these processors is available in the public domain, mostly from the vendors themselves. The NAS Parallel Benchmarks and their results have been previously approved numerous times for public release, beginning back in 1991.

  15. Predicting Cost/Performance Trade-Offs for Whitney: A Commodity Computing Cluster

    NASA Technical Reports Server (NTRS)

    Becker, Jeffrey C.; Nitzberg, Bill; VanderWijngaart, Rob F.; Kutler, Paul (Technical Monitor)

    1997-01-01

    Recent advances in low-end processor and network technology have made it possible to build a "supercomputer" out of commodity components. We develop simple models of the NAS Parallel Benchmarks version 2 (NPB 2) to explore the cost/performance trade-offs involved in building a balanced parallel computer supporting a scientific workload. We develop closed form expressions detailing the number and size of messages sent by each benchmark. Coupling these with measured single processor performance, network latency, and network bandwidth, our models predict benchmark performance to within 30%. A comparison based on total system cost reveals that current commodity technology (200 MHz Pentium Pros with 100baseT Ethernet) is well balanced for the NPBs up to a total system cost of around $1,000,000.
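
    The models in question are closed-form message counts coupled with measured machine parameters. A generic, hedged instance of that idea: predicted time = compute time + messages times latency + bytes over bandwidth. All numbers below are invented for illustration, not figures from the paper.

      /* Simple latency/bandwidth performance model of the kind described. */
      #include <stdio.h>

      double predict_time(double flops, double flop_rate,   /* per-process work */
                          double nmsgs, double latency,     /* message costs */
                          double bytes, double bandwidth) { /* traffic volume */
          return flops / flop_rate + nmsgs * latency + bytes / bandwidth;
      }

      int main(void) {
          /* illustrative per-iteration figures for one benchmark on 16 nodes */
          double t = predict_time(2.0e8, 5.0e7,    /* 200 Mflop at 50 Mflop/s */
                                  48.0, 80e-6,     /* 48 messages, 80 us each */
                                  6.0e6, 8.0e6);   /* 6 MB at ~8 MB/s         */
          printf("predicted iteration time: %.3f s\n", t);
          return 0;
      }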

  16. Analysis of 100Mb/s Ethernet for the Whitney Commodity Computing Testbed

    NASA Technical Reports Server (NTRS)

    Fineberg, Samuel A.; Pedretti, Kevin T.; Kutler, Paul (Technical Monitor)

    1997-01-01

    We evaluate the performance of a Fast Ethernet network configured with a single large switch, a single hub, and a 4x4 2D torus topology in a testbed cluster of "commodity" Pentium Pro PCs. We also evaluated a mixed network composed of Ethernet hubs and switches. An MPI collective communication benchmark and the NAS Parallel Benchmarks version 2.2 (NPB2) show that the torus network performs best for all sizes that we were able to test (up to 16 nodes). For larger networks the Ethernet switch outperforms the hub, though its performance is far less than peak. The hub/switch combination tests indicate that the NAS Parallel Benchmarks are relatively insensitive to hub densities of less than 7 nodes per hub.

  17. The National Practice Benchmark for Oncology: 2015 Report for 2014 Data

    PubMed Central

    Balch, Carla; Ogle, John D.

    2016-01-01

    The National Practice Benchmark (NPB) is a unique tool used to measure oncology practices against others across the country in a meaningful way despite variations in practice demographics, size, and setting. In today’s challenging economic environment, each practice positions service offerings and competitive advantages to attract patients. Although the data in the NPB report are primarily reported by community oncology practices, the business structure and arrangements with regional health care systems are also reflected in the benchmark report. The ability to produce detailed metrics is an accomplishment of excellence in business and clinical management. With these metrics, a practice should be able to measure and analyze its current business practices and make appropriate changes, if necessary. In this report, we build on the foundation initially established by Oncology Metrics (acquired by Flatiron Health in 2014) over years of data collection and refine definitions to deliver the NPB, which is uniquely meaningful in the oncology market. PMID:27006357

  18. High Performance Programming Using Explicit Shared Memory Model on Cray T3D1

    NASA Technical Reports Server (NTRS)

    Simon, Horst D.; Saini, Subhash; Grassi, Charles

    1994-01-01

    The Cray T3D system is the first-phase system in Cray Research, Inc.'s (CRI) three-phase massively parallel processing (MPP) program. This system features a heterogeneous architecture that closely couples DEC's Alpha microprocessors and CRI's parallel-vector technology, i.e., the Cray Y-MP and Cray C90. An overview of the Cray T3D hardware and available programming models is presented. Under the Cray Research Adaptive Fortran (CRAFT) model, four programming methods (data parallel, work sharing, message passing using PVM, and the explicit shared-memory model) are available to users. However, at this time the data parallel and work sharing programming models are not available to the user community. The differences between standard PVM and CRI's PVM are highlighted with performance measurements such as latencies and communication bandwidths. We have found that neither standard PVM nor CRI's PVM exploits the hardware capabilities of the T3D. The reasons for the poor performance of PVM as a native message-passing library are presented. This is illustrated by the performance of the NAS Parallel Benchmarks (NPB) programmed in the explicit shared-memory model on the Cray T3D. In general, the performance of standard PVM is about 4 to 5 times less than that obtained using the explicit shared-memory model. This degradation in performance is also seen on the CM-5, where the performance of applications using the native message-passing library CMMD is also about 4 to 5 times less than that using data parallel methods. The issues involved in programming in the explicit shared-memory model (such as barriers, synchronization, invalidating the data cache, aligning the data cache, etc.) are discussed. The comparative performance of the NPB using the explicit shared-memory programming model on the Cray T3D and other highly parallel systems such as the TMC CM-5, Intel Paragon, Cray C90, IBM SP1, etc. is presented.
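
    Latency and bandwidth figures like those cited above are conventionally measured with a ping-pong test. The following is a generic MPI sketch of that measurement (not the authors' T3D/PVM harness); run with exactly 2 ranks.

      /* Standard MPI ping-pong: one-way latency and bandwidth estimate. */
      #include <mpi.h>
      #include <stdio.h>

      #define NREPS  100
      #define NBYTES (1 << 20)

      static char buf[NBYTES];

      int main(int argc, char **argv) {
          MPI_Init(&argc, &argv);
          int rank;
          MPI_Comm_rank(MPI_COMM_WORLD, &rank);

          MPI_Barrier(MPI_COMM_WORLD);
          double t0 = MPI_Wtime();
          for (int i = 0; i < NREPS; i++) {
              if (rank == 0) {
                  MPI_Send(buf, NBYTES, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
                  MPI_Recv(buf, NBYTES, MPI_CHAR, 1, 0, MPI_COMM_WORLD,
                           MPI_STATUS_IGNORE);
              } else if (rank == 1) {
                  MPI_Recv(buf, NBYTES, MPI_CHAR, 0, 0, MPI_COMM_WORLD,
                           MPI_STATUS_IGNORE);
                  MPI_Send(buf, NBYTES, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
              }
          }
          double rtt = (MPI_Wtime() - t0) / NREPS;   /* avg round-trip time */
          if (rank == 0)
              printf("one-way %.1f us, %.1f MB/s\n",
                     rtt / 2 * 1e6, NBYTES / (rtt / 2) / 1e6);
          MPI_Finalize();
          return 0;
      }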

  19. RISC Processors and High Performance Computing

    NASA Technical Reports Server (NTRS)

    Saini, Subhash; Bailey, David H.; Lasinski, T. A. (Technical Monitor)

    1995-01-01

    In this tutorial, we will discuss the top five current RISC microprocessors: the IBM Power2, which is used in the IBM RS6000/590 workstation and in the IBM SP2 parallel supercomputer; the DEC Alpha, which is in the DEC Alpha workstation and in the Cray T3D; the MIPS R8000, which is used in the SGI Power Challenge; the HP PA-RISC 7100, which is used in the HP 700 series workstations and in the Convex Exemplar; and the Cray proprietary processor, which is used in the new Cray J916. The architecture of these microprocessors will first be presented. The effective performance of these processors will then be compared, both by citing standard benchmarks and in the context of implementing real applications. In the process, different programming models such as data parallel (CM Fortran and HPF) and message passing (PVM and MPI) will be introduced and compared. The latest NAS Parallel Benchmark (NPB) absolute performance and performance per dollar figures will be presented. The next generation of the NPB will also be described. The tutorial will conclude with a discussion of general trends in the field of high performance computing, including likely future developments in hardware and software technology, and the relative roles of vector supercomputers, tightly coupled parallel computers, and clusters of workstations. This tutorial will provide a unique cross-machine comparison not available elsewhere.

  20. The National Practice Benchmark for oncology, 2014 report on 2013 data.

    PubMed

    Towle, Elaine L; Barr, Thomas R; Senese, James L

    2014-11-01

    The National Practice Benchmark (NPB) is a unique tool to measure oncology practices against others across the country in a way that allows meaningful comparisons despite differences in practice size or setting. In today's economic environment every oncology practice, regardless of business structure or affiliation, should be able to produce, monitor, and benchmark basic metrics to meet current business pressures for increased efficiency and efficacy of care. Although we recognize that the NPB survey results do not capture the experience of all oncology practices, practices that can and do participate demonstrate exceptional managerial capability, and this year those practices are recognized for their participation. In this report, we continue to emphasize the methodology introduced last year in which we reported medical revenue net of the cost of the drugs as net medical revenue for the hematology/oncology product line. The effect of this is to capture only the gross margin attributable to drugs as revenue. New this year, we introduce six measures of clinical data density and expand the radiation oncology benchmarks.

  1. Analysis of 2D Torus and Hub Topologies of 100Mb/s Ethernet for the Whitney Commodity Computing Testbed

    NASA Technical Reports Server (NTRS)

    Pedretti, Kevin T.; Fineberg, Samuel A.; Kutler, Paul (Technical Monitor)

    1997-01-01

    A variety of different network technologies and topologies are currently being evaluated as part of the Whitney Project. This paper reports on the implementation and performance of a Fast Ethernet network configured in a 4x4 2D torus topology in a testbed cluster of 'commodity' Pentium Pro PCs. Several benchmarks were used for performance evaluation: an MPI point-to-point message passing benchmark, an MPI collective communication benchmark, and the NAS Parallel Benchmarks version 2.2 (NPB2). Our results show that for point-to-point communication on an unloaded network, the hub and 1-hop routes on the torus have about the same bandwidth and latency. However, the bandwidth decreases and the latency increases on the torus for each additional route hop. Collective communication benchmarks show that the torus provides roughly four times more aggregate bandwidth and eight times faster MPI barrier synchronizations than a hub-based network for 16-processor systems. Finally, the NPB2 benchmarks, which simulate real-world CFD applications, generally demonstrated substantially better performance on the torus than on the hub. In the few cases where the hub was faster, the difference was negligible. In total, our experimental results lead to the conclusion that for Fast Ethernet networks, the torus topology has better performance and scales better than a hub-based network.
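
    The per-hop cost discussed above depends on the hop count between nodes, which on a 2D torus is the Manhattan distance with wraparound. A small self-contained C sketch of that computation:

      /* Hop count between nodes on a WxH 2D torus. */
      #include <stdio.h>

      static int ring_dist(int a, int b, int n) {
          int d = a > b ? a - b : b - a;      /* |a - b| */
          return d < n - d ? d : n - d;       /* shorter way around the ring */
      }

      int torus_hops(int x1, int y1, int x2, int y2, int w, int h) {
          return ring_dist(x1, x2, w) + ring_dist(y1, y2, h);
      }

      int main(void) {
          /* on the 4x4 torus of the testbed, the farthest pair is 4 hops away */
          printf("hops (0,0)->(2,2) on 4x4 torus: %d\n",
                 torus_hops(0, 0, 2, 2, 4, 4));
          return 0;
      }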

  2. Testing New Programming Paradigms with NAS Parallel Benchmarks

    NASA Technical Reports Server (NTRS)

    Jin, H.; Frumkin, M.; Schultz, M.; Yan, J.

    2000-01-01

    Over the past decade, high performance computing has evolved rapidly, not only in hardware architectures but also in the increasing complexity of real applications. Technologies have been developed aiming to scale up to thousands of processors on both distributed and shared memory systems. Developing parallel programs on these computers is always a challenging task. Today, writing parallel programs with message passing (e.g. MPI) is the most popular way of achieving scalability and high performance. However, writing message passing programs is difficult and error prone. In recent years, new efforts have been made to define new parallel programming paradigms. The best examples are HPF (based on data parallelism) and OpenMP (based on shared memory parallelism). Both provide simple and clear extensions to sequential programs, thus greatly simplifying the tedious tasks encountered in writing message passing programs. HPF is independent of the memory hierarchy; however, due to the immaturity of compiler technology, its performance is still questionable. Although the use of parallel compiler directives is not new, OpenMP offers a portable solution in the shared-memory domain. Another important development involves the tremendous progress in the internet and its associated technology. Although still in its infancy, Java promises portability in a heterogeneous environment and offers the possibility to "compile once and run anywhere." To test these new technologies, we implemented new parallel versions of the NAS Parallel Benchmarks (NPBs) with HPF and OpenMP directives, and extended the work with Java and Java threads. The purpose of this study is to examine the effectiveness of alternative programming paradigms. The NPBs consist of five kernels and three simulated applications that mimic the computation and data movement of large scale computational fluid dynamics (CFD) applications. We started with the serial version included in NPB2.3. Optimization of memory and cache usage was applied to several benchmarks, noticeably BT and SP, resulting in better sequential performance. In order to overcome the lack of an HPF performance model and guide the development of the HPF codes, we employed an empirical performance model for several primitives found in the benchmarks. We encountered a few limitations of HPF, such as the lack of support for the "REDISTRIBUTION" directive and no easy way to handle irregular computation. The parallelization with OpenMP directives was done at the outermost loop level to achieve the largest granularity. The performance of six HPF and OpenMP benchmarks is compared with their MPI counterparts for the Class-A problem size in the figure on the next page. These results were obtained on an SGI Origin2000 (195MHz) with the MIPSpro-f77 compiler 7.2.1 for OpenMP and MPI codes, and the PGI pghpf-2.4.3 compiler with MPI interface for HPF programs.
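
    Outermost-loop OpenMP parallelization, as described above, is easy to illustrate on a generic plane-relaxation sweep (a hedged stand-in of mine, not actual NPB source): threads split the outer k planes, and since each plane's update reads only values within the same plane, the parallel loop is race-free.

      /* Outermost-loop OpenMP parallelization for maximum granularity. */
      #include <omp.h>
      #include <stdio.h>

      #define N 64
      static double u[N][N][N];

      int main(void) {
          #pragma omp parallel for schedule(static)   /* outermost loop only */
          for (int k = 1; k < N - 1; k++)
              for (int j = 1; j < N - 1; j++)
                  for (int i = 1; i < N - 1; i++)
                      u[k][j][i] = 0.25 * (u[k][j][i - 1] + u[k][j][i + 1] +
                                           u[k][j - 1][i] + u[k][j + 1][i]);
          printf("u[1][1][1]=%g, max threads=%d\n",
                 u[1][1][1], omp_get_max_threads());
          return 0;
      }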

  3. Applications Performance Under MPL and MPI on NAS IBM SP2

    NASA Technical Reports Server (NTRS)

    Saini, Subhash; Simon, Horst D.; Lasinski, T. A. (Technical Monitor)

    1994-01-01

    On July 5, 1994, an IBM Scalable POWER parallel System (IBM SP2) with 64 nodes was installed at the Numerical Aerodynamic Simulation (NAS) Facility. Each node of the NAS IBM SP2 is a "wide node" consisting of a RISC 6000/590 workstation module with a clock of 66.5 MHz, which can perform four floating point operations per clock, giving a peak performance of 266 Mflop/s. By the end of 1994, the 64 nodes of the IBM SP2 will be upgraded to 160 nodes with a peak performance of 42.5 Gflop/s. An overview of the IBM SP2 hardware is presented. A basic understanding of the architectural details of the RS 6000/590 will help application scientists in porting, optimizing, and tuning codes from other machines such as the CRAY C90 and the Paragon to the NAS SP2. Optimization techniques such as quad-word loading, effective utilization of the two floating point units, and data cache optimization on the RS 6000/590 are illustrated, with examples giving performance gains at each optimization step. The conversion of codes using Intel's message passing library NX to codes using the native Message Passing Library (MPL) and the Message Passing Interface (MPI) library available on the IBM SP2 is illustrated. In particular, we present the performance of the Fast Fourier Transform (FFT) kernel from the NAS Parallel Benchmarks (NPB) under MPL and MPI. We have also optimized some of the Fortran BLAS 2 and BLAS 3 routines; e.g., the optimized Fortran DAXPY runs at 175 Mflop/s and the optimized Fortran DGEMM runs at 230 Mflop/s per node. The performance of the NPB (Class B) on the IBM SP2 is compared with the CRAY C90, Intel Paragon, TMC CM-5E, and the CRAY T3D.
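
    The kind of tuning described, keeping both floating point units busy, is classically done by unrolling. A generic hedged C illustration of the transformation (the cited Mflop/s figures come from the Fortran versions in the paper; this sketch only shows the idea):

      /* DAXPY with 4-way unrolling so independent multiply-adds can issue
       * to both FPUs; a remainder loop handles leftover elements. */
      #include <stdio.h>

      void daxpy_unrolled(int n, double a, const double *x, double *y) {
          int i;
          for (i = 0; i + 4 <= n; i += 4) {
              y[i]     += a * x[i];
              y[i + 1] += a * x[i + 1];
              y[i + 2] += a * x[i + 2];
              y[i + 3] += a * x[i + 3];
          }
          for (; i < n; i++)                  /* remainder */
              y[i] += a * x[i];
      }

      int main(void) {
          double x[8] = {1, 1, 1, 1, 1, 1, 1, 1}, y[8] = {0};
          daxpy_unrolled(8, 2.0, x, y);
          printf("y[7] = %.1f\n", y[7]);      /* prints 2.0 */
          return 0;
      }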

  4. Application configuration selection for energy-efficient execution on multicore systems

    DOE PAGES

    Wang, Shinan; Luo, Bing; Shi, Weisong; ...

    2015-09-21

    Balanced performance and energy consumption are incorporated in the design of modern computer systems. Several run-time factors, such as concurrency levels, thread mapping strategies, and dynamic voltage and frequency scaling (DVFS), should be considered in order to achieve optimal energy efficiency for a workload. Selecting appropriate run-time factors, however, is one of the most challenging tasks because the run-time factors are architecture-specific and workload-specific. While most existing work concentrates on either static analysis of the workload or run-time prediction, we present a hybrid two-step method that utilizes concurrency levels and DVFS settings to achieve the energy-efficient configuration for a workload. The experimental results, based on a Xeon E5620 server with the NPB and PARSEC benchmark suites, show that the model is able to predict the energy-efficient configuration accurately. On average, an additional 10% EDP (Energy Delay Product) saving is obtained by using run-time DVFS for the entire system. An off-line optimal solution is used for comparison with the proposed scheme. Finally, the experimental results show that the average extra EDP saved by the optimal solution is within 5% on selected parallel benchmarks.
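
    The selection step itself is simple once energy and time are known (or predicted) per configuration: minimize EDP = energy times delay. A hedged C sketch with an invented measurement table (the thread counts, frequencies, and values are illustrative, not from the paper):

      /* Pick the concurrency/DVFS configuration minimizing EDP = E * T. */
      #include <stdio.h>

      struct config { int threads; int freq_mhz; double energy_j; double time_s; };

      int main(void) {
          struct config c[] = {            /* hypothetical measurements */
              { 4, 2400,  900.0, 12.0 },
              { 8, 2400,  950.0,  7.5 },
              { 8, 1600,  780.0,  9.0 },
              {16, 2400, 1100.0,  6.0 },
          };
          int n = sizeof c / sizeof c[0], best = 0;
          for (int i = 1; i < n; i++)
              if (c[i].energy_j * c[i].time_s < c[best].energy_j * c[best].time_s)
                  best = i;
          printf("best: %d threads @ %d MHz (EDP = %.0f J*s)\n",
                 c[best].threads, c[best].freq_mhz,
                 c[best].energy_j * c[best].time_s);
          return 0;
      }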

  5. Oncology practice trends from the national practice benchmark.

    PubMed

    Barr, Thomas R; Towle, Elaine L

    2012-09-01

    In 2011, we made predictions on the basis of data from the National Practice Benchmark (NPB) reports from 2005 through 2010. With the new 2011 data in hand, we have revised last year's predictions and projected for the next 3 years. In addition, we make some new predictions that will be tracked in future benchmarking surveys. We also outline a conceptual framework for contemplating these data based on an ecological model of the oncology delivery system. The 2011 NPB data are consistent with last year's prediction of a decrease in the operating margins necessary to sustain a community oncology practice. With the new data in, we now predict these reductions to occur more slowly than previously forecast. We note an ease to the squeeze observed in last year's trend analysis, which will allow more time for practices to adapt their business models for survival and offer the best of these practices an opportunity to invest earnings into operations to prepare for the inevitable shift away from historic payment methodology for clinical service. This year, survey respondents reported changes in business structure, first measured in the 2010 data, indicating an increase in the percentage of respondents who believe that change is coming soon, but the majority still have confidence in the viability of their existing business structure. Although oncology practices are in for a bumpy ride, things are looking less dire this year for practices participating in our survey.

  6. Oncology Practice Trends From the National Practice Benchmark

    PubMed Central

    Barr, Thomas R.; Towle, Elaine L.

    2012-01-01

    In 2011, we made predictions on the basis of data from the National Practice Benchmark (NPB) reports from 2005 through 2010. With the new 2011 data in hand, we have revised last year's predictions and projected for the next 3 years. In addition, we make some new predictions that will be tracked in future benchmarking surveys. We also outline a conceptual framework for contemplating these data based on an ecological model of the oncology delivery system. The 2011 NPB data are consistent with last year's prediction of a decrease in the operating margins necessary to sustain a community oncology practice. With the new data in, we now predict these reductions to occur more slowly than previously forecast. We note an ease to the squeeze observed in last year's trend analysis, which will allow more time for practices to adapt their business models for survival and offer the best of these practices an opportunity to invest earnings into operations to prepare for the inevitable shift away from historic payment methodology for clinical service. This year, survey respondents reported changes in business structure, first measured in the 2010 data, indicating an increase in the percentage of respondents who believe that change is coming soon, but the majority still have confidence in the viability of their existing business structure. Although oncology practices are in for a bumpy ride, things are looking less dire this year for practices participating in our survey. PMID:23277766

  7. An evaluation of pulse oximeters in dogs, cats and horses.

    PubMed

    Matthews, Nora S; Hartke, Sherrie; Allen, John C

    2003-01-01

    Evaluation of five pulse oximeters in dogs, cats and horses with sensors placed at five sites and hemoglobin saturation at three plateaus. Prospective randomized multispecies experimental trial. Five healthy dogs, cats and horses. Animals were anesthetized and instrumented with ECG leads and arterial catheters. Five pulse oximeters (Nellcor Puritan Bennett-395, NPB-190, NPB-290, NPB-40 and Surgi-Vet V3304) with sensors at five sites were studied in a 5 x 5 Latin square design. Ten readings (SpO2) were taken at each of three hemoglobin saturation plateaus (98, 85 and 72%) in each animal. Arterial samples were drawn concurrently and hemoglobin saturation was measured with a co-oximeter. Accuracy of saturation measurements was calculated as the root mean squared difference (RMSD), a composite of bias and precision, for each model tested in each species. Accuracy varied widely. In dogs, the RMSD for the NPB-395, NPB-190, NPB-290, NPB-40 and V3304 were 2.7, 2.2, 2.4, 1.7 and 2.7% respectively. Failure to produce readings for the NPB-395, NPB-190, NPB-290, NPB-40 and V3304 were 0, 0, 0.7, 0, and 20%, respectively. The Pearson correlation coefficients for the tongue, toe, ear, lip and prepuce or vulva were 0.95, 0.97, 0.69, 0.87 and 0.95, respectively. In horses, the RMSD for the NPB-395, NPB-190, NPB-290, NPB-40 and V3304 were 3.1, 3.0, 4.7, 3.3 and 2.1%, respectively while rates of failure to produce readings were 10, 21, 0, 17 and 60%, respectively. The Pearson correlation coefficients for the tongue, nostril, ear, lip and prepuce or vulva were 0.98, 0.94, 0.88, 0.93 and 0.94, respectively. In cats, the RMSD for all data for the NPB-395, NPB-190, NPB-290, NPB-40 and V3304 were 5.9, 5.6, 7.9, 7.9 and 10.7%, respectively while failure rates were 0, 0.7, 0, 20 and 32%, respectively. The correlation coefficients for the tongue, rear paw, ear, lip and front paw were 0.54, 0.79, 0.64, 0.49 and 0.57, respectively. For saturations above 90% in cats, the RMSD for the NPB-395, NPB-190, NPB-290, NPB-40 and V3304 were 2.6, 4.4, 4.0, 3.5 and 4.8%, respectively, while failure rates were 0, 1.7, 0, 25 and 43%, respectively. Accuracy and failure rates (failure to produce a reading) varied widely from model to model and from species to species. Generally, among the models tested in the clinically relevant range (90-100%), RMSD ranged from 2% to 5% while failure rates were highest in the V3304.
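
    The accuracy metric used above, RMSD as a composite of bias and precision, is a plain root-mean-square of the paired differences. A generic C sketch with invented readings (not data from the study):

      /* RMSD between paired pulse-oximeter and co-oximeter readings. */
      #include <math.h>
      #include <stdio.h>

      double rmsd(const double *spo2, const double *sao2, int n) {
          double ss = 0.0;
          for (int i = 0; i < n; i++) {
              double d = spo2[i] - sao2[i];
              ss += d * d;
          }
          return sqrt(ss / n);
      }

      int main(void) {
          double spo2[] = {97, 96, 85, 84, 73, 71};   /* oximeter readings (%) */
          double sao2[] = {98, 98, 85, 86, 72, 72};   /* co-oximeter values (%) */
          printf("RMSD = %.2f%%\n", rmsd(spo2, sao2, 6));
          return 0;
      }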

  8. Parallel 3D Mortar Element Method for Adaptive Nonconforming Meshes

    NASA Technical Reports Server (NTRS)

    Feng, Huiyu; Mavriplis, Catherine; VanderWijngaart, Rob; Biswas, Rupak

    2004-01-01

    High order methods are frequently used in computational simulation for their high accuracy. An efficient way to avoid unnecessary computation in smooth regions of the solution is to use adaptive meshes which employ fine grids only in areas where they are needed. Nonconforming spectral elements allow the grid to be flexibly adjusted to satisfy the computational accuracy requirements. The method is suitable for computational simulations of unsteady problems with very disparate length scales or unsteady moving features, such as heat transfer, fluid dynamics or flame combustion. In this work, we select the Mortar Element Method (MEM) to handle the non-conforming interfaces between elements. A new technique is introduced to efficiently implement MEM in 3-D nonconforming meshes. By introducing an "intermediate mortar", the proposed method decomposes the projection between 3-D elements and mortars into two steps. In each step, projection matrices derived in 2-D are used. The two-step method avoids explicitly forming/deriving large projection matrices for 3-D meshes, and also helps to simplify the implementation. This new technique can be used for both h- and p-type adaptation. This method is applied to an unsteady 3-D moving heat source problem. With our new MEM implementation, mesh adaptation is able to efficiently refine the grid near the heat source and coarsen the grid once the heat source passes. The savings in computational work resulting from the dynamic mesh adaptation is demonstrated by the reduction of the number of elements used and the CPU time spent. MEM and mesh adaptation, respectively, bring irregularity and dynamics to the computer memory access pattern. Hence, they provide a good way to gauge the performance of computer systems when running scientific applications whose memory access patterns are irregular and unpredictable. We select a 3-D moving heat source problem as the Unstructured Adaptive (UA) grid benchmark, a new component of the NAS Parallel Benchmarks (NPB). In this paper, we present some interesting performance results of our OpenMP parallel implementation on different architectures such as the SGI Origin2000, SGI Altix, and Cray MTA-2.

  9. Initial Performance Results on IBM POWER6

    NASA Technical Reports Server (NTRS)

    Saini, Subhash; Talcott, Dale; Jespersen, Dennis; Djomehri, Jahed; Jin, Haoqiang; Mehrotra, Piyush

    2008-01-01

    The POWER5+ processor has a faster memory bus than that of the previous generation POWER5 processor (533 MHz vs. 400 MHz), but the measured per-core memory bandwidth of the latter is better than that of the former (5.7 GB/s vs. 4.3 GB/s). The reason for this is that in the POWER5+, the two cores on the chip share the L2 cache, L3 cache and memory bus. The memory controller is also on the chip and is shared by the two cores. This serializes the path to memory. For consistently good performance on a wide range of applications, the performance of the processor, the memory subsystem, and the interconnects (both latency and bandwidth) should be balanced. Recognizing this, IBM has designed the POWER6 processor so as to avoid the bottlenecks due to the L2 cache, memory controller and buffer chips of the POWER5+. Unlike the POWER5+, each core in the POWER6 has its own L2 cache (4 MB, double that of the POWER5+), memory controller and buffer chips. Each core in the POWER6 runs at 4.7 GHz instead of the 1.9 GHz of the POWER5+. In this paper, we evaluate the performance of a dual-core POWER6-based IBM p6-570 system, and we compare its performance with that of a dual-core POWER5+-based IBM p575+ system. In this evaluation, we have used the High Performance Computing Challenge (HPCC) benchmarks, the NAS Parallel Benchmarks (NPB), and four real-world applications: three from computational fluid dynamics and one from climate modeling.
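
    Per-core memory bandwidth figures like the 4.3 vs. 5.7 GB/s above are conventionally measured with a STREAM-triad-style loop. A generic hedged C sketch (not the authors' harness); compile with OpenMP enabled for omp_get_wtime():

      /* STREAM-triad-style bandwidth estimate: a = b + s*c moves 3 doubles
       * per element between memory and the core. */
      #include <omp.h>
      #include <stdio.h>
      #include <stdlib.h>

      #define N (1 << 22)                 /* ~4M doubles per array */

      int main(void) {
          double *a = malloc(N * sizeof *a), *b = malloc(N * sizeof *b),
                 *c = malloc(N * sizeof *c);
          if (!a || !b || !c) return 1;
          for (long i = 0; i < N; i++) { b[i] = 1.0; c[i] = 2.0; }

          double t0 = omp_get_wtime();
          for (long i = 0; i < N; i++)
              a[i] = b[i] + 3.0 * c[i];
          double t = omp_get_wtime() - t0;

          printf("a[0]=%.1f, triad bandwidth: %.2f GB/s\n",
                 a[0], 3.0 * N * sizeof(double) / t / 1e9);
          free(a); free(b); free(c);
          return 0;
      }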

  10. The NAS parallel benchmarks

    NASA Technical Reports Server (NTRS)

    Bailey, David (Editor); Barton, John (Editor); Lasinski, Thomas (Editor); Simon, Horst (Editor)

    1993-01-01

    A new set of benchmarks was developed for the performance evaluation of highly parallel supercomputers. These benchmarks consist of a set of kernels, the 'Parallel Kernels,' and a simulated application benchmark. Together they mimic the computation and data movement characteristics of large scale computational fluid dynamics (CFD) applications. The principal distinguishing feature of these benchmarks is their 'pencil and paper' specification - all details of these benchmarks are specified only algorithmically. In this way many of the difficulties associated with conventional benchmarking approaches on highly parallel systems are avoided.

  11. The NAS parallel benchmarks

    NASA Technical Reports Server (NTRS)

    Bailey, D. H.; Barszcz, E.; Barton, J. T.; Carter, R. L.; Lasinski, T. A.; Browning, D. S.; Dagum, L.; Fatoohi, R. A.; Frederickson, P. O.; Schreiber, R. S.

    1991-01-01

    A new set of benchmarks has been developed for the performance evaluation of highly parallel supercomputers in the framework of the NASA Ames Numerical Aerodynamic Simulation (NAS) Program. These consist of five 'parallel kernel' benchmarks and three 'simulated application' benchmarks. Together they mimic the computation and data movement characteristics of large-scale computational fluid dynamics applications. The principal distinguishing feature of these benchmarks is their 'pencil and paper' specification: all details of these benchmarks are specified only algorithmically. In this way many of the difficulties associated with conventional benchmarking approaches on highly parallel systems are avoided.

  12. Central neuropeptide B administration activates stress hormone secretion and stimulates feeding in male rats.

    PubMed

    Samson, W K; Baker, J R; Samson, C K; Samson, H W; Taylor, M M

    2004-10-01

    Neuropeptide B (NPB) was identified to be an endogenous peptide ligand for the orphan receptors GPR7 and GPR8. Because GPR7 is expressed in rat brain and, in particular, in the hypothalamus, we hypothesized that NPB might interact with neuroendocrine systems that control hormone release from the anterior pituitary gland. No significant effects of NPB were observed on the in vitro release of prolactin, adrenocorticotropic hormone (ACTH) or growth hormone (GH) when log molar concentrations ranging from 1 pM to 100 nM NPB were incubated with dispersed anterior pituitary cells harvested from male rats. In addition, NPB (100 nM) did not alter the concentration-response stimulation of prolactin secretion by thyrotropin-releasing hormone, ACTH secretion by corticotropin-releasing factor (CRF), or GH secretion by GH-releasing hormone. However, NPB, when injected into the lateral cerebroventricle (i.c.v.) of conscious, unrestrained male rats, elevated prolactin and corticosterone, and lowered GH levels in circulation. The threshold dose for the effect on corticosterone and prolactin levels was 1.0 nmol, while that for the effect on GH release was 3.0 nmol NPB. Pretreatment with a polyclonal anti-CRF antiserum completely blocked the ability of NPB to stimulate ACTH release and significantly inhibited the effect of NPB on plasma corticosterone levels. NPB administered i.c.v. did not significantly alter plasma vasopressin and oxytocin levels in conscious rats. It did stimulate feeding (minimum effective dose 1.0 nmol) in sated animals in a manner similar to that of the other endogenous ligand for GPR7, neuropeptide W. We conclude that NPB can act in the brain to modulate neuroendocrine signals accessing the anterior pituitary gland, but does not itself act as a releasing or inhibiting factor in the gland, at least with regard to prolactin, ACTH and GH secretion.

  13. Neuropeptide B in Nile tilapia Oreochromis niloticus: molecular cloning and its effects on the regulation of food intake and mRNA expression of growth hormone and prolactin.

    PubMed

    Yang, Lu; Sun, Caiyun; Li, Wensheng

    2014-05-01

    Neuropeptide B (NPB) regulates food intake, energy homeostasis and hormone secretion in mammals via two G-protein coupled receptors, termed GPR7 and GPR8. However, no study has reported the function of NPB in teleosts. In this study, the full-length cDNA of prepro-NPB, with a size of 663 bp, was cloned from the hypothalamus of Nile tilapia. The CDS of prepro-NPB is 387 bp and encodes a precursor protein of 128 a.a. This precursor contains a mature peptide of 29 a.a., named NPB29. A tissue distribution study showed that this gene is mainly expressed in different parts of the brain, especially the diencephalon and hypothalamus, and in the spinal cord of Nile tilapia. Fasting significantly stimulated the mRNA expression of NPB in the brain (excluding the hypothalamus), and refeeding after fasting for 3 and 14 days showed similar effects on NPB expression. In the hypothalamus, only short-term fasting (3 days) and refeeding after fasting for 7 and 14 days induced mRNA expression of NPB. Intraperitoneal (i.p.) injection of NPB remarkably elevated the mRNA expression of hypothalamic neuropeptide Y (NPY), cholecystokinin 1 (CCK1) and pituitary prolactin (PRL), whereas it significantly inhibited growth hormone (GH) expression in the pituitary. These observations suggest that NPB may participate in the regulation of feeding and of the gene expression of pituitary GH and PRL in Nile tilapia.

  14. Positioning growth of NPB crystalline nanowires on the PTCDA nanocrystal template.

    PubMed

    Wang, Hong; Lin, Haiping; Fan, Xing; Ostendorp, Stefan; Wang, Yandong; Huang, Lizhen; Jiang, Lin; Li, Youyong; Wilde, Gerhard; Fuchs, Harald; Wang, Wenchong; Chi, Lifeng

    2018-05-31

    Non-planar organic molecules often form amorphous films via vapor phase deposition on surfaces. In this study, we demonstrate for the first time that direct crystalline growth of non-planar NPB is possible when the orientation of the initially deposited molecules on a PTCDA nanocrystal template is controlled to make it analogous to the structure of the molecular crystal. The crystalline NPB nanowires can be further positioned by controlling the site-selective growth of PTCDA nanocrystal templates at pre-determined locations. Short-channel bottom-contact OFET arrays with the NPB nanowires directly grown on electrodes were subsequently fabricated. The hole mobility of the NPB nanowires is improved 40-fold in comparison with that of the amorphous films.

  15. Hole-transport limited S-shaped I-V curves in planar heterojunction organic photovoltaic cells

    NASA Astrophysics Data System (ADS)

    Zhang, Minlu; Wang, Hui; Tang, C. W.

    2011-11-01

    Current-voltage (I-V) characteristics of planar heterojunction organic photovoltaic cells based on N,N'-di(1-naphthyl)-N,N'-diphenyl-(1,1'-biphenyl)-4,4'-diamine (NPB) and C60 are investigated. Through variation of the layer thickness and composition, specifically chemical doping of NPB with MoOx, we show that the hole-transport limitation in the NPB layer is the determining factor in shaping the I-V characteristics of NPB/C60 cells.

  16. Permanent polarization and charge distribution in organic light-emitting diodes (OLEDs): Insights from near-infrared charge-modulation spectroscopy of an operating OLED

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Marchetti, Alfred P.; Haskins, Terri L.; Young, Ralph H.

    2014-03-21

    Vapor-deposited Alq3 layers typically possess a strong permanent electrical polarization, whereas NPB layers do not. (Alq3 is tris(8-quinolinolato)aluminum(III); NPB is 4,4′-bis[N-(1-naphthyl)-N-phenylamino]biphenyl.) The cause is a net orientation of the Alq3 molecules with their large dipole moments. Here we report on the consequences for an organic light-emitting diode (OLED) with an NPB hole-transport layer and an Alq3 electron-transport layer. The discontinuous polarization at the NPB|Alq3 interface has the same effect as a sheet of immobile negative charge there. It is more than compensated by a large concentration of injected holes (NPB+) when the OLED is running. We discuss the implications and consequences for the quantum efficiency and the drive voltage of this OLED and others. We also speculate on possible consequences of permanent polarization in organic photovoltaic devices. The concentration of NPB+ was measured by charge-modulation spectroscopy (CMS) in the near infrared, where NPB+ has a strong absorption band, supplemented by differential-capacitance and current-voltage measurements. Unlike CMS in the visible, this method avoids complications from modulation of the electroluminescence and electroabsorption.

  17. Distinctive Features of NREM Parasomnia Behaviors in Parkinson’s Disease and Multiple System Atrophy

    PubMed Central

    Ratti, Pietro-Luca; Sierra-Peña, Maria; Manni, Raffaele; Simonetta-Moreau, Marion; Bastin, Julien; Mace, Harrison; Rascol, Olivier; David, Olivier

    2015-01-01

    Objective To characterize parasomnia behaviors on arousal from NREM sleep in Parkinson’s Disease (PD) and Multiple System Atrophy (MSA). Methods From 30 patients with PD, Dementia with Lewy Bodies/Dementia associated with PD, or MSA undergoing nocturnal video-polysomnography for presumed dream enactment behavior, we were able to select 2 PD and 2 MSA patients featuring NREM Parasomnia Behaviors (NPBs). We identified episodes during which the subjects seemed to enact dreams or presumed dream-like mentation (NPB arousals) versus episodes with physiological movements (no-NPB arousals). A time-frequency analysis (Morlet Wavelet Transform) of the scalp EEG signals around each NPB and no-NPB arousal onset was performed, and the amplitudes of the spectral frequencies were compared between NPB and no-NPB arousals. Results 19 NPBs were identified, 12 of which consisted of 'elementary' NPBs while 7 resembled confusional arousals. With quantitative EEG analysis, we found an amplitude reduction in the 5-6 Hz band 40 seconds before NPB arousals as compared to no-NPB arousals at the F4 and C4 derivations (p<0.01). Conclusions Many PD and MSA patients feature various NREM sleep-related behaviors, with clinical and electrophysiological differences from and similarities to arousal parasomnias in the general population. Significance This study helps bring attention to an overlooked phenomenon in neurodegenerative diseases. PMID:25756280

  18. ISE structural dynamic experiments

    NASA Technical Reports Server (NTRS)

    Lock, Malcolm H.; Clark, S. Y.

    1988-01-01

    The topics are presented in viewgraph form and include the following: directed energy systems - vibration issue; Neutral Particle Beam Integrated Space Experiment (NPB-ISE) opportunity/study objective; vibration sources/study plan; NPB-ISE spacecraft configuration; baseline slew analysis and results; modal contributions; fundamental pitch mode; vibration reduction approaches; peak residual vibration; NPB-ISE spacecraft slew experiment; goodbye ISE - hello Zenith Star Program.

  19. Strategies for Energy Efficient Resource Management of Hybrid Programming Models

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Li, Dong; Supinski, Bronis de; Schulz, Martin

    2013-01-01

    Many scientific applications are programmed using hybrid programming models that use both message-passing and shared-memory, due to the increasing prevalence of large-scale systems with multicore, multisocket nodes. Previous work has shown that energy efficiency can be improved using software-controlled execution schemes that consider both the programming model and the power-aware execution capabilities of the system. However, such approaches have focused on identifying optimal resource utilization for one programming model, either shared-memory or message-passing, in isolation. The potential solution space, thus the challenge, increases substantially when optimizing hybrid models since the possible resource configurations increase exponentially. Nonetheless, with the accelerating adoption of hybrid programming models, we increasingly need improved energy efficiency in hybrid parallel applications on large-scale systems. In this work, we present new software-controlled execution schemes that consider the effects of dynamic concurrency throttling (DCT) and dynamic voltage and frequency scaling (DVFS) in the context of hybrid programming models. Specifically, we present predictive models and novel algorithms based on statistical analysis that anticipate application power and time requirements under different concurrency and frequency configurations. We apply our models and methods to the NPB MZ benchmarks and selected applications from the ASC Sequoia codes. Overall, we achieve substantial energy savings (8.74% on average and up to 13.8%) with some performance gain (up to 7.5%) or negligible performance loss.
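
    The core search described, choosing a concurrency/frequency configuration from model-predicted power and time, can be sketched as below. The two models here are hypothetical stand-ins for the paper's fitted statistical predictors, and the configuration grid is assumed.

      # Hedged sketch: pick the (threads, GHz) pair minimizing predicted energy.
      def power_model(threads, ghz):
          return 40.0 + 6.0 * threads + 25.0 * ghz ** 2       # watts (toy model)

      def time_model(threads, ghz):
          return 120.0 / (threads ** 0.8) * (2.4 / ghz)       # seconds (toy model)

      configs = [(t, f) for t in (4, 8, 16, 32) for f in (1.2, 1.6, 2.0, 2.4)]
      best = min(configs, key=lambda c: power_model(*c) * time_model(*c))  # E = P*T
      print("chosen (threads, GHz):", best)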

  20. Object-Oriented Implementation of the NAS Parallel Benchmarks using Charm++

    NASA Technical Reports Server (NTRS)

    Krishnan, Sanjeev; Bhandarkar, Milind; Kale, Laxmikant V.

    1996-01-01

    This report describes experiences with implementing the NAS Computational Fluid Dynamics benchmarks using a parallel object-oriented language, Charm++. Our main objective in implementing the NAS CFD kernel benchmarks was to develop a code that could be used to easily experiment with different domain decomposition strategies and dynamic load balancing. We also wished to leverage the object-orientation provided by the Charm++ parallel object-oriented language, to develop reusable abstractions that would simplify the process of developing parallel applications. We first describe the Charm++ parallel programming model and the parallel object array abstraction, then go into detail about each of the Scalar Pentadiagonal (SP) and Lower/Upper Triangular (LU) benchmarks, along with performance results. Finally we conclude with an evaluation of the methodology used.

  1. Study on charge carrier recombination zone with ultrathin rubrene layer as probe

    NASA Astrophysics Data System (ADS)

    Wen, Wen; Yu, Jungsheng; Li, Yi; Li, Lu; Jiang, Yadong

    2009-05-01

    The characteristic of charge carrier recombination zone in N,N'-bis-(1-naphthyl)-N,N'-biphenyl-1,1'-biphenyl-4,4'-diamine (NPB) based OLEDs is studied using an ultrathin 5,6,11,12-tetraphenylnaphthacene (rubrene) as a probe. By adjusting the rubrene thickness and location in NPB light-emitting layer, the luminescent spectra and electrical properties of the devices are investigated. The results show that when the thickness ranges from 0.2 to 0.8 nm, the surface morphology of rubrene exists as the discontinuous island-like state locating on the surface of NPB film and seldom affect the electrical characteristics. While the location of rubrene shifted from the interface of NPB/2,9-dimethyl-4,7-diphenyl-1,10-phenanthroline (BCP) to NPB side, the maximum exciton concentration is found within 2 nm away from the interface, which is the main charge carrier recombination zone. With an optimized structure of indium-tin-oxide (ITO)/NPB (40nm)/rubrene (0.3nm)/NPB (7nm)/BCP (30nm)/Mg:Ag, the device exhibits a turn on voltage as low as 3 V and stable white light. The peaks of EL spectra are located at 431 and 555 nm corresponding to the Commissions Internationale De L'Eclairage (CIE) coordinates of (0.32, 0.32), which are relatively stable under the bias voltage from 5 to 15 V. A maximum luminance of 5630 cd/m2 and a maximum power efficiency of 0.6 lm/W is achieved. The balanced spectra are attributed to the stable confining of charge carriers and exciton by the thin emitting layers.

  2. Parallelization of NAS Benchmarks for Shared Memory Multiprocessors

    NASA Technical Reports Server (NTRS)

    Waheed, Abdul; Yan, Jerry C.; Saini, Subhash (Technical Monitor)

    1998-01-01

    This paper presents our experiences parallelizing the sequential implementation of the NAS benchmarks using compiler directives on the SGI Origin2000 distributed shared memory (DSM) system. Porting existing applications to new high performance parallel and distributed computing platforms is a challenging task. Ideally, a user develops a sequential version of the application, leaving the task of porting to new generations of high performance computing systems to parallelization tools and compilers. Due to the simplicity of programming shared-memory multiprocessors, compiler developers have provided various facilities to allow users to exploit parallelism. Native compilers on the SGI Origin2000 support multiprocessing directives that allow users to exploit loop-level parallelism in their programs. Additionally, supporting tools can accomplish this process automatically and present the results of parallelization to the users. We experimented with these compiler directives and supporting tools by parallelizing the sequential implementation of the NAS benchmarks. Results reported in this paper indicate that with minimal effort, the performance gain is comparable with that of the hand-parallelized, carefully optimized, message-passing implementations of the same benchmarks.
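
    Loop-level directive parallelism of the kind described has no direct Python counterpart, but the underlying pattern, distributing independent loop iterations across processors, can be mimicked with the standard library. Everything below, including the loop body, is an illustrative assumption rather than benchmark code.

      from multiprocessing import Pool
      import math

      def loop_body(i):
          # Stand-in for one independent iteration of a parallelizable loop
          return math.sin(i) * math.cos(i)

      if __name__ == "__main__":
          with Pool() as pool:                     # cf. a compiler-spawned thread team
              partial = pool.map(loop_body, range(1_000_000), chunksize=10_000)
          print(sum(partial))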

  3. Predictive cytogenetic biomarkers for colorectal neoplasia in medium risk patients.

    PubMed

    Ionescu, E M; Nicolaie, T; Ionescu, M A; Becheanu, G; Andrei, F; Diculescu, M; Ciocirlan, M

    2015-01-01

    DNA damage and chromosomal alterations in peripheral lymphocytes parallel DNA mutations in tumor tissues. The aim of our study was to predict the presence of neoplastic colorectal lesions by specific biomarkers in "medium risk" individuals (age 50 to 75, with no personal or family history of any colorectal neoplasia). We designed a prospective cohort observational study including patients undergoing diagnostic or opportunistic screening colonoscopy. Specific biomarkers were analyzed for each patient in peripheral lymphocytes - presence of micronuclei (MN), nucleoplasmic bridges (NPB) and the Nuclear Division Index (NDI) by the cytokinesis-blocked micronucleus assay (CBMN). Of 98 patients included, 57 were "medium risk" individuals. MN frequency and NPB presence were not significantly different in patients with neoplastic lesions compared to controls. In "medium risk" individuals, mean NDI was significantly lower for patients with any neoplastic lesions (adenomas and adenocarcinomas, AUROC 0.668, p 0.005), for patients with advanced neoplasia (advanced adenoma and adenocarcinoma, AUROC 0.636, p 0.029) as well as for patients with adenocarcinoma (AUROC 0.650, p 0.048), for each comparison with the rest of the population. For a cut-off of 1.8, in "medium risk" individuals, an NDI below that value may predict any neoplastic lesion with a sensitivity of 97.7%, an advanced neoplastic lesion with a sensitivity of 97% and adenocarcinoma with a sensitivity of 94.4%. The NDI score may have a role as a colorectal cancer-screening test in "medium risk" individuals. DNA = deoxyribonucleic acid; CRC = colorectal cancer; EU = European Union; WHO = World Health Organization; FOBT = fecal occult blood test; CBMN = cytokinesis-blocked micronucleus assay; MN = micronuclei; NPB = nucleoplasmic bridges; NDI = Nuclear Division Index; FAP = familial adenomatous polyposis; HNPCC = hereditary non-polyposis colorectal cancer; IBD = inflammatory bowel diseases; ROC = receiver operating characteristics; AUROC = area under the receiver operating characteristics curve.
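
    The reported operating point is a straightforward sensitivity computation; in the sketch below only the 1.8 cutoff comes from the study, while the sample data and names are made up for illustration.

      def sensitivity(ndi, lesion, cutoff=1.8):
          # The test is "positive" when NDI falls below the cutoff
          tp = sum(1 for x, y in zip(ndi, lesion) if y and x < cutoff)
          fn = sum(1 for x, y in zip(ndi, lesion) if y and x >= cutoff)
          return tp / (tp + fn)

      print(sensitivity([1.5, 1.7, 1.9, 2.1], [True, True, True, False]))  # 0.67 (toy)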

  4. Separation of compounds with multiple -OH groups from dilute aqueous solutions via complexation with organoboronate

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Chow, Tina Kuo Fung

    1992-05-01

    The complexing extractant investigated in this work is 3-nitrophenylboronic acid (NPBA) in its anionic form (NPB-). NPBA and Aliquat 336 (a quaternary amine) are dissolved in 2-ethyl-1-hexanol, and the extractant is contacted with aq. NaOH. Solutes investigated were 1,2-propanediol, glycerol, fructose, sorbitol and lactic acid. Batch extraction experiments were performed at 25°C. Partition coefficients, distribution ratios and loadings are reported for varying concentrations of solute and NPB-. All solutes complexed with NPB-, with all complexes containing only one NPB- per complex. The 1:1 complexation constants for the solutes glycerol, fructose and sorbitol follow trends similar to complexation with B(OH)4- (aq.), i.e. the complexation constants increase with increasing number of -OH groups available for complexation. The assumption of a 1:1 complex is not valid for 1,2-propanediol, which showed overloading (more than one mole of solute complexed to one mole of NPB-) at higher concentrations. The -OH group on the NPB- which is left uncomplexed after one solute molecule has bound to the other two -OH groups may be responsible for the overloading. Overloading is also observed in the extraction of lactic acid, but through a different mechanism. It was found that TOMA+ can extract lactic acid to an extent comparable to the uptake of lactic acid by NPB-. The complexation probably occurs through formation of an acid-base ion pair. Losses of NPBA into the aqueous phase could lead to problems and poor economics in industrial separation processes. One way of overcoming this problem would be to incorporate the NPBA onto a solid support.
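
    The 1:1 complexation analysis reduces to simple mass-action arithmetic. The sketch below uses hypothetical concentrations and is not tied to the thesis data; it only shows the two quantities the abstract reasons with.

      def k11(c_complex, c_solute_free, c_npb_free):
          # Mass-action constant for S + NPB-  <=>  S.NPB-
          return c_complex / (c_solute_free * c_npb_free)

      def loading(solute_in_org, npb_total):
          # Loading > 1 signals overloading (more than one solute per NPB-),
          # as observed for 1,2-propanediol at higher concentrations
          return solute_in_org / npb_total

      print(k11(0.02, 0.10, 0.05), loading(0.06, 0.05))   # toy values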

  5. Separation of compounds with multiple -OH groups from dilute aqueous solutions via complexation with organoboronate. [1,2-propanediol

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Chow, Tina Kuo Fung.

    1992-05-01

    The complexing extractant investigated in this work is 3-nitrophenylboronic acid (NPBA) in its anionic form (NPB-). NPBA and Aliquat 336 (a quaternary amine) are dissolved in 2-ethyl-1-hexanol, and the extractant is contacted with aq. NaOH. Solutes investigated were 1,2-propanediol, glycerol, fructose, sorbitol and lactic acid. Batch extraction experiments were performed at 25°C. Partition coefficients, distribution ratios and loadings are reported for varying concentrations of solute and NPB-. All solutes complexed with NPB-, with all complexes containing only one NPB- per complex. The 1:1 complexation constants for the solutes glycerol, fructose and sorbitol follow trends similar to complexation with B(OH)4- (aq.), i.e. the complexation constants increase with increasing number of -OH groups available for complexation. The assumption of a 1:1 complex is not valid for 1,2-propanediol, which showed overloading (more than one mole of solute complexed to one mole of NPB-) at higher concentrations. The -OH group on the NPB- which is left uncomplexed after one solute molecule has bound to the other two -OH groups may be responsible for the overloading. Overloading is also observed in the extraction of lactic acid, but through a different mechanism. It was found that TOMA+ can extract lactic acid to an extent comparable to the uptake of lactic acid by NPB-. The complexation probably occurs through formation of an acid-base ion pair. Losses of NPBA into the aqueous phase could lead to problems and poor economics in industrial separation processes. One way of overcoming this problem would be to incorporate the NPBA onto a solid support.

  6. BMP, Wnt and FGF signals are integrated through evolutionarily conserved enhancers to achieve robust expression of Pax3 and Zic genes at the zebrafish neural plate border

    PubMed Central

    Garnett, Aaron T.; Square, Tyler A.; Medeiros, Daniel M.

    2012-01-01

    Neural crest cells generate a range of cells and tissues in the vertebrate head and trunk, including peripheral neurons, pigment cells, and cartilage. Neural crest cells arise from the edges of the nascent central nervous system, a domain called the neural plate border (NPB). NPB induction is known to involve the BMP, Wnt and FGF signaling pathways. However, little is known about how these signals are integrated to achieve temporally and spatially specific expression of genes in NPB cells. Furthermore, the timing and relative importance of these signals in NPB formation appears to differ between vertebrate species. Here, we use heat-shock overexpression and chemical inhibitors to determine whether, and when, BMP, Wnt and FGF signaling are needed for expression of the NPB specifiers pax3a and zic3 in zebrafish. We then identify four evolutionarily conserved enhancers from the pax3a and zic3 loci and test their response to BMP, Wnt and FGF perturbations. We find that all three signaling pathways are required during gastrulation for the proper expression of pax3a and zic3 in the zebrafish NPB. We also find that, although the expression patterns driven by the pax3a and zic3 enhancers largely overlap, they respond to different combinations of BMP, Wnt and FGF signals. Finally, we show that the combination of the two pax3a enhancers is less susceptible to signaling perturbations than either enhancer alone. Taken together, our results reveal how BMPs, FGFs and Wnts act cooperatively and redundantly through partially redundant enhancers to achieve robust, specific gene expression in the zebrafish NPB. PMID:23034628

  7. Unstructured Adaptive (UA) NAS Parallel Benchmark. Version 1.0

    NASA Technical Reports Server (NTRS)

    Feng, Huiyu; VanderWijngaart, Rob; Biswas, Rupak; Mavriplis, Catherine

    2004-01-01

    We present a complete specification of a new benchmark for measuring the performance of modern computer systems when solving scientific problems featuring irregular, dynamic memory accesses. It complements the existing NAS Parallel Benchmark suite. The benchmark involves the solution of a stylized heat transfer problem in a cubic domain, discretized on an adaptively refined, unstructured mesh.

  8. Purification and crystal growth of NPB via imidazolium based ionic liquids

    NASA Astrophysics Data System (ADS)

    Oh, Yong-Taeg; Shin, Dong-Chan

    2018-04-01

    Here we report the production of a high-purity, high-crystallinity organic electronic material, NPB (N,N‧-di-[(1-naphthyl)-N,N‧-diphenyl]-1,1‧-biphenyl-4,4‧-diamine, C44H32N2), through solution recrystallization in imidazolium-based ionic liquids. When low-purity NPB was recrystallized at 170 °C in C8MIM[TFSI], its purity was drastically improved from 82% to 99.92%. The recrystallized NPB crystals showed a 0.040° FWHM (full width at half maximum) of the X-ray (1 1 1) diffraction peak; such a small FWHM indicates single-crystal-like crystallinity. The initial NPB powder was dissolved at 100 °C and recrystallized at temperatures above 110 °C. At the higher temperature of 170 °C, a smaller number of bigger crystals formed compared to those at 110 °C, which is well explained by classical nucleation and growth theory. Therefore, solution recrystallization using ionic liquids may be promising for mass production of organic electronic materials, replacing the widely used sublimation purification method.
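
    The "fewer but bigger crystals at higher temperature" observation follows from classical nucleation theory: as supersaturation falls, the critical nucleus radius grows and fewer nuclei form. A minimal sketch, with the interfacial energy, molecular volume, and supersaturation all assumed values rather than measured ones:

      import math
      kB = 1.380649e-23                          # Boltzmann constant, J/K

      def critical_radius(gamma, v_m, T, S):
          # r* = 2*gamma*v_m / (kB*T*ln S); larger r* -> fewer nuclei, bigger crystals
          return 2 * gamma * v_m / (kB * T * math.log(S))

      print(critical_radius(0.02, 1.0e-27, 443.0, 1.5))   # 170 °C, assumed parameters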

  9. New methodology for Ozone Depletion Potentials of short-lived compounds: n-Propyl bromide as an example

    NASA Astrophysics Data System (ADS)

    Wuebbles, Donald J.; Patten, Kenneth O.; Johnson, Matthew T.; Kotamarthi, Rao

    2001-07-01

    A number of the compounds proposed as replacements for substances controlled under the Montreal Protocol have extremely short atmospheric lifetimes, on the order of days to a few months. An important example is n-propyl bromide (also referred to as 1-bromopropane, CH2BrCH2CH3, or simplified as 1-C3H7Br or nPB). This compound, useful as a solvent, has an atmospheric lifetime of less than 20 days due to its reaction with the hydroxyl radical. Because nPB contains bromine, any amount reaching the stratosphere has the potential to affect concentrations of stratospheric ozone. The definition of Ozone Depletion Potentials (ODP) needs to be modified for such short-lived compounds to account for the location and timing of emissions. It is not adequate to treat these chemicals as if they were uniformly emitted at all latitudes and longitudes as normally done for longer-lived gases. Thus, for short-lived compounds, policymakers will need a table of ODP values instead of the single value generally provided in past studies. This study uses the MOZART2 three-dimensional chemical-transport model, in combination with our less computationally expensive two-dimensional model, to examine potential effects of nPB on stratospheric ozone. Multiple facets of this study examine key questions regarding the amount of bromine reaching the stratosphere following emission of nPB. Our most significant findings regarding the ozone effects of short-lived replacement compounds are summarized as follows. The degradation of nPB produces a significant quantity of bromoacetone, which increases the amount of bromine transported to the stratosphere due to nPB. However, much of that effect is not due to bromoacetone itself, but instead to inorganic bromine which is produced from tropospheric oxidation of nPB, bromoacetone, and other degradation products and is transported above the dry and wet deposition processes of the model. The MOZART2 nPB results indicate that only a minimal correction of the two-dimensional results is needed to derive our final results: an nPB chemical lifetime of 19 days and an Ozone Depletion Potential range of 0.033 to 0.040 for assumed global emissions over landmasses, 19 days and 0.021 to 0.028, respectively, for assumed emissions in the industrialized regions of the Northern Hemisphere, and 9 days and 0.087 to 0.105, respectively, for assumed emission in tropical Southeast Asia.
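
    As a rough sanity check, the crude semi-empirical ODP scaling, which ignores exactly the emission-location and transport effects this study emphasizes, lands in the reported range. The bromine efficiency factor and CFC-11 constants below are typical literature values, not taken from this study, so treat the sketch as illustrative only.

      def odp_semi_empirical(tau_days, molar_mass, n_br, alpha=60.0,
                             tau_cfc11_yr=52.0, m_cfc11=137.37):
          # ODP ~ alpha * (tau_x / tau_CFC11) * (M_CFC11 / M_x) * (n_Br / 3)
          return (alpha * (tau_days / 365.0 / tau_cfc11_yr)
                  * (m_cfc11 / molar_mass) * (n_br / 3.0))

      print(odp_semi_empirical(19.0, 123.0, 1))   # ~0.022, within the 0.021-0.028 range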

  10. Effect of SiO2/Si3N4 dielectric distributed Bragg reflectors (DDBRs) for Alq3/NPB thin-film resonant cavity organic light emitting diodes

    NASA Astrophysics Data System (ADS)

    Lei, Po-Hsun; Wang, Shun-Hsi; Juang, Fuh Shyang; Tseng, Yung-Hsin; Chung, Meng-Jung

    2010-05-01

    In this article, we report on the effect of SiO2/Si3N4 dielectric distributed Bragg reflectors (DDBRs) for Alq3/NPB thin-film resonant cavity organic light emitting diodes (RCOLEDs) in increasing the light output intensity and reducing the linewidth of the spontaneous emission spectrum. The optimum DDBR number is found to be 3 pairs; device performance degrades when the number of DDBR pairs is further increased or decreased. Compared to the conventional Alq3/NPB thin-film organic light emitting diode (OLED), the Alq3/NPB thin-film RCOLED with 3-pair DDBRs has superior electrical and optical characteristics, including a forward voltage of 6 V, a current efficiency of 3.4 cd/A, a luminance of 2715 cd/m2 under an injection current density of 1000 A/m2, and a full width at half maximum (FWHM) of 12 nm for the emission spectrum over the 5-9 V bias range. These results indicate that the Alq3/NPB thin-film OLED with DDBRs shows potential as a light source for plastic optical fiber (POF) communication systems.
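
    Why a few pairs already give a useful mirror can be explored with the textbook quarter-wave-stack reflectance. The refractive indices below are typical assumed values for SiO2, Si3N4, and glass, not the paper's measured ones, and the formula is the standard normal-incidence expression rather than anything taken from this work.

      def dbr_peak_reflectance(n0, n_low, n_high, n_sub, pairs):
          # Normal-incidence peak reflectance of a quarter-wave (HL)^N stack
          q = (n_sub / n0) * (n_low / n_high) ** (2 * pairs)
          return ((1 - q) / (1 + q)) ** 2

      for n_pairs in range(1, 6):
          print(n_pairs, round(dbr_peak_reflectance(1.0, 1.46, 2.02, 1.52, n_pairs), 3))

    Reflectance rises monotonically with pair count, so the 3-pair optimum in the device plausibly comes from the cavity and outcoupling trade-off rather than from the mirror alone.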

  11. Deep blue exciplex organic light-emitting diodes with enhanced efficiency; P-type or E-type triplet conversion to singlet excitons?

    PubMed

    Jankus, Vygintas; Chiang, Chien-Jung; Dias, Fernando; Monkman, Andrew P

    2013-03-13

    Simple trilayer, deep blue, fluorescent exciplex organic light-emitting diodes (OLEDs) are reported. These OLEDs emit from an exciplex state formed between the highest occupied molecular orbital (HOMO) of N,N'-bis(1-naphthyl)N,N'-diphenyl-1,1'-biphenyl-4,4'-diamine (NPB) and the lowest unoccupied molecular orbital (LUMO) of 1,3,5-tri(1-phenyl-1H-benzo[d]imidazol-2-yl)phenyl (TPBi), and from the NPB singlet manifold, yielding 2.7% external quantum efficiency at 450 nm. It is shown that the majority of the delayed emission in electroluminescence arises from P-type triplet fusion at NPB sites, not E-type reverse intersystem crossing, because the NPB triplet state acts as a deep trap.

  12. Preliminary study of propyl bromide exposure among New Jersey dry cleaners as a result of a pending ban on perchloroethylene.

    PubMed

    Blando, James D; Schill, Donald P; De La Cruz, Mary Pauline; Zhang, Lin; Zhang, Junfeng

    2010-09-01

    Many states are considering, and some states have actively pursued, banning the use of perchloroethylene (PERC) in dry cleaning establishments. Proposed legislation has led many dry cleaners to consider the use of products that contain greater than 90% n-propyl bromide (n-PB; also called 1-bromopropane or 1-BP). Very little is known about toxicity of and exposure to n-PB. Some n-PB-containing products are marketed as nonhazardous and "green" or "organic." This has led some users to perceive the solvent as nontoxic and has resulted in at least one significant poisoning incident in New Jersey. In addition, many dry cleaning operators may not realize that the machine components and settings must be changed when converting from PERC to n-PB-containing products. Not performing these modifications may result in overheating and significant leaks in the dry cleaning equipment. A preliminary investigation was conducted of the potential exposures to n-PB and isopropyl bromide (iso-PB; also called 2-bromopropane or 2-BP) among dry cleaners in New Jersey who have converted their machines from PERC to these new solvent products. Personal breathing zone and area samples were collected using National Institute for Occupational Safety and Health Sampling and Analytical Method 1025, with a slight modification of the gas chromatography conditions to facilitate better separation of n-PB from iso-PB. During the preliminary investigation, exposures to n-PB measured among some workers in two of three shops were greater than the American Conference of Governmental Industrial Hygienists (ACGIH) threshold limit value (TLV) for n-PB. The highest exposure, measured for a dry cleaning machine operator, was 54 parts per million (ppm) as an 8-hr time-weighted average, which is more than 5 times the ACGIH TLV of 10 ppm. The preliminary investigation also found that the work tasks most likely to result in the highest short-term exposures included the introduction of solvent to the machine, maintenance of the machine, unloading and handling of recently cleaned clothes, and interrupting the wash cycle of the machine. In addition, this assessment suggested that leaks may have contributed to exposure and may have resulted from normal machine wear over time, ineffective maintenance, and the incompatibility of n-PB with gasket materials.
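
    The 8-hr time-weighted average quoted above is a standard computation. In the sketch, the task profile is invented; only the 10 ppm TLV comes from the text.

      def twa_8hr(samples):
          # samples: (concentration_ppm, duration_hr) pairs covering the shift
          return sum(c * t for c, t in samples) / 8.0

      exposure = twa_8hr([(150.0, 0.5), (45.0, 6.0), (0.0, 1.5)])  # hypothetical shift
      print(f"{exposure:.1f} ppm TWA, exceeds TLV: {exposure > 10.0}")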

  13. Challenges and Opportunities for Biophotonic Devices in the Liquid State and the Solid State

    DTIC Science & Technology

    2006-07-01

    The luminance of the NPB:Eu device and of a baseline device (without the NPB layer and emitting from the Alq3 layer) is compared as a function of current density. The luminance of the NPB:Eu device is clearly superior, with a maximum of 590 cd/m2 at 375 mA/cm2, whereas the Alq3 OLED peaks at only 45 cd/m2 at 30 mA/cm2. [Figure: luminance versus current density for the Eu-doped BioLED and for the baseline Alq3 device.]

  14. An Application-Based Performance Evaluation of NASA's Nebula Cloud Computing Platform

    NASA Technical Reports Server (NTRS)

    Saini, Subhash; Heistand, Steve; Jin, Haoqiang; Chang, Johnny; Hood, Robert T.; Mehrotra, Piyush; Biswas, Rupak

    2012-01-01

    The high performance computing (HPC) community has shown tremendous interest in exploring cloud computing because of its high potential. In this paper, we examine the feasibility, performance, and scalability of production-quality scientific and engineering applications of interest to NASA on NASA's cloud computing platform, called Nebula, hosted at Ames Research Center. This work represents a comprehensive evaluation of Nebula using NUTTCP, HPCC, NPB, I/O, and MPI function benchmarks as well as four applications representative of the NASA HPC workload. Specifically, we compare Nebula performance on some of these benchmarks and applications to that of NASA's Pleiades supercomputer, a traditional HPC system. We also investigate the impact of virtIO and jumbo frames on interconnect performance. Overall results indicate that on Nebula (i) virtIO and jumbo frames improve network bandwidth by a factor of 5x, (ii) there is a significant virtualization layer overhead of about 10% to 25%, (iii) write performance is lower by a factor of 25x, (iv) latency for short MPI messages is very high, and (v) overall performance is 15% to 48% lower than that on Pleiades for NASA HPC applications. We also comment on the usability of the cloud platform.

  15. DNA as an Optical Material

    DTIC Science & Technology

    2011-07-01

    the electron blocking function of the DNA layer; electroluminescence occurs in either the AlQ3 (green) or the NPB (blue) layer. This behavior has been observed for several fluorescent materials with different HOMO/LUMO levels, including AlQ3 (green emission) and NPB (blue emission).

  16. Predictive cytogenetic biomarkers for colorectal neoplasia in medium risk patients

    PubMed Central

    Ionescu, EM; Nicolaie, T; Ionescu, MA; Becheanu, G; Andrei, F; Diculescu, M; Ciocirlan, M

    2015-01-01

    Rationale: DNA damage and chromosomal alterations in peripheral lymphocytes parallel DNA mutations in tumor tissues. Objective: The aim of our study was to predict the presence of neoplastic colorectal lesions by specific biomarkers in “medium risk” individuals (age 50 to 75, with no personal or family history of any colorectal neoplasia). Methods and Results: We designed a prospective cohort observational study including patients undergoing diagnostic or opportunistic screening colonoscopy. Specific biomarkers were analyzed for each patient in peripheral lymphocytes - presence of micronuclei (MN), nucleoplasmic bridges (NPB) and the Nuclear Division Index (NDI) by the cytokinesis-blocked micronucleus assay (CBMN). Of 98 patients included, 57 were “medium risk” individuals. MN frequency and NPB presence were not significantly different in patients with neoplastic lesions compared to controls. In “medium risk” individuals, mean NDI was significantly lower for patients with any neoplastic lesions (adenomas and adenocarcinomas, AUROC 0.668, p 0.005), for patients with advanced neoplasia (advanced adenoma and adenocarcinoma, AUROC 0.636, p 0.029) as well as for patients with adenocarcinoma (AUROC 0.650, p 0.048), for each comparison with the rest of the population. For a cut-off of 1.8, in “medium risk” individuals, an NDI below that value may predict any neoplastic lesion with a sensitivity of 97.7%, an advanced neoplastic lesion with a sensitivity of 97% and adenocarcinoma with a sensitivity of 94.4%. Discussion: The NDI score may have a role as a colorectal cancer-screening test in “medium risk” individuals. Abbreviations: DNA = deoxyribonucleic acid; CRC = colorectal cancer; EU = European Union; WHO = World Health Organization; FOBT = fecal occult blood test; CBMN = cytokinesis-blocked micronucleus assay; MN = micronuclei; NPB = nucleoplasmic bridges; NDI = Nuclear Division Index; FAP = familial adenomatous polyposis; HNPCC = hereditary non-polyposis colorectal cancer; IBD = inflammatory bowel diseases; ROC = receiver operating characteristics; AUROC = area under the receiver operating characteristics curve. PMID:26351547

  17. Intense deep blue exciplex electroluminescence from NPB/TPBi:PPh3O-based OLEDs and their intrinsic degradation mechanisms (Conference Presentation)

    NASA Astrophysics Data System (ADS)

    Shinar, Joseph; Hippola, Chamika; Danilovic, Dusan; Bhattacharjee, Ujjal; Petrich, Jacob W.; Shinar, Ruth

    2016-09-01

    We describe intense and efficient deep blue (430 - 440 nm) exciplex emission from NPB/TPBi:PPh3O OLEDs, where the luminous efficiency approaches 4 cd/A and the maximal brightness exceeds 22,000 cd/m2. Time-resolved PL measurements confirm the exciplex emission from NPB:TPBi, as studied earlier by Monkman and coworkers [Adv. Mater. 25, 1455 (2013)]. However, the inclusion of PPh3O improves the OLED performance significantly. The effect of PPh3O on the EL and PL will be discussed. The NPB/TPBi:PPh3O-based OLEDs were also studied by optically and electrically detected magnetic resonance (ODMR and EDMR, respectively). In particular, the amplitude of the negative (EL- and current-quenching) spin-1/2 resonance, previously attributed to enhanced formation of strongly EL-quenching positive bipolarons, increases as the OLEDs degrade in a dry nitrogen atmosphere. This degradation mechanism is discussed in relation to degradation induced by hot polarons that are energized by exciton annihilation.

  18. Employing Nested OpenMP for the Parallelization of Multi-Zone Computational Fluid Dynamics Applications

    NASA Technical Reports Server (NTRS)

    Ayguade, Eduard; Gonzalez, Marc; Martorell, Xavier; Jost, Gabriele

    2004-01-01

    In this paper we describe the parallelization of the multi-zone code versions of the NAS Parallel Benchmarks employing multi-level OpenMP parallelism. For our study we use the NanosCompiler, which supports nesting of OpenMP directives and provides clauses to control the grouping of threads, load balancing, and synchronization. We report the benchmark results, compare the timings with those of different hybrid parallelization paradigms and discuss OpenMP implementation issues that affect the performance of multi-level parallel applications.

  19. The OpenMP Implementation of NAS Parallel Benchmarks and its Performance

    NASA Technical Reports Server (NTRS)

    Jin, Hao-Qiang; Frumkin, Michael; Yan, Jerry

    1999-01-01

    As the new ccNUMA architecture became popular in recent years, parallel programming with compiler directives on these machines has evolved to accommodate new needs. In this study, we examine the effectiveness of OpenMP directives for parallelizing the NAS Parallel Benchmarks. Implementation details will be discussed and performance will be compared with the MPI implementation. We have demonstrated that OpenMP can achieve very good results for parallelization on a shared memory system, but effective use of memory and cache is very important.

  20. Performance Modeling and Measurement of Parallelized Code for Distributed Shared Memory Multiprocessors

    NASA Technical Reports Server (NTRS)

    Waheed, Abdul; Yan, Jerry

    1998-01-01

    This paper presents a model to evaluate the performance and overhead of parallelizing sequential code using compiler directives for multiprocessing on distributed shared memory (DSM) systems. With the increasing popularity of shared address space architectures, it is essential to understand their performance impact on programs that benefit from shared memory multiprocessing. We present a simple model to characterize the performance of programs that are parallelized using compiler directives for shared memory multiprocessing. We parallelized the sequential implementation of the NAS benchmarks using native Fortran77 compiler directives for an Origin2000, which is a DSM system based on a cache-coherent Non Uniform Memory Access (ccNUMA) architecture. We report measurement-based performance of these parallelized benchmarks from four perspectives: efficacy of the parallelization process; scalability; parallelization overhead; and comparison with hand-parallelized and -optimized versions of the same benchmarks. Our results indicate that sequential programs can conveniently be parallelized for DSM systems using compiler directives, but realizing performance gains as predicted by the performance model depends primarily on minimizing architecture-specific data locality overhead.
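
    The abstract does not reproduce the paper's model, but such directive-parallelization models generically combine an Amdahl term with an architecture-specific overhead term. The sketch below is that generic form, with every coefficient assumed, not the paper's fitted model.

      def predicted_speedup(p, f_par, overhead=0.0):
          # f_par: parallelized fraction of the run time; overhead: per-processor
          # cost (e.g., ccNUMA data locality penalties), both hypothetical here
          return 1.0 / ((1.0 - f_par) + f_par / p + overhead * p)

      for procs in (1, 4, 16, 64):
          print(procs, round(predicted_speedup(procs, 0.95, 0.002), 2))

    With a nonzero overhead term, predicted speedup peaks and then falls as processors are added, which is the qualitative behavior data locality overhead produces on ccNUMA systems.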

  1. MPI, HPF or OpenMP: A Study with the NAS Benchmarks

    NASA Technical Reports Server (NTRS)

    Jin, Hao-Qiang; Frumkin, Michael; Hribar, Michelle; Waheed, Abdul; Yan, Jerry; Saini, Subhash (Technical Monitor)

    1999-01-01

    Porting applications to new high performance parallel and distributed platforms is a challenging task. Writing parallel code by hand is time consuming and costly, but the task can be simplified by high level languages and would ideally be automated by parallelizing tools and compilers. The definition of the HPF (High Performance Fortran, based on the data parallel model) and OpenMP (based on the shared memory parallel model) standards has offered great opportunity in this respect. Both provide simple and clear interfaces to languages like Fortran and simplify many tedious tasks encountered in writing message passing programs. In our study we implemented the parallel versions of the NAS Benchmarks with HPF and OpenMP directives. Comparison of their performance with the MPI implementation and the pros and cons of the different approaches will be discussed, along with our experience of using computer-aided tools to help parallelize these benchmarks. Based on the study, the potential of applying some of the techniques to realistic aerospace applications will be presented.
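
    For contrast with the directive-based approaches, the message-passing style used by the MPI implementations looks like this minimal mpi4py reduction. This is a generic sketch, not NPB code; the kernel being summed is an arbitrary stand-in.

      from mpi4py import MPI
      import numpy as np

      comm = MPI.COMM_WORLD
      rank, size = comm.Get_rank(), comm.Get_size()

      # Each rank owns a strided slice of the global iteration space
      n = 1_000_000
      local_sum = np.arange(rank, n, size, dtype=np.float64).sum()

      total = comm.reduce(local_sum, op=MPI.SUM, root=0)   # explicit communication
      if rank == 0:
          print(total)               # run with: mpiexec -n 4 python sum_mpi.py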

  2. MPI, HPF or OpenMP: A Study with the NAS Benchmarks

    NASA Technical Reports Server (NTRS)

    Jin, H.; Frumkin, M.; Hribar, M.; Waheed, A.; Yan, J.; Saini, Subhash (Technical Monitor)

    1999-01-01

    Porting applications to new high performance parallel and distributed platforms is a challenging task. Writing parallel code by hand is time consuming and costly, but this task can be simplified by high level languages and would ideally be automated by parallelizing tools and compilers. The definition of the HPF (High Performance Fortran, based on the data parallel model) and OpenMP (based on the shared memory parallel model) standards has offered great opportunity in this respect. Both provide simple and clear interfaces to languages like Fortran and simplify many tedious tasks encountered in writing message passing programs. In our study, we implemented the parallel versions of the NAS Benchmarks with HPF and OpenMP directives. Comparison of their performance with the MPI implementation and the pros and cons of the different approaches will be discussed, along with our experience of using computer-aided tools to help parallelize these benchmarks. Based on the study, the potential of applying some of the techniques to realistic aerospace applications will be presented.

  3. Implementation of the NAS Parallel Benchmarks in Java

    NASA Technical Reports Server (NTRS)

    Frumkin, Michael A.; Schultz, Matthew; Jin, Haoqiang; Yan, Jerry; Biegel, Bryan (Technical Monitor)

    2002-01-01

    Several features make Java an attractive choice for High Performance Computing (HPC). In order to gauge the applicability of Java to Computational Fluid Dynamics (CFD), we have implemented the NAS (NASA Advanced Supercomputing) Parallel Benchmarks in Java. The performance and scalability of the benchmarks point out the areas where improvement in Java compiler technology and in Java thread implementation would position Java closer to Fortran in the competition for CFD applications.

  4. Performance and Scalability of the NAS Parallel Benchmarks in Java

    NASA Technical Reports Server (NTRS)

    Frumkin, Michael A.; Schultz, Matthew; Jin, Haoqiang; Yan, Jerry; Biegel, Bryan A. (Technical Monitor)

    2002-01-01

    Several features make Java an attractive choice for scientific applications. In order to gauge the applicability of Java to Computational Fluid Dynamics (CFD), we have implemented the NAS (NASA Advanced Supercomputing) Parallel Benchmarks in Java. The performance and scalability of the benchmarks point out the areas where improvement in Java compiler technology and in Java thread implementation would position Java closer to Fortran in the competition for scientific applications.

  5. A Comparison of Automatic Parallelization Tools/Compilers on the SGI Origin 2000 Using the NAS Benchmarks

    NASA Technical Reports Server (NTRS)

    Saini, Subhash; Frumkin, Michael; Hribar, Michelle; Jin, Hao-Qiang; Waheed, Abdul; Yan, Jerry

    1998-01-01

    Porting applications to new high performance parallel and distributed computing platforms is a challenging task. Since writing parallel code by hand is extremely time consuming and costly, porting codes would ideally be automated by using parallelization tools and compilers. In this paper, we compare the performance of the hand-written NAS Parallel Benchmarks against three parallel versions generated with the help of tools and compilers: 1) CAPTools, an interactive computer-aided parallelization tool that generates message passing code; 2) the Portland Group's HPF compiler; and 3) compiler directives with the native FORTRAN77 compiler on the SGI Origin2000.

  6. A flexible top-emitting organic light-emitting diode on steel foil

    NASA Astrophysics Data System (ADS)

    Xie, Zhiyuan; Hung, Liang-Sun; Zhu, Furong

    2003-11-01

    An efficient flexible top-emitting organic light-emitting diode (FTOLED) was developed on a thin steel foil. The FTOLED was constructed on the spin-on-glass (SOG)-coated steel substrate with an organic stack of NPB/Alq3 sandwiched between a highly reflective Ag anode and a semitransparent Sm cathode. An ultrathin plasma-polymerized hydrocarbon film (CFx) was interposed between the Ag anode and the NPB layer to enhance hole injection, and an additional Alq3 layer was overlaid on the Sm cathode to increase light output. The FTOLED showed a peak efficiency of 4.4 cd/A, higher than the 3.7 cd/A of a conventional NPB/Alq3-based bottom-emitting OLED.

  7. Implementation of NAS Parallel Benchmarks in Java

    NASA Technical Reports Server (NTRS)

    Frumkin, Michael; Schultz, Matthew; Jin, Hao-Qiang; Yan, Jerry

    2000-01-01

    A number of features make Java an attractive but debatable choice for High Performance Computing (HPC). In order to gauge the applicability of Java to Computational Fluid Dynamics (CFD), we have implemented the NAS Parallel Benchmarks in Java. The performance and scalability of the benchmarks point out the areas where improvement in Java compiler technology and in Java thread implementation would move Java closer to Fortran in the competition for CFD applications.

  8. Increased micronucleus, nucleoplasmic bridge, and nuclear bud frequencies in the peripheral blood lymphocytes of diesel engine exhaust-exposed workers.

    PubMed

    Zhang, Xiao; Duan, Huawei; Gao, Feng; Li, Yuanyuan; Huang, Chuanfeng; Niu, Yong; Gao, Weimin; Yu, Shanfa; Zheng, Yuxin

    2015-02-01

    The International Agency for Research on Cancer has recently reclassified diesel engine exhaust (DEE) as a Group 1 carcinogen. Micronucleus (MN), nucleoplasmic bridge (NPB), and nuclear bud (NBUD) frequencies in peripheral blood lymphocytes (PBLs) are associated with cancer risk. However, the impact of DEE exposure on MN frequency has not been thoroughly elucidated due to mixed exposure, and its impact on NPB and NBUD frequencies has never been explored in humans. We recruited 117 diesel engine testing workers with exclusive exposure to DEE and 112 non-DEE-exposed workers, and then we measured urinary levels of 4 mono-hydroxylated polycyclic aromatic hydrocarbons (OH-PAHs) using high-performance liquid chromatography-mass spectrometry as well as MN, NPB, and NBUD frequencies in PBLs using the cytokinesis-block MN assay. The DEE-exposed workers exhibited significantly higher MN, NPB, and NBUD frequencies than the non-DEE-exposed workers (P < 0.05). Among all study subjects, increasing levels of all 4 urinary OH-PAHs, on both quartile and continuous scales, were associated with increased MN, NPB, and NBUD frequencies (all P < 0.05). When the associations were analyzed separately in DEE-exposed and non-DEE-exposed workers, we found that the association between increasing quartiles of urinary 9-hydroxyphenanthrene (9-OHPh) and MN frequencies persisted in DEE-exposed workers (P = 0.001). MN frequencies increased, on average, by 23.99% (95% confidence interval, 9.64-39.93) per 1-unit increase in ln-transformed 9-OHPh. Our results clearly show that exposure to DEE can induce increases in MN, NPB, and NBUD frequencies in PBLs and suggest that DEE exposure level is associated with MN frequencies.
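
    The "23.99% per 1-unit increase in ln-transformed 9-OHPh" is the usual transformation of a log-linear regression coefficient. The coefficient below is back-computed for illustration, not taken from the paper.

      import math

      def percent_change(beta):
          # Log-linear model: % change in MN frequency per unit increase in ln(9-OHPh)
          return (math.exp(beta) - 1.0) * 100.0

      print(round(percent_change(0.215), 2))   # ~23.99, matching the reported estimate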

  9. Benchmarking Ada tasking on tightly coupled multiprocessor architectures

    NASA Technical Reports Server (NTRS)

    Collard, Philippe; Goforth, Andre; Marquardt, Matthew

    1989-01-01

    The development of benchmarks and performance measures for parallel Ada tasking is reported with emphasis on the macroscopic behavior of the benchmark across a set of load parameters. The application chosen for the study was the NASREM model for telerobot control, relevant to many NASA missions. The results of the study demonstrate the potential of parallel Ada in accomplishing the task of developing a control system for a system such as the Flight Telerobotic Servicer using the NASREM framework.

  10. Effect of conduction band non-parabolicity on the optical gain of quantum cascade lasers based on the effective two-band finite difference method

    NASA Astrophysics Data System (ADS)

    Cho, Gookbin; Kim, Jungho

    2017-09-01

    We theoretically investigate the effect of conduction band non-parabolicity (NPB) on the optical gain spectrum of quantum cascade lasers (QCLs) using the effective two-band finite difference method. Using the effective two-band model to account for the NPB effect in the multiple quantum wells (QWs), the wave functions and confined energies of electron states are calculated in two different active-region structures, which correspond to three-QW single-phonon and four-QW double-phonon resonance designs. In addition, intersubband optical dipole moments and polar-optical-phonon scattering times are calculated and compared with and without the conduction band NPB effect. Finally, the calculated optical gain spectra are compared for the two QCL structures having the same peak gain wavelength of 8.55 μm. The gain peaks are greatly shifted to longer wavelengths and the overall gain magnitudes are slightly reduced when the NPB effect is considered. Compared with the three-QW active-region design, the redshift of the peak gain is more prominent in the four-QW active-region design, which makes use of higher electronic states for the lasing transition.
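
    The band non-parabolicity entering such two-band calculations is commonly written in Kane form; a standard textbook relation, not necessarily the paper's exact implementation, is

      E(k)\left[1 + \alpha E(k)\right] = \frac{\hbar^2 k^2}{2m^*}, \qquad \alpha \approx \frac{1}{E_g},

    so each confined level is lowered relative to its parabolic value, with higher-lying states lowered more. The intersubband spacing therefore shrinks, which is consistent with the redshift of the gain peak reported above.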

  11. Implementation, capabilities, and benchmarking of Shift, a massively parallel Monte Carlo radiation transport code

    DOE PAGES

    Pandya, Tara M.; Johnson, Seth R.; Evans, Thomas M.; ...

    2015-12-21

    This paper discusses the implementation, capabilities, and validation of Shift, a massively parallel Monte Carlo radiation transport package developed and maintained at Oak Ridge National Laboratory. It has been developed to scale well from laptop to small computing clusters to advanced supercomputers. Special features of Shift include hybrid capabilities for variance reduction such as CADIS and FW-CADIS, and advanced parallel decomposition and tally methods optimized for scalability on supercomputing architectures. Shift has been validated and verified against various reactor physics benchmarks and compares well to other state-of-the-art Monte Carlo radiation transport codes such as MCNP5, CE KENO-VI, and OpenMC. Some specific benchmarks used for verification and validation include the CASL VERA criticality test suite and several Westinghouse AP1000® problems. These benchmark and scaling studies show promising results.
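
    The flavor of such a code can be conveyed by a toy analog Monte Carlo problem, transmission through a purely absorbing slab, which is embarrassingly parallel over particle histories. Everything here is illustrative and bears no relation to Shift's actual solvers.

      import numpy as np

      def transmitted_fraction(sigma_t, thickness, n=1_000_000, seed=42):
          # Sample free-flight path lengths ~ Exp(sigma_t); count slab crossings
          rng = np.random.default_rng(seed)
          paths = rng.exponential(1.0 / sigma_t, n)
          return float(np.mean(paths > thickness))

      print(transmitted_fraction(1.0, 3.0))   # ~exp(-3) ≈ 0.0498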

  12. Final safety analysis report for the Ground Test Accelerator (GTA), Phase 2

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    NONE

    1994-10-01

    This document is the third volume of a 3 volume safety analysis report on the Ground Test Accelerator (GTA). The GTA program at the Los Alamos National Laboratory (LANL) is the major element of the national Neutral Particle Beam (NPB) program, which is supported by the Strategic Defense Initiative Office (SDIO). A principal goal of the national NPB program is to assess the feasibility of using hydrogen and deuterium neutral particle beams outside the Earth's atmosphere. The main effort of the NPB program at Los Alamos concentrates on developing the GTA. The GTA is classified as a low-hazard facility, except for the cryogenic-cooling system, which is classified as a moderate-hazard facility. This volume consists of appendices C through U of the report.

  13. Final safety analysis report for the Ground Test Accelerator (GTA), Phase 2

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    NONE

    1994-10-01

    This document is the first volume of a 3 volume safety analysis report on the Ground Test Accelerator (GTA). The GTA program at the Los Alamos National Laboratory (LANL) is the major element of the national Neutral Particle Beam (NPB) program, which is supported by the Strategic Defense Initiative Office (SDIO). A principal goal of the national NPB program is to assess the feasibility of using hydrogen and deuterium neutral particle beams outside the Earth's atmosphere. The main effort of the NPB program at Los Alamos concentrates on developing the GTA. The GTA is classified as a low-hazard facility, except for the cryogenic-cooling system, which is classified as a moderate-hazard facility. This volume consists of an introduction, summary/conclusion, site description and assessment, description of facility, and description of operation.

  14. The characterization of electroplex generated from the interface between 2-(4-trifluoromethyl-2-hydroxyphenyl)benzothiazole] zinc and N,N'-diphenyl-N,N'- bis(1-naphthyl)-(1,1'-biphenyl)-4,4'-diamine

    NASA Astrophysics Data System (ADS)

    Zhang, Ye; Hao, Yuying; Meng, Weixin; Xu, Huixia; Wang, Hua; Xu, Bingshe

    2012-03-01

    The electroplex between (2-(4-trifluoromethyl-2-hydroxyphenyl)benzothiazole) zinc [Zn(4-TfmBTZ)2] as an electron acceptor and N,N'-diphenyl-N,N'-bis(1-naphthyl)-(1,1'-biphenyl)-4,4'-diamine (NPB) as an electron donor was characterized in bilayer, blend, and multilayer quantum-well (MQW) devices, respectively. The blend composition and quantum-well number are effective parameters for tuning the electroluminescence color. White light with high color purity and a high color rendering index (CRI) was observed from these devices based on Zn(4-TfmBTZ)2/NPB. Moreover, the blend and MQW devices all exhibit high operational stability and hence excellent color stability. For the device with 5 mol% NPB in the blend layer, the Commission Internationale de l'Eclairage (CIE) coordinate region is x=0.28-0.31, y=0.33-0.35 and the CRI is 83.3-91.2 at 5-9 V. For the MQW structure device with NPB of 60 nm thickness, the CIE coordinate region is x=0.29-0.32, y=0.31-0.34 and CRI=87.9-92.5 at 10-15 V. Such high color stability, purity, and CRI, close to ideal white light, are of current importance for white OLEDs.

  15. Characterization of the Hole Transport and Electrical Properties in the Small-Molecule Organic Semiconductors

    NASA Astrophysics Data System (ADS)

    Wang, L. G.; Zhu, J. J.; Liu, X. L.; Cheng, L. F.

    2017-10-01

    In this paper, we investigate the hole transport and electrical properties of a small-molecule organic material, N,N'-bis(1-naphthyl)-N,N'-diphenyl-1,1'-biphenyl-4,4'-diamine (NPB), which is frequently used in organic light-emitting diodes. It is shown that the thickness-dependent current density versus voltage (J-V) characteristics of sandwich-type NPB-based hole-only devices cannot be described well using the conventional mobility model without carrier density or electric field dependence. However, a consistent and excellent description of the thickness-dependent and temperature-dependent J-V characteristics of NPB hole-only devices can be obtained with a single set of parameters by using our recently introduced improved model that takes into account the temperature, carrier density, and electric field dependence of the mobility. For the small-molecule organic semiconductor studied, we find that the width of the Gaussian distribution of the density of states σ and the lattice constant a are similar to the values reported for conjugated polymers. Furthermore, we show that the boundary carrier density has an important effect on the J-V characteristics. Both the maximum of the carrier density and the minimum of the electric field appear near the interface of NPB hole-only devices.
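
    The simplest constant-mobility baseline that such thickness-dependent data are tested against is the Mott-Gurney space-charge-limited current law. The parameter values in this sketch are typical assumed numbers, not fits from the paper.

      EPS0 = 8.8541878128e-12                      # vacuum permittivity, F/m

      def mott_gurney(mu, eps_r, V, L):
          # J = (9/8) * eps * mu * V^2 / L^3  (constant mobility, single carrier)
          return 9.0 / 8.0 * eps_r * EPS0 * mu * V ** 2 / L ** 3

      print(mott_gurney(mu=1e-8, eps_r=3.0, V=5.0, L=100e-9))   # A/m^2, assumed values

    Because J scales as V^2/L^3 with a single mobility value, any systematic thickness or temperature dependence left unexplained in measured J-V curves points to the density- and field-dependent mobility the authors model.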

  16. Comparing the OpenMP, MPI, and Hybrid Programming Paradigm on an SMP Cluster

    NASA Technical Reports Server (NTRS)

    Jost, Gabriele; Jin, Haoqiang; anMey, Dieter; Hatay, Ferhat F.

    2003-01-01

    With the advent of parallel hardware and software technologies users are faced with the challenge to choose a programming paradigm best suited for the underlying computer architecture. With the current trend in parallel computer architectures towards clusters of shared memory symmetric multi-processors (SMP), parallel programming techniques have evolved to support parallelism beyond a single level. Which programming paradigm is the best will depend on the nature of the given problem, the hardware architecture, and the available software. In this study we will compare different programming paradigms for the parallelization of a selected benchmark application on a cluster of SMP nodes. We compare the timings of different implementations of the same CFD benchmark application employing the same numerical algorithm on a cluster of Sun Fire SMP nodes. The rest of the paper is structured as follows: In section 2 we briefly discuss the programming models under consideration. We describe our compute platform in section 3. The different implementations of our benchmark code are described in section 4 and the performance results are presented in section 5. We conclude our study in section 6.

  17. Effect of organic small-molecule hole injection materials on the performance of inverted organic solar cells

    NASA Astrophysics Data System (ADS)

    Li, Jie; Zheng, Yifan; Zheng, Ding; Yu, Junsheng

    2016-07-01

    In this study, the influence of small-molecule organic hole injection materials used as the hole transport layer (HTL) on the performance of organic solar cells (OSCs) with an architecture of ITO/ZnO/P3HT:PC71BM/HTL/Ag has been investigated. A significant enhancement of OSC performance, from 1.06% to 2.63%, is obtained by using an N,N‧-bis(1-naphthalenyl)-N,N‧-bis-phenyl-(1,1‧-biphenyl)-4,4‧-diamine (NPB) HTL. Through resistance simulation and space-charge limited current analysis, we found that the NPB HTL not only improves the hole mobility of the device but also forms an Ohmic contact between the active layer and the anode. Moreover, when a mixed HTL is applied by depositing NPB on the surface of molybdenum oxide, the power conversion efficiency of the OSC is further improved to 2.96%.

  18. Synthesis of Co3O4 nanoparticles with block and sphere morphology, and investigation into the influence of morphology on biological toxicity

    PubMed Central

    RAMAN, VENKATARAMANAN; SURESH, SHRUTHI; SAVARIMUTHU, PHILIP ANTHONY; RAMAN, THIAGARAJAN; TSATSAKIS, ARISTIDES MICHAEL; GOLOKHVAST, KIRIL SERGEEVICH; VADIVEL, VINOD KUMAR

    2016-01-01

    In the present study, cobalt oxide (Co3O4) magnetic nanoparticles with block and sphere morphologies were synthesized using various surfactants, and the toxicity of the particles was analyzed by monitoring biomarkers of nanoparticle toxicity in zebrafish. The use of tartarate as a surfactant produced highly crystalline blocks of Co3O4 nanoparticles with pores on the sides, whereas citrate led to the formation of nanoparticles with a spherical morphology. Co3O4 structure, crystallinity, size and morphology were studied using X-ray diffractograms and field emission scanning electron microscopy. Following an increase in nanoparticle concentration from 1 to 200 ppm, there was a corresponding increase in nitric oxide (NO) generation induced by both types of nanoparticles [Co3O4-NP-B (block), r=0.953; Co3O4-NP-S (sphere), r=1.140]. Comparative analyses indicated that both types of nanoparticle produced significant stimulation at ≥5 ppm (P<0.05) compared with a control. Upon analyzing the effect of nanoparticle morphology on NO generation, it was observed that Co3O4-NP-S was more effective compared with Co3O4-NP-B (5 and 100 ppm, P<0.05; 200 ppm, P<0.01). Exposure to both types of nanoparticles produced a reduction in liver glutathione (GSH) activity with a corresponding increase in dose (Co3O4-NP-B, r=−0.359; Co3O4-NP-S, r=−0.429). However, subsequent analyses indicated that Co3O4-NP-B was more potent in inhibiting liver GSH activity compared with Co3O4-NP-S. Co3O4-NP-B proved to be toxic at 5 ppm (P<0.05) and GSH activity was almost completely inhibited at 200 ppm. A similar toxicity was observed with both types of Co3O4-NPs against brain levels of acetylcholinesterase (AChE; Co3O4-NP-B, r=−0.180; Co3O4-NP-S, r=−0.230), indicating the ability of the synthesized Co3O4-NPs to cross the blood-brain barrier and produce neuronal toxicity. Co3O4-NP-B showed increased inhibition of brain AChE activity compared with Co3O4-NP-S (1, 5, and 10 ppm, P<0.05; 50, 100 and 200 ppm, P<0.01). These results suggested that the morphology and surface area of the nanoparticles contribute to toxicity, which may have implications for their biological application. PMID:26893646

  19. Cloud-Coffee: implementation of a parallel consistency-based multiple alignment algorithm in the T-Coffee package and its benchmarking on the Amazon Elastic-Cloud.

    PubMed

    Di Tommaso, Paolo; Orobitg, Miquel; Guirado, Fernando; Cores, Fernado; Espinosa, Toni; Notredame, Cedric

    2010-08-01

    We present the first parallel implementation of the T-Coffee consistency-based multiple aligner. We benchmark it on the Amazon Elastic Cloud (EC2) and show that the parallelization procedure is reasonably effective. We also conclude that for a web server with moderate usage (10K hits/month) the cloud provides a cost-effective alternative to in-house deployment. T-Coffee is a freeware open source package available from http://www.tcoffee.org/homepage.html

  20. Polyaniline as a new type of hole-transporting material to significantly increase the solar water splitting performance of BiVO4 photoanodes

    NASA Astrophysics Data System (ADS)

    Wang, Xiaojun; Ye, Kai-Hang; Yu, Xiang; Zhu, Jiaqian; Zhu, Yi; Zhang, Yuanming

    2018-07-01

    Polyaniline (PANI), with its low cost, chemical stability and high conductivity, is used as a hole-transporting layer to fabricate a NiOOH/PANI/BiVO4 (NPB) photoanode, whose photoelectrochemical (PEC) water splitting performance is significantly enhanced. The remarkable water oxidation photocurrent of the NPB photoanode reaches 3.31 mA cm-2 at 1.23 V vs. RHE under AM 1.5G solar light irradiation, greatly increased compared with that of pristine BiVO4 (0.89 mA cm-2 under the same conditions). The maximal incident photon-to-current conversion efficiency reaches 83.3% at 430 nm at 1.23 V vs. RHE and the maximal applied bias photon-to-current efficiency reaches 1.20% at 0.68 V vs. RHE, nearly five and ten times higher, respectively, than those of the pristine BiVO4 photoanode. The NPB photoanode exhibits excellent stability, with about 97.22% Faraday efficiency after PEC water splitting for 3 h. These exciting results demonstrate that PANI shows great potential as a hole-transporting layer for photoanodes and that NPB is an efficient and stable photoanode material with great potential application in PEC water splitting. Overall, this work provides an excellent reference for designing and fabricating future photoanode materials.
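
    The applied-bias photon-to-current efficiency quoted above follows from the standard definition. In this sketch the photocurrent value is an assumed round number chosen to reproduce the reported figure, not a value taken from the paper.

      def abpe_percent(j_ma_cm2, v_bias, p_in_mw_cm2=100.0):
          # ABPE = J * (1.23 V - V_bias) / P_in, for AM 1.5G (100 mW/cm^2)
          return j_ma_cm2 * (1.23 - v_bias) / p_in_mw_cm2 * 100.0

      print(round(abpe_percent(2.2, 0.68), 2))   # ~1.21%, near the reported 1.20%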

  1. White organic light-emitting diodes based on doped and ultrathin Rubrene layer

    NASA Astrophysics Data System (ADS)

    Li, Yi; Jiang, Yadong; Wen, Wen; Yu, Junsheng

    2010-10-01

    White organic light-emitting diodes (WOLEDs) based on the yellow fluorescent dye 5,6,11,12-tetraphenylnaphthacene (Rubrene) were fabricated, utilizing both a doped structure and an ultrathin-layer structure. By doping Rubrene into blue-emitting N,N'-bis-(1-naphthyl)-N,N'-biphenyl-1,1'-biphenyl-4,4'-diamine (NPB), the device with a structure of indium-tin-oxide (ITO)/NPB (40 nm)/NPB:Rubrene (0.25 wt%, 7 nm)/2,9-dimethyl-4,7-diphenyl-1,10-phenanthroline (BCP) (30 nm)/Mg:Ag exhibited a warm white light with Commission Internationale de l'Eclairage (CIE) coordinates of (0.38, 0.41) at 12 V. The electroluminescent (EL) spectrum of the OLED consisted of blue and yellow fluorescent emissions; the intensity of the blue emission increased gradually relative to the yellow emission with increasing voltage. This is mainly because the recombination zone shifts towards the anode side, as the transport rate of electrons grows faster than that of holes under higher bias voltage. A maximum luminance of 7300 cd/m2 and a maximum power efficiency of 0.57 lm/W were achieved. By comparison, using an ultrathin dopant layer, the device with a structure of ITO/NPB (40 nm)/Rubrene (0.3 nm)/NPB (7 nm)/BCP (30 nm)/Mg:Ag achieved a low turn-on voltage of 3 V and a more stable white light. The EL spectra peaked at 430 and 560 nm, corresponding to CIE coordinates of (0.32, 0.32) under bias voltages ranging from 5 to 15 V. A maximum luminance of 5630 cd/m2 and a maximum power efficiency of 0.6 lm/W were achieved. The balanced spectra were attributed to the stable confinement of charge carriers and excitons by the thin emitting layers. Hence, with a simple device structure and fabrication process, the ultrathin-layer device achieved a low turn-on voltage, stable white light emission and higher power efficiency.

  2. Automatic Data Distribution for CFD Applications on Structured Grids

    NASA Technical Reports Server (NTRS)

    Frumkin, Michael; Yan, Jerry

    2000-01-01

    Data distribution is an important step in the implementation of any parallel algorithm. The data distribution determines data traffic and utilization of the interconnection network, and affects the overall code efficiency. In recent years a number of data distribution methods have been developed and used in real programs for improving data traffic. We use some of these methods for translating data dependence and affinity relations into data distribution directives. We describe an automatic data alignment and placement tool (ADAPT) which implements these methods, and show its results for some CFD codes (NPB and ARC3D). The algorithms for program analysis and derivation of data distribution implemented in ADAPT are efficient three-pass algorithms. Most have linear complexity, with the exception of some graph algorithms having complexity O(n(sup 4)) in the worst case.

  3. Automatic Data Distribution for CFD Applications on Structured Grids

    NASA Technical Reports Server (NTRS)

    Frumkin, Michael; Yan, Jerry

    1999-01-01

    Data distribution is an important step in the implementation of any parallel algorithm. The data distribution determines data traffic and utilization of the interconnection network, and affects the overall code efficiency. In recent years a number of data distribution methods have been developed and used in real programs for improving data traffic. We use some of these methods for translating data dependence and affinity relations into data distribution directives. We describe an automatic data alignment and placement tool (ADAPT) which implements these methods, and show its results for some CFD codes (NPB and ARC3D). The algorithms for program analysis and derivation of data distribution implemented in ADAPT are efficient three-pass algorithms. Most have linear complexity, with the exception of some graph algorithms having complexity O(n(sup 4)) in the worst case.
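
    As a concrete illustration of the kind of mapping such distribution directives encode, the following is a minimal sketch in C of a one-dimensional block distribution; the function names are hypothetical and not taken from ADAPT.

      /* Bounds and ownership for a 1-D block distribution of n items over p processes. */
      static int block_lo(int n, int p, int rank)  { return (rank * n) / p; }           /* first item of rank */
      static int block_hi(int n, int p, int rank)  { return ((rank + 1) * n) / p - 1; } /* last item of rank  */
      static int block_owner(int n, int p, int i)  { return (p * (i + 1) - 1) / n; }    /* rank owning item i */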

  4. An efficient parallel algorithm for matrix-vector multiplication

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Hendrickson, B.; Leland, R.; Plimpton, S.

    The multiplication of a vector by a matrix is the kernel computation of many algorithms in scientific computation. A fast parallel algorithm for this calculation is therefore necessary if one is to make full use of the new generation of parallel supercomputers. This paper presents a high-performance, parallel matrix-vector multiplication algorithm that is particularly well suited to hypercube multiprocessors. For an n x n matrix on p processors, the communication cost of this algorithm is O(n/√p + log(p)), independent of the matrix sparsity pattern. The performance of the algorithm is demonstrated by employing it as the kernel in the well-known NAS conjugate gradient benchmark, where a run time of 6.09 seconds was observed. This is the best published performance on this benchmark achieved to date using a massively parallel supercomputer.
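
    For orientation, a minimal sketch in C of a parallel matrix-vector multiply with MPI is given below. It uses a simple 1-D row-block decomposition, which is easier to show than the paper's 2-D hypercube scheme (and does not achieve the O(n/√p + log(p)) communication cost quoted above); it assumes p divides n, and the function name is illustrative.

      #include <mpi.h>
      #include <stdlib.h>

      /* y_loc = A_loc * x, where each rank holds n/p rows of A and n/p entries of x. */
      void matvec_rowblock(int n, int p, const double *A_loc,
                           const double *x_loc, double *y_loc)
      {
          int nloc = n / p;
          double *x = malloc((size_t)n * sizeof *x);
          /* Communication: every rank gathers the full input vector. */
          MPI_Allgather((void *)x_loc, nloc, MPI_DOUBLE,
                        x, nloc, MPI_DOUBLE, MPI_COMM_WORLD);
          /* Computation: local rows times the full vector. */
          for (int i = 0; i < nloc; i++) {
              double s = 0.0;
              for (int j = 0; j < n; j++)
                  s += A_loc[(size_t)i * n + j] * x[j];
              y_loc[i] = s;
          }
          free(x);
      }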

  5. Automated and Assistive Tools for Accelerated Code migration of Scientific Computing on to Heterogeneous MultiCore Systems

    DTIC Science & Technology

    2017-04-13

    ...ported to OmpSs: a basic algorithm on image processing applications, a mini application representative of an ocean modelling code, a parallel benchmark, and a communication-avoiding version of the QR algorithm. Further, several improvements to the OmpSs model were... movement; and a port of the dynamic load balancing library to OmpSs. Finally, several updates to the tools infrastructure were accomplished, including: an...

  6. Parameters that affect parallel processing for computational electromagnetic simulation codes on high performance computing clusters

    NASA Astrophysics Data System (ADS)

    Moon, Hongsik

    What is the impact of multicore and associated advanced technologies on computational software for science? Most researchers and students have multicore laptops or desktops for their research, and they need computing power to run computational software packages. Computing power was initially derived from Central Processing Unit (CPU) clock speed. That changed when increases in clock speed became constrained by power requirements. Chip manufacturers turned to multicore CPU architectures and associated technological advancements to create the CPUs for the future. Most software applications benefited from the increased computing power the same way that increases in clock speed helped applications run faster. However, for Computational ElectroMagnetics (CEM) software developers, this change was not an obvious benefit - it appeared to be a detriment. Developers were challenged to find a way to correctly utilize the advancements in hardware so that their codes could benefit. The solution was parallelization, and this dissertation details the investigation to address these challenges. Prior to multicore CPUs, advanced computer technologies were compared using benchmark software, and the metric was FLoating-point Operations Per Second (FLOPS), which indicates system performance for scientific applications that make heavy use of floating-point calculations. Is FLOPS an effective metric for parallelized CEM simulation tools on new multicore systems? Parallel CEM software needs to be benchmarked not only by FLOPS but also by the performance of other parameters related to the type and utilization of the hardware, such as the CPU, Random Access Memory (RAM), hard disk, network, etc. The codes need to be optimized for more than just FLOPS, and new parameters must be included in benchmarking. In this dissertation, the parallel CEM software named High Order Basis Based Integral Equation Solver (HOBBIES) is introduced. This code was developed to address the needs of the changing computer hardware platforms in order to provide fast, accurate and efficient solutions to large, complex electromagnetic problems. The research in this dissertation proves that the performance of parallel code is intimately related to the configuration of the computer hardware and can be maximized for different hardware platforms. To benchmark and optimize the performance of parallel CEM software, a variety of large, complex projects are created and executed on a variety of computer platforms. The computer platforms used in this research are detailed in this dissertation. The projects run as benchmarks are also described in detail, and results are presented. The parameters that affect parallel CEM software on High Performance Computing Clusters (HPCC) are investigated. This research demonstrates methods to maximize the performance of parallel CEM software code.
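
    As a reminder of what the FLOPS metric actually measures, here is a minimal sketch in C: time a kernel whose floating-point operation count is known and divide. The function name and the choice of a dot product are illustrative assumptions, not taken from the dissertation.

      #include <stdio.h>
      #include <omp.h>

      /* Returns the sustained MFLOPS of an n-element dot product (about 2n flops). */
      double measure_mflops(int n, const double *a, const double *b)
      {
          double t0 = omp_get_wtime();
          double s = 0.0;
          for (int i = 0; i < n; i++)
              s += a[i] * b[i];             /* 1 multiply + 1 add per iteration */
          double t = omp_get_wtime() - t0;
          printf("checksum %g\n", s);       /* keeps the loop from being optimized away */
          return 2.0 * n / t / 1.0e6;
      }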

  7. A Molecular Beam Deposition of DNA Nanometer Films

    DTIC Science & Technology

    2007-01-01

    device structure consists of ITO/PEDOT:PSS (50 nm)/NPB (30 nm)/Alq3 (40 nm)/BCP (20 nm)/Alq3 (10 nm)/Li:Al, while the Bi-OLED has an additional DNA...layer; DNA-CTMA is an electron blocking layer (EBL); NPB is used as the hole transport layer; Alq3 is used for both the electron transport layer and the...N,N'-bis(naphthalen-1-yl)-N,N'-bis(phenyl)benzidine], Alq3 [tris-(8-hydroxyquinoline) aluminum] and BCP [2,9-dimethyl-4,7-diphenyl-1,10-...

  8. A new deadlock resolution protocol and message matching algorithm for the extreme-scale simulator

    DOE PAGES

    Engelmann, Christian; Naughton, III, Thomas J.

    2016-03-22

    Investigating the performance of parallel applications at scale on future high-performance computing (HPC) architectures and the performance impact of different HPC architecture choices is an important component of HPC hardware/software co-design. The Extreme-scale Simulator (xSim) is a simulation toolkit for investigating the performance of parallel applications at scale. xSim scales to millions of simulated Message Passing Interface (MPI) processes. The overhead introduced by a simulation tool is an important performance and productivity aspect. This paper documents two improvements to xSim: (1) a new deadlock resolution protocol to reduce the parallel discrete event simulation overhead and (2) a new simulated MPI message matching algorithm to reduce the oversubscription management overhead. The results clearly show a significant performance improvement. The simulation overhead for running the NAS Parallel Benchmark suite was reduced from 102% to 0% for the embarrassingly parallel (EP) benchmark and from 1,020% to 238% for the conjugate gradient (CG) benchmark. xSim offers a highly accurate simulation mode for better tracking of injected MPI process failures. Furthermore, with highly accurate simulation, the overhead was reduced from 3,332% to 204% for EP and from 37,511% to 13,808% for CG.

  9. Final safety analysis report for the Ground Test Accelerator (GTA), Phase 2

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    NONE

    1994-10-01

    This document is the second volume of a 3-volume safety analysis report on the Ground Test Accelerator (GTA). The GTA program at the Los Alamos National Laboratory (LANL) is the major element of the national Neutral Particle Beam (NPB) program, which is supported by the Strategic Defense Initiative Office (SDIO). A principal goal of the national NPB program is to assess the feasibility of using hydrogen and deuterium neutral particle beams outside the Earth's atmosphere. The main effort of the NPB program at Los Alamos concentrates on developing the GTA. The GTA is classified as a low-hazard facility, except for the cryogenic-cooling system, which is classified as a moderate-hazard facility. This volume consists of failure modes and effects analysis; accident analysis; operational safety requirements; quality assurance program; ES&H management program; environmental, safety, and health systems critical to safety; summary of waste-management program; environmental monitoring program; facility expansion, decontamination, and decommissioning; summary of emergency response plan; summary plan for employee training; summary plan for operating procedures; glossary; and appendices A and B.

  10. Automatic Thread-Level Parallelization in the Chombo AMR Library

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Christen, Matthias; Keen, Noel; Ligocki, Terry

    2011-05-26

    The increasing on-chip parallelism has some substantial implications for HPC applications. Currently, hybrid programming models (typically MPI+OpenMP) are employed for mapping software to the hardware in order to leverage the hardware's architectural features. In this paper, we present an approach that automatically introduces thread-level parallelism into Chombo, a parallel adaptive mesh refinement framework for finite difference type PDE solvers. In Chombo, core algorithms are specified in ChomboFortran, a macro language extension to F77 that is part of the Chombo framework. This domain-specific language forms an already used target language for an automatic migration of the large number of existing algorithms into a hybrid MPI+OpenMP implementation. It also provides access to the auto-tuning methodology that enables tuning certain aspects of an algorithm to hardware characteristics. Performance measurements are presented for a few of the most relevant kernels with respect to a specific application benchmark using this technique, as well as benchmark results for the entire application. The kernel benchmarks show that, using auto-tuning, up to a factor of 11 in performance was gained with 4 threads with respect to the serial reference implementation.
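
    The flavor of the transformation is easy to show in C, even though Chombo's kernels are actually generated from ChomboFortran: a serial finite-difference kernel gains thread-level parallelism from a single OpenMP directive on its outer loop. This is an illustrative sketch, not Chombo code.

      #include <omp.h>

      /* 5-point Laplacian stencil on an nx-by-ny grid, threaded across rows. */
      void laplacian2d(int nx, int ny, const double *u, double *lap)
      {
          #pragma omp parallel for   /* thread-level parallelism over the outer loop */
          for (int i = 1; i < nx - 1; i++)
              for (int j = 1; j < ny - 1; j++)
                  lap[i * ny + j] = u[(i - 1) * ny + j] + u[(i + 1) * ny + j]
                                  + u[i * ny + j - 1] + u[i * ny + j + 1]
                                  - 4.0 * u[i * ny + j];
      }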

  11. Data Race Benchmark Collection

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Liao, Chunhua; Lin, Pei-Hung; Asplund, Joshua

    2017-03-21

    This project is a benchmark suite of OpenMP parallel codes that have been checked for data races. The programs are marked to show which do and which do not have races. This allows them to be leveraged while testing and developing race detection tools.
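
    A minimal example of the kind of OpenMP data race such a suite marks, together with its repaired counterpart, is sketched below in C; it is illustrative and not taken from the collection itself.

      #include <omp.h>

      /* RACE: multiple threads update s without synchronization. */
      double sum_racy(const double *a, int n)
      {
          double s = 0.0;
          #pragma omp parallel for
          for (int i = 0; i < n; i++)
              s += a[i];                /* unsynchronized read-modify-write of s */
          return s;
      }

      /* NO RACE: the reduction clause gives each thread a private copy of s. */
      double sum_fixed(const double *a, int n)
      {
          double s = 0.0;
          #pragma omp parallel for reduction(+:s)
          for (int i = 0; i < n; i++)
              s += a[i];
          return s;
      }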

  12. N-propargylbenzylamine, a major metabolite of pargyline, is a potent inhibitor of monoamine oxidase type B in rats in vivo: a comparison with deprenyl.

    PubMed Central

    Karoum, F.

    1987-01-01

    In an effort to explore the contribution of the metabolites of pargyline towards the in vivo inhibition of monoamine oxidase (MAO), the effects of pargyline and its major metabolites on the production and metabolism of a number of biogenic amines were studied in rats. The administration of pargyline gave rise to three major ethyl acetate-extractable metabolites: benzylamine, N-methylbenzylamine and N-propargylbenzylamine (NPB). Only NPB demonstrated in vivo monoamine oxidase inhibitory properties at an acute dose of 30 mg kg-1. The acute effects of pargyline, NPB, and deprenyl on urine and brain concentrations of a number of biogenic amines (phenylethylamine (PEA), m- and p-tyramine, noradrenaline (NA), dopamine, and 5-hydroxytryptamine (5-HT)) and their metabolites were evaluated. Increased urine and brain concentrations of PEA were considered to represent in vivo inhibition of type B MAO, while decreased concentrations of NA and 5-HT metabolites were regarded as indicators of an in vivo inhibition of MAO type A. NPB, like deprenyl and pargyline, significantly increased urine and brain PEA, while only pargyline reduced 5-HT metabolism, suggesting that the metabolism of pargyline to NPB may contribute towards the MAO type B inhibitory effects of pargyline in vivo. Since the therapeutic benefits of MAO inhibitors in clinical practice usually require some period of chronic treatment, the chronic effects of repeated 14 daily doses of the above MAO inhibitors on central and peripheral biogenic amines were evaluated at the following times: during treatment, and one day and five days after termination of treatment. The biochemical changes observed during the course of chronic NPB, pargyline and deprenyl treatments generally follow the expected in vitro characteristics of these drugs, but the detailed changes observed suggest clear differences. For example, the in vivo effect of pargyline on urine 5-hydroxyindoleacetic acid excretion was considerably weaker than its effect on the excretion of NA and dopamine metabolites. These changes are opposite to the in vitro effects of pargyline on 5-HT, dopamine and NA oxidative deamination. Inhibition of the metabolism of all the amines studied was clearly observed during chronic MAOI treatments, but these effects were less evident five days after the end of treatment, suggesting an almost normal metabolism of biogenic amines. It is concluded that while MAO inhibitors may be the primary compounds responsible for MAO inhibition, the effects of their metabolites in some cases may also play equally important roles in the regulation of monoamines both in the periphery and the brain. (ABSTRACT TRUNCATED AT 400 WORDS) PMID:3103805

  13. A comparison of five benchmarks

    NASA Technical Reports Server (NTRS)

    Huss, Janice E.; Pennline, James A.

    1987-01-01

    Five benchmark programs were obtained and run on the NASA Lewis CRAY X-MP/24. A comparison was made between the programs' codes and between the methods for calculating performance figures. Several multitasking jobs were run to gain experience in how parallel performance is measured.

  14. Automatic Multilevel Parallelization Using OpenMP

    NASA Technical Reports Server (NTRS)

    Jin, Hao-Qiang; Jost, Gabriele; Yan, Jerry; Ayguade, Eduard; Gonzalez, Marc; Martorell, Xavier; Biegel, Bryan (Technical Monitor)

    2002-01-01

    In this paper we describe the extension of the CAPO (CAPtools (Computer Aided Parallelization Toolkit) OpenMP) parallelization support tool to support multilevel parallelism based on OpenMP directives. CAPO generates OpenMP directives with extensions supported by the NanosCompiler to allow for directive nesting and definition of thread groups. We report some results for several benchmark codes and one full application that have been parallelized using our system.

  15. Automatic Multilevel Parallelization Using OpenMP

    NASA Technical Reports Server (NTRS)

    Jin, Hao-Qiang; Jost, Gabriele; Yan, Jerry; Ayguade, Eduard; Gonzalez, Marc; Martorell, Xavier; Biegel, Bryan (Technical Monitor)

    2002-01-01

    In this paper we describe the extension of the CAPO parallelization support tool to support multilevel parallelism based on OpenMP directives. CAPO generates OpenMP directives with extensions supported by the NanosCompiler to allow for directive nesting and definition of thread groups. We report first results for several benchmark codes and one full application that have been parallelized using our system.
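
    Multilevel parallelism of the kind CAPO generates can be sketched in C with standard nested OpenMP directives; the NanosCompiler thread-group extensions are not shown, and the function and loop structure here are illustrative assumptions.

      #include <omp.h>

      /* Level 1 parallelizes across zones; level 2 parallelizes within each zone. */
      void multilevel(int nzones, int nrows, double **zone)
      {
          omp_set_nested(1);                         /* enable the second level */
          #pragma omp parallel for num_threads(4)    /* outer level: zones */
          for (int z = 0; z < nzones; z++) {
              #pragma omp parallel for num_threads(2)  /* inner level: rows of a zone */
              for (int i = 0; i < nrows; i++)
                  zone[z][i] *= 2.0;                 /* stand-in for the real zone work */
          }
      }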

  16. Spherical harmonic results for the 3D Kobayashi Benchmark suite

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Brown, P N; Chang, B; Hanebutte, U R

    1999-03-02

    Spherical harmonic solutions are presented for the Kobayashi benchmark suite. The results were obtained with Ardra, a scalable, parallel neutron transport code developed at Lawrence Livermore National Laboratory (LLNL). The calculations were performed on the IBM ASCI Blue-Pacific computer at LLNL.

  17. Benchmarking hypercube hardware and software

    NASA Technical Reports Server (NTRS)

    Grunwald, Dirk C.; Reed, Daniel A.

    1986-01-01

    It was long a truism in computer systems design that balanced systems achieve the best performance. Message passing parallel processors are no different. To quantify the balance of a hypercube design, an experimental methodology was developed and the associated suite of benchmarks was applied to several existing hypercubes. The benchmark suite includes tests of both processor speed in the absence of internode communication and message transmission speed as a function of communication patterns.

  18. An analytical benchmark and a Mathematica program for MD codes: Testing LAMMPS on the 2nd generation Brenner potential

    NASA Astrophysics Data System (ADS)

    Favata, Antonino; Micheletti, Andrea; Ryu, Seunghwa; Pugno, Nicola M.

    2016-10-01

    An analytical benchmark and a simple consistent Mathematica program are proposed for graphene and carbon nanotubes, which may serve to test any molecular dynamics code implemented with REBO potentials. By exploiting the benchmark, we checked the results produced by LAMMPS (Large-scale Atomic/Molecular Massively Parallel Simulator) when adopting the second-generation Brenner potential, made evident that this code in its current implementation produces results which are offset from those of the benchmark by a significant amount, and provide evidence of the reason.

  19. High Performance Computing at NASA

    NASA Technical Reports Server (NTRS)

    Bailey, David H.; Cooper, D. M. (Technical Monitor)

    1994-01-01

    The speaker will give an overview of high performance computing in the U.S. in general and within NASA in particular, including a description of the recently signed NASA-IBM cooperative agreement. The latest performance figures of various parallel systems on the NAS Parallel Benchmarks will be presented. The speaker was one of the authors of the NAS (Numerical Aerodynamic Simulation) Parallel Benchmarks, which are now widely cited in the industry as a measure of sustained performance on realistic high-end scientific applications. It will be shown that significant progress has been made by the highly parallel supercomputer industry during the past year or so, with several new systems, based on high-performance RISC processors, that now deliver superior performance per dollar compared to conventional supercomputers. Various pitfalls in reporting performance will be discussed. The speaker will then conclude by assessing the general state of the high performance computing field.

  20. Parallel Ada benchmarks for the SVMS

    NASA Technical Reports Server (NTRS)

    Collard, Philippe E.

    1990-01-01

    The use of the parallel processing paradigm to design and develop faster and more reliable computers appears to clearly mark the future of information processing. NASA started the development of such an architecture: the Spaceborne VHSIC Multi-processor System (SVMS). Ada will be one of the languages used to program the SVMS. One of the unique characteristics of Ada is that it supports parallel processing at the language level through the tasking constructs. It is important for the SVMS project team to assess how efficiently the SVMS architecture will be implemented, as well as how efficiently the Ada environment will be ported to the SVMS. AUTOCLASS II, a Bayesian classifier written in Common Lisp, was selected as one of the benchmarks for SVMS configurations. The purpose of the R and D effort was to provide the SVMS project team with a version of AUTOCLASS II, written in Ada, that would make use of Ada tasking constructs as much as possible so as to constitute a suitable benchmark. Additionally, a set of programs was developed to measure Ada tasking efficiency on parallel architectures and to determine the critical parameters influencing tasking efficiency. All this was designed to provide the SVMS project team with a set of suitable tools for the development of the SVMS architecture.

  1. Molecular evolution of Theta-class glutathione transferase for enhanced activity with the anticancer drug 1,3-bis-(2-chloroethyl)-1-nitrosourea and other alkylating agents.

    PubMed

    Larsson, Anna-Karin; Shokeer, Abeer; Mannervik, Bengt

    2010-05-01

    Glutathione transferase (GST) displaying enhanced activity with the cytostatic drug 1,3-bis-(2-chloroethyl)-1-nitrosourea (BCNU) and structurally related alkylating agents was obtained by molecular evolution. Mutant libraries created by recursive recombination of cDNA coding for human and rodent Theta-class GSTs were heterologously expressed in Escherichia coli and screened with the surrogate substrate 4-nitrophenethyl bromide (NPB) for enhanced alkyltransferase activity. A mutant with a 70-fold increased catalytic efficiency with NPB, compared to human GST T1-1, was isolated. Its efficiency in degrading BCNU had improved 170-fold, significantly more than with the model substrate NPB. The enhanced catalytic activity of the mutant GST was also 2-fold higher with BCNU than that of wild-type mouse GST T1-1, which is 80-fold more efficient than wild-type human GST T1-1. We propose that GSTs catalyzing inactivation of anticancer drugs may find clinical use in protecting sensitive normal tissues from toxic side effects in treated patients, and as selectable markers in gene therapy. Copyright 2010 Elsevier Inc. All rights reserved.

  2. Unusual behaviour of (Np,Pu)B2C

    NASA Astrophysics Data System (ADS)

    Klimczuk, Tomasz; Boulet, Pascal; Griveau, Jean-Christophe; Colineau, Eric; Bauer, Ernst; Falmbigl, Matthias; Rogl, Peter; Wastin, Franck

    2015-02-01

    Two transuranium metal boron carbides, NpB2C and PuB2C, have been synthesized by argon arc melting. The crystal structures of the {Np,Pu}B2C compounds were determined from single-crystal X-ray data to be isotypic with the ThB2C type (space group ?, a = 0.6532(2) nm, c = 1.0769(3) nm for NpB2C and a = 0.6509(2) nm, c = 1.0818(3) nm for PuB2C; Z = 9). Physical properties have been derived from polycrystalline bulk material in the temperature range from 2 to 300 K and in magnetic fields up to 9 T. Magnetic susceptibility and heat capacity data indicate the occurrence of antiferromagnetic ordering for NpB2C with a Néel temperature TN = 68 K. PuB2C is a Pauli paramagnet, most likely due to a strong hybridization of s(p,d) electrons with the Pu-5f states. A pseudo-gap, as concluded from the Sommerfeld value and the electronic transport, is thought to be a consequence of the hybridization. The magnetic behaviour of {Np,Pu}B2C is consistent with the Hill criterion.

  3. A Faster Parallel Algorithm and Efficient Multithreaded Implementations for Evaluating Betweenness Centrality on Massive Datasets

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Madduri, Kamesh; Ediger, David; Jiang, Karl

    2009-02-15

    We present a new lock-free parallel algorithm for computing betweenness centrality of massive small-world networks. With minor changes to the data structures, our algorithm also achieves better spatial cache locality compared to previous approaches. Betweenness centrality is a key algorithm kernel in HPCS SSCA#2, a benchmark extensively used to evaluate the performance of emerging high-performance computing architectures for graph-theoretic computations. We design optimized implementations of betweenness centrality and the SSCA#2 benchmark for two hardware multithreaded systems: a Cray XMT system with the Threadstorm processor, and a single-socket Sun multicore server with the UltraSPARC T2 processor. For a small-world network of 134 million vertices and 1.073 billion edges, the 16-processor XMT system and the 8-core Sun Fire T5120 server achieve TEPS scores (an algorithmic performance count for the SSCA#2 benchmark) of 160 million and 90 million respectively, which corresponds to more than a 2X performance improvement over the previous parallel implementations. To better characterize the performance of these multithreaded systems, we correlate the SSCA#2 performance results with data from the memory-intensive STREAM and RandomAccess benchmarks. Finally, we demonstrate the applicability of our implementation to analyze massive real-world datasets by computing approximate betweenness centrality for a large-scale IMDb movie-actor network.

  4. A Faster Parallel Algorithm and Efficient Multithreaded Implementations for Evaluating Betweenness Centrality on Massive Datasets

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Madduri, Kamesh; Ediger, David; Jiang, Karl

    2009-05-29

    We present a new lock-free parallel algorithm for computing betweenness centrality of massive small-world networks. With minor changes to the data structures, our algorithm also achieves better spatial cache locality compared to previous approaches. Betweenness centrality is a key algorithm kernel in the HPCS SSCA#2 Graph Analysis benchmark, which has been extensively used to evaluate the performance of emerging high-performance computing architectures for graph-theoretic computations. We design optimized implementations of betweenness centrality and the SSCA#2 benchmark for two hardware multithreaded systems: a Cray XMT system with the ThreadStorm processor, and a single-socket Sun multicore server with the UltraSparc T2 processor. For a small-world network of 134 million vertices and 1.073 billion edges, the 16-processor XMT system and the 8-core Sun Fire T5120 server achieve TEPS scores (an algorithmic performance count for the SSCA#2 benchmark) of 160 million and 90 million respectively, which corresponds to more than a 2X performance improvement over the previous parallel implementations. To better characterize the performance of these multithreaded systems, we correlate the SSCA#2 performance results with data from the memory-intensive STREAM and RandomAccess benchmarks. Finally, we demonstrate the applicability of our implementation to analyze massive real-world datasets by computing approximate betweenness centrality for a large-scale IMDb movie-actor network.

  5. Enabling the High Level Synthesis of Data Analytics Accelerators

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Minutoli, Marco; Castellana, Vito G.; Tumeo, Antonino

    Conventional High Level Synthesis (HLS) tools mainly target compute-intensive kernels typical of digital signal processing applications. We are developing techniques and architectural templates to enable HLS of data analytics applications. These applications are memory intensive, present fine-grained, unpredictable data accesses, and irregular, dynamic task parallelism. We discuss an architectural template based around a distributed controller to efficiently exploit thread-level parallelism. We present a memory interface that supports parallel memory subsystems and enables implementing atomic memory operations. We introduce a dynamic task scheduling approach to efficiently execute heavily unbalanced workloads. The templates are validated by synthesizing queries from the Lehigh University Benchmark (LUBM), a well-known SPARQL benchmark.

  6. Microbial Group Specific Uptake Kinetics of Inorganic Phosphate and Adenosine-5′-Triphosphate (ATP) in the North Pacific Subtropical Gyre

    PubMed Central

    Björkman, Karin; Duhamel, Solange; Karl, David M.

    2012-01-01

    We investigated the concentration dependent uptake of inorganic phosphate (Pi) and adenosine-5′-triphosphate (ATP) in microbial populations in the North Pacific Subtropical Gyre (NPSG). We used radiotracers to measure substrate uptake into whole water communities, differentiated microbial size classes, and two flow sorted groups; Prochlorococcus (PRO) and non-pigmented bacteria (NPB). The Pi concentrations, uptake rates, and Pi pool turnover times (Tt) were (mean, ±SD); 54.9 ± 35.0 nmol L−1 (n = 22), 4.8 ± 1.9 nmol L−1 day−1 (n = 19), and 14.7 ± 10.2 days (n = 19), respectively. Pi uptake into >2 μm cells was on average 12 ± 7% (n = 15) of the total uptake. The kinetic response to Pi (10–500 nmol L−1) was small, indicating that the microorganisms were close to their maximum uptake velocity (Vmax). Vmax averaged 8.0 ± 3.6 nmol L−1 day−1 (n = 19) in the >0.2 μm group, with half saturation constants (Km) of 40 ± 28 nmol L−1 (n = 19). PRO had three times the cell specific Pi uptake rate of NPB, at ambient concentrations, but when adjusted to cells L−1 the rates were similar, and these two groups were equally competitive for Pi. The Tt of γ-P-ATP in the >0.2 μm group were shorter than for the Pi pool (4.4 ± 1.0 days; n = 6), but this difference diminished in the larger size classes. The kinetic response to ATP was large in the >0.2 μm class with Vmax exceeding the rates at ambient concentrations (mean 62 ± 27 times; n = 6) with a mean Vmax for γ-P-ATP of 2.8 ± 1.0 nmol L−1 day−1, and Km at 11.5 ± 5.4 nmol L−1 (n = 6). The NPB contribution to γ-P-ATP uptake was high (95 ± 3%, n = 4) at ambient concentrations but decreased to ∼50% at the highest ATP amendment. PRO had Km values 5–10 times greater than NPB. The above indicates that PRO and NPB were in close competition in terms of Pi acquisition, whereas P uptake from ATP could be attributed to NPB. This apparent resource partitioning may be a niche separating strategy and an important factor in the successful co-existence within the oligotrophic upper ocean of the NPSG. PMID:22701449
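
    For readers outside the field: the Vmax and Km values quoted above are the parameters of the standard Michaelis-Menten (half-saturation) uptake model, v = Vmax·S/(Km + S), where S is the substrate concentration. As a worked check, taking the whole-community Pi kinetics above (Vmax = 8.0 nmol L−1 day−1, Km = 40 nmol L−1) at the mean ambient Pi concentration of 54.9 nmol L−1 gives v ≈ 8.0 × 54.9/(40 + 54.9) ≈ 4.6 nmol L−1 day−1, consistent with the measured mean uptake rate of 4.8 nmol L−1 day−1.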

  7. A heterogeneous computing accelerated SCE-UA global optimization method using OpenMP, OpenCL, CUDA, and OpenACC.

    PubMed

    Kan, Guangyuan; He, Xiaoyan; Ding, Liuqian; Li, Jiren; Liang, Ke; Hong, Yang

    2017-10-01

    The shuffled complex evolution optimization developed at the University of Arizona (SCE-UA) has been successfully applied for many years in various kinds of scientific and engineering optimization applications, such as hydrological model parameter calibration. The algorithm possesses good global optimality, convergence stability and robustness. However, benchmark and real-world applications reveal the poor computational efficiency of the SCE-UA. This research aims at the parallelization and acceleration of the SCE-UA method based on powerful heterogeneous computing technology. The parallel SCE-UA is implemented on an Intel Xeon multi-core CPU (using OpenMP and OpenCL) and an NVIDIA Tesla many-core GPU (using OpenCL, CUDA, and OpenACC). The serial and parallel SCE-UA were tested on the Griewank benchmark function. Comparison results indicate that the parallel SCE-UA significantly improves computational efficiency compared to the original serial version. The OpenCL implementation obtains the best overall acceleration results, albeit with the most complex source code. The parallel SCE-UA shows strong promise for application to real-world problems.
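
    Since the Griewank function is the test case above, a minimal sketch in C of the kind of loop the OpenMP variant parallelizes is given below: objective-function evaluations for a population of candidate points are independent and can be distributed across threads. The function names and population layout are illustrative assumptions.

      #include <math.h>
      #include <omp.h>

      /* d-dimensional Griewank function: f(x) = 1 + sum(x_i^2/4000) - prod(cos(x_i/sqrt(i))). */
      double griewank(const double *x, int d)
      {
          double s = 0.0, p = 1.0;
          for (int i = 0; i < d; i++) {
              s += x[i] * x[i] / 4000.0;
              p *= cos(x[i] / sqrt((double)(i + 1)));
          }
          return 1.0 + s - p;
      }

      /* Evaluate a population of npts candidate points in parallel. */
      void evaluate_population(const double *pop, double *f, int npts, int d)
      {
          #pragma omp parallel for   /* each point is independent */
          for (int k = 0; k < npts; k++)
              f[k] = griewank(&pop[k * d], d);
      }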

  8. Development and Application of a Parallel LCAO Cluster Method

    NASA Astrophysics Data System (ADS)

    Patton, David C.

    1997-08-01

    CPU-intensive steps in the SCF electronic structure calculations of clusters and molecules with a first-principles LCAO method have been fully parallelized via a message passing paradigm. Identification of the parts of the code that are composed of many independent compute-intensive steps is discussed in detail, as these are the most readily parallelized. Most of the parallelization involves spatially decomposing numerical operations on a mesh. One exception is the solution of Poisson's equation, which relies on distribution of the charge density and multipole methods. The method we use to parallelize this part of the calculation is quite novel and is covered in detail. We present a general method for dynamically load-balancing a parallel calculation and discuss how we use this method in our code. The results of benchmark calculations of the IR and Raman spectra of PAH molecules such as anthracene (C14H10) and tetracene (C18H12) are presented. These benchmark calculations were performed on an IBM SP2 and a SUN Ultra HPC server with both MPI and PVM. Scalability and speedup for these calculations are analyzed to determine the efficiency of the code.

  9. Enhanced Emission Efficiency in Organic Light-Emitting Diodes Using Deoxyribonucleic Acid Complex as an Electron Blocking Layer

    DTIC Science & Technology

    2006-04-28

    Fig. 1. (Color online) Photographs of EL emission from several devices: (a) green Alq3 baseline OLED at 25 V, 707 mA/cm2, 590 cd/m2, 0.35 cd/A; (b) green Alq3 BioLED with DNA EBL at 25 V, 308 mA/cm2, 21,100 cd/m2, 6.56 cd/A; (c) blue NPB baseline OLED at 20 V, 460 mA/cm2, 700 cd/m2, 0.14 cd/A; (d) blue... al., Appl. Phys. Lett. 88, 171109 (2006). NPB [N,N'-bis(naphthalene-1-yl)-N,N'-bis(phenyl)benzidine] hole transport layer (HTL); Alq3 [tris-(8-hydroxyquinoline) aluminum]; BCP [2,9-dimethyl-4,7-diphenyl-1,10-...

  10. Rethinking key–value store for parallel I/O optimization

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Kougkas, Anthony; Eslami, Hassan; Sun, Xian-He

    2015-01-26

    Key-value stores are being widely used as the storage system for large-scale internet services and cloud storage systems. However, they are rarely used in HPC systems, where parallel file systems are the dominant storage solution. In this study, we examine the architecture differences and performance characteristics of parallel file systems and key-value stores. We propose using key-value stores to optimize overall Input/Output (I/O) performance, especially for workloads that parallel file systems cannot handle well, such as the cases with intense data synchronization or heavy metadata operations. We conducted experiments with several synthetic benchmarks, an I/O benchmark, and a real application. We modeled the performance of these two systems using collected data from our experiments, and we provide a predictive method to identify which system offers better I/O performance given a specific workload. The results show that we can optimize the I/O performance in HPC systems by utilizing key-value stores.

  11. Unstructured Adaptive Meshes: Bad for Your Memory?

    NASA Technical Reports Server (NTRS)

    Biswas, Rupak; Feng, Hui-Yu; VanderWijngaart, Rob

    2003-01-01

    This viewgraph presentation explores the need for a NASA Advanced Supercomputing (NAS) parallel benchmark for problems with irregular dynamical memory access. This benchmark is important and necessary because: 1) Problems with localized error source benefit from adaptive nonuniform meshes; 2) Certain machines perform poorly on such problems; 3) Parallel implementation may provide further performance improvement but is difficult. Some examples of problems which use irregular dynamical memory access include: 1) Heat transfer problem; 2) Heat source term; 3) Spectral element method; 4) Base functions; 5) Elemental discrete equations; 6) Global discrete equations. Nonconforming Mesh and Mortar Element Method are covered in greater detail in this presentation.

  12. A Programming Model Performance Study Using the NAS Parallel Benchmarks

    DOE PAGES

    Shan, Hongzhang; Blagojević, Filip; Min, Seung-Jai; ...

    2010-01-01

    Harnessing the power of multicore platforms is challenging due to the additional levels of parallelism present. In this paper we use the NAS Parallel Benchmarks to study three programming models, MPI, OpenMP and PGAS, to understand their performance and memory usage characteristics on current multicore architectures. To understand these characteristics we use the Integrated Performance Monitoring tool and other ways to measure communication versus computation time, as well as the fraction of the run time spent in OpenMP. The benchmarks are run on two different Cray XT5 systems and an InfiniBand cluster. Our results show that in general the three programming models exhibit very similar performance characteristics. In a few cases, OpenMP is significantly faster because it explicitly avoids communication. For these particular cases, we were able to re-write the UPC versions and achieve equal performance to OpenMP. Using OpenMP was also the most advantageous in terms of memory usage. We also compare performance differences between the two Cray systems, which have quad-core and hex-core processors. We show that at scale the performance is almost always slower on the hex-core system because of increased contention for network resources.
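
    The communication-versus-computation split mentioned above can be measured, in the simplest case, by bracketing each communication call with timers, which is essentially what a profiling layer like IPM automates. A minimal hand-instrumented sketch in C with MPI (illustrative, not the IPM API):

      #include <mpi.h>

      static double t_comm = 0.0;   /* accumulated communication time */

      /* Wrap a collective with timers; total run time minus t_comm approximates computation. */
      void timed_allreduce(const double *in, double *out, int n)
      {
          double t0 = MPI_Wtime();
          MPI_Allreduce(in, out, n, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);
          t_comm += MPI_Wtime() - t0;
      }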

  13. Hole transport characteristics in phosphorescent dye-doped NPB films by admittance spectroscopy

    NASA Astrophysics Data System (ADS)

    Wang, Ying; Chen, Jiangshan; Huang, Jinying; Dai, Yanfeng; Zhang, Zhiqiang; Liu, Su; Ma, Dongge

    2014-05-01

    Admittance spectroscopy is a powerful tool for determining the carrier mobility. The carrier mobility is a significant parameter for understanding the behavior of, and optimizing, organic light-emitting diodes and other organic semiconductor devices. Hole transport in the phosphorescent dye bis[2-(9,9-diethyl-9H-fluoren-2-yl)-1-phenyl-1H-benzoimidazol-N,C3]iridium (acetylacetonate) [(fbi)2Ir(acac)] doped into N,N'-diphenyl-N,N'-bis(1-naphthylphenyl)-1,1'-biphenyl-4,4'-diamine (NPB) films was investigated by admittance spectroscopy. The results show that doped (fbi)2Ir(acac) molecules behave as hole traps in NPB and lower the hole mobility. For thicker films (≳300 nm), the electric field dependence of the hole mobility is, as expected, positive, i.e., the mobility increases exponentially with the electric field. However, for thinner films (≲300 nm), the electric field dependence of the hole mobility is negative, i.e., the hole mobility decreases exponentially with the electric field. Physical mechanisms behind the negative field dependence of the hole mobility are discussed. In addition, three frequency regions were distinguished to analyze the behavior of the capacitance in the hole-only device, and the physical mechanism was explained by trap theory and the parasitic capacitance effect.

  14. Defective Autophagy, Mitochondrial Clearance and Lipophagy in Niemann-Pick Type B Lymphocytes

    PubMed Central

    Salucci, Sara; Luchetti, Francesca; Falcieri, Elisabetta; Di Sario, Gianna; Palma, Fulvio; Papa, Stefano

    2016-01-01

    Niemann-Pick disease type A (NP-A) and type B (NP-B) are lysosomal storage diseases (LSDs) caused by sphingomyelin accumulation in lysosomes relying on reduced or absent acid sphingomyelinase. A considerable body of evidence suggests that lysosomal storage in many LSD impairs autophagy, resulting in the accumulation of poly-ubiquitinated proteins and dysfunctional mitochondria, ultimately leading to cell death. Here we test this hypothesis in a cellular model of Niemann-Pick disease type B, in which autophagy has never been studied. The basal autophagic pathway was first examined in order to evaluate its functionality using several autophagy-modulating substances such as rapamycin and nocodazole. We found that human NP-B B lymphocytes display considerable alteration in their autophagic vacuole accumulation and mitochondrial fragmentation, as well as mitophagy induction (for damaged mitochondria clearance). Furthermore, lipid traceability of intra and extra-cellular environments shows lipid accumulation in NP-B B lymphocytes and also reveals their peculiar trafficking/management, culminating in lipid microparticle extrusion (by lysosomal exocytosis mechanisms) or lipophagy. All of these features point to the presence of a deep autophagy/mitophagy alteration revealing autophagic stress and defective mitochondrial clearance. Hence, rapamycin might be used to regulate autophagy/mitophagy (at least in part) and to contribute to the clearance of lysosomal aberrant lipid storage. PMID:27798705

  15. [Rhodobaculum claviforme gen. nov., sp. nov., a New Alkaliphilic Nonsulfur Purple Bacterium].

    PubMed

    Bryantseva, I A; Gaisin, V A; Gorlenko, V M

    2015-01-01

    Two alkaliphilic strains of nonsulfur purple bacteria (NPB), B7-4 and B8-2, were isolated from moderately saline alkaline steppe lakes of southeast Siberia with pH values above 9.0. The isolates were motile, polymorphous cells (from short rods to long spindly cells) 2.0-3.2 x 9.6-20.0 μm. Intracellular membranes of the vesicular type were mostly located at the cell periphery. The microorganisms contained bacteriochlorophyll a and carotenoids of the spheroidene and spirilloxanthin series. The photosynthetic apparatus was represented by LH2 and LH1 light-harvesting complexes. In the presence of organic compounds, the strains grew aerobically in the dark or anaerobically in the light. Capacity for photo- and chemoautotrophic growth was not detected. The cbbL gene encoding RuBisCO was not revealed. Optimal growth of both strains occurred at 2% NaCl (range from 0.5 to 4%), pH 8.0-8.8 (range from 7.5 to 9.7), and 25-35°C. The DNA G+C content was 67.6-69.8 mol %. Pairwise comparison of the nucleotide sequences of the 16S rRNA genes revealed that strains B7-4 and B8-2 belonged to the same species (99.9% homology) and were most closely related to the aerobic alkaliphilic bacteriochlorophyll a-containing anoxygenic phototrophic bacterium (APB) Roseibacula alcaliphilum De (95.2%) and to the NPB strains Rhodobaca barguzinensis VKM B-2406(T) (94.2%) and Rbc. bogoriensis LBB1(T) (93.9%). The isolates were closely related to the NPB Rhodobacter veldkampii DSM 11550(T) (94.8%) and to the aerobic bacteriochlorophyll a-containing bacteria Roseinatronobacter monicus ROS 35(T) and Roseicitreum antarcticum ZS2-28(T) (93.5 and 93.9%, respectively). The new strains were described as a new NPB genus and species of the family Rhodobacteraceae, Rhodobaculum claviforme gen. nov., sp. nov., with B7-4(T) (VKM B-2708, LMG 28126) as the type strain.

  16. The excitation mechanism of btp2 Ir(acac) in CBP host.

    PubMed

    Xiao-Bo, Zhang; Fu-Xiang, Wei

    2017-05-01

    Whether bis(2-(2'-benzo[4,5-α]thienyl)pyridinato-N,C3')iridium(acetylacetonate) (btp2Ir(acac)) emission comes from carrier trapping and/or energy transfer, when doped in the 4,4'-bis(N-carbazolyl)biphenyl (CBP) host in organic light-emitting devices, is not clear; therefore, btp2Ir(acac) emission in CBP hosts was studied. In the red-doped device, both N,N'-bis(1-naphthyl)-N,N'-diphenyl-1,1'-biphenyl-4,4'-diamine (NPB) and (1,1'-biphenyl-4'-oxy)bis(8-hydroxy-2-methylquinolinato)-aluminum (BAlq) emission appeared, which illustrated that CBP excitons cannot be formed at the two emissive layer (EML) interfaces in the device. In the co-doped devices, NPB and BAlq emissions disappear and 1,4-bis[2-(3-N-ethylcarbazoryl)vinyl]benzene (BCzVB) emission appears, illustrating the formation of CBP excitons at the two EML interfaces in these devices. The reason for this difference was analyzed, and it was found that holes in the NPB layer could be injected directly into the CBP host at the EML interface of the red-doped device. In contrast, holes were injected into the CBP host via the btp2Ir(acac)/BCzVB dopants in the co-doped devices, which facilitated hole injection from the NPB layer to the EML, leading to the formation of CBP excitons at the two EML interfaces in the co-doped devices. Therefore, btp2Ir(acac) emission was caused by carrier trapping in the red-doped device, while, in the co-doped devices, it resulted from both carrier trapping and energy transfer from the CBP. Furthermore, it was revealed that the carrier trapping mechanism is less efficient than the energy transfer mechanism for btp2Ir(acac) excitation in co-doped devices. In summary, our results clarify the excitation mechanism of btp2Ir(acac) in the CBP host. Copyright © 2016 John Wiley & Sons, Ltd.

  17. [Optical and electrical properties of NPB/Alq3 organic quantum well].

    PubMed

    Huang, Jin-Zhao; Xu, Zheng; Zhao, Su-Ling; Zhang, Fu-Jun; Wang, Yong

    2007-04-01

    In the present paper, an organic quantum-well device, analogous to the type-II quantum well of inorganic semiconductor materials, was prepared by thermal evaporation. NPB (N,N'-di-[(1-naphthalenyl)-N,N'-diphenyl]-(1,1'-biphenyl)-4,4'-diamine) and Alq3 (tris-(8-quinolinolato) aluminum) act as the potential barrier layer and the potential well layer, respectively. In addition, a single-layer Alq3 structure was prepared. In the experiments, Förster nonradiative resonant energy transfer from the barrier layer to the well layer was identified, and the quantum-well luminescence device possesses favorable current-voltage characteristics. A narrowing of the spectrum was observed, and the spectrum shifted continuously toward the blue as the applied voltage increased.

  18. Applications Performance on NAS Intel Paragon XP/S - 15#

    NASA Technical Reports Server (NTRS)

    Saini, Subhash; Simon, Horst D.; Copper, D. M. (Technical Monitor)

    1994-01-01

    The Numerical Aerodynamic Simulation (NAS) Systems Division received an Intel Touchstone Sigma prototype model Paragon XP/S-15 in February 1993. The i860 XP microprocessor, with an integrated floating point unit and operating in dual-instruction mode, gives a peak performance of 75 million floating point operations per second (MFLOPS) for 64-bit floating point arithmetic. It is used in the Paragon XP/S-15 which has been installed at NAS, NASA Ames Research Center. The NAS Paragon has 208 nodes and its peak performance is 15.6 GFLOPS. Here, we report on early experience using the Paragon XP/S-15. We have tested its performance using both kernels and applications of interest to NAS. We have measured the performance of BLAS 1, 2 and 3, both assembly-coded and Fortran-coded, on the NAS Paragon XP/S-15. Furthermore, we have investigated the performance of a single-node one-dimensional FFT, a distributed two-dimensional FFT and a distributed three-dimensional FFT. Finally, we measured the performance of the NAS Parallel Benchmarks (NPB) on the Paragon and compare it with the performance obtained on other highly parallel machines, such as the CM-5, CRAY T3D, IBM SP1, etc. In particular, we investigated the following issues, which can strongly affect the performance of the Paragon: a. Impact of the operating system: Intel currently uses as a default the operating system OSF/1 AD from the Open Software Foundation. The paging of the Open Software Foundation (OSF) server at 22 MB, done to make more memory available for the application, degrades performance. We found that when the limit of 26 MB per node, out of the 32 MB available, is reached, the application is paged out of main memory using virtual memory. When the application starts paging, the performance is considerably reduced. We found that dynamic memory allocation can help application performance under certain circumstances. b. Impact of the data cache on the i860/XP: We measured the performance of the BLAS, both assembly-coded and Fortran-coded. We found that the measured performance of the assembly-coded BLAS is much less than what the memory bandwidth limitation would predict. The influence of the data cache on different sizes of vectors is also investigated using one-dimensional FFTs. c. Impact of processor layout: There are several different ways processors can be laid out within the two-dimensional grid of processors on the Paragon. We have used the FFT example to investigate performance differences based on processor layout.

  19. libvdwxc: a library for exchange-correlation functionals in the vdW-DF family

    NASA Astrophysics Data System (ADS)

    Hjorth Larsen, Ask; Kuisma, Mikael; Löfgren, Joakim; Pouillon, Yann; Erhart, Paul; Hyldgaard, Per

    2017-09-01

    We present libvdwxc, a general library for evaluating the energy and potential for the family of vdW-DF exchange-correlation functionals. libvdwxc is written in C, provides an efficient implementation of the vdW-DF method, and can be interfaced with various general-purpose DFT codes. Currently, the Gpaw and Octopus codes implement interfaces to libvdwxc. The present implementation emphasizes scalability and parallel performance, and thereby enables ab initio calculations of nanometer-scale complexes. The numerical accuracy is benchmarked on the S22 test set, whereas parallel performance is benchmarked on ligand-protected gold nanoparticles (Au144(SC11NH25)60) up to 9696 atoms.

  20. Memory-Intensive Benchmarks: IRAM vs. Cache-Based Machines

    NASA Technical Reports Server (NTRS)

    Biswas, Rupak; Gaeke, Brian R.; Husbands, Parry; Li, Xiaoye S.; Oliker, Leonid; Yelick, Katherine A.; Biegel, Bryan (Technical Monitor)

    2002-01-01

    The increasing gap between processor and memory performance has led to new architectural models for memory-intensive applications. In this paper, we explore the performance of a set of memory-intensive benchmarks and use them to compare the performance of conventional cache-based microprocessors to a mixed logic and DRAM processor called VIRAM. The benchmarks are based on problem statements, rather than specific implementations, and in each case we explore the fundamental hardware requirements of the problem, as well as alternative algorithms and data structures that can help expose fine-grained parallelism or simplify memory access patterns. The benchmarks are characterized by their memory access patterns, their basic control structures, and the ratio of computation to memory operations.

  1. Roofline model toolkit: A practical tool for architectural and program analysis

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Lo, Yu Jung; Williams, Samuel; Van Straalen, Brian

    We present preliminary results of the Roofline Toolkit for multicore, manycore, and accelerated architectures. This paper focuses on the processor architecture characterization engine, a collection of portable instrumented microbenchmarks implemented with the Message Passing Interface (MPI) and OpenMP (used to express thread-level parallelism). These benchmarks are specialized to quantify the behavior of different architectural features. Compared to previous work on performance characterization, these microbenchmarks focus on capturing the performance of each level of the memory hierarchy, along with thread-level parallelism, instruction-level parallelism and explicit SIMD parallelism, measured in the context of the compilers and run-time environments. We also measure sustained PCIe throughput with four GPU memory management mechanisms. By combining results from the architecture characterization with the Roofline model based solely on architectural specifications, this work offers insights for performance prediction of current and future architectures and their software systems. To that end, we instrument three applications and plot their resultant performance on the corresponding Roofline model when run on a Blue Gene/Q architecture.
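
    The Roofline model itself reduces to a single bound: attainable performance is the minimum of peak compute throughput and arithmetic intensity times peak memory bandwidth. A minimal sketch in C, with illustrative (not measured) machine numbers:

      #include <stdio.h>

      /* Attainable GFLOP/s for a kernel of arithmetic intensity ai (flops/byte). */
      double roofline(double peak_gflops, double peak_gbs, double ai)
      {
          double memory_bound = ai * peak_gbs;
          return memory_bound < peak_gflops ? memory_bound : peak_gflops;
      }

      int main(void)
      {
          /* Hypothetical node: 200 GFLOP/s peak compute, 30 GB/s DRAM bandwidth. */
          for (double ai = 0.25; ai <= 16.0; ai *= 2.0)
              printf("AI %5.2f flop/byte -> %6.1f GFLOP/s\n",
                     ai, roofline(200.0, 30.0, ai));
          return 0;
      }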

  2. Numerical study of the light output intensity of the bilayer organic light-emitting diodes

    NASA Astrophysics Data System (ADS)

    Lu, Feiping

    2017-02-01

    The structure of organic light-emitting diodes (OLEDs) is one of the most important issues influencing the light output intensity (LOI) of OLEDs. In this paper, based on a simple but accurate optical model, the influences of hole and electron transport layer thicknesses on the LOI of bilayer OLEDs were investigated, with N,N'-bis(naphthalen-1-yl)-N,N'-bis(phenyl)benzidine (NPB) or N,N'-diphenyl-N,N'-bis(3-methylphenyl)-1,1'-biphenyl-4,4'-diamine (TPD) as the hole transport layer and tris(8-hydroxyquinoline) aluminum (Alq3) as the electron transport and light emitting layers. The dependence of the LOI on organic layer thickness was obtained. The results show that the LOI of the devices varies as a damped cosine or sine function with increasing organic layer thickness, and that the bilayer OLEDs with the structures Glass/ITO/NPB (55 nm)/Alq3 (75 nm)/Al and Glass/ITO/TPD (60 nm)/Alq3 (75 nm)/Al have the largest LOI. When the thickness of Alq3 is less than 105 nm, the OLEDs with TPD as the hole transport layer have a larger LOI than those with NPB as the hole transport layer. The results obtained in this paper can provide an in-depth understanding of the working mechanism of OLEDs and help in fabricating high-efficiency OLEDs.

  3. Importance of the pluripotency factor LIN28 in the mammalian nucleolus during early embryonic development.

    PubMed

    Vogt, Edgar J; Meglicki, Maciej; Hartung, Kristina Ilka; Borsuk, Ewa; Behr, Rüdiger

    2012-12-01

    The maternal nucleolus is required for proper activation of the embryonic genome (EGA) and early embryonic development. Nucleologenesis is characterized by the transformation of a nucleolar precursor body (NPB) to a mature nucleolus during preimplantation development. However, the function of NPBs and the involved molecular factors are unknown. We uncover a novel role for the pluripotency factor LIN28, the biological significance of which was previously demonstrated in the reprogramming of human somatic cells to induced pluripotent stem (iPS) cells. Here, we show that LIN28 accumulates at the NPB and the mature nucleolus in mouse preimplantation embryos and embryonic stem cells (ESCs), where it colocalizes with the nucleolar marker B23 (nucleophosmin 1). LIN28 has nucleolar localization in non-human primate (NHP) preimplantation embryos, but is cytoplasmic in NHP ESCs. Lin28 transcripts show a striking decline before mouse EGA, whereas LIN28 protein localizes to NPBs at the time of EGA. Following knockdown with a Lin28 morpholino, the majority of embryos arrest between the 2- and 4-cell stages and never develop to morula or blastocyst. Lin28 morpholino-injected embryos arrested at the 2-cell stage were not enriched with nucleophosmin at presumptive NPB sites, indicating that functional NPBs were not assembled. Based on these results, we propose that LIN28 is an essential factor of nucleologenesis during early embryonic development.

  4. Implementation and performance of parallel Prolog interpreter

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Wei, S.; Kale, L.V.; Balkrishna, R.

    1988-01-01

    In this paper, the authors discuss the implementation of a parallel Prolog interpreter on different parallel machines. The implementation is based on the REDUCE-OR process model, which exploits both AND and OR parallelism in logic programs. It is machine independent, as it runs on top of the chare kernel, a machine-independent parallel programming system. The authors also give the performance of the interpreter running a diverse set of benchmark programs on parallel machines, including shared-memory systems (an Alliant FX/8, a Sequent, and a MultiMax) and a non-shared-memory system (the Intel iPSC/32 hypercube), in addition to its performance on a multiprocessor simulation system.

  5. Tunable electroluminescent color for 2, 5-diphenyl -1, 4-distyrylbenzene with two trans-double bonds

    NASA Astrophysics Data System (ADS)

    Cheng, Gang; Zhang, Yingfang; Zhao, Yi; Liu, Shiyong; Xie, Zengqi; Xia, Hong; Hanif, Muddasir; Ma, Yuguang

    2005-07-01

    Exciplex emission is observed in electroluminescent (EL) spectrum of an organic light-emitting device (OLED), where 2, 5-diphenyl -1, 4-distyrylbenzene with two trans-double bonds (trans-DPDSB), (8-hydroxyquinoline) aluminum, and N,N'-diphenyl-N,N'-bis(1-naphthyl)-(1,1'-biphenyl)-4,4'-diamine (NPB) are used as light-emitting, electron-transporting, and hole-transporting layers, respectively. This emission can be dramatically weakened by inserting a hole-injecting layer of poly(3,4-ethylenedioxythiophene):poly(styrene sulfonic acid) between the hole-transporting layer and the anode. Consequently, EL color of this OLED is tuned from white to blue. This phenomenon may result from the improvement of hole injection, which shifts the major recombination zone from the NPB/trans-DPDSB interface to the trans-DPDSB layer.

  6. Commissioning the GTA accelerator

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Sander, O.R.; Atkins, W.H.; Bolme, G.O.

    1992-09-01

    The Ground Test Accelerator (GTA) is supported by the Strategic Defense Command as part of their Neutral Particle Beam (NPB) program. Neutral particles have the advantage that in space they are unaffected by the earth's magnetic field and travel in straight lines unless they enter the earth's atmosphere and become charged by stripping. Heavy particles are difficult to stop and can probe the interior of space vehicles; hence, NPB can function as a discriminator between warheads and decoys. We are using GTA to resolve the physics and engineering issues related to accelerating, focusing, and steering a high-brightness, high-current H⁻ beam and then neutralizing it. Our immediate goal is to produce a 24-MeV, 50-mA device with a 2% duty factor.

  7. Commissioning the GTA accelerator

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Sander, O.R.; Atkins, W.H.; Bolme, G.O.

    1992-01-01

    The Ground Test Accelerator (GTA) is supported by the Strategic Defense Command as part of their Neutral Particle Beam (NPB) program. Neutral particles have the advantage that in space they are unaffected by the earth's magnetic field and travel in straight lines unless they enter the earth's atmosphere and become charged by stripping. Heavy particles are difficult to stop and can probe the interior of space vehicles; hence, NPB can function as a discriminator between warheads and decoys. We are using GTA to resolve the physics and engineering issues related to accelerating, focusing, and steering a high-brightness, high-current H⁻ beam and then neutralizing it. Our immediate goal is to produce a 24-MeV, 50-mA device with a 2% duty factor.

  8. Design of Unstructured Adaptive (UA) NAS Parallel Benchmark Featuring Irregular, Dynamic Memory Accesses

    NASA Technical Reports Server (NTRS)

    Feng, Hui-Yu; VanderWijngaart, Rob; Biswas, Rupak; Biegel, Bryan (Technical Monitor)

    2001-01-01

    We describe the design of a new method for the measurement of the performance of modern computer systems when solving scientific problems featuring irregular, dynamic memory accesses. The method involves the solution of a stylized heat transfer problem on an unstructured, adaptive grid. A Spectral Element Method (SEM) with an adaptive, nonconforming mesh is selected to discretize the transport equation. The relatively high order of the SEM lowers the fraction of wall clock time spent on inter-processor communication, which eases the load balancing task and allows us to concentrate on the memory accesses. The benchmark is designed to be three-dimensional. Parallelization and load balance issues of a reference implementation will be described in detail in future reports.

  9. Present Status and Extensions of the Monte Carlo Performance Benchmark

    NASA Astrophysics Data System (ADS)

    Hoogenboom, J. Eduard; Petrovic, Bojan; Martin, William R.

    2014-06-01

    The NEA Monte Carlo Performance benchmark started in 2011, aiming to monitor over the years the ability to perform a full-size Monte Carlo reactor core calculation with detailed power production for each fuel pin, with axial distribution. This paper gives an overview of the results contributed thus far. It shows that reaching a statistical accuracy of 1% for most of the small fuel zones requires about 100 billion neutron histories. The efficiency of parallel execution of Monte Carlo codes on large numbers of processor cores shows clear limitations for computer clusters with common-type compute nodes. On true supercomputers, however, the speedup of parallel calculations continues to increase up to large numbers of processor cores. More experience is needed from calculations on true supercomputers using large numbers of processors in order to predict whether the requested calculations can be done in a short time. As the specification of the reactor geometry for this benchmark is well suited for further investigation of full-core Monte Carlo calculations, and a need is felt for testing issues other than computational performance, proposals are presented for extending the benchmark to a suite of problems: evaluating fission source convergence for a system with a high dominance ratio, coupling with thermal-hydraulics calculations to evaluate the use of different temperatures and coolant densities, and studying the correctness and effectiveness of burnup calculations. Moreover, other contemporary proposals for a full-core calculation with realistic geometry and material composition will be discussed.
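
    The quoted 100 billion histories follows from 1/√N counting statistics: a small fuel zone scores only a tiny fraction f of all histories, so the total count must grow as 1/(f·σ²). A back-of-envelope C sketch, where f and the per-history variance factor c are assumed, illustrative values rather than numbers from the paper:

      /* 1/sqrt(N) scaling: a zone scoring fraction f of all histories
       * reaches relative error sigma when N_total = c / (f * sigma^2).
       * f and c below are illustrative assumptions. */
      #include <stdio.h>

      int main(void)
      {
          double sigma = 0.01;  /* target 1% relative error per zone */
          double f     = 1e-6;  /* assumed fraction scored in one small zone */
          double c     = 10.0;  /* assumed per-history variance factor */
          printf("total histories ~ %.1e\n", c / (f * sigma * sigma)); /* ~1e11 */
          return 0;
      }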

  10. Implementation of BT, SP, LU, and FT of NAS Parallel Benchmarks in Java

    NASA Technical Reports Server (NTRS)

    Schultz, Matthew; Frumkin, Michael; Jin, Hao-Qiang; Yan, Jerry

    2000-01-01

    A number of Java features make it an attractive but debatable choice for High Performance Computing. We have implemented the benchmarks that operate on a single structured grid (BT, SP, LU, and FT) in Java. The performance and scalability of the Java code show that significant improvements in Java compiler technology and in Java thread implementations are necessary for Java to compete with Fortran in HPC applications.

  11. Accelerating the Pace of Protein Functional Annotation With Intel Xeon Phi Coprocessors.

    PubMed

    Feinstein, Wei P; Moreno, Juana; Jarrell, Mark; Brylinski, Michal

    2015-06-01

    Intel Xeon Phi is a new addition to the family of powerful parallel accelerators. The range of its potential applications in computationally driven research is broad; however, at present, the repository of scientific codes is still relatively limited. In this study, we describe the development and benchmarking of a parallel version of eFindSite, a structural bioinformatics algorithm for the prediction of ligand-binding sites in proteins. Implemented for the Intel Xeon Phi platform, the parallelization of the structure alignment portion of eFindSite using pragma-based OpenMP brings about the desired performance improvements, which scale well with the number of computing cores. Compared to a serial version, the parallel code runs 11.8 and 10.1 times faster on the CPU and the coprocessor, respectively; when both resources are utilized simultaneously, the speedup is 17.6. For example, ligand-binding predictions for 501 benchmarking proteins are completed in 2.1 hours on a single Stampede node equipped with the Intel Xeon Phi card compared to 3.1 hours without the accelerator and 36.8 hours required by a serial version. In addition to the satisfactory parallel performance, porting existing scientific codes to the Intel Xeon Phi architecture is relatively straightforward with a short development time due to the support of common parallel programming models by the coprocessor. The parallel version of eFindSite is freely available to the academic community at www.brylinski.org/efindsite.
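
    The pragma-based pattern described, independent structure alignments distributed across cores, reduces in outline to a single OpenMP worksharing loop. The sketch below is generic; align_one() and its cost are hypothetical placeholders, not the eFindSite API:

      /* Generic sketch of pragma-based OpenMP parallelization over
       * independent alignments; align_one() is a hypothetical stub. */
      #include <omp.h>
      #include <stdio.h>

      #define N_TEMPLATES 1000

      static double align_one(int t) { return (double)t; }  /* stub work */

      int main(void)
      {
          double scores[N_TEMPLATES];
          #pragma omp parallel for schedule(dynamic)  /* uneven costs */
          for (int t = 0; t < N_TEMPLATES; t++)
              scores[t] = align_one(t);
          printf("score[0] = %.1f, up to %d threads\n",
                 scores[0], omp_get_max_threads());
          return 0;
      }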

  12. Parallel computation for biological sequence comparison: comparing a portable model to the native model for the Intel Hypercube.

    PubMed

    Nadkarni, P M; Miller, P L

    1991-01-01

    A parallel program for inter-database sequence comparison was developed on the Intel Hypercube using two models of parallel programming. One version was built using machine-specific Hypercube parallel programming commands. The other version was built using Linda, a machine-independent parallel programming language. The two versions of the program provide a case study comparing these two approaches to parallelization in an important biological application area. Benchmark tests with both programs gave comparable results with a small number of processors. As the number of processors was increased, the Linda version was somewhat less efficient. The Linda version was also run without change on Network Linda, a virtual parallel machine running on a network of desktop workstations.

  13. MoMaS reactive transport benchmark using PFLOTRAN

    NASA Astrophysics Data System (ADS)

    Park, H.

    2017-12-01

    The MoMaS benchmark was developed to enhance numerical simulation capability for reactive transport modeling in porous media. The benchmark was published in late September 2009; its tests are not taken from a real chemical system, but they are realistic and numerically challenging. PFLOTRAN is a state-of-the-art, massively parallel subsurface flow and reactive transport code that is being used in multiple nuclear waste repository projects at Sandia National Laboratories, including the Waste Isolation Pilot Plant and Used Fuel Disposition. The MoMaS benchmark has three independent tests of easy, medium, and hard chemical complexity. This paper demonstrates how PFLOTRAN is applied to this benchmark exercise and shows results for the easy test case, which includes mixing of aqueous components and surface complexation. The surface complexations consist of monodentate and bidentate reactions, which introduce difficulty in defining the selectivity coefficient if the reaction applies to a bulk reference volume: for bidentate reactions in heterogeneous porous media, the selectivity coefficient becomes porosity dependent. The benchmark is solved by PFLOTRAN with minimal modification to address this issue, and unit conversions were made to suit PFLOTRAN.

  14. Fused Methoxynaphthyl Phenanthrimidazole Semiconductors as Functional Layer in High Efficient OLEDs.

    PubMed

    Jayabharathi, Jayaraman; Ramanathan, Periyasamy; Karunakaran, Chockalingam; Thanikachalam, Venugopal

    2016-01-01

    Efficient hole transport materials based on a novel fused methoxynaphthyl phenanthrimidazole core structure were synthesised and characterized, and their device performance in phosphorescent organic light emitting diodes was investigated. Their high thermal stability, in combination with a reversible oxidation process, makes them promising candidates as hole-transporting materials for organic light-emitting devices. Highly efficient Alq3-based organic light emitting devices have been developed using the phenanthrimidazoles as functional layers between the NPB [4,4-bis(N-(1-naphthyl)-N-phenylamino)biphenyl] and Alq3 [tris(8-hydroxyquinoline)aluminium] layers. Using the device ITO/NPB/4/Alq3/LiF/Al, a maximum luminous efficiency of 5.99 cd A(-1) was obtained, with a maximum brightness of 40,623 cd m(-2) and a power efficiency of 5.25 lm W(-1).

  15. Neutral particle beam sensing and steering

    DOEpatents

    Maier, II, William B.; Cobb, Donald D.; Robiscoe, Richard T.

    1991-01-01

    The direction of a neutral particle beam (NPB) is determined by detecting Lyman-α radiation emitted during motional quenching of excited H(2S) atoms in the beam as the atoms move through a magnetic field. At least one detector is placed adjacent to the beam exit to define an optical axis that intercepts the beam at a viewing angle chosen to include a volume generating a selected number of photons for detection. The detection system includes a lens whose area is small relative to the NPB area and a pixel array located in the focal plane of the lens. The lens viewing angle, lens area, and pixel array are selected to optimize the beam-tilt sensitivity. In one embodiment, two detectors are placed coplanar with the beam axis to generate a difference signal that is insensitive to beam variations other than beam tilt.

  16. Assessment of respiratory flow cycle morphology in patients with chronic heart failure.

    PubMed

    Garde, Ainara; Sörnmo, Leif; Laguna, Pablo; Jané, Raimon; Benito, Salvador; Bayés-Genís, Antoni; Giraldo, Beatriz F

    2017-02-01

    A breathing pattern such as periodic breathing (PB) in chronic heart failure (CHF) is associated with poor prognosis and high mortality risk. This work investigates the significance of a number of time-domain parameters for characterizing respiratory flow cycle morphology in patients with CHF. Our primary goal is to detect the PB pattern and identify patients at higher risk. In addition, differences in respiratory flow cycle morphology between CHF patients (with and without PB) and healthy subjects are studied. Differences between these parameters are assessed by investigating three classification problems: CHF patients with PB versus with non-periodic breathing (nPB); CHF patients (both PB and nPB) versus healthy subjects; and nPB patients versus healthy subjects. Twenty-six CHF patients (8/18 with PB/nPB) and 35 healthy subjects are studied. The results show that the maximal expiratory flow interval is shorter and has lower dispersion in CHF patients than in healthy subjects. The flow slopes are much steeper in CHF patients, especially for PB. Both inspiration and expiration durations are reduced in CHF patients, mostly for PB. Using the classification and regression tree technique, the most discriminant parameters are selected. For signals shorter than 1 min, the time-domain parameters produce better results than the spectral parameters, with accuracies for the three classifications of 82/78, 89/85, and 91/89%, respectively. It is concluded that morphologic analysis in the time domain is useful, especially when short signals are analyzed.

  17. Fabrication and Characterization of Flexible Organic Light Emitting Diodes Based on Transparent Flexible Clay Substrates

    NASA Astrophysics Data System (ADS)

    Venkatachalam, Shanmugam; Hayashi, Hiromichi; Ebina, Takeo; Nakamura, Takashi; Nanjo, Hiroshi

    2013-03-01

    In the present work, transparent flexible polymer-doped clay (P-clay) substrates were prepared for flexible organic light emitting diode (OLED) applications. Nanocrystalline indium tin oxide (ITO) thin films were prepared on the P-clay substrates by the ion-beam sputter deposition method. The structural, optical, and electrical properties of as-prepared ITO/P-clay showed that the as-prepared ITO thin film was amorphous, with an average optical transparency of around 84% and a sheet resistance of around 56 Ω/square. The as-prepared ITO/P-clay samples were annealed at 200 and 270 °C for 1 h to improve the optical transparency and electrical conductivity; the average optical transparency was found to be maximum at an annealing temperature of 200 °C. Then, N,N′-bis(1-naphthyl)-N,N′-diphenyl-1,1′-biphenyl-4,4′-diamine (NPB) and tris(8-hydroxyquinoline) aluminum (Alq3) thin films and an aluminum (Al) electrode were deposited on the ITO/P-clay substrates by the thermal evaporation method. The current density-voltage (J-V) characteristic of Al/NPB/ITO/P-clay showed linear Ohmic behaviour, whereas the J-V characteristic of Al/Alq3/NPB/ITO/P-clay showed non-linear Schottky behaviour. Finally, a very flexible OLED was successfully fabricated on the newly developed transparent flexible P-clay substrates. The electroluminescence study showed that the emission intensity of light from the flexible OLED device gradually increased with increasing applied voltage.

  18. Conserved gene regulatory module specifies lateral neural borders across bilaterians

    PubMed Central

    Li, Yongbin; Zhao, Di; Horie, Takeo; Chen, Geng; Bao, Hongcun; Chen, Siyu; Liu, Weihong; Horie, Ryoko; Liang, Tao; Dong, Biyu; Feng, Qianqian; Tao, Qinghua

    2017-01-01

    The lateral neural plate border (NPB), the neural part of the vertebrate neural border, is composed of central nervous system (CNS) progenitors and peripheral nervous system (PNS) progenitors. In invertebrates, PNS progenitors are also juxtaposed to the lateral boundary of the CNS. Whether there are conserved molecular mechanisms determining vertebrate and invertebrate lateral neural borders remains unclear. Using single-cell-resolution gene-expression profiling and genetic analysis, we present evidence that orthologs of the NPB specification module specify the invertebrate lateral neural border, which is composed of CNS and PNS progenitors. First, like in vertebrates, the conserved neuroectoderm lateral border specifier Msx/vab-15 specifies lateral neuroblasts in Caenorhabditis elegans. Second, orthologs of the vertebrate NPB specification module (Msx/vab-15, Pax3/7/pax-3, and Zic/ref-2) are significantly enriched in worm lateral neuroblasts. In addition, like in other bilaterians, the expression domain of Msx/vab-15 is more lateral than those of Pax3/7/pax-3 and Zic/ref-2 in C. elegans. Third, we show that Msx/vab-15 regulates the development of mechanosensory neurons derived from lateral neural progenitors in multiple invertebrate species, including C. elegans, Drosophila melanogaster, and Ciona intestinalis. We also identify a novel lateral neural border specifier, ZNF703/tlp-1, which functions synergistically with Msx/vab-15 in both C. elegans and Xenopus laevis. These data suggest a common origin of the molecular mechanism specifying lateral neural borders across bilaterians. PMID:28716930

  19. Conserved gene regulatory module specifies lateral neural borders across bilaterians.

    PubMed

    Li, Yongbin; Zhao, Di; Horie, Takeo; Chen, Geng; Bao, Hongcun; Chen, Siyu; Liu, Weihong; Horie, Ryoko; Liang, Tao; Dong, Biyu; Feng, Qianqian; Tao, Qinghua; Liu, Xiao

    2017-08-01

    The lateral neural plate border (NPB), the neural part of the vertebrate neural border, is composed of central nervous system (CNS) progenitors and peripheral nervous system (PNS) progenitors. In invertebrates, PNS progenitors are also juxtaposed to the lateral boundary of the CNS. Whether there are conserved molecular mechanisms determining vertebrate and invertebrate lateral neural borders remains unclear. Using single-cell-resolution gene-expression profiling and genetic analysis, we present evidence that orthologs of the NPB specification module specify the invertebrate lateral neural border, which is composed of CNS and PNS progenitors. First, like in vertebrates, the conserved neuroectoderm lateral border specifier Msx/vab-15 specifies lateral neuroblasts in Caenorhabditis elegans. Second, orthologs of the vertebrate NPB specification module (Msx/vab-15, Pax3/7/pax-3, and Zic/ref-2) are significantly enriched in worm lateral neuroblasts. In addition, like in other bilaterians, the expression domain of Msx/vab-15 is more lateral than those of Pax3/7/pax-3 and Zic/ref-2 in C. elegans. Third, we show that Msx/vab-15 regulates the development of mechanosensory neurons derived from lateral neural progenitors in multiple invertebrate species, including C. elegans, Drosophila melanogaster, and Ciona intestinalis. We also identify a novel lateral neural border specifier, ZNF703/tlp-1, which functions synergistically with Msx/vab-15 in both C. elegans and Xenopus laevis. These data suggest a common origin of the molecular mechanism specifying lateral neural borders across bilaterians.

  20. MPI_XSTAR: MPI-based Parallelization of the XSTAR Photoionization Program

    NASA Astrophysics Data System (ADS)

    Danehkar, Ashkbiz; Nowak, Michael A.; Lee, Julia C.; Smith, Randall K.

    2018-02-01

    We describe a program for the parallel implementation of multiple runs of XSTAR, a photoionization code that is used to predict the physical properties of an ionized gas from its emission and/or absorption lines. The parallelization program, called MPI_XSTAR, has been developed and implemented in C++ using the Message Passing Interface (MPI) protocol, a conventional standard of parallel computing. We have benchmarked parallel multiprocessing executions of XSTAR, using MPI_XSTAR, against a serial execution of XSTAR, in terms of the parallelization speedup and the computing resource efficiency. Our experience indicates that the parallel execution runs significantly faster than the serial execution; however, the efficiency in terms of computing resource usage decreases as the number of processors used in the parallel computation increases.
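
    In outline, parallelizing many independent XSTAR runs needs little more than a cyclic distribution of a job list over MPI ranks. A minimal C sketch under that assumption; run_xstar() is a hypothetical placeholder, not the MPI_XSTAR interface:

      /* Cyclic distribution of independent runs over MPI ranks;
       * run_xstar() is a hypothetical stub for one parameter combination. */
      #include <mpi.h>
      #include <stdio.h>

      static void run_xstar(int job) { (void)job; /* placeholder */ }

      int main(int argc, char **argv)
      {
          int rank, size, njobs = 64;
          MPI_Init(&argc, &argv);
          MPI_Comm_rank(MPI_COMM_WORLD, &rank);
          MPI_Comm_size(MPI_COMM_WORLD, &size);
          for (int job = rank; job < njobs; job += size)  /* every size-th job */
              run_xstar(job);
          MPI_Barrier(MPI_COMM_WORLD);
          if (rank == 0)  /* efficiency = speedup / size, reported by rank 0 */
              printf("completed %d runs on %d ranks\n", njobs, size);
          MPI_Finalize();
          return 0;
      }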

  1. An OpenACC-Based Unified Programming Model for Multi-accelerator Systems

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Kim, Jungwon; Lee, Seyong; Vetter, Jeffrey S

    2015-01-01

    This paper proposes a novel SPMD programming model of OpenACC. Our model integrates the different granularities of parallelism from vector-level parallelism to node-level parallelism into a single, unified model based on OpenACC. It allows programmers to write programs for multiple accelerators using a uniform programming model whether they are in shared or distributed memory systems. We implement a prototype of our model and evaluate its performance with a GPU-based supercomputer using three benchmark applications.
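
    For context, a standard (unextended) OpenACC kernel in C looks like the vector-add sketch below; the paper's contribution is to layer an SPMD, multi-accelerator model on top of directives of this kind. This is a generic example, not the authors' model:

      /* Generic OpenACC vector add: the loop is offloaded to an
       * accelerator, with explicit data movement clauses. */
      #include <stdio.h>

      #define N 1000000

      int main(void)
      {
          static float a[N], b[N], c[N];
          for (int i = 0; i < N; i++) { a[i] = 1.0f; b[i] = 2.0f; }
          #pragma acc parallel loop copyin(a, b) copyout(c)
          for (int i = 0; i < N; i++)
              c[i] = a[i] + b[i];
          printf("c[0] = %f\n", c[0]);
          return 0;
      }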

  2. A conservative approach to parallelizing the Sharks World simulation

    NASA Technical Reports Server (NTRS)

    Nicol, David M.; Riffe, Scott E.

    1990-01-01

    Parallelizing a benchmark problem for parallel simulation, the Sharks World, is described. The solution described is conservative, in the sense that no state information is saved and no 'rollbacks' occur. The approach illustrates both the principal advantage and the principal disadvantage of conservative parallel simulation. The advantage is that, by exploiting lookahead, an approach was found that dramatically improves upon the serial execution time and also achieves excellent speedups. The disadvantage is that if the model rules are changed in such a way that the lookahead is destroyed, it is difficult to modify the solution to accommodate the changes.
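
    The lookahead exploited here has a compact formulation: a logical process may safely advance its clock only to the minimum, over incoming links, of the neighbor's clock plus that link's lookahead, so no rollback is ever needed. A C sketch with illustrative values (in a Sharks World setting, lookahead would come from minimum travel times):

      /* Conservative synchronization bound: safe time = min over links of
       * (neighbor clock + lookahead). Values below are illustrative. */
      #include <stdio.h>

      static double safe_time(const double clk[], const double la[], int n)
      {
          double t = clk[0] + la[0];
          for (int i = 1; i < n; i++)
              if (clk[i] + la[i] < t) t = clk[i] + la[i];
          return t;  /* events up to this time can be processed safely */
      }

      int main(void)
      {
          double clocks[3] = { 10.0, 12.5, 11.0 };
          double la[3]     = {  2.0,  0.5,  3.0 };  /* assumed lookaheads */
          printf("safe to advance to t = %.1f\n", safe_time(clocks, la, 3));
          return 0;
      }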

  3. Ion-pair extraction of multi-OH compounds by complexation with organoboronate

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Randel, L.A.; Chow, T.K.F.; King, C.J.

    1994-08-01

    Ion-pair extraction with organoboronate has been investigated as a regenerable means of removal and recovery of multi-OH compounds from aqueous solution. The extractant utilized was 3-nitrophenylboronate (NPB⁻).

  4. Parallel tempering for the traveling salesman problem

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Percus, Allon; Wang, Richard; Hyman, Jeffrey

    We explore the potential of parallel tempering as a combinatorial optimization method, applying it to the traveling salesman problem. We compare simulation results of parallel tempering with a benchmark implementation of simulated annealing, and study how different choices of parameters affect the relative performance of the two methods. We find that a straightforward implementation of parallel tempering can outperform simulated annealing in several crucial respects. When parameters are chosen appropriately, both methods yield close approximations to the actual minimum distance for an instance with 200 nodes. However, parallel tempering yields more consistently accurate results when a series of independent simulations is performed. Our results suggest that parallel tempering might offer a simple but powerful alternative to simulated annealing for combinatorial optimization problems.
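
    The core of parallel tempering is the replica-swap test between adjacent temperatures, accepted with probability min(1, exp((1/T_i - 1/T_j)(E_i - E_j))). A self-contained C sketch with stubbed, illustrative energies (tour lengths); this is not the authors' implementation:

      /* Metropolis swap test between two replicas at temperatures Ti, Tj
       * with energies Ei, Ej. Values in main() are illustrative stubs. */
      #include <math.h>
      #include <stdio.h>
      #include <stdlib.h>

      static int try_swap(double Ti, double Tj, double Ei, double Ej)
      {
          double delta = (1.0 / Ti - 1.0 / Tj) * (Ei - Ej);
          /* accept with probability min(1, exp(delta)) */
          return ((double)rand() / RAND_MAX) < exp(delta < 0.0 ? delta : 0.0);
      }

      int main(void)
      {
          double T[2] = { 1.0, 2.0 };      /* adjacent temperatures */
          double E[2] = { 105.0, 130.0 };  /* assumed tour lengths */
          printf("swap %s\n", try_swap(T[0], T[1], E[0], E[1])
                                  ? "accepted" : "rejected");
          return 0;
      }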

  5. Parallel computation for biological sequence comparison: comparing a portable model to the native model for the Intel Hypercube.

    PubMed Central

    Nadkarni, P. M.; Miller, P. L.

    1991-01-01

    A parallel program for inter-database sequence comparison was developed on the Intel Hypercube using two models of parallel programming. One version was built using machine-specific Hypercube parallel programming commands. The other version was built using Linda, a machine-independent parallel programming language. The two versions of the program provide a case study comparing these two approaches to parallelization in an important biological application area. Benchmark tests with both programs gave comparable results with a small number of processors. As the number of processors was increased, the Linda version was somewhat less efficient. The Linda version was also run without change on Network Linda, a virtual parallel machine running on a network of desktop workstations. PMID:1807632

  6. Hybrid permeable metal-base transistor with large common-emitter current gain and low operational voltage.

    PubMed

    Feng, Chengang; Yi, Mingdong; Yu, Shunyang; Hümmelgen, Ivo A; Zhang, Tong; Ma, Dongge

    2008-04-01

    We demonstrate the suitability of N,N'-diphenyl-N,N'-bis(1-naphthylphenyl)-1,1'-biphenyl-4,4'-diamine (NPB), an organic semiconductor widely used in organic light-emitting diodes (OLEDs), for high-gain, low-operational-voltage nanostructured vertical-architecture transistors, which operate as permeable-base transistors. By introducing vanadium oxide (V2O5) between the injecting metal and the NPB layer at the transistor emitter, we reduced the emitter operational voltage. The addition of two Ca layers, giving a Ca/Ag/Ca base, allowed us to obtain a large common-emitter current gain while still retaining the permeable-base transistor character. Vertical devices of this kind, produced by simple technologies, offer attractive new possibilities due to the large variety of available molecular semiconductors, opening the possibility of incorporating new functionalities into silicon-based devices.

  7. GTA: The NPB legacy

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Schneider, J.D.

    1994-12-31

    Technical developments on the neutral particle beam (NPB) program over a period of 18 years led to significant advances in accelerator technology. Many of these state-of-the-art technologies were integrated into the Ground Test Accelerator (GTA). GTA beam experiments were completed on components and systems spanning the ion source through the low-energy DTL modules. Provisions for beam funneling, matching, cryogenic (20 K) operation, and detailed transverse and longitudinal beam characterization, combined with state-of-the-art accelerator and rf controls, made this GTA system unique. The authors summarize the types and magnitudes of the technology advances that culminated in the fabrication of the 24-MeV front end of the GTA. A number of highly instrumented beam experiments at several stages validated the innovative designs. Applications of GTA-developed technology to several new accelerators highlight the practical benefits of the GTA technology integration.

  8. Highly efficient phosphorescent organic light-emitting diode with a nanometer-thick Ni silicide/polycrystalline p-Si composite anode.

    PubMed

    Li, Y Z; Wang, Z L; Luo, H; Wang, Y Z; Xu, W J; Ran, G Z; Qin, G G; Zhao, W Q; Liu, H

    2010-07-19

    A phosphorescent organic light-emitting diode (PhOLED) with a nanometer-thick (approximately 10 nm) Ni silicide/polycrystalline p-Si composite anode is reported. The structure of the PhOLED is Al mirror/glass substrate/Si isolation layer/Ni silicide/polycrystalline p-Si/V2O5/NPB/CBP:(ppy)2Ir(acac)/Bphen/Bphen:Cs2CO3/Sm/Au/BCP. In the composite anode, the Ni-induced polycrystalline p-Si layer injects holes into the V2O5/NPB, while the Ni silicide layer reduces the sheet resistance of the composite anode and thus the series resistance of the PhOLED. By optimizing the thickness of the Ni layer, which induces Si crystallization and forms a Ni silicide layer of appropriate thickness, the highest external quantum efficiency and power conversion efficiency have been raised to 26% and 11%, respectively.

  9. Automatic Generation of Directive-Based Parallel Programs for Shared Memory Parallel Systems

    NASA Technical Reports Server (NTRS)

    Jin, Hao-Qiang; Yan, Jerry; Frumkin, Michael

    2000-01-01

    The shared-memory programming model is a very effective way to achieve parallelism on shared memory parallel computers. As great progress has been made in hardware and software technologies, the performance of parallel programs using compiler directives has improved substantially. The introduction of OpenMP directives, the industry standard for shared-memory programming, has minimized the issue of portability. Due to its ease of programming and its good performance, the technique has become very popular. In this study, we have extended CAPTools, a computer-aided parallelization toolkit, to automatically generate directive-based OpenMP parallel programs. We outline the techniques used in the implementation of the tool and present test results on the NAS Parallel Benchmarks and ARC3D, a CFD application. This work demonstrates the great potential of using computer-aided tools to quickly port parallel programs and also achieve good performance.

  10. Rubus: A compiler for seamless and extensible parallelism.

    PubMed

    Adnan, Muhammad; Aslam, Faisal; Nawaz, Zubair; Sarwar, Syed Mansoor

    2017-01-01

    Nowadays, a typical processor may have multiple processing cores on a single chip. Furthermore, a special-purpose processing unit called the Graphics Processing Unit (GPU), originally designed for 2D/3D games, is now available for general-purpose use in computers and mobile devices. However, traditional programming languages, which were designed to work with machines having single-core CPUs, cannot efficiently utilize the parallelism available on multi-core processors. Therefore, to exploit the extraordinary processing power of multi-core processors, researchers are working on new tools and techniques to facilitate parallel programming. To this end, languages like CUDA and OpenCL have been introduced, which can be used to write code with parallelism. The main shortcoming of these languages is that the programmer needs to specify all the complex details manually in order to parallelize the code across multiple cores. As a result, the code written in these languages is difficult to understand, debug, and maintain. Furthermore, parallelizing legacy code can require rewriting a significant portion of it in CUDA or OpenCL, which can consume significant time and resources. Thus, the amount of parallelism achieved is proportional to the skills of the programmer and the time spent on code optimization. This paper proposes a new open-source compiler, Rubus, to achieve seamless parallelism. The Rubus compiler relieves the programmer from manually specifying the low-level details. It analyses and transforms a sequential program into a parallel program automatically, without any user intervention. This achieves massive speedup and better utilization of the underlying hardware without requiring a programmer's expertise in parallel programming. For five different benchmarks, Rubus achieved an average speedup of 34.54 times compared to Java on a basic GPU having only 96 cores; for a matrix multiplication benchmark, it achieved an average speedup of 84 times on the same GPU. Moreover, Rubus achieves this performance without drastically increasing the memory footprint of a program.

  11. Rubus: A compiler for seamless and extensible parallelism

    PubMed Central

    Adnan, Muhammad; Aslam, Faisal; Sarwar, Syed Mansoor

    2017-01-01

    Nowadays, a typical processor may have multiple processing cores on a single chip. Furthermore, a special-purpose processing unit called the Graphics Processing Unit (GPU), originally designed for 2D/3D games, is now available for general-purpose use in computers and mobile devices. However, traditional programming languages, which were designed to work with machines having single-core CPUs, cannot efficiently utilize the parallelism available on multi-core processors. Therefore, to exploit the extraordinary processing power of multi-core processors, researchers are working on new tools and techniques to facilitate parallel programming. To this end, languages like CUDA and OpenCL have been introduced, which can be used to write code with parallelism. The main shortcoming of these languages is that the programmer needs to specify all the complex details manually in order to parallelize the code across multiple cores. As a result, the code written in these languages is difficult to understand, debug, and maintain. Furthermore, parallelizing legacy code can require rewriting a significant portion of it in CUDA or OpenCL, which can consume significant time and resources. Thus, the amount of parallelism achieved is proportional to the skills of the programmer and the time spent on code optimization. This paper proposes a new open-source compiler, Rubus, to achieve seamless parallelism. The Rubus compiler relieves the programmer from manually specifying the low-level details. It analyses and transforms a sequential program into a parallel program automatically, without any user intervention. This achieves massive speedup and better utilization of the underlying hardware without requiring a programmer's expertise in parallel programming. For five different benchmarks, Rubus achieved an average speedup of 34.54 times compared to Java on a basic GPU having only 96 cores; for a matrix multiplication benchmark, it achieved an average speedup of 84 times on the same GPU. Moreover, Rubus achieves this performance without drastically increasing the memory footprint of a program. PMID:29211758

  12. Resistance training reduces whole-body protein turnover and improves net protein retention in untrained young males.

    PubMed

    Hartman, Joseph W; Moore, Daniel R; Phillips, Stuart M

    2006-10-01

    It is thought that resistance exercise results in an increased need for dietary protein; however, data also exist to support the opposite conclusion. The purpose of this study was to determine the impact of resistance exercise training on protein metabolism in novices, with the hypothesis that resistance training would reduce protein turnover and improve whole-body protein retention. Healthy males (n = 8, 22 +/- 1 y, BMI = 25.3 +/- 1.8 kg.m(-2)) participated in a progressive whole-body split-routine resistance-training program 5 d/week for 12 weeks. Before (PRE) and after (POST) the training, oral [15N]-glycine ingestion was used to assess nitrogen flux (Q), protein synthesis (PS), protein breakdown (PB), and net protein balance (NPB = PS - PB). Macronutrient intake was controlled over a 5-d period PRE and POST, while estimates of protein turnover and urinary nitrogen balance (N(bal) = N(in) - urine N(out)) were conducted. Bench press and leg press increased 40% and 50%, respectively (p < 0.01). Fat- and bone-free mass (i.e., lean muscle mass) increased from PRE to POST (2.5 +/- 0.8 kg, p < 0.05). Significant PRE to POST decreases (p < 0.05) occurred in Q (0.9 +/- 0.1 vs. 0.6 +/- 0.1 g N.kg(-1).d(-1)), PS (4.6 +/- 0.7 vs. 2.9 +/- 0.3 g.kg(-1).d(-1)), and PB (4.3 +/- 0.7 vs. 2.4 +/- 0.2 g.kg(-1).d(-1)). Significant training-induced increases in both NPB (PRE = 0.22 +/- 0.13 g.kg(-1).d(-1); POST = 0.54 +/- 0.08 g.kg(-1).d(-1)) and urinary nitrogen balance (PRE = 2.8 +/- 1.7 g N.d(-1); POST = 6.5 +/- 0.9 g N.d(-1)) were observed. A program of resistance training that induced significant muscle hypertrophy resulted in reductions of both whole-body PS and PB, but an improved NPB, which favoured the accretion of skeletal muscle protein. Urinary nitrogen balance increased after training. The reduction in PS and PB and the higher NPB, in combination with an increased nitrogen balance after training, suggest that dietary protein requirements for novice resistance-trained athletes are not higher, but lower, after resistance training.
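
    As a quick arithmetic check of the balance definition NPB = PS - PB against the reported group means (the small discrepancies from the published NPB values of 0.22 and 0.54 reflect rounding of the means):

      /* NPB = PS - PB computed from the reported means, g.kg^-1.d^-1 */
      #include <stdio.h>

      int main(void)
      {
          double ps_pre = 4.6, pb_pre = 4.3;    /* before training */
          double ps_post = 2.9, pb_post = 2.4;  /* after 12 weeks */
          printf("NPB pre  = %.1f\n", ps_pre - pb_pre);    /* reported 0.22 */
          printf("NPB post = %.1f\n", ps_post - pb_post);  /* reported 0.54 */
          return 0;
      }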

  13. Automatic Generation of OpenMP Directives and Its Application to Computational Fluid Dynamics Codes

    NASA Technical Reports Server (NTRS)

    Yan, Jerry; Jin, Haoqiang; Frumkin, Michael; Yan, Jerry (Technical Monitor)

    2000-01-01

    The shared-memory programming model is a very effective way to achieve parallelism on shared memory parallel computers. As great progress has been made in hardware and software technologies, the performance of parallel programs using compiler directives has improved substantially. The introduction of OpenMP directives, the industry standard for shared-memory programming, has minimized the issue of portability. In this study, we have extended CAPTools, a computer-aided parallelization toolkit, to automatically generate OpenMP-based parallel programs with nominal user assistance. We outline the techniques used in the implementation of the tool and discuss its application to the NAS Parallel Benchmarks and several computational fluid dynamics codes. This work demonstrates the great potential of using the tool to quickly port parallel programs and also achieve good performance that exceeds that of some commercial tools.

  14. AdiosStMan: Parallelizing Casacore Table Data System using Adaptive IO System

    NASA Astrophysics Data System (ADS)

    Wang, R.; Harris, C.; Wicenec, A.

    2016-07-01

    In this paper, we investigate the Casacore Table Data System (CTDS), used in the casacore and CASA libraries, and methods to parallelize it. CTDS provides a storage manager plugin mechanism for third-party developers to design and implement their own CTDS storage managers. With this in mind, we looked into various storage backend techniques that could enable parallel I/O for CTDS through new storage managers. After carrying out benchmarks showing the excellent parallel I/O throughput of the Adaptive IO System (ADIOS), we implemented an ADIOS-based parallel CTDS storage manager. We then applied the CASA MSTransform frequency-split task to verify the ADIOS storage manager, and ran a series of performance tests to examine the I/O throughput in a massively parallel scenario.

  15. Use of normal propyl bromide solvents for extraction and recovery of asphalt cements

    DOT National Transportation Integrated Search

    2000-11-01

    Four normal propyl bromide (nPB) solvents were evaluated for use as chlorinated solvent replacements in typical hot mix asphalt (HMA) extraction and recovery processes. The experimental design included one method of extraction (centrifuge), one metho...

  16. PCLIPS: Parallel CLIPS

    NASA Technical Reports Server (NTRS)

    Hall, Lawrence O.; Bennett, Bonnie H.; Tello, Ivan

    1994-01-01

    A parallel version of CLIPS 5.1 has been developed to run on Intel Hypercubes. The user interface is the same as that for CLIPS, with some added commands to allow for parallel calls. A complete version of CLIPS runs on each node of the hypercube. The system has been instrumented to display the time spent in the match, recognize, and act cycles on each node. Only rule-level parallelism is supported. Parallel commands enable the assertion and retraction of facts to/from remote nodes' working memory. Parallel CLIPS was used to implement a knowledge-based command, control, communications, and intelligence (C(sup 3)I) system to demonstrate the fusion of high-level, disparate sources. We discuss the nature of the information fusion problem, our approach, and the implementation. Parallel CLIPS has also been used to run several benchmark parallel knowledge bases, such as one to set up a cafeteria. Results from running Parallel CLIPS with parallel knowledge-base partitions indicate that significant speed increases, including superlinear in some cases, are possible.

  17. Nonlinear viscoplasticity in ASPECT: benchmarking and applications to subduction

    NASA Astrophysics Data System (ADS)

    Glerum, Anne; Thieulot, Cedric; Fraters, Menno; Blom, Constantijn; Spakman, Wim

    2018-03-01

    ASPECT (Advanced Solver for Problems in Earth's ConvecTion) is a massively parallel finite element code originally designed for modeling thermal convection in the mantle with a Newtonian rheology. The code is characterized by modern numerical methods, high-performance parallelism and extensibility. This last characteristic is illustrated in this work: we have extended the use of ASPECT from global thermal convection modeling to upper-mantle-scale applications of subduction.

    Subduction modeling generally requires the tracking of multiple materials with different properties and with nonlinear viscous and viscoplastic rheologies. To this end, we implemented a frictional plasticity criterion that is combined with a viscous diffusion and dislocation creep rheology. Because ASPECT uses compositional fields to represent different materials, all material parameters are made dependent on a user-specified number of fields.

    The goal of this paper is primarily to describe and verify our implementations of complex, multi-material rheology by reproducing the results of four well-known two-dimensional benchmarks: the indentor benchmark, the brick experiment, the sandbox experiment and the slab detachment benchmark. Furthermore, we aim to provide hands-on examples for prospective users by demonstrating the use of multi-material viscoplasticity with three-dimensional, thermomechanical models of oceanic subduction, putting ASPECT on the map as a community code for high-resolution, nonlinear rheology subduction modeling.
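
    A common way to combine such rheologies, sketched below in C, is a composite creep viscosity capped by a Drucker-Prager yield criterion. This is a generic formulation with illustrative parameter values, not necessarily ASPECT's exact implementation:

      /* Effective viscosity: harmonic average of diffusion and dislocation
       * creep, capped by the viscosity that enforces the Drucker-Prager
       * yield stress. All inputs below are illustrative assumptions. */
      #include <math.h>
      #include <stdio.h>

      static double eff_viscosity(double eta_diff, double eta_disl,
                                  double cohesion, double phi,
                                  double pressure, double edot_II)
      {
          double eta_creep = 1.0 / (1.0 / eta_diff + 1.0 / eta_disl);
          double tau_y = cohesion * cos(phi) + pressure * sin(phi);
          double eta_plastic = tau_y / (2.0 * edot_II);
          return eta_creep < eta_plastic ? eta_creep : eta_plastic;
      }

      int main(void)
      {
          /* friction angle ~30 deg (0.52 rad), strain rate 1e-14 1/s */
          double eta = eff_viscosity(1e21, 1e20, 20e6, 0.52, 1e9, 1e-14);
          printf("effective viscosity = %.3e Pa s\n", eta);
          return 0;
      }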

  18. Performance and scalability evaluation of "Big Memory" on Blue Gene Linux.

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Yoshii, K.; Iskra, K.; Naik, H.

    2011-05-01

    We address memory performance issues observed in Blue Gene Linux and discuss the design and implementation of 'Big Memory' - an alternative, transparent memory space introduced to eliminate the memory performance issues. We evaluate the performance of Big Memory using custom memory benchmarks, the NAS Parallel Benchmarks, and the Parallel Ocean Program, at a scale of up to 4,096 nodes. We find that Big Memory successfully resolves the performance issues normally encountered in Blue Gene Linux. For the ocean simulation program, we even find that Linux with Big Memory provides better scalability than does the lightweight compute node kernel designed solely for high-performance applications. Originally intended exclusively for compute node tasks, our new memory subsystem dramatically improves the performance of certain I/O node applications as well. We demonstrate this performance using the central processor of the LOw Frequency ARray radio telescope as an example.

  19. ComprehensiveBench: a Benchmark for the Extensive Evaluation of Global Scheduling Algorithms

    NASA Astrophysics Data System (ADS)

    Pilla, Laércio L.; Bozzetti, Tiago C.; Castro, Márcio; Navaux, Philippe O. A.; Méhaut, Jean-François

    2015-10-01

    Parallel applications that present tasks with imbalanced loads or complex communication behavior usually do not exploit the underlying resources of parallel platforms to their full potential. In order to mitigate this issue, global scheduling algorithms are employed. As finding the optimal task distribution is an NP-Hard problem, identifying the most suitable algorithm for a specific scenario and comparing algorithms are not trivial tasks. In this context, this paper presents ComprehensiveBench, a benchmark for global scheduling algorithms that enables the variation of a vast range of parameters that affect performance. ComprehensiveBench can be used to assist in the development and evaluation of new scheduling algorithms, to help choose a specific algorithm for an arbitrary application, to emulate other applications, and to enable statistical tests. We illustrate its use in this paper with an evaluation of Charm++ periodic load balancers that stresses their characteristics.

  20. Implementation of ADI: Schemes on MIMD parallel computers

    NASA Technical Reports Server (NTRS)

    Vanderwijngaart, Rob F.

    1993-01-01

    In order to simulate the effects of the impingement of hot exhaust jets of high-performance aircraft on landing surfaces, a multi-disciplinary computation coupling flow dynamics to heat conduction in the runway needs to be carried out. Such simulations, which are essentially unsteady, require very large computational power in order to be completed within a reasonable time frame of the order of an hour. Such power can be furnished by the latest generation of massively parallel computers. These remove the bottleneck of ever more congested data paths to one or a few highly specialized central processing units (CPUs) by having many off-the-shelf CPUs work independently on their own data, exchanging information only when needed. During the past year the first phase of this project was completed, in which the optimal strategy for mapping an ADI algorithm for the three-dimensional unsteady heat equation to a MIMD parallel computer was identified. This was done by implementing and comparing three different domain decomposition techniques that define the tasks for the CPUs in the parallel machine. These implementations were done for a Cartesian grid and Dirichlet boundary conditions. The most promising technique was then used to implement the heat equation solver on a general curvilinear grid with a suite of nontrivial boundary conditions. Finally, this technique was also used to implement the Scalar Penta-diagonal (SP) benchmark, which was taken from the NAS Parallel Benchmarks report. All implementations were done in the programming language C on the Intel iPSC/860 computer.
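
    The computational core of each ADI sweep is an implicit tridiagonal solve along every grid line, typically done with the Thomas algorithm. A self-contained C sketch of one such line solve for the heat equation (the textbook algorithm, not the paper's iPSC/860 code):

      /* Thomas algorithm: solve a[i]*x[i-1] + b[i]*x[i] + c[i]*x[i+1] = d[i];
       * the solution overwrites d. One implicit heat-equation line is set
       * up in main() with (1+2r) on the diagonal and -r off it. */
      #include <stdio.h>

      #define N 8

      static void thomas(double a[], double b[], double c[], double d[], int n)
      {
          for (int i = 1; i < n; i++) {   /* forward elimination */
              double m = a[i] / b[i - 1];
              b[i] -= m * c[i - 1];
              d[i] -= m * d[i - 1];
          }
          d[n - 1] /= b[n - 1];           /* back substitution */
          for (int i = n - 2; i >= 0; i--)
              d[i] = (d[i] - c[i] * d[i + 1]) / b[i];
      }

      int main(void)
      {
          double r = 0.5, a[N], b[N], c[N], d[N];
          for (int i = 0; i < N; i++) {
              a[i] = -r; b[i] = 1.0 + 2.0 * r; c[i] = -r; d[i] = 1.0;
          }
          thomas(a, b, c, d, N);
          for (int i = 0; i < N; i++) printf("%.4f ", d[i]);
          printf("\n");
          return 0;
      }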

  1. Supercomputing '91; Proceedings of the 4th Annual Conference on High Performance Computing, Albuquerque, NM, Nov. 18-22, 1991

    NASA Technical Reports Server (NTRS)

    1991-01-01

    Various papers on supercomputing are presented. The general topics addressed include: program analysis/data dependence, memory access, distributed memory code generation, numerical algorithms, supercomputer benchmarks, latency tolerance, parallel programming, applications, processor design, networks, performance tools, mapping and scheduling, characterization affecting performance, parallelism packaging, computing climate change, combinatorial algorithms, hardware and software performance issues, system issues. (No individual items are abstracted in this volume)

  2. Comparing the OpenMP, MPI, and Hybrid Programming Paradigm on an SMP Cluster

    NASA Technical Reports Server (NTRS)

    Jost, Gabriele; Jin, Hao-Qiang; anMey, Dieter; Hatay, Ferhat F.

    2003-01-01

    Clusters of SMP (Symmetric Multi-Processors) nodes provide support for a wide range of parallel programming paradigms. The shared address space within each node is suitable for OpenMP parallelization. Message passing can be employed within and across the nodes of a cluster. Multiple levels of parallelism can be achieved by combining message passing and OpenMP parallelization. Which programming paradigm is the best will depend on the nature of the given problem, the hardware components of the cluster, the network, and the available software. In this study we compare the performance of different implementations of the same CFD benchmark application, using the same numerical algorithm but employing different programming paradigms.

  3. Heterogeneous Distributed Computing for Computational Aerosciences

    NASA Technical Reports Server (NTRS)

    Sunderam, Vaidy S.

    1998-01-01

    The research supported under this award focuses on heterogeneous distributed computing for high-performance applications, with particular emphasis on computational aerosciences. The overall goal of this project was to investigate issues in, and develop solutions to, the efficient execution of computational aeroscience codes in heterogeneous concurrent computing environments. In particular, we worked in the context of the PVM[1] system and, subsequent to detailed conversion efforts and performance benchmarking, devised novel techniques to increase the efficacy of heterogeneous networked environments for computational aerosciences. Our work has been based upon the NAS Parallel Benchmark suite, but has also recently expanded in scope to include the NAS I/O benchmarks as specified in the NHT-1 document. In this report we summarize our research accomplishments under the auspices of the grant.

  4. Genetic Parallel Programming: design and implementation.

    PubMed

    Cheang, Sin Man; Leung, Kwong Sak; Lee, Kin Hong

    2006-01-01

    This paper presents a novel Genetic Parallel Programming (GPP) paradigm for evolving parallel programs running on a Multi-Arithmetic-Logic-Unit (Multi-ALU) Processor (MAP). The MAP is a Multiple Instruction-streams, Multiple Data-streams (MIMD), general-purpose register machine that can be implemented on modern Very Large-Scale Integrated Circuits (VLSIs) in order to evaluate genetic programs at high speed. For human programmers, writing parallel programs is more difficult than writing sequential programs. However, experimental results show that GPP evolves parallel programs with less computational effort than that of their sequential counterparts. It creates a new approach to evolving a feasible problem solution in parallel program form and then serializes it into a sequential program if required. The effectiveness and efficiency of GPP are investigated using a suite of 14 well-studied benchmark problems. Experimental results show that GPP speeds up evolution substantially.

  5. Research on computer systems benchmarking

    NASA Technical Reports Server (NTRS)

    Smith, Alan Jay (Principal Investigator)

    1996-01-01

    This grant addresses the topic of research on computer systems benchmarking and is more generally concerned with performance issues in computer systems. This report reviews work in those areas during the period of NASA support under this grant. The bulk of the work performed concerned benchmarking and analysis of CPUs, compilers, caches, and benchmark programs. The first part of this work concerned the issue of benchmark performance prediction: a new approach to benchmarking and machine characterization was reported, using a machine characterizer that measures the performance of a given system in terms of a Fortran abstract machine. Another report focused on analyzing compiler performance; the performance impact of optimization, in the context of our methodology for CPU performance characterization, was based on the abstract machine model. Benchmark programs were analyzed in another paper: a machine-independent model of program execution was developed to characterize both machine performance and program execution, and by merging these machine and program characterizations, execution time can be estimated for arbitrary machine/program combinations. The work was continued into the domain of parallel and vector machines, including the issue of caches in vector processors and multiprocessors. All of the aforementioned accomplishments, as well as smaller ones supported by this grant, are summarized more specifically in this report.

  6. Interfacing Computer Aided Parallelization and Performance Analysis

    NASA Technical Reports Server (NTRS)

    Jost, Gabriele; Jin, Haoqiang; Labarta, Jesus; Gimenez, Judit; Biegel, Bryan A. (Technical Monitor)

    2003-01-01

    When porting sequential applications to parallel computer architectures, the program developer will typically go through several cycles of source code optimization and performance analysis. We have started a project to develop an environment where the user can jointly navigate through program structure and performance data information in order to make efficient optimization decisions. In a prototype implementation we have interfaced the CAPO computer aided parallelization tool with the Paraver performance analysis tool. We describe both tools and their interface and give an example for how the interface helps within the program development cycle of a benchmark code.

  7. Scalability and Portability of Two Parallel Implementations of ADI

    NASA Technical Reports Server (NTRS)

    Phung, Thanh; VanderWijngaart, Rob F.

    1994-01-01

    Two domain decompositions for the implementation of the NAS Scalar Penta-diagonal Parallel Benchmark on MIMD systems are investigated, namely transposition and multi-partitioning. Hardware platforms considered are the Intel iPSC/860 and Paragon XP/S-15, and clusters of SGI workstations on ethernet, communicating through PVM. It is found that the multi-partitioning strategy offers the kind of coarse granularity that allows scaling up to hundreds of processors on a massively parallel machine. Moreover, efficiency is retained when the code is ported verbatim (save message passing syntax) to a PVM environment on a modest size cluster of workstations.

  8. Analysis of the structure and the FT-IR and Raman spectra of 2-(4-nitrophenyl)-4H-3,1-benzoxazin-4-one. Comparisons with the chlorinated and methylated derivatives

    NASA Astrophysics Data System (ADS)

    Castillo, María V.; Rudyk, Roxana A.; Davies, Lilian; Brandán, Silvia Antonia

    2017-07-01

    In this work, the structural, topological and vibrational properties of the monomer and three dimers of the 2-(4-nitrophenyl)-4H-3,1-benzoxazin-4-one (NPB) derivative were studied by combining the experimental FTIR and FT-Raman spectra in the solid phase with DFT calculations. Natural Bond Orbital (NBO), Atoms in Molecules (AIM), and HOMO-LUMO calculations were performed using the hybrid B3LYP/6-31G* and B3LYP/6-311++G** methods in order to compute those properties and to predict the reactivities. The comparisons with the properties reported for the chlorinated (Cl-PB) and methylated (CH3-PB) derivatives at the same levels of theory can be clearly explained by the activating (CH3) and deactivating (NO2 and Cl) characteristics of the different groups linked to the oxazin rings. The NBO and AIM studies evidence the following stability order: Cl-PB > NO2-PB > CH3-PB, in very good concordance with the f(νC23-X26) force constant values. The frontier orbital analyses reveal that the Cl-PB and NO2-PB derivatives have good stabilities and high chemical hardness, while CH3-PB has a higher chemical reactivity. In addition, the complete vibrational assignments for the monomer and dimer species of NPB are presented. The presence of the IR bands at 1574 and 1037 cm-1 and of the Raman bands at 1571 and 1038 cm-1 clearly supports the presence of the different dimeric species proposed for NPB.

  9. Analysis of the Electrical Properties of an Electron Injection Layer in Alq3-Based Organic Light Emitting Diodes.

    PubMed

    Kim, Soonkon; Choi, Pyungho; Kim, Sangsub; Park, Hyoungsun; Baek, Dohyun; Kim, Sangsoo; Choi, Byoungdeog

    2016-05-01

    We investigated the carrier transfer and luminescence characteristics of organic light emitting diodes (OLEDs) with the structures ITO/HAT-CN/NPB/Alq3/Al, ITO/HAT-CN/NPB/Alq3/Liq/Al, and ITO/HAT-CN/NPB/Alq3/LiF/Al. The performance of the OLED device is improved by inserting an electron injection layer (EIL), which lowers the electron injection barrier. We also investigated the electrical transport behavior of p-Si/Alq3/Al, p-Si/Alq3/Liq/Al, and p-Si/Alq3/LiF/Al Schottky diodes using current-voltage (I-V) and capacitance-voltage (C-V) characterization methods. The diode quality factor n and barrier height φ(b) depended on the interlayer material between Alq3 and Al. The barrier heights φ(b) were 0.59, 0.49, and 0.45 eV, respectively, and the diode quality factors n were 1.34, 1.31, and 1.30, respectively, obtained from the I-V characteristics. The built-in potentials V(bi) were 0.41, 0.42, and 0.42 eV, respectively, obtained from the C-V characteristics. In this experiment, the Liq and LiF thin-film layers improved the carrier transport behavior by increasing electron injection from Al to Alq3, and the LiF Schottky diode showed better I-V performance than the Liq Schottky diode. We confirmed that a Liq or LiF thin-film interlayer governs electron and hole transport at the Al/Alq3 interface and plays an important role in determining the electrical properties of OLED devices.
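
    Barrier heights of this kind are conventionally extracted from the thermionic-emission saturation current via φ(b) = (kT/q)·ln(A·A*·T²/I₀). The C sketch below uses assumed values for the diode area, Richardson constant, and saturation current, none of which are given in the abstract:

      /* Thermionic-emission barrier-height extraction. A, A_star, and I0
       * are illustrative assumptions, not values from the paper. */
      #include <math.h>
      #include <stdio.h>

      int main(void)
      {
          double k = 1.380649e-23, q = 1.602176634e-19;  /* J/K, C */
          double T = 300.0;       /* K */
          double A = 1e-6;        /* assumed diode area, m^2 */
          double A_star = 3.2e5;  /* Richardson const for p-Si, A m^-2 K^-2 */
          double I0 = 1e-7;       /* assumed extrapolated saturation current, A */
          double phi_b = (k * T / q) * log(A * A_star * T * T / I0);
          printf("phi_b = %.2f eV\n", phi_b);
          return 0;
      }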

  10. Highly efficient organic electroluminescent diodes realized by efficient charge balance with optimized electron and hole transport layers

    NASA Astrophysics Data System (ADS)

    Khan, M. A.; Xu, Wei; Wei, Fuxiang; Bai, Yu; Jiang, X. Y.; Zhang, Z. L.; Zhu, W. Q.

    2007-11-01

    Highly efficient organic electroluminescent devices (OLEDs) were developed based on 4,7-diphenyl-1,10-phenanthroline (BPhen) as the electron transport layer (ETL), tris-(8-hydroxyquinoline) aluminum (Alq3) as the emission layer (EML) and N,N'-bis(1-naphthyl)-N,N'-diphenyl-1,1'-biphenyl-4,4'-diamine (NPB) as the hole transport layer (HTL). The typical device structure was glass substrate/ITO/NPB/Alq3/BPhen/LiF/Al. Since BPhen possesses a considerably high electron mobility of 5×10^-4 cm^2 V^-1 s^-1, devices with BPhen as the ETL can realize an extremely high luminous efficiency. By optimizing the thickness of both the HTL and the ETL, we obtained a highly efficient OLED with a current efficiency of 6.80 cd/A and a luminance of 1361 cd/m2 at a current density of 20 mA/cm2. This dramatic improvement in the current efficiency is explained by the principle of charge balance.
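
    The quoted current efficiency can be checked directly from the reported operating point, since cd/A is luminance divided by current density once both are in SI units; a two-line check:

      # Unit check of the reported current efficiency.
      luminance = 1361.0                 # cd/m^2 (from the abstract)
      current_density = 20e-3 / 1e-4     # 20 mA/cm^2 -> 200 A/m^2
      print(luminance / current_density) # -> 6.805 cd/A, matching the quoted 6.80 cd/A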

  11. Stoichiometric and Oxygen-Deficient VO2 as Versatile Hole Injection Electrode for Organic Semiconductors.

    PubMed

    Fu, Keke; Wang, Rongbin; Katase, Takayoshi; Ohta, Hiromichi; Koch, Norbert; Duhm, Steffen

    2018-03-28

    Using photoemission spectroscopy, we show that the surface electronic structure of VO2 is determined by the temperature-dependent metal-insulator phase transition and the density of oxygen vacancies, which depends on the temperature and ultrahigh vacuum (UHV) conditions. The atomically clean and stoichiometric VO2 surface is insulating at room temperature and features an ultrahigh work function of up to 6.7 eV. Heating in UHV just above the phase transition temperature induces the expected metallic phase, which goes hand in hand with the formation of oxygen defects (up to 6% in this study), but a high work function >6 eV is maintained. To demonstrate the suitability of VO2 as a hole injection contact for organic semiconductors, we investigated the energy-level alignment with the prototypical organic hole transport material N,N'-di(1-naphthyl)-N,N'-diphenyl-(1,1'-biphenyl)-4,4'-diamine (NPB). Evidence for strong Fermi-level pinning and the associated energy-level bending in NPB is found, rendering an Ohmic contact for holes.

  12. Dual beam organic depth profiling using large argon cluster ion beams

    PubMed Central

    Holzweber, M; Shard, AG; Jungnickel, H; Luch, A; Unger, WES

    2014-01-01

    Argon cluster sputtering of an organic multilayer reference material consisting of two organic components, 4,4′-bis[N-(1-naphthyl-1-)-N-phenyl-amino]-biphenyl (NPB) and aluminium tris-(8-hydroxyquinolate) (Alq3), materials commonly used in the organic light-emitting diode industry, was carried out using time-of-flight SIMS in dual beam mode. The sample used in this study consists of a ∼400-nm-thick NPB matrix with 3-nm marker layers of Alq3 at depths of ∼50, 100, 200 and 300 nm. Argon cluster sputtering provides a constant sputter yield throughout the depth profiles, and the sputter yield volumes and depth resolution are presented for Ar-cluster sizes of 630, 820, 1000, 1250 and 1660 atoms at a kinetic energy of 2.5 keV. The effect of cluster size in this material and over this range is shown to be negligible. © 2014 The Authors. Surface and Interface Analysis published by John Wiley & Sons Ltd. PMID:25892830

  13. Improved hole-injection and power efficiency of organic light-emitting diodes using an ultrathin cerium fluoride buffer layer

    NASA Astrophysics Data System (ADS)

    Lu, Hsin-Wei; Kao, Po-Ching; Chu, Sheng-Yuan

    2016-09-01

    In this study, the efficiency of organic light-emitting diodes (OLEDs) was enhanced by depositing a CeF3 film as an ultra-thin buffer layer between the ITO and NPB hole transport layer, with the structure configuration ITO/CeF3 (1 nm)/NPB (40 nm)/Alq3 (60 nm)/LiF (1 nm)/Al (150 nm). The enhancement mechanism was systematically investigated via several approaches. The work function increased from 4.8 eV (standard ITO electrode) to 5.2 eV (1-nm-thick UV-ozone treated CeF3 film deposited on the ITO electrode). The turn-on voltage decreased from 4.2 V to 4.0 V at 1 mA/cm2, the luminance increased from 7588 cd/m2 to 10820 cd/m2, and the current efficiency increased from 3.2 cd/A to 3.5 cd/A when the 1-nm-thick UV-ozone treated CeF3 film was inserted into the OLEDs.

  14. Rhizophagus irregularis MUCL 41833 transitorily reduces tomato bacterial wilt incidence caused by Ralstonia solanacearum under in vitro conditions.

    PubMed

    Chave, Marie; Crozilhac, Patrice; Deberdt, Péninna; Plouznikoff, Katia; Declerck, Stéphane

    2017-10-01

    Bacterial wilt caused by Ralstonia solanacearum is one of the world's most important soil-borne plant diseases. In Martinique, French West Indies, a highly virulent new pathogenic variant of this bacterium (phylotype IIB/4NPB) severely impacts tomato production. Here we report on the effect of R. solanacearum CFBP 6783, classified in phylotype IIB/4NPB, on tomato plantlets grown under strict in vitro culture conditions in the presence or absence of the arbuscular mycorrhizal fungus Rhizophagus irregularis MUCL 41833. A mycelium donor plant (i.e. Crotalaria spectabilis) was used for rapid, uniform mycorrhization of tomato plantlets that were subsequently infected by the bacterium. Bacterial wilt was significantly delayed and the incidence of the disease consequently reduced in the mycorrhizal tomato plantlets. Conversely, R. solanacearum did not affect root colonization by the AMF within the 16 days of the experiment. These results suggest that the mycorrhizal fungus was able to reduce bacterial wilt symptoms, probably by eliciting defence mechanisms in the plant.

  15. High-performance tandem organic light-emitting diodes based on a buffer-modified p/n-type planar organic heterojunction as charge generation layer

    NASA Astrophysics Data System (ADS)

    Wu, Yukun; Sun, Ying; Qin, Houyun; Hu, Shoucheng; Wu, Qingyang; Zhao, Yi

    2017-04-01

    High-performance tandem organic light-emitting diodes (TOLEDs) were realized using a buffer-modified p/n-type planar organic heterojunction (OHJ) as the charge generation layer (CGL), consisting of common organic materials; the configuration of this p/n-type CGL was "LiF/N,N'-diphenyl-N,N'-bis(1-naphthyl)-1,1'-biphenyl-4,4'-diamine (NPB)/4,7-diphenyl-1,10-phenanthroline (Bphen)/molybdenum oxide (MoOx)". The optimized TOLED exhibited a maximum current efficiency of 77.6 cd/A without any out-coupling techniques, and the efficiency roll-off was greatly improved compared to the single-unit OLED. The working mechanism of the p/n-type CGL is discussed in detail. It is found that the NPB/Bphen heterojunction generated enough charges under a forward applied voltage and that the carrier extraction was a tunneling process. These results could provide a new method to fabricate high-performance TOLEDs.

  16. Experiences using OpenMP based on Compiler Directed Software DSM on a PC Cluster

    NASA Technical Reports Server (NTRS)

    Hess, Matthias; Jost, Gabriele; Mueller, Matthias; Ruehle, Roland

    2003-01-01

    In this work we report on our experiences running OpenMP programs on a commodity cluster of PCs running a software distributed shared memory (DSM) system. We describe our test environment and report on the performance of a subset of the NAS Parallel Benchmarks that have been automatically parallelized for OpenMP. We compare the performance of the OpenMP implementations with that of their message passing counterparts and discuss performance differences.

  17. Parareal in time 3D numerical solver for the LWR Benchmark neutron diffusion transient model

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Baudron, Anne-Marie, E-mail: anne-marie.baudron@cea.fr; CEA-DRN/DMT/SERMA, CEN-Saclay, 91191 Gif sur Yvette Cedex; Lautard, Jean-Jacques, E-mail: jean-jacques.lautard@cea.fr

    2014-12-15

    In this paper we present a time-parallel algorithm for the 3D neutron calculation of a transient model in a nuclear reactor core. The neutron calculation consists of numerically solving the time-dependent diffusion approximation equation, which is a simplified transport equation. The numerical resolution is done with a finite element method based on a tetrahedral meshing of the computational domain, representing the reactor core, and time discretization is achieved using a θ-scheme. The transient model features moving control rods during the time of the reaction. Therefore, cross-sections (piecewise constants) are taken into account by interpolations with respect to the velocity of the control rods. Parallelism across time is achieved by applying the parareal-in-time algorithm to the problem at hand. This parallel method is a predictor-corrector scheme that iteratively combines two kinds of numerical propagators, one coarse and one fine. Our method is made efficient by means of a coarse solver defined with a large time step and a fixed-position control rod model, while the fine propagator is a high-order numerical approximation of the full model. The parallel implementation of our method provides good scalability of the algorithm. Numerical results show the efficiency of the parareal method on a large light water reactor transient model corresponding to the Langenbuch–Maurer–Werner benchmark.
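
    A minimal serial sketch of the parareal predictor-corrector iteration, on a toy decay ODE rather than the reactor model (coarse propagator G = one Euler step per window, fine propagator F = many small steps; in a production code the F evaluations of each iteration run in parallel across the time windows):

      import numpy as np

      lam, T, N = 2.0, 1.0, 10           # decay rate, horizon, time windows
      dT = T / N

      def G(u):                          # coarse propagator: one Euler step
          return u * (1.0 - lam * dT)

      def F(u, m=100):                   # fine propagator: m small Euler steps
          dt = dT / m
          for _ in range(m):
              u = u * (1.0 - lam * dt)
          return u

      U = np.empty(N + 1); U[0] = 1.0
      for n in range(N):                 # iteration 0: pure coarse prediction
          U[n + 1] = G(U[n])

      for k in range(5):                 # parareal corrections
          Fu = [F(U[n]) for n in range(N)]      # parallelizable across windows
          Gu_old = [G(U[n]) for n in range(N)]  # coarse values of old iterate
          for n in range(N):             # sequential coarse correction sweep
              U[n + 1] = G(U[n]) + Fu[n] - Gu_old[n]

      print(U[-1], np.exp(-lam * T))     # converges toward the fine/exact solution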

  18. SU-E-T-466: Implementation of An Extension Module for Dose Response Models in the TOPAS Monte Carlo Toolkit

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Ramos-Mendez, J; Faddegon, B; Perl, J

    2015-06-15

    Purpose: To develop and verify an extension to TOPAS for calculation of dose response models (TCP/NTCP). TOPAS wraps and extends Geant4. Methods: The TOPAS DICOM interface was extended to include structure contours, for subsequent calculation of DVHs and TCP/NTCP. The following dose response models were implemented: Lyman-Kutcher-Burman (LKB), critical element (CE), population-based critical volume (CV), parallel-serial, a sigmoid-based model of Niemierko for NTCP and TCP, and a Poisson-based model for TCP. For verification, results for the parallel-serial and Poisson models, with 6 MV x-ray dose distributions calculated with TOPAS and Pinnacle v9.2, were compared to data from the benchmark configuration of the AAPM Task Group 166 (TG166). We provide a benchmark configuration suitable for proton therapy along with results for the implementation of the Niemierko, CV and CE models. Results: The maximum difference in DVH calculated with Pinnacle and TOPAS was 2%. Differences between TG166 data and Monte Carlo calculations of up to 4.2%±6.1% were found for the parallel-serial model and up to 1.0%±0.7% for the Poisson model (including the uncertainty due to lack of knowledge of the point spacing in TG166). For the CE, CV and Niemierko models, the discrepancies between the Pinnacle and TOPAS results are 74.5%, 34.8% and 52.1% when using 29.7 cGy point spacing, the differences being highly sensitive to dose spacing. On the other hand, with our proposed benchmark configuration, the largest differences were 12.05%±0.38%, 3.74%±1.6%, 1.57%±4.9% and 1.97%±4.6% for the CE, CV, Niemierko and LKB models, respectively. Conclusion: Several dose response models were successfully implemented in the extension module. Reference data were calculated for future benchmarking. Dose response calculated for the different models varied much more widely for the TG166 benchmark than for the proposed benchmark, which had much lower sensitivity to the choice of DVH dose points. This work was supported by National Cancer Institute Grant R01CA140735.
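
    For orientation, a small sketch of two of the named models evaluated on a differential DVH (the parameter values and DVH below are hypothetical, not TG166 data): the LKB NTCP via the generalized EUD, and a Poisson-statistics TCP:

      import math

      dose = [60.0, 66.0, 70.0, 76.0]      # Gy, DVH dose bins (illustrative)
      vol  = [0.10, 0.20, 0.40, 0.30]      # fractional volume per bin, sums to 1

      def lkb_ntcp(dose, vol, TD50=65.0, m=0.18, n=0.5):
          # Generalized EUD, then NTCP as the normal CDF of (EUD - TD50)/(m*TD50).
          eud = sum(v * d ** (1.0 / n) for d, v in zip(dose, vol)) ** n
          t = (eud - TD50) / (m * TD50)
          return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

      def poisson_tcp(dose, vol, N0=1e7, alpha=0.3):
          # Clonogen survival exp(-alpha*D) per bin; product of per-subvolume
          # Poisson control probabilities.
          logp = sum(-N0 * v * math.exp(-alpha * d) for d, v in zip(dose, vol))
          return math.exp(logp)

      print(lkb_ntcp(dose, vol), poisson_tcp(dose, vol))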

  19. Accelerating finite-rate chemical kinetics with coprocessors: Comparing vectorization methods on GPUs, MICs, and CPUs

    NASA Astrophysics Data System (ADS)

    Stone, Christopher P.; Alferman, Andrew T.; Niemeyer, Kyle E.

    2018-05-01

    Accurate and efficient methods for solving stiff ordinary differential equations (ODEs) are a critical component of turbulent combustion simulations with finite-rate chemistry. The ODEs governing the chemical kinetics at each mesh point are decoupled by operator-splitting allowing each to be solved concurrently. An efficient ODE solver must then take into account the available thread and instruction-level parallelism of the underlying hardware, especially on many-core coprocessors, as well as the numerical efficiency. A stiff Rosenbrock and a nonstiff Runge-Kutta ODE solver are both implemented using the single instruction, multiple thread (SIMT) and single instruction, multiple data (SIMD) paradigms within OpenCL. Both methods solve multiple ODEs concurrently within the same instruction stream. The performance of these parallel implementations was measured on three chemical kinetic models of increasing size across several multicore and many-core platforms. Two separate benchmarks were conducted to clearly determine any performance advantage offered by either method. The first benchmark measured the run-time of evaluating the right-hand-side source terms in parallel and the second benchmark integrated a series of constant-pressure, homogeneous reactors using the Rosenbrock and Runge-Kutta solvers. The right-hand-side evaluations with SIMD parallelism on the host multicore Xeon CPU and many-core Xeon Phi co-processor performed approximately three times faster than the baseline multithreaded C++ code. The SIMT parallel model on the host and Phi was 13%-35% slower than the baseline while the SIMT model on the NVIDIA Kepler GPU provided approximately the same performance as the SIMD model on the Phi. The runtimes for both ODE solvers decreased significantly with the SIMD implementations on the host CPU (2.5-2.7 ×) and Xeon Phi coprocessor (4.7-4.9 ×) compared to the baseline parallel code. The SIMT implementations on the GPU ran 1.5-1.6 times faster than the baseline multithreaded CPU code; however, this was significantly slower than the SIMD versions on the host CPU or the Xeon Phi. The performance difference between the three platforms was attributed to thread divergence caused by the adaptive step-sizes within the ODE integrators. Analysis showed that the wider vector width of the GPU incurs a higher level of divergence than the narrower Sandy Bridge or Xeon Phi. The significant performance improvement provided by the SIMD parallel strategy motivates further research into more ODE solver methods that are both SIMD-friendly and computationally efficient.
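
    The SIMD-style batching at the heart of these results can be illustrated with a toy sketch (ours, not the paper's OpenCL code): many independent "reactors" advance in lockstep through one vectorized RK4 instruction stream. Note that this lockstep is exactly what adaptive step-size control, the source of the divergence discussed above, would break:

      import numpy as np

      def rk4_step(f, u, dt):
          # One classical RK4 step applied to the whole batch at once.
          k1 = f(u)
          k2 = f(u + 0.5 * dt * k1)
          k3 = f(u + 0.5 * dt * k2)
          k4 = f(u + dt * k3)
          return u + (dt / 6.0) * (k1 + 2*k2 + 2*k3 + k4)

      rates = np.linspace(0.5, 5.0, 1024)       # toy kinetics: one rate per reactor
      u = np.ones_like(rates)                   # initial condition for every reactor
      f = lambda u: -rates * u                  # RHS evaluated for the whole batch

      dt = 1e-2
      for _ in range(100):                      # all reactors advance in lockstep
          u = rk4_step(f, u, dt)
      print(np.max(np.abs(u - np.exp(-rates)))) # compare to exact solution at t = 1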

  20. FX-87 performance measurements: data-flow implementation. Technical report

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Hammel, R.T.; Gifford, D.K.

    1988-11-01

    This report documents a series of experiments performed to explore the thesis that the FX-87 effect system permits a compiler to schedule imperative programs (i.e., programs that may contain side-effects) for execution on a parallel computer. The authors analyze how much the FX-87 static effect system can improve the execution times of five benchmark programs on a parallel graph interpreter. Three of the benchmark programs do not use side-effects (factorial, fibonacci, and polynomial division) and thus did not have any effect-induced constraints. Their FX-87 performance was comparable to their performance in a purely functional language. Two of the benchmark programs use side effects (DNA sequence matching and Scheme interpretation), and the compiler was able to use effect information to reduce their execution times by factors of 1.7 to 5.4 when compared with sequential execution times. These results support the thesis that a static effect system is a powerful tool for compilation to multiprocessor computers. However, the graph interpreter used was based on unrealistic assumptions, and thus the results may not accurately reflect the performance of a practical FX-87 implementation. The results also suggest that conventional loop analysis would complement the FX-87 effect system.

  1. I/O-Efficient Scientific Computation Using TPIE

    NASA Technical Reports Server (NTRS)

    Vengroff, Darren Erik; Vitter, Jeffrey Scott

    1996-01-01

    In recent years, input/output (I/O)-efficient algorithms for a wide variety of problems have appeared in the literature. However, systems specifically designed to assist programmers in implementing such algorithms have remained scarce. TPIE is a system designed to support I/O-efficient paradigms for problems from a variety of domains, including computational geometry, graph algorithms, and scientific computation. The TPIE interface frees programmers from having to deal not only with explicit read and write calls, but also with the complex memory management that must be performed for I/O-efficient computation. In this paper we discuss applications of TPIE to problems in scientific computation. We discuss algorithmic issues underlying the design and implementation of the relevant components of TPIE and present performance results of programs written to solve a series of benchmark problems using our current TPIE prototype. Some of the benchmarks we present are based on the NAS parallel benchmarks while others are of our own creation. We demonstrate that the central processing unit (CPU) overhead required to manage I/O is small and that even with just a single disk, the I/O overhead of I/O-efficient computation ranges from negligible to the same order of magnitude as CPU time. We conjecture that if we use a number of disks in parallel this overhead can be all but eliminated.

  2. Use of particle beams for lunar prospecting

    NASA Technical Reports Server (NTRS)

    Toepfer, A. J.; Eppler, D.; Friedlander, A.; Weitz, R.

    1993-01-01

    A key issue in choosing the appropriate site for a manned lunar base is the availability of resources, particularly oxygen and hydrogen for the production of water, and ores for the production of fuels and building materials. NASA has proposed two Lunar Scout missions that would orbit the Moon and use, among other instruments, a hard X-ray spectrometer, a neutron spectrometer, and a Ge gamma ray spectrometer to map the lunar surface. This passive instrumentation will have low resolution (tens of kilometers) due to the low signal levels produced by natural radioactivity and the interaction of cosmic rays and the solar wind with the lunar surface. This paper presents the results of a concept definition effort for a neutral particle beam lunar mapper probe. The idea of using particle beam probes to survey asteroids was first proposed by Sagdeev et al., and an ion beam device was fielded on the 1988 Soviet probe to the Mars moon Phobos. During the past five years, significant advances in the technology of neutral particle beams (NPB) have led to a suborbital flight of a neutral hydrogen beam device in the SDIO-sponsored BEAR experiment. An orbital experiment, the Neutral Particle Beam Far Field Optics Experiment (NPB-FOX) is presently in the preliminary design phase. The development of NPB accelerators that are space-operable leads one to consider the utility of these devices for probing the surface of the Moon using gamma ray, X-ray, and optical/UV spectroscopy to locate various elements and compounds. We consider the utility of the NPB-FOX satellite containing a 5-MeV particle beam accelerator as a probe in lunar orbit. Irradiation of the lunar surface by the particle beam will induce secondary and back scattered radiation from the lunar surface to be detected by a sensor that may be co-orbital with or on the particle beam satellite platform, or may be in a separate orbit. The secondary radiation is characteristic of the make-up of the lunar surface. The size of the spot irradiated by the beam is less than 1 km wide along the ground track of the satellite, resulting in the potential for high resolution. The fact that the probe could be placed in polar orbit would result in global coverage of the lunar surface. The orbital particle beam probe could provide the basis for selection of sites for more detailed prospecting by surface rovers.

  3. Parallelization of PANDA discrete ordinates code using spatial decomposition

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Humbert, P.

    2006-07-01

    We present the parallel method, based on spatial domain decomposition, implemented in the 2D and 3D versions of the discrete ordinates code PANDA. The spatial mesh is orthogonal and the spatial domain decomposition is Cartesian. For 3D problems a 3D Cartesian domain topology is created and the parallel method is based on a domain diagonal-plane ordered sweep algorithm. The parallel efficiency of the method is improved by direction and octant pipelining. The implementation of the algorithm is straightforward using MPI blocking point-to-point communications. The efficiency of the method is illustrated by an application to the 3D-Ext C5G7 benchmark of the OECD/NEA.
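
    A toy sketch of the pipelined-sweep pattern with blocking point-to-point MPI, here a 1D chain of subdomains in mpi4py rather than PANDA's 3D diagonal-plane ordering (the model equation, file name, and all constants are illustrative):

      # Run with, e.g.: mpiexec -n 4 python sweep.py
      from mpi4py import MPI
      import numpy as np

      comm = MPI.COMM_WORLD
      rank, size = comm.Get_rank(), comm.Get_size()

      ncells = 8
      sigma, dx, source = 0.5, 0.1, 1.0      # absorption, cell width, source term
      inflow = np.zeros(1)

      if rank > 0:                           # blocking receive from upstream rank
          comm.Recv(inflow, source=rank - 1)

      psi = inflow[0]
      for _ in range(ncells):                # simple upwind sweep of local cells
          psi = (psi + dx * source) / (1.0 + sigma * dx)

      if rank < size - 1:                    # forward the outflow downstream
          comm.Send(np.array([psi]), dest=rank + 1)
      else:
          print("outflow at domain boundary:", psi)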

  4. PCTDSE: A parallel Cartesian-grid-based TDSE solver for modeling laser-atom interactions

    NASA Astrophysics Data System (ADS)

    Fu, Yongsheng; Zeng, Jiaolong; Yuan, Jianmin

    2017-01-01

    We present a parallel Cartesian-grid-based time-dependent Schrödinger equation (TDSE) solver for modeling laser-atom interactions. It can simulate the single-electron dynamics of atoms in arbitrary time-dependent vector potentials. We use a split-operator method combined with fast Fourier transforms (FFT) on a three-dimensional (3D) Cartesian grid. Parallelization is realized using a 2D decomposition strategy based on the Message Passing Interface (MPI) library, which results in good parallel scaling on modern supercomputers. We present simple applications to the hydrogen atom using benchmark problems from the literature and obtain reproducible results. Extensions to other laser-atom systems are straightforward, requiring only minimal modifications of the source code.
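
    The split-operator/FFT combination the solver is built on can be sketched in one dimension (a generic Strang-splitting step in Python/NumPy; the real code is 3D, MPI-parallel, and handles arbitrary vector potentials):

      import numpy as np

      N, L, dt = 512, 40.0, 0.01
      x = np.linspace(-L / 2, L / 2, N, endpoint=False)
      k = 2.0 * np.pi * np.fft.fftfreq(N, d=L / N)        # momentum grid

      V = 0.5 * x**2                                       # harmonic potential (example)
      psi = np.exp(-0.5 * (x - 1.0) ** 2).astype(complex)  # displaced Gaussian
      psi /= np.sqrt(np.sum(np.abs(psi) ** 2) * (L / N))   # normalize

      expV = np.exp(-0.5j * V * dt)                        # half-step potential factor
      expK = np.exp(-0.5j * k**2 * dt)                     # kinetic factor (hbar = m = 1)

      for _ in range(1000):
          psi = expV * psi                                 # half-step in V
          psi = np.fft.ifft(expK * np.fft.fft(psi))        # full kinetic step in k-space
          psi = expV * psi                                 # second half-step in V
      print("norm:", np.sum(np.abs(psi) ** 2) * (L / N))   # unitary: stays ~1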

  5. A high performance linear equation solver on the VPP500 parallel supercomputer

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Nakanishi, Makoto; Ina, Hiroshi; Miura, Kenichi

    1994-12-31

    This paper describes the implementation of two high performance linear equation solvers developed for the Fujitsu VPP500, a distributed memory parallel supercomputer system. The solvers take advantage of the key architectural features of the VPP500: (1) scalability for an arbitrary number of processors up to 222 processors, (2) flexible data transfer among processors provided by a crossbar interconnection network, (3) vector processing capability on each processor, and (4) overlapped computation and transfer. The general linear equation solver based on the blocked LU decomposition method achieves 120.0 GFLOPS performance with 100 processors in the LINPACK Highly Parallel Computing benchmark.
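
    As a rough sketch of the blocked LU scheme such solvers build on (a textbook right-looking variant without pivoting, not Fujitsu's implementation): each step factors a narrow panel, triangular-solves a block row, then performs one large matrix-matrix update, which is the part that vectorizes and parallelizes well:

      import numpy as np

      def blocked_lu(A, b=64):
          # In-place blocked LU without pivoting; assumes A is safely factorizable
          # (e.g. diagonally dominant). L is unit lower triangular, U upper.
          n = A.shape[0]
          for k in range(0, n, b):
              e = min(k + b, n)
              for j in range(k, e):            # unblocked panel factorization
                  A[j + 1:, j] /= A[j, j]
                  A[j + 1:, j + 1:e] -= np.outer(A[j + 1:, j], A[j, j + 1:e])
              Lkk = np.tril(A[k:e, k:e], -1) + np.eye(e - k)
              A[k:e, e:] = np.linalg.solve(Lkk, A[k:e, e:])   # block row of U
              A[e:, e:] -= A[e:, k:e] @ A[k:e, e:]            # trailing update (GEMM)
          return A

      n = 256
      M = np.random.rand(n, n) + n * np.eye(n)   # diagonally dominant test matrix
      A = blocked_lu(M.copy(), b=32)
      L, U = np.tril(A, -1) + np.eye(n), np.triu(A)
      print(np.allclose(L @ U, M))               # -> True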

  6. Lightweight Specifications for Parallel Correctness

    DTIC Science & Technology

    2012-12-05

    [Fragmented excerpt; the recoverable text names contributors and refers to Chapter 5, "Specifying and Checking Semantic Atomicity", which discusses violating executions and lists benchmarks by approximate lines of code.]

  7. Hierarchically Parallelized Constrained Nonlinear Solvers with Automated Substructuring

    NASA Technical Reports Server (NTRS)

    Padovan, Joe; Kwang, Abel

    1994-01-01

    This paper develops a parallelizable multilevel multiple-constrained nonlinear equation solver. The substructuring process is automated to yield appropriately balanced partitioning of each succeeding level. Due to the generality of the procedure, sequential as well as partially and fully parallel environments can be handled. This includes both single and multiprocessor assignment per individual partition. Several benchmark examples are presented. These illustrate the robustness of the procedure as well as its capability to yield significant reductions in memory utilization and calculational effort, due both to updating and inversion.

  8. Experiences Using OpenMP Based on Compiler Directed Software DSM on a PC Cluster

    NASA Technical Reports Server (NTRS)

    Hess, Matthias; Jost, Gabriele; Mueller, Matthias; Ruehle, Roland; Biegel, Bryan (Technical Monitor)

    2002-01-01

    In this work we report on our experiences running OpenMP (shared memory programming) programs on a commodity cluster of PCs (personal computers) running a software distributed shared memory (DSM) system. We describe our test environment and report on the performance of a subset of the NAS (NASA Advanced Supercomputing) Parallel Benchmarks that have been automatically parallelized for OpenMP. We compare the performance of the OpenMP implementations with that of their message passing counterparts and discuss performance differences.

  9. Interactive visual optimization and analysis for RFID benchmarking.

    PubMed

    Wu, Yingcai; Chung, Ka-Kei; Qu, Huamin; Yuan, Xiaoru; Cheung, S C

    2009-01-01

    Radio frequency identification (RFID) is a powerful automatic remote identification technique that has wide applications. To facilitate RFID deployment, an RFID benchmarking instrument called aGate has been invented to identify the strengths and weaknesses of different RFID technologies in various environments. However, the data acquired by aGate are usually complex, time-varying, multidimensional 3D volumetric data, which are extremely challenging for engineers to analyze. In this paper, we introduce a set of visualization techniques, namely, parallel coordinate plots, orientation plots, a visual history mechanism, and a 3D spatial viewer, to help RFID engineers analyze benchmark data visually and intuitively. With the techniques, we further introduce two workflow procedures (a visual optimization procedure for finding the optimum reader antenna configuration and a visual analysis procedure for comparing the performance and identifying the flaws of RFID devices) for RFID benchmarking, with focus on the performance analysis of the aGate system. The usefulness and usability of the system are demonstrated in the user evaluation.

  10. Evaluation of contributions to seasonal reproductive inefficiency; NPB project #14-052

    USDA-ARS's Scientific Manuscript database

    The objective of the current study was to evaluate quality of semen collected from June (spring), August (summer), or January (winter) and either stored and used as cooled-extended (ExT) or cryopreserved (FrZ) for breeding gilts in summer (August) or winter (January). Semen quality evaluation includ...

  11. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Cohen, J; Dossa, D; Gokhale, M

    Critical data science applications requiring frequent access to storage perform poorly on today's computing architectures. This project addresses efficient computation of data-intensive problems in national security and basic science by exploring, advancing, and applying a new form of computing called storage-intensive supercomputing (SISC). Our goal is to enable applications that simply cannot run on current systems, and, for a broad range of data-intensive problems, to deliver an order of magnitude improvement in price/performance over today's data-intensive architectures. This technical report documents much of the work done under LDRD 07-ERD-063, Storage Intensive Supercomputing, during the period 05/07-09/07. The following chapters describe: (1) a new file I/O monitoring tool, iotrace, developed to capture the dynamic I/O profiles of Linux processes; (2) an out-of-core graph benchmark for level-set expansion of scale-free graphs; (3) an entity extraction benchmark consisting of a pipeline of eight components; and (4) an image resampling benchmark drawn from the SWarp program in the LSST data processing pipeline. The performance of the graph and entity extraction benchmarks was measured in three different scenarios: data sets residing on the NFS file server and accessed over the network; data sets stored on local disk; and data sets stored on the Fusion I/O parallel NAND Flash array. The image resampling benchmark compared software-only performance to GPU-accelerated performance. In addition to the work reported here, an additional text processing application was developed that used an FPGA to accelerate n-gram profiling for language classification. The n-gram application will be presented at SC07 at the High Performance Reconfigurable Computing Technologies and Applications Workshop. The graph and entity extraction benchmarks were run on a Supermicro server housing the NAND Flash 40 GB parallel disk array, the Fusion-io. The Fusion system specs are as follows: SuperMicro X7DBE Xeon dual-socket Blackford server motherboard; 2 Intel Xeon dual-core 2.66 GHz processors; 1 GB DDR2 PC2-5300 RAM (2 x 512); 80 GB hard drive (Seagate SATA II Barracuda). The Fusion board is presently capable of 4X in a PCIe slot. The image resampling benchmark was run on a dual Xeon workstation with an NVIDIA graphics card (see Chapter 5 for full specification). An XtremeData Opteron+FPGA was used for the language classification application. We observed that these benchmarks are not uniformly I/O intensive. The only benchmark that spent greater than 50% of its time in I/O was the graph algorithm when it accessed data files over NFS. When local disk was used, the graph benchmark spent at most 40% of its time in I/O. The other benchmarks were CPU dominated. The image resampling and language classification benchmarks showed order of magnitude speedups over software by using co-processor technology to offload the CPU-intensive kernels. Our experiments to date suggest that emerging hardware technologies offer significant benefit to boosting the performance of data-intensive algorithms. Using GPU and FPGA co-processors, we were able to improve performance by more than an order of magnitude on the benchmark algorithms, eliminating the processor bottleneck of CPU-bound tasks. Experiments with a prototype solid state nonvolatile memory available today show 10X better throughput on random reads than disk, with a 2X speedup on a graph processing benchmark when compared to the use of local SATA disk.

  12. A new Green's function Monte Carlo algorithm for the solution of the two-dimensional nonlinear Poisson–Boltzmann equation: Application to the modeling of the communication breakdown problem in space vehicles during re-entry

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Chatterjee, Kausik, E-mail: kausik.chatterjee@aggiemail.usu.edu; Center for Atmospheric and Space Sciences, Utah State University, Logan, UT 84322; Roadcap, John R., E-mail: john.roadcap@us.af.mil

    The objective of this paper is the exposition of a recently-developed, novel Green's function Monte Carlo (GFMC) algorithm for the solution of nonlinear partial differential equations and its application to the modeling of the plasma sheath region around a cylindrical conducting object, carrying a potential and moving at low speeds through an otherwise neutral medium. The plasma sheath is modeled in equilibrium through the GFMC solution of the nonlinear Poisson–Boltzmann (NPB) equation. The traditional Monte Carlo based approaches for the solution of nonlinear equations are iterative in nature, involving branching stochastic processes which are used to calculate linear functionals of the solution of nonlinear integral equations. Over the last several years, one of the authors of this paper, K. Chatterjee has been developing a philosophically-different approach, where the linearization of the equation of interest is not required and hence there is no need for iteration and the simulation of branching processes. Instead, an approximate expression for the Green's function is obtained using perturbation theory, which is used to formulate the random walk equations within the problem sub-domains where the random walker makes its walks. However, as a trade-off, the dimensions of these sub-domains have to be restricted by the limitations imposed by perturbation theory. The greatest advantage of this approach is the ease and simplicity of parallelization stemming from the lack of the need for iteration, as a result of which the parallelization procedure is identical to the parallelization procedure for the GFMC solution of a linear problem. The application area of interest is in the modeling of the communication breakdown problem during a space vehicle's re-entry into the atmosphere. However, additional application areas are being explored in the modeling of electromagnetic propagation through the atmosphere/ionosphere in UHF/GPS applications.

  13. A new Green's function Monte Carlo algorithm for the solution of the two-dimensional nonlinear Poisson-Boltzmann equation: Application to the modeling of the communication breakdown problem in space vehicles during re-entry

    NASA Astrophysics Data System (ADS)

    Chatterjee, Kausik; Roadcap, John R.; Singh, Surendra

    2014-11-01

    The objective of this paper is the exposition of a recently-developed, novel Green's function Monte Carlo (GFMC) algorithm for the solution of nonlinear partial differential equations and its application to the modeling of the plasma sheath region around a cylindrical conducting object, carrying a potential and moving at low speeds through an otherwise neutral medium. The plasma sheath is modeled in equilibrium through the GFMC solution of the nonlinear Poisson-Boltzmann (NPB) equation. The traditional Monte Carlo based approaches for the solution of nonlinear equations are iterative in nature, involving branching stochastic processes which are used to calculate linear functionals of the solution of nonlinear integral equations. Over the last several years, one of the authors of this paper, K. Chatterjee has been developing a philosophically-different approach, where the linearization of the equation of interest is not required and hence there is no need for iteration and the simulation of branching processes. Instead, an approximate expression for the Green's function is obtained using perturbation theory, which is used to formulate the random walk equations within the problem sub-domains where the random walker makes its walks. However, as a trade-off, the dimensions of these sub-domains have to be restricted by the limitations imposed by perturbation theory. The greatest advantage of this approach is the ease and simplicity of parallelization stemming from the lack of the need for iteration, as a result of which the parallelization procedure is identical to the parallelization procedure for the GFMC solution of a linear problem. The application area of interest is in the modeling of the communication breakdown problem during a space vehicle's re-entry into the atmosphere. However, additional application areas are being explored in the modeling of electromagnetic propagation through the atmosphere/ionosphere in UHF/GPS applications.

  14. Spherical Harmonic Solutions to the 3D Kobayashi Benchmark Suite

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Brown, P.N.; Chang, B.; Hanebutte, U.R.

    1999-12-29

    Spherical harmonic solutions of order 5, 9 and 21 on spatial grids containing up to 3.3 million cells are presented for the Kobayashi benchmark suite. This suite of three problems with the simple geometry of a pure absorber with a large void region was proposed by Professor Kobayashi at an OECD/NEA meeting in 1996. Each of the three problems contains a source, a void and a shield region. Problem 1 can best be described as a box-in-a-box problem, where a source region is surrounded by a square void region which is itself embedded in a square shield region. Problems 2 and 3 represent a shield with a void duct, problem 2 having a straight duct and problem 3 a dog-leg-shaped duct. A pure absorber and a 50% scattering case are considered for each of the three problems. The solutions have been obtained with Ardra, a scalable, parallel neutron transport code developed at Lawrence Livermore National Laboratory (LLNL). The Ardra code takes advantage of a two-level parallelization strategy, which combines message passing between processing nodes with thread-based parallelism amongst processors on each node. All calculations were performed on the IBM ASCI Blue-Pacific computer at LLNL.

  15. A Parallel Particle Swarm Optimization Algorithm Accelerated by Asynchronous Evaluations

    NASA Technical Reports Server (NTRS)

    Venter, Gerhard; Sobieszczanski-Sobieski, Jaroslaw

    2005-01-01

    A parallel Particle Swarm Optimization (PSO) algorithm is presented. Particle swarm optimization is a fairly recent addition to the family of non-gradient-based, probabilistic search algorithms that is based on a simplified social model and is closely tied to swarming theory. Although PSO algorithms present several attractive properties to the designer, they are plagued by high computational cost as measured by elapsed time. One approach to reduce the elapsed time is to make use of coarse-grained parallelization to evaluate the design points. Previous parallel PSO algorithms were mostly implemented in a synchronous manner, where all design points within a design iteration are evaluated before the next iteration is started. This approach leads to poor parallel speedup in cases where a heterogeneous parallel environment is used and/or where the analysis time depends on the design point being analyzed. This paper introduces an asynchronous parallel PSO algorithm that greatly improves the parallel efficiency. The asynchronous algorithm is benchmarked on a cluster assembled from Apple Macintosh G5 desktop computers, using the multi-disciplinary optimization of a typical transport aircraft wing as an example.
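
    The asynchronous-evaluation idea can be sketched as follows (a generic master-worker PSO in Python, not the authors' implementation; the objective and all update constants are illustrative): each particle is moved on as soon as its own objective value returns, instead of waiting on an iteration-wide barrier:

      import random
      from concurrent.futures import ProcessPoolExecutor, wait, FIRST_COMPLETED

      def objective(x):                    # stand-in for an expensive analysis
          return sum(xi * xi for xi in x)

      DIM, NPART, BUDGET = 4, 8, 200
      w, c1, c2 = 0.7, 1.5, 1.5            # inertia and attraction weights

      if __name__ == "__main__":
          pos = [[random.uniform(-5, 5) for _ in range(DIM)] for _ in range(NPART)]
          vel = [[0.0] * DIM for _ in range(NPART)]
          pbest, pbest_f = [p[:] for p in pos], [float("inf")] * NPART
          gbest, gbest_f = pos[0][:], float("inf")
          evals = 0
          with ProcessPoolExecutor() as pool:
              pending = {pool.submit(objective, pos[i]): i for i in range(NPART)}
              while pending:
                  done, _ = wait(pending, return_when=FIRST_COMPLETED)
                  for fut in done:         # update each particle as it finishes
                      i = pending.pop(fut)
                      f = fut.result()
                      evals += 1
                      if f < pbest_f[i]:
                          pbest_f[i], pbest[i] = f, pos[i][:]
                      if f < gbest_f:
                          gbest_f, gbest = f, pos[i][:]
                      if evals + len(pending) < BUDGET:
                          for d in range(DIM):   # move particle i immediately
                              vel[i][d] = (w * vel[i][d]
                                           + c1 * random.random() * (pbest[i][d] - pos[i][d])
                                           + c2 * random.random() * (gbest[d] - pos[i][d]))
                              pos[i][d] += vel[i][d]
                          pending[pool.submit(objective, pos[i])] = i
          print("best objective found:", gbest_f)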

  16. Biopolymers in Light Emitting Devices

    DTIC Science & Technology

    2006-09-01

    [Fragmented excerpt; the recoverable text describes DNA BioLEDs with blue (NPB) and green (Alq3) emitting layers in operation (Fig. 4), and red emission from the rare earth ion Eu3+ doped into various emitter layers.]

  17. Refunctionalization of the ancient rice blast disease resistance gene Pit by the recruitment of a retrotransposon as a promoter.

    PubMed

    Hayashi, Keiko; Yoshida, Hitoshi

    2009-02-01

    The plant genome contains a large number of disease resistance (R) genes that have evolved through diverse mechanisms. Here, we report that a long terminal repeat (LTR) retrotransposon contributed to the evolution of the rice blast resistance gene Pit. Pit confers race-specific resistance against the fungal pathogen Magnaporthe grisea, and is a member of the nucleotide-binding site leucine-rich repeat (NBS-LRR) family of R genes. Compared with the non-functional allele Pit(Npb), the functional allele Pit(K59) contains four amino acid substitutions, and has the LTR retrotransposon Renovator inserted upstream. Pathogenesis assays using chimeric constructs carrying the various regions of Pit(K59) and Pit(Npb) suggest that the amino acid substitutions may contribute to Pit resistance; more importantly, the upregulated promoter activity conferred by the Renovator sequence is essential for Pit function. Our data suggest that transposon-mediated transcriptional activation may play an important role in the refunctionalization of additional 'sleeping' R genes in the plant genome.

  18. Exciton enhancement and exciplex quenching by plasmonic effect of Aluminum nanoparticle arrays in a blue organic light emitting diode.

    PubMed

    Khadir, Samira; Diallo, AmadouThierno; Chakaroun, Mahmoud; Boudrioua, Azzedine

    2017-05-01

    We report an investigation of the plasmonic effect of an array of aluminum nanoparticles (Al-NPs) on a blue micro-OLED subject to exciplex emission. N,N'-Di(1-naphthyl)-N,N'-diphenyl-(1,1'-biphenyl)-4,4'-diamine (NPB) and the carbazole derivative 4,4'-bis(N-carbazolyl)-1,1'-biphenyl (CBP) were used as the emitting layer (EML) and hole transport layer (HTL), respectively. For the reference µ-OLED without Al-NPs, we observed two emission peaks, attributed to CBP emission and to exciplex emission formed at the NPB/CBP (EML/HTL) interface. With the incorporation of the Al-NPs array, obtained by e-beam lithography on the ITO anode, the exciplex emission was largely suppressed. Moreover, thanks to localized surface plasmon resonance (LSPR), an enhancement of the CBP emission was achieved, indicating efficient energy coupling between the LSPR of the Al-NPs and the CBP excitons. Thus, an enhancement of about 20% in the efficiency of the µ-OLED with Al-NPs in comparison to the reference device was obtained.

  19. A white organic light emitting diode based on anthracene-triphenylamine derivatives

    NASA Astrophysics Data System (ADS)

    Jiang, Quan; Qu, Jianjun; Yu, Junsheng; Tao, Silu; Gan, Yuanyuan; Jiang, Yadong

    2010-10-01

    White organic light-emitting diodes (WOLEDs) can be used as flat light sources, backlights for liquid crystal displays and full color displays. Recently, a research mainstream in white OLEDs has been to develop novel materials and optimize device structures. In this work a WOLED with a structure of ITO/NPB/PAA/Alq3: x% rubrene/Alq3/Mg:Ag was fabricated. The device has two light-emitting layers. NPB is used as a hole transport layer, PAA as a blue emitting layer, an Alq3: rubrene host-guest system as a yellow emitting layer, and Alq3 close to the cathode as an electron transport layer. In the experiment, the doping concentration of rubrene was optimized. WOLED 1 with 4% rubrene achieved a maximum luminous efficiency of 1.80 lm/W, a maximum luminance of 3926 cd/m2 and CIE coordinates of (0.374, 0.341). WOLED 2 with 2% rubrene achieved a maximum luminous efficiency of 0.65 lm/W, a maximum luminance of 7495 cd/m2 and CIE coordinates of (0.365, 0.365).

  20. Sharp green electroluminescence from 1H-pyrazolo[3,4-b]quinoline-based light-emitting diodes

    NASA Astrophysics Data System (ADS)

    Tao, Y. T.; Balasubramaniam, E.; Danel, A.; Jarosz, B.; Tomasik, P.

    2000-09-01

    A multilayer organic light-emitting diode was fabricated using a fluorescent compound {6-N,N-diethylamino-1-methyl-3-phenyl-1H-pyrazolo[3,4-b]quinoline} (PAQ-NEt2) doped into the hole-transporting layer of NPB {4,4'-bis[N-(1-naphthyl-1-)-N-phenyl-amino]-biphenyl}, with TPBI {2,2',2″-(1,3,5-phenylene)tris[1-phenyl-1H-benzimidazole]} as the electron-transporting material. At 16% PAQ-NEt2 doping concentration, the device gave a sharp, bright, and efficient green electroluminescence (EL) peaked at around 530 nm. The full width at half maximum of the EL is 60 nm, which is 60% of that of the green emission from a typical NPB/AlQ [where AlQ = tris(8-hydroxyquinoline) aluminum] device. For the same concentration, a maximum luminance of 37 000 cd/m2 was obtained at 10.0 V, and maximum power, luminous, and external quantum efficiencies of 4.2 lm/W, 6.0 cd/A, and 1.6%, respectively, were obtained at 5.0 V.

  1. MPF: A portable message passing facility for shared memory multiprocessors

    NASA Technical Reports Server (NTRS)

    Malony, Allen D.; Reed, Daniel A.; Mcguire, Patrick J.

    1987-01-01

    The design, implementation, and performance evaluation of a message passing facility (MPF) for shared memory multiprocessors are presented. The MPF is based on a message passing model conceptually similar to conversations. Participants (parallel processors) can enter or leave a conversation at any time. The message passing primitives for this model are implemented as a portable library of C function calls. The MPF is currently operational on a Sequent Balance 21000, and several parallel applications were developed and tested. Several simple benchmark programs are presented to establish interprocess communication performance for common patterns of interprocess communication. Finally, performance figures are presented for two parallel applications: linear system solution and iterative solution of partial differential equations.

  2. Performance Evaluation of Remote Memory Access (RMA) Programming on Shared Memory Parallel Computers

    NASA Technical Reports Server (NTRS)

    Jin, Hao-Qiang; Jost, Gabriele; Biegel, Bryan A. (Technical Monitor)

    2002-01-01

    The purpose of this study is to evaluate the feasibility of remote memory access (RMA) programming on shared memory parallel computers. We discuss different RMA based implementations of selected CFD application benchmark kernels and compare them to corresponding message passing based codes. For the message-passing implementation we use MPI point-to-point and global communication routines. For the RMA based approach we consider two different libraries supporting this programming model. One is a shared memory parallelization library (SMPlib) developed at NASA Ames, the other is the MPI-2 extensions to the MPI Standard. We give timing comparisons for the different implementation strategies and discuss the performance.
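
    A toy example of the one-sided RMA style via the MPI-2 extensions (shown here through mpi4py for brevity; the study itself compares C-level libraries): rank 0 puts data directly into a window exposed by rank 1, with no matching receive on the target:

      # Run with: mpiexec -n 2 python rma.py
      from mpi4py import MPI
      import numpy as np

      comm = MPI.COMM_WORLD
      rank = comm.Get_rank()

      local = np.zeros(4, dtype='d')           # memory exposed to remote access
      win = MPI.Win.Create(local, comm=comm)

      win.Fence()                              # open the access epoch
      if rank == 0:
          payload = np.arange(4, dtype='d')
          win.Put(payload, target_rank=1)      # direct remote write, no Recv needed
      win.Fence()                              # close the epoch; data now visible

      if rank == 1:
          print("rank 1 window contents:", local)
      win.Free()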

  3. Massively parallel implementation of 3D-RISM calculation with volumetric 3D-FFT.

    PubMed

    Maruyama, Yutaka; Yoshida, Norio; Tadano, Hiroto; Takahashi, Daisuke; Sato, Mitsuhisa; Hirata, Fumio

    2014-07-05

    A new three-dimensional reference interaction site model (3D-RISM) program for massively parallel machines combined with the volumetric 3D fast Fourier transform (3D-FFT) was developed and tested on the RIKEN K supercomputer. The ordinary parallel 3D-RISM program has a limitation on the degree of parallelization because of the limitations of the slab-type 3D-FFT. The volumetric 3D-FFT relieves this limitation drastically. We tested the 3D-RISM calculation on a large, fine calculation cell (2048^3 grid points) on 16,384 nodes, each having eight CPU cores. The new 3D-RISM program achieved excellent parallel scalability on the RIKEN K supercomputer. As a benchmark application, we employed the program, combined with molecular dynamics simulation, to analyze the oligomerization process of a chymotrypsin inhibitor 2 mutant. The results demonstrate that the massively parallel 3D-RISM program is effective for analyzing the hydration properties of large biomolecular systems. Copyright © 2014 Wiley Periodicals, Inc.

  4. Performance Analysis of Multilevel Parallel Applications on Shared Memory Architectures

    NASA Technical Reports Server (NTRS)

    Jost, Gabriele; Jin, Haoqiang; Labarta, Jesus; Gimenez, Judit; Caubet, Jordi; Biegel, Bryan A. (Technical Monitor)

    2002-01-01

    In this paper we describe how to apply powerful performance analysis techniques to understand the behavior of multilevel parallel applications. We use the Paraver/OMPItrace performance analysis system for our study. This system consists of two major components: the OMPItrace dynamic instrumentation mechanism, which allows the tracing of processes and threads, and the Paraver graphical user interface for inspection and analysis of the generated traces. We describe how to use the system to conduct a detailed comparative study of a benchmark code implemented in five different programming paradigms applicable for shared memory architectures.

  5. Understanding the Cray X1 System

    NASA Technical Reports Server (NTRS)

    Cheung, Samson

    2004-01-01

    This paper helps the reader understand the characteristics of the Cray X1 vector supercomputer system, and provides hints and information to enable the reader to port codes to the system. It provides a comparison between the basic performance of the X1 platform and other platforms that are available at NASA Ames Research Center. A set of codes, solving the Laplacian equation with different parallel paradigms, is used to understand some features of the X1 compiler. An example code from the NAS Parallel Benchmarks is used to demonstrate performance optimization on the X1 platform.

  6. High Performance Programming Using Explicit Shared Memory Model on the Cray T3D

    NASA Technical Reports Server (NTRS)

    Saini, Subhash; Simon, Horst D.; Lasinski, T. A. (Technical Monitor)

    1994-01-01

    The Cray T3D is the first-phase system in Cray Research Inc.'s (CRI) three-phase massively parallel processing program. In this report we describe the architecture of the T3D, as well as the CRAFT (Cray Research Adaptive Fortran) programming model, and contrast it with PVM, which is also supported on the T3D. We present some performance data based on the NAS Parallel Benchmarks to illustrate both architectural and software features of the T3D.

  7. Benchmarking Memory Performance with the Data Cube Operator

    NASA Technical Reports Server (NTRS)

    Frumkin, Michael A.; Shabanov, Leonid V.

    2004-01-01

    Data movement across a computer memory hierarchy and across computational grids is known to be a limiting factor for applications processing large data sets. We use the Data Cube Operator on an Arithmetic Data Set, called ADC, to benchmark the capabilities of computers and of computational grids to handle large distributed data sets. We present a prototype implementation of a parallel algorithm for computation of the operator. The algorithm follows a known approach for computing views from the smallest parent. The ADC stresses all levels of grid memory and storage by producing some of the 2^d views of an Arithmetic Data Set of d-tuples described by a small number of integers. We control the data intensity of the ADC by selecting the tuple parameters, the sizes of the views, and the number of realized views. Benchmarking results of memory performance for a number of computer architectures and for a small computational grid are presented.
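
    A compact sketch of the smallest-parent strategy (our illustration, with a three-attribute toy data set and SUM as the aggregate): each of the 2^d group-by views is computed from the smallest already-materialized view whose attribute set contains its own:

      from itertools import combinations

      attrs = ("a", "b", "c")                      # d = 3 -> 2^d = 8 views
      data = [ {"a": 1, "b": 2, "c": 1, "m": 10.0},
               {"a": 1, "b": 2, "c": 2, "m": 5.0},
               {"a": 2, "b": 1, "c": 1, "m": 7.0} ]

      def aggregate(rows, group):                  # SUM(m) grouped by 'group'
          out = {}
          for r in rows:
              key = tuple(r[a] for a in group)
              out[key] = out.get(key, 0.0) + r["m"]
          return out

      views = {attrs: aggregate(data, attrs)}      # finest view from the raw tuples
      for size in range(len(attrs) - 1, -1, -1):
          for group in combinations(attrs, size):
              # smallest computed parent whose attribute set contains 'group'
              parent = min((g for g in views if set(group) <= set(g)),
                           key=lambda g: len(views[g]))
              rows = [dict(zip(parent, key), m=val)
                      for key, val in views[parent].items()]
              views[group] = aggregate(rows, group)

      for g, v in sorted(views.items(), key=lambda kv: -len(kv[0])):
          print(g, v)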

  8. Parallel ALLSPD-3D: Speeding Up Combustor Analysis Via Parallel Processing

    NASA Technical Reports Server (NTRS)

    Fricker, David M.

    1997-01-01

    The ALLSPD-3D Computational Fluid Dynamics code for reacting flow simulation was run on a set of benchmark test cases to determine its parallel efficiency. These test cases included non-reacting and reacting flow simulations with varying numbers of processors. Also, the tests explored the effects of scaling the simulation with the number of processors in addition to distributing a constant size problem over an increasing number of processors. The test cases were run on a cluster of IBM RS/6000 Model 590 workstations with ethernet and ATM networking plus a shared memory SGI Power Challenge L workstation. The results indicate that the network capabilities significantly influence the parallel efficiency, i.e., a shared memory machine is fastest and ATM networking provides acceptable performance. The limitations of ethernet greatly hamper the rapid calculation of flows using ALLSPD-3D.

  9. Efficient parallelization of analytic bond-order potentials for large-scale atomistic simulations

    NASA Astrophysics Data System (ADS)

    Teijeiro, C.; Hammerschmidt, T.; Drautz, R.; Sutmann, G.

    2016-07-01

    Analytic bond-order potentials (BOPs) provide a way to compute atomistic properties with controllable accuracy. For large-scale computations of heterogeneous compounds at the atomistic level, both the computational efficiency and memory demand of BOP implementations have to be optimized. Since the evaluation of BOPs is a local operation within a finite environment, the parallelization concepts known from short-range interacting particle simulations can be applied to improve the performance of these simulations. In this work, several efficient parallelization methods for BOPs that use three-dimensional domain decomposition schemes are described. The schemes are implemented into the bond-order potential code BOPfox, and their performance is measured in a series of benchmarks. Systems of up to several millions of atoms are simulated on a high performance computing system, and parallel scaling is demonstrated for up to thousands of processors.

  10. PIPS-SBB: A Parallel Distributed-Memory Branch-and-Bound Algorithm for Stochastic Mixed-Integer Programs

    DOE PAGES

    Munguia, Lluis-Miquel; Oxberry, Geoffrey; Rajan, Deepak

    2016-05-01

    Stochastic mixed-integer programs (SMIPs) deal with optimization under uncertainty at many levels of the decision-making process. When solved as extensive-formulation mixed-integer programs, problem instances can exceed available memory on a single workstation. In order to overcome this limitation, we present PIPS-SBB: a distributed-memory parallel stochastic MIP solver that takes advantage of parallelism at multiple levels of the optimization process. We also show promising results on the SIPLIB benchmark by combining methods known for accelerating Branch and Bound (B&B) methods with new ideas that leverage the structure of SMIPs. Finally, we expect the performance of PIPS-SBB to improve further as more functionality is added in the future.

  11. Gust Acoustics Computation with a Space-Time CE/SE Parallel 3D Solver

    NASA Technical Reports Server (NTRS)

    Wang, X. Y.; Himansu, A.; Chang, S. C.; Jorgenson, P. C. E.; Reddy, D. R. (Technical Monitor)

    2002-01-01

    The benchmark Problem 2 in Category 3 of the Third Computational Aero-Acoustics (CAA) Workshop is solved using the space-time conservation element and solution element (CE/SE) method. This problem concerns the unsteady response of an isolated finite-span swept flat-plate airfoil bounded by two parallel walls to an incident gust. The acoustic field generated by the interaction of the gust with the flat-plate airfoil is computed by solving the 3D (three-dimensional) Euler equations in the time domain using a parallel version of a 3D CE/SE solver. The effect of the gust orientation on the far-field directivity is studied. Numerical solutions are presented and compared with analytical solutions, showing a reasonable agreement.

  12. Maximal clique enumeration with data-parallel primitives

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Lessley, Brenton; Perciano, Talita; Mathai, Manish

    The enumeration of all maximal cliques in an undirected graph is a fundamental problem arising in several research areas. We consider maximal clique enumeration on shared-memory, multi-core architectures and introduce an approach consisting entirely of data-parallel operations, in an effort to achieve efficient and portable performance across different architectures. We study the performance of the algorithm via experiments varying over benchmark graphs and architectures. Overall, we observe that our algorithm achieves up to a 33-fold speedup and a 9-fold speedup over state-of-the-art distributed and serial algorithms, respectively, for graphs with higher ratios of maximal cliques to total cliques. Further, we attain additional speedups on a GPU architecture, demonstrating the portable performance of our data-parallel design.

  13. Debugging and Analysis of Large-Scale Parallel Programs

    DTIC Science & Technology

    1989-09-01

    [Fragmented excerpt; only citation fragments survive, referencing "A CMOS RISC Processor with Integrated System Functions" (Proc. COMPCON, IEEE, March 1986) and Rick Richardson's "Dhrystone 2.1 Benchmark" (Usenet distribution, 1988).]

  14. Code Parallelization with CAPO: A User Manual

    NASA Technical Reports Server (NTRS)

    Jin, Hao-Qiang; Frumkin, Michael; Yan, Jerry; Biegel, Bryan (Technical Monitor)

    2001-01-01

    A software tool has been developed to assist the parallelization of scientific codes. This tool, CAPO, extends an existing parallelization toolkit, CAPTools, developed at the University of Greenwich, to generate OpenMP parallel codes for shared memory architectures. This is an interactive toolkit that transforms a serial Fortran application code into an equivalent parallel version of the software - in a small fraction of the time normally required for a manual parallelization. We first discuss the way in which loop types are categorized and how efficient OpenMP directives can be defined and inserted into the existing code using in-depth interprocedural analysis. The use of the toolkit on a number of application codes, ranging from benchmarks to real-world applications, is presented. This demonstrates the great potential of using the toolkit to quickly parallelize serial programs, as well as the good performance achievable on a large number of processors. The second part of the document gives references to the parameters and the graphic user interface implemented in the toolkit. Finally, a set of tutorials is included for hands-on experience with this toolkit.

  15. Automated Generation of Message-Passing Programs: An Evaluation Using CAPTools

    NASA Technical Reports Server (NTRS)

    Hribar, Michelle R.; Jin, Haoqiang; Yan, Jerry C.; Saini, Subhash (Technical Monitor)

    1998-01-01

    Scientists at NASA Ames Research Center have been developing computational aeroscience applications on highly parallel architectures over the past ten years. During that same time period, a steady transition of hardware and system software also occurred, forcing us to expend great effort in migrating and re-coding our applications. As applications and machine architectures become increasingly complex, the cost and time required for this process will become prohibitive. In this paper, we present the first set of results in our evaluation of interactive parallelization tools. In particular, we evaluate CAPTools' ability to parallelize computational aeroscience applications. CAPTools was tested on serial versions of the NAS Parallel Benchmarks and ARC3D, a computational fluid dynamics application, on two platforms: the SGI Origin 2000 and the Cray T3E. This evaluation includes performance, the amount of user interaction required, limitations, and portability. Based on these results, a discussion of the feasibility of computer-aided parallelization of aerospace applications is presented, along with suggestions for future work.

  16. Cooperative parallel adaptive neighbourhood search for the disjunctively constrained knapsack problem

    NASA Astrophysics Data System (ADS)

    Quan, Zhe; Wu, Lei

    2017-09-01

    This article investigates the use of parallel computing for solving the disjunctively constrained knapsack problem. The proposed parallel computing model can be viewed as a cooperative algorithm based on a multi-neighbourhood search. The cooperation system is composed of a team manager and a crowd of team members. The team members aim at applying their own search strategies to explore the solution space. The team manager collects the solutions from the members and shares the best one with them. The performance of the proposed method is evaluated on a group of benchmark data sets. The results obtained are compared to those reached by the best methods from the literature. The results show that the proposed method is able to provide the best solutions in most cases. In order to highlight the robustness of the proposed parallel computing model, a new set of large-scale instances is introduced. Encouraging results have been obtained.
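
    The manager/member cooperation pattern described above can be sketched as follows. This is a toy illustration, not the authors' algorithm: OpenMP threads stand in for team members, a shared incumbent plays the manager's role, and the objective and random-step neighbourhood are invented placeholders for the paper's knapsack-specific search strategies.

      #include <omp.h>
      #include <stdio.h>
      #include <stdlib.h>

      /* Toy objective: maximize f(x) = -(x-3)^2 over integers. */
      static double f(int x) { return -(double)(x - 3) * (x - 3); }

      int main(void) {
          int best_x = 0;                      /* shared incumbent (manager's copy) */
          double best_val = f(best_x);

          #pragma omp parallel num_threads(4)
          {
              unsigned seed = 1234u + 17u * (unsigned)omp_get_thread_num();
              for (int iter = 0; iter < 1000; iter++) {
                  int x;
                  #pragma omp critical         /* member reads the shared best */
                  x = best_x;
                  x += (int)(rand_r(&seed) % 7) - 3;   /* explore a neighbourhood */
                  double v = f(x);
                  #pragma omp critical         /* manager collects and shares */
                  if (v > best_val) { best_val = v; best_x = x; }
              }
          }
          printf("best x = %d, f = %g\n", best_x, best_val);
          return 0;
      }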

  17. Time-Dependent Simulations of Turbopump Flows

    NASA Technical Reports Server (NTRS)

    Kiris, Cetin; Kwak, Dochan; Chan, William; Williams, Robert

    2002-01-01

    Unsteady flow simulations of the RLV (Reusable Launch Vehicle) 2nd Generation baseline turbopump for one and a half impeller rotations have been completed using a 34.3 million grid point model. MLP (Multi-Level Parallelism) shared-memory parallelism has been implemented in INS3D and benchmarked. Code optimization for cache-based platforms will be completed by the end of September 2001. Moving boundary capability is obtained by using the DCF module. Scripting capability from CAD (computer-aided design) geometry to solution has been developed. Data compression is applied to reduce data size in post-processing. Fluid/structure coupling has been initiated.

  18. Microscope mode secondary ion mass spectrometry imaging with a Timepix detector.

    PubMed

    Kiss, Andras; Jungmann, Julia H; Smith, Donald F; Heeren, Ron M A

    2013-01-01

    In-vacuum active pixel detectors enable high sensitivity, highly parallel time- and space-resolved detection of ions from complex surfaces. For the first time, a Timepix detector assembly was combined with a secondary ion mass spectrometer for microscope mode secondary ion mass spectrometry (SIMS) imaging. Time resolved images from various benchmark samples demonstrate the imaging capabilities of the detector system. The main advantages of the active pixel detector are the higher signal-to-noise ratio and parallel acquisition of arrival time and position. Microscope mode SIMS imaging of biomolecules is demonstrated from tissue sections with the Timepix detector.

  19. High efficiency fluorescent white OLEDs based on DOPPP

    NASA Astrophysics Data System (ADS)

    Zhang, Gang; Chen, Chen; Lang, Jihui; Zhao, Lina; Jiang, Wenlong

    2017-08-01

    White organic light-emitting devices (WOLEDs) with the structures ITO/m-MTDATA (10 nm)/NPB (30 nm)/Rubrene (0.2 nm)/DOPPP (x nm)/TAZ (10 nm)/Alq3 (30 nm)/LiF (0.5 nm)/Al and ITO/NPB (30 nm)/DPAVBi:Rubrene (2 wt.%, 20 nm)/DOPPP (x nm)/TAZ (10 nm)/Alq3 (30 nm)/LiF (0.5 nm)/Al (100 nm) have been fabricated by vacuum thermal evaporation. The results show that the non-doped device has the best chroma, with color coordinates in the white-light range; its maximum luminance is 12,750 cd/m2 and its maximum current efficiency is 8.55 cd/A. Doped device A reaches the maximum luminance (16,570 cd/m2) when the thickness of the blue DOPPP layer is 25 nm, and doped device B achieves the highest efficiency (10.47 cd/A) when the DOPPP thickness is 15 nm. All performances of the doped devices are better than those of the non-doped one. The results demonstrate that the doped structures can realize energy transfer and thereby effectively improve device performance.

  20. Implementation of Benchmarking Transportation Logistics Practices and Future Benchmarking Organizations

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Thrower, A.W.; Patric, J.; Keister, M.

    2008-07-01

    The purpose of the Office of Civilian Radioactive Waste Management's (OCRWM) Logistics Benchmarking Project is to identify established government and industry practices for the safe transportation of hazardous materials which can serve as a yardstick for design and operation of OCRWM's national transportation system for shipping spent nuclear fuel and high-level radioactive waste to the proposed repository at Yucca Mountain, Nevada. The project will present logistics and transportation practices and develop implementation recommendations for adaptation by the national transportation system. This paper will describe the process used to perform the initial benchmarking study, highlight interim findings, and explain how these findings are being implemented. It will also provide an overview of the next phase of benchmarking studies. The benchmarking effort will remain a high-priority activity throughout the planning and operational phases of the transportation system. The initial phase of the project focused on government transportation programs to identify those practices which are most clearly applicable to OCRWM. These Federal programs have decades of safe transportation experience, strive for excellence in operations, and implement effective stakeholder involvement, all of which parallel OCRWM's transportation mission and vision. The initial benchmarking project focused on four business processes that are critical to OCRWM's mission success, and can be incorporated into OCRWM planning and preparation in the near term. The processes examined were: transportation business model, contract management/out-sourcing, stakeholder relations, and contingency planning. More recently, OCRWM examined logistics operations of AREVA NC's Business Unit Logistics in France. The next phase of benchmarking will focus on integrated domestic and international commercial radioactive logistic operations. The prospective companies represent large scale shippers and have vast experience in safely and efficiently shipping spent nuclear fuel and other radioactive materials. Additional business processes may be examined in this phase. The findings of these benchmarking efforts will help determine the organizational structure and requirements of the national transportation system.

  1. Performance effects of irregular communications patterns on massively parallel multiprocessors

    NASA Technical Reports Server (NTRS)

    Saltz, Joel; Petiton, Serge; Berryman, Harry; Rifkin, Adam

    1991-01-01

    A detailed study of the performance effects of irregular communications patterns on the CM-2 was conducted. The communications capabilities of the CM-2 were characterized under a variety of controlled conditions. In the process of carrying out the performance evaluation, extensive use was made of a parameterized synthetic mesh. In addition, timings with unstructured meshes generated for aerodynamic codes and a set of sparse matrices with banded patterns of non-zeroes were performed. This benchmarking suite stresses the communications capabilities of the CM-2 in a range of different ways. Benchmark results demonstrate that it is possible to make effective use of much of the massive concurrency available in the communications network.

  2. Benchmarking and performance analysis of the CM-2. [SIMD computer

    NASA Technical Reports Server (NTRS)

    Myers, David W.; Adams, George B., II

    1988-01-01

    A suite of benchmarking routines testing communication, basic arithmetic operations, and selected kernel algorithms written in LISP and PARIS was developed for the CM-2. Experiment runs are automated via a software framework that sequences individual tests, allowing for unattended overnight operation. Multiple measurements are made and treated statistically to generate well-characterized results from the noisy values given by cm:time. The results obtained provide a comparison with similar, but less extensive, testing done on a CM-1. Tests were chosen to aid the algorithmist in constructing fast, efficient, and correct code on the CM-2, as well as to gain insight into what performance criteria are needed when evaluating parallel processing machines.

  3. Execution models for mapping programs onto distributed memory parallel computers

    NASA Technical Reports Server (NTRS)

    Sussman, Alan

    1992-01-01

    The problem of exploiting the parallelism available in a program to efficiently employ the resources of the target machine is addressed. The problem is discussed in the context of building a mapping compiler for a distributed memory parallel machine. The paper describes using execution models to drive the process of mapping a program in the most efficient way onto a particular machine. Through analysis of the execution models for several mapping techniques for one class of programs, we show that the selection of the best technique for a particular program instance can make a significant difference in performance. On the other hand, the results of benchmarks from an implementation of a mapping compiler show that our execution models are accurate enough to select the best mapping technique for a given program.

  4. Massively parallel quantum computer simulator

    NASA Astrophysics Data System (ADS)

    De Raedt, K.; Michielsen, K.; De Raedt, H.; Trieu, B.; Arnold, G.; Richter, M.; Lippert, Th.; Watanabe, H.; Ito, N.

    2007-01-01

    We describe portable software to simulate universal quantum computers on massively parallel computers. We illustrate the use of the simulation software by running various quantum algorithms on different computer architectures, such as an IBM BlueGene/L, an IBM Regatta p690+, a Hitachi SR11000/J1, a Cray X1E, an SGI Altix 3700 and clusters of PCs running Windows XP. We study the performance of the software by simulating quantum computers containing up to 36 qubits, using up to 4096 processors and up to 1 TB of memory. Our results demonstrate that the simulator exhibits nearly ideal scaling as a function of the number of processors and suggest that the simulation software described in this paper may also serve as a benchmark for testing high-end parallel computers.
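
    The inner kernel that such simulators parallelize is the state-vector update for a quantum gate. Below is a minimal serial sketch for a single-qubit Hadamard gate; the qubit count and program structure are illustrative assumptions, not the simulator's actual code.

      #include <complex.h>
      #include <math.h>
      #include <stdio.h>

      #define NQUBITS 3
      #define DIM (1u << NQUBITS)

      /* Apply a Hadamard gate to qubit t of an n-qubit state vector.
         Each amplitude pair (i, i | 1<<t) is updated independently, which
         is the loop that simulators of this kind distribute over processors. */
      static void hadamard(double complex *psi, unsigned t) {
          const double s = 1.0 / sqrt(2.0);
          unsigned bit = 1u << t;
          for (unsigned i = 0; i < DIM; i++) {
              if (i & bit) continue;            /* visit each pair once */
              double complex a = psi[i], b = psi[i | bit];
              psi[i]       = s * (a + b);
              psi[i | bit] = s * (a - b);
          }
      }

      int main(void) {
          double complex psi[DIM] = { 1.0 };    /* start in |000> */
          for (unsigned q = 0; q < NQUBITS; q++) hadamard(psi, q);
          for (unsigned i = 0; i < DIM; i++)
              printf("amp[%u] = %.4f\n", i, creal(psi[i]));  /* uniform 1/sqrt(8) */
          return 0;
      }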

  5. Performance Evaluation and Modeling Techniques for Parallel Processors. Ph.D. Thesis

    NASA Technical Reports Server (NTRS)

    Dimpsey, Robert Tod

    1992-01-01

    In practice, the performance evaluation of supercomputers is still substantially driven by single-point estimates of metrics (e.g., MFLOPS) obtained by running characteristic benchmarks or workloads. With the rapid increase in the use of time-shared multiprogramming in these systems, such measurements are clearly inadequate. This is because multiprogramming and system overhead, as well as other degradations in performance due to the time-varying characteristics of workloads, are not taken into account. In multiprogrammed environments, multiple jobs and users can dramatically increase the amount of system overhead and degrade the performance of the machine. Performance techniques, such as benchmarking, which characterize performance on a dedicated machine, ignore this major component of true computer performance. Due to the complexity of analysis, little work has been done in analyzing, modeling, and predicting the performance of applications in multiprogrammed environments. This is especially true for parallel processors, where the costs and benefits of multi-user workloads are exacerbated. While some may claim that the issue of multiprogramming is not a viable one in the supercomputer market, experience shows otherwise. Even in recent massively parallel machines, multiprogramming is a key component. It has even been claimed that a partial cause of the demise of the CM-2 was the fact that it did not efficiently support time-sharing. In the same paper, Gordon Bell postulates that multicomputers will evolve into multiprocessors in order to support efficient multiprogramming. Therefore, it is clear that parallel processors of the future will be required to offer the user a time-shared environment with reasonable response times for their applications. In this type of environment, the most important performance metric is the completion (response) time of a given application. However, few evaluation efforts have addressed this issue.

  6. The Metropolis Monte Carlo method with CUDA enabled Graphic Processing Units

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Hall, Clifford; School of Physics, Astronomy, and Computational Sciences, George Mason University, 4400 University Dr., Fairfax, VA 22030; Ji, Weixiao

    2014-02-01

    We present a CPU-GPU system for runtime acceleration of large molecular simulations using GPU computation and memory swaps. The memory architecture of the GPU can be used both as container for simulation data stored on the graphics card and as floating-point code target, providing an effective means for the manipulation of atomistic or molecular data on the GPU. To fully take advantage of this mechanism, efficient GPU realizations of algorithms used to perform atomistic and molecular simulations are essential. Our system implements a versatile molecular engine, including inter-molecule interactions and orientational variables for performing the Metropolis Monte Carlo (MMC) algorithm, which is one type of Markov chain Monte Carlo. By combining memory objects with floating-point code fragments we have implemented an MMC parallel engine that entirely avoids the communication time of molecular data at runtime. Our runtime acceleration system is a forerunner of a new class of CPU-GPU algorithms exploiting memory concepts combined with threading for avoiding bus bandwidth and communication. The testbed molecular system used here is a condensed phase system of oligopyrrole chains. A benchmark shows a size scaling speedup of 60 for systems with 210,000 pyrrole monomers. Our implementation can easily be combined with MPI to connect in parallel several CPU-GPU duets. Highlights: we parallelize the MMC algorithm on one CPU-GPU duet; the Adaptive Tempering Monte Carlo employs MMC and profits from this CPU-GPU implementation; our benchmark shows a size scaling speedup of 62 for systems with 225,000 particles; the testbed involves a polymeric system of oligopyrroles in the condensed phase; and the CPU-GPU parallelization includes dipole-dipole and Mie-Jones classic potentials.
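
    The Metropolis acceptance rule at the core of MMC is compact enough to sketch. The toy one-dimensional potential below is an invented stand-in for the paper's molecular energy function, and no GPU-specific machinery is shown.

      #include <math.h>
      #include <stdio.h>
      #include <stdlib.h>

      /* Toy potential standing in for the molecular energy function. */
      static double energy(double x) { return x * x; }

      int main(void) {
          double x = 5.0, kT = 1.0;
          srand(42);
          for (int step = 0; step < 100000; step++) {
              /* Propose a small random displacement. */
              double xp = x + 0.5 * ((double)rand() / RAND_MAX - 0.5);
              double dE = energy(xp) - energy(x);
              /* Metropolis criterion: always accept downhill moves; accept
                 uphill moves with probability exp(-dE/kT). */
              if (dE <= 0.0 || (double)rand() / RAND_MAX < exp(-dE / kT))
                  x = xp;
          }
          printf("final sample x = %f\n", x);
          return 0;
      }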

  7. Automated Instrumentation, Monitoring and Visualization of PVM Programs Using AIMS

    NASA Technical Reports Server (NTRS)

    Mehra, Pankaj; VanVoorst, Brian; Yan, Jerry; Lum, Henry, Jr. (Technical Monitor)

    1994-01-01

    We present views and analysis of the execution of several PVM (Parallel Virtual Machine) codes for Computational Fluid Dynamics on a network of Sparcstations, including: (1) NAS Parallel Benchmarks CG and MG; (2) a multi-partitioning algorithm for NAS Parallel Benchmark SP; and (3) an overset grid flowsolver. These views and analysis were obtained using our Automated Instrumentation and Monitoring System (AIMS) version 3.0, a toolkit for debugging the performance of PVM programs. We will describe the architecture, operation and application of AIMS. The AIMS toolkit contains: (1) Xinstrument, which can automatically instrument various computational and communication constructs in message-passing parallel programs; (2) Monitor, a library of runtime trace-collection routines; (3) VK (Visual Kernel), an execution-animation tool with source-code clickback; and (4) Tally, a tool for statistical analysis of execution profiles. Currently, Xinstrument can handle C and Fortran 77 programs using PVM 3.2.x; Monitor has been implemented and tested on Sun 4 systems running SunOS 4.1.2; and VK uses X11R5 and Motif 1.2. Data and views obtained using AIMS clearly illustrate several characteristic features of executing parallel programs on networked workstations: (1) the impact of long message latencies; (2) the impact of multiprogramming overheads and associated load imbalance; (3) cache and virtual-memory effects; and (4) significant skews between workstation clocks. Interestingly, AIMS can compensate for constant skew (zero drift) by calibrating the skew between a parent and its spawned children. In addition, AIMS' skew-compensation algorithm can adjust timestamps in a way that eliminates physically impossible communications (e.g., messages going backwards in time). Our current efforts are directed toward creating new views to explain the observed performance of PVM programs. Some of the features planned for the near future include: (1) ConfigView, showing the physical topology of the virtual machine, inferred using specially formatted IP (Internet Protocol) packets; and (2) LoadView, synchronous animation of PVM-program execution and resource-utilization patterns.
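
    The constant-skew compensation idea can be sketched directly: choose a clock offset for the child just large enough that no recorded message appears to arrive before it was sent. The event timestamps below are invented illustrations, not AIMS trace data.

      #include <stdio.h>

      int main(void) {
          /* send[i] on the parent clock, recv[i] on the (skewed) child clock,
             in microseconds; some messages appear to arrive "in the past". */
          double send[] = { 100.0, 250.0, 400.0 };
          double recv[] = {  90.0, 260.0, 395.0 };
          int n = 3;

          /* Find the smallest offset that makes every latency non-negative. */
          double offset = 0.0;
          for (int i = 0; i < n; i++) {
              double lag = recv[i] - send[i];
              if (lag < 0.0 && -lag > offset) offset = -lag;  /* worst violation */
          }
          for (int i = 0; i < n; i++)
              printf("msg %d: adjusted latency %.1f us\n",
                     i, recv[i] + offset - send[i]);
          return 0;
      }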

  8. Pharmacological enhancement of leg and muscle microvascular blood flow does not augment anabolic responses in skeletal muscle of young men under fed conditions.

    PubMed

    Phillips, Bethan E; Atherton, Philip J; Varadhan, Krishna; Wilkinson, Daniel J; Limb, Marie; Selby, Anna L; Rennie, Michael J; Smith, Kenneth; Williams, John P

    2014-01-15

    Skeletal muscle anabolism associated with postprandial plasma aminoacidemia and insulinemia is contingent upon amino acids (AA) and insulin crossing the microcirculation-myocyte interface. In this study, we hypothesized that increasing muscle microvascular blood volume (flow) would enhance fed-state anabolic responses in muscle protein turnover. We studied 10 young men (23.2 ± 2.1 yr) under postabsorptive and fed [iv Glamin (∼10 g AA), glucose ∼7.5 mmol/l] conditions. Methacholine was infused into the femoral artery of one leg to determine, via bilateral comparison, the effects of feeding alone vs. feeding plus pharmacological vasodilation. We measured leg blood flow (LBF; femoral artery) by Doppler ultrasound, muscle microvascular blood volume (MBV) by contrast-enhanced ultrasound (CEUS), muscle protein synthesis (MPS) and breakdown (MPB; a-v balance modeling), and net protein balance (NPB) using [1,2-(13)C2]leucine and [(2)H5]phenylalanine tracers via gas chromatography-mass spectrometry (GC-MS). Indexes of anabolic signaling/endothelial activation (e.g., Akt/mTORC1/NOS) were assessed using immunoblotting techniques. Under fed conditions, LBF (+12 ± 5%, P < 0.05), MBV (+25 ± 10%, P < 0.05), and MPS (+129 ± 33%, P < 0.05) increased. Infusion of methacholine further enhanced LBF (+126 ± 12%, P < 0.05) and MBV (+79 ± 30%, P < 0.05). Despite these radically different blood flow conditions, neither increases in MPS in response to feeding (0.04 ± 0.004 vs. 0.08 ± 0.01%/h, P < 0.05) nor improvements in NPB (-4.4 ± 2.4 vs. 16.4 ± 5.7 nmol Phe·100 ml leg(-1)·min(-1), P < 0.05) were affected by methacholine infusion (MPS 0.07 ± 0.01%/h; NPB 24.0 ± 7.7 nmol Phe·100 ml leg(-1)·min(-1)), whereas MPB was unaltered by either feeding or infusion of methacholine. Thus, enhancing LBF/MBV above that occurring naturally with feeding alone does not improve muscle anabolism.

  9. The Design and Evaluation of "CAPTools"--A Computer Aided Parallelization Toolkit

    NASA Technical Reports Server (NTRS)

    Yan, Jerry; Frumkin, Michael; Hribar, Michelle; Jin, Haoqiang; Waheed, Abdul; Johnson, Steve; Cross, Mark; Evans, Emyr; Ierotheou, Constantinos; Leggett, Pete

    1998-01-01

    Writing applications for high performance computers is a challenging task. Although writing code by hand still offers the best performance, it is extremely costly and often not very portable. The Computer Aided Parallelization Tools (CAPTools) are a toolkit designed to help automate the mapping of sequential FORTRAN scientific applications onto multiprocessors. CAPTools consists of the following major components: an inter-procedural dependence analysis module that incorporates user knowledge; a 'self-propagating' data partitioning module driven via user guidance; an execution control mask generation and optimization module for the user to fine tune parallel processing of individual partitions; a program transformation/restructuring facility for source code clean up and optimization; a set of browsers through which the user interacts with CAPTools at each stage of the parallelization process; and a code generator supporting multiple programming paradigms on various multiprocessors. Besides describing the rationale behind the architecture of CAPTools, the parallelization process is illustrated via case studies involving structured and unstructured meshes. The programming process and the performance of the generated parallel programs are compared against other programming alternatives based on the NAS Parallel Benchmarks, ARC3D and other scientific applications. Based on these results, a discussion on the feasibility of constructing architectural independent parallel applications is presented.

  10. The Automatic Parallelisation of Scientific Application Codes Using a Computer Aided Parallelisation Toolkit

    NASA Technical Reports Server (NTRS)

    Ierotheou, C.; Johnson, S.; Leggett, P.; Cross, M.; Evans, E.; Jin, Hao-Qiang; Frumkin, M.; Yan, J.; Biegel, Bryan (Technical Monitor)

    2001-01-01

    The shared-memory programming model is a very effective way to achieve parallelism on shared-memory parallel computers. Historically, the lack of a programming standard for using directives and the rather limited performance due to poor scalability affected the take-up of this programming model. Significant progress has since been made in hardware and software technologies; as a result, the performance of parallel programs with compiler directives has also improved. The introduction of an industrial standard for shared-memory programming with directives, OpenMP, has also addressed the issue of portability. In this study, we have extended the computer-aided parallelization toolkit (developed at the University of Greenwich) to automatically generate OpenMP-based parallel programs with nominal user assistance. We outline the way in which loop types are categorized and how efficient OpenMP directives can be defined and placed using the in-depth interprocedural analysis that is carried out by the toolkit. We also discuss the application of the toolkit to the NAS Parallel Benchmarks and a number of real-world application codes. This work not only demonstrates the great potential of using the toolkit to quickly parallelize serial programs but also the good performance achievable on up to 300 processors for hybrid message-passing and directive-based parallelizations.

  11. Fine-structural cytochemical and immunocytochemical observations on nuclear bodies in the bovine 2-cell embryo.

    PubMed

    Kopecný, V; Biggiogera, M; Pivko, J; Pavlok, A; Martin, T E; Kaufmann, S H; Shaper, J H; Fakan, S

    2000-11-01

    Nuclear bodies occurring during the 2-cell stage of bovine embryos (obtained either by in vitro fertilisation of in vitro matured ovarian oocytes, or by collection after fertilisation and cleavage in vivo) were studied using ultrastructural cytochemistry and immunocytochemistry to determine whether their occurrence may be linked with the onset of embryonic transcription. In addition, the species-specific ultrastructural features of the interchromatin structures of the 2-cell bovine embryo were displayed. Three different types of nuclear bodies were distinguished: (i) nucleolus precursor bodies (NPBs), (ii) loose bodies (LBs) and (iii) dense bodies (DBs). In order to determine their possible functional significance, we considered parallels between these three nuclear entities and interchromatin compartments reported in other cells. As detected by their preferential ribonucleoprotein staining, all types of nuclear bodies contained ribonucleoproteins. In contrast to the other types of nuclear bodies studied, NPBs contained argyrophilic proteins, but in no case did they show morphological features of functional nucleoli. Both compact and vacuolated forms of NPBs were seen in both in vivo and in vitro embryos, sometimes simultaneously in the same nucleus. LBs and DBs reacted with antibodies to Sm antigen, indicating the presence of a group of nucleoplasmic, non-nucleolar small nuclear ribonucleoproteins (snRNPs). The immunoreactivity for Sm antigen was more intense and homogeneous in DBs than in LBs. DBs were seen in both categories of embryo. A possible kinship of DBs with the sphere organelle known from oocytes of different animal species, or with the prominent spherical inclusions of early mouse embryo nuclei, is suggested. The last type of intranuclear body, the LBs, showed a composite structure. Their granular component, occurring in clusters and displaying immunoreactivity for Sm antigen, was similar to interchromatin granules and was therefore named IG-like granules. Another component forming the LBs showed a much finer structure and a lower immunoreactivity with anti-Sm antibodies. We suggest that this amorphous component may be related to the IG-associated zone. All three types of intranuclear bodies were often seen close together, suggesting a possible mutual functional relationship. From these and other observations we conclude that the intranuclear bodies in 2-cell bovine embryos correspond, with the exception of the NPBs, to similar structures/compartments supposed to accumulate inactive spliceosomal components in certain phases of somatic cell nucleus function. Accordingly, the occurrence of such nuclear bodies does not represent cytological evidence of RNA synthesis. In contrast, an important morphological feature revealing the status of the bovine 2-cell embryo is the vacuolisation of the NPB.

  12. Genetic algorithm based task reordering to improve the performance of batch scheduled massively parallel scientific applications

    DOE PAGES

    Sankaran, Ramanan; Angel, Jordan; Brown, W. Michael

    2015-04-08

    The growth in size of networked high performance computers along with novel accelerator-based node architectures has further emphasized the importance of communication efficiency in high performance computing. The world's largest high performance computers are usually operated as shared user facilities due to the costs of acquisition and operation. Applications are scheduled for execution in a shared environment and are placed on nodes that are not necessarily contiguous on the interconnect. Furthermore, the placement of tasks on the nodes allocated by the scheduler is sub-optimal, leading to performance loss and variability. Here, we investigate the impact of task placement on the performance of two massively parallel application codes on the Titan supercomputer, a turbulent combustion flow solver (S3D) and a molecular dynamics code (LAMMPS). Benchmark studies show a significant deviation from ideal weak scaling and variability in performance. The inter-task communication distance was determined to be one of the significant contributors to the performance degradation and variability. A genetic algorithm-based parallel optimization technique was used to optimize the task ordering. This technique provides an improved placement of the tasks on the nodes, taking into account the application's communication topology and the system interconnect topology. As a result, application benchmarks after task reordering through the genetic algorithm show a significant improvement in performance and reduction in variability, therefore enabling the applications to achieve better time to solution and scalability on Titan during production.
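
    The essential ingredients of such a reordering optimizer are a placement fitness (here, total communication cost in network hops) and a permutation move. The sketch below uses a degenerate accept-if-better loop in place of a full genetic algorithm (population, selection, crossover), and its line-topology distance function and ring communication pattern are invented stand-ins for the real interconnect and application communication graphs.

      #include <stdio.h>
      #include <stdlib.h>

      #define T 8   /* tasks */

      /* Hop distance between the nodes hosting two tasks; nodes sit on a
         line here, standing in for the machine's real interconnect. */
      static int hops(int na, int nb) { return abs(na - nb); }

      /* Fitness: total communication cost of a placement, summing hops
         over the application's communication pairs (a ring, invented). */
      static int cost(const int place[T]) {
          int c = 0;
          for (int t = 0; t < T; t++)
              c += hops(place[t], place[(t + 1) % T]);
          return c;
      }

      int main(void) {
          int place[T] = { 0, 4, 1, 5, 2, 6, 3, 7 };   /* scheduler's placement */
          srand(7);
          int best = cost(place);
          /* Mutate by swapping two tasks; keep improvements. A real GA
             layers a population, selection, and crossover on top of this. */
          for (int gen = 0; gen < 10000; gen++) {
              int i = rand() % T, j = rand() % T;
              int tmp = place[i]; place[i] = place[j]; place[j] = tmp;
              int c = cost(place);
              if (c <= best) best = c;
              else { tmp = place[i]; place[i] = place[j]; place[j] = tmp; }
          }
          printf("communication cost after reordering: %d\n", best);
          return 0;
      }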

  13. Benchmarking NWP Kernels on Multi- and Many-core Processors

    NASA Astrophysics Data System (ADS)

    Michalakes, J.; Vachharajani, M.

    2008-12-01

    Increased computing power for weather, climate, and atmospheric science has provided direct benefits for defense, agriculture, the economy, the environment, and public welfare and convenience. Today, very large clusters with many thousands of processors are allowing scientists to move forward with simulations of unprecedented size. But time-critical applications such as real-time forecasting or climate prediction need strong scaling: faster nodes and processors, not more of them. Moreover, the need for good cost-performance has never been greater, both in terms of performance per watt and per dollar. For these reasons, the new generations of multi- and many-core processors being mass produced for commercial IT and "graphical computing" (video games) are being scrutinized for their ability to exploit the abundant fine-grain parallelism in atmospheric models. We present results of our work to date identifying key computational kernels within the dynamics and physics of a large community NWP model, the Weather Research and Forecast (WRF) model. We benchmark and optimize these kernels on several different multi- and many-core processors. The goals are to (1) characterize and model performance of the kernels in terms of computational intensity, data parallelism, memory bandwidth pressure, memory footprint, etc. (2) enumerate and classify effective strategies for coding and optimizing for these new processors, (3) assess difficulties and opportunities for tool or higher-level language support, and (4) establish a continuing set of kernel benchmarks that can be used to measure and compare effectiveness of current and future designs of multi- and many-core processors for weather and climate applications.

  14. Parallelization of Unsteady Adaptive Mesh Refinement for Unstructured Navier-Stokes Solvers

    NASA Technical Reports Server (NTRS)

    Schwing, Alan M.; Nompelis, Ioannis; Candler, Graham V.

    2014-01-01

    This paper explores the implementation of MPI parallelization in a Navier-Stokes solver using adaptive mesh refinement. Viscous and inviscid test problems are considered for the purpose of benchmarking, as are implicit and explicit time advancement methods. The main test problem for comparison includes effects from boundary layers and other viscous features and requires a large number of grid points for accurate computation. Experimental validation against double cone experiments in hypersonic flow is shown. The adaptive mesh refinement shows promise for a staple test problem in the hypersonic community. Extension to more advanced techniques for more complicated flows is described.

  15. Analytical theory of coherent synchrotron radiation wakefield of short bunches shielded by conducting parallel plates

    NASA Astrophysics Data System (ADS)

    Stupakov, Gennady; Zhou, Demin

    2016-04-01

    We develop a general model of coherent synchrotron radiation (CSR) impedance with shielding provided by two parallel conducting plates. This model allows us to easily reproduce all previously known analytical CSR wakes and to expand the analysis to situations not explored before. It reduces calculations of the impedance to taking integrals along the trajectory of the beam. New analytical results are derived for the radiation impedance with shielding for the following orbits: a kink, a bending magnet, a wiggler of finite length, and an infinitely long wiggler. All our formulas are benchmarked against numerical simulations with the CSRZ computer code.

  16. Using domain decomposition in the multigrid NAS parallel benchmark on the Fujitsu VPP500

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Wang, J.C.H.; Lung, H.; Katsumata, Y.

    1995-12-01

    In this paper, we demonstrate how domain decomposition can be applied to the multigrid algorithm to convert the code for MPP architectures. We also discuss the performance and scalability of this implementation on the new product line of Fujitsu's vector parallel computer, the VPP500. This computer uses Fujitsu's well-known vector processor as the PE, each rated at 1.6 GFLOPS. A high-speed crossbar network rated at 800 MB/s provides the inter-PE communication. The results show that physical domain decomposition is the best way to solve MG problems on the VPP500.
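
    The per-level communication pattern of a domain-decomposed multigrid solver can be sketched with a one-dimensional halo exchange; the Jacobi-style sweep below stands in for a multigrid smoother, and the strip size and data values are invented.

      #include <mpi.h>
      #include <stdio.h>

      #define NLOC 8   /* interior points per process */

      int main(int argc, char **argv) {
          MPI_Init(&argc, &argv);
          int rank, size;
          MPI_Comm_rank(MPI_COMM_WORLD, &rank);
          MPI_Comm_size(MPI_COMM_WORLD, &size);

          double u[NLOC + 2];   /* local strip plus two ghost cells */
          for (int i = 0; i <= NLOC + 1; i++) u[i] = (double)rank;

          int left  = (rank == 0)        ? MPI_PROC_NULL : rank - 1;
          int right = (rank == size - 1) ? MPI_PROC_NULL : rank + 1;

          /* Exchange ghost cells with neighbours, then apply one smoothing
             sweep -- the per-level pattern of a decomposed multigrid. */
          MPI_Sendrecv(&u[1], 1, MPI_DOUBLE, left, 0,
                       &u[NLOC + 1], 1, MPI_DOUBLE, right, 0,
                       MPI_COMM_WORLD, MPI_STATUS_IGNORE);
          MPI_Sendrecv(&u[NLOC], 1, MPI_DOUBLE, right, 1,
                       &u[0], 1, MPI_DOUBLE, left, 1,
                       MPI_COMM_WORLD, MPI_STATUS_IGNORE);

          for (int i = 1; i <= NLOC; i++)
              u[i] = 0.5 * (u[i - 1] + u[i + 1]);

          printf("rank %d: u[1] = %f\n", rank, u[1]);
          MPI_Finalize();
          return 0;
      }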

  17. Analytical theory of coherent synchrotron radiation wakefield of short bunches shielded by conducting parallel plates

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Stupakov, Gennady; Zhou, Demin

    2016-04-21

    We develop a general model of coherent synchrotron radiation (CSR) impedance with shielding provided by two parallel conducting plates. This model allows us to easily reproduce all previously known analytical CSR wakes and to expand the analysis to situations not explored before. It reduces calculations of the impedance to taking integrals along the trajectory of the beam. New analytical results are derived for the radiation impedance with shielding for the following orbits: a kink, a bending magnet, a wiggler of finite length, and an infinitely long wiggler. All our formulas are benchmarked against numerical simulations with the CSRZ computer code.

  18. Comprehensive School Reform at the Helm: The North Carolina Instructional Leadership Reform Program. Benchmark. Volume 5, Issue 4, Fall 2004

    ERIC Educational Resources Information Center

    Janc, Helen; Appelbaum, Deborah

    2004-01-01

    This newsletter discusses how, in many ways, the development of the school reform concept parallels the evolution of thinking about principal leadership over the past decade. Under Comprehensive School Reform (CSR), schools are increasingly being asked to use data to plan for improvement and to fortify instruction and professional development…

  19. Scalable Effective Approaches for Quadratic Assignment Problems Based on Conic Optimization and Applications

    DTIC Science & Technology

    2012-02-09

    …benchmark problem we contacted Bertrand LeCun, who in their project CHOC (2005-2008) had applied their parallel B&B framework BOB++ to the RLT1…

  20. Advances in molecular quantum chemistry contained in the Q-Chem 4 program package

    NASA Astrophysics Data System (ADS)

    Shao, Yihan; Gan, Zhengting; Epifanovsky, Evgeny; Gilbert, Andrew T. B.; Wormit, Michael; Kussmann, Joerg; Lange, Adrian W.; Behn, Andrew; Deng, Jia; Feng, Xintian; Ghosh, Debashree; Goldey, Matthew; Horn, Paul R.; Jacobson, Leif D.; Kaliman, Ilya; Khaliullin, Rustam Z.; Kuś, Tomasz; Landau, Arie; Liu, Jie; Proynov, Emil I.; Rhee, Young Min; Richard, Ryan M.; Rohrdanz, Mary A.; Steele, Ryan P.; Sundstrom, Eric J.; Woodcock, H. Lee, III; Zimmerman, Paul M.; Zuev, Dmitry; Albrecht, Ben; Alguire, Ethan; Austin, Brian; Beran, Gregory J. O.; Bernard, Yves A.; Berquist, Eric; Brandhorst, Kai; Bravaya, Ksenia B.; Brown, Shawn T.; Casanova, David; Chang, Chun-Min; Chen, Yunqing; Chien, Siu Hung; Closser, Kristina D.; Crittenden, Deborah L.; Diedenhofen, Michael; DiStasio, Robert A., Jr.; Do, Hainam; Dutoi, Anthony D.; Edgar, Richard G.; Fatehi, Shervin; Fusti-Molnar, Laszlo; Ghysels, An; Golubeva-Zadorozhnaya, Anna; Gomes, Joseph; Hanson-Heine, Magnus W. D.; Harbach, Philipp H. P.; Hauser, Andreas W.; Hohenstein, Edward G.; Holden, Zachary C.; Jagau, Thomas-C.; Ji, Hyunjun; Kaduk, Benjamin; Khistyaev, Kirill; Kim, Jaehoon; Kim, Jihan; King, Rollin A.; Klunzinger, Phil; Kosenkov, Dmytro; Kowalczyk, Tim; Krauter, Caroline M.; Lao, Ka Un; Laurent, Adèle D.; Lawler, Keith V.; Levchenko, Sergey V.; Lin, Ching Yeh; Liu, Fenglai; Livshits, Ester; Lochan, Rohini C.; Luenser, Arne; Manohar, Prashant; Manzer, Samuel F.; Mao, Shan-Ping; Mardirossian, Narbe; Marenich, Aleksandr V.; Maurer, Simon A.; Mayhall, Nicholas J.; Neuscamman, Eric; Oana, C. Melania; Olivares-Amaya, Roberto; O'Neill, Darragh P.; Parkhill, John A.; Perrine, Trilisa M.; Peverati, Roberto; Prociuk, Alexander; Rehn, Dirk R.; Rosta, Edina; Russ, Nicholas J.; Sharada, Shaama M.; Sharma, Sandeep; Small, David W.; Sodt, Alexander; Stein, Tamar; Stück, David; Su, Yu-Chuan; Thom, Alex J. W.; Tsuchimochi, Takashi; Vanovschi, Vitalii; Vogt, Leslie; Vydrov, Oleg; Wang, Tao; Watson, Mark A.; Wenzel, Jan; White, Alec; Williams, Christopher F.; Yang, Jun; Yeganeh, Sina; Yost, Shane R.; You, Zhi-Qiang; Zhang, Igor Ying; Zhang, Xing; Zhao, Yan; Brooks, Bernard R.; Chan, Garnet K. L.; Chipman, Daniel M.; Cramer, Christopher J.; Goddard, William A., III; Gordon, Mark S.; Hehre, Warren J.; Klamt, Andreas; Schaefer, Henry F., III; Schmidt, Michael W.; Sherrill, C. David; Truhlar, Donald G.; Warshel, Arieh; Xu, Xin; Aspuru-Guzik, Alán; Baer, Roi; Bell, Alexis T.; Besley, Nicholas A.; Chai, Jeng-Da; Dreuw, Andreas; Dunietz, Barry D.; Furlani, Thomas R.; Gwaltney, Steven R.; Hsu, Chao-Ping; Jung, Yousung; Kong, Jing; Lambrecht, Daniel S.; Liang, WanZhen; Ochsenfeld, Christian; Rassolov, Vitaly A.; Slipchenko, Lyudmila V.; Subotnik, Joseph E.; Van Voorhis, Troy; Herbert, John M.; Krylov, Anna I.; Gill, Peter M. W.; Head-Gordon, Martin

    2015-01-01

    A summary of the technical advances that are incorporated in the fourth major release of the Q-Chem quantum chemistry program is provided, covering approximately the last seven years. These include developments in density functional theory methods and algorithms, nuclear magnetic resonance (NMR) property evaluation, coupled cluster and perturbation theories, methods for electronically excited and open-shell species, tools for treating extended environments, algorithms for walking on potential surfaces, analysis tools, energy and electron transfer modelling, parallel computing capabilities, and graphical user interfaces. In addition, a selection of example case studies that illustrate these capabilities is given. These include extensive benchmarks of the comparative accuracy of modern density functionals for bonded and non-bonded interactions, tests of attenuated second order Møller-Plesset (MP2) methods for intermolecular interactions, a variety of parallel performance benchmarks, and tests of the accuracy of implicit solvation models. Some specific chemical examples include calculations on the strongly correlated Cr2 dimer, exploring zeolite-catalysed ethane dehydrogenation, energy decomposition analysis of a charged ter-molecular complex arising from glycerol photoionisation, and natural transition orbitals for a Frenkel exciton state in a nine-unit model of a self-assembling nanotube.

  1. Block-Parallel Data Analysis with DIY2

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Morozov, Dmitriy; Peterka, Tom

    DIY2 is a programming model and runtime for block-parallel analytics on distributed-memory machines. Its main abstraction is block-structured data parallelism: data are decomposed into blocks; blocks are assigned to processing elements (processes or threads); computation is described as iterations over these blocks, and communication between blocks is defined by reusable patterns. By expressing computation in this general form, the DIY2 runtime is free to optimize the movement of blocks between slow and fast memories (disk and flash vs. DRAM) and to concurrently execute blocks residing in memory with multiple threads. This enables the same program to execute in-core, out-of-core, serial, parallel, single-threaded, multithreaded, or combinations thereof. This paper describes the implementation of the main features of the DIY2 programming model and optimizations to improve performance. DIY2 is evaluated on benchmark test cases to establish baseline performance for several common patterns and on larger complete analysis codes running on large-scale HPC machines.
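
    The block-structured data-parallel idea described above reduces to: decompose data into blocks, assign blocks to processing elements, and express computation as an iteration over resident blocks. The struct and foreach helper below are invented for illustration and are not DIY2's actual API.

      #include <stdio.h>
      #include <stdlib.h>

      typedef struct { int id; int n; double *data; } Block;

      /* Iterate a callback over the blocks owned by this processing
         element; a runtime could run these threaded or out-of-core. */
      static void foreach_block(Block *blocks, int nblocks,
                                void (*fn)(Block *)) {
          for (int b = 0; b < nblocks; b++)
              fn(&blocks[b]);
      }

      static void compute(Block *blk) {
          double s = 0.0;
          for (int i = 0; i < blk->n; i++) s += blk->data[i];
          printf("block %d: sum = %f\n", blk->id, s);
      }

      int main(void) {
          enum { NB = 4, N = 16 };
          Block blocks[NB];
          for (int b = 0; b < NB; b++) {
              blocks[b].id = b; blocks[b].n = N;
              blocks[b].data = malloc(N * sizeof(double));
              for (int i = 0; i < N; i++) blocks[b].data[i] = b + 0.1 * i;
          }
          foreach_block(blocks, NB, compute);
          for (int b = 0; b < NB; b++) free(blocks[b].data);
          return 0;
      }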

  2. Processing large remote sensing image data sets on Beowulf clusters

    USGS Publications Warehouse

    Steinwand, Daniel R.; Maddox, Brian; Beckmann, Tim; Schmidt, Gail

    2003-01-01

    High-performance computing is often concerned with the speed at which floating-point calculations can be performed. The architectures of many parallel computers and/or their network topologies are based on these investigations. Often, benchmarks resulting from these investigations are compiled with little regard to how a large dataset would move about in these systems. This part of the Beowulf study addresses that concern by looking at specific applications software and system-level modifications. Applications include an implementation of a smoothing filter for time-series data, a parallel implementation of the decision tree algorithm used in the Landcover Characterization project, a parallel Kriging algorithm used to fit point data collected in the field on invasive species to a regular grid, and modifications to the Beowulf project's resampling algorithm to handle larger, higher resolution datasets at a national scale. Systems-level investigations include a feasibility study on Flat Neighborhood Networks and modifications of that concept with Parallel File Systems.
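
    As a minimal stand-in for the time-series smoothing application named above, the sketch below applies a 3-point moving average to one pixel's time series; the data values are invented, and the real code would run this per pixel across the cluster.

      #include <stdio.h>

      int main(void) {
          double t[] = { 10.0, 12.0, 55.0, 13.0, 11.0, 12.0 };  /* one spike */
          int n = sizeof t / sizeof t[0];
          /* 3-point moving average damps the spike at index 2. */
          for (int i = 1; i < n - 1; i++) {
              double sm = (t[i - 1] + t[i] + t[i + 1]) / 3.0;
              printf("t[%d]: %.1f -> %.2f\n", i, t[i], sm);
          }
          return 0;
      }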

  3. A Systems Approach to Scalable Transportation Network Modeling

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Perumalla, Kalyan S

    2006-01-01

    Emerging needs in transportation network modeling and simulation are raising new challenges with respect to scalability of network size and vehicular traffic intensity, speed of simulation for simulation-based optimization, and fidelity of vehicular behavior for accurate capture of event phenomena. Parallel execution is warranted to sustain the required detail, size and speed. However, few parallel simulators exist for such applications, partly due to the challenges underlying their development. Moreover, many simulators are based on time-stepped models, which can be computationally inefficient for the purposes of modeling evacuation traffic. Here an approach is presented to designing a simulator with memory and speed efficiency as the goals from the outset, and, specifically, scalability via parallel execution. The design makes use of discrete event modeling techniques as well as parallel simulation methods. Our simulator, called SCATTER, is being developed, incorporating such design considerations. Preliminary performance results are presented on benchmark road networks, showing scalability to one million vehicles simulated on one processor.

  4. Sequential Feedback Scheme Outperforms the Parallel Scheme for Hamiltonian Parameter Estimation.

    PubMed

    Yuan, Haidong

    2016-10-14

    Measurement and estimation of parameters are essential for science and engineering, where the main quest is to find the highest achievable precision with the given resources and to design schemes to attain it. Two schemes, the sequential feedback scheme and the parallel scheme, are usually studied in quantum parameter estimation. While the sequential feedback scheme represents the most general scheme, it remains unknown whether it can outperform the parallel scheme for any quantum estimation tasks. In this Letter, we show that the sequential feedback scheme has a threefold improvement over the parallel scheme for Hamiltonian parameter estimation on two-dimensional systems, and an O(d+1) improvement for Hamiltonian parameter estimation on d-dimensional systems. We also show that, contrary to the conventional belief, it is possible to simultaneously achieve the highest precision for estimating all three components of a magnetic field, which sets a benchmark on the local precision limit for the estimation of a magnetic field.

  5. Manipulation and control of the interfacial polarization in organic light-emitting diodes by dipolar doping

    NASA Astrophysics Data System (ADS)

    Jäger, Lars; Schmidt, Tobias D.; Brütting, Wolfgang

    2016-09-01

    Most of the commonly used electron transporting materials in organic light-emitting diodes exhibit interfacial polarization resulting from partially aligned permanent dipole moments of the molecules. This property modifies the internal electric field distribution of the device and therefore enables an earlier flat-band condition for the hole transporting side, leading to improved charge carrier injection. Recently, this phenomenon was studied with regard to different materials and degradation effects; however, the influence of dilution has so far not been investigated. In this paper we focus on dipolar doping of the hole transporting material 4,4'-bis[N-(1-naphthyl)-N-phenylamino]-biphenyl (NPB) with the polar electron transporting material tris-(8-hydroxyquinolate) aluminum (Alq3). Impedance spectroscopy reveals that changes of the hole injection voltage do not scale in a simple linear fashion with the effective thickness of the doped layer. In fact, the measured interfacial polarization reaches a maximum value for a 1:1 blend. Taking the permanent dipole moment of Alq3 into account, an increasing degree of dipole alignment is found for decreasing Alq3 concentration. This observation can be explained by the competition between dipole-dipole interactions leading to dimerization and the driving force for vertical orientation of Alq3 dipoles at the surface of the NPB layer.

  6. Experimental Study on Thermal Conductivity and Hardness of Cu and Ni Nanoparticle Packed Bed for Thermoelectric Application

    NASA Astrophysics Data System (ADS)

    Lin, Zi-Zhen; Huang, Cong-Liang; Zhen, Wen-Kai; Feng, Yan-Hui; Zhang, Xin-Xin; Wang, Ge

    2017-03-01

    The hot-wire method is applied in this paper to probe the thermal conductivity (TC) of Cu and Ni nanoparticle packed beds (NPBs). The TC is found to decrease with porosity in a manner different from the currently known tendency. The relationship between the porosity and the nanostructure is investigated to explain this unusual phenomenon. It is found that the porosity dominates the TC of the NPB at large porosities, while the TC depends on the contact area between nanoparticles at small porosities. Meanwhile, the Vickers hardness (HV) of NPBs is also measured. It turns out that the enlarged contact area between nanoparticles is responsible for the rapid increase of HV at large porosity, and the saturated nanoparticle deformation is responsible for the small increase of HV at low porosity. With both TC and HV considered, an NPB with a porosity of 0.25 is preferable as a thermoelectric material because of its low TC and higher hardness. Although Cu and Ni are not good thermoelectric materials, this study is intended to provide an effective way to optimize the thermoelectric figure of merit (ZT) and HV of nanoporous materials prepared by the cold-pressing method.

  7. Genetic biomarkers for neoplastic colorectal cancer in peripheral lymphocytes.

    PubMed

    Ionescu, Mirela; Ciocirlan, Mihai; Ionescu, Cristina; Becheanu, Gabriel; Gologan, Serban; Teiusanu, Adriana; Arbanas, Tudor; Mircea, Diculescu

    2011-04-01

    Loss of genomic stability appears to be a key step in colorectal carcinogenesis. A micronucleus (MN) designates a chromosome fragment or an entire chromosome which lags behind at mitosis. An MN may be noticed as an additional nucleus within the cell cytoplasm during the intermediate mitosis phases. We tested the hypothesis that MN and its related anomalies may be associated with the presence of neoplastic colorectal lesions. Peripheral blood lymphocytes were cultured and microscopically examined. The frequency of micronuclei (FMN) and the presence of nucleoplasmic bridges (NPB) in binucleated cells were compared in patients with or without colorectal neoplastic lesions. We included 45 patients undergoing colonoscopy, 23 males and 22 females, with a median age of 59; 17 patients had polyps, 11 had colorectal cancer (CRC), and 17 had a normal colonoscopy. The FMN was significantly higher in women than in men (8.14 vs 4.17, p=0.008); NPB were significantly less frequent in patients with advanced adenomas (>10 mm or villous) or CRC (p=0.044) when compared with patients with normal colonoscopy, hyperplastic polyps or non-advanced adenomas. Micronuclei are more frequent in women, but their frequency was not significantly different in patients with advanced adenomas or CRC. Null or low values for the presence of nucleoplasmic bridges in peripheral lymphocytes may be predictive of advanced adenomas and colorectal cancer.

  8. Parallelized multi–graphics processing unit framework for high-speed Gabor-domain optical coherence microscopy

    PubMed Central

    Tankam, Patrice; Santhanam, Anand P.; Lee, Kye-Sung; Won, Jungeun; Canavesi, Cristina; Rolland, Jannick P.

    2014-01-01

    Gabor-domain optical coherence microscopy (GD-OCM) is a volumetric high-resolution technique capable of acquiring three-dimensional (3-D) skin images with histological resolution. Real-time image processing is needed to enable GD-OCM imaging in a clinical setting. We present a parallelized and scalable multi-graphics processing unit (GPU) computing framework for real-time GD-OCM image processing. A parallelized control mechanism was developed to individually assign computation tasks to each of the GPUs. For each GPU, the optimal number of amplitude-scans (A-scans) to be processed in parallel was selected to maximize GPU memory usage and core throughput. We investigated five computing architectures for computational speed-up in processing 1000×1000 A-scans. The proposed parallelized multi-GPU computing framework enables processing at a computational speed faster than the GD-OCM image acquisition, thereby facilitating high-speed GD-OCM imaging in a clinical setting. Using two parallelized GPUs, the image processing of a 1×1×0.6 mm3 skin sample was performed in about 13 s, and the performance was benchmarked at 6.5 s with four GPUs. This work thus demonstrates that 3-D GD-OCM data may be displayed in real-time to the examiner using parallelized GPU processing.

  9. Parallelized multi-graphics processing unit framework for high-speed Gabor-domain optical coherence microscopy.

    PubMed

    Tankam, Patrice; Santhanam, Anand P; Lee, Kye-Sung; Won, Jungeun; Canavesi, Cristina; Rolland, Jannick P

    2014-07-01

    Gabor-domain optical coherence microscopy (GD-OCM) is a volumetric high-resolution technique capable of acquiring three-dimensional (3-D) skin images with histological resolution. Real-time image processing is needed to enable GD-OCM imaging in a clinical setting. We present a parallelized and scalable multi-graphics processing unit (GPU) computing framework for real-time GD-OCM image processing. A parallelized control mechanism was developed to individually assign computation tasks to each of the GPUs. For each GPU, the optimal number of amplitude-scans (A-scans) to be processed in parallel was selected to maximize GPU memory usage and core throughput. We investigated five computing architectures for computational speed-up in processing 1000×1000 A-scans. The proposed parallelized multi-GPU computing framework enables processing at a computational speed faster than the GD-OCM image acquisition, thereby facilitating high-speed GD-OCM imaging in a clinical setting. Using two parallelized GPUs, the image processing of a 1×1×0.6  mm3 skin sample was performed in about 13 s, and the performance was benchmarked at 6.5 s with four GPUs. This work thus demonstrates that 3-D GD-OCM data may be displayed in real-time to the examiner using parallelized GPU processing.

  10. Benchmarking Teacher Education: A Comparative Assessment of the Top Ten Teacher-Producing Universities' Contributions to the Teacher Workforce

    ERIC Educational Resources Information Center

    Lin, Zeng; Gardner, Dianne

    2006-01-01

    The purpose of this study is to demonstrate the usefulness of the Schools and Staffing Survey (SASS) for the comparative analysis of alumni teachers. This article shows how SASS can be used as an evaluative tool by any institution that wants to appraise its alumni in comparison to those of its parallel institutions for the purposes of…

  11. Mean Length of Utterance in Children with Specific Language Impairment and in Younger Control Children Shows Concurrent Validity and Stable and Parallel Growth Trajectories

    ERIC Educational Resources Information Center

    Rice, Mabel L.; Redmond, Sean M.; Hoffman, Lesa

    2006-01-01

    Purpose: Although mean length of utterance (MLU) is a useful benchmark in studies of children with specific language impairment (SLI), some empirical and interpretive issues are unresolved. The authors report on 2 studies examining, respectively, the concurrent validity and temporal stability of MLU equivalency between children with SLI and…

  12. ViSAPy: a Python tool for biophysics-based generation of virtual spiking activity for evaluation of spike-sorting algorithms.

    PubMed

    Hagen, Espen; Ness, Torbjørn V; Khosrowshahi, Amir; Sørensen, Christina; Fyhn, Marianne; Hafting, Torkel; Franke, Felix; Einevoll, Gaute T

    2015-04-30

    New, silicon-based multielectrodes comprising hundreds or more electrode contacts offer the possibility to record spike trains from thousands of neurons simultaneously. This potential cannot be realized unless accurate, reliable automated methods for spike sorting are developed, in turn requiring benchmarking data sets with known ground-truth spike times. We here present a general simulation tool for computing benchmarking data for evaluation of spike-sorting algorithms entitled ViSAPy (Virtual Spiking Activity in Python). The tool is based on a well-established biophysical forward-modeling scheme and is implemented as a Python package built on top of the neuronal simulator NEURON and the Python tool LFPy. ViSAPy allows for arbitrary combinations of multicompartmental neuron models and geometries of recording multielectrodes. Three example benchmarking data sets are generated, i.e., tetrode and polytrode data mimicking in vivo cortical recordings and microelectrode array (MEA) recordings of in vitro activity in salamander retinas. The synthesized example benchmarking data mimics salient features of typical experimental recordings, for example, spike waveforms depending on interspike interval. ViSAPy goes beyond existing methods as it includes biologically realistic model noise, synaptic activation by recurrent spiking networks, finite-sized electrode contacts, and allows for inhomogeneous electrical conductivities. ViSAPy is optimized to allow for generation of long time series of benchmarking data, spanning minutes of biological time, by parallel execution on multi-core computers. ViSAPy is an open-ended tool, as it can be generalized to produce benchmarking data for arbitrary recording-electrode geometries and with various levels of complexity.

  13. ELAPSE - NASA AMES LISP AND ADA BENCHMARK SUITE: EFFICIENCY OF LISP AND ADA PROCESSING - A SYSTEM EVALUATION

    NASA Technical Reports Server (NTRS)

    Davis, G. J.

    1994-01-01

    One area of research of the Information Sciences Division at NASA Ames Research Center is devoted to the analysis and enhancement of processors and advanced computer architectures, specifically in support of automation and robotic systems. To compare systems' abilities to efficiently process Lisp and Ada, scientists at Ames Research Center have developed a suite of non-parallel benchmarks called ELAPSE. The benchmark suite was designed to test a single computer's efficiency as well as to support comparisons between alternate machines on the Lisp and/or Ada languages. ELAPSE tests the efficiency with which a machine can execute the various routines in each environment. The sample routines are based on numeric and symbolic manipulations and include two-dimensional fast Fourier transformations, Cholesky decomposition and substitution, Gaussian elimination, high-level data processing, and symbol-list references. Also included is a routine based on a Bayesian classification program sorting data into optimized groups. The ELAPSE benchmarks are available for any computer with a validated Ada compiler and/or Common Lisp system. Of the 18 routines that comprise ELAPSE, 14 were developed or translated at Ames and are provided within this package; the others are readily available in the literature. The benchmark that requires the most memory is CHOLESKY.ADA. Under VAX/VMS, CHOLESKY.ADA requires 760K of main memory. ELAPSE is available on either two 5.25 inch 360K MS-DOS format diskettes (standard distribution) or a 9-track 1600 BPI ASCII CARD IMAGE format magnetic tape. The contents of the diskettes are compressed using the PKWARE archiving tools. The utility to unarchive the files, PKUNZIP.EXE, is included. The ELAPSE benchmarks were written in 1990. VAX and VMS are trademarks of Digital Equipment Corporation. MS-DOS is a registered trademark of Microsoft Corporation.

  14. PFLOTRAN Verification: Development of a Testing Suite to Ensure Software Quality

    NASA Astrophysics Data System (ADS)

    Hammond, G. E.; Frederick, J. M.

    2016-12-01

    In scientific computing, code verification ensures the reliability and numerical accuracy of a model simulation by comparing the simulation results to experimental data or known analytical solutions. The model is typically defined by a set of partial differential equations with initial and boundary conditions, and verification checks whether the software solves the mathematical model correctly. Code verification is especially important if the software is used to model high-consequence systems which cannot be physically tested in a fully representative environment [Oberkampf and Trucano (2007)]. Justified confidence in a particular computational tool requires clarity in the exercised physics and transparency in its verification process, with proper documentation. We present a quality assurance (QA) testing suite developed by Sandia National Laboratories that performs code verification for PFLOTRAN, an open source, massively parallel subsurface simulator. PFLOTRAN solves systems of generally nonlinear partial differential equations describing multiphase, multicomponent, and multiscale reactive flow and transport processes in porous media. PFLOTRAN's QA test suite compares the numerical solutions of benchmark problems in heat and mass transport against known, closed-form analytical solutions, including documentation of the exercised physical process models implemented in each PFLOTRAN benchmark simulation. The QA test suite development strives to follow the recommendations of Oberkampf and Trucano (2007), which identify four essential elements of high-quality verification benchmark construction: (1) conceptual description, (2) mathematical description, (3) accuracy assessment, and (4) additional documentation and user information. Several QA tests within the suite will be presented, including details of the benchmark problems and their closed-form analytical solutions, implementation of benchmark problems in PFLOTRAN simulations, and the criteria used to assess PFLOTRAN's performance in the code verification procedure. References Oberkampf, W. L., and T. G. Trucano (2007), Verification and Validation Benchmarks, SAND2007-0853, 67 pgs., Sandia National Laboratories, Albuquerque, NM.
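
    The essence of such a verification test fits in a few lines. The sketch below is illustrative rather than one of PFLOTRAN's actual benchmarks: it solves the 1D diffusion (heat conduction) equation with an explicit finite-difference scheme, compares against the closed-form decaying-sine solution, and applies a quantitative acceptance criterion; the tolerance is an assumed value.

        import numpy as np

        D, nx, t_end = 1.0, 101, 0.1
        x = np.linspace(0.0, 1.0, nx)
        dx = x[1] - x[0]
        dt = 0.25 * dx**2 / D                      # well inside the explicit stability limit
        u = np.sin(np.pi * x)                      # initial condition; u = 0 at both ends

        t = 0.0
        while t < t_end:
            u[1:-1] += D * dt / dx**2 * (u[2:] - 2.0 * u[1:-1] + u[:-2])
            t += dt

        exact = np.exp(-D * np.pi**2 * t) * np.sin(np.pi * x)
        err = np.sqrt(np.mean((u - exact) ** 2))   # discrete L2 error norm
        print(f"L2 error = {err:.2e}")
        assert err < 1e-3, "verification test failed"

    A real QA suite wraps many such problems, each with its own documented conceptual description, mathematical description, and acceptance criterion.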

  15. Benchmarking hardware architecture candidates for the NFIRAOS real-time controller

    NASA Astrophysics Data System (ADS)

    Smith, Malcolm; Kerley, Dan; Herriot, Glen; Véran, Jean-Pierre

    2014-07-01

    As a part of the trade study for the Narrow Field Infrared Adaptive Optics System (NFIRAOS), the adaptive optics system for the Thirty Meter Telescope, we investigated the feasibility of performing the real-time control computation using a Linux operating system and Intel Xeon E5 CPUs. We also investigated a Xeon Phi-based architecture, which allows higher levels of parallelism. This paper summarizes both the CPU-based real-time controller (RTC) architecture and the Xeon Phi-based RTC. The Intel Xeon E5 CPU solution meets the requirements and performs the computation for one AO cycle in an average of 767 microseconds. The Xeon Phi solution did not meet the 1200-microsecond time requirement and also suffered from unpredictable execution times. More detailed benchmark results are reported for both architectures.

  16. An Application-Based Performance Characterization of the Columbia Supercluster

    NASA Technical Reports Server (NTRS)

    Biswas, Rupak; Djomehri, Jahed M.; Hood, Robert; Jin, Haoqiang; Kiris, Cetin; Saini, Subhash

    2005-01-01

    Columbia is a 10,240-processor supercluster consisting of 20 Altix nodes with 512 processors each, and is currently ranked as the second-fastest computer in the world. In this paper, we present the performance characteristics of Columbia obtained on up to four computing nodes interconnected via the InfiniBand and/or NUMAlink4 communication fabrics. We evaluate floating-point performance, memory bandwidth, message passing communication speeds, and compilers using a subset of the HPC Challenge benchmarks and some of the NAS Parallel Benchmarks, including the multi-zone versions. We present detailed performance results for three scientific applications of interest to NASA: one from molecular dynamics and two from computational fluid dynamics. Our results show that both NUMAlink4 and InfiniBand hold promise for scaling applications to a large number of processors.

  17. Characterizing Task-Based OpenMP Programs

    PubMed Central

    Muddukrishna, Ananya; Jonsson, Peter A.; Brorsson, Mats

    2015-01-01

    Programmers struggle to understand the performance of task-based OpenMP programs since profiling tools only report thread-based performance. Performance tuning also requires task-based performance information in order to balance per-task memory hierarchy utilization against exposed task parallelism. We provide a cost-effective method to extract detailed task-based performance information from OpenMP programs. We demonstrate the utility of our method by quickly diagnosing performance problems and characterizing the exposed task parallelism and per-task instruction profiles of benchmarks in the widely used Barcelona OpenMP Tasks Suite. By using our method to characterize task-based performance, programmers can tune performance faster and understand performance tradeoffs more effectively than with existing tools. PMID:25860023

  18. Synergia: an accelerator modeling tool with 3-D space charge

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Amundson, James F.; Spentzouris, P. (Fermilab)

    2004-07-01

    High precision modeling of space-charge effects, together with accurate treatment of single-particle dynamics, is essential for designing future accelerators as well as optimizing the performance of existing machines. We describe Synergia, a high-fidelity parallel beam dynamics simulation package with fully three dimensional space-charge capabilities and a higher order optics implementation. We describe the computational techniques, the advanced human interface, and the parallel performance obtained using large numbers of macroparticles. We also perform code benchmarks comparing to semi-analytic results and other codes. Finally, we present initial results on particle tune spread, beam halo creation, and emittance growth in the Fermilab booster accelerator.

  19. The development of a revised version of multi-center molecular Ornstein-Zernike equation

    NASA Astrophysics Data System (ADS)

    Kido, Kentaro; Yokogawa, Daisuke; Sato, Hirofumi

    2012-04-01

    Ornstein-Zernike (OZ)-type theory is a powerful tool to obtain the 3-dimensional solvent distribution around a solute molecule. Recently, we proposed the multi-center molecular OZ method, which is suitable for parallel computing of 3D solvation structure. The distribution function in this method consists of two components, namely a reference part and a residue part. Several types of function were examined for the reference part to investigate the numerical robustness of the method. As benchmarks, the method is applied to water, benzene in aqueous solution, and a single-walled carbon nanotube in chloroform solution. The results indicate that full parallelization is achieved by utilizing the newly proposed reference functions.

  20. Analytical theory of coherent synchrotron radiation wakefield of short bunches shielded by conducting parallel plates

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Stupakov, Gennady; Zhou, Demin

    2016-04-21

    We develop a general model of coherent synchrotron radiation (CSR) impedance with shielding provided by two parallel conducting plates. This model allows us to easily reproduce all previously known analytical CSR wakes and to expand the analysis to situations not explored before. It reduces calculations of the impedance to taking integrals along the trajectory of the beam. New analytical results are derived for the radiation impedance with shielding for the following orbits: a kink, a bending magnet, a wiggler of finite length, and an infinitely long wiggler. Furthermore, all our formulas are benchmarked against numerical simulations with the CSRZ computer code.

  1. A GaAs vector processor based on parallel RISC microprocessors

    NASA Astrophysics Data System (ADS)

    Misko, Tim A.; Rasset, Terry L.

    A vector processor architecture based on a 32-bit microprocessor implemented in gallium arsenide (GaAs) technology has been developed. The McDonnell Douglas vector processor (MVP) will be fabricated completely from GaAs digital integrated circuits. The MVP architecture includes a vector memory of 1 megabyte, a parallel bus architecture with eight processing elements connected in parallel, and a control processor. The processing elements consist of a reduced instruction set CPU (RISC) with four floating-point coprocessor units and the necessary memory interface functions. This architecture has been simulated for several benchmark programs including a complex fast Fourier transform (FFT), a complex inner product, trigonometric functions, and a sort-merge routine. The results of this study indicate that the MVP can process a 1024-point complex FFT in 112 microsec (389 megaflops) while consuming approximately 618 W of power in a volume of approximately 0.1 cubic feet.
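
    For context, a figure of merit like "1024-point complex FFT in 112 microsec (389 megaflops)" can be reproduced on any modern machine with a short timing harness. The sketch below assumes the conventional 5N log2 N operation count for a radix-2 complex FFT; the MVP figure may rest on a different count, so such numbers are only comparable in order of magnitude.

        import time
        import numpy as np

        N = 1024
        x = np.random.standard_normal(N) + 1j * np.random.standard_normal(N)

        reps = 10000
        t0 = time.perf_counter()
        for _ in range(reps):
            np.fft.fft(x)
        dt = (time.perf_counter() - t0) / reps

        flops = 5 * N * np.log2(N)      # conventional radix-2 operation count
        print(f"{dt * 1e6:.1f} us per 1024-point complex FFT, "
              f"{flops / dt / 1e6:.0f} Mflop/s (nominal)")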

  2. Automation of Data Traffic Control on DSM Architecture

    NASA Technical Reports Server (NTRS)

    Frumkin, Michael; Jin, Hao-Qiang; Yan, Jerry

    2001-01-01

    The design of distributed shared memory (DSM) computers liberates users from the duty of distributing data across processors and allows for the incremental development of parallel programs using, for example, OpenMP or Java threads. The DSM architecture greatly simplifies the development of parallel programs that have good performance on a few processors. However, achieving good program scalability on DSM computers requires that the user understand the data flow in the application and use various techniques to avoid data traffic congestion. In this paper we discuss a number of such techniques, including data blocking, data placement, data transposition, and page size control, and evaluate their efficiency on the NAS (NASA Advanced Supercomputing) Parallel Benchmarks. We also present a tool which automates the detection of constructs causing data congestion in Fortran array-oriented codes and advises the user on code transformations for improving data traffic in the application.
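
    Of the techniques listed, data blocking is the easiest to demonstrate in isolation. The sketch below is illustrative and is not the paper's tool: it copies the transpose of a large matrix both naively and in cache-sized tiles. On most memory hierarchies the tiled version generates far fewer cache misses, though the actual gain depends on cache sizes and the chosen block size.

        import time
        import numpy as np

        n, b = 2048, 128                 # matrix size and cache-sized tile edge
        a = np.random.rand(n, n)
        out = np.empty_like(a)

        t0 = time.perf_counter()
        out[:, :] = a.T                  # naive copy: large-stride reads
        t_naive = time.perf_counter() - t0

        t0 = time.perf_counter()
        for i in range(0, n, b):
            for j in range(0, n, b):     # blocked copy: each tile fits in cache
                out[j:j + b, i:i + b] = a[i:i + b, j:j + b].T
        t_blocked = time.perf_counter() - t0

        print(f"naive {t_naive * 1e3:.1f} ms, blocked {t_blocked * 1e3:.1f} ms")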

  3. Computational Performance of a Parallelized Three-Dimensional High-Order Spectral Element Toolbox

    NASA Astrophysics Data System (ADS)

    Bosshard, Christoph; Bouffanais, Roland; Clémençon, Christian; Deville, Michel O.; Fiétier, Nicolas; Gruber, Ralf; Kehtari, Sohrab; Keller, Vincent; Latt, Jonas

    In this paper, a comprehensive performance review of an MPI-based high-order three-dimensional spectral element method C++ toolbox is presented. The focus is put on the performance evaluation of several aspects, with a particular emphasis on the parallel efficiency. The performance evaluation is analyzed with the help of a time prediction model based on a parameterization of the application and the hardware resources. A tailor-made CFD computation benchmark case is introduced and used to carry out this review, stressing in particular clusters with up to 8192 cores. Some problems in the parallel implementation have been detected and corrected. The theoretical complexities with respect to the number of elements, to the polynomial degree, and to communication needs are correctly reproduced. It is concluded that this type of code has a nearly perfect speedup on machines with thousands of cores, and is ready to make the step to next-generation petaflop machines.

  4. Experimental Mapping and Benchmarking of Magnetic Field Codes on the LHD Ion Accelerator

    NASA Astrophysics Data System (ADS)

    Chitarin, G.; Agostinetti, P.; Gallo, A.; Marconato, N.; Nakano, H.; Serianni, G.; Takeiri, Y.; Tsumori, K.

    2011-09-01

    For the validation of the numerical models used for the design of the Neutral Beam Test Facility for ITER in Padua [1], an experimental benchmark against a full-size device has been sought. The LHD BL2 injector [2] has been chosen as a first benchmark, because the BL2 Negative Ion Source and Beam Accelerator are geometrically similar to SPIDER, even though BL2 does not include current bars and ferromagnetic materials. A comprehensive 3D magnetic field model of the LHD BL2 device has been developed based on the same assumptions used for SPIDER. In parallel, a detailed experimental magnetic map of the BL2 device has been obtained using a suitably designed 3D adjustable structure for the fine positioning of the magnetic sensors inside 27 of the 770 beamlet apertures. The calculated values have been compared to the experimental data. The work has confirmed the quality of the numerical model, and has also provided useful information on the magnetic non-uniformities due to the edge effects and to the tolerance on permanent magnet remanence.

  5. Experimental Mapping and Benchmarking of Magnetic Field Codes on the LHD Ion Accelerator

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Chitarin, G. (University of Padova, Dept. of Management and Engineering, strad. S. Nicola, 36100 Vicenza); Agostinetti, P.

    2011-09-26

    For the validation of the numerical models used for the design of the Neutral Beam Test Facility for ITER in Padua [1], an experimental benchmark against a full-size device has been sought. The LHD BL2 injector [2] has been chosen as a first benchmark, because the BL2 Negative Ion Source and Beam Accelerator are geometrically similar to SPIDER, even though BL2 does not include current bars and ferromagnetic materials. A comprehensive 3D magnetic field model of the LHD BL2 device has been developed based on the same assumptions used for SPIDER. In parallel, a detailed experimental magnetic map of the BL2 device has been obtained using a suitably designed 3D adjustable structure for the fine positioning of the magnetic sensors inside 27 of the 770 beamlet apertures. The calculated values have been compared to the experimental data. The work has confirmed the quality of the numerical model, and has also provided useful information on the magnetic non-uniformities due to the edge effects and to the tolerance on permanent magnet remanence.

  6. Development of Azeotropic Blends to Replace TCE and nPB in Vapor Degreasing Operations

    DTIC Science & Technology

    2016-12-21

    The record excerpt is fragmentary. Its recoverable technical content notes that if the vapor zone is fully contained and oxygen-free, inexpensive and effective flammable solvents may be used, and that the project is working toward this type of process change; the remaining fragments are safety-data-sheet text on incompatibilities (chlorine, chromic acid, etc.), storage in a well-ventilated place, and exposure effects (nausea, dizziness, headache; exposure to and/or consumption of alcohol may increase toxic effects).

  7. The International Conference on Vector and Parallel Computing (2nd)

    DTIC Science & Technology

    1989-01-17

    The record excerpt is fragmentary, drawn from the proceedings' table of contents and session texts. Recoverable items include "Computation of the SVD of Bidiagonal Matrices" and "Lattice QCD - As a Large Scale Scientific Computation", the latter vectorized for the IBM 3090 Vector Facility with elapsed times further reduced on the 3090; Lattice QCD was benchmarked on a large number of computers, including the Cray X-MP and Cray 2, with much of the cost coming from the wavefront solver routine.

  8. High-Order Methods for Computational Physics

    DTIC Science & Technology

    1999-03-01

    The record excerpt is fragmentary. Its recoverable content (attributed in the excerpt to Ronald D. Henderson) concerns a parallel computation that, instead of explicit connectivity tables, uses a voxel database (VDB) of geometric positions in the mesh: connectivity and communications are established by building a VDB that maps each position to a processor. A further fragment notes that highly accurate stability computations help expand the database for a two-dimensional linear benchmark problem.

  9. Hybrid parallel code acceleration methods in full-core reactor physics calculations

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Courau, T.; Plagne, L.; Ponicot, A.

    2012-07-01

    When dealing with nuclear reactor calculation schemes, the need for three dimensional (3D) transport-based reference solutions is essential for both validation and optimization purposes. Considering a benchmark problem, this work investigates the potential of discrete ordinates (Sn) transport methods applied to 3D pressurized water reactor (PWR) full-core calculations. First, the benchmark problem is described. It involves a pin-by-pin description of a 3D PWR first core, and uses an 8-group cross-section library prepared with the DRAGON cell code. Then, a convergence analysis is performed using the PENTRAN parallel Sn Cartesian code. It discusses the spatial refinement and the associated angular quadrature required to properly describe the problem physics. It also shows that initializing the Sn solution with the EDF SPN solver COCAGNE reduces the number of iterations required to converge by nearly a factor of 6. Using a best estimate model, PENTRAN results are then compared to multigroup Monte Carlo results obtained with the MCNP5 code. Good consistency is observed between the two methods (Sn and Monte Carlo), with discrepancies of less than 25 pcm for the k-eff, and less than 2.1% and 1.6% for the flux at the pin-cell level and for the pin-power distribution, respectively.
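
    For readers unfamiliar with the unit, a code-to-code k-eff discrepancy quoted in pcm (per cent mille, 1e-5) is commonly expressed as a reactivity difference. A minimal sketch, assuming the usual convention rho = (k - 1)/k (the abstract does not spell out its exact definition):

        def delta_rho_pcm(k_ref, k_test):
            """Reactivity difference in pcm (1 pcm = 1e-5) between two k-eff
            values, using the convention rho = (k - 1) / k."""
            return (1.0 / k_ref - 1.0 / k_test) * 1e5

        # Toy values chosen only to show the scale of a ~25 pcm agreement.
        print(f"{delta_rho_pcm(1.00000, 1.00025):+.1f} pcm")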

  10. Scalable Metropolis Monte Carlo for simulation of hard shapes

    NASA Astrophysics Data System (ADS)

    Anderson, Joshua A.; Eric Irrgang, M.; Glotzer, Sharon C.

    2016-07-01

    We design and implement a scalable hard particle Monte Carlo simulation toolkit (HPMC), and release it open source as part of HOOMD-blue. HPMC runs in parallel on many CPUs and many GPUs using domain decomposition. We employ BVH trees instead of cell lists on the CPU for fast performance, especially with large particle size disparity, and optimize inner loops with SIMD vector intrinsics on the CPU. Our GPU kernel proposes many trial moves in parallel on a checkerboard and uses a block-level queue to redistribute work among threads and avoid divergence. HPMC supports a wide variety of shape classes, including spheres/disks, unions of spheres, convex polygons, convex spheropolygons, concave polygons, ellipsoids/ellipses, convex polyhedra, convex spheropolyhedra, spheres cut by planes, and concave polyhedra. NVT and NPT ensembles can be run in 2D or 3D triclinic boxes. Additional integration schemes permit Frenkel-Ladd free energy computations and implicit depletant simulations. In a benchmark system of a fluid of 4096 pentagons, HPMC performs 10 million sweeps in 10 min on 96 CPU cores on XSEDE Comet. The same simulation would take 7.6 h in serial. HPMC also scales to large system sizes, and the same benchmark with 16.8 million particles runs in 1.4 h on 2048 GPUs on OLCF Titan.
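
    At its core, hard-particle Metropolis Monte Carlo accepts a trial move if and only if it creates no overlap. The sketch below is a deliberately naive serial version for hard disks in a periodic box, with an O(N^2) overlap check in place of HPMC's BVH trees or cell lists and none of its checkerboard GPU decomposition; all parameters are illustrative.

        import numpy as np

        rng = np.random.default_rng(1)
        N, L, d, delta = 64, 10.0, 1.0, 0.2    # disks, box edge, diameter, max move

        def overlaps(pos, i, trial):
            dr = pos - trial
            dr -= L * np.round(dr / L)         # minimum-image convention
            r2 = np.einsum('ij,ij->i', dr, dr)
            r2[i] = np.inf                     # ignore self
            return np.any(r2 < d * d)

        # Start from a dilute square lattice so there are no initial overlaps.
        side = int(np.ceil(np.sqrt(N)))
        pos = np.array([[i % side, i // side] for i in range(N)], float) * (L / side)

        accepted = 0
        for sweep in range(100):
            for i in rng.permutation(N):
                trial = (pos[i] + rng.uniform(-delta, delta, 2)) % L
                if not overlaps(pos, i, trial):    # hard core: accept iff no overlap
                    pos[i] = trial
                    accepted += 1
        print("acceptance:", accepted / (100 * N))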

  11. Multiprocessing the Sieve of Eratosthenes

    NASA Technical Reports Server (NTRS)

    Bokhari, S.

    1986-01-01

    In recent years, the Sieve of Eratosthenes for finding prime numbers has seen much use as a benchmark algorithm for serial computers, while its intrinsically parallel nature has gone largely unnoticed. The implementation of a parallel version of this algorithm for a real parallel computer, the Flex/32, is described and its performance discussed. It is shown that the algorithm is sensitive to several fundamental performance parameters of parallel machines, such as spawning time, signaling time, memory access, and the overhead of process switching. Because of the nature of the algorithm, it is impossible to get any speedup beyond 4 or 5 processors unless some form of dynamic load balancing is employed. We describe the performance of our algorithm with and without load balancing and compare it with theoretical lower bounds and simulated results. It is straightforward to understand this algorithm and to check the final results. However, its efficient implementation on a real parallel machine requires thoughtful design, especially if dynamic load balancing is desired. The fundamental operations required by the algorithm are very simple: this means that the slightest overhead appears prominently in performance data. The Sieve thus serves not only as a very severe test of the capabilities of a parallel processor but is also an interesting challenge for the programmer.
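
    The parallel decomposition at issue is easy to state: precompute the small "base" primes up to sqrt(n) serially, then sieve disjoint segments of [0, n) concurrently. The sketch below is a present-day process-parallel illustration, not the Flex/32 implementation; its static block decomposition exhibits exactly the kind of load imbalance the abstract says dynamic balancing must cure.

        from multiprocessing import Pool
        import math

        def sieve_segment(args):
            lo, hi, base = args                 # count primes in [lo, hi)
            size = hi - lo
            flags = bytearray([1]) * size
            for p in base:
                start = max(p * p, (lo + p - 1) // p * p)
                if start >= hi:
                    continue
                flags[start - lo:size:p] = bytearray(len(range(start - lo, size, p)))
            if lo == 0:
                flags[0:2] = b'\x00\x00'        # 0 and 1 are not prime
            return sum(flags)

        def count_primes(n, workers=4):
            root = math.isqrt(n) + 1            # base primes found by trial division
            base = [p for p in range(2, root)
                    if all(p % q for q in range(2, math.isqrt(p) + 1))]
            step = (n + workers - 1) // workers
            chunks = [(lo, min(lo + step, n), base) for lo in range(0, n, step)]
            with Pool(workers) as pool:
                return sum(pool.map(sieve_segment, chunks))

        if __name__ == "__main__":
            print(count_primes(10**6))          # 78498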

  12. Accelerating the Gillespie Exact Stochastic Simulation Algorithm using hybrid parallel execution on graphics processing units.

    PubMed

    Komarov, Ivan; D'Souza, Roshan M

    2012-01-01

    The Gillespie Stochastic Simulation Algorithm (GSSA) and its variants are cornerstone techniques for simulating reaction kinetics in situations where the concentration of the reactants is too low to allow deterministic techniques such as differential equations. The inherent limitations of the GSSA include the time required for executing a single run and the need for multiple runs for parameter sweep exercises due to the stochastic nature of the simulation. Even very efficient variants of the GSSA are prohibitively expensive when many runs or parameter sweeps are required. Here we present a novel variant of the exact GSSA that is amenable to acceleration by using graphics processing units (GPUs). We parallelize the execution of a single realization across threads in a warp (fine-grained parallelism). A warp is a collection of threads that are executed synchronously on a single multi-processor. Warps executing in parallel on different multi-processors (coarse-grained parallelism) simultaneously generate multiple trajectories. Novel data structures and algorithms reduce memory traffic, which is the bottleneck in computing the GSSA. Our benchmarks show an 8×-120× performance gain over various state-of-the-art serial algorithms when simulating different types of models.
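
    As a serial point of reference for what is being accelerated, the exact direct method fits in a dozen lines: draw an exponential waiting time from the total propensity, pick the firing reaction in proportion to its propensity, and update the state. The sketch below is a generic textbook implementation, not the paper's GPU variant.

        import numpy as np

        def gillespie(stoich, propensities, x0, t_end, rng):
            """Direct-method GSSA: stoich is (n_reactions, n_species);
            propensities(x) returns the n_reactions rate vector."""
            t, x = 0.0, np.array(x0, float)
            traj = [(0.0, tuple(x0))]
            while t < t_end:
                a = propensities(x)
                a0 = a.sum()
                if a0 <= 0.0:
                    break                        # no reaction can fire
                t += rng.exponential(1.0 / a0)   # time to the next reaction
                r = rng.choice(len(a), p=a / a0) # which reaction fires
                x += stoich[r]
                traj.append((t, tuple(x)))
            return traj

        # Toy usage: reversible isomerization A <-> B.
        stoich = np.array([[-1, 1], [1, -1]])
        rates = lambda x: np.array([1.0 * x[0], 0.5 * x[1]])
        traj = gillespie(stoich, rates, [100, 0], 10.0, np.random.default_rng(0))
        print(len(traj), traj[-1])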

  13. Characterization of robotics parallel algorithms and mapping onto a reconfigurable SIMD machine

    NASA Technical Reports Server (NTRS)

    Lee, C. S. G.; Lin, C. T.

    1989-01-01

    The kinematics, dynamics, Jacobian, and their corresponding inverse computations are six essential problems in the control of robot manipulators. Efficient parallel algorithms for these computations are discussed and analyzed. Their characteristics are identified and a scheme for mapping these algorithms to a reconfigurable parallel architecture is presented. Based on the characteristics, including type of parallelism, degree of parallelism, uniformity of the operations, fundamental operations, data dependencies, and communication requirements, it is shown that most of the algorithms for robotic computations possess highly regular properties and some common structures, especially the linear recursive structure. Moreover, they are well suited to implementation on a single-instruction-stream multiple-data-stream (SIMD) computer with a reconfigurable interconnection network. The model of a reconfigurable dual network SIMD machine with internal direct feedback is introduced, and a systematic procedure to map these computations to the proposed machine is presented. A new scheduling problem for SIMD machines is investigated and a heuristic algorithm, called neighborhood scheduling, that reorders the processing sequence of subtasks to reduce the communication time is described. Mapping results of a benchmark algorithm are illustrated and discussed.

  14. Parallel computation with molecular-motor-propelled agents in nanofabricated networks.

    PubMed

    Nicolau, Dan V; Lard, Mercy; Korten, Till; van Delft, Falco C M J M; Persson, Malin; Bengtsson, Elina; Månsson, Alf; Diez, Stefan; Linke, Heiner; Nicolau, Dan V

    2016-03-08

    The combinatorial nature of many important mathematical problems, including nondeterministic-polynomial-time (NP)-complete problems, places a severe limitation on the problem size that can be solved with conventional, sequentially operating electronic computers. There have been significant efforts in conceiving parallel-computation approaches in the past, for example: DNA computation, quantum computation, and microfluidics-based computation. However, these approaches have not proven, so far, to be scalable and practical from a fabrication and operational perspective. Here, we report the foundations of an alternative parallel-computation system in which a given combinatorial problem is encoded into a graphical, modular network that is embedded in a nanofabricated planar device. Exploring the network in a parallel fashion using a large number of independent, molecular-motor-propelled agents then solves the mathematical problem. This approach uses orders of magnitude less energy than conventional computers, thus addressing issues related to power consumption and heat dissipation. We provide a proof-of-concept demonstration of such a device by solving, in a parallel fashion, the small instance {2, 5, 9} of the subset sum problem, which is a benchmark NP-complete problem. Finally, we discuss the technical advances necessary to make our system scalable with presently available technology.
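
    The encoded instance is small enough to check exhaustively in software. The sketch below enumerates every subset sum of {2, 5, 9}; the 2^n growth of exactly this enumeration is what motivates exploring the corresponding network physically, with many agents in parallel.

        from itertools import combinations

        def subset_sums(s):
            """Map every achievable subset sum to one witnessing subset."""
            return {sum(c): c
                    for r in range(len(s) + 1)
                    for c in combinations(s, r)}

        sums = subset_sums((2, 5, 9))
        print(sorted(sums))     # [0, 2, 5, 7, 9, 11, 14, 16]
        print(sums[14])         # e.g. (5, 9) witnesses the sum 14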

  15. Evaluation of plasma cholestane-3β,5α,6β-triol and 7-ketocholesterol in inherited disorders related to cholesterol metabolism

    PubMed Central

    Boenzi, Sara; Deodato, Federica; Taurisano, Roberta; Goffredo, Bianca Maria; Rizzo, Cristiano; Dionisi-Vici, Carlo

    2016-01-01

    Oxysterols are intermediates of cholesterol metabolism and are generated from cholesterol via either enzymatic or nonenzymatic pathways under oxidative stress conditions. Cholestane-3β,5α,6β-triol (C-triol) and 7-ketocholesterol (7-KC) have been proposed as new biomarkers for the diagnosis of Niemann-Pick type C (NP-C) disease, representing an alternative to the invasive and time-consuming fibroblast filipin test. To test the efficacy of plasma oxysterol determination for the diagnosis of NP-C, we systematically screened oxysterol levels in patients affected by different inherited disorders related to cholesterol metabolism, including Niemann-Pick type B (NP-B) disease, lysosomal acid lipase (LAL) deficiency, Smith-Lemli-Opitz syndrome (SLOS), congenital familial hypercholesterolemia (FH), and sitosterolemia (SITO). As expected, NP-C patients showed a significant increase of both C-triol and 7-KC. A strong increase of both oxysterols was observed in NP-B, and a less pronounced one in LAL deficiency. In SLOS, only 7-KC was markedly increased, whereas in both FH and SITO, oxysterol concentrations were normal. Interestingly, in NP-C alone, we observed that plasma oxysterols correlate negatively with patient's age and positively with serum total bilirubin, suggesting a potential relationship between oxysterol levels and hepatic disease status. Our results indicate that oxysterols are reliable and sensitive biomarkers of NP-C. PMID:26733147

  16. Transition metal oxide as anode interface buffer for impedance spectroscopy

    NASA Astrophysics Data System (ADS)

    Xu, Hui; Tang, Chao; Wang, Xu-Liang; Zhai, Wen-Juan; Liu, Rui-Lan; Rong, Zhou; Pang, Zong-Qiang; Jiang, Bing; Fan, Qu-Li; Huang, Wei

    2015-12-01

    Impedance spectroscopy is a powerful electrical measurement method, and it is also a powerful probe of carrier dynamics in organic semiconductors when suitable mathematical-physical models are used. A further requirement, however, is that the contact between the electrode and the materials be at least quasi-ohmic. In this report, three different transition metal oxides, V2O5, MoO3 and WO3, were therefore used as hole-injection buffers at the ITO/NPB interface. Through impedance spectroscopy and a PSO algorithm, the carrier mobilities and I-V characteristics of the NPB in the different devices were measured. The data curves were then compared with those of a single-layer device without the interface layer in order to investigate the influence of the transition metal oxides on the carrier mobility. The analysis showed that when the work function (WF) of the buffer material lies between the work function of the anode and the HOMO of the organic material, the interface material can work as a good bridge for carrier injection. Under such conditions, the carrier mobility measured through impedance spectroscopy should be close to the intrinsic value. Considering that the HOMO (or LUMO) of most organic semiconductors does not match the work function of the electrode, this report also provides a route to the wide application of impedance spectroscopy in the study of carrier dynamics.

  17. Genetic Biomarkers for Neoplastic Colorectal Cancer in Peripheral Lymphocytes

    PubMed Central

    Ionescu, Mirela; Ciocirlan, Mihai; Ionescu, Cristina; Becheanu, Gabriel; Gologan, Serban; Teiusanu, Adriana; Arbanas, Tudor; Mircea, Diculescu

    2011-01-01

    ABSTRACT Background: Loss of genomic stability appears to be a key step in colorectal carcinogenesis. A micronucleus (MN) is a chromosome fragment or an entire chromosome that lags behind during mitosis; it may be observed as an additional nucleus within the cell cytoplasm during the intermediate phases of mitosis. We tested the hypothesis that MN and its related anomalies may be associated with the presence of neoplastic colorectal lesions. Method: Peripheral blood lymphocytes were cultured and microscopically examined. The frequency of micronuclei (FMN) and the presence of nucleoplasmic bridges (NPB) in binucleated cells were compared in patients with or without colorectal neoplastic lesions. Results: We included 45 patients undergoing colonoscopy, 23 males and 22 females, with a median age of 59. Seventeen patients had polyps, 11 had colorectal cancer (CRC), and 17 had a normal colonoscopy. The FMN was significantly higher in women than in men (8.14 vs 4.17, p=0.008); NPB were significantly less frequent in patients with advanced adenomas (>10 mm or villous) or CRC (p=0.044) when compared with patients with a normal colonoscopy, hyperplastic polyps, or non-advanced adenomas. Conclusion: Micronuclei are more frequent in women, but their frequency was not significantly different in patients with advanced adenomas or CRC. Null or low values for the frequency of nucleoplasmic bridges in peripheral lymphocytes may be predictive of advanced adenomas and colorectal cancer. PMID:22205889

  18. Reconstruction of the maxillary midline papilla following a combined orthodontic-periodontic treatment in adult periodontal patients.

    PubMed

    Cardaropoli, Daniele; Re, Stefania; Corrente, Giuseppe; Abundo, Roberto

    2004-02-01

    The aim of the present study was to evaluate the role of a combined orthodontic-periodontic treatment in the reconstruction of the midline papilla lost following periodontitis. Twenty-eight patients, each with an infrabony defect and extrusion of one maxillary central incisor, were treated. At baseline, all patients presented an open interdental diastema and loss of the papilla. At 7-10 days after open-flap surgery, the intrusive movement was started. For each patient, probing pocket depth (PPD), clinical attachment level (CAL) and papilla presence index (PI) were assessed at baseline, at the end of treatment, and after 1 year. PI was also evaluated independently in patients with a narrow or wide periodontal biotype (NPB-WPB). All parameters showed statistically significant improvement between the initial and final measurements, and showed no changes at follow-up. The mean residual PPD was 2.50 mm, with a decrease of 4.29 mm, while the mean CAL gain was 5.93 mm. Twenty-three out of 28 patients improved their PI score by the end of therapy. No statistical difference was recorded in PI values between the NPB and WPB groups. The presented clinical protocol resulted in the improvement of all parameters examined. At the end of orthodontic treatment, a predictable reconstruction of the interdental papilla was reported in patients with either thin or wide gingiva. Copyright Blackwell Munksgaard, 2004.

  19. Influence of ammonium salts on the lipase/esterase activity assay using p-nitrophenyl esters as substrates.

    PubMed

    De Yan, Hong; Zhang, Yin Jun; Liu, Hong Cai; Zheng, Jian Yong; Wang, Zhao

    2013-01-01

    p-Nitrophenyl esters with a short-chain carboxylic group, such as p-nitrophenyl acetate (p-NPA) and p-nitrophenyl butyrate (p-NPB), can be effectively hydrolyzed by ammonium salts. p-Nitrophenyl esters are commonly used as substrates to assay lipase/esterase activity, ammonium sulfate precipitation is often used to purify proteins, and some ammonium salts are commonly used as nitrogen sources or inorganic salts for lipase/esterase production. To study the effect of ammonium salts on the assay of lipase/esterase activity, the factors contributing to the hydrolysis of p-NPA/p-NPB catalyzed by ammonium salts were investigated, and the lipase activities were compared in the presence and absence of ammonium sulfate. The hydrolysis reaction could be catalyzed under neutral and alkaline conditions, and the hydrolysis rate increased with the reaction temperature or the concentration of ammonium ion. When p-NPA was employed as the substrate for the analysis of lipase/esterase activity, the effect of ammonium sulfate on the analysis could be neutralized by setting a control when the concentration of ammonium sulfate was less than 40% saturation. However, when the concentration of ammonium sulfate increased from 40% to 100% saturation, the enzyme activities decreased by about 13-40%, which cannot be ignored for accurate analysis of the enzyme activity. © 2013 International Union of Biochemistry and Molecular Biology, Inc.

  20. Simulating Hydrologic Flow and Reactive Transport with PFLOTRAN and PETSc on Emerging Fine-Grained Parallel Computer Architectures

    NASA Astrophysics Data System (ADS)

    Mills, R. T.; Rupp, K.; Smith, B. F.; Brown, J.; Knepley, M.; Zhang, H.; Adams, M.; Hammond, G. E.

    2017-12-01

    As the high-performance computing community pushes towards the exascale horizon, power and heat considerations have driven the increasing importance and prevalence of fine-grained parallelism in new computer architectures. High-performance computing centers have become increasingly reliant on GPGPU accelerators and "manycore" processors such as the Intel Xeon Phi line, and 512-bit SIMD registers have even been introduced in the latest generation of Intel's mainstream Xeon server processors. The high degree of fine-grained parallelism and more complicated memory hierarchy considerations of such "manycore" processors present several challenges to existing scientific software. Here, we consider how the massively parallel, open-source hydrologic flow and reactive transport code PFLOTRAN - and the underlying Portable, Extensible Toolkit for Scientific Computation (PETSc) library on which it is built - can best take advantage of such architectures. We will discuss some key features of these novel architectures and our code optimizations and algorithmic developments targeted at them, and present experiences drawn from working with a wide range of PFLOTRAN benchmark problems on these architectures.

  1. Serial vs. parallel models of attention in visual search: accounting for benchmark RT-distributions.

    PubMed

    Moran, Rani; Zehetleitner, Michael; Liesefeld, Heinrich René; Müller, Hermann J; Usher, Marius

    2016-10-01

    Visual search is central to the investigation of selective visual attention. Classical theories propose that items are identified by serially deploying focal attention to their locations. While this accounts for set-size effects over a continuum of task difficulties, it has been suggested that parallel models can account for such effects equally well. We compared the serial Competitive Guided Search model with a parallel model in their ability to account for RT distributions and error rates from a large visual search data set featuring three classical search tasks: 1) a spatial configuration search (2 vs. 5); 2) a feature-conjunction search; and 3) a unique feature search (Wolfe, Palmer & Horowitz, Vision Research, 50(14), 1304-1311, 2010). In the parallel model, each item is represented by a diffusion to two boundaries (target-present/absent); the search corresponds to a parallel race between these diffusors. The parallel model was highly flexible in that it allowed for both a parametric range of capacity limitation and set-size adjustments of the identification boundaries. Furthermore, a quit unit allowed for a continuum of search-quitting policies when the target is not found, with "single-item inspection" and exhaustive search comprising its extremes. The serial model was found to be superior to the parallel model, even before penalizing the parallel model for its increased complexity. We discuss the implications of the results and the need for future studies to resolve the debate.
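
    The parallel-race idea can be made concrete with a toy simulation. In the sketch below, every display item runs an independent drift-diffusion identifier between a "target" and a "distractor" boundary; the trial ends "present" as soon as any diffusor hits the target boundary, and "absent" once all diffusors have hit the distractor boundary (a crude exhaustive quitting policy). All parameters are illustrative; this is not the fitted model from the paper.

        import numpy as np

        rng = np.random.default_rng(7)

        def search_trial(n_items, target_present, drift=0.15, noise=1.0,
                         bound=10.0, dt=1.0):
            v = np.full(n_items, -drift)    # distractors drift toward -bound
            if target_present:
                v[0] = drift                # the target drifts toward +bound
            x = np.zeros(n_items)
            alive = np.ones(n_items, bool)  # diffusors still racing
            t = 0.0
            while True:
                t += dt
                x[alive] += v[alive] * dt + noise * np.sqrt(dt) * \
                    rng.standard_normal(alive.sum())
                if np.any(x[alive] >= bound):
                    return "present", t     # any hit on the target boundary
                alive &= x > -bound         # retire diffusors judged distractor
                if not alive.any():
                    return "absent", t      # exhaustive: all items rejected

        print(search_trial(8, True), search_trial(8, False))

    Running many such trials per condition yields the RT distributions and error rates against which serial and parallel accounts are compared.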

  2. Analysis of dosimetry from the H.B. Robinson unit 2 pressure vessel benchmark using RAPTOR-M3G and ALPAN

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Fischer, G.A.

    2011-07-01

    The dosimetry from the H. B. Robinson Unit 2 Pressure Vessel Benchmark is analyzed with a suite of Westinghouse-developed codes and data libraries. The radiation transport from the reactor core to the surveillance capsule and ex-vessel locations is performed by RAPTOR-M3G, a parallel deterministic radiation transport code that calculates high-resolution neutron flux information in three dimensions. The cross-section library used in this analysis is the ALPAN library, an Evaluated Nuclear Data File (ENDF)/B-VII.0-based library designed for reactor dosimetry and fluence analysis applications. Dosimetry is evaluated with the industry-standard SNLRML reactor dosimetry cross-section data library.

  3. Progress in Unsteady Turbopump Flow Simulations Using Overset Grid Systems

    NASA Technical Reports Server (NTRS)

    Kiris, Cetin C.; Chan, William; Kwak, Dochan

    2002-01-01

    This viewgraph presentation provides information on unsteady flow simulations for the Second Generation RLV (Reusable Launch Vehicle) baseline turbopump. Three impeller rotations were simulated using a model with 34.3 million grid points. MPI/OpenMP hybrid parallelism and MLP shared memory parallelism have been implemented and benchmarked in INS3D, an incompressible Navier-Stokes solver. For the RLV turbopump simulations a speedup of more than 30 times has been obtained. Moving boundary capability is obtained by using the DCF module. A scripting capability from CAD geometry to solution has been developed. Unsteady flow simulations for the advanced consortium impeller/diffuser, using a model with 39 million grid points, are currently underway; 1.2 impeller rotations have been completed, and the fluid/structure coupling has been initiated.

  4. Support of Multidimensional Parallelism in the OpenMP Programming Model

    NASA Technical Reports Server (NTRS)

    Jin, Hao-Qiang; Jost, Gabriele

    2003-01-01

    OpenMP is the current standard for shared-memory programming. While providing ease of parallel programming, the OpenMP programming model also has limitations, which often affect the scalability of applications. Examples of these limitations are work distribution and point-to-point synchronization among threads. We propose extensions to the OpenMP programming model which allow the user to easily distribute the work in multiple dimensions and synchronize the workflow among the threads. The proposed extensions include four new constructs and the associated runtime library. They do not require changes to the source code and can be implemented based on the existing OpenMP standard. We illustrate the concept in a prototype translator and test it with benchmark codes and a cloud modeling code.

  5. Multi-Purpose, Application-Centric, Scalable I/O Proxy Application

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Miller, M. C.

    2015-06-15

    MACSio is a Multi-purpose, Application-Centric, Scalable I/O proxy application. It is designed to support a number of goals with respect to parallel I/O performance testing and benchmarking, including the ability to test and compare various I/O libraries and I/O paradigms, to predict the scalable performance of real applications, and to help identify where improvements in I/O performance can be made within the HPC I/O software stack.

  6. PeakRanger: A cloud-enabled peak caller for ChIP-seq data

    PubMed Central

    2011-01-01

    Background: Chromatin immunoprecipitation (ChIP), coupled with massively parallel short-read sequencing (seq), is used to probe chromatin dynamics. Although there are many algorithms to call peaks from ChIP-seq datasets, most are tuned either to handle punctate sites, such as transcription factor binding sites, or broad regions, such as histone modification marks; few can do both. Other algorithms are limited in their configurability, performance on large data sets, and ability to distinguish closely-spaced peaks. Results: In this paper, we introduce PeakRanger, a peak caller software package that works equally well on punctate and broad sites, can resolve closely-spaced peaks, has excellent performance, and is easily customized. In addition, PeakRanger can be run in a parallel cloud computing environment to obtain extremely high performance on very large data sets. We present a series of benchmarks to evaluate PeakRanger against 10 other peak callers, and demonstrate the performance of PeakRanger on both real and synthetic data sets. We also present real-world usages of PeakRanger, including peak-calling in the modENCODE project. Conclusions: Compared to other peak callers tested, PeakRanger offers improved resolution in distinguishing extremely closely-spaced peaks. PeakRanger has above-average spatial accuracy in terms of identifying the precise location of binding events. PeakRanger also has excellent sensitivity and specificity in all benchmarks evaluated. In addition, PeakRanger offers significant improvements in run time when running on a single-processor system, and very marked improvements when allowed to take advantage of the MapReduce parallel environment offered by a cloud computing resource. PeakRanger can be downloaded at the official site of the modENCODE project: http://www.modencode.org/software/ranger/ PMID:21554709

  7. [Benchmark experiment to verify radiation transport calculations for dosimetry in radiation therapy].

    PubMed

    Renner, Franziska

    2016-09-01

    Monte Carlo simulations are regarded as the most accurate method of solving complex problems in the field of dosimetry and radiation transport. In (external) radiation therapy they are increasingly used for the calculation of dose distributions during treatment planning. In comparison to other algorithms for the calculation of dose distributions, Monte Carlo methods have the capability of improving the accuracy of dose calculations - especially under complex circumstances (e.g. consideration of inhomogeneities). However, there is a lack of knowledge of how accurate the results of Monte Carlo calculations are on an absolute basis. A practical verification of the calculations can be performed by direct comparison with the results of a benchmark experiment. This work presents such a benchmark experiment and compares its results (with detailed consideration of measurement uncertainty) with the results of Monte Carlo calculations using the well-established Monte Carlo code EGSnrc. The experiment was designed to have parallels to external beam radiation therapy with respect to the type and energy of the radiation, the materials used and the kind of dose measurement. Because the properties of the beam have to be well known in order to compare the results of the experiment and the simulation on an absolute basis, the benchmark experiment was performed using the research electron accelerator of the Physikalisch-Technische Bundesanstalt (PTB), whose beam was accurately characterized in advance. The benchmark experiment and the corresponding Monte Carlo simulations were carried out for two different types of ionization chambers and the results were compared. Considering the uncertainty, which is about 0.7 % for the experimental values and about 1.0 % for the Monte Carlo simulation, the results of the simulation and the experiment coincide. Copyright © 2015. Published by Elsevier GmbH.

  8. A Fast MHD Code for Gravitationally Stratified Media using Graphical Processing Units: SMAUG

    NASA Astrophysics Data System (ADS)

    Griffiths, M. K.; Fedun, V.; Erdélyi, R.

    2015-03-01

    Parallelization techniques have been exploited most successfully by the gaming/graphics industry with the adoption of graphical processing units (GPUs), which possess hundreds of processor cores. The opportunity has been recognized by the computational sciences and engineering communities, who have recently harnessed the numerical performance of GPUs successfully. For example, parallel magnetohydrodynamic (MHD) algorithms are important for the numerical modelling of highly inhomogeneous solar, astrophysical and geophysical plasmas. Here, we describe the implementation of SMAUG, the Sheffield Magnetohydrodynamics Algorithm Using GPUs. SMAUG is a 1-3D MHD code capable of modelling magnetized and gravitationally stratified plasma. The objective of this paper is to present the numerical methods and techniques used for porting the code to this novel and highly parallel compute architecture. The methods employed are justified by the performance benchmarks and validation results, demonstrating that the code successfully simulates the physics for a range of test scenarios including a full 3D realistic model of wave propagation in the solar atmosphere.

  9. Parallel computing of a digital hologram and particle searching for microdigital-holographic particle-tracking velocimetry

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Satake, Shin-ichi; Kanamori, Hiroyuki; Kunugi, Tomoaki

    2007-02-01

    We have developed a parallel algorithm for microdigital-holographic particle-tracking velocimetry. The algorithm is used for (1) numerical reconstruction of a particle image computed from a digital hologram, and (2) searching for particles. The numerical reconstruction from the digital hologram makes use of the Fresnel diffraction equation and the FFT (fast Fourier transform), whereas the particle search algorithm looks for local maxima of gradation in a reconstruction field represented by a 3D matrix. To achieve high performance computing for both calculations (reconstruction and particle search), two memory partitions are allocated to the 3D matrix. In this matrix, the reconstruction part consists of horizontally placed 2D memory partitions on the x-y plane for the FFT, whereas the particle search part consists of vertically placed 2D memory partitions set along the z axis. Consequently, scalability is obtained in proportion to the number of processor elements; the benchmarks were carried out for parallel computation on an SGI Altix machine.
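
    The reconstruction step can be sketched with two FFTs per depth plane. The example below uses the paraxial (Fresnel) transfer-function form of the diffraction integral to propagate a recorded hologram to a chosen depth z; the wavelength, pixel pitch, and aperture are illustrative, and the particle-search and memory-partitioning stages of the paper are omitted.

        import numpy as np

        def fresnel_reconstruct(hologram, wavelength, dx, z):
            """Propagate a complex field by distance z using the paraxial
            (Fresnel) transfer function and two FFTs."""
            ny, nx = hologram.shape
            fx = np.fft.fftfreq(nx, d=dx)
            fy = np.fft.fftfreq(ny, d=dx)
            FX, FY = np.meshgrid(fx, fy)
            H = np.exp(-1j * np.pi * wavelength * z * (FX**2 + FY**2))
            return np.fft.ifft2(np.fft.fft2(hologram) * H)

        # Toy usage: propagate a circular aperture to two depths.
        n, dx, lam = 512, 2e-6, 0.5e-6
        y, x = np.mgrid[-n // 2:n // 2, -n // 2:n // 2] * dx
        field = (x**2 + y**2 < (50e-6) ** 2).astype(complex)
        for z in (1e-3, 5e-3):
            I = np.abs(fresnel_reconstruct(field, lam, dx, z)) ** 2
            print(f"z = {z * 1e3:.0f} mm, peak intensity {I.max():.2f}")

    Repeating this for a stack of z values yields the 3D reconstruction field in which particles appear as local intensity maxima.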

  10. Parallel replica dynamics method for bistable stochastic reaction networks: Simulation and sensitivity analysis

    NASA Astrophysics Data System (ADS)

    Wang, Ting; Plecháč, Petr

    2017-12-01

    Stochastic reaction networks that exhibit bistable behavior are common in systems biology, materials science, and catalysis. Sampling of stationary distributions is crucial for understanding and characterizing the long-time dynamics of bistable stochastic dynamical systems. However, simulations are often hindered by insufficient sampling of the rare transitions between the two metastable regions. In this paper, we apply the parallel replica method to continuous-time Markov chains in order to improve the sampling of the stationary distribution in bistable stochastic reaction networks. The proposed method uses parallel computing to accelerate the sampling of rare transitions. Furthermore, it can be combined with path-space information bounds for parametric sensitivity analysis. With the proposed methodology, we study three bistable biological networks: the Schlögl model, the genetic switch network, and the enzymatic futile cycle network. We demonstrate the algorithmic speedup achieved in these numerical benchmarks. More significant acceleration is expected when multi-core or graphics processing unit computer architectures and programming tools such as CUDA are employed.

  11. INL Results for Phases I and III of the OECD/NEA MHTGR-350 Benchmark

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Gerhard Strydom; Javier Ortensi; Sonat Sen

    2013-09-01

    The Idaho National Laboratory (INL) Very High Temperature Reactor (VHTR) Technology Development Office (TDO) Methods Core Simulation group led the construction of the Organization for Economic Cooperation and Development (OECD) Modular High Temperature Gas-cooled Reactor (MHTGR) 350 MW benchmark for comparing and evaluating prismatic VHTR analysis codes. The benchmark is sponsored by the OECD's Nuclear Energy Agency (NEA), and the project will yield a set of reference steady-state, transient, and lattice depletion problems that can be used by the Department of Energy (DOE), the Nuclear Regulatory Commission (NRC), and vendors to assess their code suites. The Methods group is responsible for defining the benchmark specifications, leading the data collection and comparison activities, and chairing the annual technical workshops. This report summarizes the latest INL results for Phase I (steady state) and Phase III (lattice depletion) of the benchmark. The INSTANT, Pronghorn and RattleSnake codes were used for the standalone core neutronics modeling of Exercise 1, and the results obtained from these codes are compared in Section 4. Exercise 2 of Phase I requires the standalone steady-state thermal fluids modeling of the MHTGR-350 design, and the results for the systems code RELAP5-3D are discussed in Section 5. The coupled neutronics and thermal fluids steady-state solution for Exercise 3 is reported in Section 6, utilizing the newly developed Parallel and Highly Innovative Simulation for INL Code System (PHISICS)/RELAP5-3D code suite. Finally, the lattice depletion models and results obtained for Phase III are compared in Section 7. The MHTGR-350 benchmark proved to be a challenging set of problems to model accurately, and even with the simplifications introduced in the benchmark specification this activity is an important step in the code-to-code verification of modern prismatic VHTR codes. A final OECD/NEA comparison report will compare the Phase I and III results of all other international participants in 2014, while the remaining Phase II transient case results will be reported in 2015.

  12. Benchmark datasets for phylogenomic pipeline validation, applications for foodborne pathogen surveillance

    PubMed Central

    Rand, Hugh; Shumway, Martin; Trees, Eija K.; Simmons, Mustafa; Agarwala, Richa; Davis, Steven; Tillman, Glenn E.; Defibaugh-Chavez, Stephanie; Carleton, Heather A.; Klimke, William A.; Katz, Lee S.

    2017-01-01

    Background As next generation sequence technology has advanced, there have been parallel advances in genome-scale analysis programs for determining evolutionary relationships as proxies for epidemiological relationship in public health. Most new programs skip traditional steps of ortholog determination and multi-gene alignment, instead identifying variants across a set of genomes, then summarizing results in a matrix of single-nucleotide polymorphisms or alleles for standard phylogenetic analysis. However, public health authorities need to document the performance of these methods with appropriate and comprehensive datasets so they can be validated for specific purposes, e.g., outbreak surveillance. Here we propose a set of benchmark datasets to be used for comparison and validation of phylogenomic pipelines. Methods We identified four well-documented foodborne pathogen events in which the epidemiology was concordant with routine phylogenomic analyses (reference-based SNP and wgMLST approaches). These are ideal benchmark datasets, as the trees, WGS data, and epidemiological data for each are all in agreement. We have placed these sequence data, sample metadata, and “known” phylogenetic trees in publicly-accessible databases and developed a standard descriptive spreadsheet format describing each dataset. To facilitate easy downloading of these benchmarks, we developed an automated script that uses the standard descriptive spreadsheet format. Results Our “outbreak” benchmark datasets represent the four major foodborne bacterial pathogens (Listeria monocytogenes, Salmonella enterica, Escherichia coli, and Campylobacter jejuni) and one simulated dataset where the “known tree” can be accurately called the “true tree”. The downloading script and associated table files are available on GitHub: https://github.com/WGS-standards-and-analysis/datasets. Discussion These five benchmark datasets will help standardize comparison of current and future phylogenomic pipelines, and facilitate important cross-institutional collaborations. Our work is part of a global effort to provide collaborative infrastructure for sequence data and analytic tools—we welcome additional benchmark datasets in our recommended format, and, if relevant, we will add these on our GitHub site. Together, these datasets, dataset format, and the underlying GitHub infrastructure present a recommended path for worldwide standardization of phylogenomic pipelines. PMID:29372115

  13. Benchmark datasets for phylogenomic pipeline validation, applications for foodborne pathogen surveillance.

    PubMed

    Timme, Ruth E; Rand, Hugh; Shumway, Martin; Trees, Eija K; Simmons, Mustafa; Agarwala, Richa; Davis, Steven; Tillman, Glenn E; Defibaugh-Chavez, Stephanie; Carleton, Heather A; Klimke, William A; Katz, Lee S

    2017-01-01

    As next generation sequence technology has advanced, there have been parallel advances in genome-scale analysis programs for determining evolutionary relationships as proxies for epidemiological relationship in public health. Most new programs skip traditional steps of ortholog determination and multi-gene alignment, instead identifying variants across a set of genomes, then summarizing results in a matrix of single-nucleotide polymorphisms or alleles for standard phylogenetic analysis. However, public health authorities need to document the performance of these methods with appropriate and comprehensive datasets so they can be validated for specific purposes, e.g., outbreak surveillance. Here we propose a set of benchmark datasets to be used for comparison and validation of phylogenomic pipelines. We identified four well-documented foodborne pathogen events in which the epidemiology was concordant with routine phylogenomic analyses (reference-based SNP and wgMLST approaches). These are ideal benchmark datasets, as the trees, WGS data, and epidemiological data for each are all in agreement. We have placed these sequence data, sample metadata, and "known" phylogenetic trees in publicly-accessible databases and developed a standard descriptive spreadsheet format describing each dataset. To facilitate easy downloading of these benchmarks, we developed an automated script that uses the standard descriptive spreadsheet format. Our "outbreak" benchmark datasets represent the four major foodborne bacterial pathogens ( Listeria monocytogenes , Salmonella enterica , Escherichia coli , and Campylobacter jejuni ) and one simulated dataset where the "known tree" can be accurately called the "true tree". The downloading script and associated table files are available on GitHub: https://github.com/WGS-standards-and-analysis/datasets. These five benchmark datasets will help standardize comparison of current and future phylogenomic pipelines, and facilitate important cross-institutional collaborations. Our work is part of a global effort to provide collaborative infrastructure for sequence data and analytic tools-we welcome additional benchmark datasets in our recommended format, and, if relevant, we will add these on our GitHub site. Together, these datasets, dataset format, and the underlying GitHub infrastructure present a recommended path for worldwide standardization of phylogenomic pipelines.

  14. Quasi-disjoint pentadiagonal matrix systems for the parallelization of compact finite-difference schemes and filters

    NASA Astrophysics Data System (ADS)

    Kim, Jae Wook

    2013-05-01

    This paper proposes a novel systematic approach for the parallelization of pentadiagonal compact finite-difference schemes and filters based on domain decomposition. The proposed approach allows a pentadiagonal banded matrix system to be split into quasi-disjoint subsystems by using a linear-algebraic transformation technique. As a result the inversion of pentadiagonal matrices can be implemented within each subdomain in an independent manner subject to a conventional halo-exchange process. The proposed matrix transformation leads to new subdomain boundary (SB) compact schemes and filters that require three halo terms to exchange with neighboring subdomains. The internode communication overhead in the present approach is equivalent to that of standard explicit schemes and filters based on seven-point discretization stencils. The new SB compact schemes and filters demand additional arithmetic operations compared to the original serial ones. However, it is shown that the additional cost becomes sufficiently low by choosing optimal sizes of their discretization stencils. Compared to earlier published results, the proposed SB compact schemes and filters successfully reduce parallelization artifacts arising from subdomain boundaries to a level sufficiently negligible for sophisticated aeroacoustic simulations without degrading parallel efficiency. The overall performance and parallel efficiency of the proposed approach are demonstrated by stringent benchmark tests.
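
    As a concrete reference point for the serial building block discussed above, the following sketch solves a small pentadiagonal banded system with SciPy's banded solver. It illustrates the kind of matrix inversion being parallelized, not the paper's quasi-disjoint subdomain transformation; the matrix values are arbitrary.

        # Solve a pentadiagonal system A x = rhs using SciPy's banded solver.
        import numpy as np
        from scipy.linalg import solve_banded

        n = 10
        # Diagonals stored in the (2, 2)-banded layout expected by solve_banded:
        # rows hold the 2nd/1st superdiagonals, main diagonal, 1st/2nd subdiagonals.
        ab = np.zeros((5, n))
        ab[0, 2:] = 0.5    # 2nd superdiagonal
        ab[1, 1:] = -1.0   # 1st superdiagonal
        ab[2, :] = 4.0     # main diagonal (diagonally dominant)
        ab[3, :-1] = -1.0  # 1st subdiagonal
        ab[4, :-2] = 0.5   # 2nd subdiagonal

        rhs = np.ones(n)
        x = solve_banded((2, 2), ab, rhs)

        # Cross-check against a dense solve of the same matrix.
        A = (np.diag(ab[2]) + np.diag(ab[1, 1:], 1) + np.diag(ab[0, 2:], 2)
             + np.diag(ab[3, :-1], -1) + np.diag(ab[4, :-2], -2))
        assert np.allclose(A @ x, rhs)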

  15. Cavitation, Flow Structure and Turbulence in the Tip Region of a Rotor Blade

    NASA Technical Reports Server (NTRS)

    Wu, H.; Miorini, R.; Soranna, F.; Katz, J.; Michael, T.; Jessup, S.

    2010-01-01

    Objectives: Measure the flow structure and turbulence within a naval axial waterjet pump. Create a database for benchmarking and validation of parallel computational efforts. Address flow and turbulence modeling issues that are unique to this complex environment. Measure and model flow phenomena affecting cavitation within the pump and its effect on pump performance. This presentation focuses on cavitation phenomena and associated flow structure in the tip region of a rotor blade.

  16. Time-Dependent Simulations of Incompressible Flow in a Turbopump Using Overset Grid Approach

    NASA Technical Reports Server (NTRS)

    Kiris, Cetin; Kwak, Dochan

    2001-01-01

    This viewgraph presentation provides information on mathematical modelling of the SSME (space shuttle main engine). The unsteady SSME-rig1 start-up procedure from the pump at rest has been initiated by using 34.3 million grid points. The computational model for the SSME-rig1 has been completed. Moving boundary capability is obtained by using DCF module in OVERFLOW-D. MPI (Message Passing Interface)/OpenMP hybrid parallel code has been benchmarked.

  17. Quantitative phenotyping via deep barcode sequencing.

    PubMed

    Smith, Andrew M; Heisler, Lawrence E; Mellor, Joseph; Kaper, Fiona; Thompson, Michael J; Chee, Mark; Roth, Frederick P; Giaever, Guri; Nislow, Corey

    2009-10-01

    Next-generation DNA sequencing technologies have revolutionized diverse genomics applications, including de novo genome sequencing, SNP detection, chromatin immunoprecipitation, and transcriptome analysis. Here we apply deep sequencing to genome-scale fitness profiling to evaluate yeast strain collections in parallel. This method, Barcode analysis by Sequencing, or "Bar-seq," outperforms the current benchmark barcode microarray assay in terms of both dynamic range and throughput. When applied to a complex chemogenomic assay, Bar-seq quantitatively identifies drug targets, with performance superior to the benchmark microarray assay. We also show that Bar-seq is well-suited for a multiplex format. We completely re-sequenced and re-annotated the yeast deletion collection using deep sequencing, found that approximately 20% of the barcodes and common priming sequences varied from expectation, and used this revised list of barcode sequences to improve data quality. Together, this new assay and analysis routine provide a deep-sequencing-based toolkit for identifying gene-environment interactions on a genome-wide scale.
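
    The counting step at the heart of such a barcode assay can be sketched in a few lines. This is a toy illustration: the barcode-to-strain map and the reads are made up, and a real pipeline would also handle sequencing errors, multiplexing tags, and up/down-tag pairs.

        from collections import Counter

        # Made-up barcode-to-strain map; a real assay uses the deletion
        # collection's 20-mer barcodes and tolerates sequencing errors.
        barcodes = {
            "ACGTACGTACGTACGTACGT": "strain_A",
            "TTGCAATTGCAATTGCAATT": "strain_B",
        }

        def count_barcodes(reads):
            """Tally exact barcode matches; counts proxy relative strain abundance."""
            counts = Counter()
            for read in reads:
                for seq, strain in barcodes.items():
                    if seq in read:
                        counts[strain] += 1
            return counts

        reads = ["NNACGTACGTACGTACGTACGTNN", "TTGCAATTGCAATTGCAATTAGGA"]
        print(count_barcodes(reads))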

  18. Robust fuzzy output feedback controller for affine nonlinear systems via T-S fuzzy bilinear model: CSTR benchmark.

    PubMed

    Hamdy, M; Hamdan, I

    2015-07-01

    In this paper, a robust H∞ fuzzy output feedback controller is designed for a class of affine nonlinear systems with disturbance via a Takagi-Sugeno (T-S) fuzzy bilinear model. The parallel distributed compensation (PDC) technique is utilized to design a fuzzy controller. The stability conditions of the overall closed-loop T-S fuzzy bilinear model are formulated in terms of a Lyapunov function via linear matrix inequalities (LMIs). The control law is robustified in the H∞ sense to attenuate external disturbance. Moreover, the desired controller gains can be obtained by solving a set of LMIs. A continuous stirred tank reactor (CSTR), which is a benchmark problem in nonlinear process control, is discussed in detail to verify the effectiveness of the proposed approach with a comparative study. Copyright © 2014 ISA. Published by Elsevier Ltd. All rights reserved.

  19. Running Neuroimaging Applications on Amazon Web Services: How, When, and at What Cost?

    PubMed

    Madhyastha, Tara M; Koh, Natalie; Day, Trevor K M; Hernández-Fernández, Moises; Kelley, Austin; Peterson, Daniel J; Rajan, Sabreena; Woelfer, Karl A; Wolf, Jonathan; Grabowski, Thomas J

    2017-01-01

    The contribution of this paper is to identify and describe current best practices for using Amazon Web Services (AWS) to execute neuroimaging workflows "in the cloud." Neuroimaging offers a vast set of techniques by which to interrogate the structure and function of the living brain. However, many of the scientists for whom neuroimaging is an extremely important tool have limited training in parallel computation. At the same time, the field is experiencing a surge in computational demands, driven by a combination of data-sharing efforts, improvements in scanner technology that allow acquisition of images with higher image resolution, and by the desire to use statistical techniques that stress processing requirements. Most neuroimaging workflows can be executed as independent parallel jobs and are therefore excellent candidates for running on AWS, but the overhead of learning to do so and determining whether it is worth the cost can be prohibitive. In this paper we describe how to identify neuroimaging workloads that are appropriate for running on AWS, how to benchmark execution time, and how to estimate cost of running on AWS. By benchmarking common neuroimaging applications, we show that cloud computing can be a viable alternative to on-premises hardware. We present guidelines that neuroimaging labs can use to provide a cluster-on-demand type of service that should be familiar to users, and scripts to estimate cost and create such a cluster.
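
    The cost arithmetic that such guidelines rest on is simple enough to sketch directly. The hourly rate and job profile below are placeholder assumptions, not benchmark figures from the paper.

        # Back-of-the-envelope cost estimate for independent per-subject jobs
        # on on-demand instances; all numbers are hypothetical.
        def estimate_cost(n_subjects, hours_per_subject, price_per_instance_hour,
                          jobs_per_instance=1):
            """Total cost of running independent per-subject jobs."""
            instance_hours = n_subjects * hours_per_subject / jobs_per_instance
            return instance_hours * price_per_instance_hour

        # e.g., 100 subjects at 8 h of processing each, at an assumed $0.34/h rate:
        print(f"${estimate_cost(100, 8, 0.34):.2f}")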

  20. ARABIC TRANSLATION AND ADAPTATION OF THE HOSPITAL CONSUMER ASSESSMENT OF HEALTHCARE PROVIDERS AND SYSTEMS (HCAHPS) PATIENT SATISFACTION SURVEY INSTRUMENT.

    PubMed

    Dockins, James; Abuzahrieh, Ramzi; Stack, Martin

    2015-01-01

    To translate and adapt an effective, validated, benchmarked, and widely used patient satisfaction measurement tool for use with an Arabic-speaking population. The study comprised translation of the survey's items, development of a survey administration process, evaluation of reliability, and international benchmarking, carried out at a 300-bed tertiary care hospital in Jeddah, Saudi Arabia, with 645 patients discharged during 2011 from the hospital's inpatient care units. The Hospital Consumer Assessment of Healthcare Providers and Systems (HCAHPS) instrument was translated into Arabic, a randomized weekly sample of patients was selected, and the survey was administered via telephone during 2011 to patients or their relatives. Scores were compiled for each of the HCAHPS questions and then for each of the six HCAHPS clinical composites, two non-clinical items, and two global items. Clinical composite scores, as well as the two non-clinical and two global items, were analyzed for the 645 respondents. Clinical composites were analyzed using Spearman's correlation coefficient and Cronbach's alpha, and the scales demonstrated acceptable internal consistency for the clinical composites (Spearman's correlation coefficient = 0.327-0.750, P < 0.01; Cronbach's alpha = 0.516-0.851). All ten HCAHPS measures were compared quarterly to US national averages, with results that closely paralleled the US benchmarks. The Arabic translation and adaptation of the HCAHPS is a valid, reliable, and feasible tool for evaluation and benchmarking of inpatient satisfaction in Arabic-speaking populations.
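
    For reference, Cronbach's alpha as used above can be computed from an items-by-respondents score matrix as in the sketch below; the scores shown are invented for illustration.

        # Cronbach's alpha: alpha = k/(k-1) * (1 - sum(item variances)/total variance)
        import numpy as np

        def cronbach_alpha(items):
            """items: 2-D array, rows = items, columns = respondents."""
            items = np.asarray(items, dtype=float)
            k = items.shape[0]
            item_vars = items.var(axis=1, ddof=1).sum()
            total_var = items.sum(axis=0).var(ddof=1)  # variance of respondent totals
            return k / (k - 1) * (1 - item_vars / total_var)

        scores = np.array([[3, 4, 4, 2, 5],
                           [3, 5, 4, 2, 4],
                           [2, 4, 5, 1, 5]])
        print(round(cronbach_alpha(scores), 3))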

  1. The Scalable Checkpoint/Restart Library

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Moody, A.

    The Scalable Checkpoint/Restart (SCR) library provides an interface that codes may use to write out and read in application-level checkpoints in a scalable fashion. In the current implementation, checkpoint files are cached in local storage (hard disk or RAM disk) on the compute nodes. This technique provides scalable aggregate bandwidth and uses storage resources that are fully dedicated to the job. This approach addresses the two common drawbacks of checkpointing a large-scale application to a shared parallel file system, namely, limited bandwidth and file system contention. In fact, on current platforms, SCR scales linearly with the number of compute nodes. It has been benchmarked as high as 720 GB/s on 1094 nodes of Atlas, which is nearly two orders of magnitude faster than the parallel file system.
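
    The generic application-level checkpointing pattern that SCR accelerates can be sketched as follows. This is an illustrative Python mock-up of the idea (node-local storage plus an atomic rename), not the SCR API; the checkpoint directory is an assumption.

        import os
        import pickle
        import tempfile

        CKPT_DIR = os.environ.get("CKPT_DIR", "/tmp/ckpt")  # assumed node-local path

        def save_checkpoint(state, step):
            os.makedirs(CKPT_DIR, exist_ok=True)
            # Write to a temp file, then rename, so a crash never leaves a torn file.
            fd, tmp = tempfile.mkstemp(dir=CKPT_DIR)
            with os.fdopen(fd, "wb") as f:
                pickle.dump((step, state), f)
            os.replace(tmp, os.path.join(CKPT_DIR, "latest.pkl"))

        def load_checkpoint():
            try:
                with open(os.path.join(CKPT_DIR, "latest.pkl"), "rb") as f:
                    return pickle.load(f)
            except FileNotFoundError:
                return 0, None  # fresh start

        step, state = load_checkpoint()
        for step in range(step, 100):
            state = {"iteration": step}        # stand-in for real computation
            if step % 10 == 0:
                save_checkpoint(state, step + 1)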

  2. GLAD: a system for developing and deploying large-scale bioinformatics grid.

    PubMed

    Teo, Yong-Meng; Wang, Xianbing; Ng, Yew-Kwong

    2005-03-01

    Grid computing is used to solve large-scale bioinformatics problems with gigabyte-scale databases by distributing the computation across multiple platforms. Until now, in developing bioinformatics grid applications, it has been extremely tedious to design and implement the component algorithms and parallelization techniques for different classes of problems, and to access remotely located sequence database files of varying formats across the grid. In this study, we propose a grid programming toolkit, GLAD (Grid Life sciences Applications Developer), which facilitates the development and deployment of bioinformatics applications on a grid. GLAD has been developed using ALiCE (Adaptive scaLable Internet-based Computing Engine), a Java-based grid middleware that exploits task-based parallelism. Two bioinformatics benchmark applications, distributed sequence comparison and distributed progressive multiple sequence alignment, have been developed using GLAD.

  3. Data Acquisition and Linguistic Resources

    NASA Astrophysics Data System (ADS)

    Strassel, Stephanie; Christianson, Caitlin; McCary, John; Staderman, William; Olive, Joseph

    All human language technology demands substantial quantities of data for system training and development, plus stable benchmark data to measure ongoing progress. While creation of high quality linguistic resources is both costly and time consuming, such data has the potential to profoundly impact not just a single evaluation program but language technology research in general. GALE's challenging performance targets demand linguistic data on a scale and complexity never before encountered. Resources cover multiple languages (Arabic, Chinese, and English) and multiple genres -- both structured (newswire and broadcast news) and unstructured (web text, including blogs and newsgroups, and broadcast conversation). These resources include significant volumes of monolingual text and speech, parallel text, and transcribed audio combined with multiple layers of linguistic annotation, ranging from word aligned parallel text and Treebanks to rich semantic annotation.

  4. Parallel 3D-TLM algorithm for simulation of the Earth-ionosphere cavity

    NASA Astrophysics Data System (ADS)

    Toledo-Redondo, Sergio; Salinas, Alfonso; Morente-Molinera, Juan Antonio; Méndez, Antonio; Fornieles, Jesús; Portí, Jorge; Morente, Juan Antonio

    2013-03-01

    A parallel 3D algorithm for solving time-domain electromagnetic problems with arbitrary geometries is presented. The technique employed is the Transmission Line Modeling (TLM) method implemented in Shared Memory (SM) environments. The benchmarking performed reveals that the maximum speedup depends on the memory size of the problem as well as on multiple hardware factors, such as the disposition of CPUs, cache, or memory. A maximum speedup of 15 has been measured for the largest problem. In certain circumstances of low memory requirements, superlinear speedup is achieved using our algorithm. The method is employed to model the Earth-ionosphere cavity, thus enabling a study of the natural electromagnetic phenomena that occur in it. The algorithm allows complete 3D simulations of the cavity with a resolution of 10 km within a reasonable timescale.

  5. Evaluation of plasma cholestane-3β,5α,6β-triol and 7-ketocholesterol in inherited disorders related to cholesterol metabolism.

    PubMed

    Boenzi, Sara; Deodato, Federica; Taurisano, Roberta; Goffredo, Bianca Maria; Rizzo, Cristiano; Dionisi-Vici, Carlo

    2016-03-01

    Oxysterols are intermediates of cholesterol metabolism and are generated from cholesterol via either enzymatic or nonenzymatic pathways under oxidative stress conditions. Cholestan-3β,5α,6β-triol (C-triol) and 7-ketocholesterol (7-KC) have been proposed as new biomarkers for the diagnosis of Niemann-Pick type C (NP-C) disease, representing an alternative tool to the invasive and time-consuming method of fibroblast filipin test. To test the efficacy of plasma oxysterol determination for the diagnosis of NP-C, we systematically screened oxysterol levels in patients affected by different inherited disorders related with cholesterol metabolism, which included Niemann-Pick type B (NP-B) disease, lysosomal acid lipase (LAL) deficiency, Smith-Lemli-Opitz syndrome (SLOS), congenital familial hypercholesterolemia (FH), and sitosterolemia (SITO). As expected, NP-C patients showed significant increase of both C-triol and 7-KC. Strong increase of both oxysterols was observed in NP-B and less pronounced in LAL deficiency. In SLOS, only 7-KC was markedly increased, whereas in both FH and in SITO, oxysterol concentrations were normal. Interestingly, in NP-C alone, we observed that plasma oxysterols correlate negatively with patient's age and positively with serum total bilirubin, suggesting the potential relationship between oxysterol levels and hepatic disease status. Our results indicate that oxysterols are reliable and sensitive biomarkers of NP-C. Copyright © 2016 by the American Society for Biochemistry and Molecular Biology, Inc.

  6. Enhancement of external quantum efficiency and reduction of roll-off in blue phosphorescent organic light-emitting diodes using TCTA inter-layer

    NASA Astrophysics Data System (ADS)

    Kim, Ji Young; Kim, Nam Ho; Kim, Jin Wook; Kang, Jin Sung; Yoon, Ju-An; Yoo, Seung Il; Kim, Woo Young; Cheah, Kok Wai

    2014-11-01

    Blue phosphorescent organic light-emitting diodes (PHOLEDs) with improved external quantum efficiency (EQE) and reduced roll-off were fabricated with the structure ITO/NPB (400 Å)/TCTA (200 Å)/mCP:FIrpic (7%)(300 Å)/TPBi (300 Å)/Liq (20 Å)/Al (800 Å) by incorporating a 4,4‧,4‧‧-tris(carbazol-9-yl)-triphenylamine (TCTA) inter-layer. We compared the properties of 2,9-dimethyl-4,7-diphenyl-1,10-phenanthroline (BCP) and 1,3,5-tris(N-phenylbenzimidazole-2-yl)benzene (TPBi) as the electron transport layer (ETL) in the typical hole transport layer (HTL)/emissive layer (EML)/ETL OLED structure, and utilized the inter-layer in the optimized structure to enhance the EQE by 52% at 5.5 V and to stabilize the roll-off at 23%. With the inter-layer, the blue PHOLEDs exhibit a current efficiency of 10.04 cd/A, an EQE of 6.20% at 5.5 V, and a highest luminance of 10310 cd/m2 at 9.5 V. We identified the electroluminescence behavior through the inter-layer in blue PHOLEDs, whose excitons can be divided into singlet excitons, which emit the fluorescence of N,N‧-bis(1-naphthalenyl)-N,N‧-bis-phenyl-(1,1‧-biphenyl)-4,4‧-diamine (NPB) at 420 nm, and triplet excitons, which emit the phosphorescence of Iridium(III) bis[(4,6-difluorophenyl)-pyridinato-N,C2‧] picolinate (FIrpic) at 470 nm and 494 nm, respectively.

  7. The role of negatively charged lipids in lysosomal phospholipase A2 function

    PubMed Central

    Abe, Akira; Shayman, James A.

    2009-01-01

    Lysosomal phospholipase A2 (LPLA2) is characterized by increased activity toward zwitterionic phospholipid liposomes containing negatively charged lipids under acidic conditions. The effect of anionic lipids on LPLA2 activity was investigated. Mouse LPLA2 activity was assayed as C2-ceramide transacylation. Sulfatide incorporated into liposomes enhanced LPLA2 activity under acidic conditions, and this enhancement was weakened by NaCl or increased pH. Amiodarone, a cationic amphiphilic drug, reduced LPLA2 activity. LPLA2 exhibited esterase activity when p-nitro-phenylbutyrate (pNPB) was used as a substrate. Unlike the phospholipase A2 activity, the esterase activity was detected over a wide pH range and not inhibited by NaCl or amiodarone. Presteady-state kinetics using pNPB were consistent with the formation of an acyl-enzyme intermediate. C2-ceramide was an acceptor for the acyl group of the acyl-enzyme but was not available as the acyl group acceptor when dispersed in liposomes containing amiodarone. Cosedimentation of LPLA2 with liposomes was enhanced in the presence of sulfatide and was reduced by raising NaCl, amiodarone, or pH in the reaction mixture. LPLA2 adsorption to negatively charged lipid membrane surfaces through an electrostatic attraction, therefore, enhances LPLA2 enzyme activity toward insoluble substrates. Thus, anionic lipids present within lipid membranes enhance the rate of phospholipid hydrolysis by LPLA2 at lipid-water interfaces. PMID:19321879

  8. Long-range interactions and parallel scalability in molecular simulations

    NASA Astrophysics Data System (ADS)

    Patra, Michael; Hyvönen, Marja T.; Falck, Emma; Sabouri-Ghomi, Mohsen; Vattulainen, Ilpo; Karttunen, Mikko

    2007-01-01

    Typical biomolecular systems such as cellular membranes, DNA, and protein complexes are highly charged. Thus, efficient and accurate treatment of electrostatic interactions is of great importance in computational modeling of such systems. We have employed the GROMACS simulation package to perform extensive benchmarking of different commonly used electrostatic schemes on a range of computer architectures (Pentium-4, IBM Power 4, and Apple/IBM G5) for single-processor and parallel performance up to 8 nodes. We have also tested the scalability on four different networks, namely Infiniband, GigaBit Ethernet, Fast Ethernet, and a nearly uniform memory architecture, in which communication between CPUs is possible by directly reading from or writing to other CPUs' local memory. It turns out that the particle-mesh Ewald method (PME) performs surprisingly well and offers competitive performance unless parallel runs on PC hardware with older network infrastructure are needed. Lipid bilayers of sizes 128, 512 and 2048 lipid molecules were used as the test systems representing typical cases encountered in biomolecular simulations. Our results enable an accurate prediction of computational speed on most current computing systems, both for serial and parallel runs. These results should be helpful in, for example, choosing the most suitable configuration for a small departmental computer cluster.

  9. High-performance computational fluid dynamics: a custom-code approach

    NASA Astrophysics Data System (ADS)

    Fannon, James; Loiseau, Jean-Christophe; Valluri, Prashant; Bethune, Iain; Náraigh, Lennon Ó.

    2016-07-01

    We introduce a modified and simplified version of the pre-existing fully parallelized three-dimensional Navier-Stokes flow solver known as TPLS. We demonstrate how the simplified version can be used as a pedagogical tool for the study of computational fluid dynamics (CFD) and parallel computing. TPLS is at its heart a two-phase flow solver, and uses calls to a range of external libraries to accelerate its performance. However, in the present context we narrow the focus of the study to basic hydrodynamics and parallel computing techniques, and the code is therefore simplified and modified to simulate pressure-driven single-phase flow in a channel, using only relatively simple Fortran 90 code with MPI parallelization, but no calls to any other external libraries. The modified code is analysed in order to both validate its accuracy and investigate its scalability up to 1000 CPU cores. Simulations are performed for several benchmark cases in pressure-driven channel flow, including a turbulent simulation, wherein the turbulence is incorporated via the large-eddy simulation technique. The work may be of use to advanced undergraduate and graduate students as an introductory study in CFD, while also providing insight for those interested in more general aspects of high-performance computing.

  10. GRAMM-X public web server for protein–protein docking

    PubMed Central

    Tovchigrechko, Andrey; Vakser, Ilya A.

    2006-01-01

    The protein docking software GRAMM-X and its web interface extend the original GRAMM Fast Fourier Transformation methodology by employing smoothed potentials, a refinement stage, and knowledge-based scoring. The web server frees users from the complex installation of database-dependent parallel software and from maintaining the large hardware resources needed for protein docking simulations. Docking problems submitted to the GRAMM-X server are processed by a 320-processor Linux cluster. The server was extensively tested by benchmarking, several months of public use, and participation in the CAPRI server track. PMID:16845016

  11. A classification and evaluation of data movement technologies for the delivery of highly voluminous scientific data products

    NASA Technical Reports Server (NTRS)

    Mattmann, Chris A.; Kelly, Sean; Crichton, Daniel J.; Hughes, J. Steven; Hardman, Sean; Ramirez, Paul; Joyner, Ron

    2006-01-01

    In this paper, we present a preliminary study of several different electronic data movement technologies. We detail our approach to classifying the technologies included in our study and present the preliminary results of some initial performance benchmarking. Our studies suggest that highly parallel TCP/IP streaming technologies, such as GridFTP and bbFTP, outperform commercial and open-source UDP-bursting technologies in several of the key data movement dimensions that we studied.

  12. Personal supercomputing by using transputer and Intel 80860 in plasma engineering

    NASA Astrophysics Data System (ADS)

    Ido, S.; Aoki, K.; Ishine, M.; Kubota, M.

    1992-09-01

    Transputer (T800) and 64-bit RISC Intel 80860 (i860) processors added to a personal computer can be used as accelerators. When 32-bit T800s in a parallel system or 64-bit i860s are used, scientific calculations are carried out several tens of times faster than on commonly used 32-bit personal computers or UNIX workstations. Benchmark tests and examples of physical simulations using T800s and i860s are reported.

  13. Real-Time Parallel Software Design Case Study: Implementation of the RASSP SAR Benchmark on the Intel Paragon.

    DTIC Science & Technology

    1996-01-01

    Only front-matter fragments of this report were extracted: a table of contents (Conclusion, List of References) and a list of figures. The recoverable content concerns the test bench: Figure 3-1 presents test-bench pseudo-code for two application nodes, in which the outer test-bench wrapper consists of three functions (pipeline_init, pipeline, and an exit function) and the application wrapper is contained in the pipeline routine; Figure 3-2 concerns fast convolution.

  14. Automated Instrumentation, Monitoring and Visualization of PVM Programs Using AIMS

    NASA Technical Reports Server (NTRS)

    Mehra, Pankaj; VanVoorst, Brian; Yan, Jerry; Tucker, Deanne (Technical Monitor)

    1994-01-01

    We present views and analysis of the execution of several PVM codes for Computational Fluid Dynamics on a network of Sparcstations, including (a) NAS Parallel benchmarks CG and MG (White, Alund and Sunderam 1993); (b) a multi-partitioning algorithm for NAS Parallel Benchmark SP (Wijngaart 1993); and (c) an overset grid flowsolver (Smith 1993). These views and analysis were obtained using our Automated Instrumentation and Monitoring System (AIMS) version 3.0, a toolkit for debugging the performance of PVM programs. We will describe the architecture, operation and application of AIMS. The AIMS toolkit contains (a) Xinstrument, which can automatically instrument various computational and communication constructs in message-passing parallel programs; (b) Monitor, a library of run-time trace-collection routines; (c) VK (Visual Kernel), an execution-animation tool with source-code clickback; and (d) Tally, a tool for statistical analysis of execution profiles. Currently, Xinstrument can handle C and Fortran77 programs using PVM 3.2.x; Monitor has been implemented and tested on Sun 4 systems running SunOS 4.1.2; and VK uses X11R5 and Motif 1.2. Data and views obtained using AIMS clearly illustrate several characteristic features of executing parallel programs on networked workstations: (a) the impact of long message latencies; (b) the impact of multiprogramming overheads and associated load imbalance; (c) cache and virtual-memory effects; and (d) significant skews between workstation clocks. Interestingly, AIMS can compensate for constant skew (zero drift) by calibrating the skew between a parent and its spawned children. In addition, AIMS' skew-compensation algorithm can adjust timestamps in a way that eliminates physically impossible communications (e.g., messages going backwards in time). Our current efforts are directed toward creating new views to explain the observed performance of PVM programs. Some of the features planned for the near future include: (a) ConfigView, showing the physical topology of the virtual machine, inferred using specially formatted IP (Internet Protocol) packets; and (b) LoadView, synchronous animation of PVM-program execution and resource-utilization patterns.

  15. Parallelization of sequential Gaussian, indicator and direct simulation algorithms

    NASA Astrophysics Data System (ADS)

    Nunes, Ruben; Almeida, José A.

    2010-08-01

    Improving the performance and robustness of algorithms on new high-performance parallel computing architectures is a key issue in efficiently performing 2D and 3D studies with large amount of data. In geostatistics, sequential simulation algorithms are good candidates for parallelization. When compared with other computational applications in geosciences (such as fluid flow simulators), sequential simulation software is not extremely computationally intensive, but parallelization can make it more efficient and creates alternatives for its integration in inverse modelling approaches. This paper describes the implementation and benchmarking of a parallel version of the three classic sequential simulation algorithms: direct sequential simulation (DSS), sequential indicator simulation (SIS) and sequential Gaussian simulation (SGS). For this purpose, the source used was GSLIB, but the entire code was extensively modified to take into account the parallelization approach and was also rewritten in the C programming language. The paper also explains in detail the parallelization strategy and the main modifications. Regarding the integration of secondary information, the DSS algorithm is able to perform simple kriging with local means, kriging with an external drift and collocated cokriging with both local and global correlations. SIS includes a local correction of probabilities. Finally, a brief comparison is presented of simulation results using one, two and four processors. All performance tests were carried out on 2D soil data samples. The source code is completely open source and easy to read. It should be noted that the code is only fully compatible with Microsoft Visual C and should be adapted for other systems/compilers.
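
    To make the sequential dependence that complicates parallelization concrete, here is a toy one-dimensional sequential Gaussian simulation with simple kriging. The exponential covariance model, the zero mean, and the use of all previously simulated nodes (rather than a search neighborhood) are simplifying assumptions for the sketch, not choices from the paper.

        import numpy as np

        rng = np.random.default_rng(1)

        def cov(h, sill=1.0, a=10.0):
            """Exponential covariance model with practical range a."""
            return sill * np.exp(-3.0 * np.abs(h) / a)

        def sgs_1d(n=50, mean=0.0):
            """Toy 1D sequential Gaussian simulation with simple kriging."""
            z = np.full(n, np.nan)
            for idx in rng.permutation(n):       # random visiting path
                known = np.where(~np.isnan(z))[0]
                if known.size == 0:
                    z[idx] = mean + rng.standard_normal() * np.sqrt(cov(0.0))
                    continue
                # Simple kriging from all previously simulated nodes.
                C = cov(known[:, None] - known[None, :])
                c0 = cov(known - idx)
                w = np.linalg.solve(C, c0)
                est = mean + w @ (z[known] - mean)
                var = max(cov(0.0) - w @ c0, 0.0)
                z[idx] = est + rng.standard_normal() * np.sqrt(var)
            return z

        print(sgs_1d()[:10])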

  16. A Next-Generation Parallel File System Environment for the OLCF

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Dillow, David A; Fuller, Douglas; Gunasekaran, Raghul

    2012-01-01

    When deployed in 2008/2009, the Spider system at the Oak Ridge National Laboratory's Leadership Computing Facility (OLCF) was the world's largest-scale Lustre parallel file system. Envisioned as a shared parallel file system capable of delivering both the bandwidth and capacity requirements of the OLCF's diverse computational environment, Spider has since become a blueprint for shared Lustre environments deployed worldwide. Designed to support the parallel I/O requirements of the Jaguar XT5 system and other smaller-scale platforms at the OLCF, the upgrade to the Titan XK6 heterogeneous system will begin to push the limits of Spider's original design by mid 2013. With a doubling in total system memory and a 10x increase in FLOPS, Titan will require both higher bandwidth and larger total capacity. Our goal is to provide a 4x increase in total I/O bandwidth, from over 240 GB/sec today to 1 TB/sec, and a doubling in total capacity. While aggregate bandwidth and total capacity remain important capabilities, an equally important goal in our efforts is dramatically increasing metadata performance, currently the Achilles heel of parallel file systems at leadership scale. We present in this paper an analysis of our current I/O workloads, our operational experiences with the Spider parallel file systems, the high-level design of our Spider upgrade, and our efforts in developing benchmarks that synthesize our performance requirements based on our workload characterization studies.

  17. Improved performances of organic light-emitting diodes with mixed layer and metal oxide as anode buffer

    NASA Astrophysics Data System (ADS)

    Xue, Qin; Liu, Shouyin; Zhang, Shiming; Chen, Ping; Zhao, Yi; Liu, Shiyong

    2013-01-01

    We fabricated organic light-emitting devices (OLEDs) employing 2-methyl-9,10-di(2-naphthyl)-anthracene (MADN) as hole-transport material (HTM) instead of commonly used N,N'-bis-(1-naphthyl)-N,N'-diphenyl,1,1'-biphenyl-4,4'-diamine (NPB). After inserting a 0.9 nm thick molybdenum oxide (MoOx) layer at the indium tin oxide (ITO)/MADN interface and a 5 nm thick mixed layer at the organic/organic heterojunction interface, the power conversion efficiency of the device can be increased by 4-fold.

  18. NASA Exhibits

    NASA Technical Reports Server (NTRS)

    Deardorff, Glenn; Djomehri, M. Jahed; Freeman, Ken; Gambrel, Dave; Green, Bryan; Henze, Chris; Hinke, Thomas; Hood, Robert; Kiris, Cetin; Moran, Patrick

    2001-01-01

    A series of NASA presentations for the Supercomputing 2001 conference are summarized. The topics include: (1) Mars Surveyor Landing Sites "Collaboratory"; (2) Parallel and Distributed CFD for Unsteady Flows with Moving Overset Grids; (3) IP Multicast for Seamless Support of Remote Science; (4) Consolidated Supercomputing Management Office; (5) Growler: A Component-Based Framework for Distributed/Collaborative Scientific Visualization and Computational Steering; (6) Data Mining on the Information Power Grid (IPG); (7) Debugging on the IPG; (8) Debakey Heart Assist Device; (9) Unsteady Turbopump for Reusable Launch Vehicle; (10) Exploratory Computing Environments Component Framework; (11) OVERSET Computational Fluid Dynamics Tools; (12) Control and Observation in Distributed Environments; (13) Multi-Level Parallelism Scaling on NASA's Origin 1024 CPU System; (14) Computing, Information, & Communications Technology; (15) NAS Grid Benchmarks; (16) IPG: A Large-Scale Distributed Computing and Data Management System; and (17) ILab: Parameter Study Creation and Submission on the IPG.

  19. OpenMP performance for benchmark 2D shallow water equations using LBM

    NASA Astrophysics Data System (ADS)

    Sabri, Khairul; Rabbani, Hasbi; Gunawan, Putu Harry

    2018-03-01

    Shallow water equations, commonly referred to as Saint-Venant equations, are used to model fluid phenomena. These equations can be solved numerically using several methods, such as the Lattice Boltzmann method (LBM), SIMPLE-like methods, the finite difference method, Godunov-type methods, and the finite volume method. In this paper, the shallow water equations are approximated using the LBM (an approach known as LABSWE) and the solver's parallel performance under OpenMP is studied. To evaluate the performance of the 2- and 4-thread parallel algorithms, ten different grid sizes Lx and Ly are elaborated. The results show that, using the OpenMP platform, the computational time for solving LABSWE can be decreased; for instance, using a grid size of 1000 × 500, the computation times observed with 2 and 4 threads are 93.54 s and 333.243 s, respectively.
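
    For reference, the speedup and parallel efficiency figures that such OpenMP studies report are derived as in the sketch below; the timings used here are placeholders, not the paper's measurements.

        # Speedup S = T_serial / T_parallel; efficiency E = S / n_threads.
        def speedup(t_serial, t_parallel):
            return t_serial / t_parallel

        def efficiency(t_serial, t_parallel, n_threads):
            return speedup(t_serial, t_parallel) / n_threads

        t1 = 600.0                              # hypothetical serial time (s)
        for p, tp in [(2, 320.0), (4, 175.0)]:  # hypothetical parallel times
            print(p, round(speedup(t1, tp), 2), round(efficiency(t1, tp, p), 2))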

  20. A neurally plausible parallel distributed processing model of event-related potential word reading data.

    PubMed

    Laszlo, Sarah; Plaut, David C

    2012-03-01

    The Parallel Distributed Processing (PDP) framework has significant potential for producing models of cognitive tasks that approximate how the brain performs the same tasks. To date, however, there has been relatively little contact between PDP modeling and data from cognitive neuroscience. In an attempt to advance the relationship between explicit, computational models and physiological data collected during the performance of cognitive tasks, we developed a PDP model of visual word recognition which simulates key results from the ERP reading literature, while simultaneously being able to successfully perform lexical decision, a benchmark task for reading models. Simulations reveal that the model's success depends on the implementation of several neurally plausible features in its architecture which are sufficiently domain-general to be relevant to cognitive modeling more generally. Copyright © 2011 Elsevier Inc. All rights reserved.

  1. A CPU/MIC Collaborated Parallel Framework for GROMACS on Tianhe-2 Supercomputer.

    PubMed

    Peng, Shaoliang; Yang, Shunyun; Su, Wenhe; Zhang, Xiaoyu; Zhang, Tenglilang; Liu, Weiguo; Zhao, Xingming

    2017-06-16

    Molecular Dynamics (MD) is the simulation of the dynamic behavior of atoms and molecules. As the most popular software for molecular dynamics, GROMACS cannot work on large-scale data because of limited computing resources. In this paper, we propose a CPU and Intel® Xeon Phi Many Integrated Core (MIC) collaborated parallel framework to accelerate GROMACS using the offload mode on a MIC coprocessor, with which the performance of GROMACS is improved significantly, especially on the Tianhe-2 supercomputer. Furthermore, we optimize GROMACS so that it can run on both the CPU and MIC at the same time. In addition, we accelerate multi-node GROMACS so that it can be used in practice. Benchmarking on real data, our accelerated GROMACS performs very well and reduces computation time significantly. Source code: https://github.com/tianhe2/gromacs-mic.

  2. Porting AMG2013 to Heterogeneous CPU+GPU Nodes

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Samfass, Philipp

    LLNL's future advanced technology system SIERRA will feature heterogeneous compute nodes that consist of IBM POWER9 CPUs and NVIDIA Volta GPUs. Conceptually, the motivation for such an architecture is quite straightforward: while GPUs are optimized for throughput on massively parallel workloads, CPUs strive to minimize latency for rather sequential operations. Yet, making optimal use of heterogeneous architectures raises new challenges for the development of scalable parallel software, e.g., with respect to work distribution. Porting LLNL's parallel numerical libraries to upcoming heterogeneous CPU+GPU architectures is therefore a critical factor for ensuring LLNL's future success in fulfilling its national mission. One of these libraries, called HYPRE, provides parallel solvers and preconditioners for large, sparse linear systems of equations. In the context of this internship project, I consider AMG2013, which is a proxy application for major parts of HYPRE that implements a benchmark for setting up and solving different systems of linear equations. In the following, I describe in detail how I ported multiple parts of AMG2013 to the GPU (Section 2) and present results for different experiments that demonstrate a successful parallel implementation on the heterogeneous machines surface and ray (Section 3). In Section 4, I give guidelines on how my code should be used. Finally, I conclude and give an outlook for future work (Section 5).

  3. Running Neuroimaging Applications on Amazon Web Services: How, When, and at What Cost?

    PubMed Central

    Madhyastha, Tara M.; Koh, Natalie; Day, Trevor K. M.; Hernández-Fernández, Moises; Kelley, Austin; Peterson, Daniel J.; Rajan, Sabreena; Woelfer, Karl A.; Wolf, Jonathan; Grabowski, Thomas J.

    2017-01-01

    The contribution of this paper is to identify and describe current best practices for using Amazon Web Services (AWS) to execute neuroimaging workflows “in the cloud.” Neuroimaging offers a vast set of techniques by which to interrogate the structure and function of the living brain. However, many of the scientists for whom neuroimaging is an extremely important tool have limited training in parallel computation. At the same time, the field is experiencing a surge in computational demands, driven by a combination of data-sharing efforts, improvements in scanner technology that allow acquisition of images with higher image resolution, and by the desire to use statistical techniques that stress processing requirements. Most neuroimaging workflows can be executed as independent parallel jobs and are therefore excellent candidates for running on AWS, but the overhead of learning to do so and determining whether it is worth the cost can be prohibitive. In this paper we describe how to identify neuroimaging workloads that are appropriate for running on AWS, how to benchmark execution time, and how to estimate cost of running on AWS. By benchmarking common neuroimaging applications, we show that cloud computing can be a viable alternative to on-premises hardware. We present guidelines that neuroimaging labs can use to provide a cluster-on-demand type of service that should be familiar to users, and scripts to estimate cost and create such a cluster. PMID:29163119

  4. Implementation of a 3D version of ponderomotive guiding center solver in particle-in-cell code OSIRIS

    NASA Astrophysics Data System (ADS)

    Helm, Anton; Vieira, Jorge; Silva, Luis; Fonseca, Ricardo

    2016-10-01

    Laser-driven accelerators have gained increased attention over the past decades. Typical modeling techniques for laser wakefield acceleration (LWFA) are based on particle-in-cell (PIC) simulations. PIC simulations, however, are very computationally expensive due to the disparity of the relevant scales, ranging from the laser wavelength, in the micrometer range, to the acceleration length, currently beyond the ten centimeter range. To minimize the gap between these disparate scales, the ponderomotive guiding center (PGC) algorithm is a promising approach. By describing the evolution of the laser pulse envelope separately, only the scales larger than the plasma wavelength need to be resolved in the PGC algorithm, leading to speedups of several orders of magnitude. Previous work was limited to two dimensions. Here we present the implementation of the 3D version of a PGC solver in the massively parallel, fully relativistic PIC code OSIRIS. We extended the solver to include periodic boundary conditions and parallelization in all spatial dimensions. We present benchmarks for distributed and shared memory parallelization. We also discuss the stability of the PGC solver.

  5. Vectorial finite elements for solving the radiative transfer equation

    NASA Astrophysics Data System (ADS)

    Badri, M. A.; Jolivet, P.; Rousseau, B.; Le Corre, S.; Digonnet, H.; Favennec, Y.

    2018-06-01

    The discrete ordinate method coupled with the finite element method is often used for the spatio-angular discretization of the radiative transfer equation. In this paper we attempt to improve upon such a discretization technique. Instead of using standard finite elements, we reformulate the radiative transfer equation using vectorial finite elements. In comparison to standard finite elements, this reformulation yields faster timings for the linear system assemblies, as well as for the solution phase when using scattering media. The proposed vectorial finite element discretization for solving the radiative transfer equation is cross-validated against a benchmark problem available in literature. In addition, we have used the method of manufactured solutions to verify the order of accuracy for our discretization technique within different absorbing, scattering, and emitting media. For solving large problems of radiation on parallel computers, the vectorial finite element method is parallelized using domain decomposition. The proposed domain decomposition method scales to a large number of processes, and its performance is unaffected by the changes in optical thickness of the medium. Our parallel solver is used to solve a large scale radiative transfer problem of the Kelvin-cell radiation.

  6. Massively parallel algorithm and implementation of RI-MP2 energy calculation for peta-scale many-core supercomputers.

    PubMed

    Katouda, Michio; Naruse, Akira; Hirano, Yukihiko; Nakajima, Takahito

    2016-11-15

    A new parallel algorithm and its implementation for the RI-MP2 energy calculation utilizing peta-flop-class many-core supercomputers are presented. Some improvements from the previous algorithm (J. Chem. Theory Comput. 2013, 9, 5373) have been performed: (1) a dual-level hierarchical parallelization scheme that enables the use of more than 10,000 Message Passing Interface (MPI) processes and (2) a new data communication scheme that reduces network communication overhead. A multi-node and multi-GPU implementation of the present algorithm is presented for calculations on a central processing unit (CPU)/graphics processing unit (GPU) hybrid supercomputer. Benchmark results of the new algorithm and its implementation using the K computer (CPU clustering system) and TSUBAME 2.5 (CPU/GPU hybrid system) demonstrate high efficiency. The peak performance of 3.1 PFLOPS is attained using 80,199 nodes of the K computer. The peak performance of the multi-node and multi-GPU implementation is 514 TFLOPS using 1349 nodes and 4047 GPUs of TSUBAME 2.5. © 2016 Wiley Periodicals, Inc.

  7. Empirical study of parallel LRU simulation algorithms

    NASA Technical Reports Server (NTRS)

    Carr, Eric; Nicol, David M.

    1994-01-01

    This paper reports on the performance of five parallel algorithms for simulating a fully associative cache operating under the LRU (Least-Recently-Used) replacement policy. Three of the algorithms are SIMD, and are implemented on the MasPar MP-2 architecture. Two other algorithms are parallelizations of an efficient serial algorithm on the Intel Paragon. One SIMD algorithm is quite simple, but its cost is linear in the cache size. The two other SIMD algorithms are more complex, but have costs that are independent of the cache size. Both the second and third SIMD algorithms compute all stack distances; the second SIMD algorithm is completely general, whereas the third SIMD algorithm presumes and takes advantage of bounds on the range of reference tags. Both MIMD algorithms implemented on the Paragon are general and compute all stack distances; they differ in one step that may affect their respective scalability. We assess the strengths and weaknesses of these algorithms as a function of problem size and characteristics, and compare their performance on traces derived from execution of three SPEC benchmark programs.
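
    A minimal serial version of the underlying computation may help: one pass over a reference trace yields the LRU stack distance of every access, from which the hit ratio of every cache size follows at once. The trace below is a toy example.

        from collections import Counter

        def stack_distances(trace):
            """LRU stack distances: distinct references since each item's last use."""
            stack = []          # most-recently-used element last
            dists = []
            for ref in trace:
                if ref in stack:
                    i = stack.index(ref)
                    dists.append(len(stack) - 1 - i)  # depth from the top
                    stack.pop(i)
                else:
                    dists.append(float("inf"))        # cold miss
                stack.append(ref)
            return dists

        trace = "abcabddcaab"
        hist = Counter(stack_distances(trace))
        for c in range(1, 5):  # hit ratio for every cache size at once
            hits = sum(n for d, n in hist.items() if d < c)
            print(f"cache size {c}: hit ratio {hits / len(trace):.2f}")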

  8. Integrating the Apache Big Data Stack with HPC for Big Data

    NASA Astrophysics Data System (ADS)

    Fox, G. C.; Qiu, J.; Jha, S.

    2014-12-01

    There is perhaps a broad consensus as to important issues in practical parallel computing as applied to large scale simulations; this is reflected in supercomputer architectures, algorithms, libraries, languages, compilers and best practice for application development. However, the same is not so true for data intensive computing, even though commercial clouds devote much more resources to data analytics than supercomputers devote to simulations. We look at a sample of over 50 big data applications to identify characteristics of data intensive applications and to deduce needed runtime and architectures. We suggest a big data version of the famous Berkeley dwarfs and NAS parallel benchmarks and use these to identify a few key classes of hardware/software architectures. Our analysis builds on combining HPC and ABDS, the Apache big data software stack that is well used in modern cloud computing. Initial results on clouds and HPC systems are encouraging. We propose the development of SPIDAL (Scalable Parallel Interoperable Data Analytics Library), built on system and data abstractions suggested by the HPC-ABDS architecture. We discuss how it can be used in several application areas including Polar Science.

  9. Towards implementation of cellular automata in Microbial Fuel Cells.

    PubMed

    Tsompanas, Michail-Antisthenis I; Adamatzky, Andrew; Sirakoulis, Georgios Ch; Greenman, John; Ieropoulos, Ioannis

    2017-01-01

    The Microbial Fuel Cell (MFC) is a bio-electrochemical transducer converting waste products into electricity using microbial communities. Cellular Automaton (CA) is a uniform array of finite-state machines that update their states in discrete time depending on states of their closest neighbors by the same rule. Arrays of MFCs could, in principle, act as massive-parallel computing devices with local connectivity between elementary processors. We provide a theoretical design of such a parallel processor by implementing CA in MFCs. We have chosen Conway's Game of Life as the 'benchmark' CA because this is the most popular CA which also exhibits an enormously rich spectrum of patterns. Each cell of the Game of Life CA is realized using two MFCs. The MFCs are linked electrically and hydraulically. The model is verified via simulation of an electrical circuit demonstrating equivalent behaviours. The design is a first step towards future implementations of fully autonomous biological computing devices with massive parallelism. The energy independence of such devices counteracts their somewhat slow transitions, compared to silicon circuitry, between the different states during computation.
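
    For concreteness, the benchmark CA named above, Conway's Game of Life, reduces to a few lines of array code; in the proposed hardware each cell would correspond to a pair of MFCs. This is an ordinary software implementation of the rule, not a model of the MFC realization.

        import numpy as np

        def life_step(grid):
            """One synchronous Game of Life update on a toroidal grid."""
            n = sum(np.roll(np.roll(grid, dy, axis=0), dx, axis=1)
                    for dy in (-1, 0, 1) for dx in (-1, 0, 1)
                    if (dy, dx) != (0, 0))
            alive = (n == 3) | ((grid == 1) & (n == 2))
            return alive.astype(int)

        glider = np.zeros((8, 8), dtype=int)
        glider[[0, 1, 2, 2, 2], [1, 2, 0, 1, 2]] = 1
        for _ in range(4):  # after 4 steps the glider has moved one cell diagonally
            glider = life_step(glider)
        print(glider)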

  10. Parallel Algorithms for Monte Carlo Particle Transport Simulation on Exascale Computing Architectures

    NASA Astrophysics Data System (ADS)

    Romano, Paul Kollath

    Monte Carlo particle transport methods are being considered as a viable option for high-fidelity simulation of nuclear reactors. While Monte Carlo methods offer several potential advantages over deterministic methods, there are a number of algorithmic shortcomings that would prevent their immediate adoption for full-core analyses. In this thesis, algorithms are proposed both to ameliorate the degradation in parallel efficiency typically observed for large numbers of processors and to offer a means of decomposing large tally data that will be needed for reactor analysis. A nearest-neighbor fission bank algorithm was proposed and subsequently implemented in the OpenMC Monte Carlo code. A theoretical analysis of the communication pattern shows that the expected cost is O(√N), whereas traditional fission bank algorithms are O(N) at best. The algorithm was tested on two supercomputers, the Intrepid Blue Gene/P and the Titan Cray XK7, and demonstrated nearly linear parallel scaling up to 163,840 processor cores on a full-core benchmark problem. An algorithm for reducing network communication arising from tally reduction was analyzed and implemented in OpenMC. The proposed algorithm groups only particle histories on a single processor into batches for tally purposes; in doing so it prevents all network communication for tallies until the very end of the simulation. The algorithm was tested, again on a full-core benchmark, and shown to reduce network communication substantially. A model was developed to predict the impact of load imbalances on the performance of domain decomposed simulations. The analysis demonstrated that load imbalances in domain decomposed simulations arise from two distinct phenomena: non-uniform particle densities and non-uniform spatial leakage. The dominant performance penalty for domain decomposition was shown to come from these physical effects rather than insufficient network bandwidth or high latency. The model predictions were verified with measured data from simulations in OpenMC on a full-core benchmark problem. Finally, a novel algorithm for decomposing large tally data was proposed, analyzed, and implemented/tested in OpenMC. The algorithm relies on disjoint sets of compute processes and tally servers. The analysis showed that for a range of parameters relevant to LWR analysis, the tally server algorithm should perform with minimal overhead. Tests were performed on Intrepid and Titan and demonstrated that the algorithm did indeed perform well over a wide range of parameters.

  11. Chromosomal DNA damage measured using the cytokinesis-block micronucleus cytome assay is significantly associated with cognitive impairment in South Australians.

    PubMed

    Lee, Sau Lai; Thomas, Philip; Hecker, Jane; Faunt, Jeffrey; Fenech, Michael

    2015-01-01

    Loss of genome integrity may be associated with increased risk for neurodegenerative disease. The aim of this study was to investigate whether mild cognitive impairment (MCI) or Alzheimer's disease (AD) individuals have increased DNA damage relative to age- and gender-matched controls using the cytokinesis-block micronucleus cytome (CBMN-Cyt) assay. DNA damage was measured as micronuclei (MN), nucleoplasmic bridges (NPB), and nuclear buds (NBUD) in binucleated cells. The assay was performed on blood samples from 80 participants consisting of (i) MCI cases (N = 20) and age- and gender-matched controls (N = 20), and (ii) AD cases (N = 20) and age- and gender-matched controls (N = 20). There was a significant increase in MCI NBUD frequency (P = 0.006) relative to controls, which was also observed in male (P = 0.03) and female (P = 0.04) subgroups. For AD cases, there were no significant differences in assay biomarkers relative to controls. There was a significant negative correlation between Mini Mental State Examination (MMSE) and (i) MN in all controls, (R = -0.3, P = 0.04), and AD cases (R = -0.4, P = 0.03), (ii) NPB in all controls, (R = -0.4, P = 0.006) and AD cases (R = -0.5, P = 0.01), and (iii) NBUD in MCI cases (R = -0.5, P = 0.007) and AD cases (R = -0.7, P = 0.0002). The results suggest that an increase in lymphocyte CBMN-Cyt DNA damage biomarkers may be associated with cognitive decline. © 2014 Wiley Periodicals, Inc.

  12. Quantitative phenotyping via deep barcode sequencing

    PubMed Central

    Smith, Andrew M.; Heisler, Lawrence E.; Mellor, Joseph; Kaper, Fiona; Thompson, Michael J.; Chee, Mark; Roth, Frederick P.; Giaever, Guri; Nislow, Corey

    2009-01-01

    Next-generation DNA sequencing technologies have revolutionized diverse genomics applications, including de novo genome sequencing, SNP detection, chromatin immunoprecipitation, and transcriptome analysis. Here we apply deep sequencing to genome-scale fitness profiling to evaluate yeast strain collections in parallel. This method, Barcode analysis by Sequencing, or “Bar-seq,” outperforms the current benchmark barcode microarray assay in terms of both dynamic range and throughput. When applied to a complex chemogenomic assay, Bar-seq quantitatively identifies drug targets, with performance superior to the benchmark microarray assay. We also show that Bar-seq is well-suited for a multiplex format. We completely re-sequenced and re-annotated the yeast deletion collection using deep sequencing, found that ∼20% of the barcodes and common priming sequences varied from expectation, and used this revised list of barcode sequences to improve data quality. Together, this new assay and analysis routine provide a deep-sequencing-based toolkit for identifying gene–environment interactions on a genome-wide scale. PMID:19622793

  13. [QUIPS: quality improvement in postoperative pain management].

    PubMed

    Meissner, Winfried

    2011-01-01

    Despite the availability of high-quality guidelines and advanced pain management techniques acute postoperative pain management is still far from being satisfactory. The QUIPS (Quality Improvement in Postoperative Pain Management) project aims to improve treatment quality by means of standardised data acquisition, analysis of quality and process indicators, and feedback and benchmarking. During a pilot phase funded by the German Ministry of Health (BMG), a total of 12,389 data sets were collected from six participating hospitals. Outcome improved in four of the six hospitals. Process indicators, such as routine pain documentation, were only poorly correlated with outcomes. To date, more than 130 German hospitals use QUIPS as a routine quality management tool. An EC-funded parallel project disseminates the concept internationally. QUIPS demonstrates that patient-reported outcomes in postoperative pain management can be benchmarked in routine clinical practice. Quality improvement initiatives should use outcome instead of structural and process parameters. The concept is transferable to other fields of medicine. Copyright © 2011. Published by Elsevier GmbH.

  14. Hierarchical Artificial Bee Colony Algorithm for RFID Network Planning Optimization

    PubMed Central

    Ma, Lianbo; Chen, Hanning; Hu, Kunyuan; Zhu, Yunlong

    2014-01-01

    This paper presents a novel optimization algorithm, namely, hierarchical artificial bee colony optimization, called HABC, to tackle the radio frequency identification network planning (RNP) problem. In the proposed multilevel model, the higher-level species can be aggregated by the subpopulations from lower level. In the bottom level, each subpopulation employing the canonical ABC method searches the part-dimensional optimum in parallel, which can be constructed into a complete solution for the upper level. At the same time, the comprehensive learning method with crossover and mutation operators is applied to enhance the global search ability between species. Experiments are conducted on a set of 10 benchmark optimization problems. The results demonstrate that the proposed HABC obtains remarkable performance on most chosen benchmark functions when compared to several successful swarm intelligence and evolutionary algorithms. Then HABC is used for solving the real-world RNP problem on two instances with different scales. Simulation results show that the proposed algorithm is superior for solving RNP, in terms of optimization accuracy and computation robustness. PMID:24592200

  15. Hierarchical artificial bee colony algorithm for RFID network planning optimization.

    PubMed

    Ma, Lianbo; Chen, Hanning; Hu, Kunyuan; Zhu, Yunlong

    2014-01-01

    This paper presents a novel optimization algorithm, namely, hierarchical artificial bee colony optimization, called HABC, to tackle the radio frequency identification network planning (RNP) problem. In the proposed multilevel model, the higher-level species can be aggregated by the subpopulations from lower level. In the bottom level, each subpopulation employing the canonical ABC method searches the part-dimensional optimum in parallel, which can be constructed into a complete solution for the upper level. At the same time, the comprehensive learning method with crossover and mutation operators is applied to enhance the global search ability between species. Experiments are conducted on a set of 10 benchmark optimization problems. The results demonstrate that the proposed HABC obtains remarkable performance on most chosen benchmark functions when compared to several successful swarm intelligence and evolutionary algorithms. Then HABC is used for solving the real-world RNP problem on two instances with different scales. Simulation results show that the proposed algorithm is superior for solving RNP, in terms of optimization accuracy and computation robustness.
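
    A sketch of the canonical single-population ABC that each subpopulation employs follows, minimizing the sphere benchmark function. The hierarchical aggregation and the crossover/mutation learning of HABC are not reproduced here, and all parameter values are illustrative choices.

        import numpy as np

        rng = np.random.default_rng(0)

        def sphere(x):                       # benchmark objective
            return float(np.sum(x ** 2))

        def fit(f):                          # canonical ABC fitness transform
            return 1.0 / (1.0 + f) if f >= 0 else 1.0 + abs(f)

        def abc(obj, dim=10, n_food=20, limit=50, max_iter=500, lb=-5.0, ub=5.0):
            foods = rng.uniform(lb, ub, (n_food, dim))
            vals = np.array([obj(x) for x in foods])
            trials = np.zeros(n_food, dtype=int)

            def try_neighbor(i):             # greedy move toward a random neighbor
                k = rng.choice([j for j in range(n_food) if j != i])
                d = rng.integers(dim)
                cand = foods[i].copy()
                cand[d] += rng.uniform(-1, 1) * (foods[i, d] - foods[k, d])
                cand[d] = np.clip(cand[d], lb, ub)
                v = obj(cand)
                if v < vals[i]:
                    foods[i], vals[i], trials[i] = cand, v, 0
                else:
                    trials[i] += 1

            for _ in range(max_iter):
                for i in range(n_food):      # employed bee phase
                    try_neighbor(i)
                f = np.array([fit(v) for v in vals])
                for i in rng.choice(n_food, n_food, p=f / f.sum()):  # onlookers
                    try_neighbor(i)
                worst = int(np.argmax(trials))                        # scout phase
                if trials[worst] > limit:
                    foods[worst] = rng.uniform(lb, ub, dim)
                    vals[worst] = obj(foods[worst])
                    trials[worst] = 0
            best = int(np.argmin(vals))
            return foods[best], vals[best]

        x, v = abc(sphere)
        print(v)  # should approach 0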

  16. Two-fluid dusty shocks: simple benchmarking problems and applications to protoplanetary discs

    NASA Astrophysics Data System (ADS)

    Lehmann, Andrew; Wardle, Mark

    2018-05-01

    The key role that dust plays in the interstellar medium has motivated the development of numerical codes designed to study the coupled evolution of dust and gas in systems such as turbulent molecular clouds and protoplanetary discs. Drift between dust and gas has proven to be important as well as numerically challenging. We provide simple benchmarking problems for dusty gas codes by numerically solving the two-fluid dust-gas equations for steady, plane-parallel shock waves. The two distinct shock solutions to these equations allow a numerical code to test different forms of drag between the two fluids, the strength of that drag, and the dust-to-gas ratio. We also provide an astrophysical application, using J-type dust-gas shocks to study the structure of accretion shocks on to protoplanetary discs. We find that two-fluid effects are most important for grains larger than 1 μm, and that the peak dust temperature within an accretion shock provides a signature of the dust-to-gas ratio of the infalling material.

  17. 2015 WFNDEC eddy current benchmark modeling of impedance variation in coil due to a crack located at the plate edge

    NASA Astrophysics Data System (ADS)

    Rocha, João Vicente; Camerini, Cesar; Pereira, Gabriela

    2016-02-01

    The 2015 World Federation of NDE Centers (WFNDEC) eddy current benchmark problem involves the inspection of two EDM notches placed at the edge of a conducting plate with a pancake coil that runs parallel to the plate's edge line. The experimental data consist of impedance variations measured with a precision LCR bridge as an XY scanner moves the coil. The authors present numerical results obtained with a commercial FEM package (OPERA 3-D). Values of the electrical resistance and inductive reactance variation between the base material and the region around the notch are plotted as a function of the coil displacement over the plate. The calculations were made for frequencies of 1 kHz and 10 kHz, and the agreement between experimental and numerical results is excellent for all inspection conditions. We explain how the impedance is calculated and discuss the pros and cons of the presented methods.

  18. Verification of ARES transport code system with TAKEDA benchmarks

    NASA Astrophysics Data System (ADS)

    Zhang, Liang; Zhang, Bin; Zhang, Penghe; Chen, Mengteng; Zhao, Jingchang; Zhang, Shun; Chen, Yixue

    2015-10-01

    Neutron transport modeling and simulation are central to many areas of nuclear technology, including reactor core analysis, radiation shielding and radiation detection. In this paper the series of TAKEDA benchmarks is modeled to verify the criticality calculation capability of ARES, a discrete ordinates neutral particle transport code system. The SALOME platform is coupled with ARES to provide geometry modeling and mesh generation functions. The Koch-Baker-Alcouffe parallel sweep algorithm is applied to accelerate the traditional transport calculation process. The results show that the eigenvalues calculated by ARES are in excellent agreement with the reference values presented in NEACRP-L-330, with differences less than 30 pcm except for the first case of model 3. Additionally, ARES provides accurate flux distributions compared to the reference values, with deviations less than 2% for region-averaged fluxes in all cases. All of these results confirm the feasibility of the ARES-SALOME coupling and demonstrate that ARES performs well in criticality calculations.

  19. Scaling of Multimillion-Atom Biological Molecular Dynamics Simulation on a Petascale Supercomputer

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Schulz, Roland; Lindner, Benjamin; Petridis, Loukas

    2009-01-01

    A strategy is described for a fast all-atom molecular dynamics simulation of multimillion-atom biological systems on massively parallel supercomputers. The strategy is developed using benchmark systems of particular interest to bioenergy research, comprising models of cellulose and lignocellulosic biomass in an aqueous solution. The approach involves using the reaction field (RF) method for the computation of long-range electrostatic interactions, which permits efficient scaling on many thousands of cores. Although the range of applicability of the RF method for biomolecular systems remains to be demonstrated, for the benchmark systems the use of the RF produces molecular dipole moments, Kirkwood G factors, other structural properties, and mean-square fluctuations in excellent agreement with those obtained with the commonly used Particle Mesh Ewald method. With RF, three-million- and five-million-atom biological systems scale well up to 30k cores, producing 30 ns/day. Atomistic simulations of very large systems for time scales approaching the microsecond would, therefore, appear now to be within reach.

  20. Scaling of Multimillion-Atom Biological Molecular Dynamics Simulation on a Petascale Supercomputer.

    PubMed

    Schulz, Roland; Lindner, Benjamin; Petridis, Loukas; Smith, Jeremy C

    2009-10-13

    A strategy is described for a fast all-atom molecular dynamics simulation of multimillion-atom biological systems on massively parallel supercomputers. The strategy is developed using benchmark systems of particular interest to bioenergy research, comprising models of cellulose and lignocellulosic biomass in an aqueous solution. The approach involves using the reaction field (RF) method for the computation of long-range electrostatic interactions, which permits efficient scaling on many thousands of cores. Although the range of applicability of the RF method for biomolecular systems remains to be demonstrated, for the benchmark systems the use of the RF produces molecular dipole moments, Kirkwood G factors, other structural properties, and mean-square fluctuations in excellent agreement with those obtained with the commonly used Particle Mesh Ewald method. With RF, three-million- and five-million-atom biological systems scale well up to ∼30k cores, producing ∼30 ns/day. Atomistic simulations of very large systems for time scales approaching the microsecond would, therefore, appear now to be within reach.
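
    The reaction-field treatment that makes this scaling possible replaces the long-range Ewald sum with a strictly local pair interaction. Below is a minimal Python sketch of one common shifted reaction-field form (the kind used in GROMACS-style force fields); the charges, cutoff, and dielectric constant are illustrative assumptions, not the paper's benchmark settings:

    ```python
    import numpy as np

    F_COUL = 138.935458   # Coulomb prefactor in kJ mol^-1 nm e^-2

    def reaction_field_energy(qi, qj, r, eps_rf=78.0, r_c=1.2):
        # Pairwise reaction-field energy: the medium beyond the cutoff r_c is
        # treated as a dielectric continuum, so the potential is short-ranged
        # and vanishes smoothly at r_c -- no global FFT, hence good scaling.
        k_rf = (eps_rf - 1.0) / ((2.0 * eps_rf + 1.0) * r_c ** 3)
        c_rf = 1.0 / r_c + k_rf * r_c ** 2
        r = np.asarray(r, dtype=float)
        v = F_COUL * qi * qj * (1.0 / r + k_rf * r ** 2 - c_rf)
        return np.where(r < r_c, v, 0.0)

    print(reaction_field_energy(1.0, -1.0, np.array([0.3, 0.9, 1.5])))
    ```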

  1. Modeling of fatigue crack induced nonlinear ultrasonics using a highly parallelized explicit local interaction simulation approach

    NASA Astrophysics Data System (ADS)

    Shen, Yanfeng; Cesnik, Carlos E. S.

    2016-04-01

    This paper presents a parallelized modeling technique for the efficient simulation of nonlinear ultrasonics introduced by wave interaction with fatigue cracks. The elastodynamic wave equations with contact effects are formulated using an explicit Local Interaction Simulation Approach (LISA). The LISA formulation is extended to capture the contact-impact phenomena during the wave-damage interaction based on the penalty method. A Coulomb friction model is integrated into the computation procedure to capture stick-slip contact shear motion. The LISA procedure is coded using the Compute Unified Device Architecture (CUDA), which enables highly parallelized computation on powerful graphics cards. Both the explicit contact formulation and the parallel implementation contribute to LISA's superb computational efficiency over the conventional finite element method (FEM). The theoretical formulation based on the penalty method is introduced, and a guideline for the proper choice of the contact stiffness is given. The convergence behavior of the solution under various contact stiffness values is examined. A numerical benchmark problem is used to investigate the new LISA formulation, and results are compared with a conventional contact finite element solution. Various nonlinear ultrasonic phenomena are successfully captured using this contact LISA formulation, including the generation of nonlinear higher harmonic responses. Nonlinear mode conversion of guided waves at fatigue cracks is also studied.
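
    The penalty method mentioned above resolves contact by penalizing interpenetration with a stiff spring. A minimal sketch of the normal-force law follows; the stiffness value is a placeholder, since (as the abstract notes) its proper choice trades contact accuracy against the stable explicit time step:

    ```python
    def penalty_contact_force(gap, k_pen):
        # Penalty normal force: zero while the crack faces are open (gap >= 0),
        # proportional to the penetration depth once they overlap (gap < 0).
        # A larger k_pen reduces interpenetration but shrinks the stable time
        # step of an explicit scheme such as LISA.
        return k_pen * max(0.0, -gap)

    # A face penetrating by 1e-6 m against a 1e12 N/m penalty spring:
    print(penalty_contact_force(gap=-1e-6, k_pen=1e12))   # 1e6 N
    ```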

  2. StrAuto: automation and parallelization of STRUCTURE analysis.

    PubMed

    Chhatre, Vikram E; Emerson, Kevin J

    2017-03-24

    Population structure inference using the software STRUCTURE has become an integral part of population genetic studies covering a broad spectrum of taxa including humans. The ever-expanding size of genetic data sets poses computational challenges for this analysis. Although at least one tool currently implements parallel computing to reduce the computational overload of this analysis, it does not fully automate the use of the replicate STRUCTURE runs required for downstream inference of the optimal K. There is a pressing need for a tool that can deploy population structure analysis on high performance computing clusters. We present an updated version of the popular Python program StrAuto to streamline population structure analysis using parallel computing. StrAuto implements a pipeline that combines STRUCTURE analysis with the Evanno ΔK analysis and visualization of results using STRUCTURE HARVESTER. Using benchmarking tests, we demonstrate that StrAuto significantly reduces the computational time needed to perform iterative STRUCTURE analysis by distributing runs over two or more processors. StrAuto is the first tool to integrate STRUCTURE analysis with post-processing using a pipeline approach in addition to implementing parallel computation - a setup ideal for deployment on computing clusters. StrAuto is distributed under the GNU GPL (General Public License) and is available for download from http://strauto.popgen.org.
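
    The speedup rests on the fact that replicate STRUCTURE runs are independent, so they can simply be farmed out over processors. The Python sketch below shows that general pattern with the standard library; the `structure` command-line flags are hypothetical placeholders (real runs are configured through mainparams/extraparams files), not StrAuto's actual invocation:

    ```python
    import itertools
    import subprocess
    from concurrent.futures import ProcessPoolExecutor

    def run_structure(task):
        # One independent replicate for a given K; the flags are placeholders.
        k, rep = task
        cmd = ["structure", "-K", str(k), "-o", f"results_K{k}_rep{rep}"]
        return subprocess.run(cmd, capture_output=True).returncode

    if __name__ == "__main__":
        tasks = list(itertools.product(range(1, 11), range(20)))  # K x replicates
        with ProcessPoolExecutor(max_workers=4) as pool:          # 4 concurrent runs
            exit_codes = list(pool.map(run_structure, tasks))
    ```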

  3. Parallel Agent-Based Simulations on Clusters of GPUs and Multi-Core Processors

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Aaby, Brandon G; Perumalla, Kalyan S; Seal, Sudip K

    2010-01-01

    An effective latency-hiding mechanism is presented for the parallelization of agent-based model simulations (ABMS) with millions of agents. The mechanism is designed to accommodate the hierarchical organization as well as the heterogeneity of current state-of-the-art parallel computing platforms. We use it to explore the computation vs. communication trade-off continuum available with the deep computational and memory hierarchies of extant platforms, and we present a novel analytical model of the trade-off. We describe our implementation and report preliminary performance results on two distinct parallel platforms suitable for ABMS: CUDA threads on multiple, networked graphical processing units (GPUs), and pthreads on multi-core processors. The Message Passing Interface (MPI) is used for inter-GPU as well as inter-socket communication on a cluster of multiple GPUs and multi-core processors. Results indicate the benefits of our latency-hiding scheme, delivering over 100-fold improvements in runtime for certain benchmark ABMS application scenarios with several million agents. This speed improvement comes on top of a system that is already two to three orders of magnitude faster on one GPU than an equivalent CPU-based execution in a popular Java simulator. Thus, the overall execution of our current work is over four orders of magnitude faster when executed on multiple GPUs.

  4. Benchmarking GPU and CPU codes for Heisenberg spin glass over-relaxation

    NASA Astrophysics Data System (ADS)

    Bernaschi, M.; Parisi, G.; Parisi, L.

    2011-06-01

    We present a set of possible implementations for Graphics Processing Units (GPUs) of the over-relaxation technique applied to the 3D Heisenberg spin glass model. The results show that a carefully tuned code can achieve more than 100 GFlops of sustained performance and update a single spin in about 0.6 nanoseconds. A multi-hit technique that exploits the GPU shared memory further reduces this time. These results are compared with those obtained by means of a highly tuned vector-parallel code on latest-generation multi-core CPUs.
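
    The over-relaxation move itself is simple: each unit spin is reflected about its local molecular field, which preserves its length and local energy while decorrelating the configuration. A minimal NumPy sketch, assuming a toy 1-D ring of spins with quenched random couplings (the paper treats the 3-D model on GPUs):

    ```python
    import numpy as np

    def overrelaxation_sweep(spins, neighbors, J):
        # Reflect each spin about its local field H_i = sum_j J_ij S_j:
        #   S_i -> 2 (S_i . H_i) H_i / |H_i|^2 - S_i
        # The reflection preserves |S_i| and S_i . H_i, so it is a
        # zero-rejection microcanonical update.
        for i in range(len(spins)):
            h = sum(J[i, k] * spins[j] for k, j in enumerate(neighbors[i]))
            spins[i] = 2.0 * (spins[i] @ h) / (h @ h) * h - spins[i]
        return spins

    rng = np.random.default_rng(1)
    n = 8
    spins = rng.normal(size=(n, 3))
    spins /= np.linalg.norm(spins, axis=1, keepdims=True)    # unit spins
    neighbors = [[(i - 1) % n, (i + 1) % n] for i in range(n)]
    J = rng.normal(size=(n, 2))    # toy quenched random couplings
    spins = overrelaxation_sweep(spins, neighbors, J)
    print(np.linalg.norm(spins, axis=1))   # norms remain 1
    ```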

  5. Performance of a carbon nanotube field emission electron gun

    NASA Astrophysics Data System (ADS)

    Getty, Stephanie A.; King, Todd T.; Bis, Rachael A.; Jones, Hollis H.; Herrero, Federico; Lynch, Bernard A.; Roman, Patrick; Mahaffy, Paul

    2007-04-01

    A cold cathode field emission electron gun (e-gun) based on a patterned carbon nanotube (CNT) film has been fabricated for use in a miniaturized reflectron time-of-flight mass spectrometer (RTOF MS), with future applications in other charged particle spectrometers, and performance of the CNT e-gun has been evaluated. A thermionic electron gun has also been fabricated and evaluated in parallel and its performance is used as a benchmark in the evaluation of our CNT e-gun. Implications for future improvements and integration into the RTOF MS are discussed.

  6. The Automated Instrumentation and Monitoring System (AIMS): Design and Architecture. 3.2

    NASA Technical Reports Server (NTRS)

    Yan, Jerry C.; Schmidt, Melisa; Schulbach, Cathy; Bailey, David (Technical Monitor)

    1997-01-01

    Whether a researcher is designing the 'next parallel programming paradigm', another 'scalable multiprocessor', or investigating resource allocation algorithms for multiprocessors, a facility that enables parallel program execution to be captured and displayed is invaluable. Careful analysis of such information can help computer and software architects to capture, and therefore exploit, behavioral variations among and within various parallel programs to take advantage of specific hardware characteristics. A software tool-set that facilitates performance evaluation of parallel applications on multiprocessors has been put together at NASA Ames Research Center under the sponsorship of NASA's High Performance Computing and Communications Program over the past five years. The Automated Instrumentation and Monitoring System (AIMS) has three major software components: a source code instrumentor, which automatically inserts active event recorders into program source code before compilation; a run-time performance monitoring library, which collects performance data; and a visualization tool-set, which reconstructs program execution based on the data collected. Besides being used as a prototype for developing new techniques for instrumenting, monitoring and presenting parallel program execution, AIMS is also being incorporated into the run-time environments of various hardware testbeds to evaluate their impact on user productivity. Currently, the execution of FORTRAN and C programs on the Intel Paragon and PALM workstations can be automatically instrumented and monitored. Performance data thus collected can be displayed graphically on various workstations. The process of performance tuning with AIMS is illustrated using various NAS Parallel Benchmarks. This report includes a description of the internal architecture of AIMS and a listing of the source code.

  7. Accelerating atomistic calculations of quantum energy eigenstates on graphic cards

    NASA Astrophysics Data System (ADS)

    Rodrigues, Walter; Pecchia, A.; Lopez, M.; Auf der Maur, M.; Di Carlo, A.

    2014-10-01

    Electronic properties of nanoscale materials require the calculation of eigenvalues and eigenvectors of large matrices. This bottleneck can be overcome by parallel computing techniques or the introduction of faster algorithms. In this paper we report a custom implementation of the Lanczos algorithm with simple restart, optimized for graphical processing units (GPUs). The whole algorithm has been developed using CUDA and runs entirely on the GPU, with a specialized implementation that spares memory and keeps host-to-device data transfers to a minimum. Furthermore, parallel distribution over several GPUs has been attained using the standard message passing interface (MPI). Benchmark calculations performed on a GaN/AlGaN wurtzite quantum dot with up to 600,000 atoms are presented. The empirical tight-binding (ETB) model with an sp3d5s∗+spin-orbit parametrization has been used to build the system Hamiltonian (H).
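
    For reference, the core three-term recurrence behind such an eigensolver fits in a few lines. The NumPy sketch below shows a plain Lanczos iteration without restarting or reorthogonalization (both of which a production code, including the GPU implementation described here, must add); the matrix is a random symmetric stand-in for a tight-binding Hamiltonian:

    ```python
    import numpy as np

    def lanczos(A, v0, m):
        # Build an m-step tridiagonal approximation T to the symmetric matrix A;
        # the eigenvalues of T (Ritz values) approximate extremal eigenvalues of A.
        n = len(v0)
        V = np.zeros((m + 1, n))
        alpha, beta = np.zeros(m), np.zeros(m)
        V[0] = v0 / np.linalg.norm(v0)
        for j in range(m):
            w = A @ V[j]                      # the only large matrix-vector product
            alpha[j] = V[j] @ w
            w -= alpha[j] * V[j]
            if j > 0:
                w -= beta[j - 1] * V[j - 1]
            beta[j] = np.linalg.norm(w)
            V[j + 1] = w / beta[j]
        T = np.diag(alpha) + np.diag(beta[:-1], 1) + np.diag(beta[:-1], -1)
        return np.linalg.eigvalsh(T)

    rng = np.random.default_rng(0)
    M = rng.normal(size=(200, 200)); M = (M + M.T) / 2
    print(lanczos(M, rng.normal(size=200), 50)[:3])   # lowest Ritz values
    ```

    Since the heavy operation is the matrix-vector product, this is also the step a GPU implementation offloads, which is why keeping the whole recurrence on the device avoids host-to-device traffic.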

  8. Injector Design Tool Improvements: User's manual for FDNS V.4.5

    NASA Technical Reports Server (NTRS)

    Chen, Yen-Sen; Shang, Huan-Min; Wei, Hong; Liu, Jiwen

    1998-01-01

    The major emphasis of the current effort is the development and validation of an efficient parallel-machine computational model, based on the FDNS code, to analyze the fluid dynamics of a wide variety of liquid jet configurations for general liquid rocket engine injection system applications. This model includes physical models for droplet atomization, breakup/coalescence, evaporation, turbulence mixing and gas-phase combustion. Benchmark validation cases for liquid rocket engine chamber combustion conditions will be performed for model validation purposes. Test cases may include shear coaxial, swirl coaxial and impinging injection systems with combinations of LOX/H2 or LOX/RP-1 propellant injector elements used in rocket engine designs. As the final goal of this project, a well-tested parallel CFD methodology, together with a description of its operation for users, will be documented in a final technical report at the end of the proposed research effort.

  9. Amplitude analysis of four-body decays using a massively-parallel fitting framework

    NASA Astrophysics Data System (ADS)

    Hasse, C.; Albrecht, J.; Alves, A. A., Jr.; d'Argent, P.; Evans, T. D.; Rademacker, J.; Sokoloff, M. D.

    2017-10-01

    The GooFit Framework is designed to perform maximum-likelihood fits for arbitrary functions on various parallel back ends, for example a GPU. We present an extension to GooFit which adds the functionality to perform time-dependent amplitude analyses of pseudoscalar mesons decaying into four pseudoscalar final states. Benchmarks of this functionality show a significant performance increase when utilizing a GPU compared to a CPU. Furthermore, this extension is employed to study the sensitivity on the D0-D̄0 mixing parameters x and y in a time-dependent amplitude analysis of the decay D0 → K+π-π+π-. Studying a sample of 50 000 events and setting the central values to the world average of x = (0.49 ± 0.15)% and y = (0.61 ± 0.08)%, the statistical sensitivities of x and y are determined to be σ(x) = 0.019% and σ(y) = 0.019%.

  10. A Parallel Multigrid Solver for Viscous Flows on Anisotropic Structured Grids

    NASA Technical Reports Server (NTRS)

    Prieto, Manuel; Montero, Ruben S.; Llorente, Ignacio M.; Bushnell, Dennis M. (Technical Monitor)

    2001-01-01

    This paper presents an efficient parallel multigrid solver for speeding up the computation of a 3-D model that treats the flow of a viscous fluid over a flat plate. The main interest of this simulation lies in exhibiting some basic difficulties that prevent optimal multigrid efficiencies from being achieved. As the computing platform, we have used Coral, a Beowulf-class system based on Intel Pentium processors and equipped with GigaNet cLAN and switched Fast Ethernet networks. Our study not only examines the scalability of the solver but also includes a performance evaluation of Coral where the investigated solver has been used to compare several of its design choices, namely, the interconnection network (GigaNet versus switched Fast-Ethernet) and the node configuration (dual nodes versus single nodes). As a reference, the performance results have been compared with those obtained with the NAS-MG benchmark.
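
    For context, the sketch below shows the algorithmic skeleton that a multigrid solver (and the NAS-MG benchmark used as a reference here) exercises: a one-dimensional V-cycle for -u'' = f with damped Jacobi smoothing, full-weighting restriction, and linear-interpolation prolongation. It is a minimal illustration under simplifying assumptions (uniform 1-D grid, zero boundary values), not the paper's 3-D anisotropic solver:

    ```python
    import numpy as np

    def smooth(u, f, h, sweeps=3, omega=2/3):
        # Damped Jacobi relaxation for -u'' = f (boundaries held at zero).
        for _ in range(sweeps):
            u[1:-1] = ((1 - omega) * u[1:-1]
                       + omega * 0.5 * (u[:-2] + u[2:] + h * h * f[1:-1]))
        return u

    def residual(u, f, h):
        r = np.zeros_like(u)
        r[1:-1] = f[1:-1] - (2 * u[1:-1] - u[:-2] - u[2:]) / (h * h)
        return r

    def v_cycle(u, f, h):
        # Pre-smooth, restrict the residual, recurse on the coarse error
        # equation, prolongate the correction, post-smooth.
        u = smooth(u, f, h)
        if len(u) <= 3:
            return u
        r = residual(u, f, h)
        rc = np.zeros((len(u) + 1) // 2)               # full-weighting restriction
        rc[1:-1] = 0.25 * r[1:-2:2] + 0.5 * r[2:-1:2] + 0.25 * r[3::2]
        ec = v_cycle(np.zeros_like(rc), rc, 2 * h)
        e = np.zeros_like(u)                           # linear prolongation
        e[::2] = ec
        e[1::2] = 0.5 * (ec[:-1] + ec[1:])
        return smooth(u + e, f, h)

    n = 129                                            # 2**7 + 1 grid points
    x = np.linspace(0.0, 1.0, n)
    h = x[1] - x[0]
    f = np.pi ** 2 * np.sin(np.pi * x)                 # exact solution sin(pi x)
    u = np.zeros(n)
    for _ in range(10):
        u = v_cycle(u, f, h)
    print(np.max(np.abs(u - np.sin(np.pi * x))))       # ~ discretization error
    ```

    Anisotropic grids of the kind studied in the paper break the smoothing property of simple point relaxation, which is one of the "basic difficulties" the abstract alludes to.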

  11. Enhanced color purity of blue OLEDs based on well-design structure

    NASA Astrophysics Data System (ADS)

    Du, Qianqian; Wang, Wenjun; Li, Shuhong; Wang, Qingru; Xia, Shuzhen; Zhang, Bingyuan; Wang, Minghong; Fan, Quli

    2016-09-01

    We have fabricated blue organic light-emitting devices (OLEDs) with higher color purity and stability by optimizing the structure Glass/ITO/NPB (50 nm)/BCzVBi (30 nm)/TPBi (x nm)/Alq3 (20 nm)/LiF/Al. The results show that introducing a TPBi hole-blocking layer (HBL) can greatly improve not only the color purity but also the color stability, owing to its deep Highest Occupied Molecular Orbital (HOMO) energy level of 6.2 eV. We expect this work to be useful for optimizing blue OLED structures to enhance their color properties.

  12. NCI at Frederick Receives a Royal Visit | Poster

    Cancer.gov

    The Center for Cancer Research (CCR) and NCI at Frederick recently had the honor of hosting Professor Dr. Her Royal Highness Princess Chulabhorn Mahidol of Thailand. Her Royal Highness has a special interest in scientific research related to the use of natural products for treating disease. The purpose of her visit was to discuss the work on natural products being undertaken at NCI at Frederick. Her Royal Highness attended talks by researchers from both the Molecular Targets Laboratory (MTL), CCR, and the Natural Products Branch (NPB), Developmental Therapeutics Program (DTP), Division of Cancer Treatment and Diagnosis (DCTD).

  13. Modern multicore and manycore architectures: Modelling, optimisation and benchmarking a multiblock CFD code

    NASA Astrophysics Data System (ADS)

    Hadade, Ioan; di Mare, Luca

    2016-08-01

    Modern multicore and manycore processors exhibit multiple levels of parallelism through a wide range of architectural features such as SIMD for data parallel execution or threads for core parallelism. The exploitation of multi-level parallelism is therefore crucial for achieving superior performance on current and future processors. This paper presents the performance tuning of a multiblock CFD solver on Intel SandyBridge and Haswell multicore CPUs and the Intel Xeon Phi Knights Corner coprocessor. Code optimisations have been applied on two computational kernels exhibiting different computational patterns: the update of flow variables and the evaluation of the Roe numerical fluxes. We discuss at great length the code transformations required for achieving efficient SIMD computations for both kernels across the selected devices, including SIMD shuffles and transpositions for flux stencil computations and global memory transformations. Core parallelism is expressed through threading based on a number of domain decomposition techniques together with optimisations pertaining to alleviating NUMA effects found in multi-socket compute nodes. Results are correlated with the Roofline performance model in order to assess their efficiency for each distinct architecture. We report significant speedups for single thread execution across both kernels: 2-5X on the multicore CPUs and 14-23X on the Xeon Phi coprocessor. Computations at full node and chip concurrency deliver a factor of three speedup on the multicore processors and up to 24X on the Xeon Phi manycore coprocessor.
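
    The Roofline correlation mentioned above reduces to a one-line model: attainable performance is the lesser of the compute peak and memory bandwidth times arithmetic intensity. A tiny sketch with illustrative, made-up machine numbers (not the paper's measured parameters):

    ```python
    def roofline(peak_gflops, bw_gbs, ai):
        # Attainable GFLOP/s: compute-bound at the peak, memory-bound at
        # bandwidth (GB/s) times arithmetic intensity (flop/byte) below it.
        return min(peak_gflops, bw_gbs * ai)

    # Illustrative machine numbers only:
    for ai in (0.25, 1.0, 4.0, 16.0):
        print(f"AI={ai:5.2f} flop/byte -> {roofline(500.0, 60.0, ai):6.1f} GFLOP/s")
    ```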

  14. Multiprocessor smalltalk: Implementation, performance, and analysis

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Pallas, J.I.

    1990-01-01

    Multiprocessor Smalltalk demonstrates the value of object-oriented programming on a multiprocessor. Its implementation and analysis shed light on three areas: concurrent programming in an object-oriented language without special extensions, implementation techniques for adapting to multiprocessors, and performance factors in the resulting system. Adding parallelism to Smalltalk code is easy, because programs already use control abstractions like iterators. Smalltalk's basic control and concurrency primitives (lambda expressions, processes and semaphores) can be used to build parallel control abstractions, including parallel iterators, parallel objects, atomic objects, and futures. Language extensions for concurrency are not required. This implementation demonstrates that it is possible to build an efficient parallel object-oriented programming system and illustrates techniques for doing so. Three modification tools (serialization, replication, and reorganization) adapted the Berkeley Smalltalk interpreter to the Firefly multiprocessor. Multiprocessor Smalltalk's performance shows that the combination of multiprocessing and object-oriented programming can be effective: speedups (relative to the original serial version) exceed 2.0 for five processors on all the benchmarks; the median efficiency is 48%. Analysis shows both where performance is lost and how to improve and generalize the experimental results. Changes in the interpreter to support concurrency add at most 12% overhead; better access to per-process variables could eliminate much of that. Changes in the user code to express concurrency add as much as 70% overhead; this overhead could be reduced to 54% if blocks (lambda expressions) were reentrant. Performance is also lost when the program cannot keep all five processors busy.

  15. A Parallel Processing Algorithm for Remote Sensing Classification

    NASA Technical Reports Server (NTRS)

    Gualtieri, J. Anthony

    2005-01-01

    A current thread in parallel computation is the use of cluster computers created by networking a few to thousands of commodity general-purpose workstation-level computers running the Linux operating system. For example, the Medusa cluster at NASA/GSFC provides supercomputing performance, 130 GFLOPS (Linpack benchmark), at moderate cost, $370K. However, to be useful for scientific computing in the area of Earth science, issues of ease of programming, access to existing scientific libraries, and portability of existing code need to be considered. In this paper, I address these issues in the context of tools for rendering earth science remote sensing data into useful products. In particular, I focus on a problem that can be decomposed into a set of independent tasks, which on a serial computer would be performed sequentially, but which with a cluster computer can be performed in parallel, giving an obvious speedup. To make the ideas concrete, I consider the problem of classifying hyperspectral imagery where some ground truth is available to train the classifier. In particular I will use the Support Vector Machine (SVM) approach as applied to hyperspectral imagery. The approach will be to introduce notions about parallel computation and then to restrict the development to the SVM problem. Pseudocode (an outline of the computation) will be described and then details specific to the implementation will be given. Then timing results will be reported to show what speedups are possible using parallel computation. The paper will close with a discussion of the results.
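
    The decomposition into independent tasks maps directly onto a process pool. The sketch below substitutes a trivial nearest-centroid labeler for the SVM so it stays self-contained; each block of pixels is classified independently, which is exactly the pattern that yields the obvious speedup discussed in the paper:

    ```python
    import numpy as np
    from multiprocessing import Pool

    def nearest_centroid(block, centroids):
        # Label each pixel spectrum by its nearest class centroid; a stand-in
        # for the trained SVM, since only the task independence matters here.
        d = np.linalg.norm(block[:, None, :] - centroids[None, :, :], axis=2)
        return np.argmin(d, axis=1)

    if __name__ == "__main__":
        rng = np.random.default_rng(0)
        cube = rng.normal(size=(10000, 50))      # pixels x spectral bands (toy)
        centroids = rng.normal(size=(5, 50))     # 5 "trained" classes
        blocks = np.array_split(cube, 8)         # independent tasks
        with Pool(4) as pool:                    # distributed over 4 workers
            labels = np.concatenate(pool.starmap(
                nearest_centroid, [(b, centroids) for b in blocks]))
        print(labels.shape)
    ```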

  16. Optimization of Deep Drilling Performance--Development and Benchmark Testing of Advanced Diamond Product Drill Bits & HP/HT Fluids to Significantly Improve Rates of Penetration

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Alan Black; Arnis Judzis

    2003-10-01

    This document details the progress to date on the OPTIMIZATION OF DEEP DRILLING PERFORMANCE--DEVELOPMENT AND BENCHMARK TESTING OF ADVANCED DIAMOND PRODUCT DRILL BITS AND HP/HT FLUIDS TO SIGNIFICANTLY IMPROVE RATES OF PENETRATION contract for the year starting October 2002 through September 2003. The industry cost-shared program aims to benchmark drilling rates of penetration in selected simulated deep formations and to significantly improve ROP through a team development of aggressive diamond product drill bit--fluid system technologies. Overall the objectives are as follows: Phase 1--Benchmark ''best in class'' diamond and other product drilling bits and fluids and develop concepts for a next level of deep drilling performance; Phase 2--Develop advanced smart bit--fluid prototypes and test at large scale; and Phase 3--Field trial smart bit--fluid concepts, modify as necessary and commercialize products. Accomplishments to date include the following: 4Q 2002--Project started; the industry team was assembled; a kick-off meeting was held at DOE Morgantown; 1Q 2003--An engineering meeting was held at Hughes Christensen, The Woodlands, Texas, to prepare preliminary plans for development and testing and to review equipment needs; operators started sending information regarding their needs for deep drilling challenges and priorities for the large-scale testing experimental matrix; Aramco joined the industry team as DEA 148 objectives paralleled the DOE project; 2Q 2003--Engineering and planning for high pressure drilling at TerraTek commenced; 3Q 2003--Continuation of engineering and design work for high pressure drilling at TerraTek; Baker Hughes INTEQ Drilling Fluids and Hughes Christensen commenced planning for Phase 1 testing--recommendations for bits and fluids.

  17. Benchmark of the local drift-kinetic models for neoclassical transport simulation in helical plasmas

    NASA Astrophysics Data System (ADS)

    Huang, B.; Satake, S.; Kanno, R.; Sugama, H.; Matsuoka, S.

    2017-02-01

    The benchmarks of neoclassical transport codes based on several local drift-kinetic models are reported here. The drift-kinetic models are zero orbit width (ZOW), zero magnetic drift, DKES-like, and global, as classified in Matsuoka et al. [Phys. Plasmas 22, 072511 (2015)]. The magnetic geometries of the Helically Symmetric Experiment, the Large Helical Device (LHD), and Wendelstein 7-X are employed in the benchmarks. It is found that the assumption of E×B incompressibility causes discrepancies in the neoclassical radial flux and parallel flow among the models when E×B is sufficiently large compared to the magnetic drift velocities, for example, for Mp≤0.4, where Mp is the poloidal Mach number. On the other hand, when E×B and the magnetic drift velocities are comparable, the tangential magnetic drift, which is included in both the global and ZOW models, fills the role of suppressing the unphysical peaking of neoclassical radial fluxes found in the other local models at Er≃0. In low collisionality plasmas, in particular, the tangential drift effect works well to suppress such unphysical behavior of the radial transport in the simulations. It is demonstrated that the ZOW model has the advantage of mitigating this unphysical behavior in the several magnetic geometries, and that it also allows evaluation of the bootstrap current in LHD at low computational cost compared to the global model.

  18. Performance Comparison of Big Data Analytics With NEXUS and Giovanni

    NASA Astrophysics Data System (ADS)

    Jacob, J. C.; Huang, T.; Lynnes, C.

    2016-12-01

    NEXUS is an emerging data-intensive analysis framework developed with a new approach for handling science data that enables large-scale data analysis. It is available as open source. We compare the performance of NEXUS and Giovanni for three statistics algorithms applied to NASA datasets. Giovanni is a statistics web service at NASA Distributed Active Archive Centers (DAACs). NEXUS is a cloud-computing environment developed at JPL and built on Apache Solr, Cassandra, and Spark. We compute a global time-averaged map, a correlation map, and an area-averaged time series. The first two algorithms average over time to produce a value for each pixel in a 2-D map. The third algorithm averages spatially to produce a single value for each time step. This talk reports benchmark comparison findings that indicate a 15x speedup with NEXUS over Giovanni in computing an area-averaged time series of daily precipitation rate for the Tropical Rainfall Measuring Mission (TRMM, 0.25 degree spatial resolution) over the Continental United States for 14 years (2000-2014), with 64-way parallelism and 545 tiles per granule. 16-way parallelism with 16 tiles per granule worked best with NEXUS for computing an 18-year (1998-2015) TRMM daily precipitation global time-averaged map (2.5x speedup) and an 18-year global map of the correlation between TRMM daily precipitation and TRMM real-time daily precipitation (7x speedup). These and other benchmark results will be presented along with key lessons learned in applying the NEXUS tiling approach to big data analytics in the cloud.
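
    The three statistics themselves are simple reductions over a (time, lat, lon) stack, which is why tiling and parallelism pay off at scale. A NumPy sketch on synthetic data (note that a real area average would weight pixels by the cosine of latitude, omitted here):

    ```python
    import numpy as np

    rng = np.random.default_rng(0)
    data = rng.gamma(2.0, 1.5, size=(100, 90, 180))   # (time, lat, lon) stack

    time_averaged_map = data.mean(axis=0)             # one value per pixel
    area_averaged_series = data.mean(axis=(1, 2))     # one value per time step
    # (A real area average would weight each row by cos(latitude).)

    other = data + rng.normal(0.0, 0.5, data.shape)   # second co-registered field
    a = data - data.mean(axis=0)                      # per-pixel correlation map
    b = other - other.mean(axis=0)
    corr_map = (a * b).sum(axis=0) / np.sqrt(
        (a ** 2).sum(axis=0) * (b ** 2).sum(axis=0))
    print(time_averaged_map.shape, area_averaged_series.shape, corr_map.shape)
    ```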

  19. Benchmark coupled-cluster g-tensor calculations with full inclusion of the two-particle spin-orbit contributions.

    PubMed

    Perera, Ajith; Gauss, Jürgen; Verma, Prakash; Morales, Jorge A

    2017-04-28

    We present a parallel implementation to compute electron spin resonance g-tensors at the coupled-cluster singles and doubles (CCSD) level which employs the ACES III domain-specific software tools for scalable parallel programming, i.e., the super instruction architecture language and processor (SIAL and SIP), respectively. A unique feature of the present implementation is the exact (not approximated) inclusion of the five one- and two-particle contributions to the g-tensor [i.e., the mass correction, one- and two-particle paramagnetic spin-orbit, and one- and two-particle diamagnetic spin-orbit terms]. Like a previous implementation with effective one-electron operators [J. Gauss et al., J. Phys. Chem. A 113, 11541-11549 (2009)], our implementation utilizes analytic CC second derivatives and therefore classifies as a true CC linear-response treatment. It can thus unambiguously appraise the accuracy of less costly effective one-particle schemes and provide a rationale for their widespread use. We have considered a large selection of radicals used previously for benchmarking purposes, including those studied in earlier work, and conclude that at the CCSD level the effective one-particle scheme satisfactorily captures the two-particle effects at a lower cost than the rigorous two-particle scheme. With respect to the performance of density functional theory (DFT), we note that results obtained with the B3LYP functional exhibit the best agreement with our CCSD results. However, in general, the CCSD results agree better with the experimental data than the best DFT/B3LYP results, although in most cases within the rather large experimental error bars.

  20. SCORPIO: A Scalable Two-Phase Parallel I/O Library With Application To A Large Scale Subsurface Simulator

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Sreepathi, Sarat; Sripathi, Vamsi; Mills, Richard T

    2013-01-01

    Inefficient parallel I/O is known to be a major bottleneck among scientific applications employed on supercomputers as the number of processor cores grows into the thousands. Our prior experience indicated that parallel I/O libraries such as HDF5 that rely on MPI-IO do not scale well beyond 10K processor cores, especially on parallel file systems (like Lustre) with a single point of resource contention. Our previous optimization efforts for a massively parallel multi-phase and multi-component subsurface simulator (PFLOTRAN) led to a two-phase I/O approach at the application level where a set of designated processes participate in the I/O process by splitting the I/O operation into a communication phase and a disk I/O phase. The designated I/O processes are created by splitting the MPI global communicator into multiple sub-communicators. The root process in each sub-communicator is responsible for performing the I/O operations for the entire group and then distributing the data to the rest of the group. This approach resulted in over 25X speedup in HDF I/O read performance and 3X speedup in write performance for PFLOTRAN at over 100K processor cores on the ORNL Jaguar supercomputer. This research describes the design and development of a general purpose parallel I/O library, SCORPIO (SCalable block-ORiented Parallel I/O), that incorporates our optimized two-phase I/O approach. The library provides a simplified higher-level abstraction to the user, sitting atop existing parallel I/O libraries (such as HDF5), and implements optimized I/O access patterns that can scale to larger numbers of processors. Performance results with standard benchmark problems and PFLOTRAN indicate that our library is able to maintain the same speedups as before with the added flexibility of being applicable to a wider range of I/O intensive applications.
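
    The two-phase pattern is easy to express with mpi4py: split the global communicator into subgroups, gather each group's data onto its root, and let only the roots touch the file system. This is a structural sketch only (one NumPy file per group rather than SCORPIO's single shared HDF5 file), and the group size is an arbitrary assumption:

    ```python
    from mpi4py import MPI
    import numpy as np

    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()
    GROUP = 16                                  # ranks per designated writer

    # Split the global communicator; rank 0 of each subgroup does the disk I/O.
    sub = comm.Split(color=rank // GROUP, key=rank)
    local = np.full(1000, rank, dtype=np.float64)

    # Phase 1 (communication): aggregate the group's data onto its root.
    gathered = sub.gather(local, root=0)

    # Phase 2 (disk I/O): only the group roots touch the file system.
    if sub.Get_rank() == 0:
        block = np.concatenate(gathered)
        np.save(f"out_group{rank // GROUP}.npy", block)
    ```

    Concentrating the writes in a few roots is what relieves the single point of contention on file systems like Lustre.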

  1. The effects of ultra-thin cerium fluoride film as the anode buffer layer on the electrical characteristics of organic light emitting diodes

    NASA Astrophysics Data System (ADS)

    Lu, Hsin-Wei; Tsai, Cheng-Che; Hong, Cheng-Shong; Kao, Po-Ching; Juang, Yung-Der; Chu, Sheng-Yuan

    2016-11-01

    In this study, the efficiency of organic light-emitting diodes (OLEDs) was enhanced by depositing a CeF3 film as an ultra-thin buffer layer between the indium tin oxide (ITO) electrode and α-naphthylphenylbiphenyldiamine (NPB) hole transport layer, with the structure configuration ITO/CeF3 (0.5, 1, and 1.5 nm)/α-naphthylphenylbiphenyl diamine (NPB) (40 nm)/tris(8-hydroxyquinoline) aluminum (Alq3) (60 nm)/lithium fluoride (LiF) (1 nm)/Al (150 nm). The enhancement mechanism was systematically investigated via several approaches. The X-ray photoelectron spectroscopy and ultraviolet photoelectron spectroscopy results revealed the formation of the UV-ozone treated CeF3 film. The work function increased from 4.8 eV (standard ITO electrode) to 5.22 eV (0.5-nm-thick UV-ozone treated CeF3 film deposited on the ITO electrode). The surface roughness of the UV-ozone treated CeF3 film was smoother than that of the standard ITO electrode. Further, the UV-ozone treated CeF3 film increased both the surface energy and polarity, as determined from contact angle measurements. In addition, admittance spectroscopy measurements showed an increased capacitance and conductance of the OLEDs. Accordingly, the turn-on voltage decreased from 4.2 V to 3.6 V at 1 mA/cm2, the luminance increased from 7588 cd/m2 to 24760 cd/m2, and the current efficiency increased from 3.2 cd/A to 3.8 cd/A when the 0.5-nm-thick UV-ozone treated CeF3 film was inserted into the OLEDs.

  2. Electrolytes in a nanometer slab-confinement: Ion-specific structure and solvation forces

    NASA Astrophysics Data System (ADS)

    Kalcher, Immanuel; Schulz, Julius C. F.; Dzubiella, Joachim

    2010-10-01

    We study the liquid structure and solvation forces of dense monovalent electrolytes (LiCl, NaCl, CsCl, and NaI) in a nanometer slab-confinement by explicit-water molecular dynamics (MD) simulations, implicit-water Monte Carlo (MC) simulations, and modified Poisson-Boltzmann (PB) theories. In order to consistently coarse-grain and to account for specific hydration effects in the implicit methods, realistic ion-ion and ion-surface pair potentials have been derived from infinite-dilution MD simulations. The electrolyte structure calculated from MC simulations is in good agreement with the corresponding MD simulations, thereby validating the coarse-graining approach. The agreement improves if a realistic, MD-derived dielectric constant is employed, which partially corrects for (water-mediated) many-body effects. Further analysis of the ionic structure and solvation pressure demonstrates that nonlocal extensions to PB (NPB) perform well for a wide parameter range when compared to MC simulations, whereas all local extensions mostly fail. A Barker-Henderson mapping of the ions onto a charged, asymmetric, and nonadditive binary hard-sphere mixture shows that the strength of structural correlations is strongly related to the magnitude and sign of the salt-specific nonadditivity. Furthermore, a grand canonical NPB analysis shows that the Donnan effect is dominated by steric correlations, whereas solvation forces and overcharging effects are mainly governed by ion-surface interactions. However, steric corrections to solvation forces are strongly repulsive for high concentrations and low surface charges, while overcharging can also be triggered by steric interactions in strongly correlated systems. Generally, we find that ion-surface and ion-ion correlations are strongly coupled and that coarse-grained methods should include both, the latter nonlocally and nonadditively (as given by our specific ionic diameters), when studying electrolytes in highly inhomogeneous situations.

  3. Tricolor microcavity OLEDs based on P-nc-Si:H films as the complex anodes

    NASA Astrophysics Data System (ADS)

    Yang, Li; Xingyuan, Liu; Chunya, Wu; Zhiguo, Meng; Yi, Wang; Shaozhen, Xiong

    2009-06-01

    A P+-nc-Si:H film (boron-doped nc-Si:H thin film) was used as a complex anode of an OLED. As an ideal candidate for the composite anode, the P+-nc-Si:H thin film has good conductivity with a high work function (~ 5.7 eV) and outstanding optical properties of high reflectivity, transmission, and very low absorption. As a result, the combination of the relatively high reflectivity of the P+-nc-Si:H film/ITO complex anode with the very high reflectivity of an Al cathode could form a micro-cavity structure with a certain Q to improve the efficiency of the OLED fabricated on it. An RGB pixel generated by microcavity OLEDs is beneficial for both the reduction of light loss and the improvement of color purity and efficiency. The small molecule Alq is used for the light-emitting layer (EML) of the MOLED, and the P+-nc-Si:H film is used as the complex anode, whose configuration can be constructed as Glass/LTO/P+-nc-Si:H/ITO/MoO3/NPB/Alq/LiF/Al. By adjusting the thickness of the organic layer NPB/Alq, the optical length of the microcavity and hence the RGB colors of the device can be obtained. The peak wavelengths of the OLEDs are located at 486, 550, and 608 nm, respectively. The CIE coordinates are (0.21, 0.45), (0.33, 0.63), and (0.54, 0.54), and the full widths at half maximum (FWHM) are 35, 32, and 39 nm for red, green, and blue, respectively.

  4. Scaling Semantic Graph Databases in Size and Performance

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Morari, Alessandro; Castellana, Vito G.; Villa, Oreste

    In this paper we present SGEM, a full software system for accelerating large-scale semantic graph databases on commodity clusters. Unlike current approaches, SGEM addresses semantic graph databases by employing only graph methods at all levels of the stack. On one hand, this allows exploiting the space efficiency of graph data structures and the inherent parallelism of graph algorithms. These features adapt well to the increasing system memory and core counts of modern commodity clusters. On the other hand, these systems are optimized for regular computation and batched data transfers, while graph methods usually are irregular and generate fine-grained data accesses with poor spatial and temporal locality. Our framework comprises a SPARQL-to-data-parallel-C compiler, a library of parallel graph methods and a custom, multithreaded runtime system. We introduce our stack, motivate its advantages with respect to other solutions and show how we solved the challenges posed by irregular behaviors. We present the results of our software stack on the Berlin SPARQL benchmarks with datasets of up to 10 billion triples (a triple corresponds to a graph edge), demonstrating scaling in dataset size and in performance as more nodes are added to the cluster.

  5. Parallel algorithms for quantum chemistry. I. Integral transformations on a hypercube multiprocessor

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Whiteside, R.A.; Binkley, J.S.; Colvin, M.E.

    1987-02-15

    For many years it has been recognized that fundamental physical constraints such as the speed of light will limit the ultimate speed of single processor computers to less than about three billion floating point operations per second (3 GFLOPS). This limitation is becoming increasingly restrictive as commercially available machines are now within an order of magnitude of this asymptotic limit. A natural way to avoid this limit is to harness together many processors to work on a single computational problem. In principle, these parallel processing computers have speeds limited only by the number of processors one chooses to acquire. The usefulness of potentially unlimited processing speed to a computationally intensive field such as quantum chemistry is obvious. If these methods are to be applied to significantly larger chemical systems, parallel schemes will have to be employed. For this reason we have developed distributed-memory algorithms for a number of standard quantum chemical methods. We are currently implementing these on a 32-processor Intel hypercube. In this paper we present our algorithm and benchmark results for one of the bottleneck steps in quantum chemical calculations: the four-index integral transformation.
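
    The four-index transformation is a natural distributed-memory target because it factors into four successive quarter-transformations, each an independent tensor contraction, for an overall O(N^5) cost instead of the naive O(N^8). A serial NumPy sketch of that factorization (random stand-ins, not real integrals):

    ```python
    import numpy as np

    def ao_to_mo(eri_ao, C):
        # Four quarter-transformations: (ab|cd) in the AO basis -> (pq|rs) in
        # the MO basis using coefficients C. Each step contracts one AO index,
        # so the total cost is O(N^5) rather than O(N^8).
        t = np.einsum('ap,abcd->pbcd', C, eri_ao)
        t = np.einsum('bq,pbcd->pqcd', C, t)
        t = np.einsum('cr,pqcd->pqrd', C, t)
        return np.einsum('ds,pqrd->pqrs', C, t)

    rng = np.random.default_rng(0)
    n = 8
    eri_ao = rng.normal(size=(n, n, n, n))          # mock AO integrals
    C = np.linalg.qr(rng.normal(size=(n, n)))[0]    # orthonormal mock MO coeffs
    print(ao_to_mo(eri_ao, C).shape)                # (8, 8, 8, 8)
    ```

    On a hypercube, the intermediate tensors would be partitioned across processors, with each contraction step performed on local blocks.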

  6. Computational performance of a smoothed particle hydrodynamics simulation for shared-memory parallel computing

    NASA Astrophysics Data System (ADS)

    Nishiura, Daisuke; Furuichi, Mikito; Sakaguchi, Hide

    2015-09-01

    The computational performance of a smoothed particle hydrodynamics (SPH) simulation is investigated for three types of current shared-memory parallel computer devices: many integrated core (MIC) processors, graphics processing units (GPUs), and multi-core CPUs. We are especially interested in efficient shared-memory allocation methods for each chipset, because the efficient data access patterns differ between compute unified device architecture (CUDA) programming for GPUs and OpenMP programming for MIC processors and multi-core CPUs. We first introduce several parallel implementation techniques for the SPH code, and then examine these on our target computer architectures to determine the most effective algorithms for each processor unit. In addition, we evaluate the effective computing performance and power efficiency of the SPH simulation on each architecture, as these are critical metrics for overall performance in a multi-device environment. In our benchmark test, the GPU is found to produce the best arithmetic performance as a standalone device unit, and gives the most efficient power consumption. The multi-core CPU obtains the most effective computing performance. The computational speed of the MIC processor on Xeon Phi approached that of two Xeon CPUs. This indicates that using MICs is an attractive choice for existing SPH codes on multi-core CPUs parallelized by OpenMP, as it gains computational acceleration without the need for significant changes to the source code.

  7. Parallelization of Lower-Upper Symmetric Gauss-Seidel Method for Chemically Reacting Flow

    NASA Technical Reports Server (NTRS)

    Yoon, Seokkwan; Jost, Gabriele; Chang, Sherry

    2005-01-01

    Development of technologies for exploration of the solar system has revived an interest in computational simulation of chemically reacting flows, since planetary probe vehicles exhibit non-equilibrium phenomena during the atmospheric entry of a planet or a moon as well as during reentry to the Earth. Stability in combustion is essential for new propulsion systems. Numerical solution of real-gas flows often increases computational work by an order of magnitude compared to perfect gas flow, partly because of the increased complexity of the equations to solve. Recently, as part of Project Columbia, NASA has integrated a cluster of interconnected SGI Altix systems to provide a ten-fold increase in current supercomputing capacity that includes an SGI Origin system. Both the new and existing machines are based on a cache coherent non-uniform memory access architecture. The Lower-Upper Symmetric Gauss-Seidel (LU-SGS) relaxation method has been implemented in both perfect and real gas flow codes, including the Real-Gas Aerodynamic Simulator (RGAS). However, the vectorized RGAS code runs inefficiently on cache-based shared-memory machines such as the SGI systems. Parallelization of a Gauss-Seidel method is nontrivial due to its sequential nature. The LU-SGS method has been vectorized on an oblique plane in the INS3D-LU code, which has been one of the base codes for the NAS Parallel Benchmarks. The oblique plane has been called a hyperplane by computer scientists. It is straightforward to parallelize a Gauss-Seidel method by partitioning the hyperplanes once they are formed. Another way of parallelizing is to schedule processors like a pipeline using software. Both hyperplane and pipeline methods have been implemented using OpenMP directives. The present paper reports the performance of the parallelized RGAS code on SGI Origin and Altix systems.
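
    The hyperplane idea is easiest to see in two dimensions: for a five-point stencil swept in Gauss-Seidel order, every point on a wavefront i + j = const depends only on earlier wavefronts, so all points within one wavefront can be updated concurrently. A serial NumPy sketch of the ordering (the inner loop is where OpenMP-style parallelism would go); this is an illustration of the general technique, not the RGAS implementation:

    ```python
    import numpy as np

    def hyperplane_gauss_seidel(u, f, h):
        # Gauss-Seidel for the 5-point Laplacian, ordered by wavefronts
        # i + j = const. Points on one wavefront are mutually independent,
        # so the inner loop may run in parallel.
        n = u.shape[0]
        for plane in range(2, 2 * n - 3):
            for i in range(max(1, plane - n + 2), min(n - 2, plane - 1) + 1):
                j = plane - i
                u[i, j] = 0.25 * (u[i - 1, j] + u[i + 1, j]
                                  + u[i, j - 1] + u[i, j + 1] + h * h * f[i, j])
        return u

    n = 33
    u = np.zeros((n, n))
    f = np.ones((n, n))                      # -Laplacian(u) = 1, u = 0 on boundary
    for _ in range(500):
        u = hyperplane_gauss_seidel(u, f, 1.0 / (n - 1))
    print(u[n // 2, n // 2])                 # approaches ~0.0737 at the center
    ```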

  8. Quantitative Image Feature Engine (QIFE): an Open-Source, Modular Engine for 3D Quantitative Feature Extraction from Volumetric Medical Images.

    PubMed

    Echegaray, Sebastian; Bakr, Shaimaa; Rubin, Daniel L; Napel, Sandy

    2017-10-06

    The aim of this study was to develop an open-source, modular, locally run or server-based system for 3D radiomics feature computation that can be used on any computer system and included in existing workflows for understanding associations and building predictive models between image features and clinical data, such as survival. The QIFE exploits various levels of parallelization for use on multiprocessor systems. It consists of a managing framework and four stages: input, pre-processing, feature computation, and output. Each stage contains one or more swappable components, allowing run-time customization. We benchmarked the engine using various levels of parallelization on a cohort of CT scans presenting 108 lung tumors. Two versions of the QIFE have been released: (1) the open-source MATLAB code posted to GitHub, and (2) a compiled version loaded in a Docker container, posted to DockerHub, which can be easily deployed on any computer. The QIFE processed 108 objects (tumors) in 2:12 (h:mm) using one core, and in 1:04 (h:mm) using four cores with object-level parallelization. We developed the Quantitative Image Feature Engine (QIFE), an open-source feature-extraction framework that focuses on modularity, standards, parallelism, provenance, and integration. Researchers can easily integrate it with their existing segmentation and imaging workflows by creating input and output components that implement their existing interfaces. Computational efficiency can be improved by parallelizing execution at the cost of memory usage. Different parallelization levels provide different trade-offs, and the optimal setting will depend on the size and composition of the dataset to be processed.

  9. Simple techniques for improving deep neural network outcomes on commodity hardware

    NASA Astrophysics Data System (ADS)

    Colina, Nicholas Christopher A.; Perez, Carlos E.; Paraan, Francis N. C.

    2017-08-01

    We benchmark improvements in the performance of deep neural networks (DNNs) on the MNIST data set upon implementing two simple modifications to the algorithm that have little computational overhead. The first is GPU parallelization on a commodity graphics card, and the second is initializing the DNN with random orthogonal weight matrices prior to optimization. Eigenspectrum analysis of the weight matrices reveals that the initially orthogonal matrices remain nearly orthogonal after training. The probability distributions from which these orthogonal matrices are drawn are also shown to significantly affect the performance of these deep neural networks.
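
    Orthogonal initialization itself is essentially a QR decomposition of a random matrix. A NumPy sketch follows (the sign fix on R's diagonal is one common convention that makes the draw uniform over the orthogonal group; the paper additionally compares different source distributions):

    ```python
    import numpy as np

    def orthogonal_init(n_out, n_in, rng):
        # Draw a Gaussian matrix and orthogonalize it via QR; multiplying each
        # column by the sign of R's diagonal removes the sign ambiguity and
        # yields a Haar-uniform orthogonal matrix.
        a = rng.normal(size=(n_out, n_in))
        q, r = np.linalg.qr(a)
        q *= np.sign(np.diag(r))
        return q

    rng = np.random.default_rng(0)
    W = orthogonal_init(256, 256, rng)
    print(np.allclose(W.T @ W, np.eye(256), atol=1e-10))   # columns orthonormal
    ```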

  10. GAPD: a GPU-accelerated atom-based polychromatic diffraction simulation code.

    PubMed

    E, J C; Wang, L; Chen, S; Zhang, Y Y; Luo, S N

    2018-03-01

    GAPD, a graphics-processing-unit (GPU)-accelerated atom-based polychromatic diffraction simulation code for direct, kinematics-based, simulations of X-ray/electron diffraction of large-scale atomic systems with mono-/polychromatic beams and arbitrary plane detector geometries, is presented. This code implements GPU parallel computation via both real- and reciprocal-space decompositions. With GAPD, direct simulations are performed of the reciprocal lattice node of ultralarge systems (∼5 billion atoms) and diffraction patterns of single-crystal and polycrystalline configurations with mono- and polychromatic X-ray beams (including synchrotron undulator sources), and validation, benchmark and application cases are presented.
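
    The kinematic core of such a code is a direct structure-factor sum over atoms. A NumPy sketch for a monochromatic beam and a scalar atomic form factor (a polychromatic pattern would be a spectrum-weighted sum of such evaluations; the positions and q-vectors below are random placeholders):

    ```python
    import numpy as np

    def kinematic_intensity(positions, f_atom, q_vectors):
        # I(q) = | sum_j f_j exp(i q . r_j) |^2 -- every (q, atom) phase term
        # is independent, which is what maps so well onto GPU threads.
        phases = np.exp(1j * (q_vectors @ positions.T))   # (n_q, n_atoms)
        amplitude = f_atom * phases.sum(axis=1)
        return np.abs(amplitude) ** 2

    rng = np.random.default_rng(0)
    atoms = rng.uniform(0.0, 10.0, size=(5000, 3))        # toy atomic positions
    q = rng.normal(size=(100, 3))                         # scattering vectors
    print(kinematic_intensity(atoms, 1.0, q).shape)       # (100,)
    ```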

  11. Vector radiative transfer code SORD: Performance analysis and quick start guide

    NASA Astrophysics Data System (ADS)

    Korkin, Sergey; Lyapustin, Alexei; Sinyuk, Alexander; Holben, Brent; Kokhanovsky, Alexander

    2017-10-01

    We present a new open source polarized radiative transfer code SORD written in Fortran 90/95. SORD numerically simulates propagation of monochromatic solar radiation in a plane-parallel atmosphere over a reflecting surface using the method of successive orders of scattering (hence the name). Thermal emission is ignored. We did not improve the method in any way, but report the accuracy and runtime in 52 benchmark scenarios. This paper also serves as a quick start user's guide for the code available from ftp://maiac.gsfc.nasa.gov/pub/skorkin, from the JQSRT website, or from the corresponding (first) author.

  12. Time Dependent Simulation of Turbopump Flows

    NASA Technical Reports Server (NTRS)

    Kiris, Cetin C.; Kwak, Dochan; Chan, William; Williams, Robert

    2001-01-01

    The objective of this viewgraph presentation is to enhance incompressible flow simulation capability for developing aerospace vehicle components, especially unsteady flow phenomena associated with high speed turbo pumps. Unsteady Space Shuttle Main Engine (SSME)-rig1 1 1/2 rotations are completed for the 34.3 million grid points model. The moving boundary capability is obtained by using the DCF module. MLP shared memory parallelism has been implemented and benchmarked in INS3D. The scripting capability from CAD geometry to solution is developed. Data compression is applied to reduce data size in post processing and fluid/structure coupling is initiated.

  13. A quantum physical design flow using ILP and graph drawing

    NASA Astrophysics Data System (ADS)

    Yazdani, Maryam; Saheb Zamani, Morteza; Sedighi, Mehdi

    2013-10-01

    Implementing large-scale quantum circuits is one of the challenges of quantum computing. One of the central challenges of accurately modeling the architecture of these circuits is to schedule a quantum application and generate the layout while taking into account the cost of communications and classical resources as well as the maximum exploitable parallelism. In this paper, we present and evaluate a design flow for arbitrary quantum circuits in ion trap technology. Our design flow consists of two parts. First, a scheduler takes a description of a circuit and finds the best order for the execution of its quantum gates using integer linear programming regarding the classical resources (qubits) and instruction dependencies. Then a layout generator receives the schedule produced by the scheduler and generates a layout for this circuit using a graph-drawing algorithm. Our experimental results show that the proposed flow decreases the average latency of quantum circuits by about 11 % for a set of attempted benchmarks and by about 9 % for another set of benchmarks compared with the best in literature.

  14. Implementation and benchmark of a long-range corrected functional in the density functional based tight-binding method

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Lutsker, V.; Niehaus, T. A., E-mail: thomas.niehaus@physik.uni-regensburg.de; Aradi, B.

    2015-11-14

    Bridging the gap between first principles methods and empirical schemes, the density functional based tight-binding method (DFTB) has become a versatile tool in predictive atomistic simulations over the past years. One of the major restrictions of this method is the limitation to local or gradient corrected exchange-correlation functionals. This excludes the important class of hybrid or long-range corrected functionals, which are advantageous in thermochemistry, as well as in the computation of vibrational, photoelectron, and optical spectra. The present work provides a detailed account of the implementation of DFTB for a long-range corrected functional in generalized Kohn-Sham theory. We apply the method to a set of organic molecules and compare ionization potentials and electron affinities with the original DFTB method and higher level theory. The new scheme cures the significant overpolarization in electric fields found for local DFTB, which parallels the functional dependence in first principles density functional theory (DFT). At the same time, the computational savings with respect to full DFT calculations are not compromised as evidenced by numerical benchmark data.

  15. Free Energy Reconstruction from Logarithmic Mean-Force Dynamics Using Multiple Nonequilibrium Trajectories.

    PubMed

    Morishita, Tetsuya; Yonezawa, Yasushige; Ito, Atsushi M

    2017-07-11

    Efficient and reliable estimation of the mean force (MF), the derivatives of the free energy with respect to a set of collective variables (CVs), has been a challenging problem because free energy differences are often computed by integrating the MF. Among various methods for computing free energy differences, logarithmic mean-force dynamics (LogMFD) [Morishita et al., Phys. Rev. E 2012, 85, 066702] invokes the conservation law in classical mechanics to integrate the MF, which allows us to estimate the free energy profile along the CVs on-the-fly. Here, we present a method called parallel dynamics, which improves the estimation of the MF by employing multiple replicas of the system and is straightforwardly incorporated in LogMFD or a related method. In the parallel dynamics, the MF is evaluated by a nonequilibrium path-ensemble using the multiple replicas based on the Crooks-Jarzynski nonequilibrium work relation. Thanks to the Crooks relation, realizing full-equilibrium states is no longer mandatory for estimating the MF. Additionally, sampling in the hidden subspace orthogonal to the CV space is highly improved with appropriate weights for each metastable state (if any), which is hardly achievable by typical free energy computational methods. We illustrate how to implement parallel dynamics by combining it with LogMFD, which we call logarithmic parallel dynamics (LogPD). Biosystems of alanine dipeptide and adenylate kinase in explicit water are employed as benchmark systems to which LogPD is applied to demonstrate the effect of multiple replicas on the accuracy and efficiency in estimating the free energy profiles using parallel dynamics.
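
    As a minimal numerical illustration of the underlying Jarzynski relation (not the authors' LogPD implementation), the sketch below reconstructs a free energy difference from synthetic nonequilibrium work values gathered over multiple replicas:

      import numpy as np

      kT = 2.494                       # kJ/mol at ~300 K
      rng = np.random.default_rng(0)
      W = rng.normal(12.0, 3.0, 100)   # hypothetical work values from 100 replicas

      # Jarzynski equality: exp(-dF/kT) = < exp(-W/kT) > over the replica ensemble
      dF = -kT * np.log(np.mean(np.exp(-W / kT)))
      print(f"estimated dF = {dF:.2f} kJ/mol (naive mean of W = {W.mean():.2f})")

    For a Gaussian work distribution this estimate approaches mean(W) - var(W)/(2 kT), showing how the exponential average discounts dissipated work.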

  16. PARALLEL HOP: A SCALABLE HALO FINDER FOR MASSIVE COSMOLOGICAL DATA SETS

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Skory, Stephen; Turk, Matthew J.; Norman, Michael L.

    2010-11-15

    Modern N-body cosmological simulations contain billions (10^9) of dark matter particles. These simulations require hundreds to thousands of gigabytes of memory and employ hundreds to tens of thousands of processing cores on many compute nodes. In order to study the distribution of dark matter in a cosmological simulation, the dark matter halos must be identified using a halo finder, which establishes the halo membership of every particle in the simulation. The resources required for halo finding are similar to the requirements for the simulation itself. In particular, simulations have become too extensive for commonly employed halo finders, such that the computational requirements to identify halos must now be spread across multiple nodes and cores. Here, we present a scalable parallel halo finding method called Parallel HOP for large-scale cosmological simulation data. Based on the halo finder HOP, it utilizes the message passing interface and domain decomposition to distribute the halo finding workload across multiple compute nodes, enabling analysis of much larger data sets than is possible with the strictly serial or previous parallel implementations of HOP. We provide a reference implementation of this method as a part of the toolkit yt, an analysis toolkit for adaptive mesh refinement data that includes complementary analysis modules. Additionally, we discuss a suite of benchmarks that demonstrate that this method scales well up to several hundred tasks and data sets in excess of 2000^3 particles. The Parallel HOP method and our implementation can be readily applied to any kind of N-body simulation data and is therefore widely applicable.
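
    The sketch below shows the domain-decomposition idea in miniature with mpi4py: the root task slices a toy particle set into spatial slabs, each task identifies halos locally, and a reduction assembles the global count. A real halo finder also exchanges ghost-zone particles at slab boundaries; the per-slab "halo count" here is a placeholder.

      from mpi4py import MPI
      import numpy as np

      comm = MPI.COMM_WORLD
      rank, size = comm.Get_rank(), comm.Get_size()

      if rank == 0:
          pos = np.random.default_rng(1).random((100_000, 3))   # toy particles
          slabs = [pos[(pos[:, 0] >= r / size) & (pos[:, 0] < (r + 1) / size)]
                   for r in range(size)]
      else:
          slabs = None
      local = comm.scatter(slabs, root=0)        # one spatial slab per task

      n_local_halos = local.shape[0] // 1000     # stand-in for real HOP grouping
      total = comm.reduce(n_local_halos, op=MPI.SUM, root=0)
      if rank == 0:
          print("halo count (toy):", total)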

  17. Constructing Neuronal Network Models in Massively Parallel Environments.

    PubMed

    Ippen, Tammo; Eppler, Jochen M; Plesser, Hans E; Diesmann, Markus

    2017-01-01

    Recent advances in the development of data structures to represent spiking neuron network models enable us to exploit the complete memory of petascale computers for a single brain-scale network simulation. In this work, we investigate how well we can exploit the computing power of such supercomputers for the creation of neuronal networks. Using an established benchmark, we divide the runtime of simulation code into the phase of network construction and the phase during which the dynamical state is advanced in time. We find that on multi-core compute nodes network creation scales well with process-parallel code but exhibits a prohibitively large memory consumption. Thread-parallel network creation, in contrast, exhibits speedup only up to a small number of threads but has little overhead in terms of memory. We further observe that the algorithms creating instances of model neurons and their connections scale well for networks of ten thousand neurons, but do not show the same speedup for networks of millions of neurons. Our work uncovers that the lack of scaling of thread-parallel network creation is due to inadequate memory allocation strategies and demonstrates that thread-optimized memory allocators recover excellent scaling. An analysis of the loop order used for network construction reveals that more complex tests on the locality of operations significantly improve scaling and reduce runtime by allowing construction algorithms to step through large networks more efficiently than in existing code. The combination of these techniques increases performance by an order of magnitude and harnesses the increasingly parallel compute power of the compute nodes in high-performance clusters and supercomputers.

  18. Constructing Neuronal Network Models in Massively Parallel Environments

    PubMed Central

    Ippen, Tammo; Eppler, Jochen M.; Plesser, Hans E.; Diesmann, Markus

    2017-01-01

    Recent advances in the development of data structures to represent spiking neuron network models enable us to exploit the complete memory of petascale computers for a single brain-scale network simulation. In this work, we investigate how well we can exploit the computing power of such supercomputers for the creation of neuronal networks. Using an established benchmark, we divide the runtime of simulation code into the phase of network construction and the phase during which the dynamical state is advanced in time. We find that on multi-core compute nodes network creation scales well with process-parallel code but exhibits a prohibitively large memory consumption. Thread-parallel network creation, in contrast, exhibits speedup only up to a small number of threads but has little overhead in terms of memory. We further observe that the algorithms creating instances of model neurons and their connections scale well for networks of ten thousand neurons, but do not show the same speedup for networks of millions of neurons. Our work uncovers that the lack of scaling of thread-parallel network creation is due to inadequate memory allocation strategies and demonstrates that thread-optimized memory allocators recover excellent scaling. An analysis of the loop order used for network construction reveals that more complex tests on the locality of operations significantly improve scaling and reduce runtime by allowing construction algorithms to step through large networks more efficiently than in existing code. The combination of these techniques increases performance by an order of magnitude and harnesses the increasingly parallel compute power of the compute nodes in high-performance clusters and supercomputers. PMID:28559808

  19. Computational tools and lattice design for the PEP-II B-Factory

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Cai, Y.; Irwin, J.; Nosochkov, Y.

    1997-02-01

    Several accelerator codes were used to design the PEP-II lattices, ranging from matrix-based codes, such as MAD and DIMAD, to symplectic-integrator codes, such as TRACY and DESPOT. In addition to element-by-element tracking, we constructed maps to determine aberration strengths. Furthermore, we have developed a fast and reliable method (nPB tracking) to track particles with a one-turn map. This new technique allows us to evaluate performance of the lattices on the entire tune-plane. Recently, we designed and implemented an object-oriented code in C++ called LEGO which integrates and expands upon TRACY and DESPOT. © 1997 American Institute of Physics.
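
    To make the one-turn-map idea concrete, here is a toy sketch (not the nPB code itself): rather than stepping a particle element by element, each turn is a single application of a precomputed map, here a linear rotation in (x, x') phase space with an assumed tune.

      import numpy as np

      mu = 2 * np.pi * 0.31                       # fractional tune (assumed)
      M = np.array([[np.cos(mu), np.sin(mu)],
                    [-np.sin(mu), np.cos(mu)]])   # linear one-turn matrix

      z = np.array([1e-3, 0.0])                   # initial (x [m], x'), illustrative
      for turn in range(1024):                    # one matrix-vector product per turn
          z = M @ z
      print("amplitude after 1024 turns:", np.hypot(*z))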

  20. Computational tools and lattice design for the PEP-II B-Factory

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Cai Yunhai; Irwin, John; Nosochkov, Yuri

    1997-02-01

    Several accelerator codes were used to design the PEP-II lattices, ranging from matrix-based codes, such as MAD and DIMAD, to symplectic-integrator codes, such as TRACY and DESPOT. In addition to element-by-element tracking, we constructed maps to determine aberration strengths. Furthermore, we have developed a fast and reliable method (nPB tracking) to track particles with a one-turn map. This new technique allows us to evaluate performance of the lattices on the entire tune-plane. Recently, we designed and implemented an object-oriented code in C++ called LEGO which integrates and expands upon TRACY and DESPOT.

  1. PHISICS/RELAP5-3D RESULTS FOR EXERCISES II-1 AND II-2 OF THE OECD/NEA MHTGR-350 BENCHMARK

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Strydom, Gerhard

    2016-03-01

    The Idaho National Laboratory (INL) Advanced Reactor Technologies (ART) High-Temperature Gas-Cooled Reactor (HTGR) Methods group currently leads the Modular High-Temperature Gas-Cooled Reactor (MHTGR) 350 benchmark. The benchmark consists of a set of lattice-depletion, steady-state, and transient problems that can be used by HTGR simulation groups to assess the performance of their code suites. The paper summarizes the results obtained for the first two transient exercises defined for Phase II of the benchmark. The Parallel and Highly Innovative Simulation for INL Code System (PHISICS), coupled with the INL system code RELAP5-3D, was used to generate the results for the Depressurized Conduction Cooldown (DCC) (exercise II-1a) and Pressurized Conduction Cooldown (PCC) (exercise II-2) transients. These exercises require the time-dependent simulation of coupled neutronics and thermal-hydraulics phenomena, and utilize the steady-state solution previously obtained for exercise I-3 of Phase I. This paper also includes a comparison of the benchmark results obtained with a traditional system code “ring” model against a more detailed “block” model that includes kinetics feedback on an individual block level and thermal feedback on a triangular sub-mesh. The higher spatial fidelity that can be obtained by the block model is illustrated with comparisons of the maximum fuel temperatures, especially in the case of the natural convection conditions that dominate the DCC and PCC events. Differences of up to 125 K (or 10%) were observed between the ring and block model predictions of the DCC transient, mostly due to the block model’s capability of tracking individual block decay powers and more detailed helium flow distributions. In general, the block model only required DCC and PCC calculation times twice as long as the ring models, and it therefore seems that the additional development and calculation time required for the block model could be worth the gain in spatial resolution.

  2. VVER-440 and VVER-1000 reactor dosimetry benchmark - BUGLE-96 versus ALPAN VII.0

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Duo, J. I.

    2011-07-01

    Document available in abstract form only; full text of document follows: Analytical results of the vodo-vodyanoi energetichesky reactor (VVER)-440 and VVER-1000 reactor dosimetry benchmarks, developed from engineering mockups at the Nuclear Research Inst. Rez LR-0 reactor, are discussed. These benchmarks provide accurate determination of radiation field parameters in the vicinity and over the thickness of the reactor pressure vessel. Measurements are compared to calculated results with two sets of tools: the TORT discrete ordinates code and BUGLE-96 cross-section library versus the newly developed Westinghouse RAPTOR-M3G and ALPAN VII.0. The parallel code RAPTOR-M3G enables computing detailed neutron distributions in energy and space in reduced computational time. The ALPAN VII.0 cross-section library is based on ENDF/B-VII.0 and is designed for reactor dosimetry applications. It uses a unique broad group structure to enhance resolution in the thermal-neutron-energy range compared to other analogous libraries. The comparison of fast neutron (E > 0.5 MeV) results shows good agreement (within 10%) between the BUGLE-96 and ALPAN VII.0 libraries. Furthermore, the results compare well with analogous results of participants of the REDOS program (2005). Finally, the analytical results for fast neutrons agree within 15% with the measurements for most locations in all three mockups. In general, however, the analytical results underestimate the attenuation through the reactor pressure vessel thickness compared to the measurements. (authors)

  3. IPRT polarized radiative transfer model intercomparison project - Three-dimensional test cases (phase B)

    NASA Astrophysics Data System (ADS)

    Emde, Claudia; Barlakas, Vasileios; Cornet, Céline; Evans, Frank; Wang, Zhen; Labonotte, Laurent C.; Macke, Andreas; Mayer, Bernhard; Wendisch, Manfred

    2018-04-01

    Initially unpolarized solar radiation becomes polarized by scattering in the Earth's atmosphere. Molecular (Rayleigh) scattering in particular polarizes electromagnetic radiation, but so does scattering of radiation by aerosols, cloud droplets (Mie scattering), and ice crystals. Each atmospheric constituent produces a characteristic polarization signal, thus spectro-polarimetric measurements are frequently employed for remote sensing of aerosol and cloud properties. Retrieval algorithms require efficient radiative transfer models. Usually, these apply the plane-parallel approximation (PPA), assuming that the atmosphere consists of horizontally homogeneous layers, which allows the vector radiative transfer equation (VRTE) to be solved efficiently. For remote sensing applications, the radiance is considered constant over the instantaneous field-of-view of the instrument, and each sensor element is treated independently in the plane-parallel approximation, neglecting horizontal radiation transport between adjacent pixels (Independent Pixel Approximation, IPA). In order to estimate the errors due to the IPA, three-dimensional (3D) vector radiative transfer models are required. So far, only a few such models exist. Therefore, the International Polarized Radiative Transfer (IPRT) working group of the International Radiation Commission (IRC) has initiated a model intercomparison project in order to provide benchmark results for polarized radiative transfer. The group has already performed an intercomparison for one-dimensional (1D) multi-layer test cases [phase A, 1]. This paper presents the continuation of the intercomparison project (phase B) for 2D and 3D test cases: a step cloud, a cubic cloud, and a more realistic scenario including a 3D cloud field generated by a Large Eddy Simulation (LES) model and typical background aerosols. The commonly established benchmark results for 3D polarized radiative transfer are available at the IPRT website (http://www.meteo.physik.uni-muenchen.de/iprt).

  4. Computation of the free energy due to electron density fluctuation of a solute in solution: A QM/MM method with perturbation approach combined with a theory of solutions

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Suzuoka, Daiki; Takahashi, Hideaki, E-mail: hideaki@m.tohoku.ac.jp; Morita, Akihiro

    2014-04-07

    We developed a perturbation approach to compute solvation free energy Δμ within the framework of the QM (quantum mechanical)/MM (molecular mechanical) method combined with a theory of energy representation (QM/MM-ER). The energy shift η of the whole system due to the electronic polarization of the solute is evaluated using second-order perturbation theory (PT2), where the electric field formed by surrounding solvent molecules is treated as the perturbation to the electronic Hamiltonian of the isolated solute. The point of our approach is that the energy shift η, thus obtained, is adopted as a novel energy coordinate of the distribution functions which serve as fundamental variables in the free energy functional developed in our previous work. The most time-consuming part of the QM/MM-ER simulation can thus be avoided without serious loss of accuracy. For our benchmark set of molecules, it is demonstrated that the PT2 approach coupled with QM/MM-ER gives hydration free energies in excellent agreement with those given by the conventional method utilizing the Kohn-Sham SCF procedure, except for a few molecules in the benchmark set. A variant of the approach is also proposed to deal with the difficulties associated with the problematic systems. The present approach is also advantageous for parallel implementations. We examined the parallel efficiency of our PT2 code on multi-core processors and found that the speedup increases almost linearly with respect to the number of cores. Thus, it was demonstrated that QM/MM-ER coupled with PT2 deserves practical applications to systems of interest.
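
    A toy version of the second-order perturbative energy shift reads as follows; the spectrum and coupling matrix elements are synthetic stand-ins for the solute Hamiltonian and the solvent electric field perturbation.

      import numpy as np

      rng = np.random.default_rng(2)
      E = np.concatenate(([0.0], np.sort(rng.uniform(0.5, 3.0, 9))))  # E0, excited
      V0n = rng.normal(0.0, 0.05, 10)     # <0|V|n> couplings (hypothetical)
      V0n[0] = 0.03                       # <0|V|0> gives the first-order shift

      # eta ~ <0|V|0> + sum_n |<0|V|n>|^2 / (E0 - En)
      eta = V0n[0] + np.sum(V0n[1:] ** 2 / (E[0] - E[1:]))
      print(f"PT2 energy shift eta = {eta:.5f} (toy units)")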

  5. HACC: Extreme Scaling and Performance Across Diverse Architectures

    NASA Astrophysics Data System (ADS)

    Habib, Salman; Morozov, Vitali; Frontiere, Nicholas; Finkel, Hal; Pope, Adrian; Heitmann, Katrin

    2013-11-01

    Supercomputing is evolving towards hybrid and accelerator-based architectures with millions of cores. The HACC (Hardware/Hybrid Accelerated Cosmology Code) framework exploits this diverse landscape at the largest scales of problem size, obtaining high scalability and sustained performance. Developed to satisfy the science requirements of cosmological surveys, HACC melds particle and grid methods using a novel algorithmic structure that flexibly maps across architectures, including CPU/GPU, multi/many-core, and Blue Gene systems. We demonstrate the success of HACC on two very different machines, the CPU/GPU system Titan and the BG/Q systems Sequoia and Mira, attaining unprecedented levels of scalable performance. We demonstrate strong and weak scaling on Titan, obtaining up to 99.2% parallel efficiency, evolving 1.1 trillion particles. On Sequoia, we reach 13.94 PFlops (69.2% of peak) and 90% parallel efficiency on 1,572,864 cores, with 3.6 trillion particles, the largest cosmological benchmark yet performed. HACC design concepts are applicable to several other supercomputer applications.

  6. Static analysis techniques for semiautomatic synthesis of message passing software skeletons

    DOE PAGES

    Sottile, Matthew; Dagit, Jason; Zhang, Deli; ...

    2015-06-29

    The design of high-performance computing architectures demands performance analysis of large-scale parallel applications to derive various parameters concerning hardware design and software development. The process of performance analysis and benchmarking an application can be done in several ways with varying degrees of fidelity. One of the most cost-effective ways is to do a coarse-grained study of large-scale parallel applications through the use of program skeletons. The concept of a “program skeleton” that we discuss in this article is an abstracted program that is derived from a larger program where source code that is determined to be irrelevant is removed for the purposes of the skeleton. In this work, we develop a semiautomatic approach for extracting program skeletons based on compiler program analysis. Finally, we demonstrate correctness of our skeleton extraction process by comparing details from communication traces, as well as show the performance speedup of using skeletons by running simulations in the SST/macro simulator.
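
    The flavor of compiler-based skeleton extraction can be sketched in a few lines of Python using the ast module: statements that never touch a (hypothetical) mpi module are treated as irrelevant local computation and dropped, leaving the communication skeleton. A real tool would of course preserve the control flow and data dependencies feeding the communication calls.

      import ast, textwrap

      src = textwrap.dedent("""
          import mpi
          def step(a):
              a = a * 2 + 1          # local compute: dropped from the skeleton
              mpi.send(a, dest=1)    # communication: kept
              b = a ** 0.5           # local compute: dropped
              return mpi.recv(source=1)
      """)

      class Skeletonize(ast.NodeTransformer):
          def visit_Assign(self, node):
              # keep only assignments that mention the `mpi` module
              touches_mpi = any(isinstance(n, ast.Name) and n.id == "mpi"
                                for n in ast.walk(node))
              return node if touches_mpi else None

      tree = Skeletonize().visit(ast.parse(src))
      print(ast.unparse(tree))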

  7. Parallel Online Temporal Difference Learning for Motor Control.

    PubMed

    Caarls, Wouter; Schuitema, Erik

    2016-07-01

    Temporal difference (TD) learning, a key concept in reinforcement learning, is a popular method for solving simulated control problems. However, in real systems, this method is often avoided in favor of policy search methods because of its long learning time. But policy search suffers from its own drawbacks, such as the necessity of informed policy parameterization and initialization. In this paper, we show that TD learning can work effectively in real robotic systems as well, using parallel model learning and planning. Using locally weighted linear regression and trajectory sampled planning with 14 concurrent threads, we can achieve a speedup of almost two orders of magnitude over regular TD control on simulated control benchmarks. For a real-world pendulum swing-up task and a two-link manipulator movement task, we report a speedup of 20× to 60×, with real-time learning taking less than half a minute. The results are competitive with state-of-the-art policy search.
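
    The core update being parallelized here is the classic TD(0) rule; a minimal tabular sketch on a toy chain MDP (all names and parameters illustrative) looks like this:

      import numpy as np

      n_states, alpha, gamma = 5, 0.1, 0.95
      V = np.zeros(n_states)

      for episode in range(500):
          s = 0
          while s < n_states - 1:
              s_next = s + 1                       # deterministic toy transition
              r = 1.0 if s_next == n_states - 1 else 0.0
              # TD(0): V(s) <- V(s) + alpha * (r + gamma * V(s') - V(s))
              V[s] += alpha * (r + gamma * V[s_next] - V[s])
              s = s_next
      print(np.round(V, 3))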

  8. Performance Metrics for Monitoring Parallel Program Executions

    NASA Technical Reports Server (NTRS)

    Sarukkai, Sekkar R.; Gotwais, Jacob K.; Yan, Jerry; Lum, Henry, Jr. (Technical Monitor)

    1994-01-01

    Existing tools for debugging the performance of parallel programs provide either graphical representations of program execution or profiles of program executions. However, for performance debugging tools to be useful, such information has to be augmented with information that highlights the cause of poor program performance. Identifying the cause of poor performance requires not only determining the significance of various performance problems on the execution time of the program, but also considering the effect of interprocessor communications of individual source-level data structures. In this paper, we present a suite of normalized indices which provide a convenient mechanism for focusing on a region of code with poor performance and highlight the cause of the problem in terms of processors, procedures and data structure interactions. All the indices are generated from trace files augmented with data structure information. Further, we show with the help of examples from the NAS benchmark suite that the indices help in detecting potential causes of poor performance, based on augmented execution traces obtained by monitoring the program.
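
    As a toy example of such an index, the snippet below normalizes per-procedure, per-data-structure communication time from an augmented trace against the total run time; the trace records and field names are invented for illustration.

      trace = [
          {"proc": "solve", "array": "A", "comm_us": 120, "total_us": 500},
          {"proc": "solve", "array": "B", "comm_us": 300, "total_us": 500},
          {"proc": "init",  "array": "A", "comm_us": 10,  "total_us": 400},
      ]

      run_time = sum(r["total_us"] for r in trace)
      for r in trace:
          index = r["comm_us"] / run_time       # normalized communication index
          print(f'{r["proc"]}/{r["array"]}: comm index = {index:.3f}')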

  9. Global Patch Matching

    NASA Astrophysics Data System (ADS)

    Huang, X.; Hu, K.; Ling, X.; Zhang, Y.; Lu, Z.; Zhou, G.

    2017-09-01

    This paper introduces a novel global patch matching method that focuses on removing fronto-parallel bias and obtaining continuous smooth surfaces, assuming that the scenes covered by stereos are piecewise continuous. Firstly, the simple linear iterative clustering (SLIC) method is used to segment the base image into a series of patches. Then, a global energy function, which consists of a data term and a smoothness term, is built on the patches. The data term is the second-order Taylor expansion of correlation coefficients, and the smoothness term is built by combining connectivity constraints and coplanarity constraints. Finally, the global energy function is assembled from the data term and the smoothness term. We rewrite the global energy function as a quadratic matrix function and use least-squares methods to obtain the optimal solution. Experiments on the Adirondack and Motorcycle stereo pairs of the Middlebury benchmark show that the proposed method can remove fronto-parallel bias effectively and produce continuous smooth surfaces.
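
    The least-squares step can be sketched as follows: the data term and a neighbor-smoothness term are stacked into one overdetermined system E(p) = ||Ap - b||^2 and solved with numpy. Patch values, the neighbor structure, and the smoothness weight are all synthetic.

      import numpy as np

      rng = np.random.default_rng(4)
      n = 50                                       # number of patches (toy)
      A_data = np.eye(n)                           # data term: per-patch observation
      b_data = rng.random(n) * 10
      A_smooth = np.zeros((n - 1, n))
      for i in range(n - 1):                       # smoothness: neighbors agree
          A_smooth[i, i], A_smooth[i, i + 1] = 1.0, -1.0
      lam = 5.0                                    # smoothness weight (assumed)

      A = np.vstack([A_data, lam * A_smooth])
      b = np.concatenate([b_data, np.zeros(n - 1)])
      p, *_ = np.linalg.lstsq(A, b, rcond=None)    # optimal patch parameters
      print(np.round(p[:5], 2))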

  10. Million city traveling salesman problem solution by divide and conquer clustering with adaptive resonance neural networks.

    PubMed

    Mulder, Samuel A; Wunsch, Donald C

    2003-01-01

    The Traveling Salesman Problem (TSP) is a very hard optimization problem in the field of operations research. It has been shown to be NP-complete, and is an often-used benchmark for new optimization techniques. One of the main challenges with this problem is that standard, non-AI heuristic approaches such as the Lin-Kernighan algorithm (LK) and the chained LK variant are currently very effective and in wide use for the common fully connected, Euclidean variant that is considered here. This paper presents an algorithm that uses adaptive resonance theory (ART) in combination with a variation of the Lin-Kernighan local optimization algorithm to solve very large instances of the TSP. The primary advantage of this algorithm over traditional LK and chained-LK approaches is the increased scalability and parallelism allowed by the divide-and-conquer clustering paradigm. Tours obtained by the algorithm are of lower quality, but scaling is much better and there is high potential for increased performance using parallel hardware.
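
    The divide-and-conquer idea can be shown in miniature: cluster the cities, order the clusters, then solve each cluster with a cheap heuristic. The sketch below substitutes k-means for ART and a greedy nearest-neighbor tour for Lin-Kernighan, so it illustrates only the decomposition, not the paper's tour quality.

      import numpy as np
      from scipy.cluster.vq import kmeans2

      def nn_tour(pts):
          """Greedy nearest-neighbor tour over a point array."""
          tour, left = [0], set(range(1, len(pts)))
          while left:
              last = pts[tour[-1]]
              nxt = min(left, key=lambda i: np.linalg.norm(pts[i] - last))
              tour.append(nxt)
              left.remove(nxt)
          return tour

      cities = np.random.default_rng(5).random((2000, 2))
      centroids, label = kmeans2(cities, 8, minit="points")   # divide
      tour = []
      for c in nn_tour(centroids):                            # order the clusters
          members = np.flatnonzero(label == c)
          if len(members):                                    # conquer per cluster
              tour.extend(members[nn_tour(cities[members])])
      print("cities in stitched tour:", len(tour))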

  11. Large-scale molecular dynamics simulation of DNA: implementation and validation of the AMBER98 force field in LAMMPS.

    PubMed

    Grindon, Christina; Harris, Sarah; Evans, Tom; Novik, Keir; Coveney, Peter; Laughton, Charles

    2004-07-15

    Molecular modelling played a central role in the discovery of the structure of DNA by Watson and Crick. Today, such modelling is done on computers: the more powerful these computers are, the more detailed and extensive can be the study of the dynamics of such biological macromolecules. To fully harness the power of modern massively parallel computers, however, we need to develop and deploy algorithms which can exploit the structure of such hardware. The Large-scale Atomic/Molecular Massively Parallel Simulator (LAMMPS) is a scalable molecular dynamics code including long-range Coulomb interactions, which has been specifically designed to function efficiently on parallel platforms. Here we describe the implementation of the AMBER98 force field in LAMMPS and its validation for molecular dynamics investigations of DNA structure and flexibility against the benchmark of results obtained with the long-established code AMBER6 (Assisted Model Building with Energy Refinement, version 6). Extended molecular dynamics simulations on the hydrated DNA dodecamer d(CTTTTGCAAAAG)2, which has previously been the subject of extensive dynamical analysis using AMBER6, show that it is possible to obtain excellent agreement in terms of static, dynamic and thermodynamic parameters between AMBER6 and LAMMPS. In comparison with AMBER6, LAMMPS shows greatly improved scalability in massively parallel environments, opening up the possibility of efficient simulations of order-of-magnitude larger systems and/or for order-of-magnitude greater simulation times.

  12. SDA 7: A modular and parallel implementation of the simulation of diffusional association software

    PubMed Central

    Martinez, Michael; Romanowska, Julia; Kokh, Daria B.; Ozboyaci, Musa; Yu, Xiaofeng; Öztürk, Mehmet Ali; Richter, Stefan

    2015-01-01

    The simulation of diffusional association (SDA) Brownian dynamics software package has been widely used in the study of biomacromolecular association. Initially developed to calculate bimolecular protein–protein association rate constants, it has since been extended to study electron transfer rates, to predict the structures of biomacromolecular complexes, to investigate the adsorption of proteins to inorganic surfaces, and to simulate the dynamics of large systems containing many biomacromolecular solutes, allowing the study of concentration‐dependent effects. These extensions have led to a number of divergent versions of the software. In this article, we report the development of the latest version of the software (SDA 7). This release was developed to consolidate the existing codes into a single framework, while improving the parallelization of the code to better exploit modern multicore shared memory computer architectures. It is built using a modular object‐oriented programming scheme, to allow for easy maintenance and extension of the software, and includes new features, such as adding flexible solute representations. We discuss a number of application examples, which describe some of the methods available in the release, and provide benchmarking data to demonstrate the parallel performance. © 2015 The Authors. Journal of Computational Chemistry Published by Wiley Periodicals, Inc. PMID:26123630

  13. Parallelized modelling and solution scheme for hierarchically scaled simulations

    NASA Technical Reports Server (NTRS)

    Padovan, Joe

    1995-01-01

    This two-part paper presents the results of a benchmarked analytical-numerical investigation into the operational characteristics of a unified parallel processing strategy for implicit fluid mechanics formulations. This hierarchical poly tree (HPT) strategy is based on multilevel substructural decomposition. The tree morphology is chosen to minimize memory, communications and computational effort. The methodology is general enough to apply to existing finite difference (FD), finite element (FEM), finite volume (FV) or spectral element (SE) based computer programs without an extensive rewrite of code. In addition to large reductions in memory, communications, and computational effort in a parallel computing environment, substantial reductions are generated in the sequential mode of application. Such improvements grow with increasing problem size. Along with a theoretical development of general 2-D and 3-D HPT, several techniques for expanding the problem size that the current generation of computers is capable of solving are presented and discussed. Among these techniques are several interpolative reduction methods. It was found that, by combining several of these techniques, a relatively small interpolative reduction resulted in substantial performance gains. Several other unique features/benefits are discussed in this paper. Along with Part 1's theoretical development, Part 2 presents a numerical approach to the HPT along with four prototype CFD applications. These demonstrate the potential of the HPT strategy.

  14. Homemade Buckeye-Pi: A Learning Many-Node Platform for High-Performance Parallel Computing

    NASA Astrophysics Data System (ADS)

    Amooie, M. A.; Moortgat, J.

    2017-12-01

    We report on the "Buckeye-Pi" cluster, a supercomputer developed in The Ohio State University School of Earth Sciences from 128 inexpensive Raspberry Pi (RPi) 3 Model B single-board computers. Each RPi is equipped with a quad-core 1.2 GHz 64-bit ARMv8 processor, 1 GB of RAM, and a 32 GB microSD card for local storage. The cluster therefore has a total of 128 GB of RAM distributed over the individual nodes, 4 TB of flash storage, and 512 processor cores, while benefiting from low power consumption, easy portability, and low total cost. The cluster uses the Message Passing Interface protocol to manage the communications between nodes. These features render our platform the most powerful RPi supercomputer to date and suitable for educational applications in high-performance computing (HPC) and the handling of large datasets. In particular, we use the Buckeye-Pi to implement optimized parallel codes in our in-house simulator for subsurface media flows, with the goal of achieving a massively parallelized scalable code. We present benchmarking results for computational performance across various numbers of RPi nodes. We believe our project could inspire scientists and students to consider the proposed unconventional cluster architecture as a mainstream and feasible learning platform for challenging engineering and scientific problems.
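
    A node-to-node probe of the kind used to benchmark such a cluster can be written with mpi4py in a few lines; the message size and repetition count are arbitrary, and the sketch assumes at least two ranks.

      from mpi4py import MPI
      import numpy as np

      comm = MPI.COMM_WORLD
      rank = comm.Get_rank()
      buf = np.zeros(1 << 20, dtype="u1")          # 1 MiB message

      comm.Barrier()
      t0 = MPI.Wtime()
      for _ in range(10):                          # ping-pong between ranks 0 and 1
          if rank == 0:
              comm.Send(buf, dest=1)
              comm.Recv(buf, source=1)
          elif rank == 1:
              comm.Recv(buf, source=0)
              comm.Send(buf, dest=0)
      if rank == 0:
          dt = (MPI.Wtime() - t0) / 20             # time per one-way transfer
          print(f"~{buf.nbytes / dt / 1e6:.1f} MB/s effective bandwidth")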

  15. Fine-grained parallelism accelerating for RNA secondary structure prediction with pseudoknots based on FPGA.

    PubMed

    Xia, Fei; Jin, Guoqing

    2014-06-01

    PKNOTS is one of the best-known benchmark programs for predicting RNA secondary structure including pseudoknots. It adopts the standard four-dimensional (4D) dynamic programming (DP) method and is the basis of many variants and improved algorithms. Unfortunately, the O(N^6) computing requirements and complicated data dependency greatly limit the usefulness of the PKNOTS package given the explosion in gene database size. In this paper, we present a fine-grained parallel PKNOTS package and prototype system for accelerating the RNA folding application based on an FPGA chip. We adopted a series of storage optimization strategies to resolve the "Memory Wall" problem. We aggressively exploit parallel computing strategies to improve computational efficiency. We also propose several methods that collectively reduce the storage requirements for FPGA on-chip memory. To the best of our knowledge, our design is the first FPGA implementation for accelerating the 4D DP problem for RNA folding including pseudoknots. The experimental results show an average speedup of more than 50x over the PKNOTS-1.08 software running on a PC platform with an Intel Core2 Q9400 quad-core CPU for input RNA sequences. Moreover, the power consumption of our FPGA accelerator is only about 50% of that of general-purpose microprocessors.

  16. Multiscale asymmetric orthogonal wavelet kernel for linear programming support vector learning and nonlinear dynamic systems identification.

    PubMed

    Lu, Zhao; Sun, Jing; Butts, Kenneth

    2014-05-01

    Support vector regression for approximating nonlinear dynamic systems is more delicate than the approximation of indicator functions in support vector classification, particularly for systems that involve multitudes of time scales in their sampled data. The kernel used for support vector learning determines the class of functions from which a support vector machine can draw its solution, and the choice of kernel significantly influences the performance of a support vector machine. In this paper, to bridge the gap between wavelet multiresolution analysis and kernel learning, the closed-form orthogonal wavelet is exploited to construct new multiscale asymmetric orthogonal wavelet kernels for linear programming support vector learning. The closed-form multiscale orthogonal wavelet kernel provides a systematic framework to implement multiscale kernel learning via dyadic dilations and also enables us to represent complex nonlinear dynamics effectively. To demonstrate the superiority of the proposed multiscale wavelet kernel in identifying complex nonlinear dynamic systems, two case studies are presented that aim at building parallel models on benchmark datasets. The development of parallel models that address the long-term/mid-term prediction issue is more intricate and challenging than the identification of series-parallel models where only one-step ahead prediction is required. Simulation results illustrate the effectiveness of the proposed multiscale kernel learning.
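
    To give the flavor of wavelet kernel learning, the sketch below plugs a common Morlet-style wavelet kernel (not the authors' closed-form orthogonal construction) into scikit-learn's SVR as a callable kernel and fits a toy one-dimensional system:

      import numpy as np
      from sklearn.svm import SVR

      def wavelet_kernel(X, Y, a=1.0):
          """Product-form wavelet kernel k(x,y) = prod_d h((x_d - y_d)/a)."""
          K = np.ones((len(X), len(Y)))
          for d in range(X.shape[1]):
              diff = (X[:, d:d + 1] - Y[:, d:d + 1].T) / a
              K *= np.cos(1.75 * diff) * np.exp(-diff ** 2 / 2)
          return K

      rng = np.random.default_rng(9)
      X = rng.uniform(-3, 3, (200, 1))
      y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=200)
      model = SVR(kernel=lambda A, B: wavelet_kernel(A, B, a=1.5)).fit(X, y)
      print("train R^2:", round(model.score(X, y), 3))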

  17. Flexbar 3.0 - SIMD and multicore parallelization.

    PubMed

    Roehr, Johannes T; Dieterich, Christoph; Reinert, Knut

    2017-09-15

    High-throughput sequencing machines can process many samples in a single run. For Illumina systems, sequencing reads are barcoded with an additional DNA tag that is contained in the respective sequencing adapters. The recognition of barcode and adapter sequences is hence commonly needed for the analysis of next-generation sequencing data. Flexbar performs demultiplexing based on barcodes and adapter trimming for such data. The massive amounts of data generated on modern sequencing machines demand that this preprocessing be done as efficiently as possible. We present Flexbar 3.0, the successor of the popular program Flexbar. It now employs twofold parallelism: multi-threading and additionally SIMD vectorization. Both types of parallelism are used to speed up the computation of pair-wise sequence alignments, which are used for the detection of barcodes and adapters. Furthermore, new features were included to cover a wide range of applications. We evaluated the performance of Flexbar based on a simulated sequencing dataset. Our program outcompetes other tools in terms of speed and is among the best tools in the presented quality benchmark. https://github.com/seqan/flexbar. johannes.roehr@fu-berlin.de or knut.reinert@fu-berlin.de. © The Author (2017). Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com

  18. 2D imaging of helium ion velocity in the DIII-D divertor

    NASA Astrophysics Data System (ADS)

    Samuell, C. M.; Porter, G. D.; Meyer, W. H.; Rognlien, T. D.; Allen, S. L.; Briesemeister, A.; Mclean, A. G.; Zeng, L.; Jaervinen, A. E.; Howard, J.

    2018-05-01

    Two-dimensional imaging of parallel ion velocities is compared to fluid modeling simulations to understand the role of ions in determining divertor conditions and benchmark the UEDGE fluid modeling code. Pure helium discharges are used so that spectroscopic He+ measurements represent the main-ion population at small electron temperatures. Electron temperatures and densities in the divertor match simulated values to within about 20%-30%, establishing the experiment/model match as being at least as good as those normally obtained in the more regularly simulated deuterium plasmas. He+ brightness (HeII) comparison indicates that the degree of detachment is captured well by UEDGE, principally due to the inclusion of E×B drifts. Tomographically inverted Coherence Imaging Spectroscopy measurements are used to determine the He+ parallel velocities which display excellent agreement between the model and the experiment near the divertor target where He+ is predicted to be the main-ion species and where electron-dominated physics dictates the parallel momentum balance. Upstream near the X-point where He+ is a minority species and ion-dominated physics plays a more important role, there is an underestimation of the flow velocity magnitude by a factor of 2-3. These results indicate that more effort is required to be able to correctly predict ion momentum in these challenging regimes.

  19. Fast quantum Monte Carlo on a GPU

    NASA Astrophysics Data System (ADS)

    Lutsyshyn, Y.

    2015-02-01

    We present a scheme for the parallelization of quantum Monte Carlo method on graphical processing units, focusing on variational Monte Carlo simulation of bosonic systems. We use asynchronous execution schemes with shared memory persistence, and obtain an excellent utilization of the accelerator. The CUDA code is provided along with a package that simulates liquid helium-4. The program was benchmarked on several models of Nvidia GPU, including Fermi GTX560 and M2090, and the Kepler architecture K20 GPU. Special optimization was developed for the Kepler cards, including placement of data structures in the register space of the Kepler GPUs. Kepler-specific optimization is discussed.

  20. Electro-osmotic transport in wet processing of textiles

    DOEpatents

    Cooper, John F.

    1998-01-01

    Electro-osmotic (or electrokinetic) transport is used to efficiently force a solution (or water) through the interior of the fibers or yarns of textile materials for wet processing of textiles. The textile material is passed between electrodes that apply an electric field across the fabric. Used alone or in parallel with conventional hydraulic washing (forced convection), electro-osmotic transport greatly reduces the amount of water used in wet processing. The amount of water required to achieve a fixed level of rinsing of tint can be reduced, for example, to 1-5 lbs water per pound of fabric from an industry benchmark of 20 lbs water/lb fabric.

  1. Does the Intel Xeon Phi processor fit HEP workloads?

    NASA Astrophysics Data System (ADS)

    Nowak, A.; Bitzes, G.; Dotti, A.; Lazzaro, A.; Jarp, S.; Szostek, P.; Valsan, L.; Botezatu, M.; Leduc, J.

    2014-06-01

    This paper summarizes the five years of CERN openlab's efforts focused on the Intel Xeon Phi co-processor, from the time of its inception to public release. We consider the architecture of the device vis-à-vis the characteristics of HEP software and identify key opportunities for HEP processing, as well as scaling limitations. We report on improvements and speedups linked to parallelization and vectorization on benchmarks involving software frameworks such as Geant4 and ROOT. Finally, we extrapolate current software and hardware trends and project them onto accelerators of the future, with the specifics of offline and online HEP processing in mind.

  2. Electro-osmotic transport in wet processing of textiles

    DOEpatents

    Cooper, J.F.

    1998-09-22

    Electro-osmotic (or electrokinetic) transport is used to efficiently force a solution (or water) through the interior of the fibers or yarns of textile materials for wet processing of textiles. The textile material is passed between electrodes that apply an electric field across the fabric. Used alone or in parallel with conventional hydraulic washing (forced convection), electro-osmotic transport greatly reduces the amount of water used in wet processing. The amount of water required to achieve a fixed level of rinsing of tint can be reduced, for example, to 1-5 lbs water per pound of fabric from an industry benchmark of 20 lbs water/lb fabric. 5 figs.

  3. Enhancing membrane protein subcellular localization prediction by parallel fusion of multi-view features.

    PubMed

    Yu, Dongjun; Wu, Xiaowei; Shen, Hongbin; Yang, Jian; Tang, Zhenmin; Qi, Yong; Yang, Jingyu

    2012-12-01

    Membrane proteins are encoded by ~30% of the genome and play important roles in living organisms. Previous studies have revealed that membrane proteins' structures and functions show obvious cell organelle-specific properties. Hence, it is highly desirable to predict a membrane protein's subcellular location from the primary sequence, considering the extreme difficulty of membrane protein wet-lab studies. Although many models have been developed for predicting protein subcellular locations, only a few are specific to membrane proteins. Existing prediction approaches were constructed based on statistical machine learning algorithms with a serial combination of multi-view features, i.e., different feature vectors are simply serially combined to form a super feature vector. However, such a simple combination of features simultaneously increases the information redundancy, which can in turn deteriorate the final prediction accuracy. This is why prediction success rates in the serial super space were often found to be even lower than those in a single-view space. The purpose of this paper is to investigate a proper method for fusing multiple multi-view protein sequential features for subcellular location prediction. Instead of the serial strategy, we propose a novel parallel framework for fusing multiple membrane protein multi-view attributes that represents protein samples in complex spaces. We also propose generalized principal component analysis (GPCA) for feature reduction in the complex geometry. All the experimental results, obtained with different machine learning algorithms on benchmark membrane protein subcellular localization datasets, demonstrate that the newly proposed parallel strategy outperforms the traditional serial approach. We also demonstrate the efficacy of the parallel strategy on a soluble protein subcellular localization dataset, indicating that the parallel technique is flexible enough to suit other computational biology problems. The software and datasets are available at: http://www.csbio.sjtu.edu.cn/bioinf/mpsp.
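
    The parallel (complex) fusion idea, as opposed to serial concatenation, can be sketched as follows: two feature views become the real and imaginary parts of one complex vector (the shorter view zero-padded), followed by a PCA-style reduction via the Hermitian covariance. All data and dimensions are synthetic, and this is only a schematic of the GPCA step.

      import numpy as np

      rng = np.random.default_rng(6)
      view_a = rng.random((200, 40))     # e.g., composition features (toy)
      view_b = rng.random((200, 25))     # e.g., physicochemical features (toy)

      d = max(view_a.shape[1], view_b.shape[1])
      pad = lambda X: np.pad(X, ((0, 0), (0, d - X.shape[1])))
      Z = pad(view_a) + 1j * pad(view_b)           # parallel (complex) fusion

      Zc = Z - Z.mean(axis=0)
      C = (Zc.conj().T @ Zc) / (len(Zc) - 1)       # Hermitian covariance
      w, V = np.linalg.eigh(C)
      Z_red = Zc @ V[:, ::-1][:, :10]              # top-10 complex components
      print(Z_red.shape)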

  4. High-Performance Parallel Analysis of Coupled Problems for Aircraft Propulsion

    NASA Technical Reports Server (NTRS)

    Felippa, C. A.; Farhat, C.; Park, K. C.; Gumaste, U.; Chen, P.-S.; Lesoinne, M.; Stern, P.

    1996-01-01

    This research program dealt with the application of high-performance computing methods to the numerical simulation of complete jet engines. The program was initiated in January 1993 by applying two-dimensional parallel aeroelastic codes to the interior gas flow problem of a bypass jet engine. The fluid mesh generation, domain decomposition and solution capabilities were successfully tested. Attention was then focused on methodology for the partitioned analysis of the interaction of the gas flow with a flexible structure and with the fluid mesh motion driven by these structural displacements. The latter is treated by an ALE technique that models the fluid mesh motion as that of a fictitious mechanical network laid along the edges of near-field fluid elements. New partitioned analysis procedures to treat this coupled three-component problem were developed during 1994 and 1995. These procedures involved delayed corrections and subcycling, and have been successfully tested on several massively parallel computers, including the iPSC-860, Paragon XP/S and the IBM SP2. For the global steady-state axisymmetric analysis of a complete engine we decided to use the NASA-sponsored ENG10 program, which uses a regular FV-multiblock-grid discretization in conjunction with circumferential averaging to include effects of blade forces, loss, combustor heat addition, blockage, bleeds and convective mixing. A load-balancing preprocessor for parallel versions of ENG10 was developed. During 1995 and 1996 we developed the capability for the first full 3D aeroelastic simulation of a multirow engine stage. This capability was tested on the IBM SP2 parallel supercomputer at NASA Ames. Benchmark results were presented at the 1996 Computational Aeroscience meeting.

  5. Hierarchical fractional-step approximations and parallel kinetic Monte Carlo algorithms

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Arampatzis, Giorgos, E-mail: garab@math.uoc.gr; Katsoulakis, Markos A., E-mail: markos@math.umass.edu; Plechac, Petr, E-mail: plechac@math.udel.edu

    2012-10-01

    We present a mathematical framework for constructing and analyzing parallel algorithms for lattice kinetic Monte Carlo (KMC) simulations. The resulting algorithms have the capacity to simulate a wide range of spatio-temporal scales in spatially distributed, non-equilibrium physicochemical processes with complex chemistry and transport micro-mechanisms. Rather than focusing on constructing the stochastic trajectories exactly, our approach relies on approximating the evolution of observables, such as density, coverage, correlations and so on. More specifically, we develop a spatial domain decomposition of the Markov operator (generator) that describes the evolution of all observables according to the kinetic Monte Carlo algorithm. This domain decomposition corresponds to a decomposition of the Markov generator into a hierarchy of operators and can be tailored to specific hierarchical parallel architectures such as multi-core processors or clusters of Graphical Processing Units (GPUs). Based on this operator decomposition, we formulate parallel fractional-step kinetic Monte Carlo algorithms by employing the Trotter theorem and its randomized variants; these schemes (a) are partially asynchronous on each fractional-step time-window, and (b) are characterized by their communication schedule between processors. The proposed mathematical framework allows us to rigorously justify the numerical and statistical consistency of the proposed algorithms, showing the convergence of our approximating schemes to the original serial KMC. The approach also provides a systematic evaluation of different processor communication schedules. We carry out a detailed benchmarking of the parallel KMC schemes using available exact solutions, for example, in Ising-type systems, and we demonstrate the capabilities of the method to simulate complex spatially distributed reactions at very large scales on GPUs. Finally, we discuss work load balancing between processors and propose a re-balancing scheme based on probabilistic mass transport methods.
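
    The fractional-step idea can be caricatured on a 1D lattice: split the sites into two domains and alternate short KMC bursts on each (Lie-Trotter style) instead of drawing events from one global list. Rates and moves here are toy placeholders, and a real scheme also handles boundary events between domains.

      import numpy as np

      rng = np.random.default_rng(8)
      L, dt = 64, 0.1
      spins = rng.integers(0, 2, L)

      def kmc_burst(idx, t_max):
          """Run KMC on sublattice `idx` for time t_max (uniform toy rates)."""
          t = 0.0
          while True:
              total_rate = float(len(idx))         # one unit flip rate per site
              t += rng.exponential(1.0 / total_rate)
              if t > t_max:
                  break
              spins[rng.choice(idx)] ^= 1          # flip one site in this domain

      halves = (np.arange(L // 2), np.arange(L // 2, L))
      for step in range(100):                      # Lie-Trotter: domain A, then B
          for idx in halves:
              kmc_burst(idx, dt)
      print("magnetization:", spins.mean())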

  6. Optimized Hypervisor Scheduler for Parallel Discrete Event Simulations on Virtual Machine Platforms

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Yoginath, Srikanth B; Perumalla, Kalyan S

    2013-01-01

    With the advent of virtual machine (VM)-based platforms for parallel computing, it is now possible to execute parallel discrete event simulations (PDES) over multiple virtual machines, in contrast to executing in native mode directly over hardware as has traditionally been done over the past decades. While mature VM-based parallel systems now offer new, compelling benefits such as serviceability, dynamic reconfigurability and overall cost effectiveness, the runtime performance of parallel applications can be significantly affected. In particular, most VM-based platforms are optimized for general workloads, but PDES execution exhibits unique dynamics significantly different from other workloads. Here we first present results from experiments that highlight the gross deterioration of the runtime performance of VM-based PDES simulations when executed using traditional VM schedulers, quantitatively showing the bad scaling properties of the scheduler as the number of VMs is increased. The mismatch is fundamental in nature in the sense that any fairness-based VM scheduler implementation would exhibit this mismatch with PDES runs. We also present a new scheduler optimized specifically for PDES applications, and describe its design and implementation. Experimental results obtained from running PDES benchmarks (PHOLD and vehicular traffic simulations) over VMs show over an order of magnitude improvement in the run time of the PDES-optimized scheduler relative to the regular VM scheduler, with over a 20× reduction in the run time of simulations using up to 64 VMs. The observations and results are timely in the context of emerging systems such as cloud platforms and VM-based high performance computing installations, highlighting to the community the need for PDES-specific support, and the feasibility of significantly reducing the runtime overhead for scalable PDES on VM platforms.

  7. Improved performance of organic light-emitting diodes with MoO3 interlayer by oblique angle deposition.

    PubMed

    Liu, S W; Divayana, Y; Sun, X W; Wang, Y; Leck, K S; Demir, H V

    2011-02-28

    We fabricated and demonstrated improved organic light emitting diodes (OLEDs) in a thin film architecture of indium tin oxide (ITO)/molybdenum trioxide (MoO3) (20 nm)/N,N'-Di(naphth-2-yl)-N,N'-diphenyl-benzidine (NPB) (50 nm)/tris-(8-hydroxyquinoline) (Alq3) (70 nm)/Mg:Ag (200 nm) using an oblique angle deposition technique by which MoO3 was deposited at oblique angles (θ) with respect to the surface normal. It was found that, without sacrificing the power efficiency of the device, the device current efficiency and external quantum efficiency were significantly enhanced at an oblique deposition angle of θ=60° for MoO3.

  8. GTA (ground test accelerator) Phase 1: Baseline design report

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Not Available

    1986-08-01

    The national Neutral Particle Beam (NPB) program has two objectives: to provide the necessary basis for a discriminator/weapon decision by 1992, and to develop the technology in stages that lead ultimately to a neutral particle beam weapon. The ground test accelerator (GTA) is the test bed that permits the advancement of the state-of-the-art under experimental conditions in an integrated automated system mode. An intermediate goal of the GTA program is to support the Integrated Space Experiments, while the ultimate goal is to support the 1992 decision. The GTA system and each of its major subsystems are described, and project schedules and resource requirements are provided. (LEW)

  9. Improved packing of protein side chains with parallel ant colonies.

    PubMed

    Quan, Lijun; Lü, Qiang; Li, Haiou; Xia, Xiaoyan; Wu, Hongjie

    2014-01-01

    The accurate packing of protein side chains is important for many computational biology problems, such as ab initio protein structure prediction, homology modelling, and protein design and ligand docking applications. Many existing solutions are modelled as a computational optimisation problem. Beyond the design of the search algorithm, most solutions suffer from an inaccurate energy function for judging whether a prediction is good or bad. Even if the search has found the lowest energy, there is no certainty of obtaining the protein structures with correct side chains. We present a side-chain modelling method, pacoPacker, which uses a parallel ant colony optimisation strategy based on sharing a single pheromone matrix. This parallel approach combines different sources of energy functions and generates protein side-chain conformations with the lowest energies jointly determined by the various energy functions. We further optimised the selected rotamers to construct subrotamers by rotamer minimisation, which reasonably improved the discreteness of the rotamer library. We focused on improving the accuracy of side-chain conformation prediction. For a testing set of 442 proteins, 87.19% of X1 and 77.11% of X12 angles were predicted correctly within 40° of the X-ray positions. We compared the accuracy of pacoPacker with state-of-the-art methods, such as CIS-RR and SCWRL4. We analysed the results from different perspectives, in terms of protein chains and individual residues. In this comprehensive benchmark testing, 51.5% of proteins within a length of 400 amino acids predicted by pacoPacker were superior to the results of CIS-RR and SCWRL4 simultaneously. Finally, we also showed the advantage of using the subrotamer strategy. All results confirmed that our parallel approach is competitive with state-of-the-art solutions for packing side chains. This parallel approach combines various sources of searching intelligence and energy functions to pack protein side chains. It provides a framework for combining different inaccuracy/usefulness objective functions by designing parallel heuristic search algorithms.
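
    A single-colony caricature of the shared-pheromone mechanism follows; the residue/rotamer counts and the scoring function are stand-ins, and the parallel version would have several colonies reading from and depositing on the same matrix.

      import numpy as np

      rng = np.random.default_rng(7)
      n_res, n_rot = 20, 5                   # residues x rotamers (toy sizes)
      tau = np.ones((n_res, n_rot))          # shared pheromone matrix
      score = rng.random((n_res, n_rot))     # per-choice energy (hypothetical)

      for it in range(50):
          prob = tau / tau.sum(axis=1, keepdims=True)
          picks = np.array([rng.choice(n_rot, p=prob[i]) for i in range(n_res)])
          energy = score[np.arange(n_res), picks].sum()
          tau *= 0.9                                     # pheromone evaporation
          tau[np.arange(n_res), picks] += 1.0 / energy   # reinforce this path
      print("best-reinforced rotamers:", tau.argmax(axis=1))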

  10. Benchmark Comparison of Cloud Analytics Methods Applied to Earth Observations

    NASA Technical Reports Server (NTRS)

    Lynnes, Chris; Little, Mike; Huang, Thomas; Jacob, Joseph; Yang, Phil; Kuo, Kwo-Sen

    2016-01-01

    Cloud computing has the potential to bring high performance computing capabilities to the average science researcher. However, in order to take full advantage of cloud capabilities, the science data used in the analysis must often be reorganized. This typically involves sharding the data across multiple nodes to enable relatively fine-grained parallelism. This can be either via cloud-based file systems or cloud-enabled databases such as Cassandra, Rasdaman or SciDB. Since storing an extra copy of data leads to increased cost and data management complexity, NASA is interested in determining the benefits and costs of various cloud analytics methods for real Earth Observation cases. Accordingly, NASA's Earth Science Technology Office and Earth Science Data and Information Systems project have teamed with cloud analytics practitioners to run a benchmark comparison on cloud analytics methods using the same input data and analysis algorithms. We have particularly looked at analysis algorithms that work over long time series, because these are particularly intractable for many Earth Observation datasets which typically store data with one or just a few time steps per file. This post will present side-by-side cost and performance results for several common Earth observation analysis operations.

  11. Benchmark Comparison of Cloud Analytics Methods Applied to Earth Observations

    NASA Astrophysics Data System (ADS)

    Lynnes, C.; Little, M. M.; Huang, T.; Jacob, J. C.; Yang, C. P.; Kuo, K. S.

    2016-12-01

    Cloud computing has the potential to bring high performance computing capabilities to the average science researcher. However, in order to take full advantage of cloud capabilities, the science data used in the analysis must often be reorganized. This typically involves sharding the data across multiple nodes to enable relatively fine-grained parallelism. This can be either via cloud-based filesystems or cloud-enabled databases such as Cassandra, Rasdaman or SciDB. Since storing an extra copy of data leads to increased cost and data management complexity, NASA is interested in determining the benefits and costs of various cloud analytics methods for real Earth Observation cases. Accordingly, NASA's Earth Science Technology Office and Earth Science Data and Information Systems project have teamed with cloud analytics practitioners to run a benchmark comparison on cloud analytics methods using the same input data and analysis algorithms. We have particularly looked at analysis algorithms that work over long time series, because these are particularly intractable for many Earth Observation datasets which typically store data with one or just a few time steps per file. This post will present side-by-side cost and performance results for several common Earth observation analysis operations.

  12. Nonlinear 3D visco-resistive MHD modeling of fusion plasmas: a comparison between numerical codes

    NASA Astrophysics Data System (ADS)

    Bonfiglio, D.; Chacon, L.; Cappello, S.

    2008-11-01

    Fluid plasma models (and, in particular, the MHD model) are extensively used in the theoretical description of laboratory and astrophysical plasmas. We present here a successful benchmark between two nonlinear, three-dimensional, compressible visco-resistive MHD codes. One is the fully implicit, finite volume code PIXIE3D [1,2], which is characterized by many attractive features, notably a generalized curvilinear formulation (which makes the code applicable to different geometries) and the possibility of including the energy transport equation and the extended MHD version of Ohm's law in the computation. In addition, the parallel version of the code features excellent scalability. Results from this code, obtained in cylindrical geometry, are compared with those produced by the semi-implicit cylindrical code SpeCyl, which uses finite differences radially and a spectral formulation in the other coordinates [3]. Both single-mode and multi-mode simulations are benchmarked, for both reversed-field pinch (RFP) and ohmic tokamak magnetic configurations. [1] L. Chacon, Computer Physics Communications 163, 143 (2004). [2] L. Chacon, Phys. Plasmas 15, 056103 (2008). [3] S. Cappello, Plasma Phys. Control. Fusion 46, B313 (2004) & references therein.

  13. Hybrid MPI-OpenMP Parallelism in the ONETEP Linear-Scaling Electronic Structure Code: Application to the Delamination of Cellulose Nanofibrils.

    PubMed

    Wilkinson, Karl A; Hine, Nicholas D M; Skylaris, Chris-Kriton

    2014-11-11

    We present a hybrid MPI-OpenMP implementation of Linear-Scaling Density Functional Theory within the ONETEP code. We illustrate its performance on a range of high performance computing (HPC) platforms comprising shared-memory nodes with fast interconnect. Our work has focused on applying OpenMP parallelism to the routines which dominate the computational load, attempting, where possible, to parallelize loops other than those already parallelized with MPI. These include 3D FFT box operations, sparse matrix algebra operations, calculation of integrals, and Ewald summation. While the underlying numerical methods are unchanged, these developments represent significant changes to the algorithms used within ONETEP to distribute the workload across CPU cores. The new hybrid code exhibits much-improved strong scaling relative to the MPI-only code and permits calculations with a much higher ratio of cores to atoms. These developments result in a significantly shorter time to solution than was possible using MPI alone and facilitate the application of the ONETEP code to systems larger than previously feasible. We illustrate this with benchmark calculations on an amyloid fibril trimer containing 41,907 atoms. We use the code to study the mechanism of delamination of cellulose nanofibrils undergoing sonication, a process controlled by a large number of interactions that collectively determine the structural properties of the fibrils. Many energy evaluations were needed for these simulations, and as these systems comprise up to 21,276 atoms this would not have been feasible without the developments described here.
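
    Python has no OpenMP, so a literal transcription of the scheme above is not possible, but its two-level structure can be sketched with mpi4py across ranks and a thread pool standing in for the OpenMP layer. Everything here (the box count, the per-box kernel, the worker count) is an invented placeholder, not ONETEP code.

    ```python
    # Two-level parallelism in the spirit of hybrid MPI-OpenMP: MPI ranks own
    # disjoint blocks of work, and each rank fans its block out to local threads
    # (standing in for OpenMP).  Run with e.g.: mpiexec -n 4 python hybrid.py
    from concurrent.futures import ThreadPoolExecutor

    import numpy as np
    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    rank, size = comm.Get_rank(), comm.Get_size()

    boxes = np.array_split(np.arange(1024), size)[rank]   # this rank's "FFT boxes"

    def process(box):
        # Placeholder for a per-box kernel (FFTs, integrals, ...).
        return float(np.fft.rfft(np.sin(np.arange(256) * (box + 1))).real.sum())

    with ThreadPoolExecutor(max_workers=4) as pool:       # intra-node workers
        local = sum(pool.map(process, boxes))

    total = comm.allreduce(local, op=MPI.SUM)             # inter-node reduction
    if rank == 0:
        print("total =", total)
    ```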

  14. FaCSI: A block parallel preconditioner for fluid-structure interaction in hemodynamics

    NASA Astrophysics Data System (ADS)

    Deparis, Simone; Forti, Davide; Grandperrin, Gwenol; Quarteroni, Alfio

    2016-12-01

    Modeling Fluid-Structure Interaction (FSI) in the vascular system is mandatory to reliably compute mechanical indicators in vessels undergoing large deformations. In order to cope with the computational complexity of the coupled 3D FSI problem after discretization in space and time, a parallel solution is often mandatory. In this paper we propose a new block parallel preconditioner for the coupled linearized FSI system obtained after space and time discretization. We name it FaCSI to indicate that it exploits the Factorized form of the linearized FSI matrix, the use of static Condensation to formally eliminate the interface degrees of freedom of the fluid equations, and the use of a SIMPLE preconditioner for saddle-point problems. FaCSI is built upon a block Gauss-Seidel factorization of the FSI Jacobian matrix and uses ad hoc preconditioners for each physical component of the coupled problem, namely the fluid, the structure, and the geometry. In the fluid subproblem, after static condensation of the interface fluid variables, we use a SIMPLE preconditioner on the reduced fluid matrix. Moreover, to deal efficiently with a large number of processes, FaCSI exploits efficient single-field preconditioners, e.g., based on domain decomposition or the multigrid method. We measure the parallel performance of FaCSI on a benchmark cylindrical geometry and on a problem of physiological interest, namely blood flow through a patient-specific femoropopliteal bypass. We analyze the dependence of the number of linear solver iterations on the core count (scalability of the preconditioner) and on the mesh size (optimality).
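
    The block Gauss-Seidel idea at the heart of such preconditioners is easy to demonstrate on a toy two-block system. The sketch below is generic, not FaCSI: dense random blocks stand in for the fluid and structure operators, exact solves stand in for the per-field preconditioners, and the lower block-triangular part of the matrix is applied as the preconditioner inside GMRES.

    ```python
    import numpy as np
    from scipy.sparse.linalg import LinearOperator, gmres

    rng = np.random.default_rng(1)
    nf, ns = 60, 40                                  # toy "fluid"/"structure" sizes
    F = 4 * np.eye(nf) + 0.1 * rng.standard_normal((nf, nf))
    S = 4 * np.eye(ns) + 0.1 * rng.standard_normal((ns, ns))
    B = 0.1 * rng.standard_normal((nf, ns))          # fluid<-structure coupling
    C = 0.1 * rng.standard_normal((ns, nf))          # structure<-fluid coupling
    A = np.block([[F, B], [C, S]])                   # coupled 2x2 block system
    b = rng.standard_normal(nf + ns)

    def block_gs(r):
        """Block Gauss-Seidel sweep with the lower-triangular part of A:
        solve F*yf = rf, then S*ys = rs - C*yf (the coupling B is dropped)."""
        yf = np.linalg.solve(F, r[:nf])
        ys = np.linalg.solve(S, r[nf:] - C @ yf)
        return np.concatenate([yf, ys])

    M = LinearOperator(A.shape, matvec=block_gs)
    x, info = gmres(A, b, M=M)
    print("converged:", info == 0, "residual:", np.linalg.norm(A @ x - b))
    ```

    In FaCSI itself the exact block solves are replaced by cheap single-field preconditioners (domain decomposition or multigrid), which is what makes the approach viable at large core counts.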

  15. Use of general purpose graphics processing units with MODFLOW

    USGS Publications Warehouse

    Hughes, Joseph D.; White, Jeremy T.

    2013-01-01

    To evaluate the use of general-purpose graphics processing units (GPGPUs) to improve the performance of MODFLOW, an unstructured preconditioned conjugate gradient (UPCG) solver has been developed. The UPCG solver uses a compressed sparse row storage scheme and includes Jacobi, zero fill-in incomplete lower-upper (LU) factorization, modified-incomplete LU factorization, and generalized least-squares polynomial preconditioners. The UPCG solver also includes options for sequential and parallel solution on the central processing unit (CPU) using OpenMP. For simulations utilizing the GPGPU, all basic linear algebra operations are performed on the GPGPU; memory copies between the CPU and GPGPU occur prior to the first iteration of the UPCG solver and after satisfying head and flow criteria or exceeding a maximum number of iterations. The efficiency of the UPCG solver for GPGPU and CPU solutions is benchmarked using simulations of a synthetic, heterogeneous unconfined aquifer with tens of thousands to millions of active grid cells. Testing indicates GPGPU speedups on the order of 2 to 8, relative to the standard MODFLOW preconditioned conjugate gradient (PCG) solver, can be achieved when (1) memory copies between the CPU and GPGPU are optimized, (2) the percentage of time performing memory copies between the CPU and GPGPU is small relative to the calculation time, (3) high-performance GPGPU cards are utilized, and (4) CPU-GPGPU combinations are used to execute sequential operations that are difficult to parallelize. Furthermore, UPCG solver testing indicates GPGPU speedups exceed parallel CPU speedups achieved using OpenMP on multicore CPUs for preconditioners that can be easily parallelized.
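
    For readers who want the flavor of the solver, here is a minimal CSR-based, Jacobi-preconditioned conjugate gradient in SciPy. It mirrors the cheapest of the preconditioner options named above; the problem matrix is a generic Poisson-type stand-in, not a MODFLOW system. The Jacobi step is pure elementwise work, which is one reason it maps well to GPUs.

    ```python
    import numpy as np
    import scipy.sparse as sp
    from scipy.sparse.linalg import cg, LinearOperator

    # 2-D Poisson-type matrix in CSR, standing in for a groundwater-flow system.
    n = 64
    I = sp.identity(n)
    T = sp.diags([-1, 2, -1], [-1, 0, 1], shape=(n, n))
    A = sp.csr_matrix(sp.kron(I, T) + sp.kron(T, I))
    b = np.ones(A.shape[0])

    # Jacobi preconditioner: scale the residual by 1/diag(A).
    inv_diag = 1.0 / A.diagonal()
    M = LinearOperator(A.shape, matvec=lambda r: inv_diag * r)

    x, info = cg(A, b, M=M)
    print("converged:", info == 0, "residual:", np.linalg.norm(b - A @ x))
    ```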

  16. Advanced Computational Methods for Security Constrained Financial Transmission Rights: Structure and Parallelism

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Elbert, Stephen T.; Kalsi, Karanjit; Vlachopoulou, Maria

    Financial Transmission Rights (FTRs) help power market participants reduce price risks associated with transmission congestion. FTRs are issued based on a process of solving a constrained optimization problem with the objective of maximizing the FTR social welfare under power flow security constraints. Security constraints for different FTR categories (monthly, seasonal or annual) are usually coupled, and the number of constraints increases exponentially with the number of categories. Commercial software for FTR calculation can only provide limited categories of FTRs due to the inherent computational challenges mentioned above. In this paper, a novel non-linear dynamical system (NDS) approach is proposed to solve the optimization problem. The new formulation and performance of the NDS solver are benchmarked against widely used linear programming (LP) solvers like CPLEX™ and tested on large-scale systems using data from the Western Electricity Coordinating Council (WECC). The NDS is demonstrated to outperform the widely used CPLEX algorithms while exhibiting superior scalability. Furthermore, the NDS-based solver can be easily parallelized, which results in significant computational improvement.

  17. Particle-in-Cell laser-plasma simulation on Xeon Phi coprocessors

    NASA Astrophysics Data System (ADS)

    Surmin, I. A.; Bastrakov, S. I.; Efimenko, E. S.; Gonoskov, A. A.; Korzhimanov, A. V.; Meyerov, I. B.

    2016-05-01

    This paper concerns the development of a high-performance implementation of the Particle-in-Cell method for plasma simulation on Intel Xeon Phi coprocessors. We discuss the suitability of the method for the Xeon Phi architecture and present our experience in the porting and optimization of the existing parallel Particle-in-Cell code PICADOR. Direct porting without code modification gives performance on Xeon Phi close to that of an 8-core CPU on a benchmark problem with 50 particles per cell. We demonstrate step-by-step optimization techniques, such as improving data locality, enhancing parallelization efficiency, and vectorization, leading to an overall 4.2× speedup on CPU and 7.5× on Xeon Phi compared to the baseline version. The optimized version achieves 16.9 ns per particle update on an Intel Xeon E5-2660 CPU and 9.3 ns per particle update on an Intel Xeon Phi 5110P. For a real problem of laser ion acceleration in targets with surface grating, where a large number of macroparticles per cell is required, the speedup of Xeon Phi compared to CPU is 1.6×.

  18. Taming parallel I/O complexity with auto-tuning

    DOE PAGES

    Behzad, Babak; Luu, Huong Vu Thanh; Huchette, Joseph; ...

    2013-11-17

    We present an auto-tuning system for optimizing I/O performance of HDF5 applications and demonstrate its value across platforms, applications, and at scale. The system uses a genetic algorithm to search a large space of tunable parameters and to identify effective settings at all layers of the parallel I/O stack. The parameter settings are applied transparently by the auto-tuning system via dynamically intercepted HDF5 calls. To validate our auto-tuning system, we applied it to three I/O benchmarks (VPIC, VORPAL, and GCRM) that replicate the I/O activity of their respective applications. We tested the system with different weak-scaling configurations (128, 2048, and 4096 CPU cores) that generate 30 GB to 1 TB of data, and executed these configurations on diverse HPC platforms (Cray XE6, IBM BG/P, and Dell Cluster). In all cases, the auto-tuning framework identified tunable parameters that substantially improved write performance over default system settings. In conclusion, we consistently demonstrate I/O write speedups between 2x and 100x for test configurations.
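
    A stripped-down version of the search strategy looks roughly like this: configurations are tuples of I/O parameters, fitness is the measured write time, and the usual selection/crossover/mutation loop evolves the population. The parameter names echo common Lustre/MPI-IO knobs, but the search space and the timing function below are invented stand-ins for the real benchmark runs.

    ```python
    import random

    random.seed(42)

    # Candidate values for typical parallel-I/O knobs (illustrative only).
    SPACE = {
        "stripe_count": [1, 4, 8, 16, 32],
        "stripe_size_mb": [1, 4, 16, 64],
        "cb_nodes": [1, 2, 4, 8],
        "alignment_kb": [64, 256, 1024],
    }
    KEYS = list(SPACE)

    def measure(cfg):
        """Stand-in for running the I/O benchmark with this configuration and
        timing it; in the real system this is an actual HDF5 write."""
        return (abs(cfg["stripe_count"] - 16) + abs(cfg["stripe_size_mb"] - 16)
                + abs(cfg["cb_nodes"] - 4) + abs(cfg["alignment_kb"] - 1024) / 256
                + random.random())

    def random_cfg():
        return {k: random.choice(v) for k, v in SPACE.items()}

    def crossover(a, b):
        return {k: random.choice((a[k], b[k])) for k in KEYS}

    def mutate(cfg, rate=0.2):
        return {k: (random.choice(SPACE[k]) if random.random() < rate else v)
                for k, v in cfg.items()}

    pop = [random_cfg() for _ in range(16)]
    for gen in range(20):
        pop.sort(key=measure)                     # fittest = fastest write
        elite = pop[:4]
        pop = elite + [mutate(crossover(*random.sample(elite, 2)))
                       for _ in range(12)]
    print("best configuration:", min(pop, key=measure))
    ```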

  19. Using video-oriented instructions to speed up sequence comparison.

    PubMed

    Wozniak, A

    1997-04-01

    This document presents an implementation of the well-known Smith-Waterman algorithm for comparison of protein and nucleic acid sequences, using specialized video instructions. These instructions, SIMD-like in their design, make parallelization of the algorithm possible at the instruction level. Benchmarks on an ULTRA SPARC running at 167 MHz show a speed-up factor of two compared to the same algorithm implemented with integer instructions on the same machine. Performance reaches over 18 million matrix cells per second on a single processor, giving, to our knowledge, the fastest implementation of the Smith-Waterman algorithm on a workstation. The accelerated procedure was introduced in LASSAP--a LArge Scale Sequence compArison Package developed at INRIA--which handles parallelism at a higher level. On a SUN Enterprise 6000 server with 12 processors, a speed of nearly 200 million matrix cells per second has been obtained. A sequence of length 300 amino acids is scanned against SWISSPROT R33 (18,531,385 residues) in 29 s. This procedure is not restricted to databank scanning; it applies to all cases handled by LASSAP (intra- and inter-bank comparisons, Z-score computation, etc.).
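
    For reference, the scalar recurrence that those video instructions accelerate is the standard Smith-Waterman fill below (a linear gap penalty and toy match/mismatch scores for brevity; a real implementation uses substitution matrices). The key property for SIMD is that every cell on one anti-diagonal depends only on earlier diagonals, so a whole diagonal can be computed in one vector operation.

    ```python
    import numpy as np

    def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
        """Scalar Smith-Waterman fill; returns the best local-alignment score.
        Cells along one anti-diagonal depend only on earlier diagonals, which
        is what instruction-level (SIMD) parallelization exploits."""
        H = np.zeros((len(a) + 1, len(b) + 1), dtype=int)
        for i in range(1, len(a) + 1):
            for j in range(1, len(b) + 1):
                diag = H[i-1, j-1] + (match if a[i-1] == b[j-1] else mismatch)
                H[i, j] = max(0, diag, H[i-1, j] + gap, H[i, j-1] + gap)
        return H.max()

    print(smith_waterman("HEAGAWGHEE", "PAWHEAE"))
    ```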

  20. Targeted proteomics coming of age - SRM, PRM and DIA performance evaluated from a core facility perspective.

    PubMed

    Kockmann, Tobias; Trachsel, Christian; Panse, Christian; Wahlander, Asa; Selevsek, Nathalie; Grossmann, Jonas; Wolski, Witold E; Schlapbach, Ralph

    2016-08-01

    Quantitative mass spectrometry is a rapidly evolving methodology applied in a large number of omics-type research projects. During the past years, new designs of mass spectrometers have been developed and launched as commercial systems, while in parallel new data acquisition schemes and data analysis paradigms have been introduced. Core facilities provide access to such technologies, but also actively support researchers in finding and applying the best-suited analytical approach. In order to implement a solid foundation for this decision-making process, core facilities need to constantly compare and benchmark the various approaches. In this article we compare the quantitative accuracy and precision of the current state-of-the-art targeted proteomics approaches: single reaction monitoring (SRM), parallel reaction monitoring (PRM) and data independent acquisition (DIA), across multiple liquid chromatography mass spectrometry (LC-MS) platforms, using a readily available commercial standard sample. All workflows are able to reproducibly generate accurate quantitative data. However, SRM and PRM workflows show higher accuracy and precision compared to DIA approaches, especially when analyzing analytes at low concentrations. © 2016 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  1. Communication Studies of DMP and SMP Machines

    NASA Technical Reports Server (NTRS)

    Sohn, Andrew; Biswas, Rupak; Chancellor, Marisa K. (Technical Monitor)

    1997-01-01

    Understanding the interplay between machines and problems is key to obtaining high performance on parallel machines. This paper investigates the interplay between programming paradigms and the communication capabilities of parallel machines. In particular, we explicate the communication capabilities of the IBM SP-2 distributed-memory multiprocessor and the SGI PowerCHALLENGEarray symmetric multiprocessor. Two benchmark problems, bitonic sorting and the Fast Fourier Transform, are selected for experiments. Communication-efficient algorithms are developed to exploit the overlapping capabilities of the machines. Programs are written in the Message-Passing Interface for portability, and identical codes are used for both machines. Various data sizes and message sizes are used to test the machines' communication capabilities. Experimental results indicate that the communication performance of the multiprocessors is consistent with the size of messages. The SP-2 is sensitive to message size but achieves much higher communication overlap because of its communication co-processor. The PowerCHALLENGEarray is not highly sensitive to message size and yields little communication overlap. Bitonic sorting yields lower performance than FFT due to its smaller computation-to-communication ratio.
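
    The overlap being measured is the classic post-communication/compute/wait pattern. In mpi4py it looks like the sketch below: a two-rank exchange with an arbitrary buffer size, where the study above would place its bitonic-sort or FFT work between the post and the wait.

    ```python
    # Overlap pattern: post nonblocking sends/receives, compute on local data
    # while messages are in flight, then wait.
    # Run with: mpiexec -n 2 python overlap.py
    import numpy as np
    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()
    peer = 1 - rank                      # two-rank exchange for simplicity

    send = np.full(1 << 20, rank, dtype=np.float64)
    recv = np.empty_like(send)

    reqs = [comm.Isend(send, dest=peer), comm.Irecv(recv, source=peer)]
    local = np.sin(send).sum()           # "useful" work hidden behind the transfer
    MPI.Request.Waitall(reqs)

    print(f"rank {rank}: local={local:.3f}, got data from rank {int(recv[0])}")
    ```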

  2. Design and evaluation of an architecture for a digital signal processor for instrumentation applications

    NASA Astrophysics Data System (ADS)

    Fellman, Ronald D.; Kaneshiro, Ronald T.; Konstantinides, Konstantinos

    1990-03-01

    The authors present the design and evaluation of an architecture for a monolithic, programmable, floating-point digital signal processor (DSP) for instrumentation applications. An investigation of the most commonly used algorithms in instrumentation led to a design that satisfies the requirements for high computational and I/O (input/output) throughput. In the arithmetic unit, a 16- x 16-bit multiplier and a 32-bit accumulator provide the capability for single-cycle multiply/accumulate operations, and three format adjusters automatically adjust the data format for increased accuracy and dynamic range. An on-chip I/O unit is capable of handling data block transfers through a direct memory access port and real-time data streams through a pair of parallel I/O ports. I/O operations and program execution are performed in parallel. In addition, the processor includes two data memories with independent addressing units, a microsequencer with instruction RAM, and multiplexers for internal data redirection. The authors also present the structure and implementation of a design environment suitable for the algorithmic, behavioral, and timing simulation of a complete DSP system. Various benchmarking results are reported.

  3. Evaluation of Cache-based Superscalar and Cacheless Vector Architectures for Scientific Computations

    NASA Technical Reports Server (NTRS)

    Oliker, Leonid; Carter, Jonathan; Shalf, John; Skinner, David; Ethier, Stephane; Biswas, Rupak; Djomehri, Jahed; VanderWijngaart, Rob

    2003-01-01

    The growing gap between sustained and peak performance for scientific applications has become a well-known problem in high performance computing. The recent development of parallel vector systems offers the potential to bridge this gap for a significant number of computational science codes and deliver a substantial increase in computing capabilities. This paper examines the intranode performance of the NEC SX6 vector processor and the cache-based IBM Power3/4 superscalar architectures across a number of key scientific computing areas. First, we present the performance of a microbenchmark suite that examines a full spectrum of low-level machine characteristics. Next, we study the behavior of the NAS Parallel Benchmarks using some simple optimizations. Finally, we evaluate the performance of several numerical codes from key scientific computing domains. Overall results demonstrate that the SX6 achieves high performance on a large fraction of our application suite and in many cases significantly outperforms the RISC-based architectures. However, certain classes of applications are not easily amenable to vectorization and would likely require extensive reengineering of both algorithm and implementation to utilize the SX6 effectively.

  4. Supercomputer simulations of structure formation in the Universe

    NASA Astrophysics Data System (ADS)

    Ishiyama, Tomoaki

    2017-06-01

    We describe the implementation and performance results of our massively parallel MPI/OpenMP hybrid TreePM code for large-scale cosmological N-body simulations. For domain decomposition, a recursive multi-section algorithm is used, and the sizes of the domains are automatically set so that the total calculation time is the same for all processes. We developed a highly tuned gravity kernel for short-range forces and a novel communication algorithm for long-range forces. For a two-trillion-particle benchmark simulation, the average performance on the full system of the K computer (82,944 nodes; 663,552 cores in total) is 5.8 Pflops, which corresponds to 55% of the peak speed.

  5. Low-frequency quadrupole impedance of undulators and wigglers

    DOE PAGES

    Blednykh, A.; Bassi, G.; Hidaka, Y.; ...

    2016-10-25

    An analytical expression for the low-frequency quadrupole impedance of undulators and wigglers is derived and benchmarked against beam-based impedance measurements done at the 3 GeV NSLS-II storage ring. The adopted theoretical model, valid for an arbitrary number of electromagnetic layers with parallel geometry, allows one to calculate the quadrupole impedance for arbitrary values of the magnetic permeability μr. Here, in the comparison of the analytical results with the measurements for variable magnet gaps, two limiting cases of the permeability have been studied: the case of perfect magnets (μr → ∞), and the case in which the magnets are fully saturated (μr = 1).

  6. Fairness of QoS supporting in optical burst switching

    NASA Astrophysics Data System (ADS)

    Xuan, Xuelei; Liu, Hua; Chen, Chunfeng; Zhang, Zhizhong

    2004-04-01

    In this paper we investigate the fairness problem of the offset-time-based quality of service (QoS) scheme proposed by Qiao and Dixit for optical burst switching (OBS) networks. In the proposed scheme, QoS relies on higher-priority bursts making their reservation requests further into the future; in practice, however, the base offset-times of data bursts at the intermediate nodes are not equal to one another. Here, a new offset-time-based QoS scheme is introduced, in which data bursts are classified according to their offset-time and isolated in the wavelength domain or time domain to achieve parallel reservation. Through simulation, it is found that this scheme achieves fairness among data bursts with different priorities.

  7. A New Code SORD for Simulation of Polarized Light Scattering in the Earth Atmosphere

    NASA Technical Reports Server (NTRS)

    Korkin, Sergey; Lyapustin, Alexei; Sinyuk, Aliaksandr; Holben, Brent

    2016-01-01

    We report a new publicly available radiative transfer (RT) code for numerical simulation of polarized light scattering in plane-parallel atmosphere of the Earth. Using 44 benchmark tests, we prove high accuracy of the new RT code, SORD (Successive ORDers of scattering). We describe capabilities of SORD and show run time for each test on two different machines. At present, SORD is supposed to work as part of the Aerosol Robotic NETwork (AERONET) inversion algorithm. For natural integration with the AERONET software, SORD is coded in Fortran 90/95. The code is available by email request from the corresponding (first) author or from ftp://climate1.gsfc.nasa.gov/skorkin/SORD/.

  8. High-order finite difference formulations for the incompressible Navier-Stokes equations on the CM-5

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Tafti, D.

    1995-12-01

    The paper describes the features and implementation of a general purpose high-order accurate finite difference computer program for direct and large-eddy simulations of turbulence on the CM-5 in the data parallel mode. Benchmarking studies for a direct simulation of turbulent channel flow are discussed. Performance of up to 8.8 GFLOPS is obtained for the high-order formulations on 512 processing nodes of the CM-5. The execution time for a simulation with 24 million nodes in a domain with two periodic directions is in the range of 0.2 μs/time-step/degree of freedom on 512 processing nodes of the CM-5.

  9. Automatic Data Traffic Control on DSM Architecture

    NASA Technical Reports Server (NTRS)

    Frumkin, Michael; Jin, Hao-Qiang; Yan, Jerry; Kwak, Dochan (Technical Monitor)

    2000-01-01

    We study data traffic on distributed shared memory machines and conclude that data placement and grouping improve the performance of scientific codes. We present several methods which users can employ to improve data traffic in their codes. We report on the implementation of a tool which detects the code fragments causing data congestion and advises the user on improvements to data routing in these fragments. The capabilities of the tool include deduction of data alignment and affinity from the source code; detection of code constructs having abnormally high cache or TLB misses; and generation of data placement constructs. We demonstrate the capabilities of the tool in experiments with the NAS Parallel Benchmarks and with a simple computational fluid dynamics application, ARC3D.

  10. Turbulent shear layers in confining channels

    NASA Astrophysics Data System (ADS)

    Benham, Graham P.; Castrejon-Pita, Alfonso A.; Hewitt, Ian J.; Please, Colin P.; Style, Rob W.; Bird, Paul A. D.

    2018-06-01

    We present a simple model for the development of shear layers between parallel flows in confining channels. Such flows are important across a wide range of topics from diffusers, nozzles and ducts to urban air flow and geophysical fluid dynamics. The model approximates the flow in the shear layer as a linear profile separating uniform-velocity streams. Both the channel geometry and wall drag affect the development of the flow. The model shows good agreement with both particle image velocimetry experiments and computational turbulence modelling. The simplicity and low computational cost of the model allows it to be used for benchmark predictions and design purposes, which we demonstrate by investigating optimal pressure recovery in diffusers with non-uniform inflow.

  11. GPU acceleration for digitally reconstructed radiographs using bindless texture objects and CUDA/OpenGL interoperability.

    PubMed

    Abdellah, Marwan; Eldeib, Ayman; Owis, Mohamed I

    2015-01-01

    This paper features an advanced implementation of the X-ray rendering algorithm that harnesses the computing power of current commodity graphics processors to accelerate the generation of high resolution digitally reconstructed radiographs (DRRs). The presented pipeline exploits the latest features of NVIDIA Graphics Processing Unit (GPU) architectures, mainly bindless texture objects and dynamic parallelism. The rendering throughput is substantially improved by exploiting the interoperability mechanisms between CUDA and OpenGL. The benchmarks of our optimized rendering pipeline reflect its capability of generating DRRs with resolutions of 2048² and 4096² at interactive and semi-interactive frame rates using an NVIDIA GeForce GTX 970 device.

  12. Beyond core count: a look at new mainstream computing platforms for HEP workloads

    NASA Astrophysics Data System (ADS)

    Szostek, P.; Nowak, A.; Bitzes, G.; Valsan, L.; Jarp, S.; Dotti, A.

    2014-06-01

    As Moore's Law continues to deliver more and more transistors, the mainstream processor industry is preparing to expand its investments in areas other than simple core count. These new interests include deep integration of on-chip components, advanced vector units, memory, cache and interconnect technologies. We examine these moving trends with parallelized and vectorized High Energy Physics workloads in mind. In particular, we report on practical experience resulting from experiments with scalable HEP benchmarks on the Intel "Ivy Bridge-EP" and "Haswell" processor families. In addition, we examine the benefits of the new "Haswell" microarchitecture and its impact on multiple facets of HEP software. Finally, we report on the power efficiency of new systems.

  13. Better than $1/Mflops sustained: a scalable PC-based parallel computer for lattice QCD

    NASA Astrophysics Data System (ADS)

    Fodor, Zoltán; Katz, Sándor D.; Papp, Gábor

    2003-05-01

    We study the feasibility of a PC-based parallel computer for medium to large scale lattice QCD simulations. The Eötvös Univ., Inst. Theor. Phys. cluster consists of 137 Intel P4-1.7GHz nodes with 512 MB RDRAM. The 32-bit, single precision sustained performance for dynamical QCD without communication is 1510 Mflops/node with Wilson and 970 Mflops/node with staggered fermions. This gives a total performance of 208 Gflops for Wilson and 133 Gflops for staggered QCD, respectively (for 64-bit applications the performance is approximately halved). The novel feature of our system is its communication architecture. In order to have a scalable, cost-effective machine we use Gigabit Ethernet cards for nearest-neighbor communications in a two-dimensional mesh. This type of communication is cost effective (only 30% of the hardware costs is spent on communication). According to our benchmark measurements this type of communication results in around a 40% communication time fraction for lattices up to 48³·96 in full QCD simulations. The price/sustained-performance ratio for full QCD is better than $1/Mflops for Wilson (and around $1.5/Mflops for staggered) quarks for practically any lattice size which can fit in our parallel computer. The communication software is freely available upon request for non-profit organizations.

  14. Large-scale virtual screening on public cloud resources with Apache Spark.

    PubMed

    Capuccini, Marco; Ahmed, Laeeq; Schaal, Wesley; Laure, Erwin; Spjuth, Ola

    2017-01-01

    Structure-based virtual screening is an in-silico method to screen a target receptor against a virtual molecular library. Applying docking-based screening to large molecular libraries can be computationally expensive; however, it constitutes a trivially parallelizable task. Most of the available parallel implementations are based on the message passing interface, relying on low-failure-rate hardware and fast network connections. Google's MapReduce revolutionized large-scale analysis, enabling the processing of massive datasets on commodity hardware and cloud resources, providing transparent scalability and fault tolerance at the software level. Open source implementations of MapReduce include Apache Hadoop and the more recent Apache Spark. We developed a method to run existing docking-based screening software on distributed cloud resources, utilizing the MapReduce approach. We benchmarked our method, which is implemented in Apache Spark, by docking a publicly available target receptor against approximately 2.2 M compounds. The performance experiments show a good parallel efficiency (87%) when running in a public cloud environment. Our method enables parallel structure-based virtual screening on public cloud resources or commodity computer clusters. The degree of scalability that we achieve allows for trying out our method on relatively small libraries first and then scaling to larger libraries. Our implementation is named Spark-VS and it is freely available as open source from GitHub (https://github.com/mcapuccini/spark-vs).
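
    The overall shape of the approach (though not Spark-VS's actual code) fits in a few lines of PySpark: the compound library becomes an RDD, each partition is docked independently, and only the top-scoring hits are collected. The scoring function and library below are dummy placeholders for a real docking engine and real compounds.

    ```python
    from pyspark import SparkContext

    def dock_partition(compounds):
        # Placeholder scorer; a real pipeline would invoke a docking engine
        # per partition and yield (score, compound) pairs.
        for smiles in compounds:
            yield (hash(smiles) % 1000) / 1000.0, smiles

    if __name__ == "__main__":
        sc = SparkContext(appName="toy-virtual-screen")
        library = sc.parallelize([f"C{i}CO" for i in range(100000)], 64)
        hits = library.mapPartitions(dock_partition).takeOrdered(10)  # lowest scores
        for score, smiles in hits:
            print(f"{score:.3f}  {smiles}")
        sc.stop()
    ```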

  15. The UBO-TSUFD tsunami inundation model: validation and application to a tsunami case study focused on the city of Catania, Italy

    NASA Astrophysics Data System (ADS)

    Tinti, S.; Tonini, R.

    2013-07-01

    Nowadays numerical models are a powerful tool in tsunami research, since they can be used (i) to reconstruct modern and historical events, (ii) to cast new light on tsunami sources by inverting tsunami data and observations, (iii) to build scenarios in the frame of tsunami mitigation plans, and (iv) to produce forecasts of tsunami impact and inundation in early warning systems. In parallel with the general recognition of the importance of numerical tsunami simulations, the demand has grown for reliable tsunami codes, validated through tests agreed upon by the tsunami community. This paper presents the tsunami code UBO-TSUFD, which has been developed at the University of Bologna, Italy, and which solves the non-linear shallow water (NSW) equations in a Cartesian frame, with inclusion of bottom friction and exclusion of the Coriolis force, by means of a leapfrog (LF) finite-difference scheme on a staggered grid, and which accounts for moving boundaries to compute sea inundation and withdrawal at the coast. Results of UBO-TSUFD applied to four classical benchmark problems are shown: two benchmarks are based on analytical solutions, one on a plane wave propagating in a flat channel with a constant-slope beach, and one on a laboratory experiment. The code is shown to perform very satisfactorily, since it reproduces the benchmark theoretical and experimental data quite well. Further, the code is applied to a realistic tsunami case: a scenario of a tsunami threatening the coasts of eastern Sicily, Italy, is defined and discussed based on the historical tsunami of 11 January 1693, one of the most severe events in Italian history.

  16. A tiered asthma hazard characterization and exposure assessment approach for evaluation of consumer product ingredients.

    PubMed

    Maier, Andrew; Vincent, Melissa J; Parker, Ann; Gadagbui, Bernard K; Jayjock, Michael

    2015-12-01

    Asthma is a complex syndrome with significant consequences for those affected. The number of individuals affected is growing, although the reasons for the increase are uncertain. Ensuring the effective management of potential exposures follows from substantial evidence that exposure to some chemicals can increase the likelihood of asthma responses. We have developed a safety assessment approach tailored to the screening of asthma risks from residential consumer product ingredients as a proactive risk management tool. Several key features of the proposed approach advance the assessment resources often used for asthma issues. First, a quantitative health benchmark for asthma or related endpoints (irritation and sensitization) is provided that extends qualitative hazard classification methods. Second, a parallel structure is employed to include dose-response methods for asthma endpoints and methods for scenario specific exposure estimation. The two parallel tracks are integrated in a risk characterization step. Third, a tiered assessment structure is provided to accommodate different amounts of data for both the dose-response assessment (i.e., use of existing benchmarks, hazard banding, or the threshold of toxicological concern) and exposure estimation (i.e., use of empirical data, model estimates, or exposure categories). Tools building from traditional methods and resources have been adapted to address specific issues pertinent to asthma toxicology (e.g., mode-of-action and dose-response features) and the nature of residential consumer product use scenarios (e.g., product use patterns and exposure durations). A case study for acetic acid as used in various sentinel products and residential cleaning scenarios was developed to test the safety assessment methodology. In particular, the results were used to refine and verify relationships among tiered approaches such that each lower data tier in the approach provides a similar or greater margin of safety for a given scenario. Copyright © 2015 The Authors. Published by Elsevier Inc. All rights reserved.

  17. Developing a Shuffled Complex-Self Adaptive Hybrid Evolution (SC-SAHEL) Framework for Water Resources Management and Water-Energy System Optimization

    NASA Astrophysics Data System (ADS)

    Rahnamay Naeini, M.; Sadegh, M.; AghaKouchak, A.; Hsu, K. L.; Sorooshian, S.; Yang, T.

    2017-12-01

    Meta-heuristic optimization algorithms have gained a great deal of attention in a wide variety of fields. The simplicity and flexibility of these algorithms, along with their robustness, make them attractive tools for solving optimization problems. Different optimization methods, however, have algorithm-specific strengths and limitations. The performance of each individual algorithm obeys the "No-Free-Lunch" theorem, which means that a single algorithm cannot consistently outperform all other algorithms over a variety of problems. From a user's perspective, it is a tedious process to compare, validate, and select the best-performing algorithm for a specific problem or set of test cases. In this study, we introduce a new hybrid optimization framework, entitled Shuffled Complex-Self Adaptive Hybrid EvoLution (SC-SAHEL), which combines the strengths of different evolutionary algorithms (EAs) in a parallel computing scheme and allows users to select the most suitable algorithm tailored to the problem at hand. The concept of SC-SAHEL is to execute different EAs as separate parallel search cores and let all participating EAs compete during the course of the search. The newly developed SC-SAHEL algorithm is designed to automatically select the best-performing algorithm for the given optimization problem. The algorithm is effective in finding the global optimum for several strenuous benchmark test functions and computationally efficient compared to individual EAs. We benchmark the proposed SC-SAHEL algorithm over 29 conceptual test functions and two real-world case studies - one hydropower reservoir model and one hydrological model (SAC-SMA). Results show that the proposed framework outperforms individual EAs in an absolute majority of the test problems, and can provide results competitive with the fittest EA, with more comprehensive information during the search. The proposed framework is also flexible enough to merge additional EAs, boundary-handling techniques, and sampling schemes, and has good potential to be used in optimal operation and management of water-energy systems.
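
    The competing-search-cores idea can be illustrated, in a much-simplified form that omits SC-SAHEL's complex-shuffling machinery, by racing two off-the-shelf global optimizers on the same benchmark function in parallel and keeping the winner. The function, bounds, and solver choices below are invented for the sketch.

    ```python
    import numpy as np
    from concurrent.futures import ProcessPoolExecutor
    from scipy.optimize import differential_evolution, dual_annealing

    def rastrigin(x):
        # Classic multimodal test function with global minimum 0 at the origin.
        x = np.asarray(x)
        return 10 * x.size + np.sum(x**2 - 10 * np.cos(2 * np.pi * x))

    BOUNDS = [(-5.12, 5.12)] * 5

    def run(method):
        solver = {"de": differential_evolution, "sa": dual_annealing}[method]
        res = solver(rastrigin, BOUNDS, seed=0)
        return method, res.fun, res.x

    if __name__ == "__main__":
        with ProcessPoolExecutor() as pool:           # the competing "search cores"
            results = list(pool.map(run, ["de", "sa"]))
        for m, f, _ in results:
            print(f"{m}: f = {f:.4g}")
        print("winner:", min(results, key=lambda r: r[1])[0])
    ```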

  18. IPRT polarized radiative transfer model intercomparison project - Phase A

    NASA Astrophysics Data System (ADS)

    Emde, Claudia; Barlakas, Vasileios; Cornet, Céline; Evans, Frank; Korkin, Sergey; Ota, Yoshifumi; Labonnote, Laurent C.; Lyapustin, Alexei; Macke, Andreas; Mayer, Bernhard; Wendisch, Manfred

    2015-10-01

    The polarization state of electromagnetic radiation scattered by atmospheric particles such as aerosols, cloud droplets, or ice crystals contains much more information about their optical and microphysical properties than the total intensity alone. For this reason an increasing number of polarimetric observations are performed from space, from the ground, and from aircraft. Polarized radiative transfer models are required to interpret and analyse these measurements and to develop retrieval algorithms exploiting polarimetric observations. In recent years a large number of new codes have been developed, mostly for specific applications. Benchmark results are available for specific cases, but not for more sophisticated scenarios including polarized surface reflection and multi-layer atmospheres. The International Polarized Radiative Transfer (IPRT) working group of the International Radiation Commission (IRC) has initiated a model intercomparison project in order to fill this gap. This paper presents the results of the first phase (A) of the IPRT project, which includes ten test cases, ranging from simple setups with only one layer and Rayleigh scattering to rather sophisticated setups with a cloud embedded in a standard atmosphere above an ocean surface. All scenarios in this first phase of the intercomparison project use a one-dimensional plane-parallel model geometry. The commonly established benchmark results are available at the IPRT website.

  19. Status of BOUT fluid turbulence code: improvements and verification

    NASA Astrophysics Data System (ADS)

    Umansky, M. V.; Lodestro, L. L.; Xu, X. Q.

    2006-10-01

    BOUT is an electromagnetic fluid turbulence code for the tokamak edge plasma [1]. BOUT performs time integration of reduced Braginskii plasma fluid equations, using spatial discretization in realistic geometry and employing the standard ODE integration package PVODE. BOUT has been applied to several tokamak experiments, and in some cases the calculated spectra of turbulent fluctuations compared favorably to experimental data. On the other hand, the desire to better understand the code's results and to gain more confidence in it motivated investing effort in rigorous verification of BOUT. In parallel with the testing, the code underwent substantial modification, mainly to improve its readability and the tractability of physical terms, with some algorithmic improvements as well. In the verification process, a series of linear and nonlinear test problems was applied to BOUT, targeting different subgroups of physical terms. The tests include reproducing basic electrostatic and electromagnetic plasma modes in simplified geometry, axisymmetric benchmarks against the 2D edge code UEDGE in real divertor geometry, and neutral fluid benchmarks against the hydrodynamic code LCPFCT. After completion of the testing, the new version of the code is being applied to actual tokamak edge turbulence problems, and the results will be presented. [1] X. Q. Xu et al., Contr. Plas. Phys., 36, 158 (1998). *Work performed for USDOE by Univ. Calif. LLNL under contract W-7405-ENG-48.

  20. Data assimilation and prognostic whole ice sheet modelling with the variationally derived, higher order, open source, and fully parallel ice sheet model VarGlaS

    NASA Astrophysics Data System (ADS)

    Brinkerhoff, D. J.; Johnson, J. V.

    2013-07-01

    We introduce a novel, higher-order, finite element ice sheet model called VarGlaS (Variational Glacier Simulator), which is built on the finite element framework FEniCS. Contrary to standard procedure in ice sheet modelling, VarGlaS formulates ice sheet motion as the minimization of an energy functional, conferring advantages such as a consistent platform for making numerical approximations, a coherent relationship between motion and heat generation, and implicit boundary treatment. VarGlaS also solves an enthalpy equation rather than a temperature equation, avoiding the solution of a contact problem. Rather than include a lengthy model spin-up procedure, VarGlaS possesses an automated framework for model inversion. These capabilities are brought to bear on several benchmark problems in ice sheet modelling, as well as a 500 yr simulation of the Greenland ice sheet at high resolution. VarGlaS performs well in benchmarking experiments and, given a constant climate and a 100 yr relaxation period, predicts a mass evolution of the Greenland ice sheet that matches present-day observations of mass loss. VarGlaS predicts thinning in the interior and thickening at the margins of the ice sheet.

  1. A hybrid interface tracking - level set technique for multiphase flow with soluble surfactant

    NASA Astrophysics Data System (ADS)

    Shin, Seungwon; Chergui, Jalel; Juric, Damir; Kahouadji, Lyes; Matar, Omar K.; Craster, Richard V.

    2018-04-01

    A formulation for soluble surfactant transport in multiphase flows recently presented by Muradoglu and Tryggvason (JCP 274 (2014) 737-757) [17] is adapted to the context of the Level Contour Reconstruction Method, LCRM, (Shin et al. IJNMF 60 (2009) 753-778, [8]) which is a hybrid method that combines the advantages of the Front-tracking and Level Set methods. Particularly close attention is paid to the formulation and numerical implementation of the surface gradients of surfactant concentration and surface tension. Various benchmark tests are performed to demonstrate the accuracy of different elements of the algorithm. To verify surfactant mass conservation, values for surfactant diffusion along the interface are compared with the exact solution for the problem of uniform expansion of a sphere. The numerical implementation of the discontinuous boundary condition for the source term in the bulk concentration is compared with the approximate solution. Surface tension forces are tested for Marangoni drop translation. Our numerical results for drop deformation in simple shear are compared with experiments and results from previous simulations. All benchmarking tests compare well with existing data thus providing confidence that the adapted LCRM formulation for surfactant advection and diffusion is accurate and effective in three-dimensional multiphase flows with a structured mesh. We also demonstrate that this approach applies easily to massively parallel simulations.

  2. First Applications of the New Parallel Krylov Solver for MODFLOW on a National and Global Scale

    NASA Astrophysics Data System (ADS)

    Verkaik, J.; Hughes, J. D.; Sutanudjaja, E.; van Walsum, P.

    2016-12-01

    Integrated high-resolution hydrologic models are increasingly being used for evaluating water management measures at the field scale. Their drawbacks are large memory requirements and long run times. Examples of such models are The Netherlands Hydrological Instrument (NHI) model and the PCRaster Global Water Balance (PCR-GLOBWB) model. Typical simulation periods are 30-100 years with daily timesteps. The NHI model predicts water demands in periods of drought, supporting operational and long-term water-supply decisions. The NHI is a state-of-the-art coupling of several models: a 7-layer MODFLOW groundwater model (~6.5M 250 m cells), a MetaSWAP model for the unsaturated zone (a Richards emulator of 0.5M cells), and a surface water model (MOZART-DM). The PCR-GLOBWB model provides a grid-based representation of global terrestrial hydrology, and this work uses the version that includes a 2-layer MODFLOW groundwater model (~4.5M 10 km cells). The Parallel Krylov Solver (PKS) speeds up computation by both distributed memory parallelization (Message Passing Interface) and shared memory parallelization (Open Multi-Processing). PKS includes conjugate gradient, bi-conjugate gradient stabilized, and generalized minimal residual linear accelerators that use an overlapping additive Schwarz domain decomposition preconditioner. PKS can be used for both structured and unstructured grids and has been fully integrated into MODFLOW-USG using METIS partitioning and into iMODFLOW using RCB partitioning. iMODFLOW is an accelerated version of MODFLOW-2005 that is implicitly and online coupled to MetaSWAP. Results for benchmarks carried out on the Cartesius Dutch supercomputer (https://userinfo.surfsara.nl/systems/cartesius) for the PCR-GLOBWB model and on a 2x16-core Windows machine for the NHI model show speedups of up to 10-20 and 5-10, respectively.

  3. Visualization of Octree Adaptive Mesh Refinement (AMR) in Astrophysical Simulations

    NASA Astrophysics Data System (ADS)

    Labadens, M.; Chapon, D.; Pomaréde, D.; Teyssier, R.

    2012-09-01

    Computer simulations are important in current cosmological research. These simulations run in parallel on thousands of processors and produce huge amounts of data. Adaptive mesh refinement is used to reduce the computing cost while keeping good numerical accuracy in regions of interest. RAMSES is a cosmological code developed by the Commissariat à l'énergie atomique et aux énergies alternatives (English: Atomic Energy and Alternative Energies Commission) which uses octree adaptive mesh refinement. Compared to grid-based AMR, octree AMR has the advantage of fitting the adaptive resolution of the grid very precisely to the local problem complexity. However, this specific octree data type needs specific software to be visualized, as generic visualization tools work on Cartesian grid data types. This is why the PYMSES software has also been developed by our team. It relies on the Python scripting language to ensure modular and easy access for exploring these specific data. In order to take advantage of the high performance computers which run the RAMSES simulations, it also uses MPI and multiprocessing to run some parallel code. We would like to present our PYMSES software in more detail, with some performance benchmarks. PYMSES currently has two visualization techniques which work directly on the AMR. The first is a splatting technique, and the second is a custom ray-tracing technique. Both have their own advantages and drawbacks. We have also compared two parallel programming techniques, the Python multiprocessing library versus the use of MPI. The load balancing strategy has to be smartly defined in order to achieve a good speedup in our computation. Results obtained with this software are illustrated in the context of a massive, 9000-processor parallel simulation of a Milky Way-like galaxy.

  4. Scalable direct Vlasov solver with discontinuous Galerkin method on unstructured mesh.

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Xu, J.; Ostroumov, P. N.; Mustapha, B.

    2010-12-01

    This paper presents the development of parallel direct Vlasov solvers with the discontinuous Galerkin (DG) method for beam and plasma simulations in four dimensions. Both physical and velocity spaces are in two dimensions (2P2V) with unstructured mesh. Contrary to the standard particle-in-cell (PIC) approach for kinetic space plasma simulations, i.e., solving the Vlasov-Maxwell equations, a direct method has been used in this paper. There are several benefits to solving a Vlasov equation directly, such as avoiding the noise associated with a finite number of particles and the capability to capture fine structure in the plasma. The most challenging part of a direct Vlasov solver comes from the higher dimensionality, as the computational cost increases as N^(2d), where d is the dimension of the physical space. Recently, due to the fast development of supercomputers, the possibility has become more realistic. Many efforts have been made to solve Vlasov equations in low dimensions before; now more interest has focused on higher dimensions. Different numerical methods have been tried so far, such as the finite difference method, the Fourier spectral method, the finite volume method, and the spectral element method. This paper is based on our previous efforts to use the DG method. The DG method has been proven to be very successful in solving Maxwell equations, and this paper is our first effort in applying the DG method to Vlasov equations. DG has shown several advantages, such as a local mass matrix, strong stability, and easy parallelization. These are particularly suitable for Vlasov equations. Domain decomposition in high dimensions has been used for parallelization; this includes a highly scalable parallel two-dimensional Poisson solver. Benchmark results have been shown and simulation results will be reported.

  5. Real-time processing of radar return on a parallel computer

    NASA Technical Reports Server (NTRS)

    Aalfs, David D.

    1992-01-01

    NASA is working with the FAA to demonstrate the feasibility of pulse Doppler radar as a candidate airborne sensor to detect low altitude windshears. The need to provide the pilot with timely information about possible hazards has motivated a demand for real-time processing of the radar return. Investigated here is parallel processing as a means of accommodating the high data rates required. A PC-based parallel computer built from transputers is used to investigate issues in the real-time concurrent processing of radar signals. A transputer network is made up of an array of single-instruction-stream processors that can be networked in a variety of ways. They are easily reconfigured, and software development is largely independent of the particular network topology. The performance of the transputer is evaluated in light of the computational requirements. A number of algorithms have been implemented on the transputers in OCCAM, a language specially designed for parallel processing. These include signal processing algorithms such as the Fast Fourier Transform (FFT), pulse-pair, and autoregressive modelling, as well as routing software to support concurrency. The most computationally intensive task is estimating the spectrum. Two approaches have been taken to this problem, the first and most conventional of which is to use the FFT. By using table look-ups for the basis functions and other optimization techniques, an algorithm has been developed that is sufficiently fast for real time. The other approach is to model the signal as an autoregressive process and estimate the spectrum from the model coefficients. This technique is attractive because it does not suffer from the spectral leakage problem inherent in the FFT. Benchmark tests indicate that autoregressive modelling is feasible in real time.
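
    The second approach above reduces to a small amount of linear algebra, sketched here: estimate autocovariances, solve the Yule-Walker equations for the AR coefficients, and evaluate the model spectrum. The model order, test tones, and noise level are invented for illustration, not taken from the study.

    ```python
    import numpy as np

    def ar_spectrum(x, order=8, nfreq=256):
        """Yule-Walker AR fit, then the model spectrum
        S(f) = sigma^2 / |1 - sum_k phi_k exp(-2j*pi*f*k)|^2."""
        x = x - x.mean()
        r = np.correlate(x, x, "full")[len(x)-1:] / len(x)   # autocovariance
        R = np.array([[r[abs(i-j)] for j in range(order)] for i in range(order)])
        phi = np.linalg.solve(R, r[1:order+1])               # AR coefficients
        sigma2 = r[0] - phi @ r[1:order+1]                   # innovation variance
        f = np.arange(nfreq) / (2 * nfreq)                   # up to Nyquist
        z = np.exp(-2j * np.pi * np.outer(f, np.arange(1, order+1)))
        return f, sigma2 / np.abs(1 - z @ phi) ** 2

    # Two tones in noise, as a stand-in for a Doppler radar return.
    rng = np.random.default_rng(3)
    t = np.arange(1024)
    x = (np.sin(0.2 * np.pi * t) + 0.5 * np.sin(0.46 * np.pi * t)
         + 0.5 * rng.standard_normal(t.size))
    f, S = ar_spectrum(x)
    print("strongest peak near f =", f[np.argmax(S)])
    ```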

  6. Automatic data partitioning on distributed memory multicomputers. Ph.D. Thesis

    NASA Technical Reports Server (NTRS)

    Gupta, Manish

    1992-01-01

    Distributed-memory parallel computers are increasingly being used to provide high levels of performance for scientific applications. Unfortunately, such machines are not very easy to program. A number of research efforts seek to alleviate this problem by developing compilers that take over the task of generating communication. The communication overheads and the extent of parallelism exploited in the resulting target program are determined largely by the manner in which data is partitioned across different processors of the machine. Most of the compilers provide no assistance to the programmer in the crucial task of determining a good data partitioning scheme. A novel approach is presented, the constraints-based approach, to the problem of automatic data partitioning for numeric programs. In this approach, the compiler identifies some desirable requirements on the distribution of various arrays being referenced in each statement, based on performance considerations. These desirable requirements are referred to as constraints. For each constraint, the compiler determines a quality measure that captures its importance with respect to the performance of the program. The quality measure is obtained through static performance estimation, without actually generating the target data-parallel program with explicit communication. Each data distribution decision is taken by combining all the relevant constraints. The compiler attempts to resolve any conflicts between constraints such that the overall execution time of the parallel program is minimized. This approach has been implemented as part of a compiler called Paradigm, that accepts Fortran 77 programs, and specifies the partitioning scheme to be used for each array in the program. We have obtained results on some programs taken from the Linpack and Eispack libraries, and the Perfect Benchmarks. These results are quite promising, and demonstrate the feasibility of automatic data partitioning for a significant class of scientific application programs with regular computations.

  7. A new code SORD for simulation of polarized light scattering in the Earth atmosphere

    NASA Astrophysics Data System (ADS)

    Korkin, Sergey; Lyapustin, Alexei; Sinyuk, Aliaksandr; Holben, Brent

    2016-05-01

    We report a new publicly available radiative transfer (RT) code for numerical simulation of polarized light scattering in the plane-parallel Earth atmosphere. Using 44 benchmark tests, we prove high accuracy of the new RT code, SORD (Successive ORDers of scattering) [1, 2]. We describe capabilities of SORD and show run time for each test on two different machines. At present, SORD is supposed to work as part of the Aerosol Robotic NETwork (AERONET) [3] inversion algorithm. For natural integration with the AERONET software, SORD is coded in Fortran 90/95. The code is available by email request from the corresponding (first) author or from ftp://climate1.gsfc.nasa.gov/skorkin/SORD/ or ftp://maiac.gsfc.nasa.gov/pub/SORD.zip.

  8. Katome: de novo DNA assembler implemented in rust

    NASA Astrophysics Data System (ADS)

    Neumann, Łukasz; Nowak, Robert M.; Kuśmirek, Wiktor

    2017-08-01

    Katome is a new de novo sequence assembler written in the Rust programming language, designed with future parallelization of the algorithms and optimization of run time and memory usage in mind. The application uses new algorithms for the correct assembly of repetitive sequences. Performance and quality tests were performed on various data, comparing the new application to the dnaasm, ABySS and Velvet genome assemblers. The quality tests indicate that the new assembler creates more contigs than well-established solutions, but the contigs have better quality with regard to mismatches per 100 kbp and indels per 100 kbp. Additionally, benchmarks indicate that the Rust-based implementation outperforms the dnaasm, ABySS and Velvet assemblers, written in C++, in terms of assembly time. Lower memory usage in comparison to dnaasm is also observed.

  9. Verification of TEMPEST with neoclassical transport theory

    NASA Astrophysics Data System (ADS)

    Xiong, Z.; Cohen, B. I.; Cohen, R. H.; Dorr, M.; Hittinger, J.; Kerbel, G.; Nevins, W. M.; Rognlien, T.; Umansky, M.; Xu, X.

    2006-10-01

    TEMPEST is an edge gyro-kinetic continuum code developed to study boundary plasma transport over the region extending from the H-mode pedestal across the separatrix to the divertor plates. For benchmark purposes, we present results from the 4D (2r, 2v) TEMPEST for both steady-state transport and time-dependent Geodesic Acoustic Modes (GAMs). We focus on an annular region inside the separatrix of a circular cross-section tokamak where analytical and numerical results are available. The parallel flow velocity and radial particle flux are obtained for different collisional regimes and compared with previous neoclassical results. The effects of the radial electric field and the transition to steep edge gradients are emphasized. The dynamical response of GAMs is also shown and compared to recent theory.

  10. Genetically improved BarraCUDA.

    PubMed

    Langdon, W B; Lam, Brian Yee Hong

    2017-01-01

    BarraCUDA is an open source C program which uses the BWA algorithm in parallel with nVidia CUDA to align short next generation DNA sequences against a reference genome. Recently its source code was optimised using "Genetic Improvement". The genetically improved (GI) code is up to three times faster on short paired end reads from The 1000 Genomes Project and 60% more accurate on a short BioPlanet.com GCAT alignment benchmark. GPGPU BarraCUDA running on a single K80 Tesla GPU can align short paired end nextGen sequences up to ten times faster than bwa on a 12 core server. The speedup was such that the GI version was adopted and has been regularly downloaded from SourceForge for more than 12 months.

  11. GRADSPMHD: A parallel MHD code based on the SPH formalism

    NASA Astrophysics Data System (ADS)

    Vanaverbeke, S.; Keppens, R.; Poedts, S.

    2014-03-01

    We present GRADSPMHD, a completely Lagrangian parallel magnetohydrodynamics code based on the SPH formalism. The implementation of the equations of SPMHD in the “GRAD-h” formalism assembles known results, including the derivation of the discretized MHD equations from a variational principle, the inclusion of time-dependent artificial viscosity, resistivity and conductivity terms, as well as the inclusion of a mixed hyperbolic/parabolic correction scheme for satisfying the ∇·B=0 constraint on the magnetic field. The code uses a tree-based formalism for neighbor finding and can optionally use the tree code for computing the self-gravity of the plasma. The structure of the code closely follows the framework of our parallel GRADSPH FORTRAN 90 code which we added previously to the CPC program library. We demonstrate the capabilities of GRADSPMHD by running 1-, 2-, and 3-dimensional standard benchmark tests and we find good agreement with previous work done by other researchers. The code is also applied to the problem of simulating the magnetorotational instability in 2.5D shearing box tests as well as in global simulations of magnetized accretion disks. We find good agreement with available results on this subject in the literature. Finally, we discuss the performance of the code on a parallel supercomputer with distributed memory architecture. Catalogue identifier: AERP_v1_0 Program summary URL: http://cpc.cs.qub.ac.uk/summaries/AERP_v1_0.html Program obtainable from: CPC Program Library, Queen’s University, Belfast, N. Ireland Licensing provisions: Standard CPC licence, http://cpc.cs.qub.ac.uk/licence/licence.html No. of lines in distributed program, including test data, etc.: 620503 No. of bytes in distributed program, including test data, etc.: 19837671 Distribution format: tar.gz Programming language: FORTRAN 90/MPI. Computer: HPC cluster. Operating system: Unix. Has the code been vectorized or parallelized?: Yes, parallelized using MPI. RAM: ~30 MB for a Sedov test including 15625 particles on a single CPU. Classification: 12. Nature of problem: Evolution of a plasma in the ideal MHD approximation. Solution method: The equations of magnetohydrodynamics are solved using the SPH method. Running time: The test provided takes approximately 20 min using 4 processors.

  12. Improved packing of protein side chains with parallel ant colonies

    PubMed Central

    2014-01-01

    Introduction The accurate packing of protein side chains is important for many computational biology problems, such as ab initio protein structure prediction, homology modelling, protein design, and ligand docking applications. Many existing solutions model side-chain packing as a computational optimisation problem. Besides the design of the search algorithm, most solutions suffer from an inaccurate energy function for judging whether a prediction is good or bad. Even if the search has found the lowest energy, there is no certainty of obtaining the protein structures with correct side chains. Methods We present a side-chain modelling method, pacoPacker, which uses a parallel ant colony optimisation strategy based on sharing a single pheromone matrix. This parallel approach combines different sources of energy functions and generates protein side-chain conformations with the lowest energies jointly determined by the various energy functions. We further optimised the selected rotamers to construct subrotamers by rotamer minimisation, which reasonably improved the discreteness of the rotamer library. Results We focused on improving the accuracy of side-chain conformation prediction. For a testing set of 442 proteins, 87.19% of χ1 and 77.11% of χ1+2 angles were predicted correctly within 40° of the X-ray positions. We compared the accuracy of pacoPacker with state-of-the-art methods, such as CIS-RR and SCWRL4, and analysed the results from different perspectives, in terms of whole protein chains and individual residues. In this comprehensive benchmark testing, 51.5% of proteins within a length of 400 amino acids predicted by pacoPacker were superior to the results of CIS-RR and SCWRL4 simultaneously. Finally, we also showed the advantage of the subrotamers strategy. All results confirmed that our parallel approach is competitive with state-of-the-art solutions for packing side chains. Conclusions This parallel approach combines various sources of searching intelligence and energy functions to pack protein side chains. It provides a framework for combining different objective functions of varying accuracy and usefulness by designing parallel heuristic search algorithms. PMID:25474164
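
    The coordination mechanism described here, several colonies depositing on one shared pheromone matrix, can be sketched compactly. Everything below (sizes, the stand-in energy function, update constants) is hypothetical and only illustrates the pattern, not pacoPacker itself:

        import random

        # Toy of the parallel-colony pattern: several colonies sample
        # solutions (e.g. one rotamer per residue) but deposit into ONE
        # shared pheromone matrix.
        n_res, n_rot, n_colonies, n_iters = 5, 3, 4, 50
        tau = [[1.0] * n_rot for _ in range(n_res)]   # shared pheromones

        def energy(sol):                              # stand-in scorer
            return sum((r - 1) ** 2 for r in sol)

        best, best_e = None, float("inf")
        for _ in range(n_iters):
            for _colony in range(n_colonies):         # colonies in turn
                sol = [random.choices(range(n_rot), weights=tau[i])[0]
                       for i in range(n_res)]         # pheromone-biased pick
                e = energy(sol)
                if e < best_e:
                    best, best_e = sol, e
                for i, r in enumerate(sol):           # shared deposit
                    tau[i][r] += 1.0 / (1.0 + e)
            for row in tau:                           # evaporation
                row[:] = [0.9 * t for t in row]
        print(best, best_e)   # converges towards rotamer 1 per residue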

  13. Parallelization Issues and Particle-in-Cell Codes.

    NASA Astrophysics Data System (ADS)

    Elster, Anne Cathrine

    1994-01-01

    "Everything should be made as simple as possible, but not simpler." Albert Einstein. The field of parallel scientific computing has concentrated on parallelization of individual modules such as matrix solvers and factorizers. However, many applications involve several interacting modules. Our analyses of a particle-in-cell code modeling charged particles in an electric field show that these accompanying dependencies affect data partitioning and lead to new parallelization strategies concerning processor, memory and cache utilization. Our test-bed, a KSR1, is a distributed memory machine with a globally shared addressing space. However, most of the new methods presented hold generally for hierarchical and/or distributed memory systems. We introduce a novel approach that uses dual pointers on the local particle arrays to keep the particle locations automatically partially sorted. Complexity and performance analyses, with accompanying KSR benchmarks, have been included for both this scheme and for the traditional replicated grids approach. The latter approach maintains load-balance with respect to particles. However, our results demonstrate that it fails to scale properly for problems with large grids (say, greater than 128-by-128) running on as few as 15 KSR nodes, since the extra storage and computation time associated with adding the grid copies become significant. Our grid partitioning scheme, although harder to implement, does not need to replicate the whole grid. Consequently, it scales well for large problems on highly parallel systems. It may, however, require load balancing schemes for non-uniform particle distributions. Our dual pointer approach may facilitate this through dynamically partitioned grids. We also introduce hierarchical data structures that store neighboring grid-points within the same cache-line by reordering the grid indexing. This alignment produces a 25% savings in cache-hits for a 4-by-4 cache. A consideration of the input data's effect on the simulation may lead to further improvements. For example, in the case of mean particle drift, it is often advantageous to partition the grid primarily along the direction of the drift. The particle-in-cell codes for this study were tested using physical parameters which lead to predictable phenomena including plasma oscillations and two-stream instabilities. An overview of the most central references related to parallel particle codes is also given.

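    Why particle ordering matters for cache behaviour can be seen in a generic sketch. This is a plain counting sort by cell, not the thesis's incremental dual-pointer scheme, and the sizes are arbitrary:

        import numpy as np

        # Generic PIC locality trick: once particles are ordered by owning
        # cell, charge deposition sweeps memory contiguously.
        ncell = 8
        x = np.random.rand(1000)                  # positions in [0, 1)
        cell = (x * ncell).astype(int)            # owning cell per particle
        order = np.argsort(cell, kind="stable")   # stable sort by cell
        x_sorted, cell_sorted = x[order], cell[order]

        rho = np.bincount(cell_sorted, minlength=ncell)   # deposition
        print(rho)                                # particles per cell
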
  14. Assessing 1D Atmospheric Solar Radiative Transfer Models: Interpretation and Handling of Unresolved Clouds.

    NASA Astrophysics Data System (ADS)

    Barker, H. W.; Stephens, G. L.; Partain, P. T.; Bergman, J. W.; Bonnel, B.; Campana, K.; Clothiaux, E. E.; Clough, S.; Cusack, S.; Delamere, J.; Edwards, J.; Evans, K. F.; Fouquart, Y.; Freidenreich, S.; Galin, V.; Hou, Y.; Kato, S.; Li, J.;  Mlawer, E.;  Morcrette, J.-J.;  O'Hirok, W.;  Räisänen, P.;  Ramaswamy, V.;  Ritter, B.;  Rozanov, E.;  Schlesinger, M.;  Shibata, K.;  Sporyshev, P.;  Sun, Z.;  Wendisch, M.;  Wood, N.;  Yang, F.

    2003-08-01

    The primary purpose of this study is to assess the performance of 1D solar radiative transfer codes that are used currently both for research and in weather and climate models. Emphasis is on interpretation and handling of unresolved clouds. Answers are sought to the following questions: (i) How well do 1D solar codes interpret and handle columns of information pertaining to partly cloudy atmospheres? (ii) Regardless of the adequacy of their assumptions about unresolved clouds, do 1D solar codes perform as intended? One clear-sky and two plane-parallel, homogeneous (PPH) overcast cloud cases serve to elucidate 1D model differences due to varying treatments of gaseous transmittances, cloud optical properties, and basic radiative transfer. The remaining four cases involve 3D distributions of cloud water and water vapor as simulated by cloud-resolving models. Results for 25 1D codes, which included two line-by-line (LBL) models (clear and overcast only) and four 3D Monte Carlo (MC) photon transport algorithms, were submitted by 22 groups. Benchmark, domain-averaged irradiance profiles were computed by the MC codes. For the clear and overcast cases, all MC estimates of top-of-atmosphere albedo, atmospheric absorptance, and surface absorptance agree with one of the LBL codes to within ±2%. Most 1D codes underestimate atmospheric absorptance by typically 15-25 W m⁻² at overhead sun for the standard tropical atmosphere, regardless of clouds. Depending on assumptions about unresolved clouds, the 1D codes were partitioned into four genres: (i) horizontal variability, (ii) exact overlap of PPH clouds, (iii) maximum/random overlap of PPH clouds, and (iv) random overlap of PPH clouds. A single MC code was used to establish conditional benchmarks applicable to each genre, and all MC codes were used to establish the full 3D benchmarks. There is a tendency for 1D codes to cluster near their respective conditional benchmarks, though intragenre variances typically exceed those for the clear and overcast cases. The majority of 1D codes fall into the extreme category of maximum/random overlap of PPH clouds and thus generally disagree with full 3D benchmark values. Given the fairly limited scope of these tests and the inability of any one code to perform extremely well for all cases, it can be argued that a paradigm shift is due for modeling 1D solar fluxes for cloudy atmospheres.

  15. Power source selection for neutral particle beam systems

    NASA Astrophysics Data System (ADS)

    Silverman, Sidney W.; Chi, John W. H.; Hill, Gregory

    Space based neutral particle beams (NPB) are being considered for use as an SDI weapon as well as a mid-course discriminator. These systems require a radio frequency (RF) power source. Five types of amplifiers were considered for the RF power source: the klystron, the klystrode, the tetrode, the cross field amplifier, and the solid state amplifier. A number of different types of power source systems (nuclear and non-nuclear) were considered for integration with these amplifiers. The most attractive amplifier power system concepts were identified through comparative evaluations that took into account the total masses of integrated amplifier power source systems as well as a number of other factors that consisted of development cost, technology risk, vulnerability, survivability, reliability, and impacts on spacecraft stabilization. These concepts are described and conclusions drawn.

  16. Molecular hyperfine fields in organic magnetoresistance devices

    NASA Astrophysics Data System (ADS)

    Giro, Ronaldo; Rosselli, Flávia P.; dos Santos Carvalho, Rafael; Capaz, Rodrigo B.; Cremona, Marco; Achete, Carlos A.

    2013-03-01

    We calculate molecular hyperfine fields in organic magnetoresistance (OMAR) devices using ab initio calculations. To do so, we establish a protocol for the accurate determination of the average hyperfine field Bhf and apply it to selected molecular ions: NPB, TPD, and Alq3. Then, we make devices with precisely the same molecules and perform measurements of the OMAR effect, in order to address the role of the hole-transport layer in the characteristic magnetic field B0 of OMAR. Contrary to common belief, we find that molecular hyperfine fields are not only caused by hydrogen nuclei. We also find that dipolar contributions to the hyperfine fields can be comparable to the Fermi contact contributions. However, such contributions are restricted to nuclei located in the same molecular ion as the charge carrier (intramolecular), as extramolecular contributions are negligible.

  17. Merging parallel tempering with sequential geostatistical resampling for improved posterior exploration of high-dimensional subsurface categorical fields

    NASA Astrophysics Data System (ADS)

    Laloy, Eric; Linde, Niklas; Jacques, Diederik; Mariethoz, Grégoire

    2016-04-01

    The sequential geostatistical resampling (SGR) algorithm is a Markov chain Monte Carlo (MCMC) scheme for sampling from possibly non-Gaussian, complex spatially-distributed prior models such as geologic facies or categorical fields. In this work, we highlight the limits of standard SGR for posterior inference of high-dimensional categorical fields with realistically complex likelihood landscapes and benchmark a parallel tempering implementation (PT-SGR). Our proposed PT-SGR approach is demonstrated using synthetic (error corrupted) data from steady-state flow and transport experiments in categorical 7575- and 10,000-dimensional 2D conductivity fields. In both case studies, every SGR trial gets trapped in a local optimum while PT-SGR maintains a higher diversity in the sampled model states. The advantage of PT-SGR is most apparent in an inverse transport problem where the posterior distribution is made bimodal by construction. PT-SGR then converges towards the appropriate data misfit much faster than SGR and partly recovers the two modes. In contrast, for the same computational resources SGR does not fit the data to the appropriate error level and hardly produces a locally optimal solution that looks visually similar to one of the two reference modes. Although PT-SGR clearly surpasses SGR in performance, our results also indicate that using a small number (16-24) of temperatures (and thus parallel cores) may not permit complete sampling of the posterior distribution by PT-SGR within a reasonable computational time (less than 1-2 weeks).
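
    The parallel tempering machinery that PT-SGR adds to SGR is generic and small. A serial caricature with a hypothetical bimodal misfit (stand-ins for the flow-and-transport likelihood and the SGR proposal) shows the two moves, local updates plus temperature swaps:

        import math, random

        # Minimal parallel tempering skeleton (serial stand-in for parallel
        # chains): local moves per temperature, then neighbour replica swaps.
        def neg_log_post(m):              # hypothetical bimodal misfit
            return min((m - 2.0) ** 2, (m + 2.0) ** 2) * 8.0

        temps = [1.0, 1.6, 2.6, 4.2]      # temperature ladder
        states = [random.uniform(-4, 4) for _ in temps]

        for it in range(5000):
            for i, T in enumerate(temps):            # local update per chain
                prop = states[i] + random.gauss(0, 0.5)
                dE = neg_log_post(prop) - neg_log_post(states[i])
                if dE < 0 or random.random() < math.exp(-dE / T):
                    states[i] = prop
            i = random.randrange(len(temps) - 1)     # one neighbour swap try
            dE = neg_log_post(states[i]) - neg_log_post(states[i + 1])
            dB = 1.0 / temps[i] - 1.0 / temps[i + 1]
            if random.random() < math.exp(min(0.0, dB * dE)):
                states[i], states[i + 1] = states[i + 1], states[i]
        print("cold-chain sample:", states[0])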

  18. Exploiting Parallel R in the Cloud with SPRINT

    PubMed Central

    Piotrowski, M.; McGilvary, G.A.; Sloan, T. M.; Mewissen, M.; Lloyd, A.D.; Forster, T.; Mitchell, L.; Ghazal, P.; Hill, J.

    2012-01-01

    Background Advances in DNA Microarray devices and next-generation massively parallel DNA sequencing platforms have led to an exponential growth in data availability, but the arising opportunities require adequate computing resources. High Performance Computing (HPC) in the Cloud offers an affordable way of meeting this need. Objectives Bioconductor, a popular tool for high-throughput genomic data analysis, is distributed as add-on modules for the R statistical programming language, but R has no native capabilities for exploiting multi-processor architectures. SPRINT is an R package that enables easy access to HPC for genomics researchers. This paper investigates: setting up and running SPRINT-enabled genomic analyses on Amazon’s Elastic Compute Cloud (EC2); the advantages of submitting applications to EC2 from different parts of the world; and whether resource underutilization can improve application performance. Methods The SPRINT parallel implementations of correlation, permutation testing, partitioning around medoids and the multi-purpose papply have been benchmarked on data sets of various sizes on Amazon EC2. Jobs have been submitted from both the UK and Thailand to investigate monetary differences. Results It is possible to obtain good, scalable performance, but the level of improvement is dependent upon the nature of the algorithm. Resource underutilization can further improve the time to result. The end-user’s location impacts costs due to factors such as local taxation. Conclusions: Although not designed to satisfy HPC requirements, Amazon EC2 and cloud computing in general provide an interesting alternative and new possibilities for smaller organisations with limited funds. PMID:23223611

  19. Massively Parallel Processing for Fast and Accurate Stamping Simulations

    NASA Astrophysics Data System (ADS)

    Gress, Jeffrey J.; Xu, Siguang; Joshi, Ramesh; Wang, Chuan-tao; Paul, Sabu

    2005-08-01

    The competitive automotive market drives automotive manufacturers to speed up vehicle development cycles and reduce lead-time. Fast tooling development is one of the key areas supporting fast and short vehicle development programs (VDP). In the past ten years, stamping simulation has become the most effective validation tool in predicting and resolving all potential formability and quality problems before the dies are physically made. Stamping simulation and formability analysis have become a critical business segment in GM's math-based die engineering process. As simulation becomes one of the major production tools in the engineering factory, simulation speed and accuracy are two of the most important measures of stamping simulation technology. The speed and time-in-system of forming analysis become even more critical to supporting fast VDP and tooling readiness. Since 1997, General Motors Die Center has been working jointly with its software vendor to develop and implement a parallel version of simulation software for mass production analysis applications. By 2001, this technology had matured in the form of distributed memory processing (DMP) of draw die simulations in a networked distributed memory computing environment. In 2004, the technology was refined to massively parallel processing (MPP) and extended to line die forming analysis (draw, trim, flange, and associated spring-back) running on a dedicated computing environment. The evolution of this technology and the insight gained through the implementation of DMP/MPP technology, as well as performance benchmarks, are discussed in this publication.

  20. Efficiently modeling neural networks on massively parallel computers

    NASA Technical Reports Server (NTRS)

    Farber, Robert M.

    1993-01-01

    Neural networks are a very useful tool for analyzing and modeling complex real world systems. Applying neural network simulations to real world problems generally involves large amounts of data and massive amounts of computation. To efficiently handle the computational requirements of large problems, we have implemented at Los Alamos a highly efficient neural network compiler for serial computers, vector computers, vector parallel computers, and fine grain SIMD computers such as the CM-2 connection machine. This paper describes the mapping used by the compiler to implement feed-forward backpropagation neural networks for a SIMD (Single Instruction Multiple Data) architecture parallel computer. Thinking Machines Corporation has benchmarked our code at 1.3 billion interconnects per second (approximately 3 gigaflops) on a 64,000 processor CM-2 connection machine (Singer 1990). This mapping is applicable to other SIMD computers and can be implemented on MIMD computers such as the CM-5 connection machine. Our mapping has virtually no communications overhead, with the exception of the communications required for a global summation across the processors (which has a sub-linear runtime growth on the order of O(log(number of processors))). We can efficiently model very large neural networks which have many neurons and interconnects, and our mapping can extend to arbitrarily large networks (within memory limitations) by merging the memory space of separate processors with fast adjacent-processor communications. This paper considers the simulation of only feed-forward neural networks, although this method is extendable to recurrent networks.
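
    The communication pattern singled out above, a single logarithmic-time global summation per training step, is exactly what MPI's Allreduce provides on modern clusters. A minimal data-parallel sketch (a linear toy model via mpi4py, not the CM-2 mapping itself):

        import numpy as np
        from mpi4py import MPI

        # Data-parallel training skeleton: every rank holds the full weights
        # and a shard of the data; the only communication per step is one
        # global summation, performed by MPI in O(log P) stages.
        comm = MPI.COMM_WORLD
        rng = np.random.default_rng(comm.Get_rank())
        w = np.zeros(100)                       # replicated weights
        X = rng.random((1000, 100))             # this rank's data shard
        y = rng.random(1000)

        for step in range(10):
            g_local = X.T @ (X @ w - y) / len(y)     # local gradient
            g = np.empty_like(g_local)
            comm.Allreduce(g_local, g, op=MPI.SUM)   # log-time global sum
            w -= 0.01 * g / comm.Get_size()          # same update on all ranks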

  1. Exploiting parallel R in the cloud with SPRINT.

    PubMed

    Piotrowski, M; McGilvary, G A; Sloan, T M; Mewissen, M; Lloyd, A D; Forster, T; Mitchell, L; Ghazal, P; Hill, J

    2013-01-01

    Advances in DNA Microarray devices and next-generation massively parallel DNA sequencing platforms have led to an exponential growth in data availability, but the arising opportunities require adequate computing resources. High Performance Computing (HPC) in the Cloud offers an affordable way of meeting this need. Bioconductor, a popular tool for high-throughput genomic data analysis, is distributed as add-on modules for the R statistical programming language, but R has no native capabilities for exploiting multi-processor architectures. SPRINT is an R package that enables easy access to HPC for genomics researchers. This paper investigates: setting up and running SPRINT-enabled genomic analyses on Amazon's Elastic Compute Cloud (EC2); the advantages of submitting applications to EC2 from different parts of the world; and whether resource underutilization can improve application performance. The SPRINT parallel implementations of correlation, permutation testing, partitioning around medoids and the multi-purpose papply have been benchmarked on data sets of various sizes on Amazon EC2. Jobs have been submitted from both the UK and Thailand to investigate monetary differences. It is possible to obtain good, scalable performance, but the level of improvement is dependent upon the nature of the algorithm. Resource underutilization can further improve the time to result. The end-user's location impacts costs due to factors such as local taxation. Although not designed to satisfy HPC requirements, Amazon EC2 and cloud computing in general provide an interesting alternative and new possibilities for smaller organisations with limited funds.

  2. Performance Improvements of the CYCOFOS Flow Model

    NASA Astrophysics Data System (ADS)

    Radhakrishnan, Hari; Moulitsas, Irene; Syrakos, Alexandros; Zodiatis, George; Nikolaides, Andreas; Hayes, Daniel; Georgiou, Georgios C.

    2013-04-01

    The CYCOFOS-Cyprus Coastal Ocean Forecasting and Observing System has been operational since early 2002, providing daily sea current, temperature, salinity and sea level forecasts for the next 4 and 10 days to end-users in the Levantine Basin, necessary for operational applications in marine safety, particularly oil spill and floating object predictions. The CYCOFOS flow model, like most of the coastal and sub-regional operational hydrodynamic forecasting systems of the MONGOOS-Mediterranean Oceanographic Network for Global Ocean Observing System, is based on the POM-Princeton Ocean Model. CYCOFOS is nested within the MyOcean Mediterranean regional forecasting data and with SKIRON and ECMWF for surface forcing. The increasing demand for higher and higher resolution data to serve coastal and offshore downstream applications motivated the parallelization of the CYCOFOS POM model. This development was carried out in the framework of the IPcycofos project, funded by the Cyprus Research Promotion Foundation. Parallel processing provides a viable solution to satisfy these demands without sacrificing accuracy or omitting any physical phenomena. Prior to the IPcycofos project, there had been several attempts to parallelise POM, for example MP-POM. These existing parallel code models rely on specific, outdated hardware architectures and associated software. The objective of the IPcycofos project is to produce an operational parallel version of the CYCOFOS POM code that replicates the results of the serial version used in CYCOFOS. The parallelization of the CYCOFOS POM model uses the Message Passing Interface (MPI), implemented on commodity computing clusters running open source software and not depending on any specialized vendor hardware. The parallel CYCOFOS POM code is constructed in a modular fashion, allowing a fast re-locatable downscaled implementation. The implementation takes advantage of the Cartesian nature of the POM mesh and uses built-in MPI routines to split the mesh, using a weighting scheme, along longitude and latitude among the processors. Each processor works on its part of the model following domain decomposition techniques. The new parallel CYCOFOS POM code has been benchmarked against the serial POM version of CYCOFOS for speed, accuracy, and resolution, and the results are more than satisfactory: with a higher-resolution CYCOFOS Levantine model domain, the forecasts need much less time than the coarser serial CYCOFOS POM version, with identical accuracy.
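
    The splitting described, a Cartesian mesh divided along longitude and latitude among ranks, maps directly onto MPI's Cartesian topology routines. A minimal mpi4py sketch (hypothetical mesh size and uniform weighting, simpler than CYCOFOS's weighted scheme):

        from mpi4py import MPI

        # The mesh split in miniature: MPI's Cartesian topology divides a
        # lon/lat grid among ranks and exposes halo-exchange partners.
        comm = MPI.COMM_WORLD
        dims = MPI.Compute_dims(comm.Get_size(), [0, 0])   # e.g. 4 -> [2, 2]
        cart = comm.Create_cart(dims, periods=[False, False], reorder=True)
        iy, ix = cart.Get_coords(cart.Get_rank())

        NLON, NLAT = 400, 300                       # hypothetical mesh size
        ny, nx = NLAT // dims[0], NLON // dims[1]   # local tile size
        south, north = cart.Shift(0, 1)             # neighbours in latitude
        west, east = cart.Shift(1, 1)               # neighbours in longitude
        print(f"rank {cart.Get_rank()}: tile ({iy},{ix}) of {dims}, "
              f"{ny}x{nx} points, N/S/W/E = {north}/{south}/{west}/{east}")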

  3. Viriato: a Fourier-Hermite spectral code for strongly magnetised fluid-kinetic plasma dynamics

    NASA Astrophysics Data System (ADS)

    Loureiro, Nuno; Dorland, William; Fazendeiro, Luis; Kanekar, Anjor; Mallet, Alfred; Zocco, Alessandro

    2015-11-01

    We report on the algorithms and numerical methods used in Viriato, a novel fluid-kinetic code that solves two distinct sets of equations: (i) the Kinetic Reduced Electron Heating Model equations [Zocco & Schekochihin, 2011] and (ii) the kinetic reduced MHD (KRMHD) equations [Schekochihin et al., 2009]. Two main applications of these equations are magnetised (Alfvénic) plasma turbulence and magnetic reconnection. Viriato uses operator splitting to separate the dynamics parallel and perpendicular to the ambient magnetic field (assumed strong). Along the magnetic field, Viriato allows for either a second-order accurate MacCormack method or, for higher accuracy, a spectral-like scheme. Perpendicular to the field Viriato is pseudo-spectral, and the time integration is performed by means of an iterative predictor-corrector scheme. In addition, a distinctive feature of Viriato is its spectral representation of the parallel velocity-space dependence, achieved by means of a Hermite representation of the perturbed distribution function. A series of linear and nonlinear benchmarks and tests are presented, with focus on 3D decaying kinetic turbulence. Work partially supported by Fundação para a Ciência e Tecnologia via Grants UID/FIS/50010/2013 and IF/00530/2013.

  4. High performance cellular level agent-based simulation with FLAME for the GPU.

    PubMed

    Richmond, Paul; Walker, Dawn; Coakley, Simon; Romano, Daniela

    2010-05-01

    Driven by the availability of experimental data and the ability to simulate a biological scale of immediate interest, the cellular scale is fast emerging as an ideal candidate for middle-out modelling. As with 'bottom-up' simulation approaches, cellular level simulations demand a high degree of computational power, which in large-scale simulations can only be achieved through parallel computing. The flexible large-scale agent modelling environment (FLAME) is a template-driven framework for agent-based modelling (ABM) on parallel architectures, ideally suited to the simulation of cellular systems. It is available for both high performance computing clusters (www.flame.ac.uk) and GPU hardware (www.flamegpu.com) and uses a formal specification technique that acts as a universal modelling format. This not only creates an abstraction from the underlying hardware architectures, but also avoids the steep learning curve associated with programming them. In benchmarking tests and simulations of advanced cellular systems, FLAME GPU has shown massive improvements in performance over more traditional ABM frameworks. This allows the time spent in the development and testing stages of modelling to be drastically reduced and creates the possibility of real-time visualisation for simple visual face-validation.

  5. Fast MPEG-CDVS Encoder With GPU-CPU Hybrid Computing

    NASA Astrophysics Data System (ADS)

    Duan, Ling-Yu; Sun, Wei; Zhang, Xinfeng; Wang, Shiqi; Chen, Jie; Yin, Jianxiong; See, Simon; Huang, Tiejun; Kot, Alex C.; Gao, Wen

    2018-05-01

    The compact descriptors for visual search (CDVS) standard from ISO/IEC moving pictures experts group (MPEG) has succeeded in enabling interoperability for efficient and effective image retrieval by standardizing the bitstream syntax of compact feature descriptors. However, the intensive computation of the CDVS encoder unfortunately hinders its wide deployment in industry for large-scale visual search. In this paper, we revisit the merits of the low complexity design of the CDVS core techniques and present a very fast CDVS encoder by leveraging the massive parallel execution resources of the GPU. We elegantly shift the computation-intensive and parallel-friendly modules to state-of-the-art GPU platforms, in which the thread block allocation and the memory access are jointly optimized to eliminate performance loss. In addition, those operations with heavy data dependence are allocated to the CPU to relieve the GPU of the extra, unnecessary computation burden. Furthermore, we demonstrate that the proposed fast CDVS encoder works well with convolutional neural network approaches, which have likewise leveraged the advantages of GPU platforms, yielding significant performance improvements. Comprehensive experimental results over benchmarks show that the fast CDVS encoder using GPU-CPU hybrid computing is promising for scalable visual search.

  6. Acoustic 3D modeling by the method of integral equations

    NASA Astrophysics Data System (ADS)

    Malovichko, M.; Khokhlov, N.; Yavich, N.; Zhdanov, M.

    2018-02-01

    This paper presents a parallel algorithm for frequency-domain acoustic modeling by the method of integral equations (IE). The algorithm is applied to seismic simulation. The IE method reduces the size of the problem but leads to a dense system matrix. Tolerable memory consumption and numerical complexity were achieved by applying an iterative solver, accompanied by an effective matrix-vector multiplication operation based on the fast Fourier transform (FFT). We demonstrate that the IE system matrix is better conditioned than that of the finite-difference (FD) method, and discuss its relation to a specially preconditioned FD matrix. We considered several methods of matrix-vector multiplication for the free-space and layered host models. The developed algorithm and computer code were benchmarked against the FD time-domain solution. It was demonstrated that the method can accurately calculate the seismic field for models with sharp material boundaries and a point source and receiver located close to the free surface. We used OpenMP to speed up the matrix-vector multiplication, while MPI was used to speed up the solution of the system of equations and to parallelize across multiple sources. Practical examples and efficiency tests are presented as well.
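
    The FFT trick mentioned relies on the IE matrix being (block-)Toeplitz for a translation-invariant host medium: the dense matvec then collapses to a padded circular convolution, O(N log N) instead of O(N^2). A 1D illustration with hypothetical kernel samples:

        import numpy as np

        def toeplitz_matvec(col, row, x):
            """Multiply a Toeplitz matrix (first column/row given) by x
            via circulant embedding and the FFT."""
            n = len(x)
            c = np.concatenate([col, row[:0:-1]])   # circulant first column
            y = np.fft.ifft(np.fft.fft(c) * np.fft.fft(x, len(c)))
            return y[:n].real

        col = np.array([2.0, 1.0, 0.5, 0.25])       # first column of A
        row = np.array([2.0, 0.8, 0.4, 0.2])        # first row of A
        x = np.random.rand(4)
        A = np.array([[col[i - j] if i >= j else row[j - i]
                       for j in range(4)] for i in range(4)])
        print(np.allclose(A @ x, toeplitz_matvec(col, row, x)))   # True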

  7. Validating the simulation of large-scale parallel applications using statistical characteristics

    DOE PAGES

    Zhang, Deli; Wilke, Jeremiah; Hendry, Gilbert; ...

    2016-03-01

    Simulation is a widely adopted method to analyze and predict the performance of large-scale parallel applications. Validating the hardware model is highly important for complex simulations with a large number of parameters. Common practice involves calculating the percent error between the projected and the real execution time of a benchmark program. However, in a high-dimensional parameter space, this coarse-grained approach often suffers from parameter insensitivity, which may not be known a priori. Moreover, the traditional approach cannot be applied to the validation of software models, such as application skeletons used in online simulations. In this work, we present a methodology and a toolset for validating both hardware and software models by quantitatively comparing fine-grained statistical characteristics obtained from execution traces. Although statistical information has been used in tasks like performance optimization, this is the first attempt to apply it to simulation validation. Lastly, our experimental results show that the proposed evaluation approach offers significant improvement in fidelity when compared to evaluation using total execution time, and the proposed metrics serve as reliable criteria that progress toward automating the simulation tuning process.

  8. Multiobjective Multifactorial Optimization in Evolutionary Multitasking.

    PubMed

    Gupta, Abhishek; Ong, Yew-Soon; Feng, Liang; Tan, Kay Chen

    2016-05-03

    In recent decades, the field of multiobjective optimization has attracted considerable interest among evolutionary computation researchers. One of the main features that makes evolutionary methods particularly appealing for multiobjective problems is the implicit parallelism offered by a population, which enables simultaneous convergence toward the entire Pareto front. While a plethora of related algorithms have been proposed to date, a common attribute among them is that they focus on efficiently solving only a single optimization problem at a time. Despite the known power of implicit parallelism, seldom has an attempt been made to multitask, i.e., to solve multiple optimization problems simultaneously. It is contended that the notion of evolutionary multitasking leads to the possibility of automated transfer of information across different optimization exercises that may share underlying similarities, thereby facilitating improved convergence characteristics. In particular, the potential for automated transfer is deemed invaluable from the standpoint of engineering design exercises where manual knowledge adaptation and reuse are routine. Accordingly, in this paper, we present a realization of the evolutionary multitasking paradigm within the domain of multiobjective optimization. The efficacy of the associated evolutionary algorithm is demonstrated on some benchmark test functions as well as on a real-world manufacturing process design problem from the composites industry.
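
    The multitasking idea, one population serving several optimization problems at once, can be caricatured in a few lines. The sketch below is single-objective and omits most of the paper's MFEA machinery; tasks, sizes, and operators are hypothetical:

        import random

        # One population, TWO tasks: each individual carries a "skill" task,
        # and crossover across tasks transfers genetic material between them.
        tasks = [lambda x: sum(v * v for v in x),            # sphere
                 lambda x: sum((v - 1.0) ** 2 for v in x)]   # shifted sphere

        pop = [([random.uniform(-2, 2) for _ in range(5)], t % 2)
               for t in range(40)]

        for _ in range(2000):
            pa, pb = random.sample(pop, 2)
            genes = [(a + b) / 2 + random.gauss(0, 0.1)      # crossover + mut
                     for a, b in zip(pa[0], pb[0])]
            skill = random.choice([pa[1], pb[1]])            # inherit a task
            same = [i for i, (_, t) in enumerate(pop) if t == skill]
            worst = max(same, key=lambda i: tasks[skill](pop[i][0]))
            if tasks[skill](genes) < tasks[skill](pop[worst][0]):
                pop[worst] = (genes, skill)                  # elitist replace

        for t in (0, 1):
            best = min((g for g, s in pop if s == t), key=tasks[t])
            print(f"task {t}: best fitness {tasks[t](best):.4f}")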

  9. Streaming data analytics via message passing with application to graph algorithms

    DOE PAGES

    Plimpton, Steven J.; Shead, Tim

    2014-05-06

    The need to process streaming data, which arrives continuously at high-volume in real-time, arises in a variety of contexts including data produced by experiments, collections of environmental or network sensors, and running simulations. Streaming data can also be formulated as queries or transactions which operate on a large dynamic data store, e.g. a distributed database. We describe a lightweight, portable framework named PHISH which enables a set of independent processes to compute on a stream of data in a distributed-memory parallel manner. Datums are routed between processes in patterns defined by the application. PHISH can run on top of either message-passing via MPI or sockets via ZMQ. The former means streaming computations can be run on any parallel machine which supports MPI; the latter allows them to run on a heterogeneous, geographically dispersed network of machines. We illustrate how PHISH can support streaming MapReduce operations, and describe streaming versions of three algorithms for large, sparse graph analytics: triangle enumeration, subgraph isomorphism matching, and connected component finding. Lastly, we also provide benchmark timings for MPI versus socket performance of several kernel operations useful in streaming algorithms.
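
    The socket flavour of this pattern is easy to demo with pyzmq: one process pushes datums downstream, another pulls and computes as they arrive. This is a toy in the spirit of PHISH's triangle enumeration, not PHISH itself; the port and stream contents are arbitrary:

        # Start the worker first, then the source:
        #   python stream.py worker
        #   python stream.py source
        import sys, zmq

        ctx = zmq.Context()
        if sys.argv[1] == "source":
            out = ctx.socket(zmq.PUSH)
            out.bind("tcp://127.0.0.1:5557")
            for edge in [(1, 2), (2, 3), (1, 3)]:   # a stream of graph edges
                out.send_pyobj(edge)
            out.send_pyobj(None)                    # end-of-stream marker
        else:
            inp = ctx.socket(zmq.PULL)
            inp.connect("tcp://127.0.0.1:5557")
            edges = set()
            while (e := inp.recv_pyobj()) is not None:
                edges.add(e)                        # accumulate the stream
            tris = sum((a, b) in edges and (b, c) in edges and (a, c) in edges
                       for a in range(4) for b in range(a + 1, 4)
                       for c in range(b + 1, 4))
            print("triangles:", tris)               # -> triangles: 1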

  10. Benchmarking reference services: step by step.

    PubMed

    Buchanan, H S; Marshall, J G

    1996-01-01

    This article is a companion to an introductory article on benchmarking published in an earlier issue of Medical Reference Services Quarterly. Librarians interested in benchmarking often ask the following questions: How do I determine what to benchmark; how do I form a benchmarking team; how do I identify benchmarking partners; what's the best way to collect and analyze benchmarking information; and what will I do with the data? Careful planning is a critical success factor of any benchmarking project, and these questions must be answered before embarking on a benchmarking study. This article summarizes the steps necessary to conduct benchmarking research. Relevant examples of each benchmarking step are provided.

  11. Identification and analytical characterization of six synthetic cannabinoids NNL-3, 5F-NPB-22-7N, 5F-AKB-48-7N, 5F-EDMB-PINACA, EMB-FUBINACA, and EG-018.

    PubMed

    Liu, Cuimei; Jia, Wei; Hua, Zhendong; Qian, Zhenhua

    2017-08-01

    Clinical and forensic toxicology laboratories are continuously confronted by analytical challenges when dealing with the new psychoactive substances phenomenon. The number of synthetic cannabinoids, their chemical diversity, and the speed of emergence make this group of compounds particularly challenging in terms of detection, monitoring, and responding. Three indazole 7N positional isomer synthetic cannabinoids, two ethyl 2-amino-3-methylbutanoate-type synthetic cannabinoids, and one 9H-carbazole substituted synthetic cannabinoid were identified in seized materials. These six synthetic cannabinoid derivatives included: 1H-benzo[d][1,2,3]triazol-1-yl 1-(5-fluoropentyl)-1H-pyrrolo[2,3-b]pyridine-3-carboxylate (NNL-3, 1), quinolin-8-yl 1-(5-fluoropentyl)-1H-pyrrolo[2,3-b]pyridine-3-carboxylate (5F-NPB-22-7N, 2), N-((1s,3s)-adamantan-1-yl)-1-(5-fluoropentyl)-1H-pyrrolo[2,3-b]pyridine-3-carboxamide (5F-AKB-48-7N, 3), ethyl 2-(1-(5-fluoropentyl)-1H-indazole-3-carboxamido)-3,3-dimethylbutanoate (5F-EDMB-PINACA, 4), ethyl 2-(1-(4-fluorobenzyl)-1H-indazole-3-carboxamido)-3-methylbutanoate (EMB-FUBINACA, 5), and naphthalen-1-yl(9-pentyl-9H-carbazol-3-yl)methanone (EG-018, 6). The identification was based on ultra-high-performance liquid chromatography-quadrupole time-of-flight-mass spectrometry (UHPLC-QTOF-MS), gas chromatography-mass spectrometry (GC-MS), and nuclear magnetic resonance spectroscopy (NMR). The analytical characterization of these six synthetic cannabinoids is described, so as to assist forensic laboratories in identifying these compounds or other substances with similar structures in their casework. To our knowledge, no analytical data on compounds 1-5 have appeared until now, making this the first report on these compounds. The GC-MS data of 6 have been reported, but this study adds the LC-MS, NMR, and Fourier transform infrared (FTIR) data to render the analytical data collection more complete. Copyright © 2017 John Wiley & Sons, Ltd.

  12. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Plint, Trevor; Lessard, Benoît H.; Bender, Timothy P.

    In this study, we have assessed the potential application of group 13 and 14 metal and metalloid phthalocyanines ((X)n-MPcs) and their axially substituted derivatives as hole-transporting layers in organic light emitting diodes (OLEDs). OLEDs studied herein have the generic structure of glass/ITO/(N,N′-di(1-naphthyl)-N,N′-diphenyl-(1,1′-biphenyl)-4,4′-diamine (NPB) or (X)n-MPc)(50 nm)/Alq3 (60 nm)/LiF (1 nm)/Al (80 nm), where X is an axial substituent group. OLEDs using chloro aluminum phthalocyanine (Cl-AlPc) showed good peak luminance values of 2620 ± 113 cd/m2 at 11 V. To our knowledge, Cl-AlPc has not previously been shown to work as a hole transport material (HTL) in OLEDs. Conversely, the di-chlorides of silicon, germanium, and tin phthalocyanine (Cl2-SiPc, Cl2-GePc, and Cl2-SnPc, respectively) showed poor performance compared to Cl-AlPc, having peak luminances of only 38 ± 4 cd/m2 (12 V), 23 ± 1 cd/m2 (8.5 V), and 59 ± 5 cd/m2 (13.5 V), respectively. However, by performing a simple axial substitution of the chloride groups of Cl2-SiPc with pentafluorophenoxy groups, the resulting bis(pentafluorophenoxy) silicon phthalocyanine (F10-SiPc) containing OLED had a peak luminance of 5141 ± 941 cd/m2 (10 V), a two-order-of-magnitude increase over its chlorinated precursor. This material showed OLED characteristics approaching those of a baseline OLED based on the well-studied triarylamine NPB. Attempts to attach the pentafluorophenoxy axial group to both SnPc and GePc were hindered by synthetic difficulties and low thermal stability, respectively. In light of the performance improvements observed by simple axial substitution of SiPc in OLEDs, the use of axially substituted MPcs in organic electronic devices remains of continuing interest to us and potentially the field in general.

  13. Assessing the potential of group 13 and 14 metal/metalloid phthalocyanines as hole transport layers in organic light emitting diodes

    NASA Astrophysics Data System (ADS)

    Plint, Trevor; Lessard, Benoît H.; Bender, Timothy P.

    2016-04-01

    In this study, we have assessed the potential application of group 13 and 14 metal and metalloid phthalocyanines ((X)n-MPcs) and their axially substituted derivatives as hole-transporting layers in organic light emitting diodes (OLEDs). OLEDs studied herein have the generic structure of glass/ITO/(N,N'-di(1-naphthyl)-N,N'-diphenyl-(1,1'-biphenyl)-4,4'-diamine (NPB) or (X)n-MPc)(50 nm)/Alq3 (60 nm)/LiF (1 nm)/Al (80 nm), where X is an axial substituent group. OLEDs using chloro aluminum phthalocyanine (Cl-AlPc) showed good peak luminance values of 2620 ± 113 cd/m2 at 11 V. To our knowledge, Cl-AlPc has not previously been shown to work as a hole transport material (HTL) in OLEDs. Conversely, the di-chlorides of silicon, germanium, and tin phthalocyanine (Cl2-SiPc, Cl2-GePc, and Cl2-SnPc, respectively) showed poor performance compared to Cl-AlPc, having peak luminances of only 38 ± 4 cd/m2 (12 V), 23 ± 1 cd/m2 (8.5 V), and 59 ± 5 cd/m2 (13.5 V), respectively. However, by performing a simple axial substitution of the chloride groups of Cl2-SiPc with pentafluorophenoxy groups, the resulting bis(pentafluorophenoxy) silicon phthalocyanine (F10-SiPc) containing OLED had a peak luminance of 5141 ± 941 cd/m2 (10 V), a two-order-of-magnitude increase over its chlorinated precursor. This material showed OLED characteristics approaching those of a baseline OLED based on the well-studied triarylamine NPB. Attempts to attach the pentafluorophenoxy axial group to both SnPc and GePc were hindered by synthetic difficulties and low thermal stability, respectively. In light of the performance improvements observed by simple axial substitution of SiPc in OLEDs, the use of axially substituted MPcs in organic electronic devices remains of continuing interest to us and potentially the field in general.

  14. Differential Assimilation of Inorganic Carbon and Leucine by Prochlorococcus in the Oligotrophic North Pacific Subtropical Gyre

    PubMed Central

    Björkman, Karin M.; Church, Matthew J.; Doggett, Joseph K.; Karl, David M.

    2015-01-01

    The light effect on photoheterotrophic processes in Prochlorococcus, and on primary and bacterial productivity, in the oligotrophic North Pacific Subtropical Gyre was investigated using 14C-bicarbonate and 3H-leucine. Light and dark incubation experiments were conducted in situ throughout the euphotic zone (0–175 m) on nine expeditions to Station ALOHA over a 3-year period. Photosynthetrons were also used to elucidate rate responses in leucine and inorganic carbon assimilation as a function of light intensity. Taxonomic group and cell-specific rates were assessed using flow cytometric sorting. The light:dark assimilation rate ratios of leucine in the top 150 m were ∼7:1 for Prochlorococcus, whereas the light:dark ratios for the non-pigmented bacteria (NPB) were not significantly different from 1:1. Prochlorococcus assimilated leucine in the dark at per-cell rates similar to the NPB, with a contribution to the total community bacterial production, integrated over the euphotic zone, of approximately 20% in the dark and 60% in the light. Depth-resolved primary productivity and leucine incorporation showed that the ratio of Prochlorococcus leucine:primary production peaked at 100 m then declined steeply below the deep chlorophyll maximum (DCM). The photosynthetron experiments revealed that, for Prochlorococcus at the DCM, the saturating irradiance (Ek) for leucine incorporation was reached at approximately half the light intensity required for light saturation of 14C-bicarbonate assimilation. Additionally, high and low red fluorescing Prochlorococcus populations (HRF and LRF), co-occurring at the DCM, had similar Ek values for their respective substrates; however, maximum assimilation rates, for both leucine and inorganic carbon, were two times greater for HRF cells. Our results show that Prochlorococcus contributes significantly to bacterial production estimates using 3H-leucine, whether the incubations are conducted in the dark or in the light, and this should be considered when making assessments of bacterial production in marine environments where Prochlorococcus is present. Furthermore, Prochlorococcus primary productivity showed rate-to-light-flux patterns that were different from its light-enhanced leucine incorporation. This decoupling from autotrophic growth may indicate a separate light-stimulated mechanism for leucine acquisition. PMID:26733953

  15. Limitations of Community College Benchmarking and Benchmarks

    ERIC Educational Resources Information Center

    Bers, Trudy H.

    2006-01-01

    This chapter distinguishes between benchmarks and benchmarking, describes a number of data and cultural limitations to benchmarking projects, and suggests that external demands for accountability are the dominant reason for growing interest in benchmarking among community colleges.

  16. APRON: A Cellular Processor Array Simulation and Hardware Design Tool

    NASA Astrophysics Data System (ADS)

    Barr, David R. W.; Dudek, Piotr

    2009-12-01

    We present a software environment for the efficient simulation of cellular processor arrays (CPAs). This software (APRON) is used to explore algorithms that are designed for massively parallel fine-grained processor arrays, topographic multilayer neural networks, vision chips with SIMD processor arrays, and related architectures. The software uses a highly optimised core combined with a flexible compiler to provide the user with tools for the design of new processor array hardware architectures and the emulation of existing devices. We present performance benchmarks for the software processor array implemented on standard commodity microprocessors. APRON can be configured to use additional processing hardware if necessary and can be used as a complete graphical user interface and development environment for new or existing CPA systems, allowing more users to develop algorithms for CPA systems.

  17. Multi-partitioning for ADI-schemes on message passing architectures

    NASA Technical Reports Server (NTRS)

    Vanderwijngaart, Rob F.

    1994-01-01

    A kind of discrete-operator splitting called Alternating Direction Implicit (ADI) has been found to be useful in simulating fluid flow problems. In particular, it is being used to study the effects of hot exhaust jets from high performance aircraft on landing surfaces. Decomposition techniques that minimize load imbalance and message-passing frequency are described. Three strategies that are investigated for implementing the NAS Scalar Penta-diagonal Parallel Benchmark (SP) are transposition, pipelined Gaussian elimination, and multipartitioning. The multipartitioning strategy, which was used on Ethernet, was found to be the most efficient, although it was considered only a moderate success because of Ethernet's limited communication properties. The efficiency derived largely from the coarse granularity of the strategy, which reduced latencies and allowed overlap of communication and computation.
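
    Multi-partitioning's key property can be shown with just its block-ownership map: a diagonal assignment guarantees that every processor owns one block in every row and every column of tiles, so ADI sweeps in either direction keep all processors busy. A small sketch (the value of P is hypothetical):

        # Diagonal ownership map for multipartitioning: with P processors on
        # a P x P tile grid, processor p owns tile (i, j) iff (j-i) mod P == p.
        # Every row AND every column of tiles then touches all P processors,
        # so pipelined ADI sweeps stay load-balanced in both directions.
        P = 4
        owner = [[(j - i) % P for j in range(P)] for i in range(P)]
        for row in owner:
            print(row)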

  18. A modular case-mix classification system for medical rehabilitation illustrated.

    PubMed

    Stineman, M G; Granger, C V

    1997-01-01

    The authors present a modular set of patient classification systems designed for medical rehabilitation that predict resource use and outcomes for clinically similar groups of individuals. The systems, based on the Functional Independence Measure, are referred to as Function-Related Groups (FIM-FRGs). Using data from 23,637 lower extremity fracture patients from 458 inpatient medical rehabilitation facilities, 1995 benchmarks are provided and illustrated for length of stay, functional outcome, and discharge to home and skilled nursing facilities (SNFs). The FIM-FRG modules may be used in parallel to study interactions between resource use and quality and could ultimately yield an integrated strategy for payment and outcomes measurement. This could position the rehabilitation community to take a pioneering role in the application of outcomes-based clinical indicators.

  19. The CP-PACS project

    NASA Astrophysics Data System (ADS)

    Iwasaki, Y.; CP-PACS Collaboration

    1998-01-01

    The CP-PACS project is a five-year plan, formally started in April 1992 and completed in March 1997, to develop a massively parallel computer for carrying out research in computational physics with primary emphasis on lattice QCD. The initial version of the CP-PACS computer, with a theoretical peak speed of 307 GFLOPS on 1024 processors, was completed in March 1996. The final version, with a peak speed of 614 GFLOPS on 2048 processors, was completed in September 1996 and has been in full operation since October 1996. We describe the architecture, the final specification, the hardware implementation, and the software of the CP-PACS computer. The CP-PACS has been used for hadron spectroscopy production runs since July 1996. The performance for lattice QCD applications and the LINPACK benchmark is given.

  20. HTM Spatial Pooler With Memristor Crossbar Circuits for Sparse Biometric Recognition.

    PubMed

    James, Alex Pappachen; Fedorova, Irina; Ibrayev, Timur; Kudithipudi, Dhireesha

    2017-06-01

    Hierarchical Temporal Memory (HTM) is an online machine learning algorithm that emulates the neocortex. The development of a scalable on-chip HTM architecture is an open research area. The two core substructures of HTM are the spatial pooler and temporal memory. In this work, we propose a new spatial pooler circuit design with parallel memristive crossbar arrays for the 2D columns. The proposed design was validated on two different benchmarks, face recognition and speech recognition. The circuits are simulated and analyzed using a practical memristor device model and the 0.18 μm IBM CMOS technology model. The databases AR, YALE, ORL, and UFI are used to test the performance of the design in face recognition. The TIMIT dataset is used for speech recognition.
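
    A single spatial pooler step is essentially one binary matrix-vector product followed by k-winners-take-all and a Hebbian permanence update, which is what makes it a natural fit for a memristor crossbar. An algorithmic toy (sizes and constants hypothetical, no hardware model included):

        import numpy as np

        # One spatial pooler step: overlap = connected-synapse count per
        # column on the active input bits; k winners form the sparse code
        # and strengthen their synapses onto active bits.
        rng = np.random.default_rng(0)
        n_in, n_col, k = 64, 32, 4
        perm = rng.random((n_col, n_in))               # synapse permanences
        x = (rng.random(n_in) < 0.2).astype(float)     # sparse binary input

        connected = (perm > 0.5).astype(float)         # permanence threshold
        overlap = connected @ x                        # crossbar-style matvec
        winners = np.argsort(overlap)[-k:]             # k-winners-take-all

        perm[winners] += np.where(x > 0, 0.05, -0.05)  # Hebbian update
        perm = perm.clip(0.0, 1.0)
        print("active columns:", sorted(winners.tolist()))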

  1. mm_par2.0: An object-oriented molecular dynamics simulation program parallelized using a hierarchical scheme with MPI and OPENMP

    NASA Astrophysics Data System (ADS)

    Oh, Kwang Jin; Kang, Ji Hoon; Myung, Hun Joo

    2012-02-01

    We have revised a general purpose parallel molecular dynamics simulation program, mm_par, using object-oriented programming. We parallelized the revised version using a hierarchical scheme in order to utilize more processors for a given system size. Benchmark results are presented here. New version program summary Program title: mm_par2.0 Catalogue identifier: ADXP_v2_0 Program summary URL: http://cpc.cs.qub.ac.uk/summaries/ADXP_v2_0.html Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland Licensing provisions: Standard CPC license, http://cpc.cs.qub.ac.uk/licence/licence.html No. of lines in distributed program, including test data, etc.: 2 390 858 No. of bytes in distributed program, including test data, etc.: 25 068 310 Distribution format: tar.gz Programming language: C++ Computer: Any system operated by Linux or Unix Operating system: Linux Classification: 7.7 External routines: We provide wrappers for the FFTW [1] and Intel MKL [2] FFT routines; the Numerical Recipes [3] FFT, random number generator, and eigenvalue solver routines; the SPRNG [4] random number generator; the Mersenne Twister [5] random number generator; and a space filling curve routine. Catalogue identifier of previous version: ADXP_v1_0 Journal reference of previous version: Comput. Phys. Comm. 174 (2006) 560 Does the new version supersede the previous version?: Yes Nature of problem: Structural, thermodynamic, and dynamical properties of fluids and solids from microscopic scales to mesoscopic scales. Solution method: Molecular dynamics simulation in NVE, NVT, and NPT ensembles, Langevin dynamics simulation, dissipative particle dynamics simulation. Reasons for new version: First, object-oriented programming has been used, which is known to be open for extension and closed for modification, and to be better for maintenance. Second, version 1.0 was based on atom decomposition and domain decomposition schemes [6] for parallelization. However, atom decomposition is not popular due to its poor scalability. Domain decomposition, on the other hand, scales better, but it still has a limitation in utilizing a large number of cores on recent petascale computers, due to the requirement that the domain size be larger than the potential cutoff distance. To go beyond this limitation, a hierarchical parallelization scheme has been adopted in this new version and implemented using MPI [7] and OPENMP [8]. Summary of revisions: (1) Object-oriented programming has been used. (2) A hierarchical parallelization scheme has been adopted. (3) The SPME routine has been fully parallelized with a parallel 3D FFT using a volumetric decomposition scheme [9]. K.J.O. thanks Mr. Seung Min Lee for useful discussion on programming and debugging. Running time: Running time depends on system size and methods used. For a test system containing a protein (PDB id: 5DHFR) with the CHARMM22 force field [10] and 7023 TIP3P [11] waters in a simulation box of dimensions 62.23 Å × 62.23 Å × 62.23 Å, the benchmark results are given in Fig. 1. Here the potential cutoff distance was set to 12 Å and the switching function was applied from 10 Å for the force calculation in real space. For the SPME [12] calculation, K1, K2, and K3 were set to 64 and the interpolation order was set to 4. The fast Fourier transform used the Intel MKL library. All bonds including hydrogen atoms were constrained using SHAKE/RATTLE algorithms [13,14]. The code was compiled using Intel compiler version 11.1 and mvapich2 version 1.5. Fig. 2 shows performance gains from using the CUDA-enabled version [15] of mm_par for the 5DHFR simulation in water on an Intel Core2Quad 2.83 GHz and a GeForce GTX 580. Even though mm_par2.0 has not yet been ported to GPU, these data indicate the performance mm_par2.0 could achieve on GPU. Fig. 1 caption: Timing results for 1000 MD steps; 1, 2, 4, and 8 denote the number of OPENMP threads. Fig. 2 caption: Timing results for 1000 MD steps from double precision simulation on CPU, single precision simulation on GPU, and double precision simulation on GPU.

  2. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Gibson, Garth

    Petascale computing infrastructures for scientific discovery make petascale demands on information storage capacity, performance, concurrency, reliability, availability, and manageability. The Petascale Data Storage Institute focuses on the data storage problems found in petascale scientific computing environments, with special attention to community issues such as interoperability, community buy-in, and shared tools. The Petascale Data Storage Institute is a collaboration between researchers at Carnegie Mellon University, National Energy Research Scientific Computing Center, Pacific Northwest National Laboratory, Oak Ridge National Laboratory, Sandia National Laboratory, Los Alamos National Laboratory, University of Michigan, and the University of California at Santa Cruz. Because the Institute focusesmore » on low level files systems and storage systems, its role in improving SciDAC systems was one of supporting application middleware such as data management and system-level performance tuning. In retrospect, the Petascale Data Storage Institute’s most innovative and impactful contribution is the Parallel Log-structured File System (PLFS). Published in SC09, PLFS is middleware that operates in MPI-IO or embedded in FUSE for non-MPI applications. Its function is to decouple concurrently written files into a per-process log file, whose impact (the contents of the single file that the parallel application was concurrently writing) is determined on later reading, rather than during its writing. PLFS is transparent to the parallel application, offering a POSIX or MPI-IO interface, and it shows an order of magnitude speedup to the Chombo benchmark and two orders of magnitude to the FLASH benchmark. Moreover, LANL production applications see speedups of 5X to 28X, so PLFS has been put into production at LANL. Originally conceived and prototyped in a PDSI collaboration between LANL and CMU, it has grown to engage many other PDSI institutes, international partners like AWE, and has a large team at EMC supporting and enhancing it. PLFS is open sourced with a BSD license on sourceforge. Post PDSI funding comes from NNSA and industry sources. Moreover, PLFS has spin out half a dozen or more papers, partnered on research with multiple schools and vendors, and has projects to transparently 1) dis- tribute metadata over independent metadata servers, 2) exploit drastically non-POSIX Hadoop storage for HPC POSIX applications, 3) compress checkpoints on the fly, 4) batch delayed writes for write speed, 5) compress read-back indexes and parallelize their redistribution, 6) double-buffer writes in NAND Flash storage to decouple host blocking during checkpoint from disk write time in the storage system, 7) pack small files into a smaller number of bigger containers. There are two large scale open source Linux software projects that PDSI significantly incubated, though neither were initated in PDSI. These are 1) Ceph, a UCSC parallel object storage research project that has continued to be a vehicle for research, and has become a released part of Linux, and 2) Parallel NFS (pNFS) a portion of the IETF’s NFSv4.1 that brings the core data parallelism found in Lustre, PanFS, PVFS, and Ceph to the industry standard NFS, with released code in Linux 3.0, and its vendor offerings, with products from NetApp, EMC, BlueArc and RedHat. Both are fundamentally supported and advanced by vendor companies now, but were critcally transferred from research demonstration to viable product with funding from PDSI, in part. 
    At this point Lustre remains the primary path to scalable IO in exascale systems, but both Ceph and pNFS are viable alternatives with different fundamental advantages. Finally, research community building was a big success for PDSI. Through the HECFSIO workshops and the NSF HECURA project, PDSI stimulated and helped to steer leveraged funding of over $25M. Through the Petascale (now Parallel) Data Storage Workshop series, www.pdsw.org, co-located with SCxy each year, PDSI created and incubated five offerings of this high-attendance workshop. The workshop has gone on without PDSI support with two more highly successful workshops, rewriting its organizational structure to be community managed. More than 70 peer-reviewed papers have been presented at PDSW workshops.
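
    The decoupling mechanism described above can be made concrete with a small sketch. The following is a minimal, hypothetical Python model of the idea, not the actual PLFS implementation: each writing process appends to a private log and records an index entry (logical offset, length, log location), and the logical file is materialized only when read, by replaying the index.

        # Minimal sketch of PLFS-style log-structured write decoupling.
        # Hypothetical, simplified model; not the actual PLFS code.
        import io

        class LogStructuredFile:
            def __init__(self, nprocs):
                # One append-only log per writing process, plus a shared index.
                self.logs = [io.BytesIO() for _ in range(nprocs)]
                self.index = []  # (logical_offset, length, rank, log_offset)

            def write(self, rank, logical_offset, data):
                # Writers never contend: each rank appends to its own log.
                log = self.logs[rank]
                log_offset = log.tell()
                log.write(data)
                self.index.append((logical_offset, len(data), rank, log_offset))

            def read_all(self, total_size):
                # The logical file is resolved only at read time, by replaying
                # index entries in order (later writes win).
                buf = bytearray(total_size)
                for logical_offset, length, rank, log_offset in self.index:
                    log = self.logs[rank]
                    log.seek(log_offset)
                    buf[logical_offset:logical_offset + length] = log.read(length)
                return bytes(buf)

        # Two "ranks" interleave writes to one logical file without locking.
        f = LogStructuredFile(nprocs=2)
        f.write(0, 0, b"AAAA")
        f.write(1, 4, b"BBBB")
        f.write(0, 8, b"CCCC")
        assert f.read_all(12) == b"AAAABBBBCCCC"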

  3. Employing exciton transfer molecules to increase the lifetime of phosphorescent red organic light emitting diodes

    NASA Astrophysics Data System (ADS)

    Lindla, Florian; Boesing, Manuel; van Gemmern, Philipp; Bertram, Dietrich; Keiper, Dietmar; Heuken, Michael; Kalisch, Holger; Jansen, Rolf H.

    2011-04-01

    The lifetime of phosphorescent red organic light emitting diodes (OLEDs) is investigated employing either N,N'-diphenyl-N,N'-bis(1-naphthylphenyl)-1,1'-biphenyl-4,4'-diamine (NPB), TMM117, or 4,4',4″-tris(N-carbazolyl)-triphenylamine (TCTA) as the hole-conducting host material (mixed with an electron conductor). All OLEDs (processed by organic vapor phase deposition) show similar efficiencies of around 30 lm/W but strongly different lifetimes. Quickly degrading OLEDs based on TCTA can be stabilized by doping exciton transfer molecules [tris-(phenyl-pyridyl)-Ir (Ir(ppy)3)] into the emission layer. At a current density of 50 mA/cm2 (12 800 cd/m2), a lifetime of 387 h can be achieved. Employing exciton transfer molecules is suggested to prevent the degradation of the red emission layer in phosphorescent white OLEDs.

  4. Narrowband ultraviolet photodetector based on MgZnO and NPB heterojunction.

    PubMed

    Hu, Zuofu; Li, Zhenjun; Zhu, Lu; Liu, Fengjuan; Lv, Yanwu; Zhang, Xiqing; Wang, Yongsheng

    2012-08-01

    An ultraviolet photodetector was fabricated based on a Mg0.07Zn0.93O/NPB heterojunction. N,N'-bis(naphthalen-1-yl)-N,N'-bis(phenyl)benzidine (NPB) was selected as the hole transporting layer. I-V characteristic curves of the device were measured in the dark and under illumination by 340 nm UV light with a power density of 1.33 mW/cm2. The device showed a low dark current of about 3×10^-10 A and a high photo-to-dark current ratio of 1×10^5 at -2 V bias. A narrowband photoresponse was observed from 300 to 400 nm, centered at 340 nm with a full width at half-maximum of only 30 nm. The peak response, 0.192 A/W at a bias of -1 V, occurs at 340 nm.

  5. Oxygen sensing with an absolute optical sensor based on biluminescence (Conference Presentation)

    NASA Astrophysics Data System (ADS)

    Salas Redondo, Caterin; Reineke, Sebastian

    2017-06-01

    Organic semiconductors are materials having the benefits of semiconductors together with those of organic molecules. That means, on one hand, these are compounds able to absorb and emit light, as well as to conduct electricity to an extent sufficient for the functionality of solid-state devices. On the other hand, a remarkable characteristic is that the excitations are typically localized on individual molecules, such that the exchange interactions lead to energetically distinct singlet and triplet states. According to the spectroscopic selection rules of quantum mechanics, only transitions from the singlet excited state are allowed; these deactivate radiatively, generating fluorescence emission in the process. Transitions from the triplet excited state are not allowed, because their decay involves a spin flip and is therefore forbidden for electric dipole transitions. Nevertheless, there is a small probability for these forbidden transitions to occur at a low rate, resulting in a slow radiative deactivation known as phosphorescence emission. In this context, the property of an organic molecule able to emit light from both its singlet and triplet excited states is called biluminescence. Although this dual-state emission, particularly at room temperature, is difficult to achieve with purely organic molecules, it becomes possible if competitive thermal decay is suppressed effectively, allowing emission from the triplet states (i.e. phosphorescence) in addition to the conventional fluorescence. Here, we have identified biluminescence in simple host:guest systems in which a biluminophore (i.e. an organic molecule with the biluminescence property) is embedded in an optimally rigid matrix, for example a combination of PMMA [poly(methyl methacrylate)] as host and NPB [N,N'-di(naphtha-1-yl)-N,N'-diphenyl-benzidine] as biluminophore [Reineke and Baldo, Sci. Rep.]. Such a system is unique not only because of the dual-state emission, but also because of the large exciton dynamic range, extending up to nine orders of magnitude between nanosecond-lifetime fluorescence and millisecond-lifetime phosphorescence. In this presentation, we will report on the oxygen sensing characteristics of this luminescent system compared to a benchmarked single-state optical sensor. These properties can be evaluated because of the sensitivity of the triplet state to oxygen, and we therefore investigate the dependence of the persistent phosphorescence on the oxygen content. Furthermore, we will address our efforts towards the potential integration of novel optical biluminescent sensing into organic electronics.
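
    The dependence of phosphorescence on oxygen content mentioned above is conventionally quantified by the Stern-Volmer relation for collisional quenching; this standard form is added here for context and is not taken from the presentation:

        \frac{I_0}{I} = \frac{\tau_0}{\tau} = 1 + K_{\mathrm{SV}}\,[\mathrm{O}_2]

    where I_0 and \tau_0 are the phosphorescence intensity and lifetime in the absence of oxygen, I and \tau their values at oxygen concentration [O_2], and K_{\mathrm{SV}} is the Stern-Volmer quenching constant extracted by the sensor calibration.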

  6. Benchmarking specialty hospitals, a scoping review on theory and practice.

    PubMed

    Wind, A; van Harten, W H

    2017-04-04

    Although benchmarking may improve hospital processes, research on this subject is limited. The aim of this study was to provide an overview of publications on benchmarking in specialty hospitals and a description of study characteristics. We searched PubMed and EMBASE for articles published in English in the last 10 years. Eligible articles described a project stating benchmarking as its objective and involving a specialty hospital or specific patient category, or dealt with the methodology or evaluation of benchmarking. Of 1,817 articles identified in total, 24 were included in the study. Articles were categorized into: pathway benchmarking, institutional benchmarking, articles on benchmarking methodology or evaluation, and benchmarking using a patient registry. There was a large degree of variability: (1) study designs were mostly descriptive and retrospective; (2) not all studies generated and showed data in sufficient detail; and (3) studies varied in whether a benchmarking model was merely described or quality improvement as a consequence of the benchmark was reported upon. Most of the studies that described a benchmark model used benchmarking partners from the same industry category, sometimes from all over the world. Benchmarking seems to be more developed in eye hospitals, emergency departments, and oncology specialty hospitals. Some studies showed promising improvement effects. However, the majority of the articles lacked a structured design and did not report on benchmark outcomes. In order to evaluate the effectiveness of benchmarking to improve quality in specialty hospitals, robust and structured designs are needed, including a follow-up to check whether the benchmark study has led to improvements.

  7. Incremental Parallelization of Non-Data-Parallel Programs Using the Charon Message-Passing Library

    NASA Technical Reports Server (NTRS)

    VanderWijngaart, Rob F.

    2000-01-01

    Message passing is among the most popular techniques for parallelizing scientific programs on distributed-memory architectures. The reasons for its success are wide availability (MPI), efficiency, and the full tuning control provided to the programmer. A major drawback, however, is that incremental parallelization, as offered by compiler directives, is not generally possible, because all data structures have to be changed throughout the program simultaneously. Charon remedies this situation through mappings between distributed and non-distributed data. It allows breaking up the parallelization into small steps, guaranteeing correctness at every stage. Several tools are available to help convert legacy codes into high-performance message-passing programs. They usually target data-parallel applications, whose loops carrying most of the work can be distributed among all processors without much dependency analysis. Others do a full dependency analysis and then convert the code virtually automatically. Still more toolkits are available that aid construction of message-passing programs from scratch. None, however, allows piecemeal translation of codes with complex data dependencies (i.e. non-data-parallel programs) into message-passing codes. The Charon library (available in both C and Fortran) provides incremental parallelization capabilities by linking legacy code arrays with distributed arrays. During the conversion process, non-distributed and distributed arrays exist side by side, and simple mapping functions allow the programmer to switch between the two at any location in the program. Charon also provides wrapper functions that leave the structure of the legacy code intact, but that allow execution on truly distributed data. Finally, the library provides a rich set of communication functions that support virtually all patterns of remote data demands in realistic structured-grid scientific programs, including transposition, nearest-neighbor communication, pipelining, gather/scatter, and redistribution. At the end of the conversion process most intermediate Charon function calls will have been removed, the non-distributed arrays will have been deleted, and virtually the only remaining Charon function calls are the high-level, highly optimized communications. Distribution of the data is under the complete control of the programmer, although a wide range of useful distributions is easily available through predefined functions. A crucial aspect of the library is that it does not allocate space for distributed arrays, but accepts programmer-specified memory. This has two major consequences. First, codes parallelized using Charon do not suffer from encapsulation; user data is always directly accessible. This provides high efficiency, and also retains the possibility of using message passing directly for highly irregular communications. Second, non-distributed arrays can be interpreted as (trivial) distributions in the Charon sense, which allows them to be mapped to truly distributed arrays, and vice versa. This is the mechanism that enables incremental parallelization. In this paper we provide a brief introduction to the library and then focus on the actual steps in the parallelization process, using some representative examples from, among others, the NAS Parallel Benchmarks. We show how a complicated two-dimensional pipeline, the prototypical non-data-parallel algorithm, can be constructed with ease.
To demonstrate the flexibility of the library, we give examples of the stepwise, efficient parallel implementation of nonlocal boundary conditions common in aircraft simulations, as well as the construction of the sequence of grids required for multigrid.
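
    The core mechanism, mapping between a legacy non-distributed array and a distributed array so the two can coexist during conversion, can be sketched in a few lines. The sketch below is illustrative only: it uses Python with mpi4py and hypothetical function names (distribute, collect), not the actual C/Fortran Charon API, and it assumes the array length divides evenly among ranks.

        # Illustrative sketch of incremental parallelization via mappings
        # between a non-distributed (legacy) array and a distributed array,
        # in the spirit of Charon. Names are hypothetical, not Charon's API.
        # Run with: mpirun -n 4 python sketch.py
        import numpy as np
        from mpi4py import MPI

        comm = MPI.COMM_WORLD
        rank, size = comm.Get_rank(), comm.Get_size()
        N = 16  # assumed divisible by the number of ranks

        def distribute(global_array):
            """Map the legacy non-distributed array to per-rank blocks."""
            local = np.empty(N // size, dtype=np.float64)
            comm.Scatter(global_array, local, root=0)
            return local

        def collect(local_array):
            """Map a distributed array back to the legacy global array."""
            global_array = np.empty(N, dtype=np.float64) if rank == 0 else None
            comm.Gather(local_array, global_array, root=0)
            return global_array

        # Legacy serial data lives on rank 0 only.
        x = np.arange(N, dtype=np.float64) if rank == 0 else None

        # Step 1: switch to the distributed view for one parallelized loop...
        xl = distribute(x)
        xl = 2.0 * xl  # the piece of the legacy loop converted so far

        # Step 2: ...then map back, so the untouched legacy code still works.
        x = collect(xl)
        if rank == 0:
            assert np.allclose(x, 2.0 * np.arange(N))

    The point of the two mapping functions is that conversion can stop between steps 1 and 2 at any time during development, with the program still producing correct serial results.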

  8. All inclusive benchmarking.

    PubMed

    Ellis, Judith

    2006-07-01

    The aim of this article is to review published descriptions of benchmarking activity and synthesize benchmarking principles, in order to encourage the acceptance and use of Essence of Care as a new benchmarking approach to continuous quality improvement, and to promote its acceptance as an integral and effective part of benchmarking activity in health services. Essence of Care was launched by the Department of Health in England in 2001 to provide a benchmarking tool kit to support continuous improvement in the quality of fundamental aspects of health care, for example privacy and dignity, nutrition, and hygiene. The tool kit is now being used effectively by some frontline staff. However, use is inconsistent: the value of the tool kit, and the support that clinical practice benchmarking requires to be effective, are not always recognized or provided by National Health Service managers, who are absorbed with quantitative benchmarking approaches and the measurability of comparative performance data. This review of published benchmarking literature was obtained through an ever-narrowing search strategy, commencing from benchmarking within the quality improvement literature, moving to benchmarking activity in health services, and including not only published examples of benchmarking approaches and models but also consideration of web-based benchmarking data. This supported identification of how benchmarking approaches have developed and been used while remaining true to the basic benchmarking principles of continuous improvement through comparison and sharing (Camp 1989). Descriptions of models and exemplars of quantitative, and specifically performance, benchmarking activity in industry abound (Camp 1998), with far fewer examples of more qualitative and process benchmarking approaches in use in the public services and then applied to the health service (Bullivant 1998). The literature is also, in the main, descriptive in its support of the effectiveness of benchmarking activity; although this does not seem to have restricted the popularity of quantitative activity, reticence about the value of the more qualitative approaches, for example Essence of Care, needs to be overcome in order to improve the quality of patient care and experiences. The perceived immeasurability and subjectivity of Essence of Care and clinical practice benchmarks mean that these benchmarking approaches are not always accepted or supported by health service organizations as valid benchmarking activity. In conclusion, Essence of Care benchmarking is a sophisticated clinical practice benchmarking approach which needs to be accepted as an integral part of health service benchmarking activity to support improvement in the quality of patient care and experiences.

  9. Full dimensional (15-dimensional) quantum-dynamical simulation of the protonated water-dimer III: Mixed Jacobi-valence parametrization and benchmark results for the zero point energy, vibrationally excited states, and infrared spectrum.

    PubMed

    Vendrell, Oriol; Brill, Michael; Gatti, Fabien; Lauvergnat, David; Meyer, Hans-Dieter

    2009-06-21

    Quantum dynamical calculations are reported for the zero point energy, several low-lying vibrational states, and the infrared spectrum of the H5O2+ cation. The calculations are performed by the multiconfiguration time-dependent Hartree (MCTDH) method. A new vector parametrization based on a mixed Jacobi-valence description of the system is presented. With this parametrization the potential energy surface coupling is reduced with respect to a full Jacobi description, providing a better convergence of the n-mode representation of the potential. However, new coupling terms appear in the kinetic energy operator. These terms are derived and discussed. A mode-combination scheme based on six combined coordinates is used, and the representation of the 15-dimensional potential in terms of a six-combined-mode cluster expansion including up to some 7-dimensional grids is discussed. A statistical analysis of the accuracy of the n-mode representation of the potential at all orders is performed. Benchmark, fully converged results are reported for the zero point energy, which lie within the statistical uncertainty of the reference diffusion Monte Carlo result for this system. Some low-lying vibrationally excited eigenstates are computed by block improved relaxation, illustrating the applicability of the approach to large systems. Benchmark calculations of the linear infrared spectrum are provided, and convergence with increasing size of the time-dependent basis and as a function of the order of the n-mode representation is studied. The calculations presented here make use of recent developments in the parallel version of the MCTDH code, which are briefly discussed. We also show that the infrared spectrum can be computed, to a very good approximation, within D2d symmetry, instead of the G16 symmetry used before, in which the complete rotation of one water molecule with respect to the other is allowed, thus simplifying the dynamical problem.
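
    For context, the n-mode representation referred to above expands the potential in clusters of combined modes and truncates the expansion at a fixed order; schematically, in its standard form (our notation, not the paper's):

        V(Q_1,\ldots,Q_m) \approx V_0 + \sum_{i} V_i^{(1)}(Q_i)
            + \sum_{i<j} V_{ij}^{(2)}(Q_i, Q_j) + \cdots
            + \sum_{i_1<\cdots<i_n} V_{i_1\cdots i_n}^{(n)}(Q_{i_1},\ldots,Q_{i_n})

    where each Q_i is a combined coordinate (six of them here, covering the 15 physical dimensions) and the cluster expansion is truncated at order n; the statistical analysis in the paper assesses the accuracy of this truncation at every order.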

  10. Visualization assisted by parallel processing

    NASA Astrophysics Data System (ADS)

    Lange, B.; Rey, H.; Vasques, X.; Puech, W.; Rodriguez, N.

    2011-01-01

    This paper discusses the experimental results of our visualization model for data extracted from sensors. The objective of this paper is to find a computationally efficient method to produce a real-time rendering visualization for a large amount of data. We develop a visualization method to monitor the temperature variance of a data center. Sensors are placed on three layers and do not cover the whole room. We use the particle paradigm to interpolate sensor data. Particles model the "space" of the room. In this work we partition the particle set using two mathematical methods, Delaunay triangulation and Voronoi cells, the two algorithms presented by Avis and Bhattacharya. Particles provide information on the room temperature at different coordinates over time. To locate particles and update their data we define a computational cost function. To solve this function in an efficient way, we use a client-server paradigm: the server computes the data and the client displays it on different kinds of hardware. This paper is organized as follows. The first part presents related algorithms used to visualize large flows of data. The second part presents the different platforms and methods used, which were evaluated in order to determine the best solution for the proposed task. The benchmark uses the computational cost of our algorithm, which is based on locating particles relative to sensors and on updating particle values. The benchmark was run on a personal computer using CPU, multi-core, GPU, and hybrid GPU/CPU programming. GPU programming is growing in this research field; it allows real-time rendering instead of precomputed rendering. To improve our results, we also ran our algorithm on a High Performance Computing (HPC) cluster; this benchmark was used to improve the multi-core method. HPC is commonly used in data visualization (astronomy, physics, etc.) to improve rendering and achieve real-time performance.
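
    The interpolation step described above can be sketched with standard library calls. The following Python sketch is illustrative only, not the paper's code: it uses scipy's LinearNDInterpolator, which builds a Delaunay triangulation internally, to interpolate scattered sensor readings onto particle positions, with a nearest-sensor fallback for particles outside the sensors' convex hull (since the sensors do not cover the whole room).

        # Sketch: interpolate scattered sensor readings onto particle
        # positions via Delaunay-based linear interpolation (scipy builds
        # the triangulation internally). Illustrative; not the paper's code.
        import numpy as np
        from scipy.interpolate import LinearNDInterpolator, NearestNDInterpolator

        rng = np.random.default_rng(0)
        sensors = rng.uniform(0.0, 10.0, size=(50, 3))   # sensor x, y, z
        temps = 20.0 + 0.5 * sensors[:, 2] + rng.normal(0, 0.1, 50)

        particles = rng.uniform(0.0, 10.0, size=(10000, 3))  # room "space"

        interp = LinearNDInterpolator(sensors, temps)
        particle_temps = interp(particles)

        # Points outside the convex hull of the sensors come back as NaN;
        # fall back to nearest-sensor values there.
        outside = np.isnan(particle_temps)
        fallback = NearestNDInterpolator(sensors, temps)
        particle_temps[outside] = fallback(particles[outside])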

  11. Follow-Up After Cardiac Surgery Should be Extended to at Least 120 Days When Benchmarking Cardiac Surgery Centers.

    PubMed

    Hansen, Laura S; Sloth, Erik; Hjortdal, Vibeke E; Jakobsen, Carl-Johan

    2015-08-01

    Short-term (30-day) mortality is frequently used as an outcome measure after cardiac surgery, although it has been proposed that the follow-up period should be extended to 120 days to allow for more accurate benchmarking. The authors aimed to evaluate whether mortality rates 120 days after surgery were comparable to general mortality, and to compare causes of death between the cohort and the general population. A multicenter descriptive cohort study using prospectively entered registry data. University hospital. The cohort was obtained from the Western Denmark Heart Registry and matched to the Danish National Hospital Register as well as the Danish Register of Causes of Death. A weighted, age-matched general population consisting of all Danish patients who died within the study period was identified through the central authority on Danish statistics. A total of 11,988 patients (>15 years) who underwent cardiac surgery at Aarhus, Aalborg, and Odense University Hospitals from April 1, 2006 to December 31, 2012 were included. Coronary artery bypass grafting, valve surgery, and combinations. Mortality after cardiac surgery matches mortality in the general population after 140 days. Mortality curves run almost parallel from this point onwards, regardless of European System for Cardiac Operative Risk Evaluation (EuroSCORE) and intervention. The causes of death in the cohort differed statistically significantly from the background population (p<0.0001; one-sample t-test) throughout the first postoperative year. The leading cause of death in the cohort was cardiac (38%), 53% of which was categorized as heart failure. A total of 54% of these patients were assessed preoperatively as having normal or mildly impaired heart function (EuroSCORE). This study supports an extended follow-up period after cardiac surgery when benchmarking cardiac surgery centers. Regardless of preoperative heart function, heart failure was the consistent leading cause of death. Copyright © 2015 Elsevier Inc. All rights reserved.

  12. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Biyikli, Emre; To, Albert C., E-mail: albertto@pitt.edu

    Atomistic/continuum coupling methods combine accurate atomistic methods and efficient continuum methods to simulate the behavior of highly ordered crystalline systems. Coupled methods utilize the advantages of both approaches to simulate systems at a lower computational cost, while retaining the accuracy associated with atomistic methods. Many concurrent atomistic/continuum coupling methods have been proposed in the past; however, their true computational efficiency has not been demonstrated. The present work presents an efficient implementation of a concurrent coupling method called the Multiresolution Molecular Mechanics (MMM) for serial, parallel, and adaptive analysis. First, we present the features of the software implemented along with the associated technologies. The scalability of the software implementation is demonstrated, and the competing effects of multiscale modeling and parallelization are discussed. Then, the algorithms contributing to the efficiency of the software are presented. These include algorithms for eliminating latent ghost atoms from calculations and measurement-based dynamic balancing of parallel workload. The efficiency improvements made by these algorithms are demonstrated by benchmark tests. The efficiency of the software is found to be on par with LAMMPS, a state-of-the-art Molecular Dynamics (MD) simulation code, when performing full atomistic simulations. Speed-up of the MMM method is shown to be directly proportional to the reduction of the number of the atoms visited in force computation. Finally, an adaptive MMM analysis on a nanoindentation problem, containing over a million atoms, is performed, yielding an improvement of 6.3-8.5 times in efficiency, over the full atomistic MD method. For the first time, the efficiency of a concurrent atomistic/continuum coupling method is comprehensively investigated and demonstrated.

  13. Multiresolution molecular mechanics: Implementation and efficiency

    NASA Astrophysics Data System (ADS)

    Biyikli, Emre; To, Albert C.

    2017-01-01

    Atomistic/continuum coupling methods combine accurate atomistic methods and efficient continuum methods to simulate the behavior of highly ordered crystalline systems. Coupled methods utilize the advantages of both approaches to simulate systems at a lower computational cost, while retaining the accuracy associated with atomistic methods. Many concurrent atomistic/continuum coupling methods have been proposed in the past; however, their true computational efficiency has not been demonstrated. The present work presents an efficient implementation of a concurrent coupling method called the Multiresolution Molecular Mechanics (MMM) for serial, parallel, and adaptive analysis. First, we present the features of the software implemented along with the associated technologies. The scalability of the software implementation is demonstrated, and the competing effects of multiscale modeling and parallelization are discussed. Then, the algorithms contributing to the efficiency of the software are presented. These include algorithms for eliminating latent ghost atoms from calculations and measurement-based dynamic balancing of parallel workload. The efficiency improvements made by these algorithms are demonstrated by benchmark tests. The efficiency of the software is found to be on par with LAMMPS, a state-of-the-art Molecular Dynamics (MD) simulation code, when performing full atomistic simulations. Speed-up of the MMM method is shown to be directly proportional to the reduction of the number of the atoms visited in force computation. Finally, an adaptive MMM analysis on a nanoindentation problem, containing over a million atoms, is performed, yielding an improvement of 6.3-8.5 times in efficiency, over the full atomistic MD method. For the first time, the efficiency of a concurrent atomistic/continuum coupling method is comprehensively investigated and demonstrated.
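
    The measurement-based dynamic balancing idea mentioned above can be illustrated in a few lines. The following Python sketch is a hypothetical simplification, not the MMM implementation: after each step, the atoms are re-partitioned in proportion to each worker's measured throughput.

        # Sketch of measurement-based dynamic load balancing (illustrative,
        # not the MMM code): re-partition atoms in proportion to each
        # worker's measured throughput (atoms per second) in the last step.
        def rebalance(counts, times, total):
            # throughput of each worker in the last step
            rates = [c / t for c, t in zip(counts, times)]
            total_rate = sum(rates)
            # assign work proportional to measured speed
            new_counts = [max(1, round(total * r / total_rate)) for r in rates]
            # fix rounding drift so counts still sum to the total workload
            new_counts[0] += total - sum(new_counts)
            return new_counts

        # Worker 1 was twice as slow per atom last step, so it gets fewer atoms.
        counts = [500, 500]   # atoms per worker in the last step
        times = [1.0, 2.0]    # measured seconds per worker
        print(rebalance(counts, times, total=1000))  # -> [667, 333]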

  14. Photonic reservoir computing: a new approach to optical information processing

    NASA Astrophysics Data System (ADS)

    Vandoorne, Kristof; Fiers, Martin; Verstraeten, David; Schrauwen, Benjamin; Dambre, Joni; Bienstman, Peter

    2010-06-01

    Despite ever increasing computational power, recognition and classification problems remain challenging to solve. Recently, advances have been made by the introduction of the new concept of reservoir computing. This is a methodology coming from the field of machine learning and neural networks that has been successfully used in several pattern classification problems, like speech and image recognition. Thus far, most implementations have been in software, limiting their speed and power efficiency. Photonics could be an excellent platform for a hardware implementation of this concept because of its inherent parallelism and unique nonlinear behaviour. Moreover, a photonic implementation offers the promise of massively parallel information processing with low power and high speed. We propose using a network of coupled Semiconductor Optical Amplifiers (SOAs) and show in simulation that it could be used as a reservoir by comparing it to conventional software implementations on a benchmark speech recognition task. Despite the differences from classical reservoir models, the performance of our photonic reservoir is comparable to that of conventional implementations, and sometimes slightly better. As our implementation uses coherent light for information processing, we find that phase tuning is crucial to obtaining high performance. In parallel, we investigate the use of a network of photonic crystal cavities, using coupled mode theory (CMT) to describe these resonators. A new framework is designed to model networks of resonators and SOAs. The same network topologies are used, but feedback is added to control the internal dynamics of the system. By adjusting the readout weights of the network in a controlled manner, we can generate arbitrary periodic patterns.
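
    Reservoir computing, as used above, trains only a linear readout on top of a fixed, random dynamical system. The following is a minimal software echo-state sketch in Python following the standard recipe, not the photonic SOA network from the paper:

        # Minimal echo-state reservoir sketch (software analogue; the
        # paper's reservoir is a network of SOAs). Only the readout trains.
        import numpy as np

        rng = np.random.default_rng(1)
        n_in, n_res = 1, 200

        W_in = rng.uniform(-0.5, 0.5, (n_res, n_in))
        W = rng.normal(0.0, 1.0, (n_res, n_res))
        W *= 0.9 / np.abs(np.linalg.eigvals(W)).max()  # spectral radius < 1

        def run_reservoir(u):
            """Collect reservoir states driven by input sequence u (T x n_in)."""
            x = np.zeros(n_res)
            states = []
            for u_t in u:
                x = np.tanh(W_in @ u_t + W @ x)  # fixed, untrained dynamics
                states.append(x.copy())
            return np.array(states)

        # Toy task: predict a sine wave one step ahead.
        t = np.linspace(0, 40 * np.pi, 4000)
        u = np.sin(t)[:, None]
        X = run_reservoir(u[:-1])
        y = u[1:, 0]

        # Train only the linear readout, with ridge regression.
        ridge = 1e-6
        W_out = np.linalg.solve(X.T @ X + ridge * np.eye(n_res), X.T @ y)
        print("train MSE:", np.mean((X @ W_out - y) ** 2))

    The design choice the abstract highlights carries over directly: because the internal weights W stay fixed, the hardware substrate (here a random matrix, in the paper a network of amplifiers or cavities) only has to provide rich dynamics, not trainability.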

  15. Multi-GPU three dimensional Stokes solver for simulating glacier flow

    NASA Astrophysics Data System (ADS)

    Licul, Aleksandar; Herman, Frédéric; Podladchikov, Yuri; Räss, Ludovic; Omlin, Samuel

    2016-04-01

    Here we present a three-dimensional Stokes solver that we have recently developed on GPUs and apply it to glacier flow. We numerically solve the Stokes momentum balance equations together with the incompressibility equation, while also taking into account the strong nonlinearities of ice rheology. We have developed a fully three-dimensional numerical MATLAB application based on an iterative finite difference scheme with preconditioning of residuals, with the differential equations discretized on a regular staggered grid. We have ported it to C-CUDA to run on GPUs in parallel, using MPI. We demonstrate the accuracy and efficiency of our model using the manufactured analytical solution test for three-dimensional Stokes ice sheet models (Leng et al., 2013) and by comparison with other well-established ice sheet models on the diagnostic ISMIP-HOM benchmark experiments (Pattyn et al., 2008). The results show that our model is capable of accurately and efficiently solving the Stokes system of equations in a variety of test scenarios, while preserving good parallel efficiency on up to 80 GPUs. For example, in 3D test scenarios with 250,000 grid points our solver converges in around 3 minutes for single-precision computations and around 10 minutes for double-precision computations. We have also optimized the code to run efficiently on our newly acquired state-of-the-art GPU cluster, octopus. This allows us to solve our problem on more than 20 million grid points by simply increasing the number of GPUs used, while keeping the computation time the same. In future work we will apply our solver to real-world applications and implement free-surface evolution capabilities. REFERENCES Leng, W., Ju, L., Gunzburger, M. & Price, S., 2013. Manufactured solutions and the verification of three-dimensional Stokes ice-sheet models. The Cryosphere 7, 19-29. Pattyn, F., Perichon, L., Aschwanden, A., Breuer, B., de Smedt, B., Gagliardini, O., Gudmundsson, G.H., Hindmarsh, R.C.A., Hubbard, A., Johnson, J.V., Kleiner, T., Konovalov, Y., Martin, C., Payne, A.J., Pollard, D., Price, S., Rückamp, M., Saito, F., Souček, O., Sugiyama, S. & Zwinger, T., 2008. Benchmark experiments for higher-order and full-Stokes ice sheet models (ISMIP-HOM). The Cryosphere 2, 95-108.
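
    The iteration style named above, an iterative finite difference scheme with preconditioning (damping) of residuals, can be illustrated on a toy problem. The Python sketch below applies a damped pseudo-transient residual iteration to a 1D Poisson equation as a stand-in for the full Stokes system; the problem and parameters are our own illustrative choices, not the paper's:

        # Sketch of damped pseudo-transient residual iteration on a toy
        # 1D Poisson problem (stand-in for the Stokes system; illustrative
        # parameters, not the paper's solver).
        import numpy as np

        n = 200
        h = 1.0 / (n - 1)
        f = np.ones(n)       # right-hand side
        u = np.zeros(n)      # solution, with u[0] = u[-1] = 0 fixed
        du = np.zeros(n)     # damped update ("pseudo-velocity")

        dtau = h * h / 4.0   # pseudo-time step, stability-limited
        damp = 0.95          # residual damping, acting as preconditioning

        for it in range(100_000):
            r = np.zeros(n)
            r[1:-1] = (u[2:] - 2 * u[1:-1] + u[:-2]) / h**2 + f[1:-1]
            du = damp * du + r    # accumulate the damped residual
            u += dtau * du        # advance the solution in pseudo-time
            if np.max(np.abs(r)) < 1e-8:
                break

        print(f"converged in {it} iterations, "
              f"max residual {np.max(np.abs(r)):.2e}")

    The damping term carries information across iterations, which is what makes this class of matrix-free iterations converge fast enough to be practical on GPUs, where each iteration is a cheap, perfectly parallel stencil sweep.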

  16. Results Oriented Benchmarking: The Evolution of Benchmarking at NASA from Competitive Comparisons to World Class Space Partnerships

    NASA Technical Reports Server (NTRS)

    Bell, Michael A.

    1999-01-01

    Informal benchmarking using personal or professional networks has taken place for many years at the Kennedy Space Center (KSC). The National Aeronautics and Space Administration (NASA) recognized early on the need to formalize the benchmarking process for better utilization of resources and improved benchmarking performance. The need to compete in a faster, better, cheaper environment has been the catalyst for formalizing these efforts. A pioneering benchmarking consortium was chartered at KSC in January 1994. The consortium, known as the Kennedy Benchmarking Clearinghouse (KBC), is a collaborative effort of NASA and all major KSC contractors. The charter of this consortium is to facilitate effective benchmarking and leverage the resulting quality improvements across KSC. The KBC acts as a resource with experienced facilitators and a proven process. One of the initial actions of the KBC was to develop a holistic methodology for Center-wide benchmarking. This approach to benchmarking integrates the best features of proven benchmarking models (i.e., Camp, Spendolini, Watson, and Balm). This cost-effective alternative to conventional benchmarking approaches has provided a foundation for consistent benchmarking at KSC through the development of common terminology, tools, and techniques. Through these efforts a foundation and infrastructure have been built which allow short-duration benchmarking studies yielding results gleaned from world-class partners that can be readily implemented. The KBC has been recognized with the Silver Medal Award (in the applied research category) from the International Benchmarking Clearinghouse.

  17. Accelerating population balance-Monte Carlo simulation for coagulation dynamics from the Markov jump model, stochastic algorithm and GPU parallel computing

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Xu, Zuwei; Zhao, Haibo, E-mail: klinsmannzhb@163.com; Zheng, Chuguang

    2015-01-15

    This paper proposes a comprehensive framework for accelerating population balance-Monte Carlo (PBMC) simulation of particle coagulation dynamics. By combining a Markov jump model, a weighted majorant kernel, and GPU (graphics processing unit) parallel computing, a significant gain in computational efficiency is achieved. The Markov jump model constructs a coagulation-rule matrix of differentially-weighted simulation particles, so as to capture the time evolution of the particle size distribution with low statistical noise over the full size range while reducing the number of time loops as far as possible. Three coagulation rules are highlighted, and it is found that constructing an appropriate coagulation rule provides a route to a compromise between the accuracy and cost of PBMC methods. Further, in order to avoid double looping over all simulation particles when considering two-particle events (typically, particle coagulation), the weighted majorant kernel is introduced to estimate the maximum coagulation rates used for acceptance-rejection processes by a single loop over all particles; meanwhile, the mean time-step of a coagulation event is estimated by summing the coagulation kernels of rejected and accepted particle pairs. The computational load of these fast differentially-weighted PBMC simulations (based on the Markov jump model) is greatly reduced, becoming proportional to the number of simulation particles in a zero-dimensional system (single cell). Finally, for a spatially inhomogeneous multi-dimensional (multi-cell) simulation, the proposed fast PBMC is performed in each cell, and multiple cells are processed in parallel by the many cores of a GPU, which executes massively threaded data-parallel tasks to obtain a remarkable speedup ratio (compared with CPU computation, the speedup ratio of GPU parallel computing is as high as 200 in a case of 100 cells with 10,000 simulation particles per cell). These accelerating approaches to PBMC are demonstrated in a physically realistic Brownian coagulation case. The computational accuracy is validated against the benchmark solution of the discrete-sectional method. The simulation results show that the comprehensive approach attains a very favorable improvement in cost without sacrificing computational accuracy.
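
    The majorant-kernel acceptance-rejection step described above can be sketched compactly. The Python sketch below is an illustrative simplification, not the paper's differentially-weighted GPU implementation: it assumes a continuum-regime Brownian-type kernel, bounds all pairwise rates with a single majorant evaluated from the extreme volumes, and performs one thinning event.

        # Sketch of the majorant-kernel acceptance-rejection step
        # (illustrative and simplified; not the paper's differentially-
        # weighted implementation). Continuum-regime Brownian-type kernel.
        import numpy as np

        rng = np.random.default_rng(2)

        def kernel(vi, vj):
            # This kernel grows with the size disparity of the pair, so
            # K(vmax, vmin) bounds it from above over the whole population.
            return (vi**(1/3) + vj**(1/3)) * (vi**(-1/3) + vj**(-1/3))

        v = rng.uniform(1.0, 2.0, 1000)            # particle volumes
        n = len(v)
        kmax = kernel(v.max(), v.min())            # majorant in one O(n) pass
        rate_majorant = 0.5 * n * (n - 1) * kmax   # bound on the total pair rate

        # One coagulation event via thinning: candidate pairs advance the
        # clock at the majorant rate and are accepted with probability K/kmax,
        # avoiding the O(n^2) double loop over all pairs.
        t = 0.0
        while True:
            t += rng.exponential(1.0 / rate_majorant)
            i, j = rng.integers(0, n, 2)
            if i == j:
                continue
            if rng.random() < kernel(v[i], v[j]) / kmax:
                v[i] += v[j]          # merge the volumes of the pair
                v = np.delete(v, j)   # remove the absorbed particle
                break
        print(f"event at t = {t:.3e}, {len(v)} particles remain")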

  18. Toxicological benchmarks for screening potential contaminants of concern for effects on aquatic biota: 1996 revision

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Suter, G.W. II; Tsao, C.L.

    1996-06-01

    This report presents potential screening benchmarks for the protection of aquatic life from contaminants in water. Because there is no single guidance for screening benchmarks, a set of alternative benchmarks is presented herein. This report presents the alternative benchmarks for chemicals that have been detected on the Oak Ridge Reservation. It also presents the data used to calculate the benchmarks and the sources of those data. It compares the benchmarks and discusses their relative conservatism and utility. In this revision, benchmark values have been updated where appropriate, new benchmark values added, secondary sources replaced by primary sources, and more complete documentation of the sources and derivation of all values provided.

  19. Benchmarking in emergency health systems.

    PubMed

    Kennedy, Marcus P; Allen, Jacqueline; Allen, Greg

    2002-12-01

    This paper discusses the role of benchmarking as a component of quality management. It describes the historical background of benchmarking, its competitive origin and the requirement in today's health environment for a more collaborative approach. The classical 'functional and generic' types of benchmarking are discussed with a suggestion to adopt a different terminology that describes the purpose and practicalities of benchmarking. Benchmarking is not without risks. The consequence of inappropriate focus and the need for a balanced overview of process is explored. The competition that is intrinsic to benchmarking is questioned and the negative impact it may have on improvement strategies in poorly performing organizations is recognized. The difficulty in achieving cross-organizational validity in benchmarking is emphasized, as is the need to scrutinize benchmarking measures. The cost effectiveness of benchmarking projects is questioned and the concept of 'best value, best practice' in an environment of fixed resources is examined.

  20. Benchmarking and Performance Measurement.

    ERIC Educational Resources Information Center

    Town, J. Stephen

    This paper defines benchmarking and its relationship to quality management, describes a project which applied the technique in a library context, and explores the relationship between performance measurement and benchmarking. Numerous benchmarking methods contain similar elements: deciding what to benchmark; identifying partners; gathering…
