memory message passing: Topics by Science.gov

Sample records for memory message passing

MPF: A portable message passing facility for shared memory multiprocessors

NASA Technical Reports Server (NTRS)

Malony, Allen D.; Reed, Daniel A.; Mcguire, Patrick J.

1987-01-01

The design, implementation, and performance evaluation of a message passing facility (MPF) for shared memory multiprocessors are presented. The MPF is based on a message passing model conceptually similar to conversations. Participants (parallel processors) can enter or leave a conversation at any time. The message passing primitives for this model are implemented as a portable library of C function calls. The MPF is currently operational on a Sequent Balance 21000, and several parallel applications were developed and tested. Several simple benchmark programs are presented to establish interprocess communication performance for common patterns of interprocess communication. Finally, performance figures are presented for two parallel applications, linear systems solution, and iterative solution of partial differential equations.
Performance Evaluation of Remote Memory Access (RMA) Programming on Shared Memory Parallel Computers

NASA Technical Reports Server (NTRS)

Jin, Hao-Qiang; Jost, Gabriele; Biegel, Bryan A. (Technical Monitor)

2002-01-01

The purpose of this study is to evaluate the feasibility of remote memory access (RMA) programming on shared memory parallel computers. We discuss different RMA based implementations of selected CFD application benchmark kernels and compare them to corresponding message passing based codes. For the message-passing implementation we use MPI point-to-point and global communication routines. For the RMA based approach we consider two different libraries supporting this programming model. One is a shared memory parallelization library (SMPlib) developed at NASA Ames, the other is the MPI-2 extensions to the MPI Standard. We give timing comparisons for the different implementation strategies and discuss the performance.
A message passing kernel for the hypercluster parallel processing test bed

NASA Technical Reports Server (NTRS)

Blech, Richard A.; Quealy, Angela; Cole, Gary L.

1989-01-01

A Message-Passing Kernel (MPK) for the Hypercluster parallel-processing test bed is described. The Hypercluster is being developed at the NASA Lewis Research Center to support investigations of parallel algorithms and architectures for computational fluid and structural mechanics applications. The Hypercluster resembles the hypercube architecture except that each node consists of multiple processors communicating through shared memory. The MPK efficiently routes information through the Hypercluster, using a message-passing protocol when necessary and faster shared-memory communication whenever possible. The MPK also interfaces all of the processors with the Hypercluster operating system (HYCLOPS), which runs on a Front-End Processor (FEP). This approach distributes many of the I/O tasks to the Hypercluster processors and eliminates the need for a separate I/O support program on the FEP.
Combining Distributed and Shared Memory Models: Approach and Evolution of the Global Arrays Toolkit

DOE Office of Scientific and Technical Information (OSTI.GOV)

Nieplocha, Jarek; Harrison, Robert J.; Kumar, Mukul

2002-07-29

Both shared memory and distributed memory models have advantages and shortcomings. Shared memory model is much easier to use but it ignores data locality/placement. Given the hierarchical nature of the memory subsystems in the modern computers this characteristic might have a negative impact on performance and scalability. Various techniques, such as code restructuring to increase data reuse and introducing blocking in data accesses, can address the problem and yield performance competitive with message passing[Singh], however at the cost of compromising the ease of use feature. Distributed memory models such as message passing or one-sided communication offer performance and scalability butmore » they compromise the ease-of-use. In this context, the message-passing model is sometimes referred to as?assembly programming for the scientific computing?. The Global Arrays toolkit[GA1, GA2] attempts to offer the best features of both models. It implements a shared-memory programming model in which data locality is managed explicitly by the programmer. This management is achieved by explicit calls to functions that transfer data between a global address space (a distributed array) and local storage. In this respect, the GA model has similarities to the distributed shared-memory models that provide an explicit acquire/release protocol. However, the GA model acknowledges that remote data is slower to access than local data and allows data locality to be explicitly specified and hence managed. The GA model exposes to the programmer the hierarchical memory of modern high-performance computer systems, and by recognizing the communication overhead for remote data transfer, it promotes data reuse and locality of reference. This paper describes the characteristics of the Global Arrays programming model, capabilities of the toolkit, and discusses its evolution.« less
Parallelization of KENO-Va Monte Carlo code

NASA Astrophysics Data System (ADS)

Ramón, Javier; Peña, Jorge

1995-07-01

KENO-Va is a code integrated within the SCALE system developed by Oak Ridge that solves the transport equation through the Monte Carlo Method. It is being used at the Consejo de Seguridad Nuclear (CSN) to perform criticality calculations for fuel storage pools and shipping casks. Two parallel versions of the code: one for shared memory machines and other for distributed memory systems using the message-passing interface PVM have been generated. In both versions the neutrons of each generation are tracked in parallel. In order to preserve the reproducibility of the results in both versions, advanced seeds for random numbers were used. The CONVEX C3440 with four processors and shared memory at CSN was used to implement the shared memory version. A FDDI network of 6 HP9000/735 was employed to implement the message-passing version using proprietary PVM. The speedup obtained was 3.6 in both cases.
DMA engine for repeating communication patterns

DOEpatents

Chen, Dong; Gara, Alan G.; Giampapa, Mark E.; Heidelberger, Philip; Steinmacher-Burow, Burkhard; Vranas, Pavlos

2010-09-21

A parallel computer system is constructed as a network of interconnected compute nodes to operate a global message-passing application for performing communications across the network. Each of the compute nodes includes one or more individual processors with memories which run local instances of the global message-passing application operating at each compute node to carry out local processing operations independent of processing operations carried out at other compute nodes. Each compute node also includes a DMA engine constructed to interact with the application via Injection FIFO Metadata describing multiple Injection FIFOs where each Injection FIFO may containing an arbitrary number of message descriptors in order to process messages with a fixed processing overhead irrespective of the number of message descriptors included in the Injection FIFO.
Users manual for the Chameleon parallel programming tools

DOE Office of Scientific and Technical Information (OSTI.GOV)

Gropp, W.; Smith, B.

1993-06-01

Message passing is a common method for writing programs for distributed-memory parallel computers. Unfortunately, the lack of a standard for message passing has hampered the construction of portable and efficient parallel programs. In an attempt to remedy this problem, a number of groups have developed their own message-passing systems, each with its own strengths and weaknesses. Chameleon is a second-generation system of this type. Rather than replacing these existing systems, Chameleon is meant to supplement them by providing a uniform way to access many of these systems. Chameleon`s goals are to (a) be very lightweight (low over-head), (b) be highlymore » portable, and (c) help standardize program startup and the use of emerging message-passing operations such as collective operations on subsets of processors. Chameleon also provides a way to port programs written using PICL or Intel NX message passing to other systems, including collections of workstations. Chameleon is tracking the Message-Passing Interface (MPI) draft standard and will provide both an MPI implementation and an MPI transport layer. Chameleon provides support for heterogeneous computing by using p4 and PVM. Chameleon`s support for homogeneous computing includes the portable libraries p4, PICL, and PVM and vendor-specific implementation for Intel NX, IBM EUI (SP-1), and Thinking Machines CMMD (CM-5). Support for Ncube and PVM 3.x is also under development.« less
CSlib, a library to couple codes via Client/Server messaging

DOE Office of Scientific and Technical Information (OSTI.GOV)

Plimpton, Steve

The CSlib is a small, portable library which enables two (or more) independent simulation codes to be coupled, by exchanging messages with each other. Both codes link to the library when they are built, and can them communicate with each other as they run. The messages contain data or instructions that the two codes send back-and-forth to each other. The messaging can take place via files, sockets, or MPI. The latter is a standard distributed-memory message-passing library.
Experiences Using OpenMP Based on Compiler Directed Software DSM on a PC Cluster

NASA Technical Reports Server (NTRS)

Hess, Matthias; Jost, Gabriele; Mueller, Matthias; Ruehle, Roland; Biegel, Bryan (Technical Monitor)

2002-01-01

In this work we report on our experiences running OpenMP (message passing) programs on a commodity cluster of PCs (personal computers) running a software distributed shared memory (DSM) system. We describe our test environment and report on the performance of a subset of the NAS (NASA Advanced Supercomputing) Parallel Benchmarks that have been automatically parallelized for OpenMP. We compare the performance of the OpenMP implementations with that of their message passing counterparts and discuss performance differences.
Cooperative Data Sharing: Simple Support for Clusters of SMP Nodes

NASA Technical Reports Server (NTRS)

DiNucci, David C.; Balley, David H. (Technical Monitor)

1997-01-01

Libraries like PVM and MPI send typed messages to allow for heterogeneous cluster computing. Lower-level libraries, such as GAM, provide more efficient access to communication by removing the need to copy messages between the interface and user space in some cases. still lower-level interfaces, such as UNET, get right down to the hardware level to provide maximum performance. However, these are all still interfaces for passing messages from one process to another, and have limited utility in a shared-memory environment, due primarily to the fact that message passing is just another term for copying. This drawback is made more pertinent by today's hybrid architectures (e.g. clusters of SMPs), where it is difficult to know beforehand whether two communicating processes will share memory. As a result, even portable language tools (like HPF compilers) must either map all interprocess communication, into message passing with the accompanying performance degradation in shared memory environments, or they must check each communication at run-time and implement the shared-memory case separately for efficiency. Cooperative Data Sharing (CDS) is a single user-level API which abstracts all communication between processes into the sharing and access coordination of memory regions, in a model which might be described as "distributed shared messages" or "large-grain distributed shared memory". As a result, the user programs to a simple latency-tolerant abstract communication specification which can be mapped efficiently to either a shared-memory or message-passing based run-time system, depending upon the available architecture. Unlike some distributed shared memory interfaces, the user still has complete control over the assignment of data to processors, the forwarding of data to its next likely destination, and the queuing of data until it is needed, so even the relatively high latency present in clusters can be accomodated. CDS does not require special use of an MMU, which can add overhead to some DSM systems, and does not require an SPMD programming model. unlike some message-passing interfaces, CDS allows the user to implement efficient demand-driven applications where processes must "fight" over data, and does not perform copying if processes share memory and do not attempt concurrent writes. CDS also supports heterogeneous computing, dynamic process creation, handlers, and a very simple thread-arbitration mechanism. Additional support for array subsections is currently being considered. The CDS1 API, which forms the kernel of CDS, is built primarily upon only 2 communication primitives, one process initiation primitive, and some data translation (and marshalling) routines, memory allocation routines, and priority control routines. The entire current collection of 28 routines provides enough functionality to implement most (or all) of MPI 1 and 2, which has a much larger interface consisting of hundreds of routines. still, the API is small enough to consider integrating into standard os interfaces for handling inter-process communication in a network-independent way. This approach would also help to solve many of the problems plaguing other higher-level standards such as MPI and PVM which must, in some cases, "play OS" to adequately address progress and process control issues. The CDS2 API, a higher level of interface roughly equivalent in functionality to MPI and to be built entirely upon CDS1, is still being designed. It is intended to add support for the equivalent of communicators, reduction and other collective operations, process topologies, additional support for process creation, and some automatic memory management. CDS2 will not exactly match MPI, because the copy-free semantics of communication from CDS1 will be supported. CDS2 application programs will be free to carefully also use CDS1. CDS1 has been implemented on networks of workstations running unmodified Unix-based operating systems, using UDP/IP and vendor-supplied high- performance locks. Although its inter-node performance is currently unimpressive due to rudimentary implementation technique, it even now outperforms highly-optimized MPI implementation on intra-node communication due to its support for non-copy communication. The similarity of the CDS1 architecture to that of other projects such as UNET and TRAP suggests that the inter-node performance can be increased significantly to surpass MPI or PVM, and it may be possible to migrate some of its functionality to communication controllers.
High Performance Programming Using Explicit Shared Memory Model on Cray T3D1

NASA Technical Reports Server (NTRS)

Simon, Horst D.; Saini, Subhash; Grassi, Charles

1994-01-01

The Cray T3D system is the first-phase system in Cray Research, Inc.'s (CRI) three-phase massively parallel processing (MPP) program. This system features a heterogeneous architecture that closely couples DEC's Alpha microprocessors and CRI's parallel-vector technology, i.e., the Cray Y-MP and Cray C90. An overview of the Cray T3D hardware and available programming models is presented. Under Cray Research adaptive Fortran (CRAFT) model four programming methods (data parallel, work sharing, message-passing using PVM, and explicit shared memory model) are available to the users. However, at this time data parallel and work sharing programming models are not available to the user community. The differences between standard PVM and CRI's PVM are highlighted with performance measurements such as latencies and communication bandwidths. We have found that the performance of neither standard PVM nor CRI s PVM exploits the hardware capabilities of the T3D. The reasons for the bad performance of PVM as a native message-passing library are presented. This is illustrated by the performance of NAS Parallel Benchmarks (NPB) programmed in explicit shared memory model on Cray T3D. In general, the performance of standard PVM is about 4 to 5 times less than obtained by using explicit shared memory model. This degradation in performance is also seen on CM-5 where the performance of applications using native message-passing library CMMD on CM-5 is also about 4 to 5 times less than using data parallel methods. The issues involved (such as barriers, synchronization, invalidating data cache, aligning data cache etc.) while programming in explicit shared memory model are discussed. Comparative performance of NPB using explicit shared memory programming model on the Cray T3D and other highly parallel systems such as the TMC CM-5, Intel Paragon, Cray C90, IBM-SP1, etc. is presented.
A real-time MPEG software decoder using a portable message-passing library

DOE Office of Scientific and Technical Information (OSTI.GOV)

Kwong, Man Kam; Tang, P.T. Peter; Lin, Biquan

1995-12-31

We present a real-time MPEG software decoder that uses message-passing libraries such as MPL, p4 and MPI. The parallel MPEG decoder currently runs on the IBM SP system but can be easil ported to other parallel machines. This paper discusses our parallel MPEG decoding algorithm as well as the parallel programming environment under which it uses. Several technical issues are discussed, including balancing of decoding speed, memory limitation, 1/0 capacities, and optimization of MPEG decoding components. This project shows that a real-time portable software MPEG decoder is feasible in a general-purpose parallel machine.
Message Passing vs. Shared Address Space on a Cluster of SMPs

NASA Technical Reports Server (NTRS)

Shan, Hongzhang; Singh, Jaswinder Pal; Oliker, Leonid; Biswas, Rupak

2000-01-01

The convergence of scalable computer architectures using clusters of PCs (or PC-SMPs) with commodity networking has become an attractive platform for high end scientific computing. Currently, message-passing and shared address space (SAS) are the two leading programming paradigms for these systems. Message-passing has been standardized with MPI, and is the most common and mature programming approach. However message-passing code development can be extremely difficult, especially for irregular structured computations. SAS offers substantial ease of programming, but may suffer from performance limitations due to poor spatial locality, and high protocol overhead. In this paper, we compare the performance of and programming effort, required for six applications under both programming models on a 32 CPU PC-SMP cluster. Our application suite consists of codes that typically do not exhibit high efficiency under shared memory programming. due to their high communication to computation ratios and complex communication patterns. Results indicate that SAS can achieve about half the parallel efficiency of MPI for most of our applications: however, on certain classes of problems SAS performance is competitive with MPI. We also present new algorithms for improving the PC cluster performance of MPI collective operations.
Reducing Interprocessor Dependence in Recoverable Distributed Shared Memory

NASA Technical Reports Server (NTRS)

Janssens, Bob; Fuchs, W. Kent

1994-01-01

Checkpointing techniques in parallel systems use dependency tracking and/or message logging to ensure that a system rolls back to a consistent state. Traditional dependency tracking in distributed shared memory (DSM) systems is expensive because of high communication frequency. In this paper we show that, if designed correctly, a DSM system only needs to consider dependencies due to the transfer of blocks of data, resulting in reduced dependency tracking overhead and reduced potential for rollback propagation. We develop an ownership timestamp scheme to tolerate the loss of block state information and develop a passive server model of execution where interactions between processors are considered atomic. With our scheme, dependencies are significantly reduced compared to the traditional message-passing model.
Dust Dynamics in Protoplanetary Disks: Parallel Computing with PVM

NASA Astrophysics Data System (ADS)

de La Fuente Marcos, Carlos; Barge, Pierre; de La Fuente Marcos, Raúl

2002-03-01

We describe a parallel version of our high-order-accuracy particle-mesh code for the simulation of collisionless protoplanetary disks. We use this code to carry out a massively parallel, two-dimensional, time-dependent, numerical simulation, which includes dust particles, to study the potential role of large-scale, gaseous vortices in protoplanetary disks. This noncollisional problem is easy to parallelize on message-passing multicomputer architectures. We performed the simulations on a cache-coherent nonuniform memory access Origin 2000 machine, using both the parallel virtual machine (PVM) and message-passing interface (MPI) message-passing libraries. Our performance analysis suggests that, for our problem, PVM is about 25% faster than MPI. Using PVM and MPI made it possible to reduce CPU time and increase code performance. This allows for simulations with a large number of particles (N ~ 105-106) in reasonable CPU times. The performances of our implementation of the pa! rallel code on an Origin 2000 supercomputer are presented and discussed. They exhibit very good speedup behavior and low load unbalancing. Our results confirm that giant gaseous vortices can play a dominant role in giant planet formation.
New NAS Parallel Benchmarks Results

NASA Technical Reports Server (NTRS)

Yarrow, Maurice; Saphir, William; VanderWijngaart, Rob; Woo, Alex; Kutler, Paul (Technical Monitor)

1997-01-01

NPB2 (NAS (NASA Advanced Supercomputing) Parallel Benchmarks 2) is an implementation, based on Fortran and the MPI (message passing interface) message passing standard, of the original NAS Parallel Benchmark specifications. NPB2 programs are run with little or no tuning, in contrast to NPB vendor implementations, which are highly optimized for specific architectures. NPB2 results complement, rather than replace, NPB results. Because they have not been optimized by vendors, NPB2 implementations approximate the performance a typical user can expect for a portable parallel program on distributed memory parallel computers. Together these results provide an insightful comparison of the real-world performance of high-performance computers. New NPB2 features: New implementation (CG), new workstation class problem sizes, new serial sample versions, more performance statistics.
Monitoring Data-Structure Evolution in Distributed Message-Passing Programs

NASA Technical Reports Server (NTRS)

Sarukkai, Sekhar R.; Beers, Andrew; Woodrow, Thomas S. (Technical Monitor)

1996-01-01

Monitoring the evolution of data structures in parallel and distributed programs, is critical for debugging its semantics and performance. However, the current state-of-art in tracking and presenting data-structure information on parallel and distributed environments is cumbersome and does not scale. In this paper we present a methodology that automatically tracks memory bindings (not the actual contents) of static and dynamic data-structures of message-passing C programs, using PVM. With the help of a number of examples we show that in addition to determining the impact of memory allocation overheads on program performance, graphical views can help in debugging the semantics of program execution. Scalable animations of virtual address bindings of source-level data-structures are used for debugging the semantics of parallel programs across all processors. In conjunction with light-weight core-files, this technique can be used to complement traditional debuggers on single processors. Detailed information (such as data-structure contents), on specific nodes, can be determined using traditional debuggers after the data structure evolution leading to the semantic error is observed graphically.
Parallel and fault-tolerant algorithms for hypercube multiprocessors

DOE Office of Scientific and Technical Information (OSTI.GOV)

Aykanat, C.

1988-01-01

Several techniques for increasing the performance of parallel algorithms on distributed-memory message-passing multi-processor systems are investigated. These techniques are effectively implemented for the parallelization of the Scaled Conjugate Gradient (SCG) algorithm on a hypercube connected message-passing multi-processor. Significant performance improvement is achieved by using these techniques. The SCG algorithm is used for the solution phase of an FE modeling system. Almost linear speed-up is achieved, and it is shown that hypercube topology is scalable for an FE class of problem. The SCG algorithm is also shown to be suitable for vectorization, and near supercomputer performance is achieved on a vectormore » hypercube multiprocessor by exploiting both parallelization and vectorization. Fault-tolerance issues for the parallel SCG algorithm and for the hypercube topology are also addressed.« less
Supporting shared data structures on distributed memory architectures

NASA Technical Reports Server (NTRS)

Koelbel, Charles; Mehrotra, Piyush; Vanrosendale, John

1990-01-01

Programming nonshared memory systems is more difficult than programming shared memory systems, since there is no support for shared data structures. Current programming languages for distributed memory architectures force the user to decompose all data structures into separate pieces, with each piece owned by one of the processors in the machine, and with all communication explicitly specified by low-level message-passing primitives. A new programming environment is presented for distributed memory architectures, providing a global name space and allowing direct access to remote parts of data values. The analysis and program transformations required to implement this environment are described, and the efficiency of the resulting code on the NCUBE/7 and IPSC/2 hypercubes are described.
NAS Parallel Benchmark. Results 11-96: Performance Comparison of HPF and MPI Based NAS Parallel Benchmarks. 1.0

NASA Technical Reports Server (NTRS)

Saini, Subash; Bailey, David; Chancellor, Marisa K. (Technical Monitor)

1997-01-01

High Performance Fortran (HPF), the high-level language for parallel Fortran programming, is based on Fortran 90. HALF was defined by an informal standards committee known as the High Performance Fortran Forum (HPFF) in 1993, and modeled on TMC's CM Fortran language. Several HPF features have since been incorporated into the draft ANSI/ISO Fortran 95, the next formal revision of the Fortran standard. HPF allows users to write a single parallel program that can execute on a serial machine, a shared-memory parallel machine, or a distributed-memory parallel machine. HPF eliminates the complex, error-prone task of explicitly specifying how, where, and when to pass messages between processors on distributed-memory machines, or when to synchronize processors on shared-memory machines. HPF is designed in a way that allows the programmer to code an application at a high level, and then selectively optimize portions of the code by dropping into message-passing or calling tuned library routines as 'extrinsics'. Compilers supporting High Performance Fortran features first appeared in late 1994 and early 1995 from Applied Parallel Research (APR) Digital Equipment Corporation, and The Portland Group (PGI). IBM introduced an HPF compiler for the IBM RS/6000 SP/2 in April of 1996. Over the past two years, these implementations have shown steady improvement in terms of both features and performance. The performance of various hardware/ programming model (HPF and MPI (message passing interface)) combinations will be compared, based on latest NAS (NASA Advanced Supercomputing) Parallel Benchmark (NPB) results, thus providing a cross-machine and cross-model comparison. Specifically, HPF based NPB results will be compared with MPI based NPB results to provide perspective on performance currently obtainable using HPF versus MPI or versus hand-tuned implementations such as those supplied by the hardware vendors. In addition we would also present NPB (Version 1.0) performance results for the following systems: DEC Alpha Server 8400 5/440, Fujitsu VPP Series (VX, VPP300, and VPP700), HP/Convex Exemplar SPP2000, IBM RS/6000 SP P2SC node (120 MHz) NEC SX-4/32, SGI/CRAY T3E, SGI Origin2000.

Compiling global name-space programs for distributed execution

NASA Technical Reports Server (NTRS)

Koelbel, Charles; Mehrotra, Piyush

1990-01-01

Distributed memory machines do not provide hardware support for a global address space. Thus programmers are forced to partition the data across the memories of the architecture and use explicit message passing to communicate data between processors. The compiler support required to allow programmers to express their algorithms using a global name-space is examined. A general method is presented for analysis of a high level source program and along with its translation to a set of independently executing tasks communicating via messages. If the compiler has enough information, this translation can be carried out at compile-time. Otherwise run-time code is generated to implement the required data movement. The analysis required in both situations is described and the performance of the generated code on the Intel iPSC/2 is presented.
Principles for problem aggregation and assignment in medium scale multiprocessors

NASA Technical Reports Server (NTRS)

Nicol, David M.; Saltz, Joel H.

1987-01-01

One of the most important issues in parallel processing is the mapping of workload to processors. This paper considers a large class of problems having a high degree of potential fine grained parallelism, and execution requirements that are either not predictable, or are too costly to predict. The main issues in mapping such a problem onto medium scale multiprocessors are those of aggregation and assignment. We study a method of parameterized aggregation that makes few assumptions about the workload. The mapping of aggregate units of work onto processors is uniform, and exploits locality of workload intensity to balance the unknown workload. In general, a finer aggregate granularity leads to a better balance at the price of increased communication/synchronization costs; the aggregation parameters can be adjusted to find a reasonable granularity. The effectiveness of this scheme is demonstrated on three model problems: an adaptive one-dimensional fluid dynamics problem with message passing, a sparse triangular linear system solver on both a shared memory and a message-passing machine, and a two-dimensional time-driven battlefield simulation employing message passing. Using the model problems, the tradeoffs are studied between balanced workload and the communication/synchronization costs. Finally, an analytical model is used to explain why the method balances workload and minimizes the variance in system behavior.
Performance and Application of Parallel OVERFLOW Codes on Distributed and Shared Memory Platforms

NASA Technical Reports Server (NTRS)

Djomehri, M. Jahed; Rizk, Yehia M.

1999-01-01

The presentation discusses recent studies on the performance of the two parallel versions of the aerodynamics CFD code, OVERFLOW_MPI and _MLP. Developed at NASA Ames, the serial version, OVERFLOW, is a multidimensional Navier-Stokes flow solver based on overset (Chimera) grid technology. The code has recently been parallelized in two ways. One is based on the explicit message-passing interface (MPI) across processors and uses the _MPI communication package. This approach is primarily suited for distributed memory systems and workstation clusters. The second, termed the multi-level parallel (MLP) method, is simple and uses shared memory for all communications. The _MLP code is suitable on distributed-shared memory systems. For both methods, the message passing takes place across the processors or processes at the advancement of each time step. This procedure is, in effect, the Chimera boundary conditions update, which is done in an explicit "Jacobi" style. In contrast, the update in the serial code is done in more of the "Gauss-Sidel" fashion. The programming efforts for the _MPI code is more complicated than for the _MLP code; the former requires modification of the outer and some inner shells of the serial code, whereas the latter focuses only on the outer shell of the code. The _MPI version offers a great deal of flexibility in distributing grid zones across a specified number of processors in order to achieve load balancing. The approach is capable of partitioning zones across multiple processors or sending each zone and/or cluster of several zones into a single processor. The message passing across the processors consists of Chimera boundary and/or an overlap of "halo" boundary points for each partitioned zone. The MLP version is a new coarse-grain parallel concept at the zonal and intra-zonal levels. A grouping strategy is used to distribute zones into several groups forming sub-processes which will run in parallel. The total volume of grid points in each group are approximately balanced. A proper number of threads are initially allocated to each group, and in subsequent iterations during the run-time, the number of threads are adjusted to achieve load balancing across the processes. Each process exploits the multitasking directives already established in Overflow.
Vascular system modeling in parallel environment - distributed and shared memory approaches

PubMed Central

Jurczuk, Krzysztof; Kretowski, Marek; Bezy-Wendling, Johanne

2011-01-01

The paper presents two approaches in parallel modeling of vascular system development in internal organs. In the first approach, new parts of tissue are distributed among processors and each processor is responsible for perfusing its assigned parts of tissue to all vascular trees. Communication between processors is accomplished by passing messages and therefore this algorithm is perfectly suited for distributed memory architectures. The second approach is designed for shared memory machines. It parallelizes the perfusion process during which individual processing units perform calculations concerning different vascular trees. The experimental results, performed on a computing cluster and multi-core machines, show that both algorithms provide a significant speedup. PMID:21550891
Distributed memory compiler design for sparse problems

NASA Technical Reports Server (NTRS)

Wu, Janet; Saltz, Joel; Berryman, Harry; Hiranandani, Seema

1991-01-01

A compiler and runtime support mechanism is described and demonstrated. The methods presented are capable of solving a wide range of sparse and unstructured problems in scientific computing. The compiler takes as input a FORTRAN 77 program enhanced with specifications for distributing data, and the compiler outputs a message passing program that runs on a distributed memory computer. The runtime support for this compiler is a library of primitives designed to efficiently support irregular patterns of distributed array accesses and irregular distributed array partitions. A variety of Intel iPSC/860 performance results obtained through the use of this compiler are presented.
Distributed-Memory Computing With the Langley Aerothermodynamic Upwind Relaxation Algorithm (LAURA)

NASA Technical Reports Server (NTRS)

Riley, Christopher J.; Cheatwood, F. McNeil

1997-01-01

The Langley Aerothermodynamic Upwind Relaxation Algorithm (LAURA), a Navier-Stokes solver, has been modified for use in a parallel, distributed-memory environment using the Message-Passing Interface (MPI) standard. A standard domain decomposition strategy is used in which the computational domain is divided into subdomains with each subdomain assigned to a processor. Performance is examined on dedicated parallel machines and a network of desktop workstations. The effect of domain decomposition and frequency of boundary updates on performance and convergence is also examined for several realistic configurations and conditions typical of large-scale computational fluid dynamic analysis.
High-performance computing — an overview

NASA Astrophysics Data System (ADS)

Marksteiner, Peter

1996-08-01

An overview of high-performance computing (HPC) is given. Different types of computer architectures used in HPC are discussed: vector supercomputers, high-performance RISC processors, various parallel computers like symmetric multiprocessors, workstation clusters, massively parallel processors. Software tools and programming techniques used in HPC are reviewed: vectorizing compilers, optimization and vector tuning, optimization for RISC processors; parallel programming techniques like shared-memory parallelism, message passing and data parallelism; and numerical libraries.
Control Transfer in Operating System Kernels

DTIC Science & Technology

1994-05-13

microkernel system that runs less code in the kernel address space. To realize the performance benefit of allocating stacks in unmapped kseg0 memory, the...review how I modified the Mach 3.0 kernel to use continuations. Because of Mach’s message-passing microkernel structure, interprocess communication was...critical control transfer paths, deeply- nested call chains are undesirable in any case because of the function call overhead. 4.1.3 Microkernel Operating
Intel NX to PVM 3.2 message passing conversion library

NASA Technical Reports Server (NTRS)

Arthur, Trey; Nelson, Michael L.

1993-01-01

NASA Langley Research Center has developed a library that allows Intel NX message passing codes to be executed under the more popular and widely supported Parallel Virtual Machine (PVM) message passing library. PVM was developed at Oak Ridge National Labs and has become the defacto standard for message passing. This library will allow the many programs that were developed on the Intel iPSC/860 or Intel Paragon in a Single Program Multiple Data (SPMD) design to be ported to the numerous architectures that PVM (version 3.2) supports. Also, the library adds global operations capability to PVM. A familiarity with Intel NX and PVM message passing is assumed.
Incremental Parallelization of Non-Data-Parallel Programs Using the Charon Message-Passing Library

NASA Technical Reports Server (NTRS)

VanderWijngaart, Rob F.

2000-01-01

Message passing is among the most popular techniques for parallelizing scientific programs on distributed-memory architectures. The reasons for its success are wide availability (MPI), efficiency, and full tuning control provided to the programmer. A major drawback, however, is that incremental parallelization, as offered by compiler directives, is not generally possible, because all data structures have to be changed throughout the program simultaneously. Charon remedies this situation through mappings between distributed and non-distributed data. It allows breaking up the parallelization into small steps, guaranteeing correctness at every stage. Several tools are available to help convert legacy codes into high-performance message-passing programs. They usually target data-parallel applications, whose loops carrying most of the work can be distributed among all processors without much dependency analysis. Others do a full dependency analysis and then convert the code virtually automatically. Even more toolkits are available that aid construction from scratch of message passing programs. None, however, allows piecemeal translation of codes with complex data dependencies (i.e. non-data-parallel programs) into message passing codes. The Charon library (available in both C and Fortran) provides incremental parallelization capabilities by linking legacy code arrays with distributed arrays. During the conversion process, non-distributed and distributed arrays exist side by side, and simple mapping functions allow the programmer to switch between the two in any location in the program. Charon also provides wrapper functions that leave the structure of the legacy code intact, but that allow execution on truly distributed data. Finally, the library provides a rich set of communication functions that support virtually all patterns of remote data demands in realistic structured grid scientific programs, including transposition, nearest-neighbor communication, pipelining, gather/scatter, and redistribution. At the end of the conversion process most intermediate Charon function calls will have been removed, the non-distributed arrays will have been deleted, and virtually the only remaining Charon functions calls are the high-level, highly optimized communications. Distribution of the data is under complete control of the programmer, although a wide range of useful distributions is easily available through predefined functions. A crucial aspect of the library is that it does not allocate space for distributed arrays, but accepts programmer-specified memory. This has two major consequences. First, codes parallelized using Charon do not suffer from encapsulation; user data is always directly accessible. This provides high efficiency, and also retains the possibility of using message passing directly for highly irregular communications. Second, non-distributed arrays can be interpreted as (trivial) distributions in the Charon sense, which allows them to be mapped to truly distributed arrays, and vice versa. This is the mechanism that enables incremental parallelization. In this paper we provide a brief introduction of the library and then focus on the actual steps in the parallelization process, using some representative examples from, among others, the NAS Parallel Benchmarks. We show how a complicated two-dimensional pipeline-the prototypical non-data-parallel algorithm- can be constructed with ease. To demonstrate the flexibility of the library, we give examples of the stepwise, efficient parallel implementation of nonlocal boundary conditions common in aircraft simulations, as well as the construction of the sequence of grids required for multigrid.
A Multi-Level Parallelization Concept for High-Fidelity Multi-Block Solvers

NASA Technical Reports Server (NTRS)

Hatay, Ferhat F.; Jespersen, Dennis C.; Guruswamy, Guru P.; Rizk, Yehia M.; Byun, Chansup; Gee, Ken; VanDalsem, William R. (Technical Monitor)

1997-01-01

The integration of high-fidelity Computational Fluid Dynamics (CFD) analysis tools with the industrial design process benefits greatly from the robust implementations that are transportable across a wide range of computer architectures. In the present work, a hybrid domain-decomposition and parallelization concept was developed and implemented into the widely-used NASA multi-block Computational Fluid Dynamics (CFD) packages implemented in ENSAERO and OVERFLOW. The new parallel solver concept, PENS (Parallel Euler Navier-Stokes Solver), employs both fine and coarse granularity in data partitioning as well as data coalescing to obtain the desired load-balance characteristics on the available computer platforms. This multi-level parallelism implementation itself introduces no changes to the numerical results, hence the original fidelity of the packages are identically preserved. The present implementation uses the Message Passing Interface (MPI) library for interprocessor message passing and memory accessing. By choosing an appropriate combination of the available partitioning and coalescing capabilities only during the execution stage, the PENS solver becomes adaptable to different computer architectures from shared-memory to distributed-memory platforms with varying degrees of parallelism. The PENS implementation on the IBM SP2 distributed memory environment at the NASA Ames Research Center obtains 85 percent scalable parallel performance using fine-grain partitioning of single-block CFD domains using up to 128 wide computational nodes. Multi-block CFD simulations of complete aircraft simulations achieve 75 percent perfect load-balanced executions using data coalescing and the two levels of parallelism. SGI PowerChallenge, SGI Origin 2000, and a cluster of workstations are the other platforms where the robustness of the implementation is tested. The performance behavior on the other computer platforms with a variety of realistic problems will be included as this on-going study progresses.
Theoretic derivation of directed acyclic subgraph algorithm and comparisons with message passing algorithm

NASA Astrophysics Data System (ADS)

Ha, Jeongmok; Jeong, Hong

2016-07-01

This study investigates the directed acyclic subgraph (DAS) algorithm, which is used to solve discrete labeling problems much more rapidly than other Markov-random-field-based inference methods but at a competitive accuracy. However, the mechanism by which the DAS algorithm simultaneously achieves competitive accuracy and fast execution speed, has not been elucidated by a theoretical derivation. We analyze the DAS algorithm by comparing it with a message passing algorithm. Graphical models, inference methods, and energy-minimization frameworks are compared between DAS and message passing algorithms. Moreover, the performances of DAS and other message passing methods [sum-product belief propagation (BP), max-product BP, and tree-reweighted message passing] are experimentally compared.
A Parallel Workload Model and its Implications for Processor Allocation

DTIC Science & Technology

1996-11-01

with SEV or AVG, both of which can tolerate c = 0.4 { 0.6 before their performance deteriorates signi cantly. On the other hand, Setia [10] has...Sanjeev. K Setia . The interaction between memory allocation and adaptive partitioning in message-passing multicomputers. In IPPS 󈨣 Workshop on Job...Scheduling Strategies for Parallel Processing, pages 89{99, 1995. [11] Sanjeev K. Setia and Satish K. Tripathi. An analysis of several processor
GENASIS Basics: Object-oriented utilitarian functionality for large-scale physics simulations (Version 2)

NASA Astrophysics Data System (ADS)

Cardall, Christian Y.; Budiardja, Reuben D.

2017-05-01

GenASiS Basics provides Fortran 2003 classes furnishing extensible object-oriented utilitarian functionality for large-scale physics simulations on distributed memory supercomputers. This functionality includes physical units and constants; display to the screen or standard output device; message passing; I/O to disk; and runtime parameter management and usage statistics. This revision -Version 2 of Basics - makes mostly minor additions to functionality and includes some simplifying name changes.
A Survey of Rollback-Recovery Protocols in Message-Passing Systems

DTIC Science & Technology

1999-06-01

and M.A. Castillo. "Checkpointing through garbage collection." Technical report. Departamento de Ciencia de la Computation, Escuela de Ingenieria ...between consecutive checkpoints. It can be implemented by using the dirty-bit of the memory protection hardware or by emulating a dirty-bit in software [4...compare the program’s state with the previous checkpoint in software , and writing the difference in a new checkpoint [46]. The required storage and
An HLA-Based Approach to Quantify Achievable Performance for Tactical Edge Applications

DTIC Science & Technology

2011-05-01

in: Proceedings of the 2002 Fall Simulation Interoperability Workshop, 02F- SIW -068, Nov 2002. [16] P. Knight, et al. ―WBT RTI Independent...Benchmark Tests: Design, Implementation, and Updated Results‖, in: Proceedings of the 2002 Spring Simulation Interoperability Workshop, 02S- SIW -081, March...Interoperability Workshop, 98F- SIW -085, Nov 1998. [18] S. Ferenci and R. Fujimoto. ―RTI Performance on Shared Memory and Message Passing Architectures‖, in
Internode data communications in a parallel computer

DOEpatents

Archer, Charles J.; Blocksome, Michael A.; Miller, Douglas R.; Parker, Jeffrey J.; Ratterman, Joseph D.; Smith, Brian E.

2013-09-03

Internode data communications in a parallel computer that includes compute nodes that each include main memory and a messaging unit, the messaging unit including computer memory and coupling compute nodes for data communications, in which, for each compute node at compute node boot time: a messaging unit allocates, in the messaging unit's computer memory, a predefined number of message buffers, each message buffer associated with a process to be initialized on the compute node; receives, prior to initialization of a particular process on the compute node, a data communications message intended for the particular process; and stores the data communications message in the message buffer associated with the particular process. Upon initialization of the particular process, the process establishes a messaging buffer in main memory of the compute node and copies the data communications message from the message buffer of the messaging unit into the message buffer of main memory.
Internode data communications in a parallel computer

DOEpatents

Archer, Charles J; Blocksome, Michael A; Miller, Douglas R; Parker, Jeffrey J; Ratterman, Joseph D; Smith, Brian E

2014-02-11

Internode data communications in a parallel computer that includes compute nodes that each include main memory and a messaging unit, the messaging unit including computer memory and coupling compute nodes for data communications, in which, for each compute node at compute node boot time: a messaging unit allocates, in the messaging unit's computer memory, a predefined number of message buffers, each message buffer associated with a process to be initialized on the compute node; receives, prior to initialization of a particular process on the compute node, a data communications message intended for the particular process; and stores the data communications message in the message buffer associated with the particular process. Upon initialization of the particular process, the process establishes a messaging buffer in main memory of the compute node and copies the data communications message from the message buffer of the messaging unit into the message buffer of main memory.
Enhancing Application Performance Using Mini-Apps: Comparison of Hybrid Parallel Programming Paradigms

NASA Technical Reports Server (NTRS)

Lawson, Gary; Poteat, Michael; Sosonkina, Masha; Baurle, Robert; Hammond, Dana

2016-01-01

In this work, several mini-apps have been created to enhance a real-world application performance, namely the VULCAN code for complex flow analysis developed at the NASA Langley Research Center. These mini-apps explore hybrid parallel programming paradigms with Message Passing Interface (MPI) for distributed memory access and either Shared MPI (SMPI) or OpenMP for shared memory accesses. Performance testing shows that MPI+SMPI yields the best execution performance, while requiring the largest number of code changes. A maximum speedup of 23X was measured for MPI+SMPI, but only 10X was measured for MPI+OpenMP.
Testing New Programming Paradigms with NAS Parallel Benchmarks

NASA Technical Reports Server (NTRS)

Jin, H.; Frumkin, M.; Schultz, M.; Yan, J.

2000-01-01

Over the past decade, high performance computing has evolved rapidly, not only in hardware architectures but also with increasing complexity of real applications. Technologies have been developing to aim at scaling up to thousands of processors on both distributed and shared memory systems. Development of parallel programs on these computers is always a challenging task. Today, writing parallel programs with message passing (e.g. MPI) is the most popular way of achieving scalability and high performance. However, writing message passing programs is difficult and error prone. Recent years new effort has been made in defining new parallel programming paradigms. The best examples are: HPF (based on data parallelism) and OpenMP (based on shared memory parallelism). Both provide simple and clear extensions to sequential programs, thus greatly simplify the tedious tasks encountered in writing message passing programs. HPF is independent of memory hierarchy, however, due to the immaturity of compiler technology its performance is still questionable. Although use of parallel compiler directives is not new, OpenMP offers a portable solution in the shared-memory domain. Another important development involves the tremendous progress in the internet and its associated technology. Although still in its infancy, Java promisses portability in a heterogeneous environment and offers possibility to "compile once and run anywhere." In light of testing these new technologies, we implemented new parallel versions of the NAS Parallel Benchmarks (NPBs) with HPF and OpenMP directives, and extended the work with Java and Java-threads. The purpose of this study is to examine the effectiveness of alternative programming paradigms. NPBs consist of five kernels and three simulated applications that mimic the computation and data movement of large scale computational fluid dynamics (CFD) applications. We started with the serial version included in NPB2.3. Optimization of memory and cache usage was applied to several benchmarks, noticeably BT and SP, resulting in better sequential performance. In order to overcome the lack of an HPF performance model and guide the development of the HPF codes, we employed an empirical performance model for several primitives found in the benchmarks. We encountered a few limitations of HPF, such as lack of supporting the "REDISTRIBUTION" directive and no easy way to handle irregular computation. The parallelization with OpenMP directives was done at the outer-most loop level to achieve the largest granularity. The performance of six HPF and OpenMP benchmarks is compared with their MPI counterparts for the Class-A problem size in the figure in next page. These results were obtained on an SGI Origin2000 (195MHz) with MIPSpro-f77 compiler 7.2.1 for OpenMP and MPI codes and PGI pghpf-2.4.3 compiler with MPI interface for HPF programs.

Efficient Data Generation and Publication as a Test Tool

NASA Technical Reports Server (NTRS)

Einstein, Craig Jakob

2017-01-01

A tool to facilitate the generation and publication of test data was created to test the individual components of a command and control system designed to launch spacecraft. Specifically, this tool was built to ensure messages are properly passed between system components. The tool can also be used to test whether the appropriate groups have access (read/write privileges) to the correct messages. The messages passed between system components take the form of unique identifiers with associated values. These identifiers are alphanumeric strings that identify the type of message and the additional parameters that are contained within the message. The values that are passed with the message depend on the identifier. The data generation tool allows for the efficient creation and publication of these messages. A configuration file can be used to set the parameters of the tool and also specify which messages to pass.
Communication Studies of DMP and SMP Machines

NASA Technical Reports Server (NTRS)

Sohn, Andrew; Biswas, Rupak; Chancellor, Marisa K. (Technical Monitor)

1997-01-01

Understanding the interplay between machines and problems is key to obtaining high performance on parallel machines. This paper investigates the interplay between programming paradigms and communication capabilities of parallel machines. In particular, we explicate the communication capabilities of the IBM SP-2 distributed-memory multiprocessor and the SGI PowerCHALLENGEarray symmetric multiprocessor. Two benchmark problems of bitonic sorting and Fast Fourier Transform are selected for experiments. Communication-efficient algorithms are developed to exploit the overlapping capabilities of the machines. Programs are written in Message-Passing Interface for portability and identical codes are used for both machines. Various data sizes and message sizes are used to test the machines' communication capabilities. Experimental results indicate that the communication performance of the multiprocessors are consistent with the size of messages. The SP-2 is sensitive to message size but yields a much higher communication overlapping because of the communication co-processor. The PowerCHALLENGEarray is not highly sensitive to message size and yields a low communication overlapping. Bitonic sorting yields lower performance compared to FFT due to a smaller computation-to-communication ratio.
Experiences using OpenMP based on Computer Directed Software DSM on a PC Cluster

NASA Technical Reports Server (NTRS)

Hess, Matthias; Jost, Gabriele; Mueller, Matthias; Ruehle, Roland

2003-01-01

In this work we report on our experiences running OpenMP programs on a commodity cluster of PCs running a software distributed shared memory (DSM) system. We describe our test environment and report on the performance of a subset of the NAS Parallel Benchmarks that have been automaticaly parallelized for OpenMP. We compare the performance of the OpenMP implementations with that of their message passing counterparts and discuss performance differences.
Strategies for Energy Efficient Resource Management of Hybrid Programming Models

DOE Office of Scientific and Technical Information (OSTI.GOV)

Li, Dong; Supinski, Bronis de; Schulz, Martin

2013-01-01

Many scientific applications are programmed using hybrid programming models that use both message-passing and shared-memory, due to the increasing prevalence of large-scale systems with multicore, multisocket nodes. Previous work has shown that energy efficiency can be improved using software-controlled execution schemes that consider both the programming model and the power-aware execution capabilities of the system. However, such approaches have focused on identifying optimal resource utilization for one programming model, either shared-memory or message-passing, in isolation. The potential solution space, thus the challenge, increases substantially when optimizing hybrid models since the possible resource configurations increase exponentially. Nonetheless, with the accelerating adoptionmore » of hybrid programming models, we increasingly need improved energy efficiency in hybrid parallel applications on large-scale systems. In this work, we present new software-controlled execution schemes that consider the effects of dynamic concurrency throttling (DCT) and dynamic voltage and frequency scaling (DVFS) in the context of hybrid programming models. Specifically, we present predictive models and novel algorithms based on statistical analysis that anticipate application power and time requirements under different concurrency and frequency configurations. We apply our models and methods to the NPB MZ benchmarks and selected applications from the ASC Sequoia codes. Overall, we achieve substantial energy savings (8.74% on average and up to 13.8%) with some performance gain (up to 7.5%) or negligible performance loss.« less
Evaluation Metrics for the Paragon XP/S-15

NASA Technical Reports Server (NTRS)

Traversat, Bernard; McNab, David; Nitzberg, Bill; Fineberg, Sam; Blaylock, Bruce T. (Technical Monitor)

1993-01-01

On February 17th 1993, the Numerical Aerodynamic Simulation (NAS) facility located at the NASA Ames Research Center installed a 224 node Intel Paragon XP/S-15 system. After its installation, the Paragon was found to be in a very immature state and was unable to support a NAS users' workload, composed of a wide range of development and production activities. As a first step towards addressing this problem, we implemented a set of metrics to objectively monitor the system as operating system and hardware upgrades were installed. The metrics were designed to measure four aspects of the system that we consider essential to support our workload: availability, utilization, functionality, and performance. This report presents the metrics collected from February 1993 to August 1993. Since its installation, the Paragon availability has improved from a low of 15% uptime to a high of 80%, while its utilization has remained low. Functionality and performance have improved from merely running one of the NAS Parallel Benchmarks to running all of them faster (between 1 and 2 times) than on the iPSC/860. In spite of the progress accomplished, fundamental limitations of the Paragon operating system are restricting the Paragon from supporting the NAS workload. The maximum operating system message passing (NORMA IPC) bandwidth was measured at 11 Mbytes/s, well below the peak hardware bandwidth (175 Mbytes/s), limiting overall virtual memory and Unix services (i.e. Disk and HiPPI I/O) performance. The high NX application message passing latency (184 microns), three times than on the iPSC/860, was found to significantly degrade performance of applications relying on small message sizes. The amount of memory available for an application was found to be approximately 10 Mbytes per node, indicating that the OS is taking more space than anticipated (6 Mbytes per node).
Multi-petascale highly efficient parallel supercomputer

DOE Office of Scientific and Technical Information (OSTI.GOV)

Asaad, Sameh; Bellofatto, Ralph E.; Blocksome, Michael A.

A Multi-Petascale Highly Efficient Parallel Supercomputer of 100 petaflop-scale includes node architectures based upon System-On-a-Chip technology, where each processing node comprises a single Application Specific Integrated Circuit (ASIC). The ASIC nodes are interconnected by a five dimensional torus network that optimally maximize the throughput of packet communications between nodes and minimize latency. The network implements collective network and a global asynchronous network that provides global barrier and notification functions. Integrated in the node design include a list-based prefetcher. The memory system implements transaction memory, thread level speculation, and multiversioning cache that improves soft error rate at the same time andmore » supports DMA functionality allowing for parallel processing message-passing.« less
A multiarchitecture parallel-processing development environment

NASA Technical Reports Server (NTRS)

Townsend, Scott; Blech, Richard; Cole, Gary

1993-01-01

A description is given of the hardware and software of a multiprocessor test bed - the second generation Hypercluster system. The Hypercluster architecture consists of a standard hypercube distributed-memory topology, with multiprocessor shared-memory nodes. By using standard, off-the-shelf hardware, the system can be upgraded to use rapidly improving computer technology. The Hypercluster's multiarchitecture nature makes it suitable for researching parallel algorithms in computational field simulation applications (e.g., computational fluid dynamics). The dedicated test-bed environment of the Hypercluster and its custom-built software allows experiments with various parallel-processing concepts such as message passing algorithms, debugging tools, and computational 'steering'. Such research would be difficult, if not impossible, to achieve on shared, commercial systems.
NAS Parallel Benchmarks. 2.4

NASA Technical Reports Server (NTRS)

VanderWijngaart, Rob; Biegel, Bryan A. (Technical Monitor)

2002-01-01

We describe a new problem size, called Class D, for the NAS Parallel Benchmarks (NPB), whose MPI source code implementation is being released as NPB 2.4. A brief rationale is given for how the new class is derived. We also describe the modifications made to the MPI (Message Passing Interface) implementation to allow the new class to be run on systems with 32-bit integers, and with moderate amounts of memory. Finally, we give the verification values for the new problem size.
8755 Emulator Design

DTIC Science & Technology

1988-12-01

Break Control....................51 8755 I/O Control..................54 Z-100 Control Software................. 55 Pass User Memory...the emulator SRAM and the other is 55 for the target SRAM. If either signal is replaced by the NACK signal the host computer displays an error message...Block Diagram 69 AUU M/M 8 in Fiur[:. cemti Daga 70m F’M I-- ’I ANLAD AMam U52 COMM ow U2 i M.2 " -n ax- U 6_- Figure 2b. Schematic Diagram 71 )-M -A -I
Message Passing and Shared Address Space Parallelism on an SMP Cluster

NASA Technical Reports Server (NTRS)

Shan, Hongzhang; Singh, Jaswinder P.; Oliker, Leonid; Biswas, Rupak; Biegel, Bryan (Technical Monitor)

2002-01-01

Currently, message passing (MP) and shared address space (SAS) are the two leading parallel programming paradigms. MP has been standardized with MPI, and is the more common and mature approach; however, code development can be extremely difficult, especially for irregularly structured computations. SAS offers substantial ease of programming, but may suffer from performance limitations due to poor spatial locality and high protocol overhead. In this paper, we compare the performance of and the programming effort required for six applications under both programming models on a 32-processor PC-SMP cluster, a platform that is becoming increasingly attractive for high-end scientific computing. Our application suite consists of codes that typically do not exhibit scalable performance under shared-memory programming due to their high communication-to-computation ratios and/or complex communication patterns. Results indicate that SAS can achieve about half the parallel efficiency of MPI for most of our applications, while being competitive for the others. A hybrid MPI+SAS strategy shows only a small performance advantage over pure MPI in some cases. Finally, improved implementations of two MPI collective operations on PC-SMP clusters are presented.
SKIRT: Hybrid parallelization of radiative transfer simulations

NASA Astrophysics Data System (ADS)

Verstocken, S.; Van De Putte, D.; Camps, P.; Baes, M.

2017-07-01

We describe the design, implementation and performance of the new hybrid parallelization scheme in our Monte Carlo radiative transfer code SKIRT, which has been used extensively for modelling the continuum radiation of dusty astrophysical systems including late-type galaxies and dusty tori. The hybrid scheme combines distributed memory parallelization, using the standard Message Passing Interface (MPI) to communicate between processes, and shared memory parallelization, providing multiple execution threads within each process to avoid duplication of data structures. The synchronization between multiple threads is accomplished through atomic operations without high-level locking (also called lock-free programming). This improves the scaling behaviour of the code and substantially simplifies the implementation of the hybrid scheme. The result is an extremely flexible solution that adjusts to the number of available nodes, processors and memory, and consequently performs well on a wide variety of computing architectures.
The Raid distributed database system

NASA Technical Reports Server (NTRS)

Bhargava, Bharat; Riedl, John

1989-01-01

Raid, a robust and adaptable distributed database system for transaction processing (TP), is described. Raid is a message-passing system, with server processes on each site to manage concurrent processing, consistent replicated copies during site failures, and atomic distributed commitment. A high-level layered communications package provides a clean location-independent interface between servers. The latest design of the package delivers messages via shared memory in a configuration with several servers linked into a single process. Raid provides the infrastructure to investigate various methods for supporting reliable distributed TP. Measurements on TP and server CPU time are presented, along with data from experiments on communications software, consistent replicated copy control during site failures, and concurrent distributed checkpointing. A software tool for evaluating the implementation of TP algorithms in an operating-system kernel is proposed.
Extended write combining using a write continuation hint flag

DOEpatents

Chen, Dong; Gara, Alan; Heidelberger, Philip; Ohmacht, Martin; Vranas, Pavlos

2013-06-04

A computing apparatus for reducing the amount of processing in a network computing system which includes a network system device of a receiving node for receiving electronic messages comprising data. The electronic messages are transmitted from a sending node. The network system device determines when more data of a specific electronic message is being transmitted. A memory device stores the electronic message data and communicating with the network system device. A memory subsystem communicates with the memory device. The memory subsystem stores a portion of the electronic message when more data of the specific message will be received, and the buffer combines the portion with later received data and moves the data to the memory device for accessible storage.
Efficiently passing messages in distributed spiking neural network simulation.

PubMed

Thibeault, Corey M; Minkovich, Kirill; O'Brien, Michael J; Harris, Frederick C; Srinivasa, Narayan

2013-01-01

Efficiently passing spiking messages in a neural model is an important aspect of high-performance simulation. As the scale of networks has increased so has the size of the computing systems required to simulate them. In addition, the information exchange of these resources has become more of an impediment to performance. In this paper we explore spike message passing using different mechanisms provided by the Message Passing Interface (MPI). A specific implementation, MVAPICH, designed for high-performance clusters with Infiniband hardware is employed. The focus is on providing information about these mechanisms for users of commodity high-performance spiking simulators. In addition, a novel hybrid method for spike exchange was implemented and benchmarked.
Managing internode data communications for an uninitialized process in a parallel computer

DOE Office of Scientific and Technical Information (OSTI.GOV)

Archer, Charles J; Blocksome, Michael A; Miller, Douglas R

2014-05-20

A parallel computer includes nodes, each having main memory and a messaging unit (MU). Each MU includes computer memory, which in turn includes, MU message buffers. Each MU message buffer is associated with an uninitialized process on the compute node. In the parallel computer, managing internode data communications for an uninitialized process includes: receiving, by an MU of a compute node, one or more data communications messages in an MU message buffer associated with an uninitialized process on the compute node; determining, by an application agent, that the MU message buffer associated with the uninitialized process is full prior tomore » initialization of the uninitialized process; establishing, by the application agent, a temporary message buffer for the uninitialized process in main computer memory; and moving, by the application agent, data communications messages from the MU message buffer associated with the uninitialized process to the temporary message buffer in main computer memory.« less
Managing internode data communications for an uninitialized process in a parallel computer

DOEpatents

Archer, Charles J; Blocksome, Michael A; Miller, Douglas R; Parker, Jeffrey J; Ratterman, Joseph D; Smith, Brian E

2014-05-20

A parallel computer includes nodes, each having main memory and a messaging unit (MU). Each MU includes computer memory, which in turn includes, MU message buffers. Each MU message buffer is associated with an uninitialized process on the compute node. In the parallel computer, managing internode data communications for an uninitialized process includes: receiving, by an MU of a compute node, one or more data communications messages in an MU message buffer associated with an uninitialized process on the compute node; determining, by an application agent, that the MU message buffer associated with the uninitialized process is full prior to initialization of the uninitialized process; establishing, by the application agent, a temporary message buffer for the uninitialized process in main computer memory; and moving, by the application agent, data communications messages from the MU message buffer associated with the uninitialized process to the temporary message buffer in main computer memory.
Preventing messaging queue deadlocks in a DMA environment

DOEpatents

Blocksome, Michael A; Chen, Dong; Gooding, Thomas; Heidelberger, Philip; Parker, Jeff

2014-01-14

Embodiments of the invention may be used to manage message queues in a parallel computing environment to prevent message queue deadlock. A direct memory access controller of a compute node may determine when a messaging queue is full. In response, the DMA may generate and interrupt. An interrupt handler may stop the DMA and swap all descriptors from the full messaging queue into a larger queue (or enlarge the original queue). The interrupt handler then restarts the DMA. Alternatively, the interrupt handler stops the DMA, allocates a memory block to hold queue data, and then moves descriptors from the full messaging queue into the allocated memory block. The interrupt handler then restarts the DMA. During a normal messaging advance cycle, a messaging manager attempts to inject the descriptors in the memory block into other messaging queues until the descriptors have all been processed.
[Series: Medical Applications of the PHITS Code (2): Acceleration by Parallel Computing].

PubMed

Furuta, Takuya; Sato, Tatsuhiko

2015-01-01

Time-consuming Monte Carlo dose calculation becomes feasible owing to the development of computer technology. However, the recent development is due to emergence of the multi-core high performance computers. Therefore, parallel computing becomes a key to achieve good performance of software programs. A Monte Carlo simulation code PHITS contains two parallel computing functions, the distributed-memory parallelization using protocols of message passing interface (MPI) and the shared-memory parallelization using open multi-processing (OpenMP) directives. Users can choose the two functions according to their needs. This paper gives the explanation of the two functions with their advantages and disadvantages. Some test applications are also provided to show their performance using a typical multi-core high performance workstation.
Managing coherence via put/get windows

DOEpatents

Blumrich, Matthias A [Ridgefield, CT; Chen, Dong [Croton on Hudson, NY; Coteus, Paul W [Yorktown Heights, NY; Gara, Alan G [Mount Kisco, NY; Giampapa, Mark E [Irvington, NY; Heidelberger, Philip [Cortlandt Manor, NY; Hoenicke, Dirk [Ossining, NY; Ohmacht, Martin [Yorktown Heights, NY

2011-01-11

A method and apparatus for managing coherence between two processors of a two processor node of a multi-processor computer system. Generally the present invention relates to a software algorithm that simplifies and significantly speeds the management of cache coherence in a message passing parallel computer, and to hardware apparatus that assists this cache coherence algorithm. The software algorithm uses the opening and closing of put/get windows to coordinate the activated required to achieve cache coherence. The hardware apparatus may be an extension to the hardware address decode, that creates, in the physical memory address space of the node, an area of virtual memory that (a) does not actually exist, and (b) is therefore able to respond instantly to read and write requests from the processing elements.
Managing coherence via put/get windows

DOEpatents

Blumrich, Matthias A [Ridgefield, CT; Chen, Dong [Croton on Hudson, NY; Coteus, Paul W [Yorktown Heights, NY; Gara, Alan G [Mount Kisco, NY; Giampapa, Mark E [Irvington, NY; Heidelberger, Philip [Cortlandt Manor, NY; Hoenicke, Dirk [Ossining, NY; Ohmacht, Martin [Yorktown Heights, NY

2012-02-21

A method and apparatus for managing coherence between two processors of a two processor node of a multi-processor computer system. Generally the present invention relates to a software algorithm that simplifies and significantly speeds the management of cache coherence in a message passing parallel computer, and to hardware apparatus that assists this cache coherence algorithm. The software algorithm uses the opening and closing of put/get windows to coordinate the activated required to achieve cache coherence. The hardware apparatus may be an extension to the hardware address decode, that creates, in the physical memory address space of the node, an area of virtual memory that (a) does not actually exist, and (b) is therefore able to respond instantly to read and write requests from the processing elements.

Passing crisis and emergency risk communications: the effects of communication channel, information type, and repetition.

PubMed

Edworthy, Judy; Hellier, Elizabeth; Newbold, Lex; Titchener, Kirsteen

2015-05-01

Three experiments explore several factors which influence information transmission when warning messages are passed from person to person. In Experiment 1, messages were passed down chains of participants using five different modes of communication. Written communication channels resulted in more accurate message transmission than verbal. In addition, some elements of the message endured further down the chain than others. Experiment 2 largely replicated these effects and also demonstrated that simple repetition of a message eliminated differences between written and spoken communication. In a final field experiment, chains of participants passed information however they wanted to, with the proviso that half of the chains could not use telephones. Here, the lack of ability to use a telephone did not affect accuracy, but did slow down the speed of transmission from the recipient of the message to the last person in the chain. Implications of the findings for crisis and emergency risk communication are discussed. Copyright © 2015 Elsevier Ltd and The Ergonomics Society. All rights reserved.
Increasing available FIFO space to prevent messaging queue deadlocks in a DMA environment

DOEpatents

Blocksome, Michael A [Rochester, MN; Chen, Dong [Croton On Hudson, NY; Gooding, Thomas [Rochester, MN; Heidelberger, Philip [Cortlandt Manor, NY; Parker, Jeff [Rochester, MN

2012-02-07

Embodiments of the invention may be used to manage message queues in a parallel computing environment to prevent message queue deadlock. A direct memory access controller of a compute node may determine when a messaging queue is full. In response, the DMA may generate an interrupt. An interrupt handler may stop the DMA and swap all descriptors from the full messaging queue into a larger queue (or enlarge the original queue). The interrupt handler then restarts the DMA. Alternatively, the interrupt handler stops the DMA, allocates a memory block to hold queue data, and then moves descriptors from the full messaging queue into the allocated memory block. The interrupt handler then restarts the DMA. During a normal messaging advance cycle, a messaging manager attempts to inject the descriptors in the memory block into other messaging queues until the descriptors have all been processed.
Statistics of Epidemics in Networks by Passing Messages

NASA Astrophysics Data System (ADS)

Shrestha, Munik Kumar

Epidemic processes are common out-of-equilibrium phenomena of broad interdisciplinary interest. In this thesis, we show how message-passing approach can be a helpful tool for simulating epidemic models in disordered medium like networks, and in particular for estimating the probability that a given node will become infectious at a particular time. The sort of dynamics we consider are stochastic, where randomness can arise from the stochastic events or from the randomness of network structures. As in belief propagation, variables or messages in message-passing approach are defined on the directed edges of a network. However, unlike belief propagation, where the posterior distributions are updated according to Bayes' rule, in message-passing approach we write differential equations for the messages over time. It takes correlations between neighboring nodes into account while preventing causal signals from backtracking to their immediate source, and thus avoids "echo chamber effects" where a pair of adjacent nodes each amplify the probability that the other is infectious. In our first results, we develop a message-passing approach to threshold models of behavior popular in sociology. These are models, first proposed by Granovetter, where individuals have to hear about a trend or behavior from some number of neighbors before adopting it themselves. In thermodynamic limit of large random networks, we provide an exact analytic scheme while calculating the time dependence of the probabilities and thus learning about the whole dynamics of bootstrap percolation, which is a simple model known in statistical physics for exhibiting discontinuous phase transition. As an application, we apply a similar model to financial networks, studying when bankruptcies spread due to the sudden devaluation of shared assets in overlapping portfolios. We predict that although diversification may be good for individual institutions, it can create dangerous systemic effects, and as a result financial contagion gets worse with too much diversification. We also predict that financial system exhibits "robust yet fragile" behavior, with regions of the parameter space where contagion is rare but catastrophic whenever it occurs. In further results, we develop a message-passing approach to recurrent state epidemics like susceptible-infectious-susceptible and susceptible-infectious-recovered-susceptible where nodes can return to previously inhabited states and multiple waves of infection can pass through the population. Given that message-passing has been applied exclusively to models with one-way state changes like susceptible-infectious and susceptible-infectious-recovered, we develop message-passing for recurrent epidemics based on a new class of differential equations and demonstrate that our approach is simple and efficiently approximates results obtained from Monte Carlo simulation, and that the accuracy of message-passing is often superior to the pair approximation (which also takes second-order correlations into account).
Verification of Faulty Message Passing Systems with Continuous State Space in PVS

NASA Technical Reports Server (NTRS)

Pilotto, Concetta; White, Jerome

2010-01-01

We present a library of Prototype Verification System (PVS) meta-theories that verifies a class of distributed systems in which agent commu nication is through message-passing. The theoretic work, outlined in, consists of iterative schemes for solving systems of linear equations , such as message-passing extensions of the Gauss and Gauss-Seidel me thods. We briefly review that work and discuss the challenges in formally verifying it.
Porting Gravitational Wave Signal Extraction to Parallel Virtual Machine (PVM)

NASA Technical Reports Server (NTRS)

Thirumalainambi, Rajkumar; Thompson, David E.; Redmon, Jeffery

2009-01-01

Laser Interferometer Space Antenna (LISA) is a planned NASA-ESA mission to be launched around 2012. The Gravitational Wave detection is fundamentally the determination of frequency, source parameters, and waveform amplitude derived in a specific order from the interferometric time-series of the rotating LISA spacecrafts. The LISA Science Team has developed a Mock LISA Data Challenge intended to promote the testing of complicated nested search algorithms to detect the 100-1 millihertz frequency signals at amplitudes of 10E-21. However, it has become clear that, sequential search of the parameters is very time consuming and ultra-sensitive; hence, a new strategy has been developed. Parallelization of existing sequential search algorithms of Gravitational Wave signal identification consists of decomposing sequential search loops, beginning with outermost loops and working inward. In this process, the main challenge is to detect interdependencies among loops and partitioning the loops so as to preserve concurrency. Existing parallel programs are based upon either shared memory or distributed memory paradigms. In PVM, master and node programs are used to execute parallelization and process spawning. The PVM can handle process management and process addressing schemes using a virtual machine configuration. The task scheduling and the messaging and signaling can be implemented efficiently for the LISA Gravitational Wave search process using a master and 6 nodes. This approach is accomplished using a server that is available at NASA Ames Research Center, and has been dedicated to the LISA Data Challenge Competition. Historically, gravitational wave and source identification parameters have taken around 7 days in this dedicated single thread Linux based server. Using PVM approach, the parameter extraction problem can be reduced to within a day. The low frequency computation and a proxy signal-to-noise ratio are calculated in separate nodes that are controlled by the master using message and vector of data passing. The message passing among nodes follows a pattern of synchronous and asynchronous send-and-receive protocols. The communication model and the message buffers are allocated dynamically to address rapid search of gravitational wave source information in the Mock LISA data sets.
The serial message-passing schedule for LDPC decoding algorithms

NASA Astrophysics Data System (ADS)

Liu, Mingshan; Liu, Shanshan; Zhou, Yuan; Jiang, Xue

2015-12-01

The conventional message-passing schedule for LDPC decoding algorithms is the so-called flooding schedule. It has the disadvantage that the updated messages cannot be used until next iteration, thus reducing the convergence speed . In this case, the Layered Decoding algorithm (LBP) based on serial message-passing schedule is proposed. In this paper the decoding principle of LBP algorithm is briefly introduced, and then proposed its two improved algorithms, the grouped serial decoding algorithm (Grouped LBP) and the semi-serial decoding algorithm .They can improve LBP algorithm's decoding speed while maintaining a good decoding performance.
An Evaluation of Architectural Platforms for Parallel Navier-Stokes Computations

NASA Technical Reports Server (NTRS)

Jayasimha, D. N.; Hayder, M. E.; Pillay, S. K.

1996-01-01

We study the computational, communication, and scalability characteristics of a computational fluid dynamics application, which solves the time accurate flow field of a jet using the compressible Navier-Stokes equations, on a variety of parallel architecture platforms. The platforms chosen for this study are a cluster of workstations (the LACE experimental testbed at NASA Lewis), a shared memory multiprocessor (the Cray YMP), and distributed memory multiprocessors with different topologies - the IBM SP and the Cray T3D. We investigate the impact of various networks connecting the cluster of workstations on the performance of the application and the overheads induced by popular message passing libraries used for parallelization. The work also highlights the importance of matching the memory bandwidth to the processor speed for good single processor performance. By studying the performance of an application on a variety of architectures, we are able to point out the strengths and weaknesses of each of the example computing platforms.
Parallelizing Navier-Stokes Computations on a Variety of Architectural Platforms

NASA Technical Reports Server (NTRS)

Jayasimha, D. N.; Hayder, M. E.; Pillay, S. K.

1997-01-01

We study the computational, communication, and scalability characteristics of a Computational Fluid Dynamics application, which solves the time accurate flow field of a jet using the compressible Navier-Stokes equations, on a variety of parallel architectural platforms. The platforms chosen for this study are a cluster of workstations (the LACE experimental testbed at NASA Lewis), a shared memory multiprocessor (the Cray YMP), distributed memory multiprocessors with different topologies-the IBM SP and the Cray T3D. We investigate the impact of various networks, connecting the cluster of workstations, on the performance of the application and the overheads induced by popular message passing libraries used for parallelization. The work also highlights the importance of matching the memory bandwidth to the processor speed for good single processor performance. By studying the performance of an application on a variety of architectures, we are able to point out the strengths and weaknesses of each of the example computing platforms.
Design and evaluation of Nemesis, a scalable, low-latency, message-passing communication subsystem.

DOE Office of Scientific and Technical Information (OSTI.GOV)

Buntinas, D.; Mercier, G.; Gropp, W.

2005-12-02

This paper presents a new low-level communication subsystem called Nemesis. Nemesis has been designed and implemented to be scalable and efficient both in the intranode communication context using shared-memory and in the internode communication case using high-performance networks and is natively multimethod-enabled. Nemesis has been integrated in MPICH2 as a CH3 channel and delivers better performance than other dedicated communication channels in MPICH2. Furthermore, the resulting MPICH2 architecture outperforms other MPI implementations in point-to-point benchmarks.
Performance Measurement, Visualization and Modeling of Parallel and Distributed Programs

NASA Technical Reports Server (NTRS)

Yan, Jerry C.; Sarukkai, Sekhar R.; Mehra, Pankaj; Lum, Henry, Jr. (Technical Monitor)

1994-01-01

This paper presents a methodology for debugging the performance of message-passing programs on both tightly coupled and loosely coupled distributed-memory machines. The AIMS (Automated Instrumentation and Monitoring System) toolkit, a suite of software tools for measurement and analysis of performance, is introduced and its application illustrated using several benchmark programs drawn from the field of computational fluid dynamics. AIMS includes (i) Xinstrument, a powerful source-code instrumentor, which supports both Fortran77 and C as well as a number of different message-passing libraries including Intel's NX Thinking Machines' CMMD, and PVM; (ii) Monitor, a library of timestamping and trace -collection routines that run on supercomputers (such as Intel's iPSC/860, Delta, and Paragon and Thinking Machines' CM5) as well as on networks of workstations (including Convex Cluster and SparcStations connected by a LAN); (iii) Visualization Kernel, a trace-animation facility that supports source-code clickback, simultaneous visualization of computation and communication patterns, as well as analysis of data movements; (iv) Statistics Kernel, an advanced profiling facility, that associates a variety of performance data with various syntactic components of a parallel program; (v) Index Kernel, a diagnostic tool that helps pinpoint performance bottlenecks through the use of abstract indices; (vi) Modeling Kernel, a facility for automated modeling of message-passing programs that supports both simulation -based and analytical approaches to performance prediction and scalability analysis; (vii) Intrusion Compensator, a utility for recovering true performance from observed performance by removing the overheads of monitoring and their effects on the communication pattern of the program; and (viii) Compatibility Tools, that convert AIMS-generated traces into formats used by other performance-visualization tools, such as ParaGraph, Pablo, and certain AVS/Explorer modules.
Performance Comparison of a Matrix Solver on a Heterogeneous Network Using Two Implementations of MPI: MPICH and LAM

NASA Technical Reports Server (NTRS)

Phillips, Jennifer K.

1995-01-01

Two of the current and most popular implementations of the Message-Passing Standard, Message Passing Interface (MPI), were contrasted: MPICH by Argonne National Laboratory, and LAM by the Ohio Supercomputer Center at Ohio State University. A parallel skyline matrix solver was adapted to be run in a heterogeneous environment using MPI. The Message-Passing Interface Forum was held in May 1994 which lead to a specification of library functions that implement the message-passing model of parallel communication. LAM, which creates it's own environment, is more robust in a highly heterogeneous network. MPICH uses the environment native to the machine architecture. While neither of these free-ware implementations provides the performance of native message-passing or vendor's implementations, MPICH begins to approach that performance on the SP-2. The machines used in this study were: IBM RS6000, 3 Sun4, SGI, and the IBM SP-2. Each machine is unique and a few machines required specific modifications during the installation. When installed correctly, both implementations worked well with only minor problems.
DOE Office of Scientific and Technical Information (OSTI.GOV)

Gorentla Venkata, Manjunath; Graham, Richard L; Ladd, Joshua S

This paper describes the design and implementation of InfiniBand (IB) CORE-Direct based blocking and nonblocking broadcast operations within the Cheetah collective operation framework. It describes a novel approach that fully ofFLoads collective operations and employs only user-supplied buffers. For a 64 rank communicator, the latency of CORE-Direct based hierarchical algorithm is better than production-grade Message Passing Interface (MPI) implementations, 150% better than the default Open MPI algorithm and 115% better than the shared memory optimized MVAPICH implementation for a one kilobyte (KB) message, and for eight mega-bytes (MB) it is 48% and 64% better, respectively. Flat-topology broadcast achieves 99.9% overlapmore » in a polling based communication-computation test, and 95.1% overlap for a wait based test, compared with 92.4% and 17.0%, respectively, for a similar Central Processing Unit (CPU) based implementation.« less
Streaming data analytics via message passing with application to graph algorithms

DOE PAGES

Plimpton, Steven J.; Shead, Tim

2014-05-06

The need to process streaming data, which arrives continuously at high-volume in real-time, arises in a variety of contexts including data produced by experiments, collections of environmental or network sensors, and running simulations. Streaming data can also be formulated as queries or transactions which operate on a large dynamic data store, e.g. a distributed database. We describe a lightweight, portable framework named PHISH which enables a set of independent processes to compute on a stream of data in a distributed-memory parallel manner. Datums are routed between processes in patterns defined by the application. PHISH can run on top of eithermore » message-passing via MPI or sockets via ZMQ. The former means streaming computations can be run on any parallel machine which supports MPI; the latter allows them to run on a heterogeneous, geographically dispersed network of machines. We illustrate how PHISH can support streaming MapReduce operations, and describe streaming versions of three algorithms for large, sparse graph analytics: triangle enumeration, subgraph isomorphism matching, and connected component finding. Lastly, we also provide benchmark timings for MPI versus socket performance of several kernel operations useful in streaming algorithms.« less
Optimizing NEURON Simulation Environment Using Remote Memory Access with Recursive Doubling on Distributed Memory Systems.

PubMed

Shehzad, Danish; Bozkuş, Zeki

2016-01-01

Increase in complexity of neuronal network models escalated the efforts to make NEURON simulation environment efficient. The computational neuroscientists divided the equations into subnets amongst multiple processors for achieving better hardware performance. On parallel machines for neuronal networks, interprocessor spikes exchange consumes large section of overall simulation time. In NEURON for communication between processors Message Passing Interface (MPI) is used. MPI_Allgather collective is exercised for spikes exchange after each interval across distributed memory systems. The increase in number of processors though results in achieving concurrency and better performance but it inversely affects MPI_Allgather which increases communication time between processors. This necessitates improving communication methodology to decrease the spikes exchange time over distributed memory systems. This work has improved MPI_Allgather method using Remote Memory Access (RMA) by moving two-sided communication to one-sided communication, and use of recursive doubling mechanism facilitates achieving efficient communication between the processors in precise steps. This approach enhanced communication concurrency and has improved overall runtime making NEURON more efficient for simulation of large neuronal network models.
Optimizing NEURON Simulation Environment Using Remote Memory Access with Recursive Doubling on Distributed Memory Systems

PubMed Central

Bozkuş, Zeki

2016-01-01

Increase in complexity of neuronal network models escalated the efforts to make NEURON simulation environment efficient. The computational neuroscientists divided the equations into subnets amongst multiple processors for achieving better hardware performance. On parallel machines for neuronal networks, interprocessor spikes exchange consumes large section of overall simulation time. In NEURON for communication between processors Message Passing Interface (MPI) is used. MPI_Allgather collective is exercised for spikes exchange after each interval across distributed memory systems. The increase in number of processors though results in achieving concurrency and better performance but it inversely affects MPI_Allgather which increases communication time between processors. This necessitates improving communication methodology to decrease the spikes exchange time over distributed memory systems. This work has improved MPI_Allgather method using Remote Memory Access (RMA) by moving two-sided communication to one-sided communication, and use of recursive doubling mechanism facilitates achieving efficient communication between the processors in precise steps. This approach enhanced communication concurrency and has improved overall runtime making NEURON more efficient for simulation of large neuronal network models. PMID:27413363
Implementing Multidisciplinary and Multi-Zonal Applications Using MPI

NASA Technical Reports Server (NTRS)

Fineberg, Samuel A.

1995-01-01

Multidisciplinary and multi-zonal applications are an important class of applications in the area of Computational Aerosciences. In these codes, two or more distinct parallel programs or copies of a single program are utilized to model a single problem. To support such applications, it is common to use a programming model where a program is divided into several single program multiple data stream (SPMD) applications, each of which solves the equations for a single physical discipline or grid zone. These SPMD applications are then bound together to form a single multidisciplinary or multi-zonal program in which the constituent parts communicate via point-to-point message passing routines. Unfortunately, simple message passing models, like Intel's NX library, only allow point-to-point and global communication within a single system-defined partition. This makes implementation of these applications quite difficult, if not impossible. In this report it is shown that the new Message Passing Interface (MPI) standard is a viable portable library for implementing the message passing portion of multidisciplinary applications. Further, with the extension of a portable loader, fully portable multidisciplinary application programs can be developed. Finally, the performance of MPI is compared to that of some native message passing libraries. This comparison shows that MPI can be implemented to deliver performance commensurate with native message libraries.
Driver memory for in-vehicle visual and auditory messages

DOT National Transportation Integrated Search

1999-12-01

Three experiments were conducted in a driving simulator to evaluate effects of in-vehicle message modality and message format on comprehension and memory for younger and older drivers. Visual icons and text messages were effective in terms of high co...
How Communication Goals Determine when Audience Tuning Biases Memory

ERIC Educational Resources Information Center

Echterhoff, Gerald; Higgins, E. Tory; Kopietz, Rene; Groll, Stephan

2008-01-01

After tuning their message to suit their audience's attitude, communicators' own memories for the original information (e.g., a target person's behaviors) often reflect the biased view expressed in their message--producing an audience-congruent memory bias. Exploring the motivational circumstances of message production, the authors investigated…
Simplifying and speeding the management of intra-node cache coherence

DOEpatents

Blumrich, Matthias A [Ridgefield, CT; Chen, Dong [Croton on Hudson, NY; Coteus, Paul W [Yorktown Heights, NY; Gara, Alan G [Mount Kisco, NY; Giampapa, Mark E [Irvington, NY; Heidelberger, Phillip [Cortlandt Manor, NY; Hoenicke, Dirk [Ossining, NY; Ohmacht, Martin [Yorktown Heights, NY

2012-04-17

A method and apparatus for managing coherence between two processors of a two processor node of a multi-processor computer system. Generally the present invention relates to a software algorithm that simplifies and significantly speeds the management of cache coherence in a message passing parallel computer, and to hardware apparatus that assists this cache coherence algorithm. The software algorithm uses the opening and closing of put/get windows to coordinate the activated required to achieve cache coherence. The hardware apparatus may be an extension to the hardware address decode, that creates, in the physical memory address space of the node, an area of virtual memory that (a) does not actually exist, and (b) is therefore able to respond instantly to read and write requests from the processing elements.
Managing coherence via put/get windows

DOE Office of Scientific and Technical Information (OSTI.GOV)

Blumrich, Matthias A; Chen, Dong; Coteus, Paul W

A method and apparatus for managing coherence between two processors of a two processor node of a multi-processor computer system. Generally the present invention relates to a software algorithm that simplifies and significantly speeds the management of cache coherence in a message passing parallel computer, and to hardware apparatus that assists this cache coherence algorithm. The software algorithm uses the opening and closing of put/get windows to coordinate the activated required to achieve cache coherence. The hardware apparatus may be an extension to the hardware address decode, that creates, in the physical memory address space of the node, an areamore » of virtual memory that (a) does not actually exist, and (b) is therefore able to respond instantly to read and write requests from the processing elements.« less

Efficient Implementation of Multigrid Solvers on Message-Passing Parrallel Systems

NASA Technical Reports Server (NTRS)

Lou, John

1994-01-01

We discuss our implementation strategies for finite difference multigrid partial differential equation (PDE) solvers on message-passing systems. Our target parallel architecture is Intel parallel computers: the Delta and Paragon system.
SAHAYOG: A Testbed for Load Sharing under Failure,

DTIC Science & Technology

1987-07-01

messages, shared memory and semaphores . To communicate using messages, processes create message queues using system-provided prim- itives. The message...The size of the memory that is to be shared is decided by the process when it makes a request for memory allocation. The semaphore option of IPC can be...used to prevent two or more concurrent processes from executing their critical sections at the same time. Semaphores must be used when the processes
An implementation and evaluation of the MPI 3.0 one-sided communication interface

DOE PAGES

Dinan, James S.; Balaji, Pavan; Buntinas, Darius T.; ...

2016-01-09

The Q1 Message Passing Interface (MPI) 3.0 standard includes a significant revision to MPI’s remote memory access (RMA) interface, which provides support for one-sided communication. MPI-3 RMA is expected to greatly enhance the usability and performance ofMPI RMA.We present the first complete implementation of MPI-3 RMA and document implementation techniques and performance optimization opportunities enabled by the new interface. Our implementation targets messaging-based networks and is publicly available in the latest release of the MPICH MPI implementation. Here using this implementation, we explore the performance impact of new MPI-3 functionality and semantics. Results indicate that the MPI-3 RMA interface providesmore » significant advantages over the MPI-2 interface by enabling increased communication concurrency through relaxed semantics in the interface and additional routines that provide new window types, synchronization modes, and atomic operations.« less
An implementation and evaluation of the MPI 3.0 one-sided communication interface

DOE Office of Scientific and Technical Information (OSTI.GOV)

Dinan, James S.; Balaji, Pavan; Buntinas, Darius T.

The Q1 Message Passing Interface (MPI) 3.0 standard includes a significant revision to MPI’s remote memory access (RMA) interface, which provides support for one-sided communication. MPI-3 RMA is expected to greatly enhance the usability and performance ofMPI RMA.We present the first complete implementation of MPI-3 RMA and document implementation techniques and performance optimization opportunities enabled by the new interface. Our implementation targets messaging-based networks and is publicly available in the latest release of the MPICH MPI implementation. Here using this implementation, we explore the performance impact of new MPI-3 functionality and semantics. Results indicate that the MPI-3 RMA interface providesmore » significant advantages over the MPI-2 interface by enabling increased communication concurrency through relaxed semantics in the interface and additional routines that provide new window types, synchronization modes, and atomic operations.« less
Message passing with parallel queue traversal

DOEpatents

Underwood, Keith D [Albuquerque, NM; Brightwell, Ronald B [Albuquerque, NM; Hemmert, K Scott [Albuquerque, NM

2012-05-01

In message passing implementations, associative matching structures are used to permit list entries to be searched in parallel fashion, thereby avoiding the delay of linear list traversal. List management capabilities are provided to support list entry turnover semantics and priority ordering semantics.
Driver memory retention of in-vehicle information system messages

DOT National Transportation Integrated Search

1997-01-01

Memory retention of drivers was tested for traffic- and traveler-related messages displayed on an in-vehicle information system (IVIS). Three research questions were asked: (a) How does in-vehicle visual message format affect comprehension? (b) How d...
Parallelization of NAS Benchmarks for Shared Memory Multiprocessors

NASA Technical Reports Server (NTRS)

Waheed, Abdul; Yan, Jerry C.; Saini, Subhash (Technical Monitor)

1998-01-01

This paper presents our experiences of parallelizing the sequential implementation of NAS benchmarks using compiler directives on SGI Origin2000 distributed shared memory (DSM) system. Porting existing applications to new high performance parallel and distributed computing platforms is a challenging task. Ideally, a user develops a sequential version of the application, leaving the task of porting to new generations of high performance computing systems to parallelization tools and compilers. Due to the simplicity of programming shared-memory multiprocessors, compiler developers have provided various facilities to allow the users to exploit parallelism. Native compilers on SGI Origin2000 support multiprocessing directives to allow users to exploit loop-level parallelism in their programs. Additionally, supporting tools can accomplish this process automatically and present the results of parallelization to the users. We experimented with these compiler directives and supporting tools by parallelizing sequential implementation of NAS benchmarks. Results reported in this paper indicate that with minimal effort, the performance gain is comparable with the hand-parallelized, carefully optimized, message-passing implementations of the same benchmarks.
Efficient Parallelization of a Dynamic Unstructured Application on the Tera MTA

NASA Technical Reports Server (NTRS)

Oliker, Leonid; Biswas, Rupak

1999-01-01

The success of parallel computing in solving real-life computationally-intensive problems relies on their efficient mapping and execution on large-scale multiprocessor architectures. Many important applications are both unstructured and dynamic in nature, making their efficient parallel implementation a daunting task. This paper presents the parallelization of a dynamic unstructured mesh adaptation algorithm using three popular programming paradigms on three leading supercomputers. We examine an MPI message-passing implementation on the Cray T3E and the SGI Origin2OOO, a shared-memory implementation using cache coherent nonuniform memory access (CC-NUMA) of the Origin2OOO, and a multi-threaded version on the newly-released Tera Multi-threaded Architecture (MTA). We compare several critical factors of this parallel code development, including runtime, scalability, programmability, and memory overhead. Our overall results demonstrate that multi-threaded systems offer tremendous potential for quickly and efficiently solving some of the most challenging real-life problems on parallel computers.
Parallel Navier-Stokes computations on shared and distributed memory architectures

NASA Technical Reports Server (NTRS)

Hayder, M. Ehtesham; Jayasimha, D. N.; Pillay, Sasi Kumar

1995-01-01

We study a high order finite difference scheme to solve the time accurate flow field of a jet using the compressible Navier-Stokes equations. As part of our ongoing efforts, we have implemented our numerical model on three parallel computing platforms to study the computational, communication, and scalability characteristics. The platforms chosen for this study are a cluster of workstations connected through fast networks (the LACE experimental testbed at NASA Lewis), a shared memory multiprocessor (the Cray YMP), and a distributed memory multiprocessor (the IBM SPI). Our focus in this study is on the LACE testbed. We present some results for the Cray YMP and the IBM SP1 mainly for comparison purposes. On the LACE testbed, we study: (1) the communication characteristics of Ethernet, FDDI, and the ALLNODE networks and (2) the overheads induced by the PVM message passing library used for parallelizing the application. We demonstrate that clustering of workstations is effective and has the potential to be computationally competitive with supercomputers at a fraction of the cost.
Software architecture for time-constrained machine vision applications

NASA Astrophysics Data System (ADS)

Usamentiaga, Rubén; Molleda, Julio; García, Daniel F.; Bulnes, Francisco G.

2013-01-01

Real-time image and video processing applications require skilled architects, and recent trends in the hardware platform make the design and implementation of these applications increasingly complex. Many frameworks and libraries have been proposed or commercialized to simplify the design and tuning of real-time image processing applications. However, they tend to lack flexibility, because they are normally oriented toward particular types of applications, or they impose specific data processing models such as the pipeline. Other issues include large memory footprints, difficulty for reuse, and inefficient execution on multicore processors. We present a novel software architecture for time-constrained machine vision applications that addresses these issues. The architecture is divided into three layers. The platform abstraction layer provides a high-level application programming interface for the rest of the architecture. The messaging layer provides a message-passing interface based on a dynamic publish/subscribe pattern. A topic-based filtering in which messages are published to topics is used to route the messages from the publishers to the subscribers interested in a particular type of message. The application layer provides a repository for reusable application modules designed for machine vision applications. These modules, which include acquisition, visualization, communication, user interface, and data processing, take advantage of the power of well-known libraries such as OpenCV, Intel IPP, or CUDA. Finally, the proposed architecture is applied to a real machine vision application: a jam detector for steel pickling lines.
Parallel definition of tear film maps on distributed-memory clusters for the support of dry eye diagnosis.

PubMed

González-Domínguez, Jorge; Remeseiro, Beatriz; Martín, María J

2017-02-01

The analysis of the interference patterns on the tear film lipid layer is a useful clinical test to diagnose dry eye syndrome. This task can be automated with a high degree of accuracy by means of the use of tear film maps. However, the time required by the existing applications to generate them prevents a wider acceptance of this method by medical experts. Multithreading has been previously successfully employed by the authors to accelerate the tear film map definition on multicore single-node machines. In this work, we propose a hybrid message-passing and multithreading parallel approach that further accelerates the generation of tear film maps by exploiting the computational capabilities of distributed-memory systems such as multicore clusters and supercomputers. The algorithm for drawing tear film maps is parallelized using Message Passing Interface (MPI) for inter-node communications and the multithreading support available in the C++11 standard for intra-node parallelization. The original algorithm is modified to reduce the communications and increase the scalability. The hybrid method has been tested on 32 nodes of an Intel cluster (with two 12-core Haswell 2680v3 processors per node) using 50 representative images. Results show that maximum runtime is reduced from almost two minutes using the previous only-multithreaded approach to less than ten seconds using the hybrid method. The hybrid MPI/multithreaded implementation can be used by medical experts to obtain tear film maps in only a few seconds, which will significantly accelerate and facilitate the diagnosis of the dry eye syndrome. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
Neighbourhood-consensus message passing and its potentials in image processing applications

NASA Astrophysics Data System (ADS)

Ružic, Tijana; Pižurica, Aleksandra; Philips, Wilfried

2011-03-01

In this paper, a novel algorithm for inference in Markov Random Fields (MRFs) is presented. Its goal is to find approximate maximum a posteriori estimates in a simple manner by combining neighbourhood influence of iterated conditional modes (ICM) and message passing of loopy belief propagation (LBP). We call the proposed method neighbourhood-consensus message passing because a single joint message is sent from the specified neighbourhood to the central node. The message, as a function of beliefs, represents the agreement of all nodes within the neighbourhood regarding the labels of the central node. This way we are able to overcome the disadvantages of reference algorithms, ICM and LBP. On one hand, more information is propagated in comparison with ICM, while on the other hand, the huge amount of pairwise interactions is avoided in comparison with LBP by working with neighbourhoods. The idea is related to the previously developed iterated conditional expectations algorithm. Here we revisit it and redefine it in a message passing framework in a more general form. The results on three different benchmarks demonstrate that the proposed technique can perform well both for binary and multi-label MRFs without any limitations on the model definition. Furthermore, it manifests improved performance over related techniques either in terms of quality and/or speed.
Foundations for a healthy future.

PubMed

Riley, R; Green, J; Willis, S; Soden, E; Rushby, C; Postle, D; Wakeling, S

1998-01-01

Health promotion activities with children and young people are important as they take messages about health seriously and can be influential in spreading messages about healthy living to their friends and families. Child health professionals have an important role to play in passing on messages of positive health to children and young people. Peer education is a useful way of passing on messages about health to young people. This article shares examples of three health promotion projects with children in a community trust, looking at asthma, sex education and testicular examination.
Low Latency Messages on Distributed Memory Multiprocessors

DOE PAGES

Rosing, Matt; Saltz, Joel

1995-01-01

This article describes many of the issues in developing an efficient interface for communication on distributed memory machines. Although the hardware component of message latency is less than 1 ws on many distributed memory machines, the software latency associated with sending and receiving typed messages is on the order of 50 μs. The reason for this imbalance is that the software interface does not match the hardware. By changing the interface to match the hardware more closely, applications with fine grained communication can be put on these machines. This article describes several tests performed and many of the issues involvedmore » in supporting low latency messages on distributed memory machines.« less
The impact of cultural exposure and message framing on oral health behavior: Exploring the role of message memory

PubMed Central

Brick, Cameron; McCully, Scout N.; Updegraff, John A.; Ehret, Phillip J.; Areguin, Maira A.; Sherman, David K.

2015-01-01

Background Health messages are more effective when framed to be congruent with recipient characteristics, and health practitioners can strategically decide on message features to promote adherence to recommended behaviors. We present exposure to United States (U.S.) culture as a moderator of the impact of gain- vs. loss-frame messages. Since U.S. culture emphasizes individualism and approach orientation, greater cultural exposure was expected to predict improved patient choices and memory for gain-framed messages, whereas individuals with less exposure to U.S. culture would show these advantages for loss-framed messages. Methods 223 participants viewed a written oral health message in one of three randomized conditions: gain-frame, loss-frame, or no-message control, and were given ten flosses. Cultural exposure was measured with the proportions of life spent and parents born in the U.S. At baseline and one week later, participants completed recall tests and reported recent flossing behavior. Results Message frame and cultural exposure interacted to predict improved patient decisions (increased flossing) and memory maintenance for the health message over one week. E.g., those with low cultural exposure who saw a loss-frame message flossed more. Incongruent messages led to the same flossing rates as no message. Memory retention did not explain the effect of message congruency on flossing. Limitations Flossing behavior was self-reported. Cultural exposure may only have practical application in either highly individualistic or collectivistic countries. Conclusions In healthcare settings where patients are urged to follow a behavior, asking basic demographic questions could allow medical practitioners to intentionally communicate in terms of gains or losses to improve patient decision making and treatment adherence. PMID:25654986
Impact of Cultural Exposure and Message Framing on Oral Health Behavior: Exploring the Role of Message Memory.

PubMed

Brick, Cameron; McCully, Scout N; Updegraff, John A; Ehret, Phillip J; Areguin, Maira A; Sherman, David K

2016-10-01

Health messages are more effective when framed to be congruent with recipient characteristics, and health practitioners can strategically choose message features to promote adherence to recommended behaviors. We present exposure to US culture as a moderator of the impact of gain-frame versus loss-frame messages. Since US culture emphasizes individualism and approach orientation, greater cultural exposure was expected to predict improved patient choices and memory for gain-framed messages, whereas individuals with less exposure to US culture would show these advantages for loss-framed messages. 223 participants viewed a written oral health message in 1 of 3 randomized conditions-gain-frame, loss-frame, or no-message control-and were given 10 flosses. Cultural exposure was measured with the proportions of life spent and parents born in the US. At baseline and 1 week later, participants completed recall tests and reported recent flossing behavior. Message frame and cultural exposure interacted to predict improved patient decisions (increased flossing) and memory maintenance for the health message over 1 week; for example, those with low cultural exposure who saw a loss-frame message flossed more. Incongruent messages led to the same flossing rates as no message. Memory retention did not explain the effect of message congruency on flossing. Flossing behavior was self-reported. Cultural exposure may only have practical application in either highly individualistic or collectivistic countries. In health care settings where patients are urged to follow a behavior, asking basic demographic questions could allow medical practitioners to intentionally communicate in terms of gains or losses to improve patient decision making and treatment adherence. © The Author(s) 2015.
Dispatching packets on a global combining network of a parallel computer

DOEpatents

Almasi, Gheorghe [Ardsley, NY; Archer, Charles J [Rochester, MN

2011-07-19

Methods, apparatus, and products are disclosed for dispatching packets on a global combining network of a parallel computer comprising a plurality of nodes connected for data communications using the network capable of performing collective operations and point to point operations that include: receiving, by an origin system messaging module on an origin node from an origin application messaging module on the origin node, a storage identifier and an operation identifier, the storage identifier specifying storage containing an application message for transmission to a target node, and the operation identifier specifying a message passing operation; packetizing, by the origin system messaging module, the application message into network packets for transmission to the target node, each network packet specifying the operation identifier and an operation type for the message passing operation specified by the operation identifier; and transmitting, by the origin system messaging module, the network packets to the target node.
MPIRUN: A Portable Loader for Multidisciplinary and Multi-Zonal Applications

NASA Technical Reports Server (NTRS)

Fineberg, Samuel A.; Woodrow, Thomas S. (Technical Monitor)

1994-01-01

Multidisciplinary and multi-zonal applications are an important class of applications in the area of Computational Aerosciences. In these codes, two or more distinct parallel programs or copies of a single program are utilized to model a single problem. To support such applications, it is common to use a programming model where a program is divided into several single program multiple data stream (SPMD) applications, each of which solves the equations for a single physical discipline or grid zone. These SPMD applications are then bound together to form a single multidisciplinary or multi-zonal program in which the constituent parts communicate via point-to-point message passing routines. One method for implementing the message passing portion of these codes is with the new Message Passing Interface (MPI) standard. Unfortunately, this standard only specifies the message passing portion of an application, but does not specify any portable mechanisms for loading an application. MPIRUN was developed to provide a portable means for loading MPI programs, and was specifically targeted at multidisciplinary and multi-zonal applications. Programs using MPIRUN for loading and MPI for message passing are then portable between all machines supported by MPIRUN. MPIRUN is currently implemented for the Intel iPSC/860, TMC CM5, IBM SP-1 and SP-2, Intel Paragon, and workstation clusters. Further, MPIRUN is designed to be simple enough to port easily to any system supporting MPI.
DOE Office of Scientific and Technical Information (OSTI.GOV)

Painter, J.; McCormick, P.; Krogh, M.

This paper presents the ACL (Advanced Computing Lab) Message Passing Library. It is a high throughput, low latency communications library, based on Thinking Machines Corp.`s CMMD, upon which message passing applications can be built. The library has been implemented on the Cray T3D, Thinking Machines CM-5, SGI workstations, and on top of PVM.
DOE Office of Scientific and Technical Information (OSTI.GOV)

Sayan Ghosh, Jeff Hammond

OpenSHMEM is a community effort to unifyt and standardize the SHMEM programming model. MPI (Message Passing Interface) is a well-known community standard for parallel programming using distributed memory. The most recen t release of MPI, version 3.0, was designed in part to support programming models like SHMEM.OSHMPI is an implementation of the OpenSHMEM standard using MPI-3 for the Linux operating system. It is the first implementation of SHMEM over MPI one-sided communication and has the potential to be widely adopted due to the portability and widely availability of Linux and MPI-3. OSHMPI has been tested on a variety of systemsmore » and implementations of MPI-3, includingInfiniBand clusters using MVAPICH2 and SGI shared-memory supercomputers using MPICH. Current support is limited to Linux but may be extended to Apple OSX if there is sufficient interest. The code is opensource via https://github.com/jeffhammond/oshmpi« less

A pervasive parallel framework for visualization: final report for FWP 10-014707

DOE Office of Scientific and Technical Information (OSTI.GOV)

Moreland, Kenneth D.

2014-01-01

We are on the threshold of a transformative change in the basic architecture of highperformance computing. The use of accelerator processors, characterized by large core counts, shared but asymmetrical memory, and heavy thread loading, is quickly becoming the norm in high performance computing. These accelerators represent significant challenges in updating our existing base of software. An intrinsic problem with this transition is a fundamental programming shift from message passing processes to much more fine thread scheduling with memory sharing. Another problem is the lack of stability in accelerator implementation; processor and compiler technology is currently changing rapidly. This report documentsmore » the results of our three-year ASCR project to address these challenges. Our project includes the development of the Dax toolkit, which contains the beginnings of new algorithms for a new generation of computers and the underlying infrastructure to rapidly prototype and build further algorithms as necessary.« less
A flexible software architecture for scalable real-time image and video processing applications

NASA Astrophysics Data System (ADS)

Usamentiaga, Rubén; Molleda, Julio; García, Daniel F.; Bulnes, Francisco G.

2012-06-01

Real-time image and video processing applications require skilled architects, and recent trends in the hardware platform make the design and implementation of these applications increasingly complex. Many frameworks and libraries have been proposed or commercialized to simplify the design and tuning of real-time image processing applications. However, they tend to lack flexibility because they are normally oriented towards particular types of applications, or they impose specific data processing models such as the pipeline. Other issues include large memory footprints, difficulty for reuse and inefficient execution on multicore processors. This paper presents a novel software architecture for real-time image and video processing applications which addresses these issues. The architecture is divided into three layers: the platform abstraction layer, the messaging layer, and the application layer. The platform abstraction layer provides a high level application programming interface for the rest of the architecture. The messaging layer provides a message passing interface based on a dynamic publish/subscribe pattern. A topic-based filtering in which messages are published to topics is used to route the messages from the publishers to the subscribers interested in a particular type of messages. The application layer provides a repository for reusable application modules designed for real-time image and video processing applications. These modules, which include acquisition, visualization, communication, user interface and data processing modules, take advantage of the power of other well-known libraries such as OpenCV, Intel IPP, or CUDA. Finally, we present different prototypes and applications to show the possibilities of the proposed architecture.
Distributed parallel messaging for multiprocessor systems

DOEpatents

Chen, Dong; Heidelberger, Philip; Salapura, Valentina; Senger, Robert M; Steinmacher-Burrow, Burhard; Sugawara, Yutaka

2013-06-04

A method and apparatus for distributed parallel messaging in a parallel computing system. The apparatus includes, at each node of a multiprocessor network, multiple injection messaging engine units and reception messaging engine units, each implementing a DMA engine and each supporting both multiple packet injection into and multiple reception from a network, in parallel. The reception side of the messaging unit (MU) includes a switch interface enabling writing of data of a packet received from the network to the memory system. The transmission side of the messaging unit, includes switch interface for reading from the memory system when injecting packets into the network.
PANDA: A distributed multiprocessor operating system

DOE Office of Scientific and Technical Information (OSTI.GOV)

Chubb, P.

1989-01-01

PANDA is a design for a distributed multiprocessor and an operating system. PANDA is designed to allow easy expansion of both hardware and software. As such, the PANDA kernel provides only message passing and memory and process management. The other features needed for the system (device drivers, secondary storage management, etc.) are provided as replaceable user tasks. The thesis presents PANDA's design and implementation, both hardware and software. PANDA uses multiple 68010 processors sharing memory on a VME bus, each such node potentially connected to others via a high speed network. The machine is completely homogeneous: there are no differencesmore » between processors that are detectable by programs running on the machine. A single two-processor node has been constructed. Each processor contains memory management circuits designed to allow processors to share page tables safely. PANDA presents a programmers' model similar to the hardware model: a job is divided into multiple tasks, each having its own address space. Within each task, multiple processes share code and data. Tasks can send messages to each other, and set up virtual circuits between themselves. Peripheral devices such as disc drives are represented within PANDA by tasks. PANDA divides secondary storage into volumes, each volume being accessed by a volume access task, or VAT. All knowledge about the way that data is stored on a disc is kept in its volume's VAT. The design is such that PANDA should provide a useful testbed for file systems and device drivers, as these can be installed without recompiling PANDA itself, and without rebooting the machine.« less
Development of an HL7 interface engine, based on tree structure and streaming algorithm, for large-size messages which include image data.

PubMed

Um, Ki Sung; Kwak, Yun Sik; Cho, Hune; Kim, Il Kon

2005-11-01

A basic assumption of Health Level Seven (HL7) protocol is 'No limitation of message length'. However, most existing commercial HL7 interface engines do limit message length because they use the string array method, which is run in the main memory for the HL7 message parsing process. Specifically, messages with image and multi-media data create a long string array and thus cause the computer system to raise critical and fatal problem. Consequently, HL7 messages cannot handle the image and multi-media data necessary in modern medical records. This study aims to solve this problem with the 'streaming algorithm' method. This new method for HL7 message parsing applies the character-stream object which process character by character between the main memory and hard disk device with the consequence that the processing load on main memory could be alleviated. The main functions of this new engine are generating, parsing, validating, browsing, sending, and receiving HL7 messages. Also, the engine can parse and generate XML-formatted HL7 messages. This new HL7 engine successfully exchanged HL7 messages with 10 megabyte size images and discharge summary information between two university hospitals.
Keeping the Spirits Up: The Effect of Teachers' and Parents' Emotional Support on Children's Working Memory Performance.

PubMed

Vandenbroucke, Loren; Spilt, Jantine; Verschueren, Karine; Baeyens, Dieter

2017-01-01

Working memory, used to temporarily store and mentally manipulate information, is important for children's learning. It is therefore valuable to understand which (contextual) factors promote or hinder working memory performance. Recent research shows positive associations between positive parent-child and teacher-student interactions and working memory performance and development. However, no study has yet experimentally investigated how parents and teachers affect working memory performance. Based on attachment theory, the current study investigated the role of parent and teacher emotional support in promoting working memory performance by buffering the negative effect of social stress. Questionnaires and an experimental session were completed by 170 children from grade 1 to 2 ( M age = 7 years 6 months, SD = 7 months). Questionnaires were used to assess children's perceptions of the teacher-student and parent-child relationship. During an experimental session, working memory was measured with the Corsi task backward (Milner, 1971) in a pre- and post-test design. In-between the tests stress was induced in the children using the Cyberball paradigm (Williams et al., 2000). Emotional support was manipulated (between-subjects) through an audio message (either a weather report, a supportive message of a stranger, a supportive message of a parent, or a supportive message of a teacher). Results of repeated measures ANOVA showed no clear effect of the stress induction. Nevertheless, an effect of parent and teacher support was found and depended on the quality of the parent-child relationship. When children had a positive relationship with their parent, support of parents and teachers had little effect on working memory performance. When children had a negative relationship with their parent, a supportive message of that parent decreased working memory performance, while a supportive message from the teacher increased performance. In sum, the current study suggests that parents and teachers can support working memory performance by being supportive for the child. Teacher support is most effective when the child has a negative relationship with the parent. These insights can give direction to specific measures aimed at preventing and resolving working memory problems and related issues.
Keeping the Spirits Up: The Effect of Teachers’ and Parents’ Emotional Support on Children’s Working Memory Performance

PubMed Central

Vandenbroucke, Loren; Spilt, Jantine; Verschueren, Karine; Baeyens, Dieter

2017-01-01

Working memory, used to temporarily store and mentally manipulate information, is important for children’s learning. It is therefore valuable to understand which (contextual) factors promote or hinder working memory performance. Recent research shows positive associations between positive parent–child and teacher–student interactions and working memory performance and development. However, no study has yet experimentally investigated how parents and teachers affect working memory performance. Based on attachment theory, the current study investigated the role of parent and teacher emotional support in promoting working memory performance by buffering the negative effect of social stress. Questionnaires and an experimental session were completed by 170 children from grade 1 to 2 (Mage = 7 years 6 months, SD = 7 months). Questionnaires were used to assess children’s perceptions of the teacher–student and parent–child relationship. During an experimental session, working memory was measured with the Corsi task backward (Milner, 1971) in a pre- and post-test design. In-between the tests stress was induced in the children using the Cyberball paradigm (Williams et al., 2000). Emotional support was manipulated (between-subjects) through an audio message (either a weather report, a supportive message of a stranger, a supportive message of a parent, or a supportive message of a teacher). Results of repeated measures ANOVA showed no clear effect of the stress induction. Nevertheless, an effect of parent and teacher support was found and depended on the quality of the parent–child relationship. When children had a positive relationship with their parent, support of parents and teachers had little effect on working memory performance. When children had a negative relationship with their parent, a supportive message of that parent decreased working memory performance, while a supportive message from the teacher increased performance. In sum, the current study suggests that parents and teachers can support working memory performance by being supportive for the child. Teacher support is most effective when the child has a negative relationship with the parent. These insights can give direction to specific measures aimed at preventing and resolving working memory problems and related issues. PMID:28421026
Eigensolver for a Sparse, Large Hermitian Matrix

NASA Technical Reports Server (NTRS)

Tisdale, E. Robert; Oyafuso, Fabiano; Klimeck, Gerhard; Brown, R. Chris

2003-01-01

A parallel-processing computer program finds a few eigenvalues in a sparse Hermitian matrix that contains as many as 100 million diagonal elements. This program finds the eigenvalues faster, using less memory, than do other, comparable eigensolver programs. This program implements a Lanczos algorithm in the American National Standards Institute/ International Organization for Standardization (ANSI/ISO) C computing language, using the Message Passing Interface (MPI) standard to complement an eigensolver in PARPACK. [PARPACK (Parallel Arnoldi Package) is an extension, to parallel-processing computer architectures, of ARPACK (Arnoldi Package), which is a collection of Fortran 77 subroutines that solve large-scale eigenvalue problems.] The eigensolver runs on Beowulf clusters of computers at the Jet Propulsion Laboratory (JPL).
Nonlinear Fluid Computations in a Distributed Environment

NASA Technical Reports Server (NTRS)

Atwood, Christopher A.; Smith, Merritt H.

1995-01-01

The performance of a loosely and tightly-coupled workstation cluster is compared against a conventional vector supercomputer for the solution the Reynolds- averaged Navier-Stokes equations. The application geometries include a transonic airfoil, a tiltrotor wing/fuselage, and a wing/body/empennage/nacelle transport. Decomposition is of the manager-worker type, with solution of one grid zone per worker process coupled using the PVM message passing library. Task allocation is determined by grid size and processor speed, subject to available memory penalties. Each fluid zone is computed using an implicit diagonal scheme in an overset mesh framework, while relative body motion is accomplished using an additional worker process to re-establish grid communication.
Software Mechanisms for Multiprocessor TLB Consistency

DTIC Science & Technology

1989-12-01

thanks. Raj V aswani implemented the DASH message-passing system. Ramesh Govindan implemented part of the DASH virtual memory system. G . Scott...1 Latency (ma) 5𔃺 ---·-r··-·r···r··-T·-·-r··-·r····r····-r··-·r··- . g ~~~R=r 18 -----1·--·-r··-T·----1·--··r·---1·----1·--··r·---T·----1 16...model development. Synchronizing TLBs is similar to updating replicated data in a distributed environment. Lee and Garcia-Molina both used an M/ G /1
Simple Algorithms for Distributed Leader Election in Anonymous Synchronous Rings and Complete Networks Inspired by Neural Development in Fruit Flies.

PubMed

Xu, Lei; Jeavons, Peter

2015-11-01

Leader election in anonymous rings and complete networks is a very practical problem in distributed computing. Previous algorithms for this problem are generally designed for a classical message passing model where complex messages are exchanged. However, the need to send and receive complex messages makes such algorithms less practical for some real applications. We present some simple synchronous algorithms for distributed leader election in anonymous rings and complete networks that are inspired by the development of the neural system of the fruit fly. Our leader election algorithms all assume that only one-bit messages are broadcast by nodes in the network and processors are only able to distinguish between silence and the arrival of one or more messages. These restrictions allow implementations to use a simpler message-passing architecture. Even with these harsh restrictions our algorithms are shown to achieve good time and message complexity both analytically and experimentally.
Parallel, Distributed Scripting with Python

DOE Office of Scientific and Technical Information (OSTI.GOV)

Miller, P J

2002-05-24

Parallel computers used to be, for the most part, one-of-a-kind systems which were extremely difficult to program portably. With SMP architectures, the advent of the POSIX thread API and OpenMP gave developers ways to portably exploit on-the-box shared memory parallelism. Since these architectures didn't scale cost-effectively, distributed memory clusters were developed. The associated MPI message passing libraries gave these systems a portable paradigm too. Having programmers effectively use this paradigm is a somewhat different question. Distributed data has to be explicitly transported via the messaging system in order for it to be useful. In high level languages, the MPI librarymore » gives access to data distribution routines in C, C++, and FORTRAN. But we need more than that. Many reasonable and common tasks are best done in (or as extensions to) scripting languages. Consider sysadm tools such as password crackers, file purgers, etc ... These are simple to write in a scripting language such as Python (an open source, portable, and freely available interpreter). But these tasks beg to be done in parallel. Consider the a password checker that checks an encrypted password against a 25,000 word dictionary. This can take around 10 seconds in Python (6 seconds in C). It is trivial to parallelize if you can distribute the information and co-ordinate the work.« less
How communication goals determine when audience tuning biases memory.

PubMed

Echterhoff, Gerald; Higgins, E Tory; Kopietz, René; Groll, Stephan

2008-02-01

After tuning their message to suit their audience's attitude, communicators' own memories for the original information (e.g., a target person's behaviors) often reflect the biased view expressed in their message--producing an audience-congruent memory bias. Exploring the motivational circumstances of message production, the authors investigated whether this bias depends on the goals driving audience tuning. In 4 experiments, the memory bias was found to a greater extent when audience tuning served the creation of a shared reality than when it served alternative, nonshared reality goals (being polite toward a stigmatized-group audience; obtaining incentives; being entertaining; complying with a blatant demand). In addition, the authors found that these effects were mediated by the epistemic trust in the audience-congruent view but not by the rehearsal or accurate retrieval of the original input information, the ability to discriminate between the original and the message information, or a contrast away from extremely tuned messages. The central role of epistemic trust, a measure of the communicators' experience of shared reality, was supported in meta-analyses across the experiments. PsycINFO Database Record (c) 2008 APA, all rights reserved.
Information Management and Learning in Computer Conferences: Coping with Irrelevant and Unconnected Messages.

ERIC Educational Resources Information Center

Schwan, Stephan; Straub, Daniela; Hesse, Friedrich W.

2002-01-01

Describes a study of computer conferencing where learners interacted over the course of four log-in sessions to acquire the knowledge sufficient to pass a learning test. Studied the number of messages irrelevant to the topic, explicit threading of messages, reading times of relevant messages, and learning outcomes. (LRW)
Getting the message across: age differences in the positive and negative framing of health care messages.

PubMed

Shamaskin, Andrea M; Mikels, Joseph A; Reed, Andrew E

2010-09-01

Although valenced health care messages influence impressions, memory, and behavior (Levin, Schneider, & Gaeth, 1998) and the processing of valenced information changes with age (Carstensen & Mikels, 2005), these 2 lines of research have thus far been disconnected. This study examined impressions of, and memory for, positively and negatively framed health care messages that were presented in pamphlets to 25 older adults and 24 younger adults. Older adults relative to younger adults rated positive pamphlets more informative than negative pamphlets and remembered a higher proportion of positive to negative messages. However, older adults misremembered negative messages to be positive. These findings demonstrate the age-related positivity effect in health care messages with promise as to the persuasive nature and lingering effects of positive messages. (c) 2010 APA, all rights reserved.
What it Takes to Get Passed On: Message Content, Style, and Structure as Predictors of Retransmission in the Boston Marathon Bombing Response

PubMed Central

Sutton, Jeannette; Gibson, C. Ben; Spiro, Emma S.; League, Cedar; Fitzhugh, Sean M.; Butts, Carter T.

2015-01-01

Message retransmission is a central aspect of information diffusion. In a disaster context, the passing on of official warning messages by members of the public also serves as a behavioral indicator of message salience, suggesting that particular messages are (or are not) perceived by the public to be both noteworthy and valuable enough to share with others. This study provides the first examination of terse message retransmission of official warning messages in response to a domestic terrorist attack, the Boston Marathon Bombing in 2013. Using messages posted from public officials’ Twitter accounts that were active during the period of the Boston Marathon bombing and manhunt, we examine the features of messages that are associated with their retransmission. We focus on message content, style, and structure, as well as the networked relationships of message senders to answer the question: what characteristics of a terse message sent under conditions of imminent threat predict its retransmission among members of the public? We employ a negative binomial model to examine how message characteristics affect message retransmission. We find that, rather than any single effect dominating the process, retransmission of official Tweets during the Boston bombing response was jointly influenced by various message content, style, and sender characteristics. These findings suggest the need for more work that investigates impact of multiple factors on the allocation of attention and on message retransmission during hazard events. PMID:26295584
An alcohol message beneath the surface of ER: how implicit memory influences viewers' health attitudes and intentions using entertainment-education.

PubMed

Kim, Kyongseok; Lee, Mina; Macias, Wendy

2014-01-01

While previous research on entertainment-education has assessed its effectiveness, primarily at the conscious level (e.g., free recall and self-reported change in knowledge), few studies have explored its effect on viewers' implicit knowledge. To fill this gap, this study examined the mechanism through which viewers form implicit memory of short health messages inserted in a primetime TV show and its preconscious effects on viewers' health attitudes and intentions. An experiment was conducted using a 3-group (health message: present vs. absent vs. control), posttest-only design with additional planned analyses of differences by subject variables (past experience and involvement). Overall, findings supported the hypothesized effects of implicit memory of a brief antialcohol message embedded in an ER episode on college students' attitudes and intentions against binge drinking. Results showed that participants who were exposed to the health message reported less positive attitudes toward binge drinking and lower intentions to binge drink, compared with those who were not exposed; the causal relations among viewers' implicit memory, attitudes, and intentions were also validated. Results also showed that individuals' past experience and involvement moderated the effects of the health message on attitudes and intentions. Theoretical explanations and practical implications are discussed.
The Edge-Disjoint Path Problem on Random Graphs by Message-Passing.

PubMed

Altarelli, Fabrizio; Braunstein, Alfredo; Dall'Asta, Luca; De Bacco, Caterina; Franz, Silvio

2015-01-01

We present a message-passing algorithm to solve a series of edge-disjoint path problems on graphs based on the zero-temperature cavity equations. Edge-disjoint paths problems are important in the general context of routing, that can be defined by incorporating under a unique framework both traffic optimization and total path length minimization. The computation of the cavity equations can be performed efficiently by exploiting a mapping of a generalized edge-disjoint path problem on a star graph onto a weighted maximum matching problem. We perform extensive numerical simulations on random graphs of various types to test the performance both in terms of path length minimization and maximization of the number of accommodated paths. In addition, we test the performance on benchmark instances on various graphs by comparison with state-of-the-art algorithms and results found in the literature. Our message-passing algorithm always outperforms the others in terms of the number of accommodated paths when considering non trivial instances (otherwise it gives the same trivial results). Remarkably, the largest improvement in performance with respect to the other methods employed is found in the case of benchmarks with meshes, where the validity hypothesis behind message-passing is expected to worsen. In these cases, even though the exact message-passing equations do not converge, by introducing a reinforcement parameter to force convergence towards a sub optimal solution, we were able to always outperform the other algorithms with a peak of 27% performance improvement in terms of accommodated paths. On random graphs, we numerically observe two separated regimes: one in which all paths can be accommodated and one in which this is not possible. We also investigate the behavior of both the number of paths to be accommodated and their minimum total length.
The Edge-Disjoint Path Problem on Random Graphs by Message-Passing

PubMed Central

2015-01-01

We present a message-passing algorithm to solve a series of edge-disjoint path problems on graphs based on the zero-temperature cavity equations. Edge-disjoint paths problems are important in the general context of routing, that can be defined by incorporating under a unique framework both traffic optimization and total path length minimization. The computation of the cavity equations can be performed efficiently by exploiting a mapping of a generalized edge-disjoint path problem on a star graph onto a weighted maximum matching problem. We perform extensive numerical simulations on random graphs of various types to test the performance both in terms of path length minimization and maximization of the number of accommodated paths. In addition, we test the performance on benchmark instances on various graphs by comparison with state-of-the-art algorithms and results found in the literature. Our message-passing algorithm always outperforms the others in terms of the number of accommodated paths when considering non trivial instances (otherwise it gives the same trivial results). Remarkably, the largest improvement in performance with respect to the other methods employed is found in the case of benchmarks with meshes, where the validity hypothesis behind message-passing is expected to worsen. In these cases, even though the exact message-passing equations do not converge, by introducing a reinforcement parameter to force convergence towards a sub optimal solution, we were able to always outperform the other algorithms with a peak of 27% performance improvement in terms of accommodated paths. On random graphs, we numerically observe two separated regimes: one in which all paths can be accommodated and one in which this is not possible. We also investigate the behavior of both the number of paths to be accommodated and their minimum total length. PMID:26710102
Resource-Efficient, Hierarchical Auto-Tuning of a Hybrid Lattice Boltzmann Computation on the Cray XT4

DOE Office of Scientific and Technical Information (OSTI.GOV)

Computational Research Division, Lawrence Berkeley National Laboratory; NERSC, Lawrence Berkeley National Laboratory; Computer Science Department, University of California, Berkeley

2009-05-04

We apply auto-tuning to a hybrid MPI-pthreads lattice Boltzmann computation running on the Cray XT4 at National Energy Research Scientific Computing Center (NERSC). Previous work showed that multicore-specific auto-tuning can improve the performance of lattice Boltzmann magnetohydrodynamics (LBMHD) by a factor of 4x when running on dual- and quad-core Opteron dual-socket SMPs. We extend these studies to the distributed memory arena via a hybrid MPI/pthreads implementation. In addition to conventional auto-tuning at the local SMP node, we tune at the message-passing level to determine the optimal aspect ratio as well as the correct balance between MPI tasks and threads permore » MPI task. Our study presents a detailed performance analysis when moving along an isocurve of constant hardware usage: fixed total memory, total cores, and total nodes. Overall, our work points to approaches for improving intra- and inter-node efficiency on large-scale multicore systems for demanding scientific applications.« less

Efficient iteration in data-parallel programs with irregular and dynamically distributed data structures

DOE Office of Scientific and Technical Information (OSTI.GOV)

Littlefield, R.J.

1990-02-01

To implement an efficient data-parallel program on a non-shared memory MIMD multicomputer, data and computations must be properly partitioned to achieve good load balance and locality of reference. Programs with irregular data reference patterns often require irregular partitions. Although good partitions may be easy to determine, they can be difficult or impossible to implement in programming languages that provide only regular data distributions, such as blocked or cyclic arrays. We are developing Onyx, a programming system that provides a shared memory model of distributed data structures and extends the concept of data distribution to include irregular and dynamic distributions. Thismore » provides a powerful means to specify irregular partitions. Perhaps surprisingly, programs using it can also execute efficiently. In this paper, we describe and evaluate the Onyx implementation of a model problem that repeatedly executes an irregular but fixed data reference pattern. On an NCUBE hypercube, the speed of the Onyx implementation is comparable to that of carefully handwritten message-passing code.« less
Enhancing Application Performance Using Mini-Apps: Comparison of Hybrid Parallel Programming Paradigms

NASA Technical Reports Server (NTRS)

Lawson, Gary; Sosonkina, Masha; Baurle, Robert; Hammond, Dana

2017-01-01

In many fields, real-world applications for High Performance Computing have already been developed. For these applications to stay up-to-date, new parallel strategies must be explored to yield the best performance; however, restructuring or modifying a real-world application may be daunting depending on the size of the code. In this case, a mini-app may be employed to quickly explore such options without modifying the entire code. In this work, several mini-apps have been created to enhance a real-world application performance, namely the VULCAN code for complex flow analysis developed at the NASA Langley Research Center. These mini-apps explore hybrid parallel programming paradigms with Message Passing Interface (MPI) for distributed memory access and either Shared MPI (SMPI) or OpenMP for shared memory accesses. Performance testing shows that MPI+SMPI yields the best execution performance, while requiring the largest number of code changes. A maximum speedup of 23 was measured for MPI+SMPI, but only 11 was measured for MPI+OpenMP.
Preconditioned implicit solvers for the Navier-Stokes equations on distributed-memory machines

NASA Technical Reports Server (NTRS)

Ajmani, Kumud; Liou, Meng-Sing; Dyson, Rodger W.

1994-01-01

The GMRES method is parallelized, and combined with local preconditioning to construct an implicit parallel solver to obtain steady-state solutions for the Navier-Stokes equations of fluid flow on distributed-memory machines. The new implicit parallel solver is designed to preserve the convergence rate of the equivalent 'serial' solver. A static domain-decomposition is used to partition the computational domain amongst the available processing nodes of the parallel machine. The SPMD (Single-Program Multiple-Data) programming model is combined with message-passing tools to develop the parallel code on a 32-node Intel Hypercube and a 512-node Intel Delta machine. The implicit parallel solver is validated for internal and external flow problems, and is found to compare identically with flow solutions obtained on a Cray Y-MP/8. A peak computational speed of 2300 MFlops/sec has been achieved on 512 nodes of the Intel Delta machine,k for a problem size of 1024 K equations (256 K grid points).
Countering Craving with Disgust Images: Examining Nicotine Withdrawn Smokers' Motivated Message Processing of Anti-Tobacco Public Service Announcements.

PubMed

Clayton, Russell B; Leshner, Glenn; Tomko, Rachel L; Trull, Timothy J; Piasecki, Thomas M

2017-03-01

There is a lack of research examining whether smoking cues in anti-tobacco advertisements elicit cravings, or whether this effect is moderated by countervailing message attributes, such as disgusting images. Furthermore, no research has examined how these types of messages influence nicotine withdrawn smokers' cognitive processing and associated behavioral intentions. At a laboratory session, participants (N = 50 nicotine-deprived adults) were tested for cognitive processing and recognition memory of 12 anti-tobacco advertisements varying in depictions of smoking cues and disgust content. Self-report smoking urges and intentions to quit smoking were measured after each message. The results from this experiment indicated that smoking cue messages activated appetitive/approach motivation resulting in enhanced attention and memory, but increased craving and reduced quit intentions. Disgust messages also enhanced attention and memory, but activated aversive/avoid motivation resulting in reduced craving and increased quit intentions. The combination of smoking cues and disgust content resulted in moderate amounts of craving and quit intentions, but also led to heart rate acceleration (indicating defensive processing) and poorer recognition of message content. These data suggest that in order to counter nicotine-deprived smokers' craving and prolong abstinence, anti-tobacco messages should omit smoking cues but include disgust. Theoretical implications are also discussed.
Countering Craving with Disgust Images: Examining Nicotine Withdrawn Smokers’ Motivated Message Processing of Anti-Tobacco Public Service Announcements

PubMed Central

CLAYTON, RUSSELL B.; LESHNER, GLENN; TOMKO, RACHEL L.; TRULL, TIMOTHY J.; PIASECKI, THOMAS M.

2017-01-01

There is a lack of research examining whether smoking cues in anti-tobacco advertisements elicit cravings, or whether this effect is moderated by countervailing message attributes, such as disgusting images. Furthermore, no research has examined how these types of messages influence nicotine withdrawn smokers’ cognitive processing and associated behavioral intentions. At a laboratory session, participants (N = 50 nicotine-deprived adults) were tested for cognitive processing and recognition memory of 12 anti-tobacco advertisements varying in depictions of smoking cues and disgust content. Self-report smoking urges and intentions to quit smoking were measured after each message. The results from this experiment indicated that smoking cue messages activated appetitive/approach motivation resulting in enhanced attention and memory, but increased craving and reduced quit intentions. Disgust messages also enhanced attention and memory, but activated aversive/avoid motivation resulting in reduced craving and increased quit intentions. The combination of smoking cues and disgust content resulted in moderate amounts of craving and quit intentions, but also led to heart rate acceleration (indicating defensive processing) and poorer recognition of message content. These data suggest that in order to counter nicotine-deprived smokers’ craving and prolong abstinence, anti-tobacco messages should omit smoking cues but include disgust. Theoretical implications are also discussed. PMID:28248620
Positive messages enhance older adults' motivation and recognition memory for physical activity programmes.

PubMed

Notthoff, Nanna; Klomp, Peter; Doerwald, Friederike; Scheibe, Susanne

2016-09-01

Although physical activity is an effective way to cope with ageing-related impairments, few older people are motivated to turn their sedentary lifestyle into an active one. Recent evidence suggests that walking can be more effectively promoted in older adults with positive messages about the benefits of walking than with negative messages about the risks of inactivity. This study examined motivation and memory as the supposed mechanisms underlying the greater effectiveness of positively framed compared to negatively framed messages for promoting activity. Older adults ( N = 53, age 60-87 years) were introduced to six physical activity programmes that were randomly paired with either positively framed or negatively framed messages. Participants indicated how motivated they were to participate in each programme by providing ratings on attractiveness, suitability, capability and intention. They also completed surprise free recall and recognition tests. Respondents felt more motivated to participate in physical activity programmes paired with positively framed messages than in those with negatively framed ones. They also had better recognition memory for positively framed than negatively framed messages, and misremembered negatively framed messages to be positively framed. Findings support the notion that socioemotional selectivity theory-a theory of age-related changes in motivation-is a useful basis for health intervention design.
Message passing with a limited number of DMA byte counters

DOEpatents

Blocksome, Michael [Rochester, MN; Chen, Dong [Croton on Hudson, NY; Giampapa, Mark E [Irvington, NY; Heidelberger, Philip [Cortlandt Manor, NY; Kumar, Sameer [White Plains, NY; Parker, Jeffrey J [Rochester, MN

2011-10-04

A method for passing messages in a parallel computer system constructed as a plurality of compute nodes interconnected as a network where each compute node includes a DMA engine but includes only a limited number of byte counters for tracking a number of bytes that are sent or received by the DMA engine, where the byte counters may be used in shared counter or exclusive counter modes of operation. The method includes using rendezvous protocol, a source compute node deterministically sending a request to send (RTS) message with a single RTS descriptor using an exclusive injection counter to track both the RTS message and message data to be sent in association with the RTS message, to a destination compute node such that the RTS descriptor indicates to the destination compute node that the message data will be adaptively routed to the destination node. Using one DMA FIFO at the source compute node, the RTS descriptors are maintained for rendezvous messages destined for the destination compute node to ensure proper message data ordering thereat. Using a reception counter at a DMA engine, the destination compute node tracks reception of the RTS and associated message data and sends a clear to send (CTS) message to the source node in a rendezvous protocol form of a remote get to accept the RTS message and message data and processing the remote get (CTS) by the source compute node DMA engine to provide the message data to be sent.
Data communications in a parallel active messaging interface of a parallel computer

DOEpatents

Blocksome, Michael A.; Ratterman, Joseph D.; Smith, Brian E.

2014-09-02

Eager send data communications in a parallel active messaging interface (`PAMI`) of a parallel computer, the PAMI composed of data communications endpoints that specify a client, a context, and a task, including receiving an eager send data communications instruction with transfer data disposed in a send buffer characterized by a read/write send buffer memory address in a read/write virtual address space of the origin endpoint; determining for the send buffer a read-only send buffer memory address in a read-only virtual address space, the read-only virtual address space shared by both the origin endpoint and the target endpoint, with all frames of physical memory mapped to pages of virtual memory in the read-only virtual address space; and communicating by the origin endpoint to the target endpoint an eager send message header that includes the read-only send buffer memory address.
Data communications in a parallel active messaging interface of a parallel computer

DOEpatents

Blocksome, Michael A.; Ratterman, Joseph D.; Smith, Brian E.

2014-09-16

Eager send data communications in a parallel active messaging interface (`PAMI`) of a parallel computer, the PAMI composed of data communications endpoints that specify a client, a context, and a task, including receiving an eager send data communications instruction with transfer data disposed in a send buffer characterized by a read/write send buffer memory address in a read/write virtual address space of the origin endpoint; determining for the send buffer a read-only send buffer memory address in a read-only virtual address space, the read-only virtual address space shared by both the origin endpoint and the target endpoint, with all frames of physical memory mapped to pages of virtual memory in the read-only virtual address space; and communicating by the origin endpoint to the target endpoint an eager send message header that includes the read-only send buffer memory address.
GenASiS Basics: Object-oriented utilitarian functionality for large-scale physics simulations

DOE PAGES

Cardall, Christian Y.; Budiardja, Reuben D.

2015-06-11

Aside from numerical algorithms and problem setup, large-scale physics simulations on distributed-memory supercomputers require more basic utilitarian functionality, such as physical units and constants; display to the screen or standard output device; message passing; I/O to disk; and runtime parameter management and usage statistics. Here we describe and make available Fortran 2003 classes furnishing extensible object-oriented implementations of this sort of rudimentary functionality, along with individual `unit test' programs and larger example problems demonstrating their use. Lastly, these classes compose the Basics division of our developing astrophysics simulation code GenASiS (General Astrophysical Simulation System), but their fundamental nature makes themmore » useful for physics simulations in many fields.« less
Portable programming on parallel/networked computers using the Application Portable Parallel Library (APPL)

NASA Technical Reports Server (NTRS)

Quealy, Angela; Cole, Gary L.; Blech, Richard A.

1993-01-01

The Application Portable Parallel Library (APPL) is a subroutine-based library of communication primitives that is callable from applications written in FORTRAN or C. APPL provides a consistent programmer interface to a variety of distributed and shared-memory multiprocessor MIMD machines. The objective of APPL is to minimize the effort required to move parallel applications from one machine to another, or to a network of homogeneous machines. APPL encompasses many of the message-passing primitives that are currently available on commercial multiprocessor systems. This paper describes APPL (version 2.3.1) and its usage, reports the status of the APPL project, and indicates possible directions for the future. Several applications using APPL are discussed, as well as performance and overhead results.
National Combustion Code Parallel Performance Enhancements

NASA Technical Reports Server (NTRS)

Quealy, Angela; Benyo, Theresa (Technical Monitor)

2002-01-01

The National Combustion Code (NCC) is being developed by an industry-government team for the design and analysis of combustion systems. The unstructured grid, reacting flow code uses a distributed memory, message passing model for its parallel implementation. The focus of the present effort has been to improve the performance of the NCC code to meet combustor designer requirements for model accuracy and analysis turnaround time. Improving the performance of this code contributes significantly to the overall reduction in time and cost of the combustor design cycle. This report describes recent parallel processing modifications to NCC that have improved the parallel scalability of the code, enabling a two hour turnaround for a 1.3 million element fully reacting combustion simulation on an SGI Origin 2000.
A Multiple Sphere T-Matrix Fortran Code for Use on Parallel Computer Clusters

NASA Technical Reports Server (NTRS)

Mackowski, D. W.; Mishchenko, M. I.

2011-01-01

A general-purpose Fortran-90 code for calculation of the electromagnetic scattering and absorption properties of multiple sphere clusters is described. The code can calculate the efficiency factors and scattering matrix elements of the cluster for either fixed or random orientation with respect to the incident beam and for plane wave or localized- approximation Gaussian incident fields. In addition, the code can calculate maps of the electric field both interior and exterior to the spheres.The code is written with message passing interface instructions to enable the use on distributed memory compute clusters, and for such platforms the code can make feasible the calculation of absorption, scattering, and general EM characteristics of systems containing several thousand spheres.
Public-channel cryptography based on mutual chaos pass filters.

PubMed

Klein, Einat; Gross, Noam; Kopelowitz, Evi; Rosenbluh, Michael; Khaykovich, Lev; Kinzel, Wolfgang; Kanter, Ido

2006-10-01

We study the mutual coupling of chaotic lasers and observe both experimentally and in numeric simulations that there exists a regime of parameters for which two mutually coupled chaotic lasers establish isochronal synchronization, while a third laser coupled unidirectionally to one of the pair does not synchronize. We then propose a cryptographic scheme, based on the advantage of mutual coupling over unidirectional coupling, where all the parameters of the system are public knowledge. We numerically demonstrate that in such a scheme the two communicating lasers can add a message signal (compressed binary message) to the transmitted coupling signal and recover the message in both directions with high fidelity by using a mutual chaos pass filter procedure. An attacker, however, fails to recover an errorless message even if he amplifies the coupling signal.
Administering an epoch initiated for remote memory access

DOEpatents

Blocksome, Michael A; Miller, Douglas R

2014-03-18

Methods, systems, and products are disclosed for administering an epoch initiated for remote memory access that include: initiating, by an origin application messaging module on an origin compute node, one or more data transfers to a target compute node for the epoch; initiating, by the origin application messaging module after initiating the data transfers, a closing stage for the epoch, including rejecting any new data transfers after initiating the closing stage for the epoch; determining, by the origin application messaging module, whether the data transfers have completed; and closing, by the origin application messaging module, the epoch if the data transfers have completed.
Administering an epoch initiated for remote memory access

DOEpatents

Blocksome, Michael A; Miller, Douglas R

2012-10-23

Methods, systems, and products are disclosed for administering an epoch initiated for remote memory access that include: initiating, by an origin application messaging module on an origin compute node, one or more data transfers to a target compute node for the epoch; initiating, by the origin application messaging module after initiating the data transfers, a closing stage for the epoch, including rejecting any new data transfers after initiating the closing stage for the epoch; determining, by the origin application messaging module, whether the data transfers have completed; and closing, by the origin application messaging module, the epoch if the data transfers have completed.
Administering an epoch initiated for remote memory access

DOEpatents

Blocksome, Michael A.; Miller, Douglas R.

2013-01-01

Methods, systems, and products are disclosed for administering an epoch initiated for remote memory access that include: initiating, by an origin application messaging module on an origin compute node, one or more data transfers to a target compute node for the epoch; initiating, by the origin application messaging module after initiating the data transfers, a closing stage for the epoch, including rejecting any new data transfers after initiating the closing stage for the epoch; determining, by the origin application messaging module, whether the data transfers have completed; and closing, by the origin application messaging module, the epoch if the data transfers have completed.
Django Remote Submission

DOE Office of Scientific and Technical Information (OSTI.GOV)

Doucet, Mathieu; Hobson, Tanner C.; Ferraz Leal, Ricardo Miguel

The Django Remote Submission (DRS) is a Django (Django, n.d.) application to manage long running job submission, including starting the job, saving logs, and storing results. It is an independent project available as a standalone pypi package (PyPi, n.d.). It can be easily integrated in any Django project. The source code is freely available as a GitHub repository (django-remote-submission, n.d.). To run the jobs in background, DRS takes advantage of Celery (Celery, n.d.), a powerful asynchronous job queue used for running tasks in the background, and the Redis Server (Redis, n.d.), an in-memory data structure store. Celery uses brokers tomore » pass messages between a Django Project and the Celery workers. Redis is the message broker of DRS. In addition DRS provides real time monitoring of the progress of Jobs and associated logs. Through the Django Channels project (Channels, n.d.), and the usage of Web Sockets, it is possible to asynchronously display the Job Status and the live Job output (standard output and standard error) on a web page.« less
HYDRA : High-speed simulation architecture for precision spacecraft formation simulation

NASA Technical Reports Server (NTRS)

Martin, Bryan J.; Sohl, Garett.

2003-01-01

e Hierarchical Distributed Reconfigurable Architecture- is a scalable simulation architecture that provides flexibility and ease-of-use which take advantage of modern computation and communication hardware. It also provides the ability to implement distributed - or workstation - based simulations and high-fidelity real-time simulation from a common core. Originally designed to serve as a research platform for examining fundamental challenges in formation flying simulation for future space missions, it is also finding use in other missions and applications, all of which can take advantage of the underlying Object-Oriented structure to easily produce distributed simulations. Hydra automates the process of connecting disparate simulation components (Hydra Clients) through a client server architecture that uses high-level descriptions of data associated with each client to find and forge desirable connections (Hydra Services) at run time. Services communicate through the use of Connectors, which abstract messaging to provide single-interface access to any desired communication protocol, such as from shared-memory message passing to TCP/IP to ACE and COBRA. Hydra shares many features with the HLA, although providing more flexibility in connectivity services and behavior overriding.
Django Remote Submission

DOE PAGES

Doucet, Mathieu; Hobson, Tanner C.; Ferraz Leal, Ricardo Miguel

2017-08-01

The Django Remote Submission (DRS) is a Django (Django, n.d.) application to manage long running job submission, including starting the job, saving logs, and storing results. It is an independent project available as a standalone pypi package (PyPi, n.d.). It can be easily integrated in any Django project. The source code is freely available as a GitHub repository (django-remote-submission, n.d.). To run the jobs in background, DRS takes advantage of Celery (Celery, n.d.), a powerful asynchronous job queue used for running tasks in the background, and the Redis Server (Redis, n.d.), an in-memory data structure store. Celery uses brokers tomore » pass messages between a Django Project and the Celery workers. Redis is the message broker of DRS. In addition DRS provides real time monitoring of the progress of Jobs and associated logs. Through the Django Channels project (Channels, n.d.), and the usage of Web Sockets, it is possible to asynchronously display the Job Status and the live Job output (standard output and standard error) on a web page.« less

An implementation and performance measurement of the progressive retry technique

NASA Technical Reports Server (NTRS)

Suri, Gaurav; Huang, Yennun; Wang, Yi-Min; Fuchs, W. Kent; Kintala, Chandra

1995-01-01

This paper describes a recovery technique called progressive retry for bypassing software faults in message-passing applications. The technique is implemented as reusable modules to provide application-level software fault tolerance. The paper describes the implementation of the technique and presents results from the application of progressive retry to two telecommunications systems. the results presented show that the technique is helpful in reducing the total recovery time for message-passing applications.
Belief propagation decoding of quantum channels by passing quantum messages

NASA Astrophysics Data System (ADS)

Renes, Joseph M.

2017-07-01

The belief propagation (BP) algorithm is a powerful tool in a wide range of disciplines from statistical physics to machine learning to computational biology, and is ubiquitous in decoding classical error-correcting codes. The algorithm works by passing messages between nodes of the factor graph associated with the code and enables efficient decoding of the channel, in some cases even up to the Shannon capacity. Here we construct the first BP algorithm which passes quantum messages on the factor graph and is capable of decoding the classical-quantum channel with pure state outputs. This gives explicit decoding circuits whose number of gates is quadratic in the code length. We also show that this decoder can be modified to work with polar codes for the pure state channel and as part of a decoder for transmitting quantum information over the amplitude damping channel. These represent the first explicit capacity-achieving decoders for non-Pauli channels.
LCAMP: Location Constrained Approximate Message Passing for Compressed Sensing MRI

PubMed Central

Sung, Kyunghyun; Daniel, Bruce L; Hargreaves, Brian A

2016-01-01

Iterative thresholding methods have been extensively studied as faster alternatives to convex optimization methods for solving large-sized problems in compressed sensing. A novel iterative thresholding method called LCAMP (Location Constrained Approximate Message Passing) is presented for reducing computational complexity and improving reconstruction accuracy when a nonzero location (or sparse support) constraint can be obtained from view shared images. LCAMP modifies the existing approximate message passing algorithm by replacing the thresholding stage with a location constraint, which avoids adjusting regularization parameters or thresholding levels. This work is first compared with other conventional reconstruction methods using random 1D signals and then applied to dynamic contrast-enhanced breast MRI to demonstrate the excellent reconstruction accuracy (less than 2% absolute difference) and low computation time (5 - 10 seconds using Matlab) with highly undersampled 3D data (244 × 128 × 48; overall reduction factor = 10). PMID:23042658
Efficient Tracing for On-the-Fly Space-Time Displays in a Debugger for Message Passing Programs

NASA Technical Reports Server (NTRS)

Hood, Robert; Matthews, Gregory

2001-01-01

In this work we describe the implementation of a practical mechanism for collecting and displaying trace information in a debugger for message passing programs. We introduce a trace format that is highly compressible while still providing information adequate for debugging purposes. We make the mechanism convenient for users to access by incorporating the trace collection in a set of wrappers for the MPI (message passing interface) communication library. We implement several debugger operations that use the trace display: consistent stoplines, undo, and rollback. They all are implemented using controlled replay, which executes at full speed in target processes until the appropriate position in the computation is reached. They provide convenient mechanisms for getting to places in the execution where the full power of a state-based debugger can be brought to bear on isolating communication errors.
Design of a network for concurrent message passing systems

NASA Astrophysics Data System (ADS)

Song, Paul Y.

1988-08-01

We describe the design of the network design frame (NDF), a self-timed routing chip for a message-passing concurrent computer. The NDF uses a partitioned data path, low-voltage output drivers, and a distributed token-passing arbiter to provide a bandwidth of 450 Mbits/sec into the network. Wormhole routing and bidirectional virtual channels are used to provide low latency communications, less than 2us latency to deliver a 216 bit message across the diameter of a 1K node mess-connected machine. To support concurrent software systems, the NDF provides two logical networks, one for user messages and one for system messages. The two networks share the same set of physical wires. To facilitate the development of network nodes, the NDF is a design frame. The NDF circuitry is integrated into the pad frame of a chip leaving the center of the chip uncommitted. We define an analytic framework in which to study the effects of network size, network buffering capacity, bidirectional channels, and traffic on this class of networks. The response of the network to various combinations of these parameters are obtained through extensive simulation of the network model. Through simulation, we are able to observe the macro behavior of the network as opposed to the micro behavior of the NDF routing controller.
Shared reality in intergroup communication: Increasing the epistemic authority of an out-group audience.

PubMed

Echterhoff, Gerald; Kopietz, René; Higgins, E Tory

2017-06-01

Communicators typically tune messages to their audience's attitude. Such audience tuning biases communicators' memory for the topic toward the audience's attitude to the extent that they create a shared reality with the audience. To investigate shared reality in intergroup communication, we first established that a reduced memory bias after tuning messages to an out-group (vs. in-group) audience is a subtle index of communicators' denial of shared reality to that out-group audience (Experiments 1a and 1b). We then examined whether the audience-tuning memory bias might emerge when the out-group audience's epistemic authority is enhanced, either by increasing epistemic expertise concerning the communication topic or by creating epistemic consensus among members of a multiperson out-group audience. In Experiment 2, when Germans communicated to a Turkish audience with an attitude about a Turkish (vs. German) target, the audience-tuning memory bias appeared. In Experiment 3, when the audience of German communicators consisted of 3 Turks who all held the same attitude toward the target, the memory bias again appeared. The association between message valence and memory valence was consistently higher when the audience's epistemic authority was high (vs. low). An integrative analysis across all studies also suggested that the memory bias increases with increasing strength of epistemic inputs (epistemic expertise, epistemic consensus, and audience-tuned message production). The findings suggest novel ways of overcoming intergroup biases in intergroup relations. (PsycINFO Database Record (c) 2017 APA, all rights reserved).
Best bang for your buck: GPU nodes for GROMACS biomolecular simulations

PubMed Central

Páll, Szilárd; Fechner, Martin; Esztermann, Ansgar; de Groot, Bert L.; Grubmüller, Helmut

2015-01-01

The molecular dynamics simulation package GROMACS runs efficiently on a wide variety of hardware from commodity workstations to high performance computing clusters. Hardware features are well‐exploited with a combination of single instruction multiple data, multithreading, and message passing interface (MPI)‐based single program multiple data/multiple program multiple data parallelism while graphics processing units (GPUs) can be used as accelerators to compute interactions off‐loaded from the CPU. Here, we evaluate which hardware produces trajectories with GROMACS 4.6 or 5.0 in the most economical way. We have assembled and benchmarked compute nodes with various CPU/GPU combinations to identify optimal compositions in terms of raw trajectory production rate, performance‐to‐price ratio, energy efficiency, and several other criteria. Although hardware prices are naturally subject to trends and fluctuations, general tendencies are clearly visible. Adding any type of GPU significantly boosts a node's simulation performance. For inexpensive consumer‐class GPUs this improvement equally reflects in the performance‐to‐price ratio. Although memory issues in consumer‐class GPUs could pass unnoticed as these cards do not support error checking and correction memory, unreliable GPUs can be sorted out with memory checking tools. Apart from the obvious determinants for cost‐efficiency like hardware expenses and raw performance, the energy consumption of a node is a major cost factor. Over the typical hardware lifetime until replacement of a few years, the costs for electrical power and cooling can become larger than the costs of the hardware itself. Taking that into account, nodes with a well‐balanced ratio of CPU and consumer‐class GPU resources produce the maximum amount of GROMACS trajectory over their lifetime. © 2015 The Authors. Journal of Computational Chemistry Published by Wiley Periodicals, Inc. PMID:26238484
Best bang for your buck: GPU nodes for GROMACS biomolecular simulations.

PubMed

Kutzner, Carsten; Páll, Szilárd; Fechner, Martin; Esztermann, Ansgar; de Groot, Bert L; Grubmüller, Helmut

2015-10-05

The molecular dynamics simulation package GROMACS runs efficiently on a wide variety of hardware from commodity workstations to high performance computing clusters. Hardware features are well-exploited with a combination of single instruction multiple data, multithreading, and message passing interface (MPI)-based single program multiple data/multiple program multiple data parallelism while graphics processing units (GPUs) can be used as accelerators to compute interactions off-loaded from the CPU. Here, we evaluate which hardware produces trajectories with GROMACS 4.6 or 5.0 in the most economical way. We have assembled and benchmarked compute nodes with various CPU/GPU combinations to identify optimal compositions in terms of raw trajectory production rate, performance-to-price ratio, energy efficiency, and several other criteria. Although hardware prices are naturally subject to trends and fluctuations, general tendencies are clearly visible. Adding any type of GPU significantly boosts a node's simulation performance. For inexpensive consumer-class GPUs this improvement equally reflects in the performance-to-price ratio. Although memory issues in consumer-class GPUs could pass unnoticed as these cards do not support error checking and correction memory, unreliable GPUs can be sorted out with memory checking tools. Apart from the obvious determinants for cost-efficiency like hardware expenses and raw performance, the energy consumption of a node is a major cost factor. Over the typical hardware lifetime until replacement of a few years, the costs for electrical power and cooling can become larger than the costs of the hardware itself. Taking that into account, nodes with a well-balanced ratio of CPU and consumer-class GPU resources produce the maximum amount of GROMACS trajectory over their lifetime. © 2015 The Authors. Journal of Computational Chemistry Published by Wiley Periodicals, Inc.
The graphical brain: Belief propagation and active inference

PubMed Central

Friston, Karl J.; Parr, Thomas; de Vries, Bert

2018-01-01

This paper considers functional integration in the brain from a computational perspective. We ask what sort of neuronal message passing is mandated by active inference—and what implications this has for context-sensitive connectivity at microscopic and macroscopic levels. In particular, we formulate neuronal processing as belief propagation under deep generative models. Crucially, these models can entertain both discrete and continuous states, leading to distinct schemes for belief updating that play out on the same (neuronal) architecture. Technically, we use Forney (normal) factor graphs to elucidate the requisite message passing in terms of its form and scheduling. To accommodate mixed generative models (of discrete and continuous states), one also has to consider link nodes or factors that enable discrete and continuous representations to talk to each other. When mapping the implicit computational architecture onto neuronal connectivity, several interesting features emerge. For example, Bayesian model averaging and comparison, which link discrete and continuous states, may be implemented in thalamocortical loops. These and other considerations speak to a computational connectome that is inherently state dependent and self-organizing in ways that yield to a principled (variational) account. We conclude with simulations of reading that illustrate the implicit neuronal message passing, with a special focus on how discrete (semantic) representations inform, and are informed by, continuous (visual) sampling of the sensorium. Author Summary This paper considers functional integration in the brain from a computational perspective. We ask what sort of neuronal message passing is mandated by active inference—and what implications this has for context-sensitive connectivity at microscopic and macroscopic levels. In particular, we formulate neuronal processing as belief propagation under deep generative models that can entertain both discrete and continuous states. This leads to distinct schemes for belief updating that play out on the same (neuronal) architecture. Technically, we use Forney (normal) factor graphs to characterize the requisite message passing, and link this formal characterization to canonical microcircuits and extrinsic connectivity in the brain. PMID:29417960
Design and Implementation of an Operations Module for the ARGOS paperless Ship System

DTIC Science & Technology

1989-06-01

A. OPERATIONS STACK SCRIPTS SCRIPTS FOR STACK: operations * BACKGROUND #1: Operations * on openStack hide message box show menuBar pass openStack end... openStack ** CARD #1, BUTTON #1: Up ***** on mouseUp visual effect zoom out go to card id 10931 of stack argos end mouseUp ** CARD #1, BUTTON #2...STACK SCRIPTS SCRIPTS FOR STACK: Reports ** BACKGROUND #1: Operations * on openStack hie message box show menuBar pass openStack end openStack ** CARD #1
Charon Message-Passing Toolkit for Scientific Computations

NASA Technical Reports Server (NTRS)

VanderWijngaart, Rob F.; Yan, Jerry (Technical Monitor)

2000-01-01

Charon is a library, callable from C and Fortran, that aids the conversion of structured-grid legacy codes-such as those used in the numerical computation of fluid flows-into parallel, high- performance codes. Key are functions that define distributed arrays, that map between distributed and non-distributed arrays, and that allow easy specification of common communications on structured grids. The library is based on the widely accepted MPI message passing standard. We present an overview of the functionality of Charon, and some representative results.
Suboptimal Exposure to Facial Expressions When Viewing Video Messages From a Small Screen: Effects on Emotion, Attention, and Memory

ERIC Educational Resources Information Center

Ravaja, Niklas; Kallinen, Kari; Saari, Timo; Keltikangas-Jarvinen, Liisa

2004-01-01

The authors examined the effects of suboptimally presented facial expressions on emotional and attentional responses and memory among 39 young adults viewing video (business news) messages from a small screen. Facial electromyography (EMG) and respiratory sinus arrhythmia were used as physiological measures of emotion and attention, respectively.…
The Effects of Involvement, Message Appeal, and Viewing Conditions on Memory and Evaluation of TV Commercials.

ERIC Educational Resources Information Center

Nowak, Glen; Thorson, Esther

A study tested an information processing model that incorporates the concepts of episodic and semantic memory. The model was designed to provide for the concurrent study of three advertising and communication variables: product involvement, message appeal, and distraction in viewing conditions. Among the five hypotheses being tested were that…
The influence of ATC message length and timing on pilot communication

NASA Technical Reports Server (NTRS)

Morrow, Daniel; Rodvold, Michelle

1993-01-01

Pilot-controller communication is critical to safe and efficient flight. It is often a challenging component of piloting, which is reflected in the number of incidents and accidents involving miscommunication. Our previous field study identified communication problems that disrupt routine communication between pilots and controllers. The present part-task simulation study followed up the field results with a more controlled investigation of communication problems. Pilots flew a simulation in which they were frequently vectored by Air Traffic Control (ATC), requiring intensive communication with the controller. While flying, pilots also performed a secondary visual monitoring task. We examined the influence of message length (one message with four commands vs. two messages with two commands each) and noncommunication workload on communication accuracy and length. Longer ATC messages appeared to overload pilot working memory, resulting in more incorrect or partial readbacks, as well as more requests to repeat the message. The timing between the two short messages also influenced communication. The second message interfered with memory for or response to the first short message when it was delivered too soon after the first message. Performing the secondary monitoring task did not influence communication. Instead, communication reduced monitoring accuracy.
Couriers in the Inca Empire: Getting Your Message Across. [Lesson Plan].

ERIC Educational Resources Information Center

2002

This lesson shows how the Inca communicated across the vast stretches of their mountain realm, the largest empire of the pre-industrial world. The lesson explains how couriers carried messages along mountain-ridge roads, up and down stone steps, and over chasm-spanning footbridges. It states that couriers could pass a message from Quito (Ecuador)…
Design considerations for parallel graphics libraries

NASA Technical Reports Server (NTRS)

Crockett, Thomas W.

1994-01-01

Applications which run on parallel supercomputers are often characterized by massive datasets. Converting these vast collections of numbers to visual form has proven to be a powerful aid to comprehension. For a variety of reasons, it may be desirable to provide this visual feedback at runtime. One way to accomplish this is to exploit the available parallelism to perform graphics operations in place. In order to do this, we need appropriate parallel rendering algorithms and library interfaces. This paper provides a tutorial introduction to some of the issues which arise in designing parallel graphics libraries and their underlying rendering algorithms. The focus is on polygon rendering for distributed memory message-passing systems. We illustrate our discussion with examples from PGL, a parallel graphics library which has been developed on the Intel family of parallel systems.
An Application-Based Performance Characterization of the Columbia Supercluster

NASA Technical Reports Server (NTRS)

Biswas, Rupak; Djomehri, Jahed M.; Hood, Robert; Jin, Hoaqiang; Kiris, Cetin; Saini, Subhash

2005-01-01

Columbia is a 10,240-processor supercluster consisting of 20 Altix nodes with 512 processors each, and currently ranked as the second-fastest computer in the world. In this paper, we present the performance characteristics of Columbia obtained on up to four computing nodes interconnected via the InfiniBand and/or NUMAlink4 communication fabrics. We evaluate floating-point performance, memory bandwidth, message passing communication speeds, and compilers using a subset of the HPC Challenge benchmarks, and some of the NAS Parallel Benchmarks including the multi-zone versions. We present detailed performance results for three scientific applications of interest to NASA, one from molecular dynamics, and two from computational fluid dynamics. Our results show that both the NUMAlink4 and the InfiniBand hold promise for application scaling to a large number of processors.
Matrix multiplication on the Intel Touchstone Delta

DOE Office of Scientific and Technical Information (OSTI.GOV)

Huss-Lederman, S.; Jacobson, E.M.; Tsao, A.

1993-12-31

Matrix multiplication is a key primitive in block matrix algorithms such as those found in LAPACK. We present results from our study of matrix multiplication algorithms on the Intel Touchstone Delta, a distributed memory message-passing architecture with a two-dimensional mesh topology. We obtain an implementation that uses communication primitives highly suited to the Delta and exploits the single node assembly-coded matrix multiplication. Our algorithm is completely general, able to deal with arbitrary mesh aspect ratios and matrix dimensions, and has achieved parallel efficiency of 86% with overall peak performance in excess of 8 Gflops on 256 nodes for an 8800more » {times} 8800 matrix. We describe our algorithm design and implementation, and present performance results that demonstrate scalability and robust behavior over varying mesh topologies.« less
Tuning collective communication for Partitioned Global Address Space programming models

DOE PAGES

Nishtala, Rajesh; Zheng, Yili; Hargrove, Paul H.; ...

2011-06-12

Partitioned Global Address Space (PGAS) languages offer programmers the convenience of a shared memory programming style combined with locality control necessary to run on large-scale distributed memory systems. Even within a PGAS language programmers often need to perform global communication operations such as broadcasts or reductions, which are best performed as collective operations in which a group of threads work together to perform the operation. In this study we consider the problem of implementing collective communication within PGAS languages and explore some of the design trade-offs in both the interface and implementation. In particular, PGAS collectives have semantic issues thatmore » are different than in send–receive style message passing programs, and different implementation approaches that take advantage of the one-sided communication style in these languages. We present an implementation framework for PGAS collectives as part of the GASNet communication layer, which supports shared memory, distributed memory and hybrids. The framework supports a broad set of algorithms for each collective, over which the implementation may be automatically tuned. In conclusion, we demonstrate the benefit of optimized GASNet collectives using application benchmarks written in UPC, and demonstrate that the GASNet collectives can deliver scalable performance on a variety of state-of-the-art parallel machines including a Cray XT4, an IBM BlueGene/P, and a Sun Constellation system with InfiniBand interconnect.« less
Inference in the brain: Statistics flowing in redundant population codes

PubMed Central

Pitkow, Xaq; Angelaki, Dora E

2017-01-01

It is widely believed that the brain performs approximate probabilistic inference to estimate causal variables in the world from ambiguous sensory data. To understand these computations, we need to analyze how information is represented and transformed by the actions of nonlinear recurrent neural networks. We propose that these probabilistic computations function by a message-passing algorithm operating at the level of redundant neural populations. To explain this framework, we review its underlying concepts, including graphical models, sufficient statistics, and message-passing, and then describe how these concepts could be implemented by recurrently connected probabilistic population codes. The relevant information flow in these networks will be most interpretable at the population level, particularly for redundant neural codes. We therefore outline a general approach to identify the essential features of a neural message-passing algorithm. Finally, we argue that to reveal the most important aspects of these neural computations, we must study large-scale activity patterns during moderately complex, naturalistic behaviors. PMID:28595050

Critical field-exponents for secure message-passing in modular networks

NASA Astrophysics Data System (ADS)

Shekhtman, Louis M.; Danziger, Michael M.; Bonamassa, Ivan; Buldyrev, Sergey V.; Caldarelli, Guido; Zlatić, Vinko; Havlin, Shlomo

2018-05-01

We study secure message-passing in the presence of multiple adversaries in modular networks. We assume a dominant fraction of nodes in each module have the same vulnerability, i.e., the same entity spying on them. We find both analytically and via simulations that the links between the modules (interlinks) have effects analogous to a magnetic field in a spin-system in that for any amount of interlinks the system no longer undergoes a phase transition. We then define the exponents δ, which relates the order parameter (the size of the giant secure component) at the critical point to the field strength (average number of interlinks per node), and γ, which describes the susceptibility near criticality. These are found to be δ = 2 and γ = 1 (with the scaling of the order parameter near the critical point given by β = 1). When two or more vulnerabilities are equally present in a module we find δ = 1 and γ = 0 (with β ≥ 2). Apart from defining a previously unidentified universality class, these exponents show that increasing connections between modules is more beneficial for security than increasing connections within modules. We also measure the correlation critical exponent ν, and the upper critical dimension d c , finding that ν {d}c=3 as for ordinary percolation, suggesting that for secure message-passing d c = 6. These results provide an interesting analogy between secure message-passing in modular networks and the physics of magnetic spin-systems.
Effects of Ordering Strategies and Programming Paradigms on Sparse Matrix Computations

NASA Technical Reports Server (NTRS)

Oliker, Leonid; Li, Xiaoye; Husbands, Parry; Biswas, Rupak; Biegel, Bryan (Technical Monitor)

2002-01-01

The Conjugate Gradient (CG) algorithm is perhaps the best-known iterative technique to solve sparse linear systems that are symmetric and positive definite. For systems that are ill-conditioned, it is often necessary to use a preconditioning technique. In this paper, we investigate the effects of various ordering and partitioning strategies on the performance of parallel CG and ILU(O) preconditioned CG (PCG) using different programming paradigms and architectures. Results show that for this class of applications: ordering significantly improves overall performance on both distributed and distributed shared-memory systems, that cache reuse may be more important than reducing communication, that it is possible to achieve message-passing performance using shared-memory constructs through careful data ordering and distribution, and that a hybrid MPI+OpenMP paradigm increases programming complexity with little performance gains. A implementation of CG on the Cray MTA does not require special ordering or partitioning to obtain high efficiency and scalability, giving it a distinct advantage for adaptive applications; however, it shows limited scalability for PCG due to a lack of thread level parallelism.
Roofline model toolkit: A practical tool for architectural and program analysis

DOE Office of Scientific and Technical Information (OSTI.GOV)

Lo, Yu Jung; Williams, Samuel; Van Straalen, Brian

We present preliminary results of the Roofline Toolkit for multicore, many core, and accelerated architectures. This paper focuses on the processor architecture characterization engine, a collection of portable instrumented micro benchmarks implemented with Message Passing Interface (MPI), and OpenMP used to express thread-level parallelism. These benchmarks are specialized to quantify the behavior of different architectural features. Compared to previous work on performance characterization, these microbenchmarks focus on capturing the performance of each level of the memory hierarchy, along with thread-level parallelism, instruction-level parallelism and explicit SIMD parallelism, measured in the context of the compilers and run-time environments. We also measuremore » sustained PCIe throughput with four GPU memory managed mechanisms. By combining results from the architecture characterization with the Roofline model based solely on architectural specifications, this work offers insights for performance prediction of current and future architectures and their software systems. To that end, we instrument three applications and plot their resultant performance on the corresponding Roofline model when run on a Blue Gene/Q architecture.« less
A parallel Monte Carlo code for planar and SPECT imaging: implementation, verification and applications in (131)I SPECT.

PubMed

Dewaraja, Yuni K; Ljungberg, Michael; Majumdar, Amitava; Bose, Abhijit; Koral, Kenneth F

2002-02-01

This paper reports the implementation of the SIMIND Monte Carlo code on an IBM SP2 distributed memory parallel computer. Basic aspects of running Monte Carlo particle transport calculations on parallel architectures are described. Our parallelization is based on equally partitioning photons among the processors and uses the Message Passing Interface (MPI) library for interprocessor communication and the Scalable Parallel Random Number Generator (SPRNG) to generate uncorrelated random number streams. These parallelization techniques are also applicable to other distributed memory architectures. A linear increase in computing speed with the number of processors is demonstrated for up to 32 processors. This speed-up is especially significant in Single Photon Emission Computed Tomography (SPECT) simulations involving higher energy photon emitters, where explicit modeling of the phantom and collimator is required. For (131)I, the accuracy of the parallel code is demonstrated by comparing simulated and experimental SPECT images from a heart/thorax phantom. Clinically realistic SPECT simulations using the voxel-man phantom are carried out to assess scatter and attenuation correction.
MLP: A Parallel Programming Alternative to MPI for New Shared Memory Parallel Systems

NASA Technical Reports Server (NTRS)

Taft, James R.

1999-01-01

Recent developments at the NASA AMES Research Center's NAS Division have demonstrated that the new generation of NUMA based Symmetric Multi-Processing systems (SMPs), such as the Silicon Graphics Origin 2000, can successfully execute legacy vector oriented CFD production codes at sustained rates far exceeding processing rates possible on dedicated 16 CPU Cray C90 systems. This high level of performance is achieved via shared memory based Multi-Level Parallelism (MLP). This programming approach, developed at NAS and outlined below, is distinct from the message passing paradigm of MPI. It offers parallelism at both the fine and coarse grained level, with communication latencies that are approximately 50-100 times lower than typical MPI implementations on the same platform. Such latency reductions offer the promise of performance scaling to very large CPU counts. The method draws on, but is also distinct from, the newly defined OpenMP specification, which uses compiler directives to support a limited subset of multi-level parallel operations. The NAS MLP method is general, and applicable to a large class of NASA CFD codes.
Research in Parallel Algorithms and Software for Computational Aerosciences

NASA Technical Reports Server (NTRS)

Domel, Neal D.

1996-01-01

Phase I is complete for the development of a Computational Fluid Dynamics parallel code with automatic grid generation and adaptation for the Euler analysis of flow over complex geometries. SPLITFLOW, an unstructured Cartesian grid code developed at Lockheed Martin Tactical Aircraft Systems, has been modified for a distributed memory/massively parallel computing environment. The parallel code is operational on an SGI network, Cray J90 and C90 vector machines, SGI Power Challenge, and Cray T3D and IBM SP2 massively parallel machines. Parallel Virtual Machine (PVM) is the message passing protocol for portability to various architectures. A domain decomposition technique was developed which enforces dynamic load balancing to improve solution speed and memory requirements. A host/node algorithm distributes the tasks. The solver parallelizes very well, and scales with the number of processors. Partially parallelized and non-parallelized tasks consume most of the wall clock time in a very fine grain environment. Timing comparisons on a Cray C90 demonstrate that Parallel SPLITFLOW runs 2.4 times faster on 8 processors than its non-parallel counterpart autotasked over 8 processors.
The implementation of an aeronautical CFD flow code onto distributed memory parallel systems

NASA Astrophysics Data System (ADS)

Ierotheou, C. S.; Forsey, C. R.; Leatham, M.

2000-04-01

The parallelization of an industrially important in-house computational fluid dynamics (CFD) code for calculating the airflow over complex aircraft configurations using the Euler or Navier-Stokes equations is presented. The code discussed is the flow solver module of the SAUNA CFD suite. This suite uses a novel grid system that may include block-structured hexahedral or pyramidal grids, unstructured tetrahedral grids or a hybrid combination of both. To assist in the rapid convergence to a solution, a number of convergence acceleration techniques are employed including implicit residual smoothing and a multigrid full approximation storage scheme (FAS). Key features of the parallelization approach are the use of domain decomposition and encapsulated message passing to enable the execution in parallel using a single programme multiple data (SPMD) paradigm. In the case where a hybrid grid is used, a unified grid partitioning scheme is employed to define the decomposition of the mesh. The parallel code has been tested using both structured and hybrid grids on a number of different distributed memory parallel systems and is now routinely used to perform industrial scale aeronautical simulations. Copyright
Research in Parallel Algorithms and Software for Computational Aerosciences

NASA Technical Reports Server (NTRS)

Domel, Neal D.

1996-01-01

Phase 1 is complete for the development of a computational fluid dynamics CFD) parallel code with automatic grid generation and adaptation for the Euler analysis of flow over complex geometries. SPLITFLOW, an unstructured Cartesian grid code developed at Lockheed Martin Tactical Aircraft Systems, has been modified for a distributed memory/massively parallel computing environment. The parallel code is operational on an SGI network, Cray J90 and C90 vector machines, SGI Power Challenge, and Cray T3D and IBM SP2 massively parallel machines. Parallel Virtual Machine (PVM) is the message passing protocol for portability to various architectures. A domain decomposition technique was developed which enforces dynamic load balancing to improve solution speed and memory requirements. A host/node algorithm distributes the tasks. The solver parallelizes very well, and scales with the number of processors. Partially parallelized and non-parallelized tasks consume most of the wall clock time in a very fine grain environment. Timing comparisons on a Cray C90 demonstrate that Parallel SPLITFLOW runs 2.4 times faster on 8 processors than its non-parallel counterpart autotasked over 8 processors.
Access to Attitude-Relevant Information in Memory as a Determinant of Persuasion: The Role of Message and Communicator Attributes.

ERIC Educational Resources Information Center

Wood, Wendy; And Others

Research literature shows that people with access to attitude-relevant information in memory are able to draw on relevant beliefs and prior experiences when analyzing a persuasive message. This suggests that people who can retrieve little attitude-relevant information should be less able to engage in systematic processing. Two experiments were…
Memory for Physical Features of Discourse as a Function of Their Relevance.

ERIC Educational Resources Information Center

Fisher, Ronald P.; Cuervo, Asela

Memory for sex of the speaker and language of presentation of a spoken message was high and reliably better when the features were instrumental for comprehending the message than when they were not. This suggests that the physical characteristics of an event may be deeply or elaborately encoded when they are meaningful in light of the task…
The Automatic Parallelisation of Scientific Application Codes Using a Computer Aided Parallelisation Toolkit

NASA Technical Reports Server (NTRS)

Ierotheou, C.; Johnson, S.; Leggett, P.; Cross, M.; Evans, E.; Jin, Hao-Qiang; Frumkin, M.; Yan, J.; Biegel, Bryan (Technical Monitor)

2001-01-01

The shared-memory programming model is a very effective way to achieve parallelism on shared memory parallel computers. Historically, the lack of a programming standard for using directives and the rather limited performance due to scalability have affected the take-up of this programming model approach. Significant progress has been made in hardware and software technologies, as a result the performance of parallel programs with compiler directives has also made improvements. The introduction of an industrial standard for shared-memory programming with directives, OpenMP, has also addressed the issue of portability. In this study, we have extended the computer aided parallelization toolkit (developed at the University of Greenwich), to automatically generate OpenMP based parallel programs with nominal user assistance. We outline the way in which loop types are categorized and how efficient OpenMP directives can be defined and placed using the in-depth interprocedural analysis that is carried out by the toolkit. We also discuss the application of the toolkit on the NAS Parallel Benchmarks and a number of real-world application codes. This work not only demonstrates the great potential of using the toolkit to quickly parallelize serial programs but also the good performance achievable on up to 300 processors for hybrid message passing and directive-based parallelizations.
PRAIS: Distributed, real-time knowledge-based systems made easy

NASA Technical Reports Server (NTRS)

Goldstein, David G.

1990-01-01

This paper discusses an architecture for real-time, distributed (parallel) knowledge-based systems called the Parallel Real-time Artificial Intelligence System (PRAIS). PRAIS strives for transparently parallelizing production (rule-based) systems, even when under real-time constraints. PRAIS accomplishes these goals by incorporating a dynamic task scheduler, operating system extensions for fact handling, and message-passing among multiple copies of CLIPS executing on a virtual blackboard. This distributed knowledge-based system tool uses the portability of CLIPS and common message-passing protocols to operate over a heterogeneous network of processors.
Gene-network inference by message passing

NASA Astrophysics Data System (ADS)

Braunstein, A.; Pagnani, A.; Weigt, M.; Zecchina, R.

2008-01-01

The inference of gene-regulatory processes from gene-expression data belongs to the major challenges of computational systems biology. Here we address the problem from a statistical-physics perspective and develop a message-passing algorithm which is able to infer sparse, directed and combinatorial regulatory mechanisms. Using the replica technique, the algorithmic performance can be characterized analytically for artificially generated data. The algorithm is applied to genome-wide expression data of baker's yeast under various environmental conditions. We find clear cases of combinatorial control, and enrichment in common functional annotations of regulated genes and their regulators.
Optimal mapping of neural-network learning on message-passing multicomputers

NASA Technical Reports Server (NTRS)

Chu, Lon-Chan; Wah, Benjamin W.

1992-01-01

A minimization of learning-algorithm completion time is sought in the present optimal-mapping study of the learning process in multilayer feed-forward artificial neural networks (ANNs) for message-passing multicomputers. A novel approximation algorithm for mappings of this kind is derived from observations of the dominance of a parallel ANN algorithm over its communication time. Attention is given to both static and dynamic mapping schemes for systems with static and dynamic background workloads, as well as to experimental results obtained for simulated mappings on multicomputers with dynamic background workloads.
Trace-Driven Debugging of Message Passing Programs

NASA Technical Reports Server (NTRS)

Frumkin, Michael; Hood, Robert; Lopez, Louis; Bailey, David (Technical Monitor)

1998-01-01

In this paper we report on features added to a parallel debugger to simplify the debugging of parallel message passing programs. These features include replay, setting consistent breakpoints based on interprocess event causality, a parallel undo operation, and communication supervision. These features all use trace information collected during the execution of the program being debugged. We used a number of different instrumentation techniques to collect traces. We also implemented trace displays using two different trace visualization systems. The implementation was tested on an SGI Power Challenge cluster and a network of SGI workstations.
Gauge-free cluster variational method by maximal messages and moment matching.

PubMed

Domínguez, Eduardo; Lage-Castellanos, Alejandro; Mulet, Roberto; Ricci-Tersenghi, Federico

2017-04-01

We present an implementation of the cluster variational method (CVM) as a message passing algorithm. The kind of message passing algorithm used for CVM, usually named generalized belief propagation (GBP), is a generalization of the belief propagation algorithm in the same way that CVM is a generalization of the Bethe approximation for estimating the partition function. However, the connection between fixed points of GBP and the extremal points of the CVM free energy is usually not a one-to-one correspondence because of the existence of a gauge transformation involving the GBP messages. Our contribution is twofold. First, we propose a way of defining messages (fields) in a generic CVM approximation, such that messages arrive on a given region from all its ancestors, and not only from its direct parents, as in the standard parent-to-child GBP. We call this approach maximal messages. Second, we focus on the case of binary variables, reinterpreting the messages as fields enforcing the consistency between the moments of the local (marginal) probability distributions. We provide a precise rule to enforce all consistencies, avoiding any redundancy, that would otherwise lead to a gauge transformation on the messages. This moment matching method is gauge free, i.e., it guarantees that the resulting GBP is not gauge invariant. We apply our maximal messages and moment matching GBP to obtain an analytical expression for the critical temperature of the Ising model in general dimensions at the level of plaquette CVM. The values obtained outperform Bethe estimates, and are comparable with loop corrected belief propagation equations. The method allows for a straightforward generalization to disordered systems.
Gauge-free cluster variational method by maximal messages and moment matching

NASA Astrophysics Data System (ADS)

Domínguez, Eduardo; Lage-Castellanos, Alejandro; Mulet, Roberto; Ricci-Tersenghi, Federico

2017-04-01

We present an implementation of the cluster variational method (CVM) as a message passing algorithm. The kind of message passing algorithm used for CVM, usually named generalized belief propagation (GBP), is a generalization of the belief propagation algorithm in the same way that CVM is a generalization of the Bethe approximation for estimating the partition function. However, the connection between fixed points of GBP and the extremal points of the CVM free energy is usually not a one-to-one correspondence because of the existence of a gauge transformation involving the GBP messages. Our contribution is twofold. First, we propose a way of defining messages (fields) in a generic CVM approximation, such that messages arrive on a given region from all its ancestors, and not only from its direct parents, as in the standard parent-to-child GBP. We call this approach maximal messages. Second, we focus on the case of binary variables, reinterpreting the messages as fields enforcing the consistency between the moments of the local (marginal) probability distributions. We provide a precise rule to enforce all consistencies, avoiding any redundancy, that would otherwise lead to a gauge transformation on the messages. This moment matching method is gauge free, i.e., it guarantees that the resulting GBP is not gauge invariant. We apply our maximal messages and moment matching GBP to obtain an analytical expression for the critical temperature of the Ising model in general dimensions at the level of plaquette CVM. The values obtained outperform Bethe estimates, and are comparable with loop corrected belief propagation equations. The method allows for a straightforward generalization to disordered systems.
Multiple node remote messaging

DOEpatents

Blumrich, Matthias A.; Chen, Dong; Gara, Alan G.; Giampapa, Mark E.; Heidelberger, Philip; Ohmacht, Martin; Salapura, Valentina; Steinmacher-Burow, Burkhard; Vranas, Pavlos

2010-08-31

A method for passing remote messages in a parallel computer system formed as a network of interconnected compute nodes includes that a first compute node (A) sends a single remote message to a remote second compute node (B) in order to control the remote second compute node (B) to send at least one remote message. The method includes various steps including controlling a DMA engine at first compute node (A) to prepare the single remote message to include a first message descriptor and at least one remote message descriptor for controlling the remote second compute node (B) to send at least one remote message, including putting the first message descriptor into an injection FIFO at the first compute node (A) and sending the single remote message and the at least one remote message descriptor to the second compute node (B).
Low Power LDPC Code Decoder Architecture Based on Intermediate Message Compression Technique

NASA Astrophysics Data System (ADS)

Shimizu, Kazunori; Togawa, Nozomu; Ikenaga, Takeshi; Goto, Satoshi

Reducing the power dissipation for LDPC code decoder is a major challenging task to apply it to the practical digital communication systems. In this paper, we propose a low power LDPC code decoder architecture based on an intermediate message-compression technique which features as follows: (i) An intermediate message compression technique enables the decoder to reduce the required memory capacity and write power dissipation. (ii) A clock gated shift register based intermediate message memory architecture enables the decoder to decompress the compressed messages in a single clock cycle while reducing the read power dissipation. The combination of the above two techniques enables the decoder to reduce the power dissipation while keeping the decoding throughput. The simulation results show that the proposed architecture improves the power efficiency up to 52% and 18% compared to that of the decoder based on the overlapped schedule and the rapid convergence schedule without the proposed techniques respectively.
HyperForest: A high performance multi-processor architecture for real-time intelligent systems

DOE Office of Scientific and Technical Information (OSTI.GOV)

Garcia, P. Jr.; Rebeil, J.P.; Pollard, H.

1997-04-01

Intelligent Systems are characterized by the intensive use of computer power. The computer revolution of the last few years is what has made possible the development of the first generation of Intelligent Systems. Software for second generation Intelligent Systems will be more complex and will require more powerful computing engines in order to meet real-time constraints imposed by new robots, sensors, and applications. A multiprocessor architecture was developed that merges the advantages of message-passing and shared-memory structures: expendability and real-time compliance. The HyperForest architecture will provide an expandable real-time computing platform for computationally intensive Intelligent Systems and open the doorsmore » for the application of these systems to more complex tasks in environmental restoration and cleanup projects, flexible manufacturing systems, and DOE`s own production and disassembly activities.« less

Parallel Conjugate Gradient: Effects of Ordering Strategies, Programming Paradigms, and Architectural Platforms

NASA Technical Reports Server (NTRS)

Oliker, Leonid; Heber, Gerd; Biswas, Rupak

2000-01-01

The Conjugate Gradient (CG) algorithm is perhaps the best-known iterative technique to solve sparse linear systems that are symmetric and positive definite. A sparse matrix-vector multiply (SPMV) usually accounts for most of the floating-point operations within a CG iteration. In this paper, we investigate the effects of various ordering and partitioning strategies on the performance of parallel CG and SPMV using different programming paradigms and architectures. Results show that for this class of applications, ordering significantly improves overall performance, that cache reuse may be more important than reducing communication, and that it is possible to achieve message passing performance using shared memory constructs through careful data ordering and distribution. However, a multi-threaded implementation of CG on the Tera MTA does not require special ordering or partitioning to obtain high efficiency and scalability.
Non-volatile memory for checkpoint storage

DOE Office of Scientific and Technical Information (OSTI.GOV)

Blumrich, Matthias A.; Chen, Dong; Cipolla, Thomas M.

A system, method and computer program product for supporting system initiated checkpoints in high performance parallel computing systems and storing of checkpoint data to a non-volatile memory storage device. The system and method generates selective control signals to perform checkpointing of system related data in presence of messaging activity associated with a user application running at the node. The checkpointing is initiated by the system such that checkpoint data of a plurality of network nodes may be obtained even in the presence of user applications running on highly parallel computers that include ongoing user messaging activity. In one embodiment, themore » non-volatile memory is a pluggable flash memory card.« less
Three-pass protocol scheme for bitmap image security by using vernam cipher algorithm

NASA Astrophysics Data System (ADS)

Rachmawati, D.; Budiman, M. A.; Aulya, L.

2018-02-01

Confidentiality, integrity, and efficiency are the crucial aspects of data security. Among the other digital data, image data is too prone to abuse of operation like duplication, modification, etc. There are some data security techniques, one of them is cryptography. The security of Vernam Cipher cryptography algorithm is very dependent on the key exchange process. If the key is leaked, security of this algorithm will collapse. Therefore, a method that minimizes key leakage during the exchange of messages is required. The method which is used, is known as Three-Pass Protocol. This protocol enables message delivery process without the key exchange. Therefore, the sending messages process can reach the receiver safely without fear of key leakage. The system is built by using Java programming language. The materials which are used for system testing are image in size 200×200 pixel, 300×300 pixel, 500×500 pixel, 800×800 pixel and 1000×1000 pixel. The result of experiments showed that Vernam Cipher algorithm in Three-Pass Protocol scheme could restore the original image.
Ordering Traces Logically to Identify Lateness in Message Passing Programs

DOE PAGES

Isaacs, Katherine E.; Gamblin, Todd; Bhatele, Abhinav; ...

2015-03-30

Event traces are valuable for understanding the behavior of parallel programs. However, automatically analyzing a large parallel trace is difficult, especially without a specific objective. We aid this endeavor by extracting a trace's logical structure, an ordering of trace events derived from happened-before relationships, while taking into account developer intent. Using this structure, we can calculate an operation's delay relative to its peers on other processes. The logical structure also serves as a platform for comparing and clustering processes as well as highlighting communication patterns in a trace visualization. We present an algorithm for determining this idealized logical structure frommore » traces of message passing programs, and we develop metrics to quantify delays and differences among processes. We implement our techniques in Ravel, a parallel trace visualization tool that displays both logical and physical timelines. Rather than showing the duration of each operation, we display where delays begin and end, and how they propagate. As a result, we apply our approach to the traces of several message passing applications, demonstrating the accuracy of our extracted structure and its utility in analyzing these codes.« less
Low latency messages on distributed memory multiprocessors

NASA Technical Reports Server (NTRS)

Rosing, Matthew; Saltz, Joel

1993-01-01

Many of the issues in developing an efficient interface for communication on distributed memory machines are described and a portable interface is proposed. Although the hardware component of message latency is less than one microsecond on many distributed memory machines, the software latency associated with sending and receiving typed messages is on the order of 50 microseconds. The reason for this imbalance is that the software interface does not match the hardware. By changing the interface to match the hardware more closely, applications with fine grained communication can be put on these machines. Based on several tests that were run on the iPSC/860, an interface that will better match current distributed memory machines is proposed. The model used in the proposed interface consists of a computation processor and a communication processor on each node. Communication between these processors and other nodes in the system is done through a buffered network. Information that is transmitted is either data or procedures to be executed on the remote processor. The dual processor system is better suited for efficiently handling asynchronous communications compared to a single processor system. The ability to send data or procedure is very flexible for minimizing message latency, based on the type of communication being performed. The test performed and the proposed interface are described.
Twin Neurons for Efficient Real-World Data Distribution in Networks of Neural Cliques: Applications in Power Management in Electronic Circuits.

PubMed

Boguslawski, Bartosz; Gripon, Vincent; Seguin, Fabrice; Heitzmann, Frédéric

2016-02-01

Associative memories are data structures that allow retrieval of previously stored messages given part of their content. They, thus, behave similarly to the human brain's memory that is capable, for instance, of retrieving the end of a song, given its beginning. Among different families of associative memories, sparse ones are known to provide the best efficiency (ratio of the number of bits stored to that of the bits used). Recently, a new family of sparse associative memories achieving almost optimal efficiency has been proposed. Their structure, relying on binary connections and neurons, induces a direct mapping between input messages and stored patterns. Nevertheless, it is well known that nonuniformity of the stored messages can lead to a dramatic decrease in performance. In this paper, we show the impact of nonuniformity on the performance of this recent model, and we exploit the structure of the model to improve its performance in practical applications, where data are not necessarily uniform. In order to approach the performance of networks with uniformly distributed messages presented in theoretical studies, twin neurons are introduced. To assess the adapted model, twin neurons are used with the real-world data to optimize power consumption of electronic circuits in practical test cases.
TravelAid : in-vehicle signing and variable speed limit evaluation

DOT National Transportation Integrated Search

2001-12-01

This report discusses the effectiveness of using variable message signs (VMSs) and in-vehicle traffic advisory systems (IVUs) on a mountainous pass (Snoqualmie Pass on Interstate 90 in Washington State) for changing driver behavior. As part of this p...
Holographic disk with high data transfer rate: its application to an audio response memory.

PubMed

Kubota, K; Ono, Y; Kondo, M; Sugama, S; Nishida, N; Sakaguchi, M

1980-03-15

This paper describes a memory realized with a high data transfer rate using the holographic parallel-processing function and its application to an audio response system that supplies many audio messages to many terminals simultaneously. Digitalized audio messages are recorded as tiny 1-D Fourier transform holograms on a holographic disk. A hologram recorder and a hologram reader were constructed to test and demonstrate the holographic audio response memory feasibility. Experimental results indicate the potentiality of an audio response system with a 2000-word vocabulary and 250-Mbit/sec bit transfer rate.
Implementing TCP/IP and a socket interface as a server in a message-passing operating system

DOE Office of Scientific and Technical Information (OSTI.GOV)

Hipp, E.; Wiltzius, D.

1990-03-01

The UNICOS 4.3BSD network code and socket transport interface are the basis of an explicit network server for NLTSS, a message passing operating system on the Cray YMP. A BSD socket user library provides access to the network server using an RPC mechanism. The advantages of this server methodology are its modularity and extensibility to migrate to future protocol suites (e.g. OSI) and transport interfaces. In addition, the network server is implemented in an explicit multi-tasking environment to take advantage of the Cray YMP multi-processor platform. 19 refs., 5 figs.
Complexity of parallel implementation of domain decomposition techniques for elliptic partial differential equations

DOE Office of Scientific and Technical Information (OSTI.GOV)

Gropp, W.D.; Keyes, D.E.

1988-03-01

The authors discuss the parallel implementation of preconditioned conjugate gradient (PCG)-based domain decomposition techniques for self-adjoint elliptic partial differential equations in two dimensions on several architectures. The complexity of these methods is described on a variety of message-passing parallel computers as a function of the size of the problem, number of processors and relative communication speeds of the processors. They show that communication startups are very important, and that even the small amount of global communication in these methods can significantly reduce the performance of many message-passing architectures.
n-body simulations using message passing parallel computers.

NASA Astrophysics Data System (ADS)

Grama, A. Y.; Kumar, V.; Sameh, A.

The authors present new parallel formulations of the Barnes-Hut method for n-body simulations on message passing computers. These parallel formulations partition the domain efficiently incurring minimal communication overhead. This is in contrast to existing schemes that are based on sorting a large number of keys or on the use of global data structures. The new formulations are augmented by alternate communication strategies which serve to minimize communication overhead. The impact of these communication strategies is experimentally studied. The authors report on experimental results obtained from an astrophysical simulation on an nCUBE2 parallel computer.
Implementation of a Message Passing Interface into a Cloud-Resolving Model for Massively Parallel Computing

NASA Technical Reports Server (NTRS)

Juang, Hann-Ming Henry; Tao, Wei-Kuo; Zeng, Xi-Ping; Shie, Chung-Lin; Simpson, Joanne; Lang, Steve

2004-01-01

The capability for massively parallel programming (MPP) using a message passing interface (MPI) has been implemented into a three-dimensional version of the Goddard Cumulus Ensemble (GCE) model. The design for the MPP with MPI uses the concept of maintaining similar code structure between the whole domain as well as the portions after decomposition. Hence the model follows the same integration for single and multiple tasks (CPUs). Also, it provides for minimal changes to the original code, so it is easily modified and/or managed by the model developers and users who have little knowledge of MPP. The entire model domain could be sliced into one- or two-dimensional decomposition with a halo regime, which is overlaid on partial domains. The halo regime requires that no data be fetched across tasks during the computational stage, but it must be updated before the next computational stage through data exchange via MPI. For reproducible purposes, transposing data among tasks is required for spectral transform (Fast Fourier Transform, FFT), which is used in the anelastic version of the model for solving the pressure equation. The performance of the MPI-implemented codes (i.e., the compressible and anelastic versions) was tested on three different computing platforms. The major results are: 1) both versions have speedups of about 99% up to 256 tasks but not for 512 tasks; 2) the anelastic version has better speedup and efficiency because it requires more computations than that of the compressible version; 3) equal or approximately-equal numbers of slices between the x- and y- directions provide the fastest integration due to fewer data exchanges; and 4) one-dimensional slices in the x-direction result in the slowest integration due to the need for more memory relocation for computation.
Message passing interface and multithreading hybrid for parallel molecular docking of large databases on petascale high performance computing machines.

PubMed

Zhang, Xiaohua; Wong, Sergio E; Lightstone, Felice C

2013-04-30

A mixed parallel scheme that combines message passing interface (MPI) and multithreading was implemented in the AutoDock Vina molecular docking program. The resulting program, named VinaLC, was tested on the petascale high performance computing (HPC) machines at Lawrence Livermore National Laboratory. To exploit the typical cluster-type supercomputers, thousands of docking calculations were dispatched by the master process to run simultaneously on thousands of slave processes, where each docking calculation takes one slave process on one node, and within the node each docking calculation runs via multithreading on multiple CPU cores and shared memory. Input and output of the program and the data handling within the program were carefully designed to deal with large databases and ultimately achieve HPC on a large number of CPU cores. Parallel performance analysis of the VinaLC program shows that the code scales up to more than 15K CPUs with a very low overhead cost of 3.94%. One million flexible compound docking calculations took only 1.4 h to finish on about 15K CPUs. The docking accuracy of VinaLC has been validated against the DUD data set by the re-docking of X-ray ligands and an enrichment study, 64.4% of the top scoring poses have RMSD values under 2.0 Å. The program has been demonstrated to have good enrichment performance on 70% of the targets in the DUD data set. An analysis of the enrichment factors calculated at various percentages of the screening database indicates VinaLC has very good early recovery of actives. Copyright © 2013 Wiley Periodicals, Inc.
Audience-tuning effects on memory: the role of shared reality.

PubMed

Echterhoff, Gerald; Higgins, E Tory; Groll, Stephan

2005-09-01

After tuning to an audience, communicators' own memories for the topic often reflect the biased view expressed in their messages. Three studies examined explanations for this bias. Memories for a target person were biased when feedback signaled the audience's successful identification of the target but not after failed identification (Experiment 1). Whereas communicators tuning to an in-group audience exhibited the bias, communicators tuning to an out-group audience did not (Experiment 2). These differences did not depend on communicators' mood but were mediated by communicators' trust in their audience's judgment about other people (Experiments 2 and 3). Message and memory were more closely associated for high than for low trusters. Apparently, audience-tuning effects depend on the communicators' experience of a shared reality.
Rated Measures of Narrative Structure for Written Smoking-Cessation Texts

PubMed Central

Sanders-Jackson, Ashley

2014-01-01

This article describes the effect of a series of rated measures of narrative structure on recognition memory, agreement on story-relevant beliefs, and intention to engage in a health-related behavior—in this case smoking cessation. Using short smoking-cessation stories as stimuli, data were collected in a nationally representative sample of adult smokers (n = 1,312). Results suggested that messages rated as more sequential improved encoding and messages rated as containing more context decreased encoding. Messages rated high in transportation were associated with increased recognition, agreement with story-relevant beliefs, and intention to quit. Both positive and negative emotion were positively associated with intention to quit, but were negatively associated with recognition memory. PMID:24447036
Parallel Implementation of Triangular Cellular Automata for Computing Two-Dimensional Elastodynamic Response on Arbitrary Domains

NASA Astrophysics Data System (ADS)

Leamy, Michael J.; Springer, Adam C.

In this research we report parallel implementation of a Cellular Automata-based simulation tool for computing elastodynamic response on complex, two-dimensional domains. Elastodynamic simulation using Cellular Automata (CA) has recently been presented as an alternative, inherently object-oriented technique for accurately and efficiently computing linear and nonlinear wave propagation in arbitrarily-shaped geometries. The local, autonomous nature of the method should lead to straight-forward and efficient parallelization. We address this notion on symmetric multiprocessor (SMP) hardware using a Java-based object-oriented CA code implementing triangular state machines (i.e., automata) and the MPI bindings written in Java (MPJ Express). We use MPJ Express to reconfigure our existing CA code to distribute a domain's automata to cores present on a dual quad-core shared-memory system (eight total processors). We note that this message passing parallelization strategy is directly applicable to computer clustered computing, which will be the focus of follow-on research. Results on the shared memory platform indicate nearly-ideal, linear speed-up. We conclude that the CA-based elastodynamic simulator is easily configured to run in parallel, and yields excellent speed-up on SMP hardware.
Implementation of a Parallel Kalman Filter for Stratospheric Chemical Tracer Assimilation

NASA Technical Reports Server (NTRS)

Chang, Lang-Ping; Lyster, Peter M.; Menard, R.; Cohn, S. E.

1998-01-01

A Kalman filter for the assimilation of long-lived atmospheric chemical constituents has been developed for two-dimensional transport models on isentropic surfaces over the globe. An important attribute of the Kalman filter is that it calculates error covariances of the constituent fields using the tracer dynamics. Consequently, the current Kalman-filter assimilation is a five-dimensional problem (coordinates of two points and time), and it can only be handled on computers with large memory and high floating point speed. In this paper, an implementation of the Kalman filter for distributed-memory, message-passing parallel computers is discussed. Two approaches were studied: an operator decomposition and a covariance decomposition. The latter was found to be more scalable than the former, and it possesses the property that the dynamical model does not need to be parallelized, which is of considerable practical advantage. This code is currently used to assimilate constituent data retrieved by limb sounders on the Upper Atmosphere Research Satellite. Tests of the code examined the variance transport and observability properties. Aspects of the parallel implementation, some timing results, and a brief discussion of the physical results will be presented.
Run-time scheduling and execution of loops on message passing machines

NASA Technical Reports Server (NTRS)

Crowley, Kay; Saltz, Joel; Mirchandaney, Ravi; Berryman, Harry

1989-01-01

Sparse system solvers and general purpose codes for solving partial differential equations are examples of the many types of problems whose irregularity can result in poor performance on distributed memory machines. Often, the data structures used in these problems are very flexible. Crucial details concerning loop dependences are encoded in these structures rather than being explicitly represented in the program. Good methods for parallelizing and partitioning these types of problems require assignment of computations in rather arbitrary ways. Naive implementations of programs on distributed memory machines requiring general loop partitions can be extremely inefficient. Instead, the scheduling mechanism needs to capture the data reference patterns of the loops in order to partition the problem. First, the indices assigned to each processor must be locally numbered. Next, it is necessary to precompute what information is needed by each processor at various points in the computation. The precomputed information is then used to generate an execution template designed to carry out the computation, communication, and partitioning of data, in an optimized manner. The design is presented for a general preprocessor and schedule executer, the structures of which do not vary, even though the details of the computation and of the type of information are problem dependent.
Run-time scheduling and execution of loops on message passing machines

NASA Technical Reports Server (NTRS)

Saltz, Joel; Crowley, Kathleen; Mirchandaney, Ravi; Berryman, Harry

1990-01-01

Sparse system solvers and general purpose codes for solving partial differential equations are examples of the many types of problems whose irregularity can result in poor performance on distributed memory machines. Often, the data structures used in these problems are very flexible. Crucial details concerning loop dependences are encoded in these structures rather than being explicitly represented in the program. Good methods for parallelizing and partitioning these types of problems require assignment of computations in rather arbitrary ways. Naive implementations of programs on distributed memory machines requiring general loop partitions can be extremely inefficient. Instead, the scheduling mechanism needs to capture the data reference patterns of the loops in order to partition the problem. First, the indices assigned to each processor must be locally numbered. Next, it is necessary to precompute what information is needed by each processor at various points in the computation. The precomputed information is then used to generate an execution template designed to carry out the computation, communication, and partitioning of data, in an optimized manner. The design is presented for a general preprocessor and schedule executer, the structures of which do not vary, even though the details of the computation and of the type of information are problem dependent.
Relatively effortless listening promotes understanding and recall of medical instructions in older adults

PubMed Central

DiDonato, Roberta M.; Surprenant, Aimée M.

2015-01-01

Communication success under adverse conditions requires efficient and effective recruitment of both bottom-up (sensori-perceptual) and top-down (cognitive-linguistic) resources to decode the intended auditory-verbal message. Employing these limited capacity resources has been shown to vary across the lifespan, with evidence indicating that younger adults out-perform older adults for both comprehension and memory of the message. This study examined how sources of interference arising from the speaker (message spoken with conversational vs. clear speech technique), the listener (hearing-listening and cognitive-linguistic factors), and the environment (in competing speech babble noise vs. quiet) interact and influence learning and memory performance using more ecologically valid methods than has been done previously. The results suggest that when older adults listened to complex medical prescription instructions with “clear speech,” (presented at audible levels through insertion earphones) their learning efficiency, immediate, and delayed memory performance improved relative to their performance when they listened with a normal conversational speech rate (presented at audible levels in sound field). This better learning and memory performance for clear speech listening was maintained even in the presence of speech babble noise. The finding that there was the largest learning-practice effect on 2nd trial performance in the conversational speech when the clear speech listening condition was first is suggestive of greater experience-dependent perceptual learning or adaptation to the speaker's speech and voice pattern in clear speech. This suggests that experience-dependent perceptual learning plays a role in facilitating the language processing and comprehension of a message and subsequent memory encoding. PMID:26106353

Accelerating atomistic calculations of quantum energy eigenstates on graphic cards

NASA Astrophysics Data System (ADS)

Rodrigues, Walter; Pecchia, A.; Lopez, M.; Auf der Maur, M.; Di Carlo, A.

2014-10-01

Electronic properties of nanoscale materials require the calculation of eigenvalues and eigenvectors of large matrices. This bottleneck can be overcome by parallel computing techniques or the introduction of faster algorithms. In this paper we report a custom implementation of the Lanczos algorithm with simple restart, optimized for graphical processing units (GPUs). The whole algorithm has been developed using CUDA and runs entirely on the GPU, with a specialized implementation that spares memory and reduces at most machine-to-device data transfers. Furthermore parallel distribution over several GPUs has been attained using the standard message passing interface (MPI). Benchmark calculations performed on a GaN/AlGaN wurtzite quantum dot with up to 600,000 atoms are presented. The empirical tight-binding (ETB) model with an sp3d5s∗+spin-orbit parametrization has been used to build the system Hamiltonian (H).
Paradigms and strategies for scientific computing on distributed memory concurrent computers

DOE Office of Scientific and Technical Information (OSTI.GOV)

Foster, I.T.; Walker, D.W.

1994-06-01

In this work we examine recent advances in parallel languages and abstractions that have the potential for improving the programmability and maintainability of large-scale, parallel, scientific applications running on high performance architectures and networks. This paper focuses on Fortran M, a set of extensions to Fortran 77 that supports the modular design of message-passing programs. We describe the Fortran M implementation of a particle-in-cell (PIC) plasma simulation application, and discuss issues in the optimization of the code. The use of two other methodologies for parallelizing the PIC application are considered. The first is based on the shared object abstraction asmore » embodied in the Orca language. The second approach is the Split-C language. In Fortran M, Orca, and Split-C the ability of the programmer to control the granularity of communication is important is designing an efficient implementation.« less
Distributed computing for membrane-based modeling of action potential propagation.

PubMed

Porras, D; Rogers, J M; Smith, W M; Pollard, A E

2000-08-01

Action potential propagation simulations with physiologic membrane currents and macroscopic tissue dimensions are computationally expensive. We, therefore, analyzed distributed computing schemes to reduce execution time in workstation clusters by parallelizing solutions with message passing. Four schemes were considered in two-dimensional monodomain simulations with the Beeler-Reuter membrane equations. Parallel speedups measured with each scheme were compared to theoretical speedups, recognizing the relationship between speedup and code portions that executed serially. A data decomposition scheme based on total ionic current provided the best performance. Analysis of communication latencies in that scheme led to a load-balancing algorithm in which measured speedups at 89 +/- 2% and 75 +/- 8% of theoretical speedups were achieved in homogeneous and heterogeneous clusters of workstations. Speedups in this scheme with the Luo-Rudy dynamic membrane equations exceeded 3.0 with eight distributed workstations. Cluster speedups were comparable to those measured during parallel execution on a shared memory machine.
A fast parallel 3D Poisson solver with longitudinal periodic and transverse open boundary conditions for space-charge simulations

NASA Astrophysics Data System (ADS)

Qiang, Ji

2017-10-01

A three-dimensional (3D) Poisson solver with longitudinal periodic and transverse open boundary conditions can have important applications in beam physics of particle accelerators. In this paper, we present a fast efficient method to solve the Poisson equation using a spectral finite-difference method. This method uses a computational domain that contains the charged particle beam only and has a computational complexity of O(Nu(logNmode)) , where Nu is the total number of unknowns and Nmode is the maximum number of longitudinal or azimuthal modes. This saves both the computational time and the memory usage of using an artificial boundary condition in a large extended computational domain. The new 3D Poisson solver is parallelized using a message passing interface (MPI) on multi-processor computers and shows a reasonable parallel performance up to hundreds of processor cores.
Multicore Challenges and Benefits for High Performance Scientific Computing

DOE PAGES

Nielsen, Ida M. B.; Janssen, Curtis L.

2008-01-01

Until recently, performance gains in processors were achieved largely by improvements in clock speeds and instruction level parallelism. Thus, applications could obtain performance increases with relatively minor changes by upgrading to the latest generation of computing hardware. Currently, however, processor performance improvements are realized by using multicore technology and hardware support for multiple threads within each core, and taking full advantage of this technology to improve the performance of applications requires exposure of extreme levels of software parallelism. We will here discuss the architecture of parallel computers constructed from many multicore chips as well as techniques for managing the complexitymore » of programming such computers, including the hybrid message-passing/multi-threading programming model. We will illustrate these ideas with a hybrid distributed memory matrix multiply and a quantum chemistry algorithm for energy computation using Møller–Plesset perturbation theory.« less
Tolerant (parallel) Programming

NASA Technical Reports Server (NTRS)

DiNucci, David C.; Bailey, David H. (Technical Monitor)

1997-01-01

In order to be truly portable, a program must be tolerant of a wide range of development and execution environments, and a parallel program is just one which must be tolerant of a very wide range. This paper first defines the term "tolerant programming", then describes many layers of tools to accomplish it. The primary focus is on F-Nets, a formal model for expressing computation as a folded partial-ordering of operations, thereby providing an architecture-independent expression of tolerant parallel algorithms. For implementing F-Nets, Cooperative Data Sharing (CDS) is a subroutine package for implementing communication efficiently in a large number of environments (e.g. shared memory and message passing). Software Cabling (SC), a very-high-level graphical programming language for building large F-Nets, possesses many of the features normally expected from today's computer languages (e.g. data abstraction, array operations). Finally, L2(sup 3) is a CASE tool which facilitates the construction, compilation, execution, and debugging of SC programs.
A 3D staggered-grid finite difference scheme for poroelastic wave equation

NASA Astrophysics Data System (ADS)

Zhang, Yijie; Gao, Jinghuai

2014-10-01

Three dimensional numerical modeling has been a viable tool for understanding wave propagation in real media. The poroelastic media can better describe the phenomena of hydrocarbon reservoirs than acoustic and elastic media. However, the numerical modeling in 3D poroelastic media demands significantly more computational capacity, including both computational time and memory. In this paper, we present a 3D poroelastic staggered-grid finite difference (SFD) scheme. During the procedure, parallel computing is implemented to reduce the computational time. Parallelization is based on domain decomposition, and communication between processors is performed using message passing interface (MPI). Parallel analysis shows that the parallelized SFD scheme significantly improves the simulation efficiency and 3D decomposition in domain is the most efficient. We also analyze the numerical dispersion and stability condition of the 3D poroelastic SFD method. Numerical results show that the 3D numerical simulation can provide a real description of wave propagation.
Discrete sensitivity derivatives of the Navier-Stokes equations with a parallel Krylov solver

NASA Technical Reports Server (NTRS)

Ajmani, Kumud; Taylor, Arthur C., III

1994-01-01

This paper solves an 'incremental' form of the sensitivity equations derived by differentiating the discretized thin-layer Navier Stokes equations with respect to certain design variables of interest. The equations are solved with a parallel, preconditioned Generalized Minimal RESidual (GMRES) solver on a distributed-memory architecture. The 'serial' sensitivity analysis code is parallelized by using the Single Program Multiple Data (SPMD) programming model, domain decomposition techniques, and message-passing tools. Sensitivity derivatives are computed for low and high Reynolds number flows over a NACA 1406 airfoil on a 32-processor Intel Hypercube, and found to be identical to those computed on a single-processor Cray Y-MP. It is estimated that the parallel sensitivity analysis code has to be run on 40-50 processors of the Intel Hypercube in order to match the single-processor processing time of a Cray Y-MP.
User-Defined Data Distributions in High-Level Programming Languages

NASA Technical Reports Server (NTRS)

Diaconescu, Roxana E.; Zima, Hans P.

2006-01-01

One of the characteristic features of today s high performance computing systems is a physically distributed memory. Efficient management of locality is essential for meeting key performance requirements for these architectures. The standard technique for dealing with this issue has involved the extension of traditional sequential programming languages with explicit message passing, in the context of a processor-centric view of parallel computation. This has resulted in complex and error-prone assembly-style codes in which algorithms and communication are inextricably interwoven. This paper presents a high-level approach to the design and implementation of data distributions. Our work is motivated by the need to improve the current parallel programming methodology by introducing a paradigm supporting the development of efficient and reusable parallel code. This approach is currently being implemented in the context of a new programming language called Chapel, which is designed in the HPCS project Cascade.
National Combustion Code: Parallel Implementation and Performance

NASA Technical Reports Server (NTRS)

Quealy, A.; Ryder, R.; Norris, A.; Liu, N.-S.

2000-01-01

The National Combustion Code (NCC) is being developed by an industry-government team for the design and analysis of combustion systems. CORSAIR-CCD is the current baseline reacting flow solver for NCC. This is a parallel, unstructured grid code which uses a distributed memory, message passing model for its parallel implementation. The focus of the present effort has been to improve the performance of the NCC flow solver to meet combustor designer requirements for model accuracy and analysis turnaround time. Improving the performance of this code contributes significantly to the overall reduction in time and cost of the combustor design cycle. This paper describes the parallel implementation of the NCC flow solver and summarizes its current parallel performance on an SGI Origin 2000. Earlier parallel performance results on an IBM SP-2 are also included. The performance improvements which have enabled a turnaround of less than 15 hours for a 1.3 million element fully reacting combustion simulation are described.
Three-Dimensional High-Lift Analysis Using a Parallel Unstructured Multigrid Solver

NASA Technical Reports Server (NTRS)

Mavriplis, Dimitri J.

1998-01-01

A directional implicit unstructured agglomeration multigrid solver is ported to shared and distributed memory massively parallel machines using the explicit domain-decomposition and message-passing approach. Because the algorithm operates on local implicit lines in the unstructured mesh, special care is required in partitioning the problem for parallel computing. A weighted partitioning strategy is described which avoids breaking the implicit lines across processor boundaries, while incurring minimal additional communication overhead. Good scalability is demonstrated on a 128 processor SGI Origin 2000 machine and on a 512 processor CRAY T3E machine for reasonably fine grids. The feasibility of performing large-scale unstructured grid calculations with the parallel multigrid algorithm is demonstrated by computing the flow over a partial-span flap wing high-lift geometry on a highly resolved grid of 13.5 million points in approximately 4 hours of wall clock time on the CRAY T3E.
Networking and AI systems: Requirements and benefits

NASA Technical Reports Server (NTRS)

1988-01-01

The price performance benefits of network systems is well documented. The ability to share expensive resources sold timesharing for mainframes, department clusters of minicomputers, and now local area networks of workstations and servers. In the process, other fundamental system requirements emerged. These have now been generalized with open system requirements for hardware, software, applications and tools. The ability to interconnect a variety of vendor products has led to a specification of interfaces that allow new techniques to extend existing systems for new and exciting applications. As an example of the message passing system, local area networks provide a testbed for many of the issues addressed by future concurrent architectures: synchronization, load balancing, fault tolerance and scalability. Gold Hill has been working with a number of vendors on distributed architectures that range from a network of workstations to a hypercube of microprocessors with distributed memory. Results from early applications are promising both for performance and scalability.
Space Reclamation for Uncoordinated Checkpointing in Message-Passing Systems. Ph.D. Thesis

NASA Technical Reports Server (NTRS)

Wang, Yi-Min

1993-01-01

Checkpointing and rollback recovery are techniques that can provide efficient recovery from transient process failures. In a message-passing system, the rollback of a message sender may cause the rollback of the corresponding receiver, and the system needs to roll back to a consistent set of checkpoints called recovery line. If the processes are allowed to take uncoordinated checkpoints, the above rollback propagation may result in the domino effect which prevents recovery line progression. Traditionally, only obsolete checkpoints before the global recovery line can be discarded, and the necessary and sufficient condition for identifying all garbage checkpoints has remained an open problem. A necessary and sufficient condition for achieving optimal garbage collection is derived and it is proved that the number of useful checkpoints is bounded by N(N+1)/2, where N is the number of processes. The approach is based on the maximum-sized antichain model of consistent global checkpoints and the technique of recovery line transformation and decomposition. It is also shown that, for systems requiring message logging to record in-transit messages, the same approach can be used to achieve optimal message log reclamation. As a final topic, a unifying framework is described by considering checkpoint coordination and exploiting piecewise determinism as mechanisms for bounding rollback propagation, and the applicability of the optimal garbage collection algorithm to domino-free recovery protocols is demonstrated.
Relations Among Central Auditory Abilities, Socio-Economic Factors, Speech Delay, Phonic Abilities and Reading Achievement: A Longitudinal Study.

ERIC Educational Resources Information Center

Flowers, Arthur; Crandell, Edwin W.

Three auditory perceptual processes (resistance to distortion, selective listening in the form of auditory dedifferentiation, and binaural synthesis) were evaluated by five assessment techniques: (1) low pass filtered speech, (2) accelerated speech, (3) competing messages, (4) accelerated plus competing messages, and (5) binaural synthesis.…
Quantum cluster variational method and message passing algorithms revisited

NASA Astrophysics Data System (ADS)

Domínguez, E.; Mulet, Roberto

2018-02-01

We present a general framework to study quantum disordered systems in the context of the Kikuchi's cluster variational method (CVM). The method relies in the solution of message passing-like equations for single instances or in the iterative solution of complex population dynamic algorithms for an average case scenario. We first show how a standard application of the Kikuchi's CVM can be easily translated to message passing equations for specific instances of the disordered system. We then present an "ad hoc" extension of these equations to a population dynamic algorithm representing an average case scenario. At the Bethe level, these equations are equivalent to the dynamic population equations that can be derived from a proper cavity ansatz. However, at the plaquette approximation, the interpretation is more subtle and we discuss it taking also into account previous results in classical disordered models. Moreover, we develop a formalism to properly deal with the average case scenario using a replica-symmetric ansatz within this CVM for quantum disordered systems. Finally, we present and discuss numerical solutions of the different approximations for the quantum transverse Ising model and the quantum random field Ising model in two-dimensional lattices.
Applying the Concept of Working Memory to Foreign Language Listening Comprehension.

ERIC Educational Resources Information Center

Service, Elisabet

Memory research has recently moved from looking at performance in highly artificial laboratory tasks to examination of tasks in everyday life. One consequence is the development of the concept of "working memory." For the learner, foreign language comprehension makes great demands on working memory capacity. Comprehension of a message requires…
Low latency, high bandwidth data communications between compute nodes in a parallel computer

DOEpatents

Archer, Charles J.; Blocksome, Michael A.; Ratterman, Joseph D.; Smith, Brian E.

2010-11-02

Methods, parallel computers, and computer program products are disclosed for low latency, high bandwidth data communications between compute nodes in a parallel computer. Embodiments include receiving, by an origin direct memory access (`DMA`) engine of an origin compute node, data for transfer to a target compute node; sending, by the origin DMA engine of the origin compute node to a target DMA engine on the target compute node, a request to send (`RTS`) message; transferring, by the origin DMA engine, a predetermined portion of the data to the target compute node using memory FIFO operation; determining, by the origin DMA engine whether an acknowledgement of the RTS message has been received from the target DMA engine; if the an acknowledgement of the RTS message has not been received, transferring, by the origin DMA engine, another predetermined portion of the data to the target compute node using a memory FIFO operation; and if the acknowledgement of the RTS message has been received by the origin DMA engine, transferring, by the origin DMA engine, any remaining portion of the data to the target compute node using a direct put operation.
Distributed Processor/Memory Architectures Design Program

DTIC Science & Technology

1975-02-01

Event Scheduling Plo 31 Globat LAl Message Input Event Sicheduling Fhou ..... ............... 106 32 It tc Iata Representation...298 138 GEX LEX Scheduling Phlmophy ....... ...................... 300 139 Executive Comirol Herarchy... Scheduler Subroutine lnterrelatiomhips . ..... ................. 312 145 Task Scheduler Message Scatuer. . ...... ....................... 315 146
Direct memory access transfer completion notification

DOEpatents

Archer, Charles J [Rochester, MN; Blocksome, Michael A [Rochester, MN; Parker, Jeffrey J [Rochester, MN

2011-02-15

DMA transfer completion notification includes: inserting, by an origin DMA engine on an origin node in an injection first-in-first-out (`FIFO`) buffer, a data descriptor for an application message to be transferred to a target node on behalf of an application on the origin node; inserting, by the origin DMA engine, a completion notification descriptor in the injection FIFO buffer after the data descriptor for the message, the completion notification descriptor specifying a packet header for a completion notification packet; transferring, by the origin DMA engine to the target node, the message in dependence upon the data descriptor; sending, by the origin DMA engine, the completion notification packet to a local reception FIFO buffer using a local memory FIFO transfer operation; and notifying, by the origin DMA engine, the application that transfer of the message is complete in response to receiving the completion notification packet in the local reception FIFO buffer.
Distributed Memory Parallel Computing with SEAWAT

NASA Astrophysics Data System (ADS)

Verkaik, J.; Huizer, S.; van Engelen, J.; Oude Essink, G.; Ram, R.; Vuik, K.

2017-12-01

Fresh groundwater reserves in coastal aquifers are threatened by sea-level rise, extreme weather conditions, increasing urbanization and associated groundwater extraction rates. To counteract these threats, accurate high-resolution numerical models are required to optimize the management of these precious reserves. The major model drawbacks are long run times and large memory requirements, limiting the predictive power of these models. Distributed memory parallel computing is an efficient technique for reducing run times and memory requirements, where the problem is divided over multiple processor cores. A new Parallel Krylov Solver (PKS) for SEAWAT is presented. PKS has recently been applied to MODFLOW and includes Conjugate Gradient (CG) and Biconjugate Gradient Stabilized (BiCGSTAB) linear accelerators. Both accelerators are preconditioned by an overlapping additive Schwarz preconditioner in a way that: a) subdomains are partitioned using Recursive Coordinate Bisection (RCB) load balancing, b) each subdomain uses local memory only and communicates with other subdomains by Message Passing Interface (MPI) within the linear accelerator, c) it is fully integrated in SEAWAT. Within SEAWAT, the PKS-CG solver replaces the Preconditioned Conjugate Gradient (PCG) solver for solving the variable-density groundwater flow equation and the PKS-BiCGSTAB solver replaces the Generalized Conjugate Gradient (GCG) solver for solving the advection-diffusion equation. PKS supports the third-order Total Variation Diminishing (TVD) scheme for computing advection. Benchmarks were performed on the Dutch national supercomputer (https://userinfo.surfsara.nl/systems/cartesius) using up to 128 cores, for a synthetic 3D Henry model (100 million cells) and the real-life Sand Engine model ( 10 million cells). The Sand Engine model was used to investigate the potential effect of the long-term morphological evolution of a large sand replenishment and climate change on fresh groundwater resources. Speed-ups up to 40 were obtained with the new PKS solver.

Re-Presenting Subversive Songs: Applying Strategies for Invention and Arrangement to Nontraditional Speech Texts

ERIC Educational Resources Information Center

Charlesworth, Dacia

2010-01-01

Invention deals with the content of a speech, arrangement involves placing the content in an order that is most strategic, style focuses on selecting linguistic devices, such as metaphor, to make the message more appealing, memory assists the speaker in delivering the message correctly, and delivery ideally enables great reception of the message.…
Effects of redundancy in the comparison of speech and pictorial displays in the cockpit environment.

PubMed

Byblow, W D

1990-06-01

Synthesised speech and pictorial displays were compared in a spatially compatible simulated cockpit environment. Messages of high or low levels of redundancy were presented to subjects in both modality conditions. Subjects responded to warnings presented in a warning-only condition and in a dual-task condition, in which a simulated flight task was performed with visual and manual input/output modalities. Because the amount of information presented in most real-world applications and experimental paradigms is quantifiably large with respect to present guidelines for the use of synthesised speech warnings, the low-redundancy condition was hypothesised to allow for better performance. Results showed that subjects respond quicker to messages of low redundancy in both modalities. It is suggested that speech messages with low-redundancy levels were effective in minimising message length and ensuring that messages did not overload the short-term memory required to process and maintain speech in memory. Manipulation of phrase structure was used to optimise message redundancy and enhance the conceptual compatibility of the message without increasing message length or imposing a perceptual cost or memory overload. The results also suggest that system response times were quicker when synthesised speech warnings were used. This result is consistent with predictions from multiple resource theory which states that the resources required for the perception of verbal warnings are different from those for the flight task. It is also suggested that the perception of a pictorial display requires the same resources used for the perception of the primary flight task. An alternative explanation is that pictorial displays impose a visual scanning cost which is responsible for decreased performance. Based on the findings reported here, it is suggested that speech displays be incorporated in a spatially compatible cockpit environment because they allow equal or better performance when compared with pictorial displays. More importantly, the amount of time that the operator must direct his vision away from information vital to the flight task is decreased.
"A Message for My Brother": The Vietnam Veterans' Memorial as Rhetorical Situation.

ERIC Educational Resources Information Center

Carlson, A. Cheree; Hocking, John E.

An examination of letters left at the Vietnam Veteran's Memorial in Washington, D.C. between November, 1984 and April, 1986 revealed that the memorial serves as a rhetorical situation that urges its visitors to eloquence. The memorial is an excellent proving ground for situational theory because the interaction of site and perception is vital to…
Direct memory access transfer completion notification

DOEpatents

Chen, Dong; Giampapa, Mark E.; Heidelberger, Philip; Kumar, Sameer; Parker, Jeffrey J.; Steinmacher-Burow, Burkhard D.; Vranas, Pavlos

2010-07-27

Methods, compute nodes, and computer program products are provided for direct memory access (`DMA`) transfer completion notification. Embodiments include determining, by an origin DMA engine on an origin compute node, whether a data descriptor for an application message to be sent to a target compute node is currently in an injection first-in-first-out (`FIFO`) buffer in dependence upon a sequence number previously associated with the data descriptor, the total number of descriptors currently in the injection FIFO buffer, and the current sequence number for the newest data descriptor stored in the injection FIFO buffer; and notifying a processor core on the origin DMA engine that the message has been sent if the data descriptor for the message is not currently in the injection FIFO buffer.
Asbestos: Securing Untrusted Software with Interposition

DTIC Science & Technology

2005-09-01

consistent intelligible interfaces to different types of resource. Message-based operating systems, such as Accent, Amoeba, Chorus, L4 , Spring...control on self-authenticating capabilities, precluding policies that restrict delegation. L4 uses a strict hierarchy of interpositions, useful for...the OS de- sign space amenable to secure application construction. Similar effects might be possible with message-passing microkernels , or unwieldy
Remote direct memory access

DOEpatents

Archer, Charles J.; Blocksome, Michael A.

2012-12-11

Methods, parallel computers, and computer program products are disclosed for remote direct memory access. Embodiments include transmitting, from an origin DMA engine on an origin compute node to a plurality target DMA engines on target compute nodes, a request to send message, the request to send message specifying a data to be transferred from the origin DMA engine to data storage on each target compute node; receiving, by each target DMA engine on each target compute node, the request to send message; preparing, by each target DMA engine, to store data according to the data storage reference and the data length, including assigning a base storage address for the data storage reference; sending, by one or more of the target DMA engines, an acknowledgment message acknowledging that all the target DMA engines are prepared to receive a data transmission from the origin DMA engine; receiving, by the origin DMA engine, the acknowledgement message from the one or more of the target DMA engines; and transferring, by the origin DMA engine, data to data storage on each of the target compute nodes according to the data storage reference using a single direct put operation.
A Memory Pacer for Improving Stimulus Generalization.

ERIC Educational Resources Information Center

Browning, Ellen R.

1983-01-01

The Memory Tracer, which provides prompts by playing prerecorded appropriate messages, was effective in maintaining a low rate of negative verbalizations by six adolescents with autism, schizophrenia, and severe behavior problems. (CL) 4B
Comparing the OpenMP, MPI, and Hybrid Programming Paradigm on an SMP Cluster

NASA Technical Reports Server (NTRS)

Jost, Gabriele; Jin, Hao-Qiang; anMey, Dieter; Hatay, Ferhat F.

2003-01-01

Clusters of SMP (Symmetric Multi-Processors) nodes provide support for a wide range of parallel programming paradigms. The shared address space within each node is suitable for OpenMP parallelization. Message passing can be employed within and across the nodes of a cluster. Multiple levels of parallelism can be achieved by combining message passing and OpenMP parallelization. Which programming paradigm is the best will depend on the nature of the given problem, the hardware components of the cluster, the network, and the available software. In this study we compare the performance of different implementations of the same CFD benchmark application, using the same numerical algorithm but employing different programming paradigms.
ADAPTIVE TETRAHEDRAL GRID REFINEMENT AND COARSENING IN MESSAGE-PASSING ENVIRONMENTS

DOE Office of Scientific and Technical Information (OSTI.GOV)

Hallberg, J.; Stagg, A.

2000-10-01

A grid refinement and coarsening scheme has been developed for tetrahedral and triangular grid-based calculations in message-passing environments. The element adaption scheme is based on an edge bisection of elements marked for refinement by an appropriate error indicator. Hash-table/linked-list data structures are used to store nodal and element formation. The grid along inter-processor boundaries is refined and coarsened consistently with the update of these data structures via MPI calls. The parallel adaption scheme has been applied to the solution of a transient, three-dimensional, nonlinear, groundwater flow problem. Timings indicate efficiency of the grid refinement process relative to the flow solvermore » calculations.« less
Towards a HPC-oriented parallel implementation of a learning algorithm for bioinformatics applications

PubMed Central

2014-01-01

Background The huge quantity of data produced in Biomedical research needs sophisticated algorithmic methodologies for its storage, analysis, and processing. High Performance Computing (HPC) appears as a magic bullet in this challenge. However, several hard to solve parallelization and load balancing problems arise in this context. Here we discuss the HPC-oriented implementation of a general purpose learning algorithm, originally conceived for DNA analysis and recently extended to treat uncertainty on data (U-BRAIN). The U-BRAIN algorithm is a learning algorithm that finds a Boolean formula in disjunctive normal form (DNF), of approximately minimum complexity, that is consistent with a set of data (instances) which may have missing bits. The conjunctive terms of the formula are computed in an iterative way by identifying, from the given data, a family of sets of conditions that must be satisfied by all the positive instances and violated by all the negative ones; such conditions allow the computation of a set of coefficients (relevances) for each attribute (literal), that form a probability distribution, allowing the selection of the term literals. The great versatility that characterizes it, makes U-BRAIN applicable in many of the fields in which there are data to be analyzed. However the memory and the execution time required by the running are of O(n3) and of O(n5) order, respectively, and so, the algorithm is unaffordable for huge data sets. Results We find mathematical and programming solutions able to lead us towards the implementation of the algorithm U-BRAIN on parallel computers. First we give a Dynamic Programming model of the U-BRAIN algorithm, then we minimize the representation of the relevances. When the data are of great size we are forced to use the mass memory, and depending on where the data are actually stored, the access times can be quite different. According to the evaluation of algorithmic efficiency based on the Disk Model, in order to reduce the costs of the communications between different memories (RAM, Cache, Mass, Virtual) and to achieve efficient I/O performance, we design a mass storage structure able to access its data with a high degree of temporal and spatial locality. Then we develop a parallel implementation of the algorithm. We model it as a SPMD system together to a Message-Passing Programming Paradigm. Here, we adopt the high-level message-passing systems MPI (Message Passing Interface) in the version for the Java programming language, MPJ. The parallel processing is organized into four stages: partitioning, communication, agglomeration and mapping. The decomposition of the U-BRAIN algorithm determines the necessity of a communication protocol design among the processors involved. Efficient synchronization design is also discussed. Conclusions In the context of a collaboration between public and private institutions, the parallel model of U-BRAIN has been implemented and tested on the INTEL XEON E7xxx and E5xxx family of the CRESCO structure of Italian National Agency for New Technologies, Energy and Sustainable Economic Development (ENEA), developed in the framework of the European Grid Infrastructure (EGI), a series of efforts to provide access to high-throughput computing resources across Europe using grid computing techniques. The implementation is able to minimize both the memory space and the execution time. The test data used in this study are IPDATA (Irvine Primate splice- junction DATA set), a subset of HS3D (Homo Sapiens Splice Sites Dataset) and a subset of COSMIC (the Catalogue of Somatic Mutations in Cancer). The execution time and the speed-up on IPDATA reach the best values within about 90 processors. Then the parallelization advantage is balanced by the greater cost of non-local communications between the processors. A similar behaviour is evident on HS3D, but at a greater number of processors, so evidencing the direct relationship between data size and parallelization gain. This behaviour is confirmed on COSMIC. Overall, the results obtained show that the parallel version is up to 30 times faster than the serial one. PMID:25077818
Towards a HPC-oriented parallel implementation of a learning algorithm for bioinformatics applications.

PubMed

D'Angelo, Gianni; Rampone, Salvatore

2014-01-01

The huge quantity of data produced in Biomedical research needs sophisticated algorithmic methodologies for its storage, analysis, and processing. High Performance Computing (HPC) appears as a magic bullet in this challenge. However, several hard to solve parallelization and load balancing problems arise in this context. Here we discuss the HPC-oriented implementation of a general purpose learning algorithm, originally conceived for DNA analysis and recently extended to treat uncertainty on data (U-BRAIN). The U-BRAIN algorithm is a learning algorithm that finds a Boolean formula in disjunctive normal form (DNF), of approximately minimum complexity, that is consistent with a set of data (instances) which may have missing bits. The conjunctive terms of the formula are computed in an iterative way by identifying, from the given data, a family of sets of conditions that must be satisfied by all the positive instances and violated by all the negative ones; such conditions allow the computation of a set of coefficients (relevances) for each attribute (literal), that form a probability distribution, allowing the selection of the term literals. The great versatility that characterizes it, makes U-BRAIN applicable in many of the fields in which there are data to be analyzed. However the memory and the execution time required by the running are of O(n(3)) and of O(n(5)) order, respectively, and so, the algorithm is unaffordable for huge data sets. We find mathematical and programming solutions able to lead us towards the implementation of the algorithm U-BRAIN on parallel computers. First we give a Dynamic Programming model of the U-BRAIN algorithm, then we minimize the representation of the relevances. When the data are of great size we are forced to use the mass memory, and depending on where the data are actually stored, the access times can be quite different. According to the evaluation of algorithmic efficiency based on the Disk Model, in order to reduce the costs of the communications between different memories (RAM, Cache, Mass, Virtual) and to achieve efficient I/O performance, we design a mass storage structure able to access its data with a high degree of temporal and spatial locality. Then we develop a parallel implementation of the algorithm. We model it as a SPMD system together to a Message-Passing Programming Paradigm. Here, we adopt the high-level message-passing systems MPI (Message Passing Interface) in the version for the Java programming language, MPJ. The parallel processing is organized into four stages: partitioning, communication, agglomeration and mapping. The decomposition of the U-BRAIN algorithm determines the necessity of a communication protocol design among the processors involved. Efficient synchronization design is also discussed. In the context of a collaboration between public and private institutions, the parallel model of U-BRAIN has been implemented and tested on the INTEL XEON E7xxx and E5xxx family of the CRESCO structure of Italian National Agency for New Technologies, Energy and Sustainable Economic Development (ENEA), developed in the framework of the European Grid Infrastructure (EGI), a series of efforts to provide access to high-throughput computing resources across Europe using grid computing techniques. The implementation is able to minimize both the memory space and the execution time. The test data used in this study are IPDATA (Irvine Primate splice- junction DATA set), a subset of HS3D (Homo Sapiens Splice Sites Dataset) and a subset of COSMIC (the Catalogue of Somatic Mutations in Cancer). The execution time and the speed-up on IPDATA reach the best values within about 90 processors. Then the parallelization advantage is balanced by the greater cost of non-local communications between the processors. A similar behaviour is evident on HS3D, but at a greater number of processors, so evidencing the direct relationship between data size and parallelization gain. This behaviour is confirmed on COSMIC. Overall, the results obtained show that the parallel version is up to 30 times faster than the serial one.
Statistical mechanics of unsupervised feature learning in a restricted Boltzmann machine with binary synapses

NASA Astrophysics Data System (ADS)

Huang, Haiping

2017-05-01

Revealing hidden features in unlabeled data is called unsupervised feature learning, which plays an important role in pretraining a deep neural network. Here we provide a statistical mechanics analysis of the unsupervised learning in a restricted Boltzmann machine with binary synapses. A message passing equation to infer the hidden feature is derived, and furthermore, variants of this equation are analyzed. A statistical analysis by replica theory describes the thermodynamic properties of the model. Our analysis confirms an entropy crisis preceding the non-convergence of the message passing equation, suggesting a discontinuous phase transition as a key characteristic of the restricted Boltzmann machine. Continuous phase transition is also confirmed depending on the embedded feature strength in the data. The mean-field result under the replica symmetric assumption agrees with that obtained by running message passing algorithms on single instances of finite sizes. Interestingly, in an approximate Hopfield model, the entropy crisis is absent, and a continuous phase transition is observed instead. We also develop an iterative equation to infer the hyper-parameter (temperature) hidden in the data, which in physics corresponds to iteratively imposing Nishimori condition. Our study provides insights towards understanding the thermodynamic properties of the restricted Boltzmann machine learning, and moreover important theoretical basis to build simplified deep networks.
Performance analysis of distributed applications using automatic classification of communication inefficiencies

DOEpatents

Vetter, Jeffrey S.

2005-02-01

The method and system described herein presents a technique for performance analysis that helps users understand the communication behavior of their message passing applications. The method and system described herein may automatically classifies individual communication operations and reveal the cause of communication inefficiencies in the application. This classification allows the developer to quickly focus on the culprits of truly inefficient behavior, rather than manually foraging through massive amounts of performance data. Specifically, the method and system described herein trace the message operations of Message Passing Interface (MPI) applications and then classify each individual communication event using a supervised learning technique: decision tree classification. The decision tree may be trained using microbenchmarks that demonstrate both efficient and inefficient communication. Since the method and system described herein adapt to the target system's configuration through these microbenchmarks, they simultaneously automate the performance analysis process and improve classification accuracy. The method and system described herein may improve the accuracy of performance analysis and dramatically reduce the amount of data that users must encounter.
Distraction Effects of Smoking Cues in Antismoking Messages: Examining Resource Allocation to Message Processing as a Function of Smoking Cues and Argument Strength

PubMed Central

Lee, Sungkyoung; Cappella, Joseph N.

2014-01-01

Findings from previous studies on smoking cues and argument strength in antismoking messages have shown that the presence of smoking cues undermines the persuasiveness of antismoking public service announcements (PSAs) with weak arguments. This study conceptualized smoking cues (i.e., scenes showing smoking-related objects and behaviors) as stimuli motivationally relevant to the former smoker population and examined how smoking cues influence former smokers’ processing of antismoking PSAs. Specifically, by defining smoking cues and the strength of antismoking arguments in terms of resource allocation, this study examined former smokers’ recognition accuracy, memory strength, and memory judgment of visual (i.e., scenes excluding smoking cues) and audio information from antismoking PSAs. In line with previous findings, the results of the study showed that the presence of smoking cues undermined former smokers’ encoding of antismoking arguments, which includes the visual and audio information that compose the main content of antismoking messages. PMID:25477766
Fencing direct memory access data transfers in a parallel active messaging interface of a parallel computer

DOEpatents

Blocksome, Michael A.; Mamidala, Amith R.

2013-09-03

Fencing direct memory access (`DMA`) data transfers in a parallel active messaging interface (`PAMI`) of a parallel computer, the PAMI including data communications endpoints, each endpoint including specifications of a client, a context, and a task, the endpoints coupled for data communications through the PAMI and through DMA controllers operatively coupled to segments of shared random access memory through which the DMA controllers deliver data communications deterministically, including initiating execution through the PAMI of an ordered sequence of active DMA instructions for DMA data transfers between two endpoints, effecting deterministic DMA data transfers through a DMA controller and a segment of shared memory; and executing through the PAMI, with no FENCE accounting for DMA data transfers, an active FENCE instruction, the FENCE instruction completing execution only after completion of all DMA instructions initiated prior to execution of the FENCE instruction for DMA data transfers between the two endpoints.
Fencing direct memory access data transfers in a parallel active messaging interface of a parallel computer

DOEpatents

Blocksome, Michael A; Mamidala, Amith R

2014-02-11

Fencing direct memory access (`DMA`) data transfers in a parallel active messaging interface (`PAMI`) of a parallel computer, the PAMI including data communications endpoints, each endpoint including specifications of a client, a context, and a task, the endpoints coupled for data communications through the PAMI and through DMA controllers operatively coupled to segments of shared random access memory through which the DMA controllers deliver data communications deterministically, including initiating execution through the PAMI of an ordered sequence of active DMA instructions for DMA data transfers between two endpoints, effecting deterministic DMA data transfers through a DMA controller and a segment of shared memory; and executing through the PAMI, with no FENCE accounting for DMA data transfers, an active FENCE instruction, the FENCE instruction completing execution only after completion of all DMA instructions initiated prior to execution of the FENCE instruction for DMA data transfers between the two endpoints.
Earth Observation

NASA Image and Video Library

2013-08-20

Earth observation taken during day pass by an Expedition 36 crew member on board the International Space Station (ISS). Per Twitter message: Looking southwest over northern Africa. Libya, Algeria, Niger.
Application of a distributed network in computational fluid dynamic simulations

NASA Technical Reports Server (NTRS)

Deshpande, Manish; Feng, Jinzhang; Merkle, Charles L.; Deshpande, Ashish

1994-01-01

A general-purpose 3-D, incompressible Navier-Stokes algorithm is implemented on a network of concurrently operating workstations using parallel virtual machine (PVM) and compared with its performance on a CRAY Y-MP and on an Intel iPSC/860. The problem is relatively computationally intensive, and has a communication structure based primarily on nearest-neighbor communication, making it ideally suited to message passing. Such problems are frequently encountered in computational fluid dynamics (CDF), and their solution is increasingly in demand. The communication structure is explicitly coded in the implementation to fully exploit the regularity in message passing in order to produce a near-optimal solution. Results are presented for various grid sizes using up to eight processors.
Approximate message passing with restricted Boltzmann machine priors

NASA Astrophysics Data System (ADS)

Tramel, Eric W.; Drémeau, Angélique; Krzakala, Florent

2016-07-01

Approximate message passing (AMP) has been shown to be an excellent statistical approach to signal inference and compressed sensing problems. The AMP framework provides modularity in the choice of signal prior; here we propose a hierarchical form of the Gauss-Bernoulli prior which utilizes a restricted Boltzmann machine (RBM) trained on the signal support to push reconstruction performance beyond that of simple i.i.d. priors for signals whose support can be well represented by a trained binary RBM. We present and analyze two methods of RBM factorization and demonstrate how these affect signal reconstruction performance within our proposed algorithm. Finally, using the MNIST handwritten digit dataset, we show experimentally that using an RBM allows AMP to approach oracle-support performance.
An Efficient Multiblock Method for Aerodynamic Analysis and Design on Distributed Memory Systems

NASA Technical Reports Server (NTRS)

Reuther, James; Alonso, Juan Jose; Vassberg, John C.; Jameson, Antony; Martinelli, Luigi

1997-01-01

The work presented in this paper describes the application of a multiblock gridding strategy to the solution of aerodynamic design optimization problems involving complex configurations. The design process is parallelized using the MPI (Message Passing Interface) Standard such that it can be efficiently run on a variety of distributed memory systems ranging from traditional parallel computers to networks of workstations. Substantial improvements to the parallel performance of the baseline method are presented, with particular attention to their impact on the scalability of the program as a function of the mesh size. Drag minimization calculations at a fixed coefficient of lift are presented for a business jet configuration that includes the wing, body, pylon, aft-mounted nacelle, and vertical and horizontal tails. An aerodynamic design optimization is performed with both the Euler and Reynolds Averaged Navier-Stokes (RANS) equations governing the flow solution and the results are compared. These sample calculations establish the feasibility of efficient aerodynamic optimization of complete aircraft configurations using the RANS equations as the flow model. There still exists, however, the need for detailed studies of the importance of a true viscous adjoint method which holds the promise of tackling the minimization of not only the wave and induced components of drag, but also the viscous drag.

Visualization Co-Processing of a CFD Simulation

NASA Technical Reports Server (NTRS)

Vaziri, Arsi

1999-01-01

OVERFLOW, a widely used CFD simulation code, is combined with a visualization system, pV3, to experiment with an environment for simulation/visualization co-processing on a SGI Origin 2000 computer(O2K) system. The shared memory version of the solver is used with the O2K 'pfa' preprocessor invoked to automatically discover parallelism in the source code. No other explicit parallelism is enabled. In order to study the scaling and performance of the visualization co-processing system, sample runs are made with different processor groups in the range of 1 to 254 processors. The data exchange between the visualization system and the simulation system is rapid enough for user interactivity when the problem size is small. This shared memory version of OVERFLOW, with minimal parallelization, does not scale well to an increasing number of available processors. The visualization task takes about 18 to 30% of the total processing time and does not appear to be a major contributor to the poor scaling. Improper load balancing and inter-processor communication overhead are contributors to this poor performance. Work is in progress which is aimed at obtaining improved parallel performance of the solver and removing the limitations of serial data transfer to pV3 by examining various parallelization/communication strategies, including the use of the explicit message passing.
SpaceWire Driver Software for Special DSPs

NASA Technical Reports Server (NTRS)

Clark, Douglas; Lux, James; Nishimoto, Kouji; Lang, Minh

2003-01-01

A computer program provides a high-level C-language interface to electronics circuitry that controls a SpaceWire interface in a system based on a space qualified version of the ADSP-21020 digital signal processor (DSP). SpaceWire is a spacecraft-oriented standard for packet-switching data-communication networks that comprise nodes connected through bidirectional digital serial links that utilize low-voltage differential signaling (LVDS). The software is tailored to the SMCS-332 application-specific integrated circuit (ASIC) (also available as the TSS901E), which provides three highspeed (150 Mbps) serial point-to-point links compliant with the proposed Institute of Electrical and Electronics Engineers (IEEE) Standard 1355.2 and equivalent European Space Agency (ESA) Standard ECSS-E-50-12. In the specific application of this software, the SpaceWire ASIC was combined with the DSP processor, memory, and control logic in a Multi-Chip Module DSP (MCM-DSP). The software is a collection of low-level driver routines that provide a simple message-passing application programming interface (API) for software running on the DSP. Routines are provided for interrupt-driven access to the two styles of interface provided by the SMCS: (1) the "word at a time" conventional host interface (HOCI); and (2) a higher performance "dual port memory" style interface (COMI).
Xyce parallel electronic simulator users guide, version 6.1

DOE Office of Scientific and Technical Information (OSTI.GOV)

Keiter, Eric R; Mei, Ting; Russo, Thomas V.

This manual describes the use of the Xyce Parallel Electronic Simulator. Xyce has been designed as a SPICE-compatible, high-performance analog circuit simulator, and has been written to support the simulation needs of the Sandia National Laboratories electrical designers. This development has focused on improving capability over the current state-of-the-art in the following areas; Capability to solve extremely large circuit problems by supporting large-scale parallel computing platforms (up to thousands of processors). This includes support for most popular parallel and serial computers; A differential-algebraic-equation (DAE) formulation, which better isolates the device model package from solver algorithms. This allows one to developmore » new types of analysis without requiring the implementation of analysis-specific device models; Device models that are specifically tailored to meet Sandia's needs, including some radiationaware devices (for Sandia users only); and Object-oriented code design and implementation using modern coding practices. Xyce is a parallel code in the most general sense of the phrase-a message passing parallel implementation-which allows it to run efficiently a wide range of computing platforms. These include serial, shared-memory and distributed-memory parallel platforms. Attention has been paid to the specific nature of circuit-simulation problems to ensure that optimal parallel efficiency is achieved as the number of processors grows.« less
Xyce parallel electronic simulator users' guide, Version 6.0.1.

DOE Office of Scientific and Technical Information (OSTI.GOV)

Keiter, Eric R; Mei, Ting; Russo, Thomas V.

This manual describes the use of the Xyce Parallel Electronic Simulator. Xyce has been designed as a SPICE-compatible, high-performance analog circuit simulator, and has been written to support the simulation needs of the Sandia National Laboratories electrical designers. This development has focused on improving capability over the current state-of-the-art in the following areas: Capability to solve extremely large circuit problems by supporting large-scale parallel computing platforms (up to thousands of processors). This includes support for most popular parallel and serial computers. A differential-algebraic-equation (DAE) formulation, which better isolates the device model package from solver algorithms. This allows one to developmore » new types of analysis without requiring the implementation of analysis-specific device models. Device models that are specifically tailored to meet Sandias needs, including some radiationaware devices (for Sandia users only). Object-oriented code design and implementation using modern coding practices. Xyce is a parallel code in the most general sense of the phrase a message passing parallel implementation which allows it to run efficiently a wide range of computing platforms. These include serial, shared-memory and distributed-memory parallel platforms. Attention has been paid to the specific nature of circuit-simulation problems to ensure that optimal parallel efficiency is achieved as the number of processors grows.« less
Xyce parallel electronic simulator users guide, version 6.0.

DOE Office of Scientific and Technical Information (OSTI.GOV)

Keiter, Eric R; Mei, Ting; Russo, Thomas V.

This manual describes the use of the Xyce Parallel Electronic Simulator. Xyce has been designed as a SPICE-compatible, high-performance analog circuit simulator, and has been written to support the simulation needs of the Sandia National Laboratories electrical designers. This development has focused on improving capability over the current state-of-the-art in the following areas: Capability to solve extremely large circuit problems by supporting large-scale parallel computing platforms (up to thousands of processors). This includes support for most popular parallel and serial computers. A differential-algebraic-equation (DAE) formulation, which better isolates the device model package from solver algorithms. This allows one to developmore » new types of analysis without requiring the implementation of analysis-specific device models. Device models that are specifically tailored to meet Sandias needs, including some radiationaware devices (for Sandia users only). Object-oriented code design and implementation using modern coding practices. Xyce is a parallel code in the most general sense of the phrase a message passing parallel implementation which allows it to run efficiently a wide range of computing platforms. These include serial, shared-memory and distributed-memory parallel platforms. Attention has been paid to the specific nature of circuit-simulation problems to ensure that optimal parallel efficiency is achieved as the number of processors grows.« less
Use of Hilbert Curves in Parallelized CUDA code: Interaction of Interstellar Atoms with the Heliosphere

NASA Astrophysics Data System (ADS)

Destefano, Anthony; Heerikhuisen, Jacob

2015-04-01

Fully 3D particle simulations can be a computationally and memory expensive task, especially when high resolution grid cells are required. The problem becomes further complicated when parallelization is needed. In this work we focus on computational methods to solve these difficulties. Hilbert curves are used to map the 3D particle space to the 1D contiguous memory space. This method of organization allows for minimized cache misses on the GPU as well as a sorted structure that is equivalent to an octal tree data structure. This type of sorted structure is attractive for uses in adaptive mesh implementations due to the logarithm search time. Implementations using the Message Passing Interface (MPI) library and NVIDIA's parallel computing platform CUDA will be compared, as MPI is commonly used on server nodes with many CPU's. We will also compare static grid structures with those of adaptive mesh structures. The physical test bed will be simulating heavy interstellar atoms interacting with a background plasma, the heliosphere, simulated from fully consistent coupled MHD/kinetic particle code. It is known that charge exchange is an important factor in space plasmas, specifically it modifies the structure of the heliosphere itself. We would like to thank the Alabama Supercomputer Authority for the use of their computational resources.
Intranode data communications in a parallel computer

DOEpatents

Archer, Charles J; Blocksome, Michael A; Miller, Douglas R; Ratterman, Joseph D; Smith, Brian E

2014-01-07

Intranode data communications in a parallel computer that includes compute nodes configured to execute processes, where the data communications include: allocating, upon initialization of a first process of a computer node, a region of shared memory; establishing, by the first process, a predefined number of message buffers, each message buffer associated with a process to be initialized on the compute node; sending, to a second process on the same compute node, a data communications message without determining whether the second process has been initialized, including storing the data communications message in the message buffer of the second process; and upon initialization of the second process: retrieving, by the second process, a pointer to the second process's message buffer; and retrieving, by the second process from the second process's message buffer in dependence upon the pointer, the data communications message sent by the first process.
Intranode data communications in a parallel computer

DOEpatents

Archer, Charles J; Blocksome, Michael A; Miller, Douglas R; Ratterman, Joseph D; Smith, Brian E

2013-07-23

Intranode data communications in a parallel computer that includes compute nodes configured to execute processes, where the data communications include: allocating, upon initialization of a first process of a compute node, a region of shared memory; establishing, by the first process, a predefined number of message buffers, each message buffer associated with a process to be initialized on the compute node; sending, to a second process on the same compute node, a data communications message without determining whether the second process has been initialized, including storing the data communications message in the message buffer of the second process; and upon initialization of the second process: retrieving, by the second process, a pointer to the second process's message buffer; and retrieving, by the second process from the second process's message buffer in dependence upon the pointer, the data communications message sent by the first process.
Signaling completion of a message transfer from an origin compute node to a target compute node

DOEpatents

Blocksome, Michael A [Rochester, MN

2011-02-15

Signaling completion of a message transfer from an origin node to a target node includes: sending, by an origin DMA engine, an RTS message, the RTS message specifying an application message for transfer to the target node from the origin node; receiving, by the origin DMA engine, a remote get message containing a data descriptor for the message and a completion notification descriptor, the completion notification descriptor specifying a local memory FIFO data transfer operation for transferring data locally on the origin node; inserting, by the origin DMA engine in an injection FIFO buffer, the data descriptor followed by the completion notification descriptor; transferring, by the origin DMA engine to the target node, the message in dependence upon the data descriptor; and notifying, by the origin DMA engine, the application that transfer of the message is complete in dependence upon the completion notification descriptor.
Passing messages between biological networks to refine predicted interactions.

PubMed

Glass, Kimberly; Huttenhower, Curtis; Quackenbush, John; Yuan, Guo-Cheng

2013-01-01

Regulatory network reconstruction is a fundamental problem in computational biology. There are significant limitations to such reconstruction using individual datasets, and increasingly people attempt to construct networks using multiple, independent datasets obtained from complementary sources, but methods for this integration are lacking. We developed PANDA (Passing Attributes between Networks for Data Assimilation), a message-passing model using multiple sources of information to predict regulatory relationships, and used it to integrate protein-protein interaction, gene expression, and sequence motif data to reconstruct genome-wide, condition-specific regulatory networks in yeast as a model. The resulting networks were not only more accurate than those produced using individual data sets and other existing methods, but they also captured information regarding specific biological mechanisms and pathways that were missed using other methodologies. PANDA is scalable to higher eukaryotes, applicable to specific tissue or cell type data and conceptually generalizable to include a variety of regulatory, interaction, expression, and other genome-scale data. An implementation of the PANDA algorithm is available at www.sourceforge.net/projects/panda-net.
Influence of emotional content and context on memory in mild Alzheimer's disease.

PubMed

Perrin, Margaux; Henaff, Marie-Anne; Padovan, Catherine; Faillenot, Isabelle; Merville, Adrien; Krolak-Salmon, Pierre

2012-01-01

Healthy subjects remember emotional stimuli better than neutral, as well as stimuli embedded in an emotional context. This better memory of emotional messages is linked to an amygdalo-hippocampal cooperation taking place in a larger fronto-temporal network particularly sensitive to pathological aging. Amygdala is mainly involved in gist memory of emotional messages. Whether emotional content or context enhances memory in mild Alzheimer's disease (AD) patients is still debated. The aim of the present study is to examine the influence of emotional content and emotional context on the memory in mild AD, and whether this influence is linked to amygdala volume. Fifteen patients affected by mild AD and 15 age-matched controls were submitted to series of negative, positive, and neutral pictures. Each series was embedded in an emotional or neutral sound context. At the end of each series, participants had to freely recall pictures, and answer questions about each picture. Amygdala volumes were measured on patient 3D-MRI scans. In the present study, emotional content significantly favored memory of gist but not of details in healthy elderly and in AD patients. Patients' amygdala volume was positively correlated to emotional content memory effect, implying a reduced memory benefit from emotional content when amygdala was atrophied. A positive context enhanced memory of pictures in healthy elderly, but not in AD, corroborating early fronto-temporal dysfunction and early working memory limitation in this disease.
Defining Audio/Video Redundancy from a Limited Capacity Information Processing Perspective.

ERIC Educational Resources Information Center

Lang, Annie

1995-01-01

Investigates whether audio/video redundancy improves memory for television messages. Suggests a theoretical framework for classifying previous work and reinterpreting the results. Suggests general support for the notion that redundancy levels affect the capacity requirements of the message, which impact differentially on audio or visual…
Talking Wheelchair

NASA Technical Reports Server (NTRS)

1981-01-01

Communication is made possible for disabled individuals by means of an electronic system, developed at Stanford University's School of Medicine, which produces highly intelligible synthesized speech. Familiarly known as the "talking wheelchair" and formally as the Versatile Portable Speech Prosthesis (VPSP). Wheelchair mounted system consists of a word processor, a video screen, a voice synthesizer and a computer program which instructs the synthesizer how to produce intelligible sounds in response to user commands. Computer's memory contains 925 words plus a number of common phrases and questions. Memory can also store several thousand other words of the user's choice. Message units are selected by operating a simple switch, joystick or keyboard. Completed message appears on the video screen, then user activates speech synthesizer, which generates a voice with a somewhat mechanical tone. With the keyboard, an experienced user can construct messages as rapidly as 30 words per minute.
Earth observation taken by the Expedition 46 crew

NASA Image and Video Library

2016-01-23

ISS046e021993 (01/23/2016) --- Earth observation of the coast of Oman taken during a night pass by the Expedition 46 crew aboard the International Space Station. NASA astronaut Tim Kopra tweeted this image out with this message: "Passing over the Gulf of #Oman at night -- city lights of #Muscat #Dubai #AbuDhabi and #Doha in the distance".
Optimized collectives using a DMA on a parallel computer

DOEpatents

Chen, Dong [Croton On Hudson, NY; Gabor, Dozsa [Ardsley, NY; Giampapa, Mark E [Irvington, NY; Heidelberger,; Phillip, [Cortlandt Manor, NY

2011-02-08

Optimizing collective operations using direct memory access controller on a parallel computer, in one aspect, may comprise establishing a byte counter associated with a direct memory access controller for each submessage in a message. The byte counter includes at least a base address of memory and a byte count associated with a submessage. A byte counter associated with a submessage is monitored to determine whether at least a block of data of the submessage has been received. The block of data has a predetermined size, for example, a number of bytes. The block is processed when the block has been fully received, for example, when the byte count indicates all bytes of the block have been received. The monitoring and processing may continue for all blocks in all submessages in the message.
Flow of Emotional Messages in Artificial Social Networks

NASA Astrophysics Data System (ADS)

Chmiel, Anna; Hołyst, Janusz A.

Models of message flows in an artificial group of users communicating via the Internet are introduced and investigated using numerical simulations. We assumed that messages possess an emotional character with a positive valence and that the willingness to send the next affective message to a given person increases with the number of messages received from this person. As a result, the weights of links between group members evolve over time. Memory effects are introduced, taking into account that the preferential selection of message receivers depends on the communication intensity during the recent period only. We also model the phenomenon of secondary social sharing when the reception of an emotional e-mail triggers the distribution of several emotional e-mails to other people.
Cluster Computing For Real Time Seismic Array Analysis.

NASA Astrophysics Data System (ADS)

Martini, M.; Giudicepietro, F.

A seismic array is an instrument composed by a dense distribution of seismic sen- sors that allow to measure the directional properties of the wavefield (slowness or wavenumber vector) radiated by a seismic source. Over the last years arrays have been widely used in different fields of seismological researches. In particular they are applied in the investigation of seismic sources on volcanoes where they can be suc- cessfully used for studying the volcanic microtremor and long period events which are critical for getting information on the volcanic systems evolution. For this reason arrays could be usefully employed for the volcanoes monitoring, however the huge amount of data produced by this type of instruments and the processing techniques which are quite time consuming limited their potentiality for this application. In order to favor a direct application of arrays techniques to continuous volcano monitoring we designed and built a small PC cluster able to near real time computing the kinematics properties of the wavefield (slowness or wavenumber vector) produced by local seis- mic source. The cluster is composed of 8 Intel Pentium-III bi-processors PC working at 550 MHz, and has 4 Gigabytes of RAM memory. It runs under Linux operating system. The developed analysis software package is based on the Multiple SIgnal Classification (MUSIC) algorithm and is written in Fortran. The message-passing part is based upon the LAM programming environment package, an open-source imple- mentation of the Message Passing Interface (MPI). The developed software system includes modules devote to receiving date by internet and graphical applications for the continuous displaying of the processing results. The system has been tested with a data set collected during a seismic experiment conducted on Etna in 1999 when two dense seismic arrays have been deployed on the northeast and the southeast flanks of this volcano. A real time continuous acquisition system has been simulated by a pro- gram which reads data from disk files and send them to a remote host by using the Internet protocols.
Everyday memory impairment in patients with temporal lobe epilepsy caused by hippocampal sclerosis.

PubMed

Rzezak, Patrícia; Lima, Ellen Marise; Gargaro, Ana Carolina; Coimbra, Erica; de Vincentiis, Silvia; Velasco, Tonicarlo Rodrigues; Leite, João Pereira; Busatto, Geraldo F; Valente, Kette D

2017-04-01

Patients with temporal lobe epilepsy caused by hippocampal sclerosis (TLE-HS) have episodic memory impairment. Memory has rarely been evaluated using an ecologic measure, even though performance on these tests is more related to patients' memory complaints. We aimed to measure everyday memory of patients with TLE-HS to age- and gender-matched controls. We evaluated 31 patients with TLE-HS and 34 healthy controls, without epilepsy and psychiatric disorders, using the Rivermead Behavioral Memory Test (RBMT), Visual Reproduction (WMS-III) and Logical Memory (WMS-III). We evaluated the impact of clinical variables such as the age of onset, epilepsy duration, AED use, history of status epilepticus, and seizure frequency on everyday memory. Statistical analyses were performed using MANCOVA with years of education as a confounding factor. Patients showed worse performance than controls on traditional memory tests and in the overall score of RBMT. Patients had more difficulties to recall names, a hidden belonging, to deliver a message, object recognition, to remember a story full of details, a previously presented short route, and in time and space orientation. Clinical epilepsy variables were not associated with RBMT performance. Memory span and working memory were correlated with worse performance on RBMT. Patients with TLE-HS demonstrated deficits in everyday memory functions. A standard neuropsychological battery, designed to assess episodic memory, would not evaluate these impairments. Impairment in recalling names, routes, stories, messages, and space/time disorientation can adversely impact social adaptation, and we must consider these ecologic measures with greater attention in the neuropsychological evaluation of patients with memory complaints. Copyright © 2017 Elsevier Inc. All rights reserved.
Mean-field message-passing equations in the Hopfield model and its generalizations

NASA Astrophysics Data System (ADS)

Mézard, Marc

2017-02-01

Motivated by recent progress in using restricted Boltzmann machines as preprocessing algorithms for deep neural network, we revisit the mean-field equations [belief-propagation and Thouless-Anderson Palmer (TAP) equations] in the best understood of such machines, namely the Hopfield model of neural networks, and we explicit how they can be used as iterative message-passing algorithms, providing a fast method to compute the local polarizations of neurons. In the "retrieval phase", where neurons polarize in the direction of one memorized pattern, we point out a major difference between the belief propagation and TAP equations: The set of belief propagation equations depends on the pattern which is retrieved, while one can use a unique set of TAP equations. This makes the latter method much better suited for applications in the learning process of restricted Boltzmann machines. In the case where the patterns memorized in the Hopfield model are not independent, but are correlated through a combinatorial structure, we show that the TAP equations have to be modified. This modification can be seen either as an alteration of the reaction term in TAP equations or, more interestingly, as the consequence of message passing on a graphical model with several hidden layers, where the number of hidden layers depends on the depth of the correlations in the memorized patterns. This layered structure is actually necessary when one deals with more general restricted Boltzmann machines.
Earth Observation

NASA Image and Video Library

2013-08-03

Earth observation taken during day pass by an Expedition 36 crew member on board the International Space Station (ISS). Per Twitter message: Perhaps a dandelion losing its seeds in the wind? Love clouds!

Paralysis

MedlinePlus

Paralysis is the loss of muscle function in part of your body. It happens when something goes ... way messages pass between your brain and muscles. Paralysis can be complete or partial. It can occur ...
Advanced Numerical Techniques of Performance Evaluation. Volume 2

DTIC Science & Technology

1990-06-01

multiprocessor environment. This factor is determined by the overhead of the primitives available in the system ( semaphore , monitor , or message... semaphore , monitor , or message passing primitives ) and U the programming ability of the user who implements the simulation. " t,: the sequential...Warp Operating System . i Pro" lftevcnth ACM Symposum on Operating Systems Princlplcs, pages 77 9:3, Auslin, TX, Nov wicr 1987. ACM. [121 D.R. Jefferson
Benchmarking hypercube hardware and software

NASA Technical Reports Server (NTRS)

Grunwald, Dirk C.; Reed, Daniel A.

1986-01-01

It was long a truism in computer systems design that balanced systems achieve the best performance. Message passing parallel processors are no different. To quantify the balance of a hypercube design, an experimental methodology was developed and the associated suite of benchmarks was applied to several existing hypercubes. The benchmark suite includes tests of both processor speed in the absence of internode communication and message transmission speed as a function of communication patterns.
Data transfer using complete bipartite graph

NASA Astrophysics Data System (ADS)

Chandrasekaran, V. M.; Praba, B.; Manimaran, A.; Kailash, G.

2017-11-01

Information exchange extent is an estimation of the amount of information sent between two focuses on a framework in a given time period. It is an extremely significant perception in present world. There are many ways of message passing in the present situations. Some of them are through encryption, decryption, by using complete bipartite graph. In this paper, we recommend a method for communication using messages through encryption of a complete bipartite graph.
Tough2{_}MP: A parallel version of TOUGH2

DOE Office of Scientific and Technical Information (OSTI.GOV)

Zhang, Keni; Wu, Yu-Shu; Ding, Chris

2003-04-09

TOUGH2{_}MP is a massively parallel version of TOUGH2. It was developed for running on distributed-memory parallel computers to simulate large simulation problems that may not be solved by the standard, single-CPU TOUGH2 code. The new code implements an efficient massively parallel scheme, while preserving the full capacity and flexibility of the original TOUGH2 code. The new software uses the METIS software package for grid partitioning and AZTEC software package for linear-equation solving. The standard message-passing interface is adopted for communication among processors. Numerical performance of the current version code has been tested on CRAY-T3E and IBM RS/6000 SP platforms. Inmore » addition, the parallel code has been successfully applied to real field problems of multi-million-cell simulations for three-dimensional multiphase and multicomponent fluid and heat flow, as well as solute transport. In this paper, we will review the development of the TOUGH2{_}MP, and discuss the basic features, modules, and their applications.« less
MPI, HPF or OpenMP: A Study with the NAS Benchmarks

NASA Technical Reports Server (NTRS)

Jin, Hao-Qiang; Frumkin, Michael; Hribar, Michelle; Waheed, Abdul; Yan, Jerry; Saini, Subhash (Technical Monitor)

1999-01-01

Porting applications to new high performance parallel and distributed platforms is a challenging task. Writing parallel code by hand is time consuming and costly, but the task can be simplified by high level languages and would even better be automated by parallelizing tools and compilers. The definition of HPF (High Performance Fortran, based on data parallel model) and OpenMP (based on shared memory parallel model) standards has offered great opportunity in this respect. Both provide simple and clear interfaces to language like FORTRAN and simplify many tedious tasks encountered in writing message passing programs. In our study we implemented the parallel versions of the NAS Benchmarks with HPF and OpenMP directives. Comparison of their performance with the MPI implementation and pros and cons of different approaches will be discussed along with experience of using computer-aided tools to help parallelize these benchmarks. Based on the study,potentials of applying some of the techniques to realistic aerospace applications will be presented
MPI, HPF or OpenMP: A Study with the NAS Benchmarks

NASA Technical Reports Server (NTRS)

Jin, H.; Frumkin, M.; Hribar, M.; Waheed, A.; Yan, J.; Saini, Subhash (Technical Monitor)

1999-01-01

Porting applications to new high performance parallel and distributed platforms is a challenging task. Writing parallel code by hand is time consuming and costly, but this task can be simplified by high level languages and would even better be automated by parallelizing tools and compilers. The definition of HPF (High Performance Fortran, based on data parallel model) and OpenMP (based on shared memory parallel model) standards has offered great opportunity in this respect. Both provide simple and clear interfaces to language like FORTRAN and simplify many tedious tasks encountered in writing message passing programs. In our study, we implemented the parallel versions of the NAS Benchmarks with HPF and OpenMP directives. Comparison of their performance with the MPI implementation and pros and cons of different approaches will be discussed along with experience of using computer-aided tools to help parallelize these benchmarks. Based on the study, potentials of applying some of the techniques to realistic aerospace applications will be presented.
Extending the length and time scales of Gram-Schmidt Lyapunov vector computations

NASA Astrophysics Data System (ADS)

Costa, Anthony B.; Green, Jason R.

2013-08-01

Lyapunov vectors have found growing interest recently due to their ability to characterize systems out of thermodynamic equilibrium. The computation of orthogonal Gram-Schmidt vectors requires multiplication and QR decomposition of large matrices, which grow as N2 (with the particle count). This expense has limited such calculations to relatively small systems and short time scales. Here, we detail two implementations of an algorithm for computing Gram-Schmidt vectors. The first is a distributed-memory message-passing method using Scalapack. The second uses the newly-released MAGMA library for GPUs. We compare the performance of both codes for Lennard-Jones fluids from N=100 to 1300 between Intel Nahalem/Infiniband DDR and NVIDIA C2050 architectures. To our best knowledge, these are the largest systems for which the Gram-Schmidt Lyapunov vectors have been computed, and the first time their calculation has been GPU-accelerated. We conclude that Lyapunov vector calculations can be significantly extended in length and time by leveraging the power of GPU-accelerated linear algebra.
Parallel Fortran-MPI software for numerical inversion of the Laplace transform and its application to oscillatory water levels in groundwater environments

USGS Publications Warehouse

Zhan, X.

2005-01-01

A parallel Fortran-MPI (Message Passing Interface) software for numerical inversion of the Laplace transform based on a Fourier series method is developed to meet the need of solving intensive computational problems involving oscillatory water level's response to hydraulic tests in a groundwater environment. The software is a parallel version of ACM (The Association for Computing Machinery) Transactions on Mathematical Software (TOMS) Algorithm 796. Running 38 test examples indicated that implementation of MPI techniques with distributed memory architecture speedups the processing and improves the efficiency. Applications to oscillatory water levels in a well during aquifer tests are presented to illustrate how this package can be applied to solve complicated environmental problems involved in differential and integral equations. The package is free and is easy to use for people with little or no previous experience in using MPI but who wish to get off to a quick start in parallel computing. ?? 2004 Elsevier Ltd. All rights reserved.
Accelerating three-dimensional FDTD calculations on GPU clusters for electromagnetic field simulation.

PubMed

Nagaoka, Tomoaki; Watanabe, Soichi

2012-01-01

Electromagnetic simulation with anatomically realistic computational human model using the finite-difference time domain (FDTD) method has recently been performed in a number of fields in biomedical engineering. To improve the method's calculation speed and realize large-scale computing with the computational human model, we adapt three-dimensional FDTD code to a multi-GPU cluster environment with Compute Unified Device Architecture and Message Passing Interface. Our multi-GPU cluster system consists of three nodes. The seven GPU boards (NVIDIA Tesla C2070) are mounted on each node. We examined the performance of the FDTD calculation on multi-GPU cluster environment. We confirmed that the FDTD calculation on the multi-GPU clusters is faster than that on a multi-GPU (a single workstation), and we also found that the GPU cluster system calculate faster than a vector supercomputer. In addition, our GPU cluster system allowed us to perform the large-scale FDTD calculation because were able to use GPU memory of over 100 GB.
Load Balancing Strategies for Multiphase Flows on Structured Grids

NASA Astrophysics Data System (ADS)

Olshefski, Kristopher; Owkes, Mark

2017-11-01

The computation time required to perform large simulations of complex systems is currently one of the leading bottlenecks of computational research. Parallelization allows multiple processing cores to perform calculations simultaneously and reduces computational times. However, load imbalances between processors waste computing resources as processors wait for others to complete imbalanced tasks. In multiphase flows, these imbalances arise due to the additional computational effort required at the gas-liquid interface. However, many current load balancing schemes are only designed for unstructured grid applications. The purpose of this research is to develop a load balancing strategy while maintaining the simplicity of a structured grid. Several approaches are investigated including brute force oversubscription, node oversubscription through Message Passing Interface (MPI) commands, and shared memory load balancing using OpenMP. Each of these strategies are tested with a simple one-dimensional model prior to implementation into the three-dimensional NGA code. Current results show load balancing will reduce computational time by at least 30%.
Parallel implementation of a Lagrangian-based model on an adaptive mesh in C++: Application to sea-ice

NASA Astrophysics Data System (ADS)

Samaké, Abdoulaye; Rampal, Pierre; Bouillon, Sylvain; Ólason, Einar

2017-12-01

We present a parallel implementation framework for a new dynamic/thermodynamic sea-ice model, called neXtSIM, based on the Elasto-Brittle rheology and using an adaptive mesh. The spatial discretisation of the model is done using the finite-element method. The temporal discretisation is semi-implicit and the advection is achieved using either a pure Lagrangian scheme or an Arbitrary Lagrangian Eulerian scheme (ALE). The parallel implementation presented here focuses on the distributed-memory approach using the message-passing library MPI. The efficiency and the scalability of the parallel algorithms are illustrated by the numerical experiments performed using up to 500 processor cores of a cluster computing system. The performance obtained by the proposed parallel implementation of the neXtSIM code is shown being sufficient to perform simulations for state-of-the-art sea ice forecasting and geophysical process studies over geographical domain of several millions squared kilometers like the Arctic region.
Spectral Element Method for the Simulation of Unsteady Compressible Flows

NASA Technical Reports Server (NTRS)

Diosady, Laslo Tibor; Murman, Scott M.

2013-01-01

This work uses a discontinuous-Galerkin spectral-element method (DGSEM) to solve the compressible Navier-Stokes equations [1{3]. The inviscid ux is computed using the approximate Riemann solver of Roe [4]. The viscous fluxes are computed using the second form of Bassi and Rebay (BR2) [5] in a manner consistent with the spectral-element approximation. The method of lines with the classical 4th-order explicit Runge-Kutta scheme is used for time integration. Results for polynomial orders up to p = 15 (16th order) are presented. The code is parallelized using the Message Passing Interface (MPI). The computations presented in this work are performed using the Sandy Bridge nodes of the NASA Pleiades supercomputer at NASA Ames Research Center. Each Sandy Bridge node consists of 2 eight-core Intel Xeon E5-2670 processors with a clock speed of 2.6Ghz and 2GB per core memory. On a Sandy Bridge node the Tau Benchmark [6] runs in a time of 7.6s.
Performance Characteristics of the Multi-Zone NAS Parallel Benchmarks

NASA Technical Reports Server (NTRS)

Jin, Haoqiang; VanderWijngaart, Rob F.

2003-01-01

We describe a new suite of computational benchmarks that models applications featuring multiple levels of parallelism. Such parallelism is often available in realistic flow computations on systems of grids, but had not previously been captured in bench-marks. The new suite, named NPB Multi-Zone, is extended from the NAS Parallel Benchmarks suite, and involves solving the application benchmarks LU, BT and SP on collections of loosely coupled discretization meshes. The solutions on the meshes are updated independently, but after each time step they exchange boundary value information. This strategy provides relatively easily exploitable coarse-grain parallelism between meshes. Three reference implementations are available: one serial, one hybrid using the Message Passing Interface (MPI) and OpenMP, and another hybrid using a shared memory multi-level programming model (SMP+OpenMP). We examine the effectiveness of hybrid parallelization paradigms in these implementations on three different parallel computers. We also use an empirical formula to investigate the performance characteristics of the multi-zone benchmarks.
Employing multi-GPU power for molecular dynamics simulation: an extension of GALAMOST

NASA Astrophysics Data System (ADS)

Zhu, You-Liang; Pan, Deng; Li, Zhan-Wei; Liu, Hong; Qian, Hu-Jun; Zhao, Yang; Lu, Zhong-Yuan; Sun, Zhao-Yan

2018-04-01

We describe the algorithm of employing multi-GPU power on the basis of Message Passing Interface (MPI) domain decomposition in a molecular dynamics code, GALAMOST, which is designed for the coarse-grained simulation of soft matters. The code of multi-GPU version is developed based on our previous single-GPU version. In multi-GPU runs, one GPU takes charge of one domain and runs single-GPU code path. The communication between neighbouring domains takes a similar algorithm of CPU-based code of LAMMPS, but is optimised specifically for GPUs. We employ a memory-saving design which can enlarge maximum system size at the same device condition. An optimisation algorithm is employed to prolong the update period of neighbour list. We demonstrate good performance of multi-GPU runs on the simulation of Lennard-Jones liquid, dissipative particle dynamics liquid, polymer and nanoparticle composite, and two-patch particles on workstation. A good scaling of many nodes on cluster for two-patch particles is presented.
The influence of banner advertisements on attention and memory: human faces with averted gaze can enhance advertising effectiveness

PubMed Central

Sajjacholapunt, Pitch; Ball, Linden J.

2014-01-01

Research suggests that banner advertisements used in online marketing are often overlooked, especially when positioned horizontally on webpages. Such inattention invariably gives rise to an inability to remember advertising brands and messages, undermining the effectiveness of this marketing method. Recent interest has focused on whether human faces within banner advertisements can increase attention to the information they contain, since the gaze cues conveyed by faces can influence where observers look. We report an experiment that investigated the efficacy of faces located in banner advertisements to enhance the attentional processing and memorability of banner contents. We tracked participants' eye movements when they examined webpages containing either bottom-right vertical banners or bottom-center horizontal banners. We also manipulated facial information such that banners either contained no face, a face with mutual gaze or a face with averted gaze. We additionally assessed people's memories for brands and advertising messages. Results indicated that relative to other conditions, the condition involving faces with averted gaze increased attention to the banner overall, as well as to the advertising text and product. Memorability of the brand and advertising message was also enhanced. Conversely, in the condition involving faces with mutual gaze, the focus of attention was localized more on the face region rather than on the text or product, weakening any memory benefits for the brand and advertising message. This detrimental impact of mutual gaze on attention to advertised products was especially marked for vertical banners. These results demonstrate that the inclusion of human faces with averted gaze in banner advertisements provides a promising means for marketers to increase the attention paid to such adverts, thereby enhancing memory for advertising information. PMID:24624104
The influence of banner advertisements on attention and memory: human faces with averted gaze can enhance advertising effectiveness.

PubMed

Sajjacholapunt, Pitch; Ball, Linden J

2014-01-01

Research suggests that banner advertisements used in online marketing are often overlooked, especially when positioned horizontally on webpages. Such inattention invariably gives rise to an inability to remember advertising brands and messages, undermining the effectiveness of this marketing method. Recent interest has focused on whether human faces within banner advertisements can increase attention to the information they contain, since the gaze cues conveyed by faces can influence where observers look. We report an experiment that investigated the efficacy of faces located in banner advertisements to enhance the attentional processing and memorability of banner contents. We tracked participants' eye movements when they examined webpages containing either bottom-right vertical banners or bottom-center horizontal banners. We also manipulated facial information such that banners either contained no face, a face with mutual gaze or a face with averted gaze. We additionally assessed people's memories for brands and advertising messages. Results indicated that relative to other conditions, the condition involving faces with averted gaze increased attention to the banner overall, as well as to the advertising text and product. Memorability of the brand and advertising message was also enhanced. Conversely, in the condition involving faces with mutual gaze, the focus of attention was localized more on the face region rather than on the text or product, weakening any memory benefits for the brand and advertising message. This detrimental impact of mutual gaze on attention to advertised products was especially marked for vertical banners. These results demonstrate that the inclusion of human faces with averted gaze in banner advertisements provides a promising means for marketers to increase the attention paid to such adverts, thereby enhancing memory for advertising information.
Trellis Tone Modulation Multiple-Access for Peer Discovery in D2D Networks

PubMed Central

Lim, Chiwoo; Kim, Sang-Hyo

2018-01-01

In this paper, a new non-orthogonal multiple-access scheme, trellis tone modulation multiple-access (TTMMA), is proposed for peer discovery of distributed device-to-device (D2D) communication. The range and capacity of discovery are important performance metrics in peer discovery. The proposed trellis tone modulation uses single-tone transmission and achieves a long discovery range due to its low Peak-to-Average Power Ratio (PAPR). The TTMMA also exploits non-orthogonal resource assignment to increase the discovery capacity. For the multi-user detection of superposed multiple-access signals, a message-passing algorithm with supplementary schemes are proposed. With TTMMA and its message-passing demodulation, approximately 1.5 times the number of devices are discovered compared to the conventional frequency division multiple-access (FDMA)-based discovery. PMID:29673167
Statistical physics of hard combinatorial optimization: Vertex cover problem

NASA Astrophysics Data System (ADS)

Zhao, Jin-Hua; Zhou, Hai-Jun

2014-07-01

Typical-case computation complexity is a research topic at the boundary of computer science, applied mathematics, and statistical physics. In the last twenty years, the replica-symmetry-breaking mean field theory of spin glasses and the associated message-passing algorithms have greatly deepened our understanding of typical-case computation complexity. In this paper, we use the vertex cover problem, a basic nondeterministic-polynomial (NP)-complete combinatorial optimization problem of wide application, as an example to introduce the statistical physical methods and algorithms. We do not go into the technical details but emphasize mainly the intuitive physical meanings of the message-passing equations. A nonfamiliar reader shall be able to understand to a large extent the physics behind the mean field approaches and to adjust the mean field methods in solving other optimization problems.
Trellis Tone Modulation Multiple-Access for Peer Discovery in D2D Networks.

PubMed

Lim, Chiwoo; Jang, Min; Kim, Sang-Hyo

2018-04-17

In this paper, a new non-orthogonal multiple-access scheme, trellis tone modulation multiple-access (TTMMA), is proposed for peer discovery of distributed device-to-device (D2D) communication. The range and capacity of discovery are important performance metrics in peer discovery. The proposed trellis tone modulation uses single-tone transmission and achieves a long discovery range due to its low Peak-to-Average Power Ratio (PAPR). The TTMMA also exploits non-orthogonal resource assignment to increase the discovery capacity. For the multi-user detection of superposed multiple-access signals, a message-passing algorithm with supplementary schemes are proposed. With TTMMA and its message-passing demodulation, approximately 1.5 times the number of devices are discovered compared to the conventional frequency division multiple-access (FDMA)-based discovery.

A Message Passing Approach to Side Chain Positioning with Applications in Protein Docking Refinement *

PubMed Central

Moghadasi, Mohammad; Kozakov, Dima; Mamonov, Artem B.; Vakili, Pirooz; Vajda, Sandor; Paschalidis, Ioannis Ch.

2013-01-01

We introduce a message-passing algorithm to solve the Side Chain Positioning (SCP) problem. SCP is a crucial component of protein docking refinement, which is a key step of an important class of problems in computational structural biology called protein docking. We model SCP as a combinatorial optimization problem and formulate it as a Maximum Weighted Independent Set (MWIS) problem. We then employ a modified and convergent belief-propagation algorithm to solve a relaxation of MWIS and develop randomized estimation heuristics that use the relaxed solution to obtain an effective MWIS feasible solution. Using a benchmark set of protein complexes we demonstrate that our approach leads to more accurate docking predictions compared to a baseline algorithm that does not solve the SCP. PMID:23515575
Multi-partitioning for ADI-schemes on message passing architectures

NASA Technical Reports Server (NTRS)

Vanderwijngaart, Rob F.

1994-01-01

A kind of discrete-operator splitting called Alternating Direction Implicit (ADI) has been found to be useful in simulating fluid flow problems. In particular, it is being used to study the effects of hot exhaust jets from high performance aircraft on landing surfaces. Decomposition techniques that minimize load imbalance and message-passing frequency are described. Three strategies that are investigated for implementing the NAS Scalar Penta-diagonal Parallel Benchmark (SP) are transposition, pipelined Gaussian elimination, and multipartitioning. The multipartitioning strategy, which was used on Ethernet, was found to be the most efficient, although it was considered only a moderate success because of Ethernet's limited communication properties. The efficiency derived largely from the coarse granularity of the strategy, which reduced latencies and allowed overlap of communication and computation.
Ultrascalable petaflop parallel supercomputer

DOEpatents

Blumrich, Matthias A [Ridgefield, CT; Chen, Dong [Croton On Hudson, NY; Chiu, George [Cross River, NY; Cipolla, Thomas M [Katonah, NY; Coteus, Paul W [Yorktown Heights, NY; Gara, Alan G [Mount Kisco, NY; Giampapa, Mark E [Irvington, NY; Hall, Shawn [Pleasantville, NY; Haring, Rudolf A [Cortlandt Manor, NY; Heidelberger, Philip [Cortlandt Manor, NY; Kopcsay, Gerard V [Yorktown Heights, NY; Ohmacht, Martin [Yorktown Heights, NY; Salapura, Valentina [Chappaqua, NY; Sugavanam, Krishnan [Mahopac, NY; Takken, Todd [Brewster, NY

2010-07-20

A massively parallel supercomputer of petaOPS-scale includes node architectures based upon System-On-a-Chip technology, where each processing node comprises a single Application Specific Integrated Circuit (ASIC) having up to four processing elements. The ASIC nodes are interconnected by multiple independent networks that optimally maximize the throughput of packet communications between nodes with minimal latency. The multiple networks may include three high-speed networks for parallel algorithm message passing including a Torus, collective network, and a Global Asynchronous network that provides global barrier and notification functions. These multiple independent networks may be collaboratively or independently utilized according to the needs or phases of an algorithm for optimizing algorithm processing performance. The use of a DMA engine is provided to facilitate message passing among the nodes without the expenditure of processing resources at the node.
Humor in print health advertisements: enhanced attention, privileged recognition, and persuasiveness of preventive messages.

PubMed

Blanc, Nathalie; Brigaud, Emmanuelle

2014-01-01

This study tested the effect of humor in one particular type of print advertisement: the preventive health ads for three topics (alcohol, tobacco, obesity). Previous research using commercial ads demonstrated that individuals' attention is spontaneously attracted by humor, leading to a memory advantage for humorous information over nonhumorous information. Two experiments investigated whether the positive effect of humor can occur with preventive health ads. In Experiment 1, participants observed humorous and nonhumorous health ads while their viewing times were recorded. In Experiment 2, to compare humorous and nonhumorous ads, the memory of health messages was assessed through a recognition task and a convincing score was collected. The results confirmed that, compared to nonhumorous health ads, those using humor received prolonged attention, were judged more convincing, and their messages were better recognized. Overall, these findings suggest that humor can be of use in preventive health communication.
Message Passing on GPUs

NASA Astrophysics Data System (ADS)

Stuart, J. A.

2011-12-01

This paper explores the challenges in implementing a message passing interface usable on systems with data-parallel processors, and more specifically GPUs. As a case study, we design and implement the ``DCGN'' API on NVIDIA GPUs that is similar to MPI and allows full access to the underlying architecture. We introduce the notion of data-parallel thread-groups as a way to map resources to MPI ranks. We use a method that also allows the data-parallel processors to run autonomously from user-written CPU code. In order to facilitate communication, we use a sleep-based polling system to store and retrieve messages. Unlike previous systems, our method provides both performance and flexibility. By running a test suite of applications with different communication requirements, we find that a tolerable amount of overhead is incurred, somewhere between one and five percent depending on the application, and indicate the locations where this overhead accumulates. We conclude that with innovations in chipsets and drivers, this overhead will be mitigated and provide similar performance to typical CPU-based MPI implementations while providing fully-dynamic communication.
Fencing direct memory access data transfers in a parallel active messaging interface of a parallel computer

DOE Office of Scientific and Technical Information (OSTI.GOV)

Blocksome, Michael A.; Mamidala, Amith R.

2013-09-03

Fencing direct memory access (`DMA`) data transfers in a parallel active messaging interface (`PAMI`) of a parallel computer, the PAMI including data communications endpoints, each endpoint including specifications of a client, a context, and a task, the endpoints coupled for data communications through the PAMI and through DMA controllers operatively coupled to segments of shared random access memory through which the DMA controllers deliver data communications deterministically, including initiating execution through the PAMI of an ordered sequence of active DMA instructions for DMA data transfers between two endpoints, effecting deterministic DMA data transfers through a DMA controller and a segmentmore » of shared memory; and executing through the PAMI, with no FENCE accounting for DMA data transfers, an active FENCE instruction, the FENCE instruction completing execution only after completion of all DMA instructions initiated prior to execution of the FENCE instruction for DMA data transfers between the two endpoints.« less
Aerodynamic Shape Optimization of Supersonic Aircraft Configurations via an Adjoint Formulation on Parallel Computers

NASA Technical Reports Server (NTRS)

Reuther, James; Alonso, Juan Jose; Rimlinger, Mark J.; Jameson, Antony

1996-01-01

This work describes the application of a control theory-based aerodynamic shape optimization method to the problem of supersonic aircraft design. The design process is greatly accelerated through the use of both control theory and a parallel implementation on distributed memory computers. Control theory is employed to derive the adjoint differential equations whose solution allows for the evaluation of design gradient information at a fraction of the computational cost required by previous design methods. The resulting problem is then implemented on parallel distributed memory architectures using a domain decomposition approach, an optimized communication schedule, and the MPI (Message Passing Interface) Standard for portability and efficiency. The final result achieves very rapid aerodynamic design based on higher order computational fluid dynamics methods (CFD). In our earlier studies, the serial implementation of this design method was shown to be effective for the optimization of airfoils, wings, wing-bodies, and complex aircraft configurations using both the potential equation and the Euler equations. In our most recent paper, the Euler method was extended to treat complete aircraft configurations via a new multiblock implementation. Furthermore, during the same conference, we also presented preliminary results demonstrating that this basic methodology could be ported to distributed memory parallel computing architectures. In this paper, our concern will be to demonstrate that the combined power of these new technologies can be used routinely in an industrial design environment by applying it to the case study of the design of typical supersonic transport configurations. A particular difficulty of this test case is posed by the propulsion/airframe integration.
Altered retrieval of melodic information in congenital amusia: insights from dynamic causal modeling of MEG data.

PubMed

Albouy, Philippe; Mattout, Jérémie; Sanchez, Gaëtan; Tillmann, Barbara; Caclin, Anne

2015-01-01

Congenital amusia is a neuro-developmental disorder that primarily manifests as a difficulty in the perception and memory of pitch-based materials, including music. Recent findings have shown that the amusic brain exhibits altered functioning of a fronto-temporal network during pitch perception and short-term memory. Within this network, during the encoding of melodies, a decreased right backward frontal-to-temporal connectivity was reported in amusia, along with an abnormal connectivity within and between auditory cortices. The present study investigated whether connectivity patterns between these regions were affected during the short-term memory retrieval of melodies. Amusics and controls had to indicate whether sequences of six tones that were presented in pairs were the same or different. When melodies were different only one tone changed in the second melody. Brain responses to the changed tone in "Different" trials and to its equivalent (original) tone in "Same" trials were compared between groups using Dynamic Causal Modeling (DCM). DCM results confirmed that congenital amusia is characterized by an altered effective connectivity within and between the two auditory cortices during sound processing. Furthermore, right temporal-to-frontal message passing was altered in comparison to controls, with notably an increase in "Same" trials. An additional analysis in control participants emphasized that the detection of an unexpected event in the typically functioning brain is supported by right fronto-temporal connections. The results can be interpreted in a predictive coding framework as reflecting an abnormal prediction error sent by temporal auditory regions towards frontal areas in the amusic brain.
Words with Friends: Effects of Associative and Semantic Relationships on Subject-Verb Agreement Errors during Sentence Production

ERIC Educational Resources Information Center

Penta, Darrell J.

2017-01-01

The sentence production system transforms preverbal messages in the mind of a speaker into coherent grammatical utterances. During this process, which unfolds rapidly, the system has to link meaning information from the speaker's message to appropriate lexical and grammatical information from the speaker's memory. It usually does so with fluency…
Passing Messages between Biological Networks to Refine Predicted Interactions

PubMed Central

Glass, Kimberly; Huttenhower, Curtis; Quackenbush, John; Yuan, Guo-Cheng

2013-01-01

Regulatory network reconstruction is a fundamental problem in computational biology. There are significant limitations to such reconstruction using individual datasets, and increasingly people attempt to construct networks using multiple, independent datasets obtained from complementary sources, but methods for this integration are lacking. We developed PANDA (Passing Attributes between Networks for Data Assimilation), a message-passing model using multiple sources of information to predict regulatory relationships, and used it to integrate protein-protein interaction, gene expression, and sequence motif data to reconstruct genome-wide, condition-specific regulatory networks in yeast as a model. The resulting networks were not only more accurate than those produced using individual data sets and other existing methods, but they also captured information regarding specific biological mechanisms and pathways that were missed using other methodologies. PANDA is scalable to higher eukaryotes, applicable to specific tissue or cell type data and conceptually generalizable to include a variety of regulatory, interaction, expression, and other genome-scale data. An implementation of the PANDA algorithm is available at www.sourceforge.net/projects/panda-net. PMID:23741402
Applications of Probabilistic Combiners on Linear Feedback Shift Register Sequences

DTIC Science & Technology

2016-12-01

on the resulting output strings show a drastic increase in complexity, while simultaneously passing the stringent randomness tests required by the...a three-variable function. Our tests on the resulting output strings show a drastic increase in complex- ity, while simultaneously passing the...10001101 01000010 11101001 Decryption of a message that has been encrypted using bitwise XOR is quite simple. Since each bit is its own additive inverse
Deceived, Disgusted, and Defensive: Motivated Processing of Anti-Tobacco Advertisements.

PubMed

Leshner, Glenn; Clayton, Russell B; Bolls, Paul D; Bhandari, Manu

2017-08-29

A 2 × 2 experiment was conducted, where participants watched anti-tobacco messages that varied in deception (content portraying tobacco companies as dishonest) and disgust (negative graphic images) content. Psychophysiological measures, self-report, and a recognition test were used to test hypotheses generated from the motivated cognition framework. The results of this study indicate that messages containing both deception and disgust push viewers into a cascade of defensive responses reflected by increased self-reported unpleasantness, reduced resources allocated to encoding, worsened recognition memory, and dampened emotional responses compared to messages depicting one attribute or neither. Findings from this study demonstrate the value of applying a motivated cognition theoretical framework in research on responses to emotional content in health messages and support previous research on defensive processing and message design of anti-tobacco messages.
Long-Term Memory and Learning

ERIC Educational Resources Information Center

Crossland, John

2011-01-01

The English National Curriculum Programmes of Study emphasise the importance of knowledge, understanding and skills, and teachers are well versed in structuring learning in those terms. Research outcomes into how long-term memory is stored and retrieved provide support for structuring learning in this way. Four further messages are added to the…
Earth Observation

NASA Image and Video Library

2013-07-26

Earth observation taken during day pass by an Expedition 36 crew member on board the International Space Station (ISS). Per Twitter message: Never tire of finding shapes in the clouds! These look very botanical to me. Simply perfect.
PREEMIE Reauthorization Act

THOMAS, 112th Congress

Sen. Alexander, Lamar [R-TN

2011-07-28

Senate - 12/19/2012 Message on House action received in Senate and at desk: House amendments to Senate bill. (All Actions) Tracker: This bill has the status Passed HouseHere are the steps for Status of Legislation:
Electronic Cigarettes on Twitter – Spreading the Appeal of Flavors

PubMed Central

Chu, Kar-Hai; Unger, Jennifer B.; Cruz, Tess Boley; Soto, Daniel W.

2016-01-01

Objectives Social media platforms are used by tobacco companies to promote products. This study examines message content on Twitter from e-cigarette brands and determines if messages about flavors are more likely than non-flavor messages to be passed along to other viewers. Methods We examined Twitter data from 2 e-cigarette brands and identified messages that contained terms related to e-cigarette flavors. Results Flavor-related posts were retweeted at a significantly higher rate by e-cigarette brands (p = .04) and other Twitter users (p < .01) than non-flavor posts. Conclusions E-cigarette brands and other Twitter users pay attention to flavor-related posts and retweet them often. These findings suggest flavors continue to be an attractive characteristic and their marketing should be monitored closely. PMID:27853734
Parallelization of the TRIGRS model for rainfall-induced landslides using the message passing interface

USGS Publications Warehouse

Alvioli, M.; Baum, R.L.

2016-01-01

We describe a parallel implementation of TRIGRS, the Transient Rainfall Infiltration and Grid-Based Regional Slope-Stability Model for the timing and distribution of rainfall-induced shallow landslides. We have parallelized the four time-demanding execution modes of TRIGRS, namely both the saturated and unsaturated model with finite and infinite soil depth options, within the Message Passing Interface framework. In addition to new features of the code, we outline details of the parallel implementation and show the performance gain with respect to the serial code. Results are obtained both on commercial hardware and on a high-performance multi-node machine, showing the different limits of applicability of the new code. We also discuss the implications for the application of the model on large-scale areas and as a tool for real-time landslide hazard monitoring.
Identifying an influential spreader from a single seed in complex networks via a message-passing approach

NASA Astrophysics Data System (ADS)

Min, Byungjoon

2018-01-01

Identifying the most influential spreaders is one of outstanding problems in physics of complex systems. So far, many approaches have attempted to rank the influence of nodes but there is still the lack of accuracy to single out influential spreaders. Here, we directly tackle the problem of finding important spreaders by solving analytically the expected size of epidemic outbreaks when spreading originates from a single seed. We derive and validate a theory for calculating the size of epidemic outbreaks with a single seed using a message-passing approach. In addition, we find that the probability to occur epidemic outbreaks is highly dependent on the location of the seed but the size of epidemic outbreaks once it occurs is insensitive to the seed. We also show that our approach can be successfully adapted into weighted networks.
The design of multi-core DSP parallel model based on message passing and multi-level pipeline

NASA Astrophysics Data System (ADS)

Niu, Jingyu; Hu, Jian; He, Wenjing; Meng, Fanrong; Li, Chuanrong

2017-10-01

Currently, the design of embedded signal processing system is often based on a specific application, but this idea is not conducive to the rapid development of signal processing technology. In this paper, a parallel processing model architecture based on multi-core DSP platform is designed, and it is mainly suitable for the complex algorithms which are composed of different modules. This model combines the ideas of multi-level pipeline parallelism and message passing, and summarizes the advantages of the mainstream model of multi-core DSP (the Master-Slave model and the Data Flow model), so that it has better performance. This paper uses three-dimensional image generation algorithm to validate the efficiency of the proposed model by comparing with the effectiveness of the Master-Slave and the Data Flow model.
Generalized hypercube structures and hyperswitch communication network

NASA Technical Reports Server (NTRS)

Young, Steven D.

1992-01-01

This paper discusses an ongoing study that uses a recent development in communication control technology to implement hybrid hypercube structures. These architectures are similar to binary hypercubes, but they also provide added connectivity between the processors. This added connectivity increases communication reliability while decreasing the latency of interprocessor message passing. Because these factors directly determine the speed that can be obtained by multiprocessor systems, these architectures are attractive for applications such as remote exploration and experimentation, where high performance and ultrareliability are required. This paper describes and enumerates these architectures and discusses how they can be implemented with a modified version of the hyperswitch communication network (HCN). The HCN is analyzed because it has three attractive features that enable these architectures to be effective: speed, fault tolerance, and the ability to pass multiple messages simultaneously through the same hyperswitch controller.

Quantum Secure Direct Communication with Quantum Memory

NASA Astrophysics Data System (ADS)

Zhang, Wei; Ding, Dong-Sheng; Sheng, Yu-Bo; Zhou, Lan; Shi, Bao-Sen; Guo, Guang-Can

2017-06-01

Quantum communication provides an absolute security advantage, and it has been widely developed over the past 30 years. As an important branch of quantum communication, quantum secure direct communication (QSDC) promotes high security and instantaneousness in communication through directly transmitting messages over a quantum channel. The full implementation of a quantum protocol always requires the ability to control the transfer of a message effectively in the time domain; thus, it is essential to combine QSDC with quantum memory to accomplish the communication task. In this Letter, we report the experimental demonstration of QSDC with state-of-the-art atomic quantum memory for the first time in principle. We use the polarization degrees of freedom of photons as the information carrier, and the fidelity of entanglement decoding is verified as approximately 90%. Our work completes a fundamental step toward practical QSDC and demonstrates a potential application for long-distance quantum communication in a quantum network.
Quantum Secure Direct Communication with Quantum Memory.

PubMed

Zhang, Wei; Ding, Dong-Sheng; Sheng, Yu-Bo; Zhou, Lan; Shi, Bao-Sen; Guo, Guang-Can

2017-06-02

Quantum communication provides an absolute security advantage, and it has been widely developed over the past 30 years. As an important branch of quantum communication, quantum secure direct communication (QSDC) promotes high security and instantaneousness in communication through directly transmitting messages over a quantum channel. The full implementation of a quantum protocol always requires the ability to control the transfer of a message effectively in the time domain; thus, it is essential to combine QSDC with quantum memory to accomplish the communication task. In this Letter, we report the experimental demonstration of QSDC with state-of-the-art atomic quantum memory for the first time in principle. We use the polarization degrees of freedom of photons as the information carrier, and the fidelity of entanglement decoding is verified as approximately 90%. Our work completes a fundamental step toward practical QSDC and demonstrates a potential application for long-distance quantum communication in a quantum network.
Design structure for in-system redundant array repair in integrated circuits

DOEpatents

Bright, Arthur A.; Crumley, Paul G.; Dombrowa, Marc; Douskey, Steven M.; Haring, Rudolf A.; Oakland, Steven F.; Quellette, Michael R.; Strissel, Scott A.

2008-11-25

A design structure for repairing an integrated circuit during operation of the integrated circuit. The integrated circuit comprising of a multitude of memory arrays and a fuse box holding control data for controlling redundancy logic of the arrays. The design structure provides the integrated circuit with a control data selector for passing the control data from the fuse box to the memory arrays; providing a source of alternate control data, external of the integrated circuit; and connecting the source of alternate control data to the control data selector. The design structure further passes the alternate control data from the source thereof, through the control data selector and to the memory arrays to control the redundancy logic of the memory arrays.
Practical Formal Verification of MPI and Thread Programs

NASA Astrophysics Data System (ADS)

Gopalakrishnan, Ganesh; Kirby, Robert M.

Large-scale simulation codes in science and engineering are written using the Message Passing Interface (MPI). Shared memory threads are widely used directly, or to implement higher level programming abstractions. Traditional debugging methods for MPI or thread programs are incapable of providing useful formal guarantees about coverage. They get bogged down in the sheer number of interleavings (schedules), often missing shallow bugs. In this tutorial we will introduce two practical formal verification tools: ISP (for MPI C programs) and Inspect (for Pthread C programs). Unlike other formal verification tools, ISP and Inspect run directly on user source codes (much like a debugger). They pursue only the relevant set of process interleavings, using our own customized Dynamic Partial Order Reduction algorithms. For a given test harness, DPOR allows these tools to guarantee the absence of deadlocks, instrumented MPI object leaks and communication races (using ISP), and shared memory races (using Inspect). ISP and Inspect have been used to verify large pieces of code: in excess of 10,000 lines of MPI/C for ISP in under 5 seconds, and about 5,000 lines of Pthread/C code in a few hours (and much faster with the use of a cluster or by exploiting special cases such as symmetry) for Inspect. We will also demonstrate the Microsoft Visual Studio and Eclipse Parallel Tools Platform integrations of ISP (these will be available on the LiveCD).
Parallelization of TWOPORFLOW, a Cartesian Grid based Two-phase Porous Media Code for Transient Thermo-hydraulic Simulations

NASA Astrophysics Data System (ADS)

Trost, Nico; Jiménez, Javier; Imke, Uwe; Sanchez, Victor

2014-06-01

TWOPORFLOW is a thermo-hydraulic code based on a porous media approach to simulate single- and two-phase flow including boiling. It is under development at the Institute for Neutron Physics and Reactor Technology (INR) at KIT. The code features a 3D transient solution of the mass, momentum and energy conservation equations for two inter-penetrating fluids with a semi-implicit continuous Eulerian type solver. The application domain of TWOPORFLOW includes the flow in standard porous media and in structured porous media such as micro-channels and cores of nuclear power plants. In the latter case, the fluid domain is coupled to a fuel rod model, describing the heat flow inside the solid structure. In this work, detailed profiling tools have been utilized to determine the optimization potential of TWOPORFLOW. As a result, bottle-necks were identified and reduced in the most feasible way, leading for instance to an optimization of the water-steam property computation. Furthermore, an OpenMP implementation addressing the routines in charge of inter-phase momentum-, energy- and mass-coupling delivered good performance together with a high scalability on shared memory architectures. In contrast to that, the approach for distributed memory systems was to solve sub-problems resulting by the decomposition of the initial Cartesian geometry. Thread communication for the sub-problem boundary updates was accomplished by the Message Passing Interface (MPI) standard.
Parallelization of a Monte Carlo particle transport simulation code

NASA Astrophysics Data System (ADS)

Hadjidoukas, P.; Bousis, C.; Emfietzoglou, D.

2010-05-01

We have developed a high performance version of the Monte Carlo particle transport simulation code MC4. The original application code, developed in Visual Basic for Applications (VBA) for Microsoft Excel, was first rewritten in the C programming language for improving code portability. Several pseudo-random number generators have been also integrated and studied. The new MC4 version was then parallelized for shared and distributed-memory multiprocessor systems using the Message Passing Interface. Two parallel pseudo-random number generator libraries (SPRNG and DCMT) have been seamlessly integrated. The performance speedup of parallel MC4 has been studied on a variety of parallel computing architectures including an Intel Xeon server with 4 dual-core processors, a Sun cluster consisting of 16 nodes of 2 dual-core AMD Opteron processors and a 200 dual-processor HP cluster. For large problem size, which is limited only by the physical memory of the multiprocessor server, the speedup results are almost linear on all systems. We have validated the parallel implementation against the serial VBA and C implementations using the same random number generator. Our experimental results on the transport and energy loss of electrons in a water medium show that the serial and parallel codes are equivalent in accuracy. The present improvements allow for studying of higher particle energies with the use of more accurate physical models, and improve statistics as more particles tracks can be simulated in low response time.
Xyce Parallel Electronic Simulator Users' Guide Version 6.8

DOE Office of Scientific and Technical Information (OSTI.GOV)

Keiter, Eric R.; Aadithya, Karthik Venkatraman; Mei, Ting

This manual describes the use of the Xyce Parallel Electronic Simulator. Xyce has been de- signed as a SPICE-compatible, high-performance analog circuit simulator, and has been written to support the simulation needs of the Sandia National Laboratories electrical designers. This development has focused on improving capability over the current state-of-the-art in the following areas: Capability to solve extremely large circuit problems by supporting large-scale parallel com- puting platforms (up to thousands of processors). This includes support for most popular parallel and serial computers. A differential-algebraic-equation (DAE) formulation, which better isolates the device model package from solver algorithms. This allows onemore » to develop new types of analysis without requiring the implementation of analysis-specific device models. Device models that are specifically tailored to meet Sandia's needs, including some radiation- aware devices (for Sandia users only). Object-oriented code design and implementation using modern coding practices. Xyce is a parallel code in the most general sense of the phrase$-$ a message passing parallel implementation $-$ which allows it to run efficiently a wide range of computing platforms. These include serial, shared-memory and distributed-memory parallel platforms. Attention has been paid to the specific nature of circuit-simulation problems to ensure that optimal parallel efficiency is achieved as the number of processors grows.« less
Recall Performance of Children Failing Memory Portions of a Speech--Language--Memory Screening Battery.

ERIC Educational Resources Information Center

Tobey, Emily A.; And Others

1982-01-01

Recall performance of 22 first-grade and third-grade children who failed memory portions of a speech-language-memory screen was examined using digit and consonant-vowel (CV) stimulus sets. Data indicate children failing the screening battery differed quantitatively, rather than qualitatively, from children passing the screening batter. (Author)
Earth Observation

NASA Image and Video Library

2013-08-03

Earth observation taken during day pass by an Expedition 36 crew member on board the International Space Station (ISS). Per Twitter message: From southernmost point of orbit over the South Pacific- all clouds seemed to be leading to the South Pole.
Earth Observation

NASA Image and Video Library

2013-07-21

Earth observation taken during night pass by an Expedition 36 crew member on board the International Space Station (ISS). Per Twitter message this is labeled as : Tehran, Iran. Lights along the coast of the Caspian Sea visible through clouds. July 21.
Whistleblower Protection Enhancement Act of 2010

THOMAS, 111th Congress

Sen. Akaka, Daniel K. [D-HI

2009-02-03

Senate - 12/22/2010 Message on House action received in Senate and at desk: House amendments to Senate bill. (All Actions) Tracker: This bill has the status Passed HouseHere are the steps for Status of Legislation:
Earth Observation

NASA Image and Video Library

2014-05-31

Earth Observation taken during a day pass by the Expedition 40 crew aboard the International Space Station (ISS). Folder lists this as: CEO - Arena de Sao Paolo. View used for Twitter message: Cloudy skies over São Paulo Brazil
Self-pacing direct memory access data transfer operations for compute nodes in a parallel computer

DOEpatents

Blocksome, Michael A

2015-02-17

Methods, apparatus, and products are disclosed for self-pacing DMA data transfer operations for nodes in a parallel computer that include: transferring, by an origin DMA on an origin node, a RTS message to a target node, the RTS message specifying an message on the origin node for transfer to the target node; receiving, in an origin injection FIFO for the origin DMA from a target DMA on the target node in response to transferring the RTS message, a target RGET descriptor followed by a DMA transfer operation descriptor, the DMA descriptor for transmitting a message portion to the target node, the target RGET descriptor specifying an origin RGET descriptor on the origin node that specifies an additional DMA descriptor for transmitting an additional message portion to the target node; processing, by the origin DMA, the target RGET descriptor; and processing, by the origin DMA, the DMA transfer operation descriptor.
A Connectionist/Control Architecture for Working Memory and Workload: Why Working Memory Is Not 7+ /-2

DTIC Science & Technology

1987-09-29

load on the centra and regional controllers. Strategy #S: Reduce message Chase. W.G., & Ericsson. K.A. ( 1082 ). Skill andInterference of concurrently...bypsycholocy Lf learning and motivation. Vol. Ia. strengthening region. to,.region connections on theNeYokAcdmcPs. innerlooip. Crowder, R.G. ( 1082
Interprocess Communication in Highly Distributed Systems - A Workshop Report - 20 to 22, November 1978.

DTIC Science & Technology

1979-12-01

Links between processes can be aLlocated strictLy to controL functions. In fact, the degree of separation of control and data is an important research is...delays or Loss of control messages. Cognoscienti agree that message-passing IPC schemes are equivalent in "power" to schemes which employ shared...THEORETICAL WORK Page 55 SECTION 6 THEORETICAL WORK 6.1 WORKING GRUP JIM REP.OR STRUCTURE of Discussion: Distributed system without central (or any) control
Effects of cell-phone and text-message distractions on true and false recognition.

PubMed

Smith, Theodore S; Isaak, Matthew I; Senette, Christian G; Abadie, Brenton G

2011-06-01

This study examined the effects of electronic communication distractions, including cell-phone and texting demands, on true and false recognition, specifically semantically related words presented and not presented on a computer screen. Participants were presented with 24 Deese-Roediger-McDermott (DRM) lists while manipulating the concurrent presence or absence of cell-phone and text-message distractions during study. In the DRM paradigm, participants study lists of semantically related words (e.g., mother, crib, and diaper) linked to a non-presented critical lure (e.g., baby). After studying the lists of words, participants are then requested to recall or recognize previously presented words. Participants often not only demonstrate high remembrance for presented words (true memory: crib), but also recollection for non-presented words (false memory: baby). In the present study, true memory was highest when participants were not presented with any distraction tasks during study of DRM words, but poorer when they were required to complete a cell-phone conversation or text-message task during study. False recognition measures did not statistically vary across distraction conditions. Signal detection analyses showed that participants better discriminated true targets (list items presented during study) from true target controls (items presented during study only) when cell-phone or text-message distractions were absent than when they were present. Response bias did not vary significantly across distraction conditions, as there were no differences in the likelihood that a participant would claim an item as "old" (previously presented) rather than "new" (not previously presented). Results of this study are examined with respect to both activation monitoring and fuzzy trace theories.
Method and apparatus for in-system redundant array repair on integrated circuits

DOEpatents

Bright, Arthur A [Croton-on-Hudson, NY; Crumley, Paul G [Yorktown Heights, NY; Dombrowa, Marc B [Bronx, NY; Douskey, Steven M [Rochester, MN; Haring, Rudolf A [Cortlandt Manor, NY; Oakland, Steven F [Colchester, VT; Ouellette, Michael R [Westford, VT; Strissel, Scott A [Byron, MN

2008-07-29

Disclosed is a method of repairing an integrated circuit of the type comprising of a multitude of memory arrays and a fuse box holding control data for controlling redundancy logic of the arrays. The method comprises the steps of providing the integrated circuit with a control data selector for passing the control data from the fuse box to the memory arrays; providing a source of alternate control data, external of the integrated circuit; and connecting the source of alternate control data to the control data selector. The method comprises the further step of, at a given time, passing the alternate control data from the source thereof, through the control data selector and to the memory arrays to control the redundancy logic of the memory arrays.
Method and apparatus for in-system redundant array repair on integrated circuits

DOEpatents

Bright, Arthur A [Croton-on-Hudson, NY; Crumley, Paul G [Yorktown Heights, NY; Dombrowa, Marc B [Bronx, NY; Douskey, Steven M [Rochester, MN; Haring, Rudolf A [Cortlandt Manor, NY; Oakland, Steven F [Colchester, VT; Ouellette, Michael R [Westford, VT; Strissel, Scott A [Byron, MN

2008-07-08

Disclosed is a method of repairing an integrated circuit of the type comprising of a multitude of memory arrays and a fuse box holding control data for controlling redundancy logic of the arrays. The method comprises the steps of providing the integrated circuit with a control data selector for passing the control data from the fuse box to the memory arrays; providing a source of alternate control data, external of the integrated circuit; and connecting the source of alternate control data to the control data selector. The method comprises the further step of, at a given time, passing the alternate control data from the source thereof, through the control data selector and to the memory arrays to control the redundancy logic of the memory arrays.
Method and apparatus for in-system redundant array repair on integrated circuits

DOEpatents

Bright, Arthur A.; Crumley, Paul G.; Dombrowa, Marc B.; Douskey, Steven M.; Haring, Rudolf A.; Oakland, Steven F.; Ouellette, Michael R.; Strissel, Scott A.

2007-12-18

Disclosed is a method of repairing an integrated circuit of the type comprising of a multitude of memory arrays and a fuse box holding control data for controlling redundancy logic of the arrays. The method comprises the steps of providing the integrated circuit with a control data selector for passing the control data from the fuse box to the memory arrays; providing a source of alternate control data, external of the integrated circuit; and connecting the source of alternate control data to the control data selector. The method comprises the further step of, at a given time, passing the alternate control data from the source thereof, through the control data selector and to the memory arrays to control the redundancy logic of the memory arrays.
Shape memory polymer medical device

DOEpatents

Maitland, Duncan [Pleasant Hill, CA; Benett, William J [Livermore, CA; Bearinger, Jane P [Livermore, CA; Wilson, Thomas S [San Leandro, CA; Small, IV, Ward; Schumann, Daniel L [Concord, CA; Jensen, Wayne A [Livermore, CA; Ortega, Jason M [Pacifica, CA; Marion, III, John E.; Loge, Jeffrey M [Stockton, CA

2010-06-29

A system for removing matter from a conduit. The system includes the steps of passing a transport vehicle and a shape memory polymer material through the conduit, transmitting energy to the shape memory polymer material for moving the shape memory polymer material from a first shape to a second and different shape, and withdrawing the transport vehicle and the shape memory polymer material through the conduit carrying the matter.

Terrorism Risk Insurance Program Reauthorization Act of 2014

THOMAS, 113th Congress

Sen. Schumer, Charles E. [D-NY

2014-04-10

Senate - 12/11/2014 Message on House action received in Senate and at desk: House amendment to Senate bill. (All Actions) Tracker: This bill has the status Passed HouseHere are the steps for Status of Legislation:
Electronic Message Preservation Act

THOMAS, 111th Congress

Rep. Hodes, Paul W. [D-NH-2

2009-03-09

Senate - 03/18/2010 Received in the Senate and Read twice and referred to the Committee on Homeland Security and Governmental Affairs. (All Actions) Tracker: This bill has the status Passed HouseHere are the steps for Status of Legislation:
Federal Financial Assistance Management Improvement Act of 2009

THOMAS, 111th Congress

Sen. Voinovich, George V. [R-OH

2009-01-22

Senate - 12/15/2009 Message on House action received in Senate and at desk: House amendment to Senate bill. (All Actions) Tracker: This bill has the status Passed HouseHere are the steps for Status of Legislation:
Messaging to Increase Public Support for Naloxone Distribution Policies in the United States: Results from a Randomized Survey Experiment.

PubMed

Bachhuber, Marcus A; McGinty, Emma E; Kennedy-Hendricks, Alene; Niederdeppe, Jeff; Barry, Colleen L

2015-01-01

Barriers to public support for naloxone distribution include lack of knowledge, concerns about potential unintended consequences, and lack of sympathy for people at risk of overdose. A randomized survey experiment was conducted with a nationally-representative web-based survey research panel (GfK KnowledgePanel). Participants were randomly assigned to read different messages alone or in combination: 1) factual information about naloxone; 2) pre-emptive refutation of potential concerns about naloxone distribution; and 3) a sympathetic narrative about a mother whose daughter died of an opioid overdose. Participants were then asked if they support or oppose policies related to naloxone distribution. For each policy item, logistic regression models were used to test the effect of each message exposure compared with the no-exposure control group. The final sample consisted of 1,598 participants (completion rate: 72.6%). Factual information and the sympathetic narrative alone each led to higher support for training first responders to use naloxone, providing naloxone to friends and family members of people using opioids, and passing laws to protect people who administer naloxone. Participants receiving the combination of the sympathetic narrative and factual information, compared to factual information alone, were more likely to support all policies: providing naloxone to friends and family members (OR: 2.0 [95% CI: 1.4 to 2.9]), training first responders to use naloxone (OR: 2.0 [95% CI: 1.2 to 3.4]), passing laws to protect people if they administer naloxone (OR: 1.5 [95% CI: 1.04 to 2.2]), and passing laws to protect people if they call for medical help for an overdose (OR: 1.7 [95% CI: 1.2 to 2.5]). All messages increased public support, but combining factual information and the sympathetic narrative was most effective. Public support for naloxone distribution can be improved through education and sympathetic portrayals of the population who stands to benefit from these policies.
Statistical analysis of loopy belief propagation in random fields

NASA Astrophysics Data System (ADS)

Yasuda, Muneki; Kataoka, Shun; Tanaka, Kazuyuki

2015-10-01

Loopy belief propagation (LBP), which is equivalent to the Bethe approximation in statistical mechanics, is a message-passing-type inference method that is widely used to analyze systems based on Markov random fields (MRFs). In this paper, we propose a message-passing-type method to analytically evaluate the quenched average of LBP in random fields by using the replica cluster variation method. The proposed analytical method is applicable to general pairwise MRFs with random fields whose distributions differ from each other and can give the quenched averages of the Bethe free energies over random fields, which are consistent with numerical results. The order of its computational cost is equivalent to that of standard LBP. In the latter part of this paper, we describe the application of the proposed method to Bayesian image restoration, in which we observed that our theoretical results are in good agreement with the numerical results for natural images.
Message-passing-interface-based parallel FDTD investigation on the EM scattering from a 1-D rough sea surface using uniaxial perfectly matched layer absorbing boundary.

PubMed

Li, J; Guo, L-X; Zeng, H; Han, X-B

2009-06-01

A message-passing-interface (MPI)-based parallel finite-difference time-domain (FDTD) algorithm for the electromagnetic scattering from a 1-D randomly rough sea surface is presented. The uniaxial perfectly matched layer (UPML) medium is adopted for truncation of FDTD lattices, in which the finite-difference equations can be used for the total computation domain by properly choosing the uniaxial parameters. This makes the parallel FDTD algorithm easier to implement. The parallel performance with different processors is illustrated for one sea surface realization, and the computation time of the parallel FDTD algorithm is dramatically reduced compared to a single-process implementation. Finally, some numerical results are shown, including the backscattering characteristics of sea surface for different polarization and the bistatic scattering from a sea surface with large incident angle and large wind speed.
Extensions to the Parallel Real-Time Artificial Intelligence System (PRAIS) for fault-tolerant heterogeneous cycle-stealing reasoning

NASA Technical Reports Server (NTRS)

Goldstein, David

1991-01-01

Extensions to an architecture for real-time, distributed (parallel) knowledge-based systems called the Parallel Real-time Artificial Intelligence System (PRAIS) are discussed. PRAIS strives for transparently parallelizing production (rule-based) systems, even under real-time constraints. PRAIS accomplished these goals (presented at the first annual C Language Integrated Production System (CLIPS) conference) by incorporating a dynamic task scheduler, operating system extensions for fact handling, and message-passing among multiple copies of CLIPS executing on a virtual blackboard. This distributed knowledge-based system tool uses the portability of CLIPS and common message-passing protocols to operate over a heterogeneous network of processors. Results using the original PRAIS architecture over a network of Sun 3's, Sun 4's and VAX's are presented. Mechanisms using the producer-consumer model to extend the architecture for fault-tolerance and distributed truth maintenance initiation are also discussed.
Reviewing the Role of Cognitive Load, Expertise Level, Motivation, and Unconscious Processing in Working Memory Performance

ERIC Educational Resources Information Center

Kuldas, Seffetullah; Hashim, Shahabuddin; Ismail, Hairul Nizam; Abu Bakar, Zainudin

2015-01-01

Human cognitive capacity is unavailable for conscious processing of every amount of instructional messages. Aligning an instructional design with learner expertise level would allow better use of available working memory capacity in a cognitive learning task. Motivating students to learn consciously is also an essential determinant of the capacity…
A manual for PARTI runtime primitives

NASA Technical Reports Server (NTRS)

Berryman, Harry; Saltz, Joel

1990-01-01

Primitives are presented that are designed to help users efficiently program irregular problems (e.g., unstructured mesh sweeps, sparse matrix codes, adaptive mesh partial differential equations solvers) on distributed memory machines. These primitives are also designed for use in compilers for distributed memory multiprocessors. Communications patterns are captured at runtime, and the appropriate send and receive messages are automatically generated.
Articulatory Suppression in Language Interpretation: Working Memory Capacity, Dual Tasking and Word Knowledge

ERIC Educational Resources Information Center

Padilla, Francisca; Bajo, Maria Teresa; Macizo, Pedro

2005-01-01

How do interpreters manage to cope with the adverse effects of concurrent articulation while trying to comprehend the message in the source language? In Experiments 1-3, we explored three possible working memory (WM) functions that may underlie the ability to simultaneously comprehend and produce in the interpreters: WM storage capacity,…
Co-creators of memory, metaphors for resilience, and mechanisms for recovery: flora in living memorials to 9/11

Treesearch

Heather L. McMillen; Lindsay K. Campbell; Erika S. Svendsen

2017-01-01

Planting trees to mark the passing of events and people is a longstanding tradition around the world. To better understand the contemporary practices of memorialization through planting, we examine living memorials created in response to the events of September 11, 2001. As an extension of the U.S. Forest Service Living Memorials Project, we reviewed existing data (n...
First Applications of the New Parallel Krylov Solver for MODFLOW on a National and Global Scale

NASA Astrophysics Data System (ADS)

Verkaik, J.; Hughes, J. D.; Sutanudjaja, E.; van Walsum, P.

2016-12-01

Integrated high-resolution hydrologic models are increasingly being used for evaluating water management measures at field scale. Their drawbacks are large memory requirements and long run times. Examples of such models are The Netherlands Hydrological Instrument (NHI) model and the PCRaster Global Water Balance (PCR-GLOBWB) model. Typical simulation periods are 30-100 years with daily timesteps. The NHI model predicts water demands in periods of drought, supporting operational and long-term water-supply decisions. The NHI is a state-of-the-art coupling of several models: a 7-layer MODFLOW groundwater model ( 6.5M 250m cells), a MetaSWAP model for the unsaturated zone (Richards emulator of 0.5M cells), and a surface water model (MOZART-DM). The PCR-GLOBWB model provides a grid-based representation of global terrestrial hydrology and this work uses the version that includes a 2-layer MODFLOW groundwater model ( 4.5M 10km cells). The Parallel Krylov Solver (PKS) speeds up computation by both distributed memory parallelization (Message Passing Interface) and shared memory parallelization (Open Multi-Processing). PKS includes conjugate gradient, bi-conjugate gradient stabilized, and generalized minimal residual linear accelerators that use an overlapping additive Schwarz domain decomposition preconditioner. PKS can be used for both structured and unstructured grids and has been fully integrated in MODFLOW-USG using METIS partitioning and in iMODFLOW using RCB partitioning. iMODFLOW is an accelerated version of MODFLOW-2005 that is implicitly and online coupled to MetaSWAP. Results for benchmarks carried out on the Cartesius Dutch supercomputer (https://userinfo.surfsara.nl/systems/cartesius) for the PCRGLOB-WB model and on a 2x16 core Windows machine for the NHI model show speedups up to 10-20 and 5-10, respectively.
Aerodynamic Shape Optimization of Supersonic Aircraft Configurations via an Adjoint Formulation on Parallel Computers

NASA Technical Reports Server (NTRS)

Reuther, James; Alonso, Juan Jose; Rimlinger, Mark J.; Jameson, Antony

1996-01-01

This work describes the application of a control theory-based aerodynamic shape optimization method to the problem of supersonic aircraft design. The design process is greatly accelerated through the use of both control theory and a parallel implementation on distributed memory computers. Control theory is employed to derive the adjoint differential equations whose solution allows for the evaluation of design gradient information at a fraction of the computational cost required by previous design methods (13, 12, 44, 38). The resulting problem is then implemented on parallel distributed memory architectures using a domain decomposition approach, an optimized communication schedule, and the MPI (Message Passing Interface) Standard for portability and efficiency. The final result achieves very rapid aerodynamic design based on higher order computational fluid dynamics methods (CFD). In our earlier studies, the serial implementation of this design method (19, 20, 21, 23, 39, 25, 40, 41, 42, 43, 9) was shown to be effective for the optimization of airfoils, wings, wing-bodies, and complex aircraft configurations using both the potential equation and the Euler equations (39, 25). In our most recent paper, the Euler method was extended to treat complete aircraft configurations via a new multiblock implementation. Furthermore, during the same conference, we also presented preliminary results demonstrating that the basic methodology could be ported to distributed memory parallel computing architectures [241. In this paper, our concem will be to demonstrate that the combined power of these new technologies can be used routinely in an industrial design environment by applying it to the case study of the design of typical supersonic transport configurations. A particular difficulty of this test case is posed by the propulsion/airframe integration.
A communication-avoiding, hybrid-parallel, rank-revealing orthogonalization method.

DOE Office of Scientific and Technical Information (OSTI.GOV)

Hoemmen, Mark

2010-11-01

Orthogonalization consumes much of the run time of many iterative methods for solving sparse linear systems and eigenvalue problems. Commonly used algorithms, such as variants of Gram-Schmidt or Householder QR, have performance dominated by communication. Here, 'communication' includes both data movement between the CPU and memory, and messages between processors in parallel. Our Tall Skinny QR (TSQR) family of algorithms requires asymptotically fewer messages between processors and data movement between CPU and memory than typical orthogonalization methods, yet achieves the same accuracy as Householder QR factorization. Furthermore, in block orthogonalizations, TSQR is faster and more accurate than existing approaches formore » orthogonalizing the vectors within each block ('normalization'). TSQR's rank-revealing capability also makes it useful for detecting deflation in block iterative methods, for which existing approaches sacrifice performance, accuracy, or both. We have implemented a version of TSQR that exploits both distributed-memory and shared-memory parallelism, and supports real and complex arithmetic. Our implementation is optimized for the case of orthogonalizing a small number (5-20) of very long vectors. The shared-memory parallel component uses Intel's Threading Building Blocks, though its modular design supports other shared-memory programming models as well, including computation on the GPU. Our implementation achieves speedups of 2 times or more over competing orthogonalizations. It is available now in the development branch of the Trilinos software package, and will be included in the 10.8 release.« less
The relation between working memory capacity and auditory lateralization in children with auditory processing disorders.

PubMed

Moossavi, Abdollah; Mehrkian, Saiedeh; Lotfi, Yones; Faghihzadeh, Soghrat; sajedi, Hamed

2014-11-01

Auditory processing disorder (APD) describes a complex and heterogeneous disorder characterized by poor speech perception, especially in noisy environments. APD may be responsible for a range of sensory processing deficits associated with learning difficulties. There is no general consensus about the nature of APD and how the disorder should be assessed or managed. This study assessed the effect of cognition abilities (working memory capacity) on sound lateralization in children with auditory processing disorders, in order to determine how "auditory cognition" interacts with APD. The participants in this cross-sectional comparative study were 20 typically developing and 17 children with a diagnosed auditory processing disorder (9-11 years old). Sound lateralization abilities investigated using inter-aural time (ITD) differences and inter-aural intensity (IID) differences with two stimuli (high pass and low pass noise) in nine perceived positions. Working memory capacity was evaluated using the non-word repetition, and forward and backward digits span tasks. Linear regression was employed to measure the degree of association between working memory capacity and localization tests between the two groups. Children in the APD group had consistently lower scores than typically developing subjects in lateralization and working memory capacity measures. The results showed working memory capacity had significantly negative correlation with ITD errors especially with high pass noise stimulus but not with IID errors in APD children. The study highlights the impact of working memory capacity on auditory lateralization. The finding of this research indicates that the extent to which working memory influences auditory processing depend on the type of auditory processing and the nature of stimulus/listening situation. Copyright © 2014 Elsevier Ireland Ltd. All rights reserved.
Parallel SOR methods with a parabolic-diffusion acceleration technique for solving an unstructured-grid Poisson equation on 3D arbitrary geometries

NASA Astrophysics Data System (ADS)

Zapata, M. A. Uh; Van Bang, D. Pham; Nguyen, K. D.

2016-05-01

This paper presents a parallel algorithm for the finite-volume discretisation of the Poisson equation on three-dimensional arbitrary geometries. The proposed method is formulated by using a 2D horizontal block domain decomposition and interprocessor data communication techniques with message passing interface. The horizontal unstructured-grid cells are reordered according to the neighbouring relations and decomposed into blocks using a load-balanced distribution to give all processors an equal amount of elements. In this algorithm, two parallel successive over-relaxation methods are presented: a multi-colour ordering technique for unstructured grids based on distributed memory and a block method using reordering index following similar ideas of the partitioning for structured grids. In all cases, the parallel algorithms are implemented with a combination of an acceleration iterative solver. This solver is based on a parabolic-diffusion equation introduced to obtain faster solutions of the linear systems arising from the discretisation. Numerical results are given to evaluate the performances of the methods showing speedups better than linear.
Parallel computation and the Basis system

DOE Office of Scientific and Technical Information (OSTI.GOV)

Smith, G.R.

1992-12-16

A software package has been written that can facilitate efforts to develop powerful, flexible, and easy-to-use programs that can run in single-processor, massively parallel, and distributed computing environments. Particular attention has been given to the difficulties posed by a program consisting of many science packages that represent subsystems of a complicated, coupled system. Methods have been found to maintain independence of the packages by hiding data structures without increasing the communication costs in a parallel computing environment. Concepts developed in this work are demonstrated by a prototype program that uses library routines from two existing software systems, Basis and Parallelmore » Virtual Machine (PVM). Most of the details of these libraries have been encapsulated in routines and macros that could be rewritten for alternative libraries that possess certain minimum capabilities. The prototype software uses a flexible master-and-slaves paradigm for parallel computation and supports domain decomposition with message passing for partitioning work among slaves. Facilities are provided for accessing variables that are distributed among the memories of slaves assigned to subdomains. The software is named PROTOPAR.« less
Parallel computation and the basis system

DOE Office of Scientific and Technical Information (OSTI.GOV)

Smith, G.R.

1993-05-01

A software package has been written that can facilitate efforts to develop powerful, flexible, and easy-to use programs that can run in single-processor, massively parallel, and distributed computing environments. Particular attention has been given to the difficulties posed by a program consisting of many science packages that represent subsystems of a complicated, coupled system. Methods have been found to maintain independence of the packages by hiding data structures without increasing the communications costs in a parallel computing environment. Concepts developed in this work are demonstrated by a prototype program that uses library routines from two existing software systems, Basis andmore » Parallel Virtual Machine (PVM). Most of the details of these libraries have been encapsulated in routines and macros that could be rewritten for alternative libraries that possess certain minimum capabilities. The prototype software uses a flexible master-and-slaves paradigm for parallel computation and supports domain decomposition with message passing for partitioning work among slaves. Facilities are provided for accessing variables that are distributed among the memories of slaves assigned to subdomains. The software is named PROTOPAR.« less
Impact of the implementation of MPI point-to-point communications on the performance of two general sparse solvers

DOE Office of Scientific and Technical Information (OSTI.GOV)

Amestoy, Patrick R.; Duff, Iain S.; L'Excellent, Jean-Yves

2001-10-10

We examine the mechanics of the send and receive mechanism of MPI and in particular how we can implement message passing in a robust way so that our performance is not significantly affected by changes to the MPI system. This leads us to using the Isend/Irecv protocol which will entail sometimes significant algorithmic changes. We discuss this within the context of two different algorithms for sparse Gaussian elimination that we have parallelized. One is a multifrontal solver called MUMPS, the other is a supernodal solver called SuperLU. Both algorithms are difficult to parallelize on distributed memory machines. Our initial strategiesmore » were based on simple MPI point-to-point communication primitives. With such approaches, the parallel performance of both codes are very sensitive to the MPI implementation, the way MPI internal buffers are used in particular. We then modified our codes to use more sophisticated nonblocking versions of MPI communication. This significantly improved the performance robustness (independent of the MPI buffering mechanism) and scalability, but at the cost of increased code complexity.« less
Parallel Monte Carlo transport modeling in the context of a time-dependent, three-dimensional multi-physics code

DOE Office of Scientific and Technical Information (OSTI.GOV)

Procassini, R.J.

1997-12-31

The fine-scale, multi-space resolution that is envisioned for accurate simulations of complex weapons systems in three spatial dimensions implies flop-rate and memory-storage requirements that will only be obtained in the near future through the use of parallel computational techniques. Since the Monte Carlo transport models in these simulations usually stress both of these computational resources, they are prime candidates for parallelization. The MONACO Monte Carlo transport package, which is currently under development at LLNL, will utilize two types of parallelism within the context of a multi-physics design code: decomposition of the spatial domain across processors (spatial parallelism) and distribution ofmore » particles in a given spatial subdomain across additional processors (particle parallelism). This implementation of the package will utilize explicit data communication between domains (message passing). Such a parallel implementation of a Monte Carlo transport model will result in non-deterministic communication patterns. The communication of particles between subdomains during a Monte Carlo time step may require a significant level of effort to achieve a high parallel efficiency.« less

Analysis of a parallelized nonlinear elliptic boundary value problem solver with application to reacting flows

NASA Technical Reports Server (NTRS)

Keyes, David E.; Smooke, Mitchell D.

1987-01-01

A parallelized finite difference code based on the Newton method for systems of nonlinear elliptic boundary value problems in two dimensions is analyzed in terms of computational complexity and parallel efficiency. An approximate cost function depending on 15 dimensionless parameters is derived for algorithms based on stripwise and boxwise decompositions of the domain and a one-to-one assignment of the strip or box subdomains to processors. The sensitivity of the cost functions to the parameters is explored in regions of parameter space corresponding to model small-order systems with inexpensive function evaluations and also a coupled system of nineteen equations with very expensive function evaluations. The algorithm was implemented on the Intel Hypercube, and some experimental results for the model problems with stripwise decompositions are presented and compared with the theory. In the context of computational combustion problems, multiprocessors of either message-passing or shared-memory type may be employed with stripwise decompositions to realize speedup of O(n), where n is mesh resolution in one direction, for reasonable n.
Extending the length and time scales of Gram–Schmidt Lyapunov vector computations

DOE Office of Scientific and Technical Information (OSTI.GOV)

Costa, Anthony B., E-mail: acosta@northwestern.edu; Green, Jason R., E-mail: jason.green@umb.edu; Department of Chemistry, University of Massachusetts Boston, Boston, MA 02125

Lyapunov vectors have found growing interest recently due to their ability to characterize systems out of thermodynamic equilibrium. The computation of orthogonal Gram–Schmidt vectors requires multiplication and QR decomposition of large matrices, which grow as N{sup 2} (with the particle count). This expense has limited such calculations to relatively small systems and short time scales. Here, we detail two implementations of an algorithm for computing Gram–Schmidt vectors. The first is a distributed-memory message-passing method using Scalapack. The second uses the newly-released MAGMA library for GPUs. We compare the performance of both codes for Lennard–Jones fluids from N=100 to 1300 betweenmore » Intel Nahalem/Infiniband DDR and NVIDIA C2050 architectures. To our best knowledge, these are the largest systems for which the Gram–Schmidt Lyapunov vectors have been computed, and the first time their calculation has been GPU-accelerated. We conclude that Lyapunov vector calculations can be significantly extended in length and time by leveraging the power of GPU-accelerated linear algebra.« less
A hybrid parallel framework for the cellular Potts model simulations

DOE Office of Scientific and Technical Information (OSTI.GOV)

Jiang, Yi; He, Kejing; Dong, Shoubin

2009-01-01

The Cellular Potts Model (CPM) has been widely used for biological simulations. However, most current implementations are either sequential or approximated, which can't be used for large scale complex 3D simulation. In this paper we present a hybrid parallel framework for CPM simulations. The time-consuming POE solving, cell division, and cell reaction operation are distributed to clusters using the Message Passing Interface (MPI). The Monte Carlo lattice update is parallelized on shared-memory SMP system using OpenMP. Because the Monte Carlo lattice update is much faster than the POE solving and SMP systems are more and more common, this hybrid approachmore » achieves good performance and high accuracy at the same time. Based on the parallel Cellular Potts Model, we studied the avascular tumor growth using a multiscale model. The application and performance analysis show that the hybrid parallel framework is quite efficient. The hybrid parallel CPM can be used for the large scale simulation ({approx}10{sup 8} sites) of complex collective behavior of numerous cells ({approx}10{sup 6}).« less
Hybrid MPI/OpenMP Implementation of the ORAC Molecular Dynamics Program for Generalized Ensemble and Fast Switching Alchemical Simulations.

PubMed

Procacci, Piero

2016-06-27

We present a new release (6.0β) of the ORAC program [Marsili et al. J. Comput. Chem. 2010, 31, 1106-1116] with a hybrid OpenMP/MPI (open multiprocessing message passing interface) multilevel parallelism tailored for generalized ensemble (GE) and fast switching double annihilation (FS-DAM) nonequilibrium technology aimed at evaluating the binding free energy in drug-receptor system on high performance computing platforms. The production of the GE or FS-DAM trajectories is handled using a weak scaling parallel approach on the MPI level only, while a strong scaling force decomposition scheme is implemented for intranode computations with shared memory access at the OpenMP level. The efficiency, simplicity, and inherent parallel nature of the ORAC implementation of the FS-DAM algorithm, project the code as a possible effective tool for a second generation high throughput virtual screening in drug discovery and design. The code, along with documentation, testing, and ancillary tools, is distributed under the provisions of the General Public License and can be freely downloaded at www.chim.unifi.it/orac .
Automated Instrumentation, Monitoring and Visualization of PVM Programs Using AIMS

NASA Technical Reports Server (NTRS)

Mehra, Pankaj; VanVoorst, Brian; Yan, Jerry; Lum, Henry, Jr. (Technical Monitor)

1994-01-01

We present views and analysis of the execution of several PVM (Parallel Virtual Machine) codes for Computational Fluid Dynamics on a networks of Sparcstations, including: (1) NAS Parallel Benchmarks CG and MG; (2) a multi-partitioning algorithm for NAS Parallel Benchmark SP; and (3) an overset grid flowsolver. These views and analysis were obtained using our Automated Instrumentation and Monitoring System (AIMS) version 3.0, a toolkit for debugging the performance of PVM programs. We will describe the architecture, operation and application of AIMS. The AIMS toolkit contains: (1) Xinstrument, which can automatically instrument various computational and communication constructs in message-passing parallel programs; (2) Monitor, a library of runtime trace-collection routines; (3) VK (Visual Kernel), an execution-animation tool with source-code clickback; and (4) Tally, a tool for statistical analysis of execution profiles. Currently, Xinstrument can handle C and Fortran 77 programs using PVM 3.2.x; Monitor has been implemented and tested on Sun 4 systems running SunOS 4.1.2; and VK uses XIIR5 and Motif 1.2. Data and views obtained using AIMS clearly illustrate several characteristic features of executing parallel programs on networked workstations: (1) the impact of long message latencies; (2) the impact of multiprogramming overheads and associated load imbalance; (3) cache and virtual-memory effects; and (4) significant skews between workstation clocks. Interestingly, AIMS can compensate for constant skew (zero drift) by calibrating the skew between a parent and its spawned children. In addition, AIMS' skew-compensation algorithm can adjust timestamps in a way that eliminates physically impossible communications (e.g., messages going backwards in time). Our current efforts are directed toward creating new views to explain the observed performance of PVM programs. Some of the features planned for the near future include: (1) ConfigView, showing the physical topology of the virtual machine, inferred using specially formatted IP (Internet Protocol) packets: and (2) LoadView, synchronous animation of PVM-program execution and resource-utilization patterns.
Single Sided Messaging v. 0.6.6

DOE Office of Scientific and Technical Information (OSTI.GOV)

Curry, Matthew Leon; Farmer, Matthew Shane; Hassani, Amin

Single-Sided Messaging (SSM) is a portable, multitransport networking library that enables applications to leverage potential one-sided capabilities of underlying network transports. It also provides desirable semantics that services for highperformance, massively parallel computers can leverage, such as an explicit cancel operation for pending transmissions, as well as enhanced matching semantics favoring large numbers of buffers attached to a single match entry. This release supports TCP/IP, shared memory, and Infiniband.
Accumulative Roll Bonding and Post-Deformation Annealing of Cu-Al-Mn Shape Memory Alloy

NASA Astrophysics Data System (ADS)

Moghaddam, Ahmad Ostovari; Ketabchi, Mostafa; Afrasiabi, Yaser

2014-12-01

Accumulative roll bonding is a severe plastic deformation process used for Cu-Al-Mn shape memory alloy. The main purpose of this study is to investigate the possibility of grain refinement of Cu-9.5Al-8.2Mn (in wt.%) shape memory alloy using accumulative roll bonding and post-deformation annealing. The alloy was successfully subjected to 5 passes of accumulative roll bonding at 600 °C. The microstructure, properties as well as post-deformation annealing of this alloy were investigated by optical microscopy, scanning electron microscopy, x-ray diffraction, differential scanning calorimeter, and bend and tensile testing. The results showed that after 5 passes of ARB at 600 °C, specimens possessed α + β microstructure with the refined grains, but martensite phases and consequently shape memory effect completely disappeared. Post-deformation annealing was carried out at 700 °C, and the martensite phase with the smallest grain size (less than 40 μm) was obtained after 150 s of annealing at 700 °C. It was found that after 5 passes of ARB and post-deformation annealing, the stability of SME during thermal cycling improved. Also, tensile properties of alloys significantly improved after post-deformation annealing.
Comparison of neuronal spike exchange methods on a Blue Gene/P supercomputer.

PubMed

Hines, Michael; Kumar, Sameer; Schürmann, Felix

2011-01-01

For neural network simulations on parallel machines, interprocessor spike communication can be a significant portion of the total simulation time. The performance of several spike exchange methods using a Blue Gene/P (BG/P) supercomputer has been tested with 8-128 K cores using randomly connected networks of up to 32 M cells with 1 k connections per cell and 4 M cells with 10 k connections per cell, i.e., on the order of 4·10(10) connections (K is 1024, M is 1024(2), and k is 1000). The spike exchange methods used are the standard Message Passing Interface (MPI) collective, MPI_Allgather, and several variants of the non-blocking Multisend method either implemented via non-blocking MPI_Isend, or exploiting the possibility of very low overhead direct memory access (DMA) communication available on the BG/P. In all cases, the worst performing method was that using MPI_Isend due to the high overhead of initiating a spike communication. The two best performing methods-the persistent Multisend method using the Record-Replay feature of the Deep Computing Messaging Framework DCMF_Multicast; and a two-phase multisend in which a DCMF_Multicast is used to first send to a subset of phase one destination cores, which then pass it on to their subset of phase two destination cores-had similar performance with very low overhead for the initiation of spike communication. Departure from ideal scaling for the Multisend methods is almost completely due to load imbalance caused by the large variation in number of cells that fire on each processor in the interval between synchronization. Spike exchange time itself is negligible since transmission overlaps with computation and is handled by a DMA controller. We conclude that ideal performance scaling will be ultimately limited by imbalance between incoming processor spikes between synchronization intervals. Thus, counterintuitively, maximization of load balance requires that the distribution of cells on processors should not reflect neural net architecture but be randomly distributed so that sets of cells which are burst firing together should be on different processors with their targets on as large a set of processors as possible.
Message passing with queues and channels

DOE Office of Scientific and Technical Information (OSTI.GOV)

Dozsa, Gabor J; Heidelberger, Philip; Kumar, Sameer

In an embodiment, a send thread receives an identifier that identifies a destination node and a pointer to data. The send thread creates a first send request in response to the receipt of the identifier and the data pointer. The send thread selects a selected channel from among a plurality of channels. The selected channel comprises a selected hand-off queue and an identification of a selected message unit. Each of the channels identifies a different message unit. The selected hand-off queue is randomly accessible. If the selected hand-off queue contains an available entry, the send thread adds the first sendmore » request to the selected hand-off queue. If the selected hand-off queue does not contain an available entry, the send thread removes a second send request from the selected hand-off queue and sends the second send request to the selected message unit.« less
Scalable Optical-Fiber Communication Networks

NASA Technical Reports Server (NTRS)

Chow, Edward T.; Peterson, John C.

1993-01-01

Scalable arbitrary fiber extension network (SAFEnet) is conceptual fiber-optic communication network passing digital signals among variety of computers and input/output devices at rates from 200 Mb/s to more than 100 Gb/s. Intended for use with very-high-speed computers and other data-processing and communication systems in which message-passing delays must be kept short. Inherent flexibility makes it possible to match performance of network to computers by optimizing configuration of interconnections. In addition, interconnections made redundant to provide tolerance to faults.
Domestic Minor Sex Trafficking Deterrence and Victims Support Act of 2010

THOMAS, 111th Congress

Sen. Wyden, Ron [D-OR

2009-12-22

Senate - 12/21/2010 Message on House action received in Senate and at desk: House amendment to Senate bill. (All Actions) Tracker: This bill has the status Passed HouseHere are the steps for Status of Legislation:
Multiplexed memory-insensitive quantum repeaters.

PubMed

Collins, O A; Jenkins, S D; Kuzmich, A; Kennedy, T A B

2007-02-09

Long-distance quantum communication via distant pairs of entangled quantum bits (qubits) is the first step towards secure message transmission and distributed quantum computing. To date, the most promising proposals require quantum repeaters to mitigate the exponential decrease in communication rate due to optical fiber losses. However, these are exquisitely sensitive to the lifetimes of their memory elements. We propose a multiplexing of quantum nodes that should enable the construction of quantum networks that are largely insensitive to the coherence times of the quantum memory elements.
Chaos synchronization communication using extremely unsymmetrical bidirectional injections.

PubMed

Zhang, Wei Li; Pan, Wei; Luo, Bin; Zou, Xi Hua; Wang, Meng Yao; Zhou, Zhi

2008-02-01

Chaos synchronization and message transmission between two semiconductor lasers with extremely unsymmetrical bidirectional injections (EUBIs) are discussed. By using EUBIs, synchronization is realized through injection locking. Numerical results show that if the laser subjected to strong injection serves as the receiver, chaos pass filtering (CPF) of the system is similar to that of unidirectional coupled systems. Moreover, if the other laser serves as the receiver, a stronger CPF can be obtained. Finally, we demonstrate that messages can be extracted successfully from either of the two transmission directions of the system.
Automation Framework for Flight Dynamics Products Generation

NASA Technical Reports Server (NTRS)

Wiegand, Robert E.; Esposito, Timothy C.; Watson, John S.; Jun, Linda; Shoan, Wendy; Matusow, Carla

2010-01-01

XFDS provides an easily adaptable automation platform. To date it has been used to support flight dynamics operations. It coordinates the execution of other applications such as Satellite TookKit, FreeFlyer, MATLAB, and Perl code. It provides a mechanism for passing messages among a collection of XFDS processes, and allows sending and receiving of GMSEC messages. A unified and consistent graphical user interface (GUI) is used for the various tools. Its automation configuration is stored in text files, and can be edited either directly or using the GUI.
The effects of frame, appeal, and outcome extremity of antismoking messages on cognitive processing.

PubMed

Leshner, Glenn; Cheng, I-Huei

2009-04-01

Research on the impact of antismoking advertisements in countermarketing cigarette advertising is equivocal. Although many studies examined how different message appeal types influence people's attitudes and behavior, there have been few studies that have explored the mechanism of how individuals attend to and remember antismoking information. This study examined how message attributes of antismoking TV ads (frame, appeal type, and outcome extremity) interacted to influence people's attention (secondary task reaction time) and memory (recognition). Antismoking public service announcements were chosen that were either loss- or gain-framed, had either a health or social appeal, or had either a more or less extreme outcome described in the message. Among the key findings were that loss-framed messages with more extreme outcomes required the most processing resources (i.e., had the slowest secondary task reaction times) and were the best remembered (i.e., were best recognized). These findings indicate ways that different message attributes affect individuals' cognitive processing, and they are discussed in light of prior framing and persuasion research.
A manual for PARTI runtime primitives, revision 1

NASA Technical Reports Server (NTRS)

Das, Raja; Saltz, Joel; Berryman, Harry

1991-01-01

Primitives are presented that are designed to help users efficiently program irregular problems (e.g., unstructured mesh sweeps, sparse matrix codes, adaptive mesh partial differential equations solvers) on distributed memory machines. These primitives are also designed for use in compilers for distributed memory multiprocessors. Communications patterns are captured at runtime, and the appropriate send and receive messages are automatically generated.
Replenishing data descriptors in a DMA injection FIFO buffer

DOEpatents

Archer, Charles J [Rochester, MN; Blocksome, Michael A [Rochester, MN; Cernohous, Bob R [Rochester, MN; Heidelberger, Philip [Cortlandt Manor, NY; Kumar, Sameer [White Plains, NY; Parker, Jeffrey J [Rochester, MN

2011-10-11

Methods, apparatus, and products are disclosed for replenishing data descriptors in a Direct Memory Access (`DMA`) injection first-in-first-out (`FIFO`) buffer that include: determining, by a messaging module on an origin compute node, whether a number of data descriptors in a DMA injection FIFO buffer exceeds a predetermined threshold, each data descriptor specifying an application message for transmission to a target compute node; queuing, by the messaging module, a plurality of new data descriptors in a pending descriptor queue if the number of the data descriptors in the DMA injection FIFO buffer exceeds the predetermined threshold; establishing, by the messaging module, interrupt criteria that specify when to replenish the injection FIFO buffer with the plurality of new data descriptors in the pending descriptor queue; and injecting, by the messaging module, the plurality of new data descriptors into the injection FIFO buffer in dependence upon the interrupt criteria.
"For whom the bell tolls": emotional rubbernecking in Facebook memorial groups.

PubMed

DeGroot, Jocelyn M

2014-01-01

Facebook memorial groups are often formed as a way for people to remember a deceased loved one. Because of the public nature of communication on Facebook, people who did not intimately know the deceased (emotional rubberneckers) can locate memorial groups and watch as people grieve the loss of their friend or family member. Using grounded theory methods, the author identified and examined the function of the rubberneckers' messages posted on 10 Facebook memorial group walls. Emotional rubberneckers identified with the deceased and expressed sadness at their death, indicating a connection with the deceased stranger.
Generation-based memory synchronization in a multiprocessor system with weakly consistent memory accesses

DOEpatents

Ohmacht, Martin

2017-08-15

In a multiprocessor system, a central memory synchronization module coordinates memory synchronization requests responsive to memory access requests in flight, a generation counter, and a reclaim pointer. The central module communicates via point-to-point communication. The module includes a global OR reduce tree for each memory access requesting device, for detecting memory access requests in flight. An interface unit is implemented associated with each processor requesting synchronization. The interface unit includes multiple generation completion detectors. The generation count and reclaim pointer do not pass one another.
Generation-based memory synchronization in a multiprocessor system with weakly consistent memory accesses

DOEpatents

Ohmacht, Martin

2014-09-09

In a multiprocessor system, a central memory synchronization module coordinates memory synchronization requests responsive to memory access requests in flight, a generation counter, and a reclaim pointer. The central module communicates via point-to-point communication. The module includes a global OR reduce tree for each memory access requesting device, for detecting memory access requests in flight. An interface unit is implemented associated with each processor requesting synchronization. The interface unit includes multiple generation completion detectors. The generation count and reclaim pointer do not pass one another.

Disaster Relief Appropriations Act, 2013

THOMAS, 112th Congress

Rep. Rogers, Harold [R-KY-5

2011-02-11

Senate - 12/30/2012 Message on Senate action sent to the House. (All Actions) Notes: This bill was for a time the Senate legislative vehicle for Hurricane Sandy supplemental appropriations. Tracker: This bill has the status Passed SenateHere are the steps for Status of Legislation:
I/O Parallelization for the Goddard Earth Observing System Data Assimilation System (GEOS DAS)

NASA Technical Reports Server (NTRS)

Lucchesi, Rob; Sawyer, W.; Takacs, L. L.; Lyster, P.; Zero, J.

1998-01-01

The National Aeronautics and Space Administration (NASA) Data Assimilation Office (DAO) at the Goddard Space Flight Center (GSFC) has developed the GEOS DAS, a data assimilation system that provides production support for NASA missions and will support NASA's Earth Observing System (EOS) in the coming years. The GEOS DAS will be used to provide background fields of meteorological quantities to EOS satellite instrument teams for use in their data algorithms as well as providing assimilated data sets for climate studies on decadal time scales. The DAO has been involved in prototyping parallel implementations of the GEOS DAS for a number of years and is now embarking on an effort to convert the production version from shared-memory parallelism to distributed-memory parallelism using the portable Message-Passing Interface (MPI). The GEOS DAS consists of two main components, an atmospheric General Circulation Model (GCM) and a Physical-space Statistical Analysis System (PSAS). The GCM operates on data that are stored on a regular grid while PSAS works with observational data that are scattered irregularly throughout the atmosphere. As a result, the two components have different data decompositions. The GCM is decomposed horizontally as a checkerboard with all vertical levels of each box existing on the same processing element(PE). The dynamical core of the GCM can also operate on a rotated grid, which requires communication-intensive grid transformations during GCM integration. PSAS groups observations on PEs in a more irregular and dynamic fashion.
NavP: Structured and Multithreaded Distributed Parallel Programming

NASA Technical Reports Server (NTRS)

Pan, Lei

2007-01-01

We present Navigational Programming (NavP) -- a distributed parallel programming methodology based on the principles of migrating computations and multithreading. The four major steps of NavP are: (1) Distribute the data using the data communication pattern in a given algorithm; (2) Insert navigational commands for the computation to migrate and follow large-sized distributed data; (3) Cut the sequential migrating thread and construct a mobile pipeline; and (4) Loop back for refinement. NavP is significantly different from the current prevailing Message Passing (MP) approach. The advantages of NavP include: (1) NavP is structured distributed programming and it does not change the code structure of an original algorithm. This is in sharp contrast to MP as MP implementations in general do not resemble the original sequential code; (2) NavP implementations are always competitive with the best MPI implementations in terms of performance. Approaches such as DSM or HPF have failed to deliver satisfying performance as of today in contrast, even if they are relatively easy to use compared to MP; (3) NavP provides incremental parallelization, which is beyond the reach of MP; and (4) NavP is a unifying approach that allows us to exploit both fine- (multithreading on shared memory) and coarse- (pipelined tasks on distributed memory) grained parallelism. This is in contrast to the currently popular hybrid use of MP+OpenMP, which is known to be complex to use. We present experimental results that demonstrate the effectiveness of NavP.
Xyce™ Parallel Electronic Simulator Users' Guide, Version 6.5.

DOE Office of Scientific and Technical Information (OSTI.GOV)

Keiter, Eric R.; Aadithya, Karthik V.; Mei, Ting

This manual describes the use of the Xyce Parallel Electronic Simulator. Xyce has been designed as a SPICE-compatible, high-performance analog circuit simulator, and has been written to support the simulation needs of the Sandia National Laboratories electrical designers. This development has focused on improving capability over the current state-of-the-art in the following areas: Capability to solve extremely large circuit problems by supporting large-scale parallel computing platforms (up to thousands of processors). This includes support for most popular parallel and serial computers. A differential-algebraic-equation (DAE) formulation, which better isolates the device model package from solver algorithms. This allows one to developmore » new types of analysis without requiring the implementation of analysis-specific device models. Device models that are specifically tailored to meet Sandia's needs, including some radiation- aware devices (for Sandia users only). Object-oriented code design and implementation using modern coding practices. Xyce is a parallel code in the most general sense of the phrase -- a message passing parallel implementation -- which allows it to run efficiently a wide range of computing platforms. These include serial, shared-memory and distributed-memory parallel platforms. Attention has been paid to the specific nature of circuit-simulation problems to ensure that optimal parallel efficiency is achieved as the number of processors grows. The information herein is subject to change without notice. Copyright © 2002-2016 Sandia Corporation. All rights reserved.« less
Asynchronous Communication Scheme For Hypercube Computer

NASA Technical Reports Server (NTRS)

Madan, Herb S.

1988-01-01

Scheme devised for asynchronous-message communication system for Mark III hypercube concurrent-processor network. Network consists of up to 1,024 processing elements connected electrically as though were at corners of 10-dimensional cube. Each node contains two Motorola 68020 processors along with Motorola 68881 floating-point processor utilizing up to 4 megabytes of shared dynamic random-access memory. Scheme intended to support applications requiring passage of both polled or solicited and unsolicited messages.
Interactive Supercomputing’s Star-P Platform

DOE Office of Scientific and Technical Information (OSTI.GOV)

Edelman, Alan; Husbands, Parry; Leibman, Steve

2006-09-19

The thesis of this extended abstract is simple. High productivity comes from high level infrastructures. To measure this, we introduce a methodology that goes beyond the tradition of timing software in serial and tuned parallel modes. We perform a classroom productivity study involving 29 students who have written a homework exercise in a low level language (MPI message passing) and a high level language (Star-P with MATLAB client). Our conclusions indicate what perhaps should be of little surprise: (1) the high level language is always far easier on the students than the low level language. (2) The early versions ofmore » the high level language perform inadequately compared to the tuned low level language, but later versions substantially catch up. Asymptotically, the analogy must hold that message passing is to high level language parallel programming as assembler is to high level environments such as MATLAB, Mathematica, Maple, or even Python. We follow the Kepner method that correctly realizes that traditional speedup numbers without some discussion of the human cost of reaching these numbers can fail to reflect the true human productivity cost of high performance computing. Traditional data compares low level message passing with serial computation. With the benefit of a high level language system in place, in our case Star-P running with MATLAB client, and with the benefit of a large data pool: 29 students, each running the same code ten times on three evolutions of the same platform, we can methodically demonstrate the productivity gains. To date we are not aware of any high level system as extensive and interoperable as Star-P, nor are we aware of an experiment of this kind performed with this volume of data.« less
Applications Performance Under MPL and MPI on NAS IBM SP2

NASA Technical Reports Server (NTRS)

Saini, Subhash; Simon, Horst D.; Lasinski, T. A. (Technical Monitor)

1994-01-01

On July 5, 1994, an IBM Scalable POWER parallel System (IBM SP2) with 64 nodes, was installed at the Numerical Aerodynamic Simulation (NAS) Facility Each node of NAS IBM SP2 is a "wide node" consisting of a RISC 6000/590 workstation module with a clock of 66.5 MHz which can perform four floating point operations per clock with a peak performance of 266 Mflop/s. By the end of 1994, 64 nodes of IBM SP2 will be upgraded to 160 nodes with a peak performance of 42.5 Gflop/s. An overview of the IBM SP2 hardware is presented. The basic understanding of architectural details of RS 6000/590 will help application scientists the porting, optimizing, and tuning of codes from other machines such as the CRAY C90 and the Paragon to the NAS SP2. Optimization techniques such as quad-word loading, effective utilization of two floating point units, and data cache optimization of RS 6000/590 is illustrated, with examples giving performance gains at each optimization step. The conversion of codes using Intel's message passing library NX to codes using native Message Passing Library (MPL) and the Message Passing Interface (NMI) library available on the IBM SP2 is illustrated. In particular, we will present the performance of Fast Fourier Transform (FFT) kernel from NAS Parallel Benchmarks (NPB) under MPL and MPI. We have also optimized some of Fortran BLAS 2 and BLAS 3 routines, e.g., the optimized Fortran DAXPY runs at 175 Mflop/s and optimized Fortran DGEMM runs at 230 Mflop/s per node. The performance of the NPB (Class B) on the IBM SP2 is compared with the CRAY C90, Intel Paragon, TMC CM-5E, and the CRAY T3D.
Parallelization of a hydrological model using the message passing interface

USGS Publications Warehouse

Wu, Yiping; Li, Tiejian; Sun, Liqun; Chen, Ji

2013-01-01

With the increasing knowledge about the natural processes, hydrological models such as the Soil and Water Assessment Tool (SWAT) are becoming larger and more complex with increasing computation time. Additionally, other procedures such as model calibration, which may require thousands of model iterations, can increase running time and thus further reduce rapid modeling and analysis. Using the widely-applied SWAT as an example, this study demonstrates how to parallelize a serial hydrological model in a Windows® environment using a parallel programing technology—Message Passing Interface (MPI). With a case study, we derived the optimal values for the two parameters (the number of processes and the corresponding percentage of work to be distributed to the master process) of the parallel SWAT (P-SWAT) on an ordinary personal computer and a work station. Our study indicates that model execution time can be reduced by 42%–70% (or a speedup of 1.74–3.36) using multiple processes (two to five) with a proper task-distribution scheme (between the master and slave processes). Although the computation time cost becomes lower with an increasing number of processes (from two to five), this enhancement becomes less due to the accompanied increase in demand for message passing procedures between the master and all slave processes. Our case study demonstrates that the P-SWAT with a five-process run may reach the maximum speedup, and the performance can be quite stable (fairly independent of a project size). Overall, the P-SWAT can help reduce the computation time substantially for an individual model run, manual and automatic calibration procedures, and optimization of best management practices. In particular, the parallelization method we used and the scheme for deriving the optimal parameters in this study can be valuable and easily applied to other hydrological or environmental models.
Team performance in networked supervisory control of unmanned air vehicles: effects of automation, working memory, and communication content.

PubMed

McKendrick, Ryan; Shaw, Tyler; de Visser, Ewart; Saqer, Haneen; Kidwell, Brian; Parasuraman, Raja

2014-05-01

Assess team performance within a net-worked supervisory control setting while manipulating automated decision aids and monitoring team communication and working memory ability. Networked systems such as multi-unmanned air vehicle (UAV) supervision have complex properties that make prediction of human-system performance difficult. Automated decision aid can provide valuable information to operators, individual abilities can limit or facilitate team performance, and team communication patterns can alter how effectively individuals work together. We hypothesized that reliable automation, higher working memory capacity, and increased communication rates of task-relevant information would offset performance decrements attributed to high task load. Two-person teams performed a simulated air defense task with two levels of task load and three levels of automated aid reliability. Teams communicated and received decision aid messages via chat window text messages. Task Load x Automation effects were significant across all performance measures. Reliable automation limited the decline in team performance with increasing task load. Average team spatial working memory was a stronger predictor than other measures of team working memory. Frequency of team rapport and enemy location communications positively related to team performance, and word count was negatively related to team performance. Reliable decision aiding mitigated team performance decline during increased task load during multi-UAV supervisory control. Team spatial working memory, communication of spatial information, and team rapport predicted team success. An automated decision aid can improve team performance under high task load. Assessment of spatial working memory and the communication of task-relevant information can help in operator and team selection in supervisory control systems.
Enabling Energy Saving Innovations Act

THOMAS, 112th Congress

Rep. Aderholt, Robert B. [R-AL-4

2012-04-26

Senate - 09/24/2012 Message on Senate action sent to the House. (All Actions) Notes: For further action, see H.R.6582, which became Public Law 112-210 on 12/18/2012. Tracker: This bill has the status Passed SenateHere are the steps for Status of Legislation:
A Model for Speedup of Parallel Programs

DTIC Science & Technology

1997-01-01

Sanjeev. K Setia . The interaction between mem- ory allocation and adaptive partitioning in message- passing multicomputers. In IPPS 󈨣 Workshop on Job...Scheduling Strategies for Parallel Processing, pages 89{99, 1995. [15] Sanjeev K. Setia and Satish K. Tripathi. A compar- ative analysis of static
Interstate Recognition of Notarizations Act of 2010

THOMAS, 111th Congress

Rep. Aderholt, Robert B. [R-AL-4

2009-10-14

House - 11/17/2010 On motion to refer the bill and the accompanying veto message to the Committee on the Judiciary. Agreed to without objection. (All Actions) Tracker: This bill has the status Failed to pass over vetoHere are the steps for Status of Legislation:
Approximate message passing for nonconvex sparse regularization with stability and asymptotic analysis

NASA Astrophysics Data System (ADS)

Sakata, Ayaka; Xu, Yingying

2018-03-01

We analyse a linear regression problem with nonconvex regularization called smoothly clipped absolute deviation (SCAD) under an overcomplete Gaussian basis for Gaussian random data. We propose an approximate message passing (AMP) algorithm considering nonconvex regularization, namely SCAD-AMP, and analytically show that the stability condition corresponds to the de Almeida-Thouless condition in spin glass literature. Through asymptotic analysis, we show the correspondence between the density evolution of SCAD-AMP and the replica symmetric (RS) solution. Numerical experiments confirm that for a sufficiently large system size, SCAD-AMP achieves the optimal performance predicted by the replica method. Through replica analysis, a phase transition between replica symmetric and replica symmetry breaking (RSB) region is found in the parameter space of SCAD. The appearance of the RS region for a nonconvex penalty is a significant advantage that indicates the region of smooth landscape of the optimization problem. Furthermore, we analytically show that the statistical representation performance of the SCAD penalty is better than that of \
Message passing for integrating and assessing renewable generation in a redundant power grid

DOE Office of Scientific and Technical Information (OSTI.GOV)

Zdeborova, Lenka; Backhaus, Scott; Chertkov, Michael

2009-01-01

A simplified model of a redundant power grid is used to study integration of fluctuating renewable generation. The grid consists of large number of generator and consumer nodes. The net power consumption is determined by the difference between the gross consumption and the level of renewable generation. The gross consumption is drawn from a narrow distribution representing the predictability of aggregated loads, and we consider two different distributions representing wind and solar resources. Each generator is connected to D consumers, and redundancy is built in by connecting R {le} D of these consumers to other generators. The lines are switchablemore » so that at any instance each consumer is connected to a single generator. We explore the capacity of the renewable generation by determining the level of 'firm' generation capacity that can be displaced for different levels of redundancy R. We also develop message-passing control algorithm for finding switch sellings where no generator is overloaded.« less
Mapping a battlefield simulation onto message-passing parallel architectures

NASA Technical Reports Server (NTRS)

Nicol, David M.

1987-01-01

Perhaps the most critical problem in distributed simulation is that of mapping: without an effective mapping of workload to processors the speedup potential of parallel processing cannot be realized. Mapping a simulation onto a message-passing architecture is especially difficult when the computational workload dynamically changes as a function of time and space; this is exactly the situation faced by battlefield simulations. This paper studies an approach where the simulated battlefield domain is first partitioned into many regions of equal size; typically there are more regions than processors. The regions are then assigned to processors; a processor is responsible for performing all simulation activity associated with the regions. The assignment algorithm is quite simple and attempts to balance load by exploiting locality of workload intensity. The performance of this technique is studied on a simple battlefield simulation implemented on the Flex/32 multiprocessor. Measurements show that the proposed method achieves reasonable processor efficiencies. Furthermore, the method shows promise for use in dynamic remapping of the simulation.
Validity of the Wechsler Test of Adult Reading (WTAR): effort considered in a clinical sample of U.S. military veterans.

PubMed

Whitney, Kriscinda A; Shepard, Polly H; Mariner, Jennifer; Mossbarger, Brad; Herman, Steven M

2010-07-01

The current study represents an examination of the construct validity of the Wechsler Test of Adult Reading (WTAR) among a sample of U.S. military veterans referred for outpatient neuropsychological evaluation that included a measure of negative response bias, namely, the Test of Memory Malingering (TOMM). This retrospective data analysis examined the relationship between the WTAR and measures of current verbal general intellectual function and current cognitive skills. Findings showed that, among patients passing the TOMM (N = 98), WTAR scores were most highly correlated with current verbal IQ but also showed significant correlations with verbal memory and lesser, but still significant, correlations with measures of visual-spatial memory. Discriminant validity for the WTAR was also shown among the group passing the TOMM in the sense that the WTAR, which is designed to measure verbal premorbid general intellectual skill, was not as highly correlated with measures of learning and memory as was a measure of current verbal general intellectual skill. Whereas scores on most study measures did significantly differ between the groups that passed versus failed the TOMM (N = 26), scores on the WTAR did not, suggesting that the WTAR may remain robust even in the face of suboptimal effort.
In memory of Natacha Betz

NASA Astrophysics Data System (ADS)

Bouffard, Serge

2018-01-01

Natacha Betz has participated to the creation of the IRAP conference series and co-organized three of them. Unfortunately, ten years ago, she passed away. The organization of IRAP 2016 in France gives us the opportunity to pay a tribute to her memory.
Kingman and Heritage Islands Act of 2009

THOMAS, 111th Congress

Rep. Norton, Eleanor Holmes [D-DC-At Large

2009-04-23

Senate - 09/28/2010 Message on Senate action sent to the House. (All Actions) Notes: For further action, see H.R.6278, which became Public Law 111-328 on 12/22/2010. Tracker: This bill has the status Passed SenateHere are the steps for Status of Legislation:
Distance Learning and Public School Finance.

ERIC Educational Resources Information Center

Monahan, Brian; Wimber, Charles

This discussion of the application of computers and telecommunications technology to distance learning begins by describing a workstation equipped to produce "lessonware" in such formats as audiocassettes, videocassettes, diskettes, and voice messages. Such workstations would be located in classrooms equipped with two-way reverse passes to a cable…
Reducing Flight Delays Act of 2013

THOMAS, 113th Congress

Sen. Collins, Susan M. [R-ME

2013-04-25

Senate - 04/30/2013 Message on Senate action sent to the House. (All Actions) Notes: For further action, see H.R.1765, which became Public Law 113-9 on 5/1/2013. Tracker: This bill has the status Passed SenateHere are the steps for Status of Legislation:

Temporary Bankruptcy Judgeships Extension Act of 2011

THOMAS, 112th Congress

Rep. Smith, Lamar [R-TX-21

2011-03-10

Senate - 04/23/2012 Message on Senate action sent to the House. (All Actions) Notes: For further action, see H.R.4967, which became Public Law 112-121 on 5/25/2012. Tracker: This bill has the status Passed SenateHere are the steps for Status of Legislation:
Six degrees of depolarization. Comment on "Network science of biological systems at different scales: A review" by Marko Gosak et al.

NASA Astrophysics Data System (ADS)

Wedgwood, Kyle C. A.; Satin, Leslie S.

2018-03-01

In 1967, the social psychologist Stanley Milgram asked how many intermediates would be required for a letter from participants in various U.S. states to be received by a contact in Boston, given that participants were given only basic information about the contact, and were instructed only to pass along the message to someone that might know the intended recipient [27]. The surprising answer was that, in spite of the social and geographical distances covered, on average, only 6 people were required for the message to reach its intended target.
A distributed infrastructure for publishing VO services: an implementation

NASA Astrophysics Data System (ADS)

Cepparo, Francesco; Scagnetto, Ivan; Molinaro, Marco; Smareglia, Riccardo

2016-07-01

This contribution describes both the design and the implementation details of a new solution for publishing VO services, enlightening its maintainable, distributed, modular and scalable architecture. Indeed, the new publisher is multithreaded and multiprocess. Multiple instances of the modules can run on different machines to ensure high performance and high availability, and this will be true both for the interface modules of the services and the back end data access ones. The system uses message passing to let its components communicate through an AMQP message broker that can itself be distributed to provide better scalability and availability.
Fair and optimistic quantum contract signing

NASA Astrophysics Data System (ADS)

Paunković, N.; Bouda, J.; Mateus, P.

2011-12-01

We present a fair and optimistic quantum-contract-signing protocol between two clients that requires no communication with the third trusted party during the exchange phase. We discuss its fairness and show that it is possible to design such a protocol for which the probability of a dishonest client to cheat becomes negligible and scales as N-1/2, where N is the number of messages exchanged between the clients. Our protocol is not based on the exchange of signed messages: Its fairness is based on the laws of quantum mechanics. Thus, it is abuse free, and the clients do not have to generate new keys for each message during the exchange phase. We discuss a real-life scenario when measurement errors and qubit-state corruption due to noisy channels and imperfect quantum memories occur and argue that for a real, good-enough measurement apparatus, transmission channels, and quantum memories, our protocol would still be fair. Apart from stable quantum memories, the other segments of our protocol could be implemented by today's technology, as they require in essence the same type of apparatus as the one needed for the Bennett-Brassard 1984 (BB84) cryptographic protocol. Finally, we briefly discuss two alternative versions of the protocol, one that uses only two states [based on the Bennett 1992 (B92) protocol] and the other that uses entangled pairs, and show that it is possible to generalize our protocol to an arbitrary number of clients.
Single-pass memory system evaluation for multiprogramming workloads

NASA Technical Reports Server (NTRS)

Conte, Thomas M.; Hwu, Wen-Mei W.

1990-01-01

Modern memory systems are composed of levels of cache memories, a virtual memory system, and a backing store. Varying more than a few design parameters and measuring the performance of such systems has traditionally be constrained by the high cost of simulation. Models of cache performance recently introduced reduce the cost simulation but at the expense of accuracy of performance prediction. Stack-based methods predict performance accurately using one pass over the trace for all cache sizes, but these techniques have been limited to fully-associative organizations. This paper presents a stack-based method of evaluating the performance of cache memories using a recurrence/conflict model for the miss ratio. Unlike previous work, the performance of realistic cache designs, such as direct-mapped caches, are predicted by the method. The method also includes a new approach to the problem of the effects of multiprogramming. This new technique separates the characteristics of the individual program from that of the workload. The recurrence/conflict method is shown to be practical, general, and powerful by comparing its performance to that of a popular traditional cache simulator. The authors expect that the availability of such a tool will have a large impact on future architectural studies of memory systems.
DMA shared byte counters in a parallel computer

DOEpatents

Chen, Dong; Gara, Alan G.; Heidelberger, Philip; Vranas, Pavlos

2010-04-06

A parallel computer system is constructed as a network of interconnected compute nodes. Each of the compute nodes includes at least one processor, a memory and a DMA engine. The DMA engine includes a processor interface for interfacing with the at least one processor, DMA logic, a memory interface for interfacing with the memory, a DMA network interface for interfacing with the network, injection and reception byte counters, injection and reception FIFO metadata, and status registers and control registers. The injection FIFOs maintain memory locations of the injection FIFO metadata memory locations including its current head and tail, and the reception FIFOs maintain the reception FIFO metadata memory locations including its current head and tail. The injection byte counters and reception byte counters may be shared between messages.
WinHPC System Programming | High-Performance Computing | NREL

Science.gov Websites

Programming WinHPC System Programming Learn how to build and run an MPI (message passing interface (mpi.h) and library (msmpi.lib) are. To build from the command line, run... Start > Intel Software Development Tools > Intel C++ Compiler Professional... > C++ Build Environment for applications running
4 Types of Foods that Boost Your Memory

MedlinePlus

... message. Getting adequate vegetables, especially cruciferous ones including broccoli, cabbage and dark leafy greens, may help improve ... for a tortilla in your next sandwich wrap. Broccoli stir-fry also is an excellent option for ...
[Analysis of implicit memory during propofol anesthesia].

PubMed

Biescas Prat, J; Moix Queraltó, J; Casanovas Catot, P

2000-12-01

Consensus has not been achieved on the presence of unconscious memory of messages in general anesthesia for methodological reasons. Our objective was to apply a model of anesthesia that allows for clinical control of the level of hypnosis in order to evaluate the presence and characteristics of implicit memory in deep sedation with propofol. We randomly assigned 48 consecutive patients undergoing lower limb surgery to two groups. In both groups subarachnoid anesthesia was with varying doses of propofol to maintain a level of hypnosis marked by inability to respond to orders, absence of movements and spontaneous ventilation. The experimental group listened to a recording of the words "banana" and "melon" for the semantic category of fruits and "white" and "black" for colors. The control group listened to a recording of environmental operating room noise. We recorded, among other variables, anxiety and age. Upon awakening, after the presence of conscious memory had been ruled out, we investigated implicit memory by comparing the percentage of correct answers in the two groups. The experimental group had a higher percentage of correct fruit names (p = 0.03). No differences were detected for colors. The youngest patients in the experimental group were correct more often about the fruits than were older members (p = 0.04) and those with greater anxiety were more often correct (p = 0.002). Implicit memory is preserved under hypnosis with propofol and is more likely to be present among those who are younger or experience greater anxiety. Concrete words with object references are more easily remembered than abstract words referring to perception. The semantic load of messages is relevant.
Information processing in the CNS: a supramolecular chemistry?

PubMed

Tozzi, Arturo

2015-10-01

How does central nervous system process information? Current theories are based on two tenets: (a) information is transmitted by action potentials, the language by which neurons communicate with each other-and (b) homogeneous neuronal assemblies of cortical circuits operate on these neuronal messages where the operations are characterized by the intrinsic connectivity among neuronal populations. In this view, the size and time course of any spike is stereotypic and the information is restricted to the temporal sequence of the spikes; namely, the "neural code". However, an increasing amount of novel data point towards an alternative hypothesis: (a) the role of neural code in information processing is overemphasized. Instead of simply passing messages, action potentials play a role in dynamic coordination at multiple spatial and temporal scales, establishing network interactions across several levels of a hierarchical modular architecture, modulating and regulating the propagation of neuronal messages. (b) Information is processed at all levels of neuronal infrastructure from macromolecules to population dynamics. For example, intra-neuronal (changes in protein conformation, concentration and synthesis) and extra-neuronal factors (extracellular proteolysis, substrate patterning, myelin plasticity, microbes, metabolic status) can have a profound effect on neuronal computations. This means molecular message passing may have cognitive connotations. This essay introduces the concept of "supramolecular chemistry", involving the storage of information at the molecular level and its retrieval, transfer and processing at the supramolecular level, through transitory non-covalent molecular processes that are self-organized, self-assembled and dynamic. Finally, we note that the cortex comprises extremely heterogeneous cells, with distinct regional variations, macromolecular assembly, receptor repertoire and intrinsic microcircuitry. This suggests that every neuron (or group of neurons) embodies different molecular information that hands an operational effect on neuronal computation.
Adding Fault Tolerance to NPB Benchmarks Using ULFM

DOE Office of Scientific and Technical Information (OSTI.GOV)

Parchman, Zachary W; Vallee, Geoffroy R; Naughton III, Thomas J

2016-01-01

In the world of high-performance computing, fault tolerance and application resilience are becoming some of the primary concerns because of increasing hardware failures and memory corruptions. While the research community has been investigating various options, from system-level solutions to application-level solutions, standards such as the Message Passing Interface (MPI) are also starting to include such capabilities. The current proposal for MPI fault tolerant is centered around the User-Level Failure Mitigation (ULFM) concept, which provides means for fault detection and recovery of the MPI layer. This approach does not address application-level recovery, which is currently left to application developers. In thismore » work, we present a mod- ification of some of the benchmarks of the NAS parallel benchmark (NPB) to include support of the ULFM capabilities as well as application-level strategies and mechanisms for application-level failure recovery. As such, we present: (i) an application-level library to checkpoint and restore data, (ii) extensions of NPB benchmarks for fault tolerance based on different strategies, (iii) a fault injection tool, and (iv) some preliminary results that show the impact of such fault tolerant strategies on the application execution.« less
Toward Petascale Biologically Plausible Neural Networks

NASA Astrophysics Data System (ADS)

Long, Lyle

This talk will describe an approach to achieving petascale neural networks. Artificial intelligence has been oversold for many decades. Computers in the beginning could only do about 16,000 operations per second. Computer processing power, however, has been doubling every two years thanks to Moore's law, and growing even faster due to massively parallel architectures. Finally, 60 years after the first AI conference we have computers on the order of the performance of the human brain (1016 operations per second). The main issues now are algorithms, software, and learning. We have excellent models of neurons, such as the Hodgkin-Huxley model, but we do not know how the human neurons are wired together. With careful attention to efficient parallel computing, event-driven programming, table lookups, and memory minimization massive scale simulations can be performed. The code that will be described was written in C + + and uses the Message Passing Interface (MPI). It uses the full Hodgkin-Huxley neuron model, not a simplified model. It also allows arbitrary network structures (deep, recurrent, convolutional, all-to-all, etc.). The code is scalable, and has, so far, been tested on up to 2,048 processor cores using 107 neurons and 109 synapses.
Individual differences in susceptibility to inattentional blindness.

PubMed

Seegmiller, Janelle K; Watson, Jason M; Strayer, David L

2011-05-01

Inattentional blindness refers to the finding that people do not always see what appears in their gaze. Though inattentional blindness affects large percentages of people, it is unclear if there are individual differences in susceptibility. The present study addressed whether individual differences in attentional control, as reflected by variability in working memory capacity, modulate susceptibility to inattentional blindness. Participants watched a classic inattentional blindness video (Simons & Chabris, 1999) and were instructed to count passes among basketball players, wherein 58% noticed the unexpected: a person wearing a gorilla suit. When participants were accurate with their pass counts, individuals with higher working memory capacity were more likely to report seeing the gorilla (67%) than those with lesser working memory capacity (36%). These results suggest that variability in attentional control is a potential mechanism underlying the apparent modulation of inattentional blindness across individuals.
Detection and Correction of Silent Data Corruption for Large-Scale High-Performance Computing

DOE Office of Scientific and Technical Information (OSTI.GOV)

Fiala, David J; Mueller, Frank; Engelmann, Christian

Faults have become the norm rather than the exception for high-end computing on clusters with 10s/100s of thousands of cores. Exacerbating this situation, some of these faults remain undetected, manifesting themselves as silent errors that corrupt memory while applications continue to operate and report incorrect results. This paper studies the potential for redundancy to both detect and correct soft errors in MPI message-passing applications. Our study investigates the challenges inherent to detecting soft errors within MPI application while providing transparent MPI redundancy. By assuming a model wherein corruption in application data manifests itself by producing differing MPI message data betweenmore » replicas, we study the best suited protocols for detecting and correcting MPI data that is the result of corruption. To experimentally validate our proposed detection and correction protocols, we introduce RedMPI, an MPI library which resides in the MPI profiling layer. RedMPI is capable of both online detection and correction of soft errors that occur in MPI applications without requiring any modifications to the application source by utilizing either double or triple redundancy. Our results indicate that our most efficient consistency protocol can successfully protect applications experiencing even high rates of silent data corruption with runtime overheads between 0% and 30% as compared to unprotected applications without redundancy. Using our fault injector within RedMPI, we observe that even a single soft error can have profound effects on running applications, causing a cascading pattern of corruption in most cases causes that spreads to all other processes. RedMPI's protection has been shown to successfully mitigate the effects of soft errors while allowing applications to complete with correct results even in the face of errors.« less
Context Memory in Alzheimer's Disease

PubMed Central

El Haj, Mohamad; Kessels, Roy P.C.

2013-01-01

Background Alzheimer's disease (AD) is a neurodegenerative disease characterized by a gradual loss of memory. Specifically, context aspects of memory are impaired in AD. Our review sheds light on the neurocognitive mechanisms of this memory component that forms the core of episodic memory function. Summary Context recall, an element of episodic memory, refers to remembering the context in which an event has occurred, such as from whom or to whom information has been transmitted. Key Messages Our review raises crucial questions. For example, (1) which context element is more prone to being forgotten in the disease? (2) How do AD patients fail to bind context features together? (3) May distinctiveness heuristic or decisions based on metacognitive expectations improve context retrieval in these patients? (4) How does cueing at retrieval enhance reinstating of encoding context in AD? By addressing these questions, our work contributes to the understanding of the memory deficits in AD. PMID:24403906
Pass-transistor very large scale integration

NASA Technical Reports Server (NTRS)

Maki, Gary K. (Inventor); Bhatia, Prakash R. (Inventor)

2004-01-01

Logic elements are provided that permit reductions in layout size and avoidance of hazards. Such logic elements may be included in libraries of logic cells. A logical function to be implemented by the logic element is decomposed about logical variables to identify factors corresponding to combinations of the logical variables and their complements. A pass transistor network is provided for implementing the pass network function in accordance with this decomposition. The pass transistor network includes ordered arrangements of pass transistors that correspond to the combinations of variables and complements resulting from the logical decomposition. The logic elements may act as selection circuits and be integrated with memory and buffer elements.
Memorial Day Message from NASA Administrator Jim Bridenstine

NASA Image and Video Library

2018-05-25

As we all go our separate ways this Memorial Day weekend, I urge everyone to remember the heroic sacrifice of the men and women who died in defense of our country while serving in the United States military. It is the dedication and commitment of these men and women, those who were willing to make the ultimate sacrifice for the principles of our nation, that have made the United States the greatest country on Earth.
Figural vividness and persuasion: capturing the "elusive" vividness effect.

PubMed

Guadagno, Rosanna E; Rhoads, Kelton V L; Sagarin, Brad J

2011-05-01

Despite the widespread belief that the use of vividness in persuasive communications is effective, many laboratory studies have failed to find vividness effects. A possible explanation for this discrepancy is that many laboratory tests have not vivified solely the central thesis of the message but have vivified irrelevant portions of the message as well or instead. Two experiments examined the effect of vivifying the central ("figure") or noncentral ("ground") features of a message on persuasion. In both experiments, the formerly "elusive vividness effect" of superior persuasion was found, but only in vivid-figure communications. A mediation analysis revealed the salutary role of supportive cognitive elaborations, rather than memory for the communication, in mediating the vividness effect. The findings caution against attempts to persuade by increasing overall message vividness because off-thesis vividness has the unintended and undercutting consequence of distracting recipients from the point of the communication.
Direct memory access transfer completion notification

DOEpatents

Archer, Charles J. , Blocksome; Michael A. , Parker; Jeffrey, J [Rochester, MN

2011-02-15

Methods, systems, and products are disclosed for DMA transfer completion notification that include: inserting, by an origin DMA on an origin node in an origin injection FIFO, a data descriptor for an application message; inserting, by the origin DMA, a reflection descriptor in the origin injection FIFO, the reflection descriptor specifying a remote get operation for injecting a completion notification descriptor in a reflection injection FIFO on a reflection node; transferring, by the origin DMA to a target node, the message in dependence upon the data descriptor; in response to completing the message transfer, transferring, by the origin DMA to the reflection node, the completion notification descriptor in dependence upon the reflection descriptor; receiving, by the origin DMA from the reflection node, a completion packet; and notifying, by the origin DMA in response to receiving the completion packet, the origin node's processing core that the message transfer is complete.
Direct memory access transfer completion notification

DOEpatents

Archer, Charles J.; Blocksome, Michael A.; Parker, Jeffrey J.

2010-08-17

Methods, apparatus, and products are disclosed for DMA transfer completion notification that include: inserting, by an origin DMA engine on an origin compute node in an injection FIFO buffer, a data descriptor for an application message to be transferred to a target compute node on behalf of an application on the origin compute node; inserting, by the origin DMA engine, a completion notification descriptor in the injection FIFO buffer after the data descriptor for the message, the completion notification descriptor specifying an address of a completion notification field in application storage for the application; transferring, by the origin DMA engine to the target compute node, the message in dependence upon the data descriptor; and notifying, by the origin DMA engine, the application that the transfer of the message is complete, including performing a local direct put operation to store predesignated notification data at the address of the completion notification field.

Class network routing

DOEpatents

Bhanot, Gyan [Princeton, NJ; Blumrich, Matthias A [Ridgefield, CT; Chen, Dong [Croton On Hudson, NY; Coteus, Paul W [Yorktown Heights, NY; Gara, Alan G [Mount Kisco, NY; Giampapa, Mark E [Irvington, NY; Heidelberger, Philip [Cortlandt Manor, NY; Steinmacher-Burow, Burkhard D [Mount Kisco, NY; Takken, Todd E [Mount Kisco, NY; Vranas, Pavlos M [Bedford Hills, NY

2009-09-08

Class network routing is implemented in a network such as a computer network comprising a plurality of parallel compute processors at nodes thereof. Class network routing allows a compute processor to broadcast a message to a range (one or more) of other compute processors in the computer network, such as processors in a column or a row. Normally this type of operation requires a separate message to be sent to each processor. With class network routing pursuant to the invention, a single message is sufficient, which generally reduces the total number of messages in the network as well as the latency to do a broadcast. Class network routing is also applied to dense matrix inversion algorithms on distributed memory parallel supercomputers with hardware class function (multicast) capability. This is achieved by exploiting the fact that the communication patterns of dense matrix inversion can be served by hardware class functions, which results in faster execution times.
SITRUS: Semantic Infrastructure for Wireless Sensor Networks

PubMed Central

Bispo, Kalil A.; Rosa, Nelson S.; Cunha, Paulo R. F.

2015-01-01

Wireless sensor networks (WSNs) are made up of nodes with limited resources, such as processing, bandwidth, memory and, most importantly, energy. For this reason, it is essential that WSNs always work to reduce the power consumption as much as possible in order to maximize its lifetime. In this context, this paper presents SITRUS (semantic infrastructure for wireless sensor networks), which aims to reduce the power consumption of WSN nodes using ontologies. SITRUS consists of two major parts: a message-oriented middleware responsible for both an oriented message communication service and a reconfiguration service; and a semantic information processing module whose purpose is to generate a semantic database that provides the basis to decide whether a WSN node needs to be reconfigurated or not. In order to evaluate the proposed solution, we carried out an experimental evaluation to assess the power consumption and memory usage of WSN applications built atop SITRUS. PMID:26528974
Fencing data transfers in a parallel active messaging interface of a parallel computer

DOEpatents

Blocksome, Michael A.; Mamidala, Amith R.

2015-06-02

Fencing data transfers in a parallel active messaging interface (`PAMI`) of a parallel computer, the PAMI including data communications endpoints, each endpoint including a specification of data communications parameters for a thread of execution on a compute node, including specifications of a client, a context, and a task; the compute nodes coupled for data communications through the PAMI and through data communications resources including at least one segment of shared random access memory; including initiating execution through the PAMI of an ordered sequence of active SEND instructions for SEND data transfers between two endpoints, effecting deterministic SEND data transfers through a segment of shared memory; and executing through the PAMI, with no FENCE accounting for SEND data transfers, an active FENCE instruction, the FENCE instruction completing execution only after completion of all SEND instructions initiated prior to execution of the FENCE instruction for SEND data transfers between the two endpoints.
Fencing data transfers in a parallel active messaging interface of a parallel computer

DOEpatents

Blocksome, Michael A.; Mamidala, Amith R.

2015-06-09

Fencing data transfers in a parallel active messaging interface (`PAMI`) of a parallel computer, the PAMI including data communications endpoints, each endpoint including a specification of data communications parameters for a thread of execution on a compute node, including specifications of a client, a context, and a task; the compute nodes coupled for data communications through the PAMI and through data communications resources including at least one segment of shared random access memory; including initiating execution through the PAMI of an ordered sequence of active SEND instructions for SEND data transfers between two endpoints, effecting deterministic SEND data transfers through a segment of shared memory; and executing through the PAMI, with no FENCE accounting for SEND data transfers, an active FENCE instruction, the FENCE instruction completing execution only after completion of all SEND instructions initiated prior to execution of the FENCE instruction for SEND data transfers between the two endpoints.
Fencing network direct memory access data transfers in a parallel active messaging interface of a parallel computer

DOEpatents

Blocksome, Michael A.; Mamidala, Amith R.

2015-07-07

Fencing direct memory access (`DMA`) data transfers in a parallel active messaging interface (`PAMI`) of a parallel computer, the PAMI including data communications endpoints, each endpoint including specifications of a client, a context, and a task, the endpoints coupled for data communications through the PAMI and through DMA controllers operatively coupled to a deterministic data communications network through which the DMA controllers deliver data communications deterministically, including initiating execution through the PAMI of an ordered sequence of active DMA instructions for DMA data transfers between two endpoints, effecting deterministic DMA data transfers through a DMA controller and the deterministic data communications network; and executing through the PAMI, with no FENCE accounting for DMA data transfers, an active FENCE instruction, the FENCE instruction completing execution only after completion of all DMA instructions initiated prior to execution of the FENCE instruction for DMA data transfers between the two endpoints.
Fencing network direct memory access data transfers in a parallel active messaging interface of a parallel computer

DOEpatents

Blocksome, Michael A.; Mamidala, Amith R.

2015-07-14

Fencing direct memory access (`DMA`) data transfers in a parallel active messaging interface (`PAMI`) of a parallel computer, the PAMI including data communications endpoints, each endpoint including specifications of a client, a context, and a task, the endpoints coupled for data communications through the PAMI and through DMA controllers operatively coupled to a deterministic data communications network through which the DMA controllers deliver data communications deterministically, including initiating execution through the PAMI of an ordered sequence of active DMA instructions for DMA data transfers between two endpoints, effecting deterministic DMA data transfers through a DMA controller and the deterministic data communications network; and executing through the PAMI, with no FENCE accounting for DMA data transfers, an active FENCE instruction, the FENCE instruction completing execution only after completion of all DMA instructions initiated prior to execution of the FENCE instruction for DMA data transfers between the two endpoints.
Increasing the Operational Value of Event Messages

NASA Technical Reports Server (NTRS)

Li, Zhenping; Savkli, Cetin; Smith, Dan

2003-01-01

Assessing the health of a space mission has traditionally been performed using telemetry analysis tools. Parameter values are compared to known operational limits and are plotted over various time periods. This presentation begins with the notion that there is an incredible amount of untapped information contained within the mission s event message logs. Through creative advancements in message handling tools, the event message logs can be used to better assess spacecraft and ground system status and to highlight and report on conditions not readily apparent when messages are evaluated one-at-a-time during a real-time pass. Work in this area is being funded as part of a larger NASA effort at the Goddard Space Flight Center to create component-based, middleware-based, standards-based general purpose ground system architecture referred to as GMSEC - the GSFC Mission Services Evolution Center. The new capabilities and operational concepts for event display, event data analyses and data mining are being developed by Lockheed Martin and the new subsystem has been named GREAT - the GMSEC Reusable Event Analysis Toolkit. Planned for use on existing and future missions, GREAT has the potential to increase operational efficiency in areas of problem detection and analysis, general status reporting, and real-time situational awareness.
Scalable Replay with Partial-Order Dependencies for Message-Logging Fault Tolerance

DOE Office of Scientific and Technical Information (OSTI.GOV)

Lifflander, Jonathan; Meneses, Esteban; Menon, Harshita

2014-09-22

Deterministic replay of a parallel application is commonly used for discovering bugs or to recover from a hard fault with message-logging fault tolerance. For message passing programs, a major source of overhead during forward execution is recording the order in which messages are sent and received. During replay, this ordering must be used to deterministically reproduce the execution. Previous work in replay algorithms often makes minimal assumptions about the programming model and application in order to maintain generality. However, in many cases, only a partial order must be recorded due to determinism intrinsic in the code, ordering constraints imposed bymore » the execution model, and events that are commutative (their relative execution order during replay does not need to be reproduced exactly). In this paper, we present a novel algebraic framework for reasoning about the minimum dependencies required to represent the partial order for different concurrent orderings and interleavings. By exploiting this theory, we improve on an existing scalable message-logging fault tolerance scheme. The improved scheme scales to 131,072 cores on an IBM BlueGene/P with up to 2x lower overhead than one that records a total order.« less
A no-key-exchange secure image sharing scheme based on Shamir's three-pass cryptography protocol and the multiple-parameter fractional Fourier transform.

PubMed

Lang, Jun

2012-01-30

In this paper, we propose a novel secure image sharing scheme based on Shamir's three-pass protocol and the multiple-parameter fractional Fourier transform (MPFRFT), which can safely exchange information with no advance distribution of either secret keys or public keys between users. The image is encrypted directly by the MPFRFT spectrum without the use of phase keys, and information can be shared by transmitting the encrypted image (or message) three times between users. Numerical simulation results are given to verify the performance of the proposed algorithm.
E-mail, decisional styles, and rest breaks.

PubMed

Baker, James R; Phillips, James G

2007-10-01

E-mail is a common but problematic work application. A scale was created to measure tendencies to use e-mail to take breaks (e-breaking); and self-esteem and decisional style (vigilance, procrastination, buck-passing, hypervigilance) were used to predict the self-reported and actual e-mail behaviors of 133 participants (students and marketing employees). Individuals who were low in defensive avoidance (buck-passing) engaged in more e-mailing per week, both in time spent on e-mail and message volume. E-breakers were more likely to engage in behavioral procrastination and spent more time on personal e-mail.
PyPanda: a Python package for gene regulatory network reconstruction

PubMed Central

van IJzendoorn, David G.P.; Glass, Kimberly; Quackenbush, John; Kuijjer, Marieke L.

2016-01-01

Summary: PANDA (Passing Attributes between Networks for Data Assimilation) is a gene regulatory network inference method that uses message-passing to integrate multiple sources of ‘omics data. PANDA was originally coded in C ++. In this application note we describe PyPanda, the Python version of PANDA. PyPanda runs considerably faster than the C ++ version and includes additional features for network analysis. Availability and implementation: The open source PyPanda Python package is freely available at http://github.com/davidvi/pypanda. Contact: mkuijjer@jimmy.harvard.edu or d.g.p.van_ijzendoorn@lumc.nl PMID:27402905
PyPanda: a Python package for gene regulatory network reconstruction.

PubMed

van IJzendoorn, David G P; Glass, Kimberly; Quackenbush, John; Kuijjer, Marieke L

2016-11-01

PANDA (Passing Attributes between Networks for Data Assimilation) is a gene regulatory network inference method that uses message-passing to integrate multiple sources of 'omics data. PANDA was originally coded in C ++. In this application note we describe PyPanda, the Python version of PANDA. PyPanda runs considerably faster than the C ++ version and includes additional features for network analysis. The open source PyPanda Python package is freely available at http://github.com/davidvi/pypanda CONTACT: mkuijjer@jimmy.harvard.edu or d.g.p.van_ijzendoorn@lumc.nl. © The Author 2016. Published by Oxford University Press.
New Algorithms and Lower Bounds for Sequential-Access Data Compression

NASA Astrophysics Data System (ADS)

Gagie, Travis

2009-02-01

This thesis concerns sequential-access data compression, i.e., by algorithms that read the input one or more times from beginning to end. In one chapter we consider adaptive prefix coding, for which we must read the input character by character, outputting each character's self-delimiting codeword before reading the next one. We show how to encode and decode each character in constant worst-case time while producing an encoding whose length is worst-case optimal. In another chapter we consider one-pass compression with memory bounded in terms of the alphabet size and context length, and prove a nearly tight tradeoff between the amount of memory we can use and the quality of the compression we can achieve. In a third chapter we consider compression in the read/write streams model, which allows us passes and memory both polylogarithmic in the size of the input. We first show how to achieve universal compression using only one pass over one stream. We then show that one stream is not sufficient for achieving good grammar-based compression. Finally, we show that two streams are necessary and sufficient for achieving entropy-only bounds.
The Role and Dynamic of Strengthening in the Reconsolidation Process in a Human Declarative Memory: What Decides the Fate of Recent and Older Memories?

PubMed Central

Pedreira, María E.

2013-01-01

Several reports have shown that after specific reminders are presented, consolidated memories pass from a stable state to one in which the memory is reactivated. This reactivation implies that memories are labile and susceptible to amnesic agents. This susceptibility decreases over time and leads to a re-stabilization phase usually known as reconsolidation. With respect to the biological role of reconsolidation, two functions have been proposed. First, the reconsolidation process allows new information to be integrated into the background of the original memory; second, it strengthens the original memory. We have previously demonstrated that both of these functions occur in the reconsolidation of human declarative memories. Our paradigm consisted of learning verbal material (lists of five pairs of nonsense syllables) acquired by a training process (L1-training) on Day 1 of our experiment. After this declarative memory is consolidated, it can be made labile by presenting a specific reminder. After this, the memory passes through a subsequent stabilization process. Strengthening creates a new scenario for the reconsolidation process; this function represents a new factor that may transform the dynamic of memories. First, we analyzed whether the repeated labilization-reconsolidation processes maintained the memory for longer periods of time. We showed that at least one labilization-reconsolidation process strengthens a memory via evaluation 5 days after its re-stabilization. We also demonstrated that this effect is not triggered by retrieval only. We then analyzed the way strengthening modified the effect of an amnesic agent that was presented immediately after repeated labilizations. The repeated labilization-reconsolidation processes made the memory more resistant to interference during re-stabilization. Finally, we evaluated whether the effect of strengthening may depend on the age of the memory. We found that the effect of strengthening did depend on the age of the memory. Forgetting may represent a process that weakens the effect of strengthening. PMID:23658614
The role and dynamic of strengthening in the reconsolidation process in a human declarative memory: what decides the fate of recent and older memories?

PubMed

Forcato, Cecilia; Fernandez, Rodrigo S; Pedreira, María E

2013-01-01

Several reports have shown that after specific reminders are presented, consolidated memories pass from a stable state to one in which the memory is reactivated. This reactivation implies that memories are labile and susceptible to amnesic agents. This susceptibility decreases over time and leads to a re-stabilization phase usually known as reconsolidation. With respect to the biological role of reconsolidation, two functions have been proposed. First, the reconsolidation process allows new information to be integrated into the background of the original memory; second, it strengthens the original memory. We have previously demonstrated that both of these functions occur in the reconsolidation of human declarative memories. Our paradigm consisted of learning verbal material (lists of five pairs of nonsense syllables) acquired by a training process (L1-training) on Day 1 of our experiment. After this declarative memory is consolidated, it can be made labile by presenting a specific reminder. After this, the memory passes through a subsequent stabilization process. Strengthening creates a new scenario for the reconsolidation process; this function represents a new factor that may transform the dynamic of memories. First, we analyzed whether the repeated labilization-reconsolidation processes maintained the memory for longer periods of time. We showed that at least one labilization-reconsolidation process strengthens a memory via evaluation 5 days after its re-stabilization. We also demonstrated that this effect is not triggered by retrieval only. We then analyzed the way strengthening modified the effect of an amnesic agent that was presented immediately after repeated labilizations. The repeated labilization-reconsolidation processes made the memory more resistant to interference during re-stabilization. Finally, we evaluated whether the effect of strengthening may depend on the age of the memory. We found that the effect of strengthening did depend on the age of the memory. Forgetting may represent a process that weakens the effect of strengthening.
The Sleeper Effect in Persuasion: A Meta-Analytic Review

PubMed Central

Kumkale, G. Tarcan; Albarracín, Dolores

2009-01-01

A meta-analysis of the available judgment and memory data on the sleeper effect in persuasion is presented. According to this effect, when people receive a communication associated with a discounting cue, such as a noncredible source, they are less persuaded immediately after exposure than they are later in time. Findings from this meta-analysis indicate that recipients of discounting cues were more persuaded over time when the message arguments and the cue had a strong initial impact. In addition, the increase in persuasion was stronger when recipients of discounting cues had higher ability or motivation to think about the message and received the discounting cue after the message. These results are discussed in light of classic and contemporary models of attitudes and persuasion. PMID:14717653
Design and Implementation of High Performance Content-Addressable Memories.

DTIC Science & Technology

1985-12-01

content addressability and two basic implementations of content addressing. The need and application of hardware CAM is presented to motivate the " topic...3r Pass 4th Ps4 Pass Figure 2.15 Maximum SearchUsing All-Parallel CAM - left-most position (the most significant bit) and the other IF bits are zeros
Subjective vs. Documented Reality: A Case Study of Long-Term Real-Life Autobiographical Memory

ERIC Educational Resources Information Center

Mendelsohn, Avi; Furman, Orit; Navon, Inbal; Dudai, Yadin

2009-01-01

A young woman was filmed during 2 d of her ordinary life. A few months and then again a few years later she was tested for the memory of her experiences in those days while undergoing fMRI scanning. As time passed, she came to accept more false details as true. After months, activity of a network considered to subserve autobiographical memory was…
Stressful Life Events: Their Relationships with Coronary Heart Disease.

DTIC Science & Technology

1983-09-01

Whyte (1980) have said retrospective studies have been criticized for the use of subjective memories that fade with the passing of time. Another...meaning," where someone may explain away an illness through exaggeration of life events. Rabkin and Struening (1976) mentioned selective memory ...faded (lower) memory of critical events, or event denial, combined with a high cholesterol level or exaggerated account of past life events (trying to
To amend the DNA Analysis Backlog Elimination Act of 2000 to provide for Debbie Smith grants for auditing sexual assault evidence backlogs, and for other purposes.

THOMAS, 112th Congress

Sen. Cornyn, John [R-TX

2012-05-24

Senate - 01/02/2013 Message on House action received in Senate and at desk: House amendments to Senate bill. (All Actions) Tracker: This bill has the status Passed HouseHere are the steps for Status of Legislation:

Support for Debugging Automatically Parallelized Programs

NASA Technical Reports Server (NTRS)

Hood, Robert; Jost, Gabriele

2001-01-01

This viewgraph presentation provides information on support sources available for the automatic parallelization of computer program. CAPTools, a support tool developed at the University of Greenwich, transforms, with user guidance, existing sequential Fortran code into parallel message passing code. Comparison routines are then run for debugging purposes, in essence, ensuring that the code transformation was accurate.
Earth observation taken by the Expedition 43 crew.

NASA Image and Video Library

2015-03-13

Earth observation taken during a day pass by the Expedition 43 crew aboard the International Space Station (ISS). Sent as part of Twitter message: #HappyStPatrickDay with best wishes from the #E43 crew! From space you can see the âEmerald Isleâ is very green!
New Frontier Congressional Gold Medal Act

THOMAS, 111th Congress

Sen. Nelson, Bill [D-FL

2009-05-01

Senate - 09/23/2010 Message received in the Senate: Returned to the Senate pursuant to the provisions of H.Res. 1653. (All Actions) Notes: For further action, see H.R.2245, which became Public Law 111-44 on 8/7/2009. Tracker: This bill has the status Passed SenateHere are the steps for Status of Legislation:
Travel Promotion Act of 2009

THOMAS, 111th Congress

Sen. Dorgan, Byron L. [D-ND

2009-05-12

Senate - 09/23/2010 Message received in the Senate: Returned to the Senate pursuant to the provisions of H.Res. 1653. (All Actions) Notes: For further action, see H.R.1299, which became Public Law 111-145 on 3/4/2010. Tracker: This bill has the status Passed SenateHere are the steps for Status of Legislation:
Union Agitators

ERIC Educational Resources Information Center

Honawar, Vaishali

2006-01-01

A decade has passed since a few union leaders formed the network known as Teacher Union Reform Network (TURN) to search for innovative ways to enhance education. Selling their message has not always been easy. Created in 1995, TURN was the brain child of Adam Urbanski, the president of the Rochester (N.Y.) Teachers Association for the past 25…
At the Crossroads of Hualapai History, Memory, and American Colonization: Contesting Space and Place

ERIC Educational Resources Information Center

Shepherd, Jeffrey P.

2008-01-01

Standard, even "new Indian history" narratives of relocation and removal have generally avoided critical discussions of colonialism, memory, and space. Choosing instead to emphasize the important political, economic, social, and even cultural implications of such dislocations, much of what passes as "Indian" history fails to…
PVM Wrapper

NASA Technical Reports Server (NTRS)

Katz, Daniel

2004-01-01

PVM Wrapper is a software library that makes it possible for code that utilizes the Parallel Virtual Machine (PVM) software library to run using the message-passing interface (MPI) software library, without needing to rewrite the entire code. PVM and MPI are the two most common software libraries used for applications that involve passing of messages among parallel computers. Since about 1996, MPI has been the de facto standard. Codes written when PVM was popular often feature patterns of {"initsend," "pack," "send"} and {"receive," "unpack"} calls. In many cases, these calls are not contiguous and one set of calls may even exist over multiple subroutines. These characteristics make it difficult to obtain equivalent functionality via a single MPI "send" call. Because PVM Wrapper is written to run with MPI- 1.2, some PVM functions are not permitted and must be replaced - a task that requires some programming expertise. The "pvm_spawn" and "pvm_parent" function calls are not replaced, but a programmer can use "mpirun" and knowledge of the ranks of parent and child tasks with supplied macroinstructions to enable execution of codes that use "pvm_spawn" and "pvm_parent."
Parallelized direct execution simulation of message-passing parallel programs

NASA Technical Reports Server (NTRS)

Dickens, Phillip M.; Heidelberger, Philip; Nicol, David M.

1994-01-01

As massively parallel computers proliferate, there is growing interest in findings ways by which performance of massively parallel codes can be efficiently predicted. This problem arises in diverse contexts such as parallelizing computers, parallel performance monitoring, and parallel algorithm development. In this paper we describe one solution where one directly executes the application code, but uses a discrete-event simulator to model details of the presumed parallel machine such as operating system and communication network behavior. Because this approach is computationally expensive, we are interested in its own parallelization specifically the parallelization of the discrete-event simulator. We describe methods suitable for parallelized direct execution simulation of message-passing parallel programs, and report on the performance of such a system, Large Application Parallel Simulation Environment (LAPSE), we have built on the Intel Paragon. On all codes measured to date, LAPSE predicts performance well typically within 10 percent relative error. Depending on the nature of the application code, we have observed low slowdowns (relative to natively executing code) and high relative speedups using up to 64 processors.
GEOS Atmospheric Model: Challenges at Exascale

NASA Technical Reports Server (NTRS)

Putman, William M.; Suarez, Max J.

2017-01-01

The Goddard Earth Observing System (GEOS) model at NASA's Global Modeling and Assimilation Office (GMAO) is used to simulate the multi-scale variability of the Earth's weather and climate, and is used primarily to assimilate conventional and satellite-based observations for weather forecasting and reanalysis. In addition, assimilations coupled to an ocean model are used for longer-term forecasting (e.g., El Nino) on seasonal to interannual times-scales. The GMAO's research activities, including system development, focus on numerous time and space scales, as detailed on the GMAO website, where they are tabbed under five major themes: Weather Analysis and Prediction; Seasonal-Decadal Analysis and Prediction; Reanalysis; Global Mesoscale Modeling, and Observing System Science. A brief description of the GEOS systems can also be found at the GMAO website. GEOS executes as a collection of earth system components connected through the Earth System Modeling Framework (ESMF). The ESMF layer is supplemented with the MAPL (Modeling, Analysis, and Prediction Layer) software toolkit developed at the GMAO, which facilitates the organization of the computational components into a hierarchical architecture. GEOS systems run in parallel using a horizontal decomposition of the Earth's sphere into processing elements (PEs). Communication between PEs is primarily through a message passing framework, using the message passing interface (MPI), and through explicit use of node-level shared memory access via the SHMEM (Symmetric Hierarchical Memory access) protocol. Production GEOS weather prediction systems currently run at 12.5-kilometer horizontal resolution with 72 vertical levels decomposed into PEs associated with 5,400 MPI processes. Research GEOS systems run at resolutions as fine as 1.5 kilometers globally using as many as 30,000 MPI processes. Looking forward, these systems can be expected to see a 2 times increase in horizontal resolution every two to three years, as well as less frequent increases in vertical resolution. Coupling these resolution changes with increases in complexity, the computational demands on the GEOS production and research systems should easily increase 100-fold over the next five years. Currently, our 12.5 kilometer weather prediction system narrowly meets the time-to-solution demands of a near-real-time production system. Work is now in progress to take advantage of a hybrid MPI-OpenMP parallelism strategy, in an attempt to achieve a modest two-fold speed-up to accommodate an immediate demand due to increased scientific complexity and an increase in vertical resolution. Pursuing demands that require a 10- to 100-fold increases or more, however, would require a detailed exploration of the computational profile of GEOS, as well as targeted solutions using more advanced high-performance computing technologies. Increased computing demands of 100-fold will be required within five years based on anticipated changes in the GEOS production systems, increases of 1000-fold can be anticipated over the next ten years.
Low-pass filtering of noisy Schlumberger sounding curves. Part I: Theory

DOE Office of Scientific and Technical Information (OSTI.GOV)

Patella, D.

1986-02-01

A contribution is given to the solution of the problem of filtering noise-degraded Schlumberger sounding curves. It is shown that the transformation to the pole-pole system is actually a smoothing operation that filters high-frequency noise. In the case of residual noise contamination in the transformed pole-pole curve, it is demonstrated that a subsequent application of a conventional rectangular low-pass filter, with cut-off frequency not less than the right-hand frequency limit of the main message pass-band, may satisfactorily solve the problem by leaving a pole-pole curve available for interpretation. An attempt is also made to understand the essential peculiarities of themore » pole-pole system as far as penetration depth, resolving power and selectivity power are concerned.« less
Automated Instrumentation, Monitoring and Visualization of PVM Programs Using AIMS

NASA Technical Reports Server (NTRS)

Mehra, Pankaj; VanVoorst, Brian; Yan, Jerry; Tucker, Deanne (Technical Monitor)

1994-01-01

We present views and analysis of the execution of several PVM codes for Computational Fluid Dynamics on a network of Sparcstations, including (a) NAS Parallel benchmarks CG and MG (White, Alund and Sunderam 1993); (b) a multi-partitioning algorithm for NAS Parallel Benchmark SP (Wijngaart 1993); and (c) an overset grid flowsolver (Smith 1993). These views and analysis were obtained using our Automated Instrumentation and Monitoring System (AIMS) version 3.0, a toolkit for debugging the performance of PVM programs. We will describe the architecture, operation and application of AIMS. The AIMS toolkit contains (a) Xinstrument, which can automatically instrument various computational and communication constructs in message-passing parallel programs; (b) Monitor, a library of run-time trace-collection routines; (c) VK (Visual Kernel), an execution-animation tool with source-code clickback; and (d) Tally, a tool for statistical analysis of execution profiles. Currently, Xinstrument can handle C and Fortran77 programs using PVM 3.2.x; Monitor has been implemented and tested on Sun 4 systems running SunOS 4.1.2; and VK uses X11R5 and Motif 1.2. Data and views obtained using AIMS clearly illustrate several characteristic features of executing parallel programs on networked workstations: (a) the impact of long message latencies; (b) the impact of multiprogramming overheads and associated load imbalance; (c) cache and virtual-memory effects; and (4significant skews between workstation clocks. Interestingly, AIMS can compensate for constant skew (zero drift) by calibrating the skew between a parent and its spawned children. In addition, AIMS' skew-compensation algorithm can adjust timestamps in a way that eliminates physically impossible communications (e.g., messages going backwards in time). Our current efforts are directed toward creating new views to explain the observed performance of PVM programs. Some of the features planned for the near future include: (a) ConfigView, showing the physical topology of the virtual machine, inferred using specially formatted IP (Internet Protocol) packets; and (b) LoadView, synchronous animation of PVM-program execution and resource-utilization patterns.
Compiled MPI: Cost-Effective Exascale Applications Development

DOE Office of Scientific and Technical Information (OSTI.GOV)

Bronevetsky, G; Quinlan, D; Lumsdaine, A

2012-04-10

The complexity of petascale and exascale machines makes it increasingly difficult to develop applications that can take advantage of them. Future systems are expected to feature billion-way parallelism, complex heterogeneous compute nodes and poor availability of memory (Peter Kogge, 2008). This new challenge for application development is motivating a significant amount of research and development on new programming models and runtime systems designed to simplify large-scale application development. Unfortunately, DoE has significant multi-decadal investment in a large family of mission-critical scientific applications. Scaling these applications to exascale machines will require a significant investment that will dwarf the costs of hardwaremore » procurement. A key reason for the difficulty in transitioning today's applications to exascale hardware is their reliance on explicit programming techniques, such as the Message Passing Interface (MPI) programming model to enable parallelism. MPI provides a portable and high performance message-passing system that enables scalable performance on a wide variety of platforms. However, it also forces developers to lock the details of parallelization together with application logic, making it very difficult to adapt the application to significant changes in the underlying system. Further, MPI's explicit interface makes it difficult to separate the application's synchronization and communication structure, reducing the amount of support that can be provided by compiler and run-time tools. This is in contrast to the recent research on more implicit parallel programming models such as Chapel, OpenMP and OpenCL, which promise to provide significantly more flexibility at the cost of reimplementing significant portions of the application. We are developing CoMPI, a novel compiler-driven approach to enable existing MPI applications to scale to exascale systems with minimal modifications that can be made incrementally over the application's lifetime. It includes: (1) New set of source code annotations, inserted either manually or automatically, that will clarify the application's use of MPI to the compiler infrastructure, enabling greater accuracy where needed; (2) A compiler transformation framework that leverages these annotations to transform the original MPI source code to improve its performance and scalability; (3) Novel MPI runtime implementation techniques that will provide a rich set of functionality extensions to be used by applications that have been transformed by our compiler; and (4) A novel compiler analysis that leverages simple user annotations to automatically extract the application's communication structure and synthesize most complex code annotations.« less
3-Dimensional Marine CSEM Modeling by Employing TDFEM with Parallel Solvers

NASA Astrophysics Data System (ADS)

Wu, X.; Yang, T.

2013-12-01

In this paper, parallel fulfillment is developed for forward modeling of the 3-Dimensional controlled source electromagnetic (CSEM) by using time-domain finite element method (TDFEM). Recently, a greater attention rises on research of hydrocarbon (HC) reservoir detection mechanism in the seabed. Since China has vast ocean resources, seeking hydrocarbon reservoirs become significant in the national economy. However, traditional methods of seismic exploration shown a crucial obstacle to detect hydrocarbon reservoirs in the seabed with a complex structure, due to relatively high acquisition costs and high-risking exploration. In addition, the development of EM simulations typically requires both a deep knowledge of the computational electromagnetics (CEM) and a proper use of sophisticated techniques and tools from computer science. However, the complexity of large-scale EM simulations often requires large memory because of a large amount of data, or solution time to address problems concerning matrix solvers, function transforms, optimization, etc. The objective of this paper is to present parallelized implementation of the time-domain finite element method for analysis of three-dimensional (3D) marine controlled source electromagnetic problems. Firstly, we established a three-dimensional basic background model according to the seismic data, then electromagnetic simulation of marine CSEM was carried out by using time-domain finite element method, which works on a MPI (Message Passing Interface) platform with exact orientation to allow fast detecting of hydrocarbons targets in ocean environment. To speed up the calculation process, SuperLU of an MPI (Message Passing Interface) version called SuperLU_DIST is employed in this approach. Regarding the representation of three-dimension seabed terrain with sense of reality, the region is discretized into an unstructured mesh rather than a uniform one in order to reduce the number of unknowns. Moreover, high-order Whitney vector basis functions are used for spatial discretization within the finite element approach to approximate the electric field. A horizontal electric dipole was used as a source, and an array of the receiver located at the seabed. To capture the presence of the hydrocarbon layer, the forward responses at water depths from 100m to 3000m are calculated. The normalized Magnitude Versus Offset (N-MVO) and Phase Versus Offset (PVO) curve can reflect resistive characteristics of hydrocarbon layers. For future work, Graphics Process Unit (GPU) acceleration algorithm would be carried out to multiply the calculation efficiency greatly.
Boyer-Moore Algorithm in Retrieving Deleted Short Message Service in Android Platform

NASA Astrophysics Data System (ADS)

Rahmat, R. F.; Prayoga, D. F.; Gunawan, D.; Sitompul, O. S.

2018-02-01

Short message service (SMS) can be used as digital evidence of disclosure of crime because it can strengthen the charges against the offenders. Criminals use various ways to destroy the evidence, including by deleting SMS. On the Android OS, SMS is stored in a SQLite database file. Deletion of SMS data is not followed by bit deletion in memory so that it is possible to rediscover the deleted SMS. Based on this case, the mobile forensic needs to be done to rediscover the short message service. The proposed method in this study is Boyer-Moore algorithm for searching string matching. An auto finds feature is designed to rediscover the short message service by searching using a particular pattern to rematch a text with the result of the hex value conversion in the database file. The system will redisplay the message for each of a match. From all the testing results, the proposed method has quite a high accuracy in rediscovering the short message service using the used dataset. The search results to rediscover the deleted SMS depend on the possibility of overwriting process and the vacuum procedure on the database file.
Message passing with queues and channels

DOE Office of Scientific and Technical Information (OSTI.GOV)

Dozsa, Gabor J; Heidelberger, Philip; Kumar, Sameer

In an embodiment, a reception thread receives a source node identifier, a type, and a data pointer from an application and, in response, creates a receive request. If the source node identifier specifies a source node, the reception thread adds the receive request to a fast-post queue. If a message received from a network does not match a receive request on a posted queue, a polling thread adds a receive request that represents the message to an unexpected queue. If the fast-post queue contains the receive request, the polling thread removes the receive request from the fast-post queue. If themore » receive request that was removed from the fast-post queue does not match the receive request on the unexpected queue, the polling thread adds the receive request that was removed from the fast-post queue to the posted queue. The reception thread and the polling thread execute asynchronously from each other.« less
Secure Web-Site Access with Tickets and Message-Dependent Digests

USGS Publications Warehouse

Donato, David I.

2008-01-01

Although there are various methods for restricting access to documents stored on a World Wide Web (WWW) site (a Web site), none of the widely used methods is completely suitable for restricting access to Web applications hosted on an otherwise publicly accessible Web site. A new technique, however, provides a mix of features well suited for restricting Web-site or Web-application access to authorized users, including the following: secure user authentication, tamper-resistant sessions, simple access to user state variables by server-side applications, and clean session terminations. This technique, called message-dependent digests with tickets, or MDDT, maintains secure user sessions by passing single-use nonces (tickets) and message-dependent digests of user credentials back and forth between client and server. Appendix 2 provides a working implementation of MDDT with PHP server-side code and JavaScript client-side code.
Real time explosive hazard information sensing, processing, and communication for autonomous operation

DOEpatents

Versteeg, Roelof J; Few, Douglas A; Kinoshita, Robert A; Johnson, Doug; Linda, Ondrej

2015-02-24

Methods, computer readable media, and apparatuses provide robotic explosive hazard detection. A robot intelligence kernel (RIK) includes a dynamic autonomy structure with two or more autonomy levels between operator intervention and robot initiative A mine sensor and processing module (ESPM) operating separately from the RIK perceives environmental variables indicative of a mine using subsurface perceptors. The ESPM processes mine information to determine a likelihood of a presence of a mine. A robot can autonomously modify behavior responsive to an indication of a detected mine. The behavior is modified between detection of mines, detailed scanning and characterization of the mine, developing mine indication parameters, and resuming detection. Real time messages are passed between the RIK and the ESPM. A combination of ESPM bound messages and RIK bound messages cause the robot platform to switch between modes including a calibration mode, the mine detection mode, and the mine characterization mode.
Real time explosive hazard information sensing, processing, and communication for autonomous operation

DOEpatents

Versteeg, Roelof J.; Few, Douglas A.; Kinoshita, Robert A.; Johnson, Douglas; Linda, Ondrej

2015-12-15

Methods, computer readable media, and apparatuses provide robotic explosive hazard detection. A robot intelligence kernel (RIK) includes a dynamic autonomy structure with two or more autonomy levels between operator intervention and robot initiative A mine sensor and processing module (ESPM) operating separately from the RIK perceives environmental variables indicative of a mine using subsurface perceptors. The ESPM processes mine information to determine a likelihood of a presence of a mine. A robot can autonomously modify behavior responsive to an indication of a detected mine. The behavior is modified between detection of mines, detailed scanning and characterization of the mine, developing mine indication parameters, and resuming detection. Real time messages are passed between the RIK and the ESPM. A combination of ESPM bound messages and RIK bound messages cause the robot platform to switch between modes including a calibration mode, the mine detection mode, and the mine characterization mode.
Machine-checked proofs of the design and implementation of a fault-tolerant circuit

NASA Technical Reports Server (NTRS)

Bevier, William R.; Young, William D.

1990-01-01

A formally verified implementation of the 'oral messages' algorithm of Pease, Shostak, and Lamport is described. An abstract implementation of the algorithm is verified to achieve interactive consistency in the presence of faults. This abstract characterization is then mapped down to a hardware level implementation which inherits the fault-tolerant characteristics of the abstract version. All steps in the proof were checked with the Boyer-Moore theorem prover. A significant results is the demonstration of a fault-tolerant device that is formally specified and whose implementation is proved correct with respect to this specification. A significant simplifying assumption is that the redundant processors behave synchronously. A mechanically checked proof that the oral messages algorithm is 'optimal' in the sense that no algorithm which achieves agreement via similar message passing can tolerate a larger proportion of faulty processor is also described.
Parallel community climate model: Description and user`s guide

DOE Office of Scientific and Technical Information (OSTI.GOV)

Drake, J.B.; Flanery, R.E.; Semeraro, B.D.

This report gives an overview of a parallel version of the NCAR Community Climate Model, CCM2, implemented for MIMD massively parallel computers using a message-passing programming paradigm. The parallel implementation was developed on an Intel iPSC/860 with 128 processors and on the Intel Delta with 512 processors, and the initial target platform for the production version of the code is the Intel Paragon with 2048 processors. Because the implementation uses a standard, portable message-passing libraries, the code has been easily ported to other multiprocessors supporting a message-passing programming paradigm. The parallelization strategy used is to decompose the problem domain intomore » geographical patches and assign each processor the computation associated with a distinct subset of the patches. With this decomposition, the physics calculations involve only grid points and data local to a processor and are performed in parallel. Using parallel algorithms developed for the semi-Lagrangian transport, the fast Fourier transform and the Legendre transform, both physics and dynamics are computed in parallel with minimal data movement and modest change to the original CCM2 source code. Sequential or parallel history tapes are written and input files (in history tape format) are read sequentially by the parallel code to promote compatibility with production use of the model on other computer systems. A validation exercise has been performed with the parallel code and is detailed along with some performance numbers on the Intel Paragon and the IBM SP2. A discussion of reproducibility of results is included. A user`s guide for the PCCM2 version 2.1 on the various parallel machines completes the report. Procedures for compilation, setup and execution are given. A discussion of code internals is included for those who may wish to modify and use the program in their own research.« less

Hierarchical control of procedural and declarative category-learning systems

PubMed Central

Turner, Benjamin O.; Crossley, Matthew J.; Ashby, F. Gregory

2017-01-01

Substantial evidence suggests that human category learning is governed by the interaction of multiple qualitatively distinct neural systems. In this view, procedural memory is used to learn stimulus-response associations, and declarative memory is used to apply explicit rules and test hypotheses about category membership. However, much less is known about the interaction between these systems: how is control passed between systems as they interact to influence motor resources? Here, we used fMRI to elucidate the neural correlates of switching between procedural and declarative categorization systems. We identified a key region of the cerebellum (left Crus I) whose activity was bidirectionally modulated depending on switch direction. We also identified regions of the default mode network (DMN) that were selectively connected to left Crus I during switching. We propose that the cerebellum—in coordination with the DMN—serves a critical role in passing control between procedural and declarative memory systems. PMID:28213114
ms2: A molecular simulation tool for thermodynamic properties

NASA Astrophysics Data System (ADS)

Deublein, Stephan; Eckl, Bernhard; Stoll, Jürgen; Lishchuk, Sergey V.; Guevara-Carrion, Gabriela; Glass, Colin W.; Merker, Thorsten; Bernreuther, Martin; Hasse, Hans; Vrabec, Jadran

2011-11-01

This work presents the molecular simulation program ms2 that is designed for the calculation of thermodynamic properties of bulk fluids in equilibrium consisting of small electro-neutral molecules. ms2 features the two main molecular simulation techniques, molecular dynamics (MD) and Monte-Carlo. It supports the calculation of vapor-liquid equilibria of pure fluids and multi-component mixtures described by rigid molecular models on the basis of the grand equilibrium method. Furthermore, it is capable of sampling various classical ensembles and yields numerous thermodynamic properties. To evaluate the chemical potential, Widom's test molecule method and gradual insertion are implemented. Transport properties are determined by equilibrium MD simulations following the Green-Kubo formalism. ms2 is designed to meet the requirements of academia and industry, particularly achieving short response times and straightforward handling. It is written in Fortran90 and optimized for a fast execution on a broad range of computer architectures, spanning from single processor PCs over PC-clusters and vector computers to high-end parallel machines. The standard Message Passing Interface (MPI) is used for parallelization and ms2 is therefore easily portable to different computing platforms. Feature tools facilitate the interaction with the code and the interpretation of input and output files. The accuracy and reliability of ms2 has been shown for a large variety of fluids in preceding work. Program summaryProgram title:ms2 Catalogue identifier: AEJF_v1_0 Program summary URL:http://cpc.cs.qub.ac.uk/summaries/AEJF_v1_0.html Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland Licensing provisions: Special Licence supplied by the authors No. of lines in distributed program, including test data, etc.: 82 794 No. of bytes in distributed program, including test data, etc.: 793 705 Distribution format: tar.gz Programming language: Fortran90 Computer: The simulation tool ms2 is usable on a wide variety of platforms, from single processor machines over PC-clusters and vector computers to vector-parallel architectures. (Tested with Fortran compilers: gfortran, Intel, PathScale, Portland Group and Sun Studio.) Operating system: Unix/Linux, Windows Has the code been vectorized or parallelized?: Yes. Message Passing Interface (MPI) protocol Scalability. Excellent scalability up to 16 processors for molecular dynamics and >512 processors for Monte-Carlo simulations. RAM:ms2 runs on single processors with 512 MB RAM. The memory demand rises with increasing number of processors used per node and increasing number of molecules. Classification: 7.7, 7.9, 12 External routines: Message Passing Interface (MPI) Nature of problem: Calculation of application oriented thermodynamic properties for rigid electro-neutral molecules: vapor-liquid equilibria, thermal and caloric data as well as transport properties of pure fluids and multi-component mixtures. Solution method: Molecular dynamics, Monte-Carlo, various classical ensembles, grand equilibrium method, Green-Kubo formalism. Restrictions: No. The system size is user-defined. Typical problems addressed by ms2 can be solved by simulating systems containing typically 2000 molecules or less. Unusual features: Feature tools are available for creating input files, analyzing simulation results and visualizing molecular trajectories. Additional comments: Sample makefiles for multiple operation platforms are provided. Documentation is provided with the installation package and is available at http://www.ms-2.de. Running time: The running time of ms2 depends on the problem set, the system size and the number of processes used in the simulation. Running four processes on a "Nehalem" processor, simulations calculating VLE data take between two and twelve hours, calculating transport properties between six and 24 hours.
A Rational Analysis of the Effects of Memory Biases on Serial Reproduction

ERIC Educational Resources Information Center

Xu, Jing; Griffiths, Thomas L.

2010-01-01

Many human interactions involve pieces of information being passed from one person to another, raising the question of how this process of information transmission is affected by the cognitive capacities of the agents involved. Bartlett (1932) explored the influence of memory biases on the "serial reproduction" of information, in which one…
Emotional Arousal and Enhanced Amygdala Activity: New Evidence for the Old Perseveration-Consolidation Hypothesis

ERIC Educational Resources Information Center

McGaugh, James L.

2005-01-01

Just a little over a century has passed since Muller and Pilzecker (1900) proposed the "perseveration-consolidation" hypothesis suggesting that neural activity initiated by newly learned information perseverates for a while and that such perseveration is critical for consolidating memory. Although memory consolidation is currently the focus of…
Fundamental bound on the persistence and capacity of short-term memory stored as graded persistent activity.

PubMed

Koyluoglu, Onur Ozan; Pertzov, Yoni; Manohar, Sanjay; Husain, Masud; Fiete, Ila R

2017-09-07

It is widely believed that persistent neural activity underlies short-term memory. Yet, as we show, the degradation of information stored directly in such networks behaves differently from human short-term memory performance. We build a more general framework where memory is viewed as a problem of passing information through noisy channels whose degradation characteristics resemble those of persistent activity networks. If the brain first encoded the information appropriately before passing the information into such networks, the information can be stored substantially more faithfully. Within this framework, we derive a fundamental lower-bound on recall precision, which declines with storage duration and number of stored items. We show that human performance, though inconsistent with models involving direct (uncoded) storage in persistent activity networks, can be well-fit by the theoretical bound. This finding is consistent with the view that if the brain stores information in patterns of persistent activity, it might use codes that minimize the effects of noise, motivating the search for such codes in the brain.
Fundamental bound on the persistence and capacity of short-term memory stored as graded persistent activity

PubMed Central

Pertzov, Yoni; Manohar, Sanjay; Husain, Masud; Fiete, Ila R

2017-01-01

It is widely believed that persistent neural activity underlies short-term memory. Yet, as we show, the degradation of information stored directly in such networks behaves differently from human short-term memory performance. We build a more general framework where memory is viewed as a problem of passing information through noisy channels whose degradation characteristics resemble those of persistent activity networks. If the brain first encoded the information appropriately before passing the information into such networks, the information can be stored substantially more faithfully. Within this framework, we derive a fundamental lower-bound on recall precision, which declines with storage duration and number of stored items. We show that human performance, though inconsistent with models involving direct (uncoded) storage in persistent activity networks, can be well-fit by the theoretical bound. This finding is consistent with the view that if the brain stores information in patterns of persistent activity, it might use codes that minimize the effects of noise, motivating the search for such codes in the brain. PMID:28879851
Auditory cortical function during verbal episodic memory encoding in Alzheimer's disease.

PubMed

Dhanjal, Novraj S; Warren, Jane E; Patel, Maneesh C; Wise, Richard J S

2013-02-01

Episodic memory encoding of a verbal message depends upon initial registration, which requires sustained auditory attention followed by deep semantic processing of the message. Motivated by previous data demonstrating modulation of auditory cortical activity during sustained attention to auditory stimuli, we investigated the response of the human auditory cortex during encoding of sentences to episodic memory. Subsequently, we investigated this response in patients with mild cognitive impairment (MCI) and probable Alzheimer's disease (pAD). Using functional magnetic resonance imaging, 31 healthy participants were studied. The response in 18 MCI and 18 pAD patients was then determined, and compared to 18 matched healthy controls. Subjects heard factual sentences, and subsequent retrieval performance indicated successful registration and episodic encoding. The healthy subjects demonstrated that suppression of auditory cortical responses was related to greater success in encoding heard sentences; and that this was also associated with greater activity in the semantic system. In contrast, there was reduced auditory cortical suppression in patients with MCI, and absence of suppression in pAD. Administration of a central cholinesterase inhibitor (ChI) partially restored the suppression in patients with pAD, and this was associated with an improvement in verbal memory. Verbal episodic memory impairment in AD is associated with altered auditory cortical function, reversible with a ChI. Although these results may indicate the direct influence of pathology in auditory cortex, they are also likely to indicate a partially reversible impairment of feedback from neocortical systems responsible for sustained attention and semantic processing. Copyright © 2012 American Neurological Association.
Development of a Grid-Independent Geos-Chem Chemical Transport Model (v9-02) as an Atmospheric Chemistry Module for Earth System Models

NASA Technical Reports Server (NTRS)

Long, M. S.; Yantosca, R.; Nielsen, J. E; Keller, C. A.; Da Silva, A.; Sulprizio, M. P.; Pawson, S.; Jacob, D. J.

2015-01-01

The GEOS-Chem global chemical transport model (CTM), used by a large atmospheric chemistry research community, has been re-engineered to also serve as an atmospheric chemistry module for Earth system models (ESMs). This was done using an Earth System Modeling Framework (ESMF) interface that operates independently of the GEOSChem scientific code, permitting the exact same GEOSChem code to be used as an ESM module or as a standalone CTM. In this manner, the continual stream of updates contributed by the CTM user community is automatically passed on to the ESM module, which remains state of science and referenced to the latest version of the standard GEOS-Chem CTM. A major step in this re-engineering was to make GEOS-Chem grid independent, i.e., capable of using any geophysical grid specified at run time. GEOS-Chem data sockets were also created for communication between modules and with external ESM code. The grid-independent, ESMF-compatible GEOS-Chem is now the standard version of the GEOS-Chem CTM. It has been implemented as an atmospheric chemistry module into the NASA GEOS- 5 ESM. The coupled GEOS-5-GEOS-Chem system was tested for scalability and performance with a tropospheric oxidant-aerosol simulation (120 coupled species, 66 transported tracers) using 48-240 cores and message-passing interface (MPI) distributed-memory parallelization. Numerical experiments demonstrate that the GEOS-Chem chemistry module scales efficiently for the number of cores tested, with no degradation as the number of cores increases. Although inclusion of atmospheric chemistry in ESMs is computationally expensive, the excellent scalability of the chemistry module means that the relative cost goes down with increasing number of cores in a massively parallel environment.
The effect of Twitter exposure on false memory formation.

PubMed

Fenn, Kimberly M; Griffin, Nicholas R; Uitvlugt, Mitchell G; Ravizza, Susan M

2014-12-01

Social media sites such as Facebook and Twitter have increased drastically in popularity. However, information on these sites is not verified and may contain inaccuracies. It is well-established that false information encountered after an event can lead to memory distortion. Therefore, social media may be particularly harmful for autobiographical memory. Here, we tested the effect of Twitter on false memory. We presented participants with a series of images that depicted a story and then presented false information about the images in a scrolling feed that bore either a low or high resemblance to a Twitter feed. Confidence for correct information was similar across the groups, but confidence for suggested information was significantly lower when false information was presented in a Twitter format. We propose that individuals take into account the medium of the message when integrating information into memory.
Approving the renewal of import restrictions contained in the Burmese Freedom and Democracy Act of 2003.

THOMAS, 112th Congress

Rep. Crowley, Joseph [D-NY-7

2011-05-26

Senate - 09/15/2011 Message on Senate action sent to the House. (All Actions) Notes: For further action, see H.R.2017, which became Public Law 112-33 on 9/30/2011. Tracker: This bill has the status Passed SenateHere are the steps for Status of Legislation:
Final report: Compiled MPI. Cost-Effective Exascale Application Development

DOE Office of Scientific and Technical Information (OSTI.GOV)

Gropp, William Douglas

2015-12-21

This is the final report on Compiled MPI: Cost-Effective Exascale Application Development, and summarizes the results under this project. The project investigated runtime enviroments that improve the performance of MPI (Message-Passing Interface) programs; work at Illinois in the last period of this project looked at optimizing data access optimizations expressed with MPI datatypes.
Emergency Border Security Supplemental Appropriations Act, 2010

THOMAS, 111th Congress

Rep. Price, David E. [D-NC-4

2010-07-27

Senate - 09/23/2010 Message received in the Senate: Returned to the Senate pursuant to the provisions of H.Res. 1653. (All Actions) Notes: For further action, see H.R.6080, which became Public Law 111-230 on 8/13/2010. Tracker: This bill has the status Passed SenateHere are the steps for Status of Legislation:
A Down-to-Earth Educational Operating System for Up-in-the-Cloud Many-Core Architectures

ERIC Educational Resources Information Center

Ziwisky, Michael; Persohn, Kyle; Brylow, Dennis

2013-01-01

We present "Xipx," the first port of a major educational operating system to a processor in the emerging class of many-core architectures. Through extensions to the proven Embedded Xinu operating system, Xipx gives students hands-on experience with system programming in a distributed message-passing environment. We expose the software primitives…
LDPC Codes--Structural Analysis and Decoding Techniques

ERIC Educational Resources Information Center

Zhang, Xiaojie

2012-01-01

Low-density parity-check (LDPC) codes have been the focus of much research over the past decade thanks to their near Shannon limit performance and to their efficient message-passing (MP) decoding algorithms. However, the error floor phenomenon observed in MP decoding, which manifests itself as an abrupt change in the slope of the error-rate curve,…
Comprehensive Iran Sanctions, Accountability, and Divestment Act of 2009

THOMAS, 111th Congress

Sen. Dodd, Christopher J. [D-CT

2009-11-19

Senate - 09/23/2010 Message received in the Senate: Returned to the Senate pursuant to the provisions of H.Res. 1653. (All Actions) Notes: For further action, see H.R.2194, which became Public Law 111-195 on 7/1/2010. Tracker: This bill has the status Passed SenateHere are the steps for Status of Legislation:
Federal Aviation Administration Extension Act of 2010

THOMAS, 111th Congress

Sen. Rockefeller, John D., IV [D-WV

2010-03-25

Senate - 09/23/2010 Message received in the Senate: Returned to the Senate pursuant to the provisions of H.Res. 1653. (All Actions) Notes: For further action, see H.R.5147, which became Public Law 111-161 on 4/30/2010. Tracker: This bill has the status Passed SenateHere are the steps for Status of Legislation:
Minority Journalists Push Media to Maintain Diversity Commitment

ERIC Educational Resources Information Center

Anderson, Michelle D.

2008-01-01

In a media advisory released earlier this month, the National Association of Black Journalists (NABJ) had an urgent message for the newspaper industry: Diversity should not be treated like a passing fad and it should continue to be a top priority. The advocacy organization, where a majority of its 4,000 members are Black print journalists, warned…
Asynchronous Data Retrieval from an Object-Oriented Database

NASA Astrophysics Data System (ADS)

Gilbert, Jonathan P.; Bic, Lubomir

We present an object-oriented semantic database model which, similar to other object-oriented systems, combines the virtues of four concepts: the functional data model, a property inheritance hierarchy, abstract data types and message-driven computation. The main emphasis is on the last of these four concepts. We describe generic procedures that permit queries to be processed in a purely message-driven manner. A database is represented as a network of nodes and directed arcs, in which each node is a logical processing element, capable of communicating with other nodes by exchanging messages. This eliminates the need for shared memory and for centralized control during query processing. Hence, the model is suitable for implementation on a multiprocessor computer architecture, consisting of large numbers of loosely coupled processing elements.
When is Deceptive Message Production More Effortful than Truth-Telling? A Baker's Dozen of Moderators.

PubMed

Burgoon, Judee K

2015-01-01

Deception is thought to be more effortful than telling the truth. Empirical evidence from many quarters supports this general proposition. However, there are many factors that qualify and even reverse this pattern. Guided by a communication perspective, I present a baker's dozen of moderators that may alter the degree of cognitive difficulty associated with producing deceptive messages. Among sender-related factors are memory processes, motivation, incentives, and consequences. Lying increases activation of a network of brain regions related to executive memory, suppression of unwanted behaviors, and task switching that is not observed with truth-telling. High motivation coupled with strong incentives or the risk of adverse consequences also prompts more cognitive exertion-for truth-tellers and deceivers alike-to appear credible, with associated effects on performance and message production effort, depending on the magnitude of effort, communicator skill, and experience. Factors related to message and communication context include discourse genre, type of prevarication, expected response length, communication medium, preparation, and recency of target event/issue. These factors can attenuate the degree of cognitive taxation on senders so that truth-telling and deceiving are similarly effortful. Factors related to the interpersonal relationship among interlocutors include whether sender and receiver are cooperative or adversarial and how well-acquainted they are with one another. A final consideration is whether the unit of analysis is the utterance, turn at talk, episode, entire interaction, or series of interactions. Taking these factors into account should produce a more nuanced answer to the question of when deception is more difficult than truth-telling.
From Our Practices to Yours: Key Messages for the Journey to Integrated Behavioral Health.

PubMed

Gold, Stephanie B; Green, Larry A; Peek, C J

The historic, cultural separation of primary care and behavioral health has caused the spread of integrated care to lag behind other practice transformation efforts. The Advancing Care Together study was a 3-year evaluation of how practices implemented integrated care in their local contexts; at its culmination, practice leaders ("innovators") identified lessons learned to pass on to others. Individual feedback from innovators, key messages created by workgroups of innovators and the study team, and a synthesis of key messages from a facilitated discussion were analyzed for themes via immersion/crystallization. Five key themes were captured: (1) frame integrated care as a necessary paradigm shift to patient-centered, whole-person health care; (2) initialize: define relationships and protocols up-front, understanding they will evolve; (3) build inclusive, empowered teams to provide the foundation for integration; (4) develop a change management strategy of continuous evaluation and course-correction; and (5) use targeted data collection pertinent to integrated care to drive improvement and impart accountability. Innovators integrating primary care and behavioral health discerned key messages from their practical experience that they felt were worth sharing with others. Their messages present insight into the challenges unique to integrating care beyond other practice transformation efforts. © Copyright 2017 by the American Board of Family Medicine.

Concurrent hypercube system with improved message passing

NASA Technical Reports Server (NTRS)

Peterson, John C. (Inventor); Tuazon, Jesus O. (Inventor); Lieberman, Don (Inventor); Pniel, Moshe (Inventor)

1989-01-01

A network of microprocessors, or nodes, are interconnected in an n-dimensional cube having bidirectional communication links along the edges of the n-dimensional cube. Each node's processor network includes an I/O subprocessor dedicated to controlling communication of message packets along a bidirectional communication link with each end thereof terminating at an I/O controlled transceiver. Transmit data lines are directly connected from a local FIFO through each node's communication link transceiver. Status and control signals from the neighboring nodes are delivered over supervisory lines to inform the local node that the neighbor node's FIFO is empty and the bidirectional link between the two nodes is idle for data communication. A clocking line between neighbors, clocks a message into an empty FIFO at a neighbor's node and vica versa. Either neighbor may acquire control over the bidirectional communication link at any time, and thus each node has circuitry for checking whether or not the communication link is busy or idle, and whether or not the receive FIFO is empty. Likewise, each node can empty its own FIFO and in turn deliver a status signal to a neighboring node indicating that the local FIFO is empty. The system includes features of automatic message rerouting, block message transfer and automatic parity checking and generation.
Support for non-locking parallel reception of packets belonging to a single memory reception FIFO

DOEpatents

Chen, Dong [Yorktown Heights, NY; Heidelberger, Philip [Yorktown Heights, NY; Salapura, Valentina [Yorktown Heights, NY; Senger, Robert M [Yorktown Heights, NY; Steinmacher-Burow, Burkhard [Boeblingen, DE; Sugawara, Yutaka [Yorktown Heights, NY

2011-01-27

A method and apparatus for distributed parallel messaging in a parallel computing system. A plurality of DMA engine units are configured in a multiprocessor system to operate in parallel, one DMA engine unit for transferring a current packet received at a network reception queue to a memory location in a memory FIFO (rmFIFO) region of a memory. A control unit implements logic to determine whether any prior received packet destined for that rmFIFO is still in a process of being stored in the associated memory by another DMA engine unit of the plurality, and prevent the one DMA engine unit from indicating completion of storing the current received packet in the reception memory FIFO (rmFIFO) until all prior received packets destined for that rmFIFO are completely stored by the other DMA engine units. Thus, there is provided non-locking support so that multiple packets destined for a single rmFIFO are transferred and stored in parallel to predetermined locations in a memory.
Performance issues for domain-oriented time-driven distributed simulations

NASA Technical Reports Server (NTRS)

Nicol, David M.

1987-01-01

It has long been recognized that simulations form an interesting and important class of computations that may benefit from distributed or parallel processing. Since the point of parallel processing is improved performance, the recent proliferation of multiprocessors requires that we consider the performance issues that naturally arise when attempting to implement a distributed simulation. Three such issues are: (1) the problem of mapping the simulation onto the architecture, (2) the possibilities for performing redundant computation in order to reduce communication, and (3) the avoidance of deadlock due to distributed contention for message-buffer space. These issues are discussed in the context of a battlefield simulation implemented on a medium-scale multiprocessor message-passing architecture.
Parallel integer sorting with medium and fine-scale parallelism

NASA Technical Reports Server (NTRS)

Dagum, Leonardo

1993-01-01

Two new parallel integer sorting algorithms, queue-sort and barrel-sort, are presented and analyzed in detail. These algorithms do not have optimal parallel complexity, yet they show very good performance in practice. Queue-sort designed for fine-scale parallel architectures which allow the queueing of multiple messages to the same destination. Barrel-sort is designed for medium-scale parallel architectures with a high message passing overhead. The performance results from the implementation of queue-sort on a Connection Machine CM-2 and barrel-sort on a 128 processor iPSC/860 are given. The two implementations are found to be comparable in performance but not as good as a fully vectorized bucket sort on the Cray YMP.
The Effect of Noise on the Relationship Between Auditory Working Memory and Comprehension in School-Age Children.

PubMed

Sullivan, Jessica R; Osman, Homira; Schafer, Erin C

2015-06-01

The objectives of the current study were to examine the effect of noise (-5 dB SNR) on auditory comprehension and to examine its relationship with working memory. It was hypothesized that noise has a negative impact on information processing, auditory working memory, and comprehension. Children with normal hearing between the ages of 8 and 10 years were administered working memory and comprehension tasks in quiet and noise. The comprehension measure comprised 5 domains: main idea, details, reasoning, vocabulary, and understanding messages. Performance on auditory working memory and comprehension tasks were significantly poorer in noise than in quiet. The reasoning, details, understanding, and vocabulary subtests were particularly affected in noise (p < .05). The relationship between auditory working memory and comprehension was stronger in noise than in quiet, suggesting an increased contribution of working memory. These data suggest that school-age children's auditory working memory and comprehension are negatively affected by noise. Performance on comprehension tasks in noise is strongly related to demands placed on working memory, supporting the theory that degrading listening conditions draws resources away from the primary task.
[Old age in primary school readers: a journey through the end of the 19th century to the start of the 21st century in Argentina].

PubMed

Oddone, María Julieta

2013-04-01

This article presents the content (discourse) analysis of messages transmitted by primary school readers in the period between 1880 to 2012. This study allowed us to explore the image of old age and aging that society has and passes on to new generations as well as the role assigned to this generational group. The historical periods that provide the context for the data were defined according to the continuity of or the turning points in the social values transmitted in the reading materials. The role assigned to elderly people and the image of old age that the Argentine society passed on and continues to pass on to younger generations demonstrate that each period described has its own model of aging.
Multi-GPU and multi-CPU accelerated FDTD scheme for vibroacoustic applications

NASA Astrophysics Data System (ADS)

Francés, J.; Otero, B.; Bleda, S.; Gallego, S.; Neipp, C.; Márquez, A.; Beléndez, A.

2015-06-01

The Finite-Difference Time-Domain (FDTD) method is applied to the analysis of vibroacoustic problems and to study the propagation of longitudinal and transversal waves in a stratified media. The potential of the scheme and the relevance of each acceleration strategy for massively computations in FDTD are demonstrated in this work. In this paper, we propose two new specific implementations of the bi-dimensional scheme of the FDTD method using multi-CPU and multi-GPU, respectively. In the first implementation, an open source message passing interface (OMPI) has been included in order to massively exploit the resources of a biprocessor station with two Intel Xeon processors. Moreover, regarding CPU code version, the streaming SIMD extensions (SSE) and also the advanced vectorial extensions (AVX) have been included with shared memory approaches that take advantage of the multi-core platforms. On the other hand, the second implementation called the multi-GPU code version is based on Peer-to-Peer communications available in CUDA on two GPUs (NVIDIA GTX 670). Subsequently, this paper presents an accurate analysis of the influence of the different code versions including shared memory approaches, vector instructions and multi-processors (both CPU and GPU) and compares them in order to delimit the degree of improvement of using distributed solutions based on multi-CPU and multi-GPU. The performance of both approaches was analysed and it has been demonstrated that the addition of shared memory schemes to CPU computing improves substantially the performance of vector instructions enlarging the simulation sizes that use efficiently the cache memory of CPUs. In this case GPU computing is slightly twice times faster than the fine tuned CPU version in both cases one and two nodes. However, for massively computations explicit vector instructions do not worth it since the memory bandwidth is the limiting factor and the performance tends to be the same than the sequential version with auto-vectorisation and also shared memory approach. In this scenario GPU computing is the best option since it provides a homogeneous behaviour. More specifically, the speedup of GPU computing achieves an upper limit of 12 for both one and two GPUs, whereas the performance reaches peak values of 80 GFlops and 146 GFlops for the performance for one GPU and two GPUs respectively. Finally, the method is applied to an earth crust profile in order to demonstrate the potential of our approach and the necessity of applying acceleration strategies in these type of applications.
A Biopsychological Model of Anti-drug PSA Processing: Developing Effective Persuasive Messages.

PubMed

Hohman, Zachary P; Keene, Justin Robert; Harris, Breanna N; Niedbala, Elizabeth M; Berke, Collin K

2017-11-01

For the current study, we developed and tested a biopsychological model to combine research on psychological tension, the Limited Capacity Model of Motivated Mediated Message Processing, and the endocrine system to predict and understand how people process anti-drug PSAs. We predicted that co-presentation of pleasant and unpleasant information, vs. solely pleasant or unpleasant, will trigger evaluative tension about the target behavior in persuasive messages and result in a biological response (increase in cortisol, alpha amylase, and heart rate). In experiment 1, we assessed the impact of co-presentation of pleasant and unpleasant information in persuasive messages on evaluative tension (conceptualized as attitude ambivalence), in experiment 2, we explored the impact of co-presentation on endocrine system responses (salivary cortisol and alpha amylase), and in experiment 3, we assessed the impact of co-presentation on heart rate. Across all experiments, we demonstrated that co-presentation of pleasant and unpleasant information, vs. solely pleasant or unpleasant, in persuasive communications leads to increases in attitude ambivalence, salivary cortisol, salivary alpha amylase, and heart rate. Taken together, the results support the initial paths of our biopsychological model of persuasive message processing and indicate that including both pleasant and unpleasant information in a message impacts the viewer. We predict that increases in evaluative tension and biological responses will aid in memory and cognitive processing of the message. However, future research is needed to test that hypothesis.
Women's somatic styles: rethinking breast self-examination education.

PubMed

Ellingson, Lyndall

2003-11-01

In this ethnographic study I explored women's somatic and sexual experiences, reception of breast self-examination (BSE) messages, and reactions to the practice of BSE. Mainstream BSE education uses messages that deemphasize the woman, her breasts, and her relationship to them as sexual. The turbid confluence of societally eroticized breasts and self-touch taboos makes it unlikely that women filter these messages in an asexual way. Using grounded theory, I examined women's expression of the self-body relationship and the sociocultural milieu within which women consider BSE education materials. Seven subjects varying in age, sexual orientation, parenting, and relationship status were interviewed about their physical experiences, self-touch, and body image. Subjects also participated in a BSE class and focus group, and composed a journal entry describing their reactions to practicing BSE. Discernible patterns in somatic memories, somatic styles, and reactions to BSE educational messages were found. This study suggests a need for a more consciously feminist approach to BSE education.
Generation of Quality Pulses for Control of Qubit/Quantum Memory Spin States: Experimental and Simulation

DTIC Science & Technology

2016-09-01

as an example the integration of cryogenic superconductor components, including filters and amplifiers to improve the pulse quality and validate the...5 5.1 CRYOGENIC BAND-PASS FILTERS .............................................................................10 6. BIBLIOGRAPHY...10 16. Gain plot of DARPA SURF tunable band-pass filter tuned to 950-MHz .............................. 10 v 17. VSG at -50 dBm: Experimental
Efficient and Accurate Optimal Linear Phase FIR Filter Design Using Opposition-Based Harmony Search Algorithm

PubMed Central

Saha, S. K.; Dutta, R.; Choudhury, R.; Kar, R.; Mandal, D.; Ghoshal, S. P.

2013-01-01

In this paper, opposition-based harmony search has been applied for the optimal design of linear phase FIR filters. RGA, PSO, and DE have also been adopted for the sake of comparison. The original harmony search algorithm is chosen as the parent one, and opposition-based approach is applied. During the initialization, randomly generated population of solutions is chosen, opposite solutions are also considered, and the fitter one is selected as a priori guess. In harmony memory, each such solution passes through memory consideration rule, pitch adjustment rule, and then opposition-based reinitialization generation jumping, which gives the optimum result corresponding to the least error fitness in multidimensional search space of FIR filter design. Incorporation of different control parameters in the basic HS algorithm results in the balancing of exploration and exploitation of search space. Low pass, high pass, band pass, and band stop FIR filters are designed with the proposed OHS and other aforementioned algorithms individually for comparative optimization performance. A comparison of simulation results reveals the optimization efficacy of the OHS over the other optimization techniques for the solution of the multimodal, nondifferentiable, nonlinear, and constrained FIR filter design problems. PMID:23844390
Efficient and accurate optimal linear phase FIR filter design using opposition-based harmony search algorithm.

PubMed

Saha, S K; Dutta, R; Choudhury, R; Kar, R; Mandal, D; Ghoshal, S P

2013-01-01

In this paper, opposition-based harmony search has been applied for the optimal design of linear phase FIR filters. RGA, PSO, and DE have also been adopted for the sake of comparison. The original harmony search algorithm is chosen as the parent one, and opposition-based approach is applied. During the initialization, randomly generated population of solutions is chosen, opposite solutions are also considered, and the fitter one is selected as a priori guess. In harmony memory, each such solution passes through memory consideration rule, pitch adjustment rule, and then opposition-based reinitialization generation jumping, which gives the optimum result corresponding to the least error fitness in multidimensional search space of FIR filter design. Incorporation of different control parameters in the basic HS algorithm results in the balancing of exploration and exploitation of search space. Low pass, high pass, band pass, and band stop FIR filters are designed with the proposed OHS and other aforementioned algorithms individually for comparative optimization performance. A comparison of simulation results reveals the optimization efficacy of the OHS over the other optimization techniques for the solution of the multimodal, nondifferentiable, nonlinear, and constrained FIR filter design problems.
Strategic Regulation of Grain Size in Memory Reporting over Time

ERIC Educational Resources Information Center

Goldsmith, M.; Koriat, A.; Pansky, A.

2005-01-01

As time passes, people often remember the gist of an event though they cannot remember its details. Can rememberers exploit this difference by strategically regulating the ''grain size'' of their answers over time, to avoid reporting wrong information? A metacognitive model of the control of grain size in memory reporting was examined in two…
Changes in Context-Specificity during Memory Reconsolidation: Selective Effects of Hippocampal Lesions

ERIC Educational Resources Information Center

Winocur, Gordon; Frankland, Paul W.; Sekeres, Melanie; Fogel, Stuart; Moscovitch, Morris

2009-01-01

After acquisition, memories associated with contextual fear conditioning pass through a labile phase, in which they are vulnerable to hippocampal lesions, to a more stable state, via consolidation, in which they engage extrahippocampal structures and are resistant to such disruption. The process is accompanied by changes in the form of the memory…
Windows Memory Forensic Data Visualization

DTIC Science & Technology

2014-06-12

clustering characteristics (Bastian, et al, 2009). The software is written in Java and utilizes the OpenGL library for rendering graphical content...Toolkit 2 nd ed. Burlington MA: Syngress. D3noob. (2013, February 8). Using a MYSQL database as a source of data. Message posted to http
Improving security of the ping-pong protocol

NASA Astrophysics Data System (ADS)

Zawadzki, Piotr

2013-01-01

A security layer for the asymptotically secure ping-pong protocol is proposed and analyzed in the paper. The operation of the improvement exploits inevitable errors introduced by the eavesdropping in the control and message modes. Its role is similar to the privacy amplification algorithms known from the quantum key distribution schemes. Messages are processed in blocks which guarantees that an eavesdropper is faced with a computationally infeasible problem as long as the system parameters are within reasonable limits. The introduced additional information preprocessing does not require quantum memory registers and confidential communication is possible without prior key agreement or some shared secret.
Eavesdropping on Memory.

PubMed

Loftus, Elizabeth F

2017-01-03

For more than four decades, I have been studying human memory. My research concerns the malleable nature of memory. Information suggested to an individual about an event can be integrated with the memory of the event itself, so that what actually occurred, and what was discussed later about what may have occurred, become inextricably interwoven, allowing distortion, elaboration, and even total fabrication. In my writings, classes, and public speeches, I've tried to convey one important take-home message: Just because someone tells you something in great detail, with much confidence, and with emotion, it doesn't mean that it is true. Here I describe my professional life as an experimental psychologist, in which I've eavesdropped on this process, as well as many personal experiences that may have influenced my thinking and choices.
Low message sensation health promotion videos are better remembered and activate areas of the brain associated with memory encoding.

PubMed

Seelig, David; Wang, An-Li; Jagannathan, Kanchana; Jaganathan, Kanchana; Loughead, James W; Blady, Shira J; Childress, Anna Rose; Romer, Daniel; Langleben, Daniel D

2014-01-01

Greater sensory stimulation in advertising has been postulated to facilitate attention and persuasion. For this reason, video ads promoting health behaviors are often designed to be high in "message sensation value" (MSV), a standardized measure of sensory intensity of the audiovisual and content features of an ad. However, our previous functional Magnetic Resonance Imaging (fMRI) study showed that low MSV ads were better remembered and produced more prefrontal and temporal and less occipital cortex activation, suggesting that high MSV may divert cognitive resources from processing ad content. The present study aimed to determine whether these findings from anti-smoking ads generalize to other public health topics, such as safe sex. Thirty-nine healthy adults viewed high- and low MSV ads promoting safer sex through condom use, during an fMRI session. Recognition memory of the ads was tested immediately and 3 weeks after the session. We found that low MSV condom ads were better remembered than the high MSV ads at both time points and replicated the fMRI patterns previously reported for the anti-smoking ads. Occipital and superior temporal activation was negatively related to the attitudes favoring condom use (see Condom Attitudes Scale, Methods and Materials section). Psychophysiological interaction (PPI) analysis of the relation between occipital and fronto-temporal (middle temporal and inferior frontal gyri) cortices revealed weaker negative interactions between occipital and fronto-temporal cortices during viewing of the low MSV that high MSV ads. These findings confirm that the low MSV video health messages are better remembered than the high MSV messages and that this effect generalizes across public health domains. The greater engagement of the prefrontal and fronto-temporal cortices by low MSV ads and the greater occipital activation by high MSV ads suggest that that the "attention-grabbing" high MSV format could impede the learning and retention of public health messages.
Low Message Sensation Health Promotion Videos Are Better Remembered and Activate Areas of the Brain Associated with Memory Encoding

PubMed Central

Jaganathan, Kanchana; Loughead, James W.; Blady, Shira J.; Childress, Anna Rose; Romer, Daniel; Langleben, Daniel D.

2014-01-01

Greater sensory stimulation in advertising has been postulated to facilitate attention and persuasion. For this reason, video ads promoting health behaviors are often designed to be high in “message sensation value” (MSV), a standardized measure of sensory intensity of the audiovisual and content features of an ad. However, our previous functional Magnetic Resonance Imaging (fMRI) study showed that low MSV ads were better remembered and produced more prefrontal and temporal and less occipital cortex activation, suggesting that high MSV may divert cognitive resources from processing ad content. The present study aimed to determine whether these findings from anti-smoking ads generalize to other public health topics, such as safe sex. Thirty-nine healthy adults viewed high- and low MSV ads promoting safer sex through condom use, during an fMRI session. Recognition memory of the ads was tested immediately and 3 weeks after the session. We found that low MSV condom ads were better remembered than the high MSV ads at both time points and replicated the fMRI patterns previously reported for the anti-smoking ads. Occipital and superior temporal activation was negatively related to the attitudes favoring condom use (see Condom Attitudes Scale, Methods and Materials section). Psychophysiological interaction (PPI) analysis of the relation between occipital and fronto-temporal (middle temporal and inferior frontal gyri) cortices revealed weaker negative interactions between occipital and fronto-temporal cortices during viewing of the low MSV that high MSV ads. These findings confirm that the low MSV video health messages are better remembered than the high MSV messages and that this effect generalizes across public health domains. The greater engagement of the prefrontal and fronto-temporal cortices by low MSV ads and the greater occipital activation by high MSV ads suggest that that the “attention-grabbing” high MSV format could impede the learning and retention of public health messages. PMID:25409187
Hybrid-view programming of nuclear fusion simulation code in the PGAS parallel programming language XcalableMP

DOE Office of Scientific and Technical Information (OSTI.GOV)

Tsugane, Keisuke; Boku, Taisuke; Murai, Hitoshi

Recently, the Partitioned Global Address Space (PGAS) parallel programming model has emerged as a usable distributed memory programming model. XcalableMP (XMP) is a PGAS parallel programming language that extends base languages such as C and Fortran with directives in OpenMP-like style. XMP supports a global-view model that allows programmers to define global data and to map them to a set of processors, which execute the distributed global data as a single thread. In XMP, the concept of a coarray is also employed for local-view programming. In this study, we port Gyrokinetic Toroidal Code - Princeton (GTC-P), which is a three-dimensionalmore » gyrokinetic PIC code developed at Princeton University to study the microturbulence phenomenon in magnetically confined fusion plasmas, to XMP as an example of hybrid memory model coding with the global-view and local-view programming models. In local-view programming, the coarray notation is simple and intuitive compared with Message Passing Interface (MPI) programming while the performance is comparable to that of the MPI version. Thus, because the global-view programming model is suitable for expressing the data parallelism for a field of grid space data, we implement a hybrid-view version using a global-view programming model to compute the field and a local-view programming model to compute the movement of particles. Finally, the performance is degraded by 20% compared with the original MPI version, but the hybrid-view version facilitates more natural data expression for static grid space data (in the global-view model) and dynamic particle data (in the local-view model), and it also increases the readability of the code for higher productivity.« less

Hybrid-view programming of nuclear fusion simulation code in the PGAS parallel programming language XcalableMP

DOE PAGES

Tsugane, Keisuke; Boku, Taisuke; Murai, Hitoshi; ...

2016-06-01

Recently, the Partitioned Global Address Space (PGAS) parallel programming model has emerged as a usable distributed memory programming model. XcalableMP (XMP) is a PGAS parallel programming language that extends base languages such as C and Fortran with directives in OpenMP-like style. XMP supports a global-view model that allows programmers to define global data and to map them to a set of processors, which execute the distributed global data as a single thread. In XMP, the concept of a coarray is also employed for local-view programming. In this study, we port Gyrokinetic Toroidal Code - Princeton (GTC-P), which is a three-dimensionalmore » gyrokinetic PIC code developed at Princeton University to study the microturbulence phenomenon in magnetically confined fusion plasmas, to XMP as an example of hybrid memory model coding with the global-view and local-view programming models. In local-view programming, the coarray notation is simple and intuitive compared with Message Passing Interface (MPI) programming while the performance is comparable to that of the MPI version. Thus, because the global-view programming model is suitable for expressing the data parallelism for a field of grid space data, we implement a hybrid-view version using a global-view programming model to compute the field and a local-view programming model to compute the movement of particles. Finally, the performance is degraded by 20% compared with the original MPI version, but the hybrid-view version facilitates more natural data expression for static grid space data (in the global-view model) and dynamic particle data (in the local-view model), and it also increases the readability of the code for higher productivity.« less
Checkpointing Shared Memory Programs at the Application-level

DOE Office of Scientific and Technical Information (OSTI.GOV)

Bronevetsky, G; Schulz, M; Szwed, P

2004-09-08

Trends in high-performance computing are making it necessary for long-running applications to tolerate hardware faults. The most commonly used approach is checkpoint and restart(CPR)-the state of the computation is saved periodically on disk, and when a failure occurs, the computation is restarted from the last saved state. At present, it is the responsibility of the programmer to instrument applications for CPR. Our group is investigating the use of compiler technology to instrument codes to make them self-checkpointing and self-restarting, thereby providing an automatic solution to the problem of making long-running scientific applications resilient to hardware faults. Our previous work focusedmore » on message-passing programs. In this paper, we describe such a system for shared-memory programs running on symmetric multiprocessors. The system has two components: (i)a pre-compiler for source-to-source modification of applications, and (ii) a runtime system that implements a protocol for coordinating CPR among the threads of the parallel application. For the sake of concreteness, we focus on a non-trivial subset of OpenMP that includes barriers and locks. One of the advantages of this approach is that the ability to tolerate faults becomes embedded within the application itself, so applications become self-checkpointing and self-restarting on any platform. We demonstrate this by showing that our transformed benchmarks can checkpoint and restart on three different platforms (Windows/x86, Linux/x86, and Tru64/Alpha). Our experiments show that the overhead introduced by this approach is usually quite small; they also suggest ways in which the current implementation can be tuned to reduced overheads further.« less
PhreeqcRM: A reaction module for transport simulators based on the geochemical model PHREEQC

USGS Publications Warehouse

Parkhurst, David L.; Wissmeier, Laurin

2015-01-01

PhreeqcRM is a geochemical reaction module designed specifically to perform equilibrium and kinetic reaction calculations for reactive transport simulators that use an operator-splitting approach. The basic function of the reaction module is to take component concentrations from the model cells of the transport simulator, run geochemical reactions, and return updated component concentrations to the transport simulator. If multicomponent diffusion is modeled (e.g., Nernst–Planck equation), then aqueous species concentrations can be used instead of component concentrations. The reaction capabilities are a complete implementation of the reaction capabilities of PHREEQC. In each cell, the reaction module maintains the composition of all of the reactants, which may include minerals, exchangers, surface complexers, gas phases, solid solutions, and user-defined kinetic reactants.PhreeqcRM assigns initial and boundary conditions for model cells based on standard PHREEQC input definitions (files or strings) of chemical compositions of solutions and reactants. Additional PhreeqcRM capabilities include methods to eliminate reaction calculations for inactive parts of a model domain, transfer concentrations and other model properties, and retrieve selected results. The module demonstrates good scalability for parallel processing by using multiprocessing with MPI (message passing interface) on distributed memory systems, and limited scalability using multithreading with OpenMP on shared memory systems. PhreeqcRM is written in C++, but interfaces allow methods to be called from C or Fortran. By using the PhreeqcRM reaction module, an existing multicomponent transport simulator can be extended to simulate a wide range of geochemical reactions. Results of the implementation of PhreeqcRM as the reaction engine for transport simulators PHAST and FEFLOW are shown by using an analytical solution and the reactive transport benchmark of MoMaS.
Time, Not Sleep, Unbinds Contexts from Item Memory

PubMed Central

Cox, Roy; Tijdens, Ron R.; Meeter, Martijn M.; Sweegers, Carly C. G.; Talamini, Lucia M.

2014-01-01

Contextual cues are known to benefit memory retrieval, but whether and how sleep affects this context effect remains unresolved. We manipulated contextual congruence during memory retrieval in human volunteers across 12 h and 24 h intervals beginning with either sleep or wakefulness. Our data suggest that whereas contextual cues lose their potency with time, sleep does not modulate this process. Furthermore, our results are consistent with the idea that sleep's beneficial effect on memory retention depends on the amount of waking time that has passed between encoding and sleep onset. The findings are discussed in the framework of competitive consolidation theory. PMID:24498441
System for processing an encrypted instruction stream in hardware

DOE Office of Scientific and Technical Information (OSTI.GOV)

Griswold, Richard L.; Nickless, William K.; Conrad, Ryan C.

A system and method of processing an encrypted instruction stream in hardware is disclosed. Main memory stores the encrypted instruction stream and unencrypted data. A central processing unit (CPU) is operatively coupled to the main memory. A decryptor is operatively coupled to the main memory and located within the CPU. The decryptor decrypts the encrypted instruction stream upon receipt of an instruction fetch signal from a CPU core. Unencrypted data is passed through to the CPU core without decryption upon receipt of a data fetch signal.
The Army and the Strategic Military Legacy of Vietnam

DTIC Science & Technology

1990-06-01

anxious to know how long U.S. troops will be there. It is only natural given the pre- vious interventions and the memories of involvement in the...hand" institutional memory , future Army leaders will increasingly have to rely on the Vietnam "lessons literature." Future Army leaders will not have the...or may not even resemble, the Vietnam experience. As the Vietnam War begins to pass out of the military’s personal institutional memory by way of
Commanding and Controlling Satellite Clusters (IEEE Intelligent Systems, November/December 2000)

DTIC Science & Technology

2000-01-01

real - time operating system , a message-passing OS well suited for distributed...ground Flight processors ObjectAgent RTOS SCL RTOS RDMS Space command language Real - time operating system Rational database management system TS-21 RDMS...engineer with Princeton Satellite Systems. She is working with others to develop ObjectAgent software to run on the OSE Real Time Operating System .
VAC: Versatile Advection Code

NASA Astrophysics Data System (ADS)

Tóth, Gábor; Keppens, Rony

2012-07-01

The Versatile Advection Code (VAC) is a freely available general hydrodynamic and magnetohydrodynamic simulation software that works in 1, 2 or 3 dimensions on Cartesian and logically Cartesian grids. VAC runs on any Unix/Linux system with a Fortran 90 (or 77) compiler and Perl interpreter. VAC can run on parallel machines using either the Message Passing Interface (MPI) library or a High Performance Fortran (HPF) compiler.
Sharing Content and Experiences in Smart Environments

NASA Astrophysics Data System (ADS)

Plomp, Johan; Heinilä, Juhani; Ikonen, Veikko; Kaasinen, Eija; Välkkynen, Pasi

Once upon a time… Stories used to be the only way to pass a message. The story teller would take his audience through the events by mere oration. Here and there he would hesitate, whisper, or gesticulate to emphasise his story or induce the right emotions in his audience. No doubt troubadourswere loved, they both brought news of the world as well as entertainment.
Performance of hybrid programming models for multiscale cardiac simulations: preparing for petascale computation.

PubMed

Pope, Bernard J; Fitch, Blake G; Pitman, Michael C; Rice, John J; Reumann, Matthias

2011-10-01

Future multiscale and multiphysics models that support research into human disease, translational medical science, and treatment can utilize the power of high-performance computing (HPC) systems. We anticipate that computationally efficient multiscale models will require the use of sophisticated hybrid programming models, mixing distributed message-passing processes [e.g., the message-passing interface (MPI)] with multithreading (e.g., OpenMP, Pthreads). The objective of this study is to compare the performance of such hybrid programming models when applied to the simulation of a realistic physiological multiscale model of the heart. Our results show that the hybrid models perform favorably when compared to an implementation using only the MPI and, furthermore, that OpenMP in combination with the MPI provides a satisfactory compromise between performance and code complexity. Having the ability to use threads within MPI processes enables the sophisticated use of all processor cores for both computation and communication phases. Considering that HPC systems in 2012 will have two orders of magnitude more cores than what was used in this study, we believe that faster than real-time multiscale cardiac simulations can be achieved on these systems.
Simulated Raman Spectral Analysis of Organic Molecules

NASA Astrophysics Data System (ADS)

Lu, Lu

The advent of the laser technology in the 1960s solved the main difficulty of Raman spectroscopy, resulted in simplified Raman spectroscopy instruments and also boosted the sensitivity of the technique. Up till now, Raman spectroscopy is commonly used in chemistry and biology. As vibrational information is specific to the chemical bonds, Raman spectroscopy provides fingerprints to identify the type of molecules in the sample. In this thesis, we simulate the Raman Spectrum of organic and inorganic materials by General Atomic and Molecular Electronic Structure System (GAMESS) and Gaussian, two computational codes that perform several general chemistry calculations. We run these codes on our CPU-based high-performance cluster (HPC). Through the message passing interface (MPI), a standardized and portable message-passing system which can make the codes run in parallel, we are able to decrease the amount of time for computation and increase the sizes and capacities of systems simulated by the codes. From our simulations, we will set up a database that allows search algorithm to quickly identify N-H and O-H bonds in different materials. Our ultimate goal is to analyze and identify the spectra of organic matter compositions from meteorites and compared these spectra with terrestrial biologically-produced amino acids and residues.
Petascale computation performance of lightweight multiscale cardiac models using hybrid programming models.

PubMed

Pope, Bernard J; Fitch, Blake G; Pitman, Michael C; Rice, John J; Reumann, Matthias

2011-01-01

Future multiscale and multiphysics models must use the power of high performance computing (HPC) systems to enable research into human disease, translational medical science, and treatment. Previously we showed that computationally efficient multiscale models will require the use of sophisticated hybrid programming models, mixing distributed message passing processes (e.g. the message passing interface (MPI)) with multithreading (e.g. OpenMP, POSIX pthreads). The objective of this work is to compare the performance of such hybrid programming models when applied to the simulation of a lightweight multiscale cardiac model. Our results show that the hybrid models do not perform favourably when compared to an implementation using only MPI which is in contrast to our results using complex physiological models. Thus, with regards to lightweight multiscale cardiac models, the user may not need to increase programming complexity by using a hybrid programming approach. However, considering that model complexity will increase as well as the HPC system size in both node count and number of cores per node, it is still foreseeable that we will achieve faster than real time multiscale cardiac simulations on these systems using hybrid programming models.
PyPele Rewritten To Use MPI

NASA Technical Reports Server (NTRS)

Hockney, George; Lee, Seungwon

2008-01-01

A computer program known as PyPele, originally written as a Pythonlanguage extension module of a C++ language program, has been rewritten in pure Python language. The original version of PyPele dispatches and coordinates parallel-processing tasks on cluster computers and provides a conceptual framework for spacecraft-mission- design and -analysis software tools to run in an embarrassingly parallel mode. The original version of PyPele uses SSH (Secure Shell a set of standards and an associated network protocol for establishing a secure channel between a local and a remote computer) to coordinate parallel processing. Instead of SSH, the present Python version of PyPele uses Message Passing Interface (MPI) [an unofficial de-facto standard language-independent application programming interface for message- passing on a parallel computer] while keeping the same user interface. The use of MPI instead of SSH and the preservation of the original PyPele user interface make it possible for parallel application programs written previously for the original version of PyPele to run on MPI-based cluster computers. As a result, engineers using the previously written application programs can take advantage of embarrassing parallelism without need to rewrite those programs.
Detecting and Preventing Sybil Attacks in Wireless Sensor Networks Using Message Authentication and Passing Method.

PubMed

Dhamodharan, Udaya Suriya Raj Kumar; Vayanaperumal, Rajamani

2015-01-01

Wireless sensor networks are highly indispensable for securing network protection. Highly critical attacks of various kinds have been documented in wireless sensor network till now by many researchers. The Sybil attack is a massive destructive attack against the sensor network where numerous genuine identities with forged identities are used for getting an illegal entry into a network. Discerning the Sybil attack, sinkhole, and wormhole attack while multicasting is a tremendous job in wireless sensor network. Basically a Sybil attack means a node which pretends its identity to other nodes. Communication to an illegal node results in data loss and becomes dangerous in the network. The existing method Random Password Comparison has only a scheme which just verifies the node identities by analyzing the neighbors. A survey was done on a Sybil attack with the objective of resolving this problem. The survey has proposed a combined CAM-PVM (compare and match-position verification method) with MAP (message authentication and passing) for detecting, eliminating, and eventually preventing the entry of Sybil nodes in the network. We propose a scheme of assuring security for wireless sensor network, to deal with attacks of these kinds in unicasting and multicasting.
Detecting and Preventing Sybil Attacks in Wireless Sensor Networks Using Message Authentication and Passing Method

PubMed Central

Dhamodharan, Udaya Suriya Raj Kumar; Vayanaperumal, Rajamani

2015-01-01

Wireless sensor networks are highly indispensable for securing network protection. Highly critical attacks of various kinds have been documented in wireless sensor network till now by many researchers. The Sybil attack is a massive destructive attack against the sensor network where numerous genuine identities with forged identities are used for getting an illegal entry into a network. Discerning the Sybil attack, sinkhole, and wormhole attack while multicasting is a tremendous job in wireless sensor network. Basically a Sybil attack means a node which pretends its identity to other nodes. Communication to an illegal node results in data loss and becomes dangerous in the network. The existing method Random Password Comparison has only a scheme which just verifies the node identities by analyzing the neighbors. A survey was done on a Sybil attack with the objective of resolving this problem. The survey has proposed a combined CAM-PVM (compare and match-position verification method) with MAP (message authentication and passing) for detecting, eliminating, and eventually preventing the entry of Sybil nodes in the network. We propose a scheme of assuring security for wireless sensor network, to deal with attacks of these kinds in unicasting and multicasting. PMID:26236773
An in fiber experimental approach to photonic quantum digital signatures that does not require quantum memory

NASA Astrophysics Data System (ADS)

Collins, Robert J.; Donaldon, Ross J.; Dunjko, Vedran; Wallden, Petros; Clarke, Patrick J.; Andersson, Erika; Jeffers, John; Buller, Gerald S.

2014-10-01

Classical digital signatures are commonly used in e-mail, electronic financial transactions and other forms of electronic communications to ensure that messages have not been tampered with in transit, and that messages are transferrable. The security of commonly used classical digital signature schemes relies on the computational difficulty of inverting certain mathematical functions. However, at present, there are no such one-way functions which have been proven to be hard to invert. With enough computational resources certain implementations of classical public key cryptosystems can be, and have been, broken with current technology. It is nevertheless possible to construct information-theoretically secure signature schemes, including quantum digital signature schemes. Quantum signature schemes can be made information theoretically secure based on the laws of quantum mechanics, while classical comparable protocols require additional resources such as secret communication and a trusted authority. Early demonstrations of quantum digital signatures required quantum memory, rendering them impractical at present. Our present implementation is based on a protocol that does not require quantum memory. It also uses the new technique of unambiguous quantum state elimination, Here we report experimental results for a test-bed system, recorded with a variety of different operating parameters, along with a discussion of aspects of the system security.
Hierarchical parallelisation of functional renormalisation group calculations - hp-fRG

NASA Astrophysics Data System (ADS)

Rohe, Daniel

2016-10-01

The functional renormalisation group (fRG) has evolved into a versatile tool in condensed matter theory for studying important aspects of correlated electron systems. Practical applications of the method often involve a high numerical effort, motivating the question in how far High Performance Computing (HPC) can leverage the approach. In this work we report on a multi-level parallelisation of the underlying computational machinery and show that this can speed up the code by several orders of magnitude. This in turn can extend the applicability of the method to otherwise inaccessible cases. We exploit three levels of parallelisation: Distributed computing by means of Message Passing (MPI), shared-memory computing using OpenMP, and vectorisation by means of SIMD units (single-instruction-multiple-data). Results are provided for two distinct High Performance Computing (HPC) platforms, namely the IBM-based BlueGene/Q system JUQUEEN and an Intel Sandy-Bridge-based development cluster. We discuss how certain issues and obstacles were overcome in the course of adapting the code. Most importantly, we conclude that this vast improvement can actually be accomplished by introducing only moderate changes to the code, such that this strategy may serve as a guideline for other researcher to likewise improve the efficiency of their codes.
A Comparison of PETSC Library and HPF Implementations of an Archetypal PDE Computation

NASA Technical Reports Server (NTRS)

Hayder, M. Ehtesham; Keyes, David E.; Mehrotra, Piyush

1997-01-01

Two paradigms for distributed-memory parallel computation that free the application programmer from the details of message passing are compared for an archetypal structured scientific computation a nonlinear, structured-grid partial differential equation boundary value problem using the same algorithm on the same hardware. Both paradigms, parallel libraries represented by Argonne's PETSC, and parallel languages represented by the Portland Group's HPF, are found to be easy to use for this problem class, and both are reasonably effective in exploiting concurrency after a short learning curve. The level of involvement required by the application programmer under either paradigm includes specification of the data partitioning (corresponding to a geometrically simple decomposition of the domain of the PDE). Programming in SPAM style for the PETSC library requires writing the routines that discretize the PDE and its Jacobian, managing subdomain-to-processor mappings (affine global- to-local index mappings), and interfacing to library solver routines. Programming for HPF requires a complete sequential implementation of the same algorithm, introducing concurrency through subdomain blocking (an effort similar to the index mapping), and modest experimentation with rewriting loops to elucidate to the compiler the latent concurrency. Correctness and scalability are cross-validated on up to 32 nodes of an IBM SP2.
Porting marine ecosystem model spin-up using transport matrices to GPUs

NASA Astrophysics Data System (ADS)

Siewertsen, E.; Piwonski, J.; Slawig, T.

2013-01-01

We have ported an implementation of the spin-up for marine ecosystem models based on transport matrices to graphics processing units (GPUs). The original implementation was designed for distributed-memory architectures and uses the Portable, Extensible Toolkit for Scientific Computation (PETSc) library that is based on the Message Passing Interface (MPI) standard. The spin-up computes a steady seasonal cycle of ecosystem tracers with climatological ocean circulation data as forcing. Since the transport is linear with respect to the tracers, the resulting operator is represented by matrices. Each iteration of the spin-up involves two matrix-vector multiplications and the evaluation of the used biogeochemical model. The original code was written in C and Fortran. On the GPU, we use the Compute Unified Device Architecture (CUDA) standard, a customized version of PETSc and a commercial CUDA Fortran compiler. We describe the extensions to PETSc and the modifications of the original C and Fortran codes that had to be done. Here we make use of freely available libraries for the GPU. We analyze the computational effort of the main parts of the spin-up for two exemplar ecosystem models and compare the overall computational time to those necessary on different CPUs. The results show that a consumer GPU can compete with a significant number of cluster CPUs without further code optimization.
Comparison of Origin 2000 and Origin 3000 Using NAS Parallel Benchmarks

NASA Technical Reports Server (NTRS)

Turney, Raymond D.

2001-01-01

This report describes results of benchmark tests on the Origin 3000 system currently being installed at the NASA Ames National Advanced Supercomputing facility. This machine will ultimately contain 1024 R14K processors. The first part of the system, installed in November, 2000 and named mendel, is an Origin 3000 with 128 R12K processors. For comparison purposes, the tests were also run on lomax, an Origin 2000 with R12K processors. The BT, LU, and SP application benchmarks in the NAS Parallel Benchmark Suite and the kernel benchmark FT were chosen to determine system performance and measure the impact of changes on the machine as it evolves. Having been written to measure performance on Computational Fluid Dynamics applications, these benchmarks are assumed appropriate to represent the NAS workload. Since the NAS runs both message passing (MPI) and shared-memory, compiler directive type codes, both MPI and OpenMP versions of the benchmarks were used. The MPI versions used were the latest official release of the NAS Parallel Benchmarks, version 2.3. The OpenMP versiqns used were PBN3b2, a beta version that is in the process of being released. NPB 2.3 and PBN 3b2 are technically different benchmarks, and NPB results are not directly comparable to PBN results.

Some links on this page may take you to non-federal websites. Their policies may differ from this site.