NASA Technical Reports Server (NTRS)
Phillips, Jennifer K.
1995-01-01
Two of the current and most popular implementations of the Message-Passing Standard, Message Passing Interface (MPI), were contrasted: MPICH by Argonne National Laboratory, and LAM by the Ohio Supercomputer Center at Ohio State University. A parallel skyline matrix solver was adapted to be run in a heterogeneous environment using MPI. The Message-Passing Interface Forum was held in May 1994 which lead to a specification of library functions that implement the message-passing model of parallel communication. LAM, which creates it's own environment, is more robust in a highly heterogeneous network. MPICH uses the environment native to the machine architecture. While neither of these free-ware implementations provides the performance of native message-passing or vendor's implementations, MPICH begins to approach that performance on the SP-2. The machines used in this study were: IBM RS6000, 3 Sun4, SGI, and the IBM SP-2. Each machine is unique and a few machines required specific modifications during the installation. When installed correctly, both implementations worked well with only minor problems.
Efficiently passing messages in distributed spiking neural network simulation.
Thibeault, Corey M; Minkovich, Kirill; O'Brien, Michael J; Harris, Frederick C; Srinivasa, Narayan
2013-01-01
Efficiently passing spiking messages in a neural model is an important aspect of high-performance simulation. As the scale of networks has increased so has the size of the computing systems required to simulate them. In addition, the information exchange of these resources has become more of an impediment to performance. In this paper we explore spike message passing using different mechanisms provided by the Message Passing Interface (MPI). A specific implementation, MVAPICH, designed for high-performance clusters with Infiniband hardware is employed. The focus is on providing information about these mechanisms for users of commodity high-performance spiking simulators. In addition, a novel hybrid method for spike exchange was implemented and benchmarked.
Implementing TCP/IP and a socket interface as a server in a message-passing operating system
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hipp, E.; Wiltzius, D.
1990-03-01
The UNICOS 4.3BSD network code and socket transport interface are the basis of an explicit network server for NLTSS, a message passing operating system on the Cray YMP. A BSD socket user library provides access to the network server using an RPC mechanism. The advantages of this server methodology are its modularity and extensibility to migrate to future protocol suites (e.g. OSI) and transport interfaces. In addition, the network server is implemented in an explicit multi-tasking environment to take advantage of the Cray YMP multi-processor platform. 19 refs., 5 figs.
Implementing Multidisciplinary and Multi-Zonal Applications Using MPI
NASA Technical Reports Server (NTRS)
Fineberg, Samuel A.
1995-01-01
Multidisciplinary and multi-zonal applications are an important class of applications in the area of Computational Aerosciences. In these codes, two or more distinct parallel programs or copies of a single program are utilized to model a single problem. To support such applications, it is common to use a programming model where a program is divided into several single program multiple data stream (SPMD) applications, each of which solves the equations for a single physical discipline or grid zone. These SPMD applications are then bound together to form a single multidisciplinary or multi-zonal program in which the constituent parts communicate via point-to-point message passing routines. Unfortunately, simple message passing models, like Intel's NX library, only allow point-to-point and global communication within a single system-defined partition. This makes implementation of these applications quite difficult, if not impossible. In this report it is shown that the new Message Passing Interface (MPI) standard is a viable portable library for implementing the message passing portion of multidisciplinary applications. Further, with the extension of a portable loader, fully portable multidisciplinary application programs can be developed. Finally, the performance of MPI is compared to that of some native message passing libraries. This comparison shows that MPI can be implemented to deliver performance commensurate with native message libraries.
Users manual for the Chameleon parallel programming tools
DOE Office of Scientific and Technical Information (OSTI.GOV)
Gropp, W.; Smith, B.
1993-06-01
Message passing is a common method for writing programs for distributed-memory parallel computers. Unfortunately, the lack of a standard for message passing has hampered the construction of portable and efficient parallel programs. In an attempt to remedy this problem, a number of groups have developed their own message-passing systems, each with its own strengths and weaknesses. Chameleon is a second-generation system of this type. Rather than replacing these existing systems, Chameleon is meant to supplement them by providing a uniform way to access many of these systems. Chameleon`s goals are to (a) be very lightweight (low over-head), (b) be highlymore » portable, and (c) help standardize program startup and the use of emerging message-passing operations such as collective operations on subsets of processors. Chameleon also provides a way to port programs written using PICL or Intel NX message passing to other systems, including collections of workstations. Chameleon is tracking the Message-Passing Interface (MPI) draft standard and will provide both an MPI implementation and an MPI transport layer. Chameleon provides support for heterogeneous computing by using p4 and PVM. Chameleon`s support for homogeneous computing includes the portable libraries p4, PICL, and PVM and vendor-specific implementation for Intel NX, IBM EUI (SP-1), and Thinking Machines CMMD (CM-5). Support for Ncube and PVM 3.x is also under development.« less
A message passing kernel for the hypercluster parallel processing test bed
NASA Technical Reports Server (NTRS)
Blech, Richard A.; Quealy, Angela; Cole, Gary L.
1989-01-01
A Message-Passing Kernel (MPK) for the Hypercluster parallel-processing test bed is described. The Hypercluster is being developed at the NASA Lewis Research Center to support investigations of parallel algorithms and architectures for computational fluid and structural mechanics applications. The Hypercluster resembles the hypercube architecture except that each node consists of multiple processors communicating through shared memory. The MPK efficiently routes information through the Hypercluster, using a message-passing protocol when necessary and faster shared-memory communication whenever possible. The MPK also interfaces all of the processors with the Hypercluster operating system (HYCLOPS), which runs on a Front-End Processor (FEP). This approach distributes many of the I/O tasks to the Hypercluster processors and eliminates the need for a separate I/O support program on the FEP.
Asbestos: Securing Untrusted Software with Interposition
2005-09-01
consistent intelligible interfaces to different types of resource. Message-based operating systems, such as Accent, Amoeba, Chorus, L4 , Spring...control on self-authenticating capabilities, precluding policies that restrict delegation. L4 uses a strict hierarchy of interpositions, useful for...the OS de- sign space amenable to secure application construction. Similar effects might be possible with message-passing microkernels , or unwieldy
MPIRUN: A Portable Loader for Multidisciplinary and Multi-Zonal Applications
NASA Technical Reports Server (NTRS)
Fineberg, Samuel A.; Woodrow, Thomas S. (Technical Monitor)
1994-01-01
Multidisciplinary and multi-zonal applications are an important class of applications in the area of Computational Aerosciences. In these codes, two or more distinct parallel programs or copies of a single program are utilized to model a single problem. To support such applications, it is common to use a programming model where a program is divided into several single program multiple data stream (SPMD) applications, each of which solves the equations for a single physical discipline or grid zone. These SPMD applications are then bound together to form a single multidisciplinary or multi-zonal program in which the constituent parts communicate via point-to-point message passing routines. One method for implementing the message passing portion of these codes is with the new Message Passing Interface (MPI) standard. Unfortunately, this standard only specifies the message passing portion of an application, but does not specify any portable mechanisms for loading an application. MPIRUN was developed to provide a portable means for loading MPI programs, and was specifically targeted at multidisciplinary and multi-zonal applications. Programs using MPIRUN for loading and MPI for message passing are then portable between all machines supported by MPIRUN. MPIRUN is currently implemented for the Intel iPSC/860, TMC CM5, IBM SP-1 and SP-2, Intel Paragon, and workstation clusters. Further, MPIRUN is designed to be simple enough to port easily to any system supporting MPI.
1985-06-01
just pass the message WAIT NOW AFTER Rlock + timeo t -- if time is out write.screen( TIME IS OUT") MAIN PROGRAM* CHAN linki , link2, link3, link4...PAR D.I.Loop.Interface (link4, linkl,) D.I.Loop.Interface ( linki , link2, 2) D.I.Loop.Interface (link2, link3, 3) JD.I Loop.Interface (link3, link4, 4
Efficient Tracing for On-the-Fly Space-Time Displays in a Debugger for Message Passing Programs
NASA Technical Reports Server (NTRS)
Hood, Robert; Matthews, Gregory
2001-01-01
In this work we describe the implementation of a practical mechanism for collecting and displaying trace information in a debugger for message passing programs. We introduce a trace format that is highly compressible while still providing information adequate for debugging purposes. We make the mechanism convenient for users to access by incorporating the trace collection in a set of wrappers for the MPI (message passing interface) communication library. We implement several debugger operations that use the trace display: consistent stoplines, undo, and rollback. They all are implemented using controlled replay, which executes at full speed in target processes until the appropriate position in the computation is reached. They provide convenient mechanisms for getting to places in the execution where the full power of a state-based debugger can be brought to bear on isolating communication errors.
Alvioli, M.; Baum, R.L.
2016-01-01
We describe a parallel implementation of TRIGRS, the Transient Rainfall Infiltration and Grid-Based Regional Slope-Stability Model for the timing and distribution of rainfall-induced shallow landslides. We have parallelized the four time-demanding execution modes of TRIGRS, namely both the saturated and unsaturated model with finite and infinite soil depth options, within the Message Passing Interface framework. In addition to new features of the code, we outline details of the parallel implementation and show the performance gain with respect to the serial code. Results are obtained both on commercial hardware and on a high-performance multi-node machine, showing the different limits of applicability of the new code. We also discuss the implications for the application of the model on large-scale areas and as a tool for real-time landslide hazard monitoring.
An implementation and evaluation of the MPI 3.0 one-sided communication interface
Dinan, James S.; Balaji, Pavan; Buntinas, Darius T.; ...
2016-01-09
The Q1 Message Passing Interface (MPI) 3.0 standard includes a significant revision to MPI’s remote memory access (RMA) interface, which provides support for one-sided communication. MPI-3 RMA is expected to greatly enhance the usability and performance ofMPI RMA.We present the first complete implementation of MPI-3 RMA and document implementation techniques and performance optimization opportunities enabled by the new interface. Our implementation targets messaging-based networks and is publicly available in the latest release of the MPICH MPI implementation. Here using this implementation, we explore the performance impact of new MPI-3 functionality and semantics. Results indicate that the MPI-3 RMA interface providesmore » significant advantages over the MPI-2 interface by enabling increased communication concurrency through relaxed semantics in the interface and additional routines that provide new window types, synchronization modes, and atomic operations.« less
An implementation and evaluation of the MPI 3.0 one-sided communication interface
DOE Office of Scientific and Technical Information (OSTI.GOV)
Dinan, James S.; Balaji, Pavan; Buntinas, Darius T.
The Q1 Message Passing Interface (MPI) 3.0 standard includes a significant revision to MPI’s remote memory access (RMA) interface, which provides support for one-sided communication. MPI-3 RMA is expected to greatly enhance the usability and performance ofMPI RMA.We present the first complete implementation of MPI-3 RMA and document implementation techniques and performance optimization opportunities enabled by the new interface. Our implementation targets messaging-based networks and is publicly available in the latest release of the MPICH MPI implementation. Here using this implementation, we explore the performance impact of new MPI-3 functionality and semantics. Results indicate that the MPI-3 RMA interface providesmore » significant advantages over the MPI-2 interface by enabling increased communication concurrency through relaxed semantics in the interface and additional routines that provide new window types, synchronization modes, and atomic operations.« less
Cooperative Data Sharing: Simple Support for Clusters of SMP Nodes
NASA Technical Reports Server (NTRS)
DiNucci, David C.; Balley, David H. (Technical Monitor)
1997-01-01
Libraries like PVM and MPI send typed messages to allow for heterogeneous cluster computing. Lower-level libraries, such as GAM, provide more efficient access to communication by removing the need to copy messages between the interface and user space in some cases. still lower-level interfaces, such as UNET, get right down to the hardware level to provide maximum performance. However, these are all still interfaces for passing messages from one process to another, and have limited utility in a shared-memory environment, due primarily to the fact that message passing is just another term for copying. This drawback is made more pertinent by today's hybrid architectures (e.g. clusters of SMPs), where it is difficult to know beforehand whether two communicating processes will share memory. As a result, even portable language tools (like HPF compilers) must either map all interprocess communication, into message passing with the accompanying performance degradation in shared memory environments, or they must check each communication at run-time and implement the shared-memory case separately for efficiency. Cooperative Data Sharing (CDS) is a single user-level API which abstracts all communication between processes into the sharing and access coordination of memory regions, in a model which might be described as "distributed shared messages" or "large-grain distributed shared memory". As a result, the user programs to a simple latency-tolerant abstract communication specification which can be mapped efficiently to either a shared-memory or message-passing based run-time system, depending upon the available architecture. Unlike some distributed shared memory interfaces, the user still has complete control over the assignment of data to processors, the forwarding of data to its next likely destination, and the queuing of data until it is needed, so even the relatively high latency present in clusters can be accomodated. CDS does not require special use of an MMU, which can add overhead to some DSM systems, and does not require an SPMD programming model. unlike some message-passing interfaces, CDS allows the user to implement efficient demand-driven applications where processes must "fight" over data, and does not perform copying if processes share memory and do not attempt concurrent writes. CDS also supports heterogeneous computing, dynamic process creation, handlers, and a very simple thread-arbitration mechanism. Additional support for array subsections is currently being considered. The CDS1 API, which forms the kernel of CDS, is built primarily upon only 2 communication primitives, one process initiation primitive, and some data translation (and marshalling) routines, memory allocation routines, and priority control routines. The entire current collection of 28 routines provides enough functionality to implement most (or all) of MPI 1 and 2, which has a much larger interface consisting of hundreds of routines. still, the API is small enough to consider integrating into standard os interfaces for handling inter-process communication in a network-independent way. This approach would also help to solve many of the problems plaguing other higher-level standards such as MPI and PVM which must, in some cases, "play OS" to adequately address progress and process control issues. The CDS2 API, a higher level of interface roughly equivalent in functionality to MPI and to be built entirely upon CDS1, is still being designed. It is intended to add support for the equivalent of communicators, reduction and other collective operations, process topologies, additional support for process creation, and some automatic memory management. CDS2 will not exactly match MPI, because the copy-free semantics of communication from CDS1 will be supported. CDS2 application programs will be free to carefully also use CDS1. CDS1 has been implemented on networks of workstations running unmodified Unix-based operating systems, using UDP/IP and vendor-supplied high- performance locks. Although its inter-node performance is currently unimpressive due to rudimentary implementation technique, it even now outperforms highly-optimized MPI implementation on intra-node communication due to its support for non-copy communication. The similarity of the CDS1 architecture to that of other projects such as UNET and TRAP suggests that the inter-node performance can be increased significantly to surpass MPI or PVM, and it may be possible to migrate some of its functionality to communication controllers.
Li, J; Guo, L-X; Zeng, H; Han, X-B
2009-06-01
A message-passing-interface (MPI)-based parallel finite-difference time-domain (FDTD) algorithm for the electromagnetic scattering from a 1-D randomly rough sea surface is presented. The uniaxial perfectly matched layer (UPML) medium is adopted for truncation of FDTD lattices, in which the finite-difference equations can be used for the total computation domain by properly choosing the uniaxial parameters. This makes the parallel FDTD algorithm easier to implement. The parallel performance with different processors is illustrated for one sea surface realization, and the computation time of the parallel FDTD algorithm is dramatically reduced compared to a single-process implementation. Finally, some numerical results are shown, including the backscattering characteristics of sea surface for different polarization and the bistatic scattering from a sea surface with large incident angle and large wind speed.
NASA Technical Reports Server (NTRS)
Hockney, George; Lee, Seungwon
2008-01-01
A computer program known as PyPele, originally written as a Pythonlanguage extension module of a C++ language program, has been rewritten in pure Python language. The original version of PyPele dispatches and coordinates parallel-processing tasks on cluster computers and provides a conceptual framework for spacecraft-mission- design and -analysis software tools to run in an embarrassingly parallel mode. The original version of PyPele uses SSH (Secure Shell a set of standards and an associated network protocol for establishing a secure channel between a local and a remote computer) to coordinate parallel processing. Instead of SSH, the present Python version of PyPele uses Message Passing Interface (MPI) [an unofficial de-facto standard language-independent application programming interface for message- passing on a parallel computer] while keeping the same user interface. The use of MPI instead of SSH and the preservation of the original PyPele user interface make it possible for parallel application programs written previously for the original version of PyPele to run on MPI-based cluster computers. As a result, engineers using the previously written application programs can take advantage of embarrassing parallelism without need to rewrite those programs.
Dust Dynamics in Protoplanetary Disks: Parallel Computing with PVM
NASA Astrophysics Data System (ADS)
de La Fuente Marcos, Carlos; Barge, Pierre; de La Fuente Marcos, Raúl
2002-03-01
We describe a parallel version of our high-order-accuracy particle-mesh code for the simulation of collisionless protoplanetary disks. We use this code to carry out a massively parallel, two-dimensional, time-dependent, numerical simulation, which includes dust particles, to study the potential role of large-scale, gaseous vortices in protoplanetary disks. This noncollisional problem is easy to parallelize on message-passing multicomputer architectures. We performed the simulations on a cache-coherent nonuniform memory access Origin 2000 machine, using both the parallel virtual machine (PVM) and message-passing interface (MPI) message-passing libraries. Our performance analysis suggests that, for our problem, PVM is about 25% faster than MPI. Using PVM and MPI made it possible to reduce CPU time and increase code performance. This allows for simulations with a large number of particles (N ~ 105-106) in reasonable CPU times. The performances of our implementation of the pa! rallel code on an Origin 2000 supercomputer are presented and discussed. They exhibit very good speedup behavior and low load unbalancing. Our results confirm that giant gaseous vortices can play a dominant role in giant planet formation.
New NAS Parallel Benchmarks Results
NASA Technical Reports Server (NTRS)
Yarrow, Maurice; Saphir, William; VanderWijngaart, Rob; Woo, Alex; Kutler, Paul (Technical Monitor)
1997-01-01
NPB2 (NAS (NASA Advanced Supercomputing) Parallel Benchmarks 2) is an implementation, based on Fortran and the MPI (message passing interface) message passing standard, of the original NAS Parallel Benchmark specifications. NPB2 programs are run with little or no tuning, in contrast to NPB vendor implementations, which are highly optimized for specific architectures. NPB2 results complement, rather than replace, NPB results. Because they have not been optimized by vendors, NPB2 implementations approximate the performance a typical user can expect for a portable parallel program on distributed memory parallel computers. Together these results provide an insightful comparison of the real-world performance of high-performance computers. New NPB2 features: New implementation (CG), new workstation class problem sizes, new serial sample versions, more performance statistics.
WinHPC System Programming | High-Performance Computing | NREL
Programming WinHPC System Programming Learn how to build and run an MPI (message passing interface (mpi.h) and library (msmpi.lib) are. To build from the command line, run... Start > Intel Software Development Tools > Intel C++ Compiler Professional... > C++ Build Environment for applications running
Software architecture for time-constrained machine vision applications
NASA Astrophysics Data System (ADS)
Usamentiaga, Rubén; Molleda, Julio; García, Daniel F.; Bulnes, Francisco G.
2013-01-01
Real-time image and video processing applications require skilled architects, and recent trends in the hardware platform make the design and implementation of these applications increasingly complex. Many frameworks and libraries have been proposed or commercialized to simplify the design and tuning of real-time image processing applications. However, they tend to lack flexibility, because they are normally oriented toward particular types of applications, or they impose specific data processing models such as the pipeline. Other issues include large memory footprints, difficulty for reuse, and inefficient execution on multicore processors. We present a novel software architecture for time-constrained machine vision applications that addresses these issues. The architecture is divided into three layers. The platform abstraction layer provides a high-level application programming interface for the rest of the architecture. The messaging layer provides a message-passing interface based on a dynamic publish/subscribe pattern. A topic-based filtering in which messages are published to topics is used to route the messages from the publishers to the subscribers interested in a particular type of message. The application layer provides a repository for reusable application modules designed for machine vision applications. These modules, which include acquisition, visualization, communication, user interface, and data processing, take advantage of the power of well-known libraries such as OpenCV, Intel IPP, or CUDA. Finally, the proposed architecture is applied to a real machine vision application: a jam detector for steel pickling lines.
Vetter, Jeffrey S.
2005-02-01
The method and system described herein presents a technique for performance analysis that helps users understand the communication behavior of their message passing applications. The method and system described herein may automatically classifies individual communication operations and reveal the cause of communication inefficiencies in the application. This classification allows the developer to quickly focus on the culprits of truly inefficient behavior, rather than manually foraging through massive amounts of performance data. Specifically, the method and system described herein trace the message operations of Message Passing Interface (MPI) applications and then classify each individual communication event using a supervised learning technique: decision tree classification. The decision tree may be trained using microbenchmarks that demonstrate both efficient and inefficient communication. Since the method and system described herein adapt to the target system's configuration through these microbenchmarks, they simultaneously automate the performance analysis process and improve classification accuracy. The method and system described herein may improve the accuracy of performance analysis and dramatically reduce the amount of data that users must encounter.
Automation Framework for Flight Dynamics Products Generation
NASA Technical Reports Server (NTRS)
Wiegand, Robert E.; Esposito, Timothy C.; Watson, John S.; Jun, Linda; Shoan, Wendy; Matusow, Carla
2010-01-01
XFDS provides an easily adaptable automation platform. To date it has been used to support flight dynamics operations. It coordinates the execution of other applications such as Satellite TookKit, FreeFlyer, MATLAB, and Perl code. It provides a mechanism for passing messages among a collection of XFDS processes, and allows sending and receiving of GMSEC messages. A unified and consistent graphical user interface (GUI) is used for the various tools. Its automation configuration is stored in text files, and can be edited either directly or using the GUI.
EMPIRE and pyenda: Two ensemble-based data assimilation systems written in Fortran and Python
NASA Astrophysics Data System (ADS)
Geppert, Gernot; Browne, Phil; van Leeuwen, Peter Jan; Merker, Claire
2017-04-01
We present and compare the features of two ensemble-based data assimilation frameworks, EMPIRE and pyenda. Both frameworks allow to couple models to the assimilation codes using the Message Passing Interface (MPI), leading to extremely efficient and fast coupling between models and the data-assimilation codes. The Fortran-based system EMPIRE (Employing Message Passing Interface for Researching Ensembles) is optimized for parallel, high-performance computing. It currently includes a suite of data assimilation algorithms including variants of the ensemble Kalman and several the particle filters. EMPIRE is targeted at models of all kinds of complexity and has been coupled to several geoscience models, eg. the Lorenz-63 model, a barotropic vorticity model, the general circulation model HadCM3, the ocean model NEMO, and the land-surface model JULES. The Python-based system pyenda (Python Ensemble Data Assimilation) allows Fortran- and Python-based models to be used for data assimilation. Models can be coupled either using MPI or by using a Python interface. Using Python allows quick prototyping and pyenda is aimed at small to medium scale models. pyenda currently includes variants of the ensemble Kalman filter and has been coupled to the Lorenz-63 model, an advection-based precipitation nowcasting scheme, and the dynamic global vegetation model JSBACH.
Final report: Compiled MPI. Cost-Effective Exascale Application Development
DOE Office of Scientific and Technical Information (OSTI.GOV)
Gropp, William Douglas
2015-12-21
This is the final report on Compiled MPI: Cost-Effective Exascale Application Development, and summarizes the results under this project. The project investigated runtime enviroments that improve the performance of MPI (Message-Passing Interface) programs; work at Illinois in the last period of this project looked at optimizing data access optimizations expressed with MPI datatypes.
A flexible software architecture for scalable real-time image and video processing applications
NASA Astrophysics Data System (ADS)
Usamentiaga, Rubén; Molleda, Julio; García, Daniel F.; Bulnes, Francisco G.
2012-06-01
Real-time image and video processing applications require skilled architects, and recent trends in the hardware platform make the design and implementation of these applications increasingly complex. Many frameworks and libraries have been proposed or commercialized to simplify the design and tuning of real-time image processing applications. However, they tend to lack flexibility because they are normally oriented towards particular types of applications, or they impose specific data processing models such as the pipeline. Other issues include large memory footprints, difficulty for reuse and inefficient execution on multicore processors. This paper presents a novel software architecture for real-time image and video processing applications which addresses these issues. The architecture is divided into three layers: the platform abstraction layer, the messaging layer, and the application layer. The platform abstraction layer provides a high level application programming interface for the rest of the architecture. The messaging layer provides a message passing interface based on a dynamic publish/subscribe pattern. A topic-based filtering in which messages are published to topics is used to route the messages from the publishers to the subscribers interested in a particular type of messages. The application layer provides a repository for reusable application modules designed for real-time image and video processing applications. These modules, which include acquisition, visualization, communication, user interface and data processing modules, take advantage of the power of other well-known libraries such as OpenCV, Intel IPP, or CUDA. Finally, we present different prototypes and applications to show the possibilities of the proposed architecture.
Parallelization of a hydrological model using the message passing interface
Wu, Yiping; Li, Tiejian; Sun, Liqun; Chen, Ji
2013-01-01
With the increasing knowledge about the natural processes, hydrological models such as the Soil and Water Assessment Tool (SWAT) are becoming larger and more complex with increasing computation time. Additionally, other procedures such as model calibration, which may require thousands of model iterations, can increase running time and thus further reduce rapid modeling and analysis. Using the widely-applied SWAT as an example, this study demonstrates how to parallelize a serial hydrological model in a Windows® environment using a parallel programing technology—Message Passing Interface (MPI). With a case study, we derived the optimal values for the two parameters (the number of processes and the corresponding percentage of work to be distributed to the master process) of the parallel SWAT (P-SWAT) on an ordinary personal computer and a work station. Our study indicates that model execution time can be reduced by 42%–70% (or a speedup of 1.74–3.36) using multiple processes (two to five) with a proper task-distribution scheme (between the master and slave processes). Although the computation time cost becomes lower with an increasing number of processes (from two to five), this enhancement becomes less due to the accompanied increase in demand for message passing procedures between the master and all slave processes. Our case study demonstrates that the P-SWAT with a five-process run may reach the maximum speedup, and the performance can be quite stable (fairly independent of a project size). Overall, the P-SWAT can help reduce the computation time substantially for an individual model run, manual and automatic calibration procedures, and optimization of best management practices. In particular, the parallelization method we used and the scheme for deriving the optimal parameters in this study can be valuable and easily applied to other hydrological or environmental models.
A distributed infrastructure for publishing VO services: an implementation
NASA Astrophysics Data System (ADS)
Cepparo, Francesco; Scagnetto, Ivan; Molinaro, Marco; Smareglia, Riccardo
2016-07-01
This contribution describes both the design and the implementation details of a new solution for publishing VO services, enlightening its maintainable, distributed, modular and scalable architecture. Indeed, the new publisher is multithreaded and multiprocess. Multiple instances of the modules can run on different machines to ensure high performance and high availability, and this will be true both for the interface modules of the services and the back end data access ones. The system uses message passing to let its components communicate through an AMQP message broker that can itself be distributed to provide better scalability and availability.
NASA Astrophysics Data System (ADS)
Tóth, Gábor; Keppens, Rony
2012-07-01
The Versatile Advection Code (VAC) is a freely available general hydrodynamic and magnetohydrodynamic simulation software that works in 1, 2 or 3 dimensions on Cartesian and logically Cartesian grids. VAC runs on any Unix/Linux system with a Fortran 90 (or 77) compiler and Perl interpreter. VAC can run on parallel machines using either the Message Passing Interface (MPI) library or a High Performance Fortran (HPF) compiler.
NASA Astrophysics Data System (ADS)
Stuart, J. A.
2011-12-01
This paper explores the challenges in implementing a message passing interface usable on systems with data-parallel processors, and more specifically GPUs. As a case study, we design and implement the ``DCGN'' API on NVIDIA GPUs that is similar to MPI and allows full access to the underlying architecture. We introduce the notion of data-parallel thread-groups as a way to map resources to MPI ranks. We use a method that also allows the data-parallel processors to run autonomously from user-written CPU code. In order to facilitate communication, we use a sleep-based polling system to store and retrieve messages. Unlike previous systems, our method provides both performance and flexibility. By running a test suite of applications with different communication requirements, we find that a tolerable amount of overhead is incurred, somewhere between one and five percent depending on the application, and indicate the locations where this overhead accumulates. We conclude that with innovations in chipsets and drivers, this overhead will be mitigated and provide similar performance to typical CPU-based MPI implementations while providing fully-dynamic communication.
Parallelization of KENO-Va Monte Carlo code
NASA Astrophysics Data System (ADS)
Ramón, Javier; Peña, Jorge
1995-07-01
KENO-Va is a code integrated within the SCALE system developed by Oak Ridge that solves the transport equation through the Monte Carlo Method. It is being used at the Consejo de Seguridad Nuclear (CSN) to perform criticality calculations for fuel storage pools and shipping casks. Two parallel versions of the code: one for shared memory machines and other for distributed memory systems using the message-passing interface PVM have been generated. In both versions the neutrons of each generation are tracked in parallel. In order to preserve the reproducibility of the results in both versions, advanced seeds for random numbers were used. The CONVEX C3440 with four processors and shared memory at CSN was used to implement the shared memory version. A FDDI network of 6 HP9000/735 was employed to implement the message-passing version using proprietary PVM. The speedup obtained was 3.6 in both cases.
MPI-IO: A Parallel File I/O Interface for MPI Version 0.3
NASA Technical Reports Server (NTRS)
Corbett, Peter; Feitelson, Dror; Hsu, Yarsun; Prost, Jean-Pierre; Snir, Marc; Fineberg, Sam; Nitzberg, Bill; Traversat, Bernard; Wong, Parkson
1995-01-01
Thanks to MPI [9], writing portable message passing parallel programs is almost a reality. One of the remaining problems is file I/0. Although parallel file systems support similar interfaces, the lack of a standard makes developing a truly portable program impossible. Further, the closest thing to a standard, the UNIX file interface, is ill-suited to parallel computing. Working together, IBM Research and NASA Ames have drafted MPI-I0, a proposal to address the portable parallel I/0 problem. In a nutshell, this proposal is based on the idea that I/0 can be modeled as message passing: writing to a file is like sending a message, and reading from a file is like receiving a message. MPI-IO intends to leverage the relatively wide acceptance of the MPI interface in order to create a similar I/0 interface. The above approach can be materialized in different ways. The current proposal represents the result of extensive discussions (and arguments), but is by no means finished. Many changes can be expected as additional participants join the effort to define an interface for portable I/0. This document is organized as follows. The remainder of this section includes a discussion of some issues that have shaped the style of the interface. Section 2 presents an overview of MPI-IO as it is currently defined. It specifies what the interface currently supports and states what would need to be added to the current proposal to make the interface more complete and robust. The next seven sections contain the interface definition itself. Section 3 presents definitions and conventions. Section 4 contains functions for file control, most notably open. Section 5 includes functions for independent I/O, both blocking and nonblocking. Section 6 includes functions for collective I/O, both blocking and nonblocking. Section 7 presents functions to support system-maintained file pointers, and shared file pointers. Section 8 presents constructors that can be used to define useful filetypes (the role of filetypes is explained in Section 2 below). Section 9 presents how the error handling mechanism of MPI is supported by the MPI-IO interface. All this is followed by a set of appendices, which contain information about issues that have not been totally resolved yet, and about design considerations. The reader can find there the motivation behind some of our design choices. More information on this would definitely be welcome and will be included in a further release of this document. The first appendix contains a description of MPI-I0's 'hints' structure which is used when opening a file. Appendix B is a discussion of various issues in the support for file pointers. Appendix C explains what we mean in talking about atomic access. Appendix D provides detailed examples of filetype constructors, and Appendix E contains a collection of arguments for and against various design decisions.
Intel NX to PVM 3.2 message passing conversion library
NASA Technical Reports Server (NTRS)
Arthur, Trey; Nelson, Michael L.
1993-01-01
NASA Langley Research Center has developed a library that allows Intel NX message passing codes to be executed under the more popular and widely supported Parallel Virtual Machine (PVM) message passing library. PVM was developed at Oak Ridge National Labs and has become the defacto standard for message passing. This library will allow the many programs that were developed on the Intel iPSC/860 or Intel Paragon in a Single Program Multiple Data (SPMD) design to be ported to the numerous architectures that PVM (version 3.2) supports. Also, the library adds global operations capability to PVM. A familiarity with Intel NX and PVM message passing is assumed.
NASA Astrophysics Data System (ADS)
Ha, Jeongmok; Jeong, Hong
2016-07-01
This study investigates the directed acyclic subgraph (DAS) algorithm, which is used to solve discrete labeling problems much more rapidly than other Markov-random-field-based inference methods but at a competitive accuracy. However, the mechanism by which the DAS algorithm simultaneously achieves competitive accuracy and fast execution speed, has not been elucidated by a theoretical derivation. We analyze the DAS algorithm by comparing it with a message passing algorithm. Graphical models, inference methods, and energy-minimization frameworks are compared between DAS and message passing algorithms. Moreover, the performances of DAS and other message passing methods [sum-product belief propagation (BP), max-product BP, and tree-reweighted message passing] are experimentally compared.
Parallel Ray Tracing Using the Message Passing Interface
2007-09-01
software is available for lens design and for general optical systems modeling. It tends to be designed to run on a single processor and can be very...Cameron, Senior Member, IEEE Abstract—Ray-tracing software is available for lens design and for general optical systems modeling. It tends to be designed to...National Aeronautics and Space Administration (NASA), optical ray tracing, parallel computing, parallel pro- cessing, prime numbers, ray tracing
Kuzmak, P. M.; Dayhoff, R. E.
1992-01-01
There is a wide range of requirements for digital hospital imaging systems. Radiology needs very high resolution black and white images. Other diagnostic disciplines need high resolution color imaging capabilities. Images need to be displayed in many locations throughout the hospital. Different imaging systems within a hospital need to cooperate in order to show the whole picture. At the Baltimore VA Medical Center, the DHCP Integrated Imaging System and a commercial Picture Archiving and Communication System (PACS) work in concert to provide a wide-range of departmental and hospital-wide imaging capabilities. An interface between the DHCP and the Siemens-Loral PACS systems enables patient text and image data to be passed between the two systems. The interface uses ACR-NEMA 2.0 Standard messages extended with shadow groups based on draft ACR-NEMA 3.0 prototypes. A Novell file server, accessible to both systems via Ethernet, is used to communicate all the messages. Patient identification information, orders, ADT, procedure status, changes, patient reports, and images are sent between the two systems across the interface. The systems together provide an extensive set of imaging capabilities for both the specialist and the general practitioner. PMID:1482906
Kuzmak, P M; Dayhoff, R E
1992-01-01
There is a wide range of requirements for digital hospital imaging systems. Radiology needs very high resolution black and white images. Other diagnostic disciplines need high resolution color imaging capabilities. Images need to be displayed in many locations throughout the hospital. Different imaging systems within a hospital need to cooperate in order to show the whole picture. At the Baltimore VA Medical Center, the DHCP Integrated Imaging System and a commercial Picture Archiving and Communication System (PACS) work in concert to provide a wide-range of departmental and hospital-wide imaging capabilities. An interface between the DHCP and the Siemens-Loral PACS systems enables patient text and image data to be passed between the two systems. The interface uses ACR-NEMA 2.0 Standard messages extended with shadow groups based on draft ACR-NEMA 3.0 prototypes. A Novell file server, accessible to both systems via Ethernet, is used to communicate all the messages. Patient identification information, orders, ADT, procedure status, changes, patient reports, and images are sent between the two systems across the interface. The systems together provide an extensive set of imaging capabilities for both the specialist and the general practitioner.
Efficient Data Generation and Publication as a Test Tool
NASA Technical Reports Server (NTRS)
Einstein, Craig Jakob
2017-01-01
A tool to facilitate the generation and publication of test data was created to test the individual components of a command and control system designed to launch spacecraft. Specifically, this tool was built to ensure messages are properly passed between system components. The tool can also be used to test whether the appropriate groups have access (read/write privileges) to the correct messages. The messages passed between system components take the form of unique identifiers with associated values. These identifiers are alphanumeric strings that identify the type of message and the additional parameters that are contained within the message. The values that are passed with the message depend on the identifier. The data generation tool allows for the efficient creation and publication of these messages. A configuration file can be used to set the parameters of the tool and also specify which messages to pass.
Parallel deterministic neutronics with AMR in 3D
DOE Office of Scientific and Technical Information (OSTI.GOV)
Clouse, C.; Ferguson, J.; Hendrickson, C.
1997-12-31
AMTRAN, a three dimensional Sn neutronics code with adaptive mesh refinement (AMR) has been parallelized over spatial domains and energy groups and runs on the Meiko CS-2 with MPI message passing. Block refined AMR is used with linear finite element representations for the fluxes, which allows for a straight forward interpretation of fluxes at block interfaces with zoning differences. The load balancing algorithm assumes 8 spatial domains, which minimizes idle time among processors.
Communication Studies of DMP and SMP Machines
NASA Technical Reports Server (NTRS)
Sohn, Andrew; Biswas, Rupak; Chancellor, Marisa K. (Technical Monitor)
1997-01-01
Understanding the interplay between machines and problems is key to obtaining high performance on parallel machines. This paper investigates the interplay between programming paradigms and communication capabilities of parallel machines. In particular, we explicate the communication capabilities of the IBM SP-2 distributed-memory multiprocessor and the SGI PowerCHALLENGEarray symmetric multiprocessor. Two benchmark problems of bitonic sorting and Fast Fourier Transform are selected for experiments. Communication-efficient algorithms are developed to exploit the overlapping capabilities of the machines. Programs are written in Message-Passing Interface for portability and identical codes are used for both machines. Various data sizes and message sizes are used to test the machines' communication capabilities. Experimental results indicate that the communication performance of the multiprocessors are consistent with the size of messages. The SP-2 is sensitive to message size but yields a much higher communication overlapping because of the communication co-processor. The PowerCHALLENGEarray is not highly sensitive to message size and yields a low communication overlapping. Bitonic sorting yields lower performance compared to FFT due to a smaller computation-to-communication ratio.
NASA Technical Reports Server (NTRS)
Jones, Terry; Mark, Richard; Martin, Jeanne; May, John; Pierce, Elsie; Stanberry, Linda
1996-01-01
This paper describes an implementation of the proposed MPI-IO (Message Passing Interface - Input/Output) standard for parallel I/O. Our system uses third-party transfer to move data over an external network between the processors where it is used and the I/O devices where it resides. Data travels directly from source to destination, without the need for shuffling it among processors or funneling it through a central node. Our distributed server model lets multiple compute nodes share the burden of coordinating data transfers. The system is built on the High Performance Storage System (HPSS), and a prototype version runs on a Meiko CS-2 parallel computer.
Advances in Parallel Computing and Databases for Digital Pathology in Cancer Research
2016-11-13
these technologies and how we have used them in the past. We are interested in learning more about the needs of clinical pathologists as we continue to...such as image processing and correlation. Further, High Performance Computing (HPC) paradigms such as the Message Passing Interface (MPI) have been...Defense for Research and Engineering. such as pMatlab [4], or bcMPI [5] can significantly reduce the need for deep knowledge of parallel computing. In
Time-Dependent Simulations of Incompressible Flow in a Turbopump Using Overset Grid Approach
NASA Technical Reports Server (NTRS)
Kiris, Cetin; Kwak, Dochan
2001-01-01
This viewgraph presentation provides information on mathematical modelling of the SSME (space shuttle main engine). The unsteady SSME-rig1 start-up procedure from the pump at rest has been initiated by using 34.3 million grid points. The computational model for the SSME-rig1 has been completed. Moving boundary capability is obtained by using DCF module in OVERFLOW-D. MPI (Message Passing Interface)/OpenMP hybrid parallel code has been benchmarked.
MPF: A portable message passing facility for shared memory multiprocessors
NASA Technical Reports Server (NTRS)
Malony, Allen D.; Reed, Daniel A.; Mcguire, Patrick J.
1987-01-01
The design, implementation, and performance evaluation of a message passing facility (MPF) for shared memory multiprocessors are presented. The MPF is based on a message passing model conceptually similar to conversations. Participants (parallel processors) can enter or leave a conversation at any time. The message passing primitives for this model are implemented as a portable library of C function calls. The MPF is currently operational on a Sequent Balance 21000, and several parallel applications were developed and tested. Several simple benchmark programs are presented to establish interprocess communication performance for common patterns of interprocess communication. Finally, performance figures are presented for two parallel applications, linear systems solution, and iterative solution of partial differential equations.
NASA Technical Reports Server (NTRS)
VanderWijngaart, Rob; Biegel, Bryan A. (Technical Monitor)
2002-01-01
We describe a new problem size, called Class D, for the NAS Parallel Benchmarks (NPB), whose MPI source code implementation is being released as NPB 2.4. A brief rationale is given for how the new class is derived. We also describe the modifications made to the MPI (Message Passing Interface) implementation to allow the new class to be run on systems with 32-bit integers, and with moderate amounts of memory. Finally, we give the verification values for the new problem size.
IMPETUS - Interactive MultiPhysics Environment for Unified Simulations.
Ha, Vi Q; Lykotrafitis, George
2016-12-08
We introduce IMPETUS - Interactive MultiPhysics Environment for Unified Simulations, an object oriented, easy-to-use, high performance, C++ program for three-dimensional simulations of complex physical systems that can benefit a large variety of research areas, especially in cell mechanics. The program implements cross-communication between locally interacting particles and continuum models residing in the same physical space while a network facilitates long-range particle interactions. Message Passing Interface is used for inter-processor communication for all simulations. Copyright © 2016 Elsevier Ltd. All rights reserved.
Specification of Fenix MPI Fault Tolerance library version 1.0.1
DOE Office of Scientific and Technical Information (OSTI.GOV)
Gamble, Marc; Van Der Wijngaart, Rob; Teranishi, Keita
This document provides a specification of Fenix, a software library compatible with the Message Passing Interface (MPI) to support fault recovery without application shutdown. The library consists of two modules. The first, termed process recovery , restores an application to a consistent state after it has suffered a loss of one or more MPI processes (ranks). The second specifies functions the user can invoke to store application data in Fenix managed redundant storage, and to retrieve it from that storage after process recovery.
Evaluating and extending user-level fault tolerance in MPI applications
Laguna, Ignacio; Richards, David F.; Gamblin, Todd; ...
2016-01-11
The user-level failure mitigation (ULFM) interface has been proposed to provide fault-tolerant semantics in the Message Passing Interface (MPI). Previous work presented performance evaluations of ULFM; yet questions related to its programability and applicability, especially to non-trivial, bulk synchronous applications, remain unanswered. In this article, we present our experiences on using ULFM in a case study with a large, highly scalable, bulk synchronous molecular dynamics application to shed light on the advantages and difficulties of this interface to program fault-tolerant MPI applications. We found that, although ULFM is suitable for master–worker applications, it provides few benefits for more common bulkmore » synchronous MPI applications. Furthermore, to address these limitations, we introduce a new, simpler fault-tolerant interface for complex, bulk synchronous MPI programs with better applicability and support than ULFM for application-level recovery mechanisms, such as global rollback.« less
NASA Technical Reports Server (NTRS)
Katz, Daniel
2004-01-01
PVM Wrapper is a software library that makes it possible for code that utilizes the Parallel Virtual Machine (PVM) software library to run using the message-passing interface (MPI) software library, without needing to rewrite the entire code. PVM and MPI are the two most common software libraries used for applications that involve passing of messages among parallel computers. Since about 1996, MPI has been the de facto standard. Codes written when PVM was popular often feature patterns of {"initsend," "pack," "send"} and {"receive," "unpack"} calls. In many cases, these calls are not contiguous and one set of calls may even exist over multiple subroutines. These characteristics make it difficult to obtain equivalent functionality via a single MPI "send" call. Because PVM Wrapper is written to run with MPI- 1.2, some PVM functions are not permitted and must be replaced - a task that requires some programming expertise. The "pvm_spawn" and "pvm_parent" function calls are not replaced, but a programmer can use "mpirun" and knowledge of the ranks of parent and child tasks with supplied macroinstructions to enable execution of codes that use "pvm_spawn" and "pvm_parent."
Edworthy, Judy; Hellier, Elizabeth; Newbold, Lex; Titchener, Kirsteen
2015-05-01
Three experiments explore several factors which influence information transmission when warning messages are passed from person to person. In Experiment 1, messages were passed down chains of participants using five different modes of communication. Written communication channels resulted in more accurate message transmission than verbal. In addition, some elements of the message endured further down the chain than others. Experiment 2 largely replicated these effects and also demonstrated that simple repetition of a message eliminated differences between written and spoken communication. In a final field experiment, chains of participants passed information however they wanted to, with the proviso that half of the chains could not use telephones. Here, the lack of ability to use a telephone did not affect accuracy, but did slow down the speed of transmission from the recipient of the message to the last person in the chain. Implications of the findings for crisis and emergency risk communication are discussed. Copyright © 2015 Elsevier Ltd and The Ergonomics Society. All rights reserved.
Statistics of Epidemics in Networks by Passing Messages
NASA Astrophysics Data System (ADS)
Shrestha, Munik Kumar
Epidemic processes are common out-of-equilibrium phenomena of broad interdisciplinary interest. In this thesis, we show how message-passing approach can be a helpful tool for simulating epidemic models in disordered medium like networks, and in particular for estimating the probability that a given node will become infectious at a particular time. The sort of dynamics we consider are stochastic, where randomness can arise from the stochastic events or from the randomness of network structures. As in belief propagation, variables or messages in message-passing approach are defined on the directed edges of a network. However, unlike belief propagation, where the posterior distributions are updated according to Bayes' rule, in message-passing approach we write differential equations for the messages over time. It takes correlations between neighboring nodes into account while preventing causal signals from backtracking to their immediate source, and thus avoids "echo chamber effects" where a pair of adjacent nodes each amplify the probability that the other is infectious. In our first results, we develop a message-passing approach to threshold models of behavior popular in sociology. These are models, first proposed by Granovetter, where individuals have to hear about a trend or behavior from some number of neighbors before adopting it themselves. In thermodynamic limit of large random networks, we provide an exact analytic scheme while calculating the time dependence of the probabilities and thus learning about the whole dynamics of bootstrap percolation, which is a simple model known in statistical physics for exhibiting discontinuous phase transition. As an application, we apply a similar model to financial networks, studying when bankruptcies spread due to the sudden devaluation of shared assets in overlapping portfolios. We predict that although diversification may be good for individual institutions, it can create dangerous systemic effects, and as a result financial contagion gets worse with too much diversification. We also predict that financial system exhibits "robust yet fragile" behavior, with regions of the parameter space where contagion is rare but catastrophic whenever it occurs. In further results, we develop a message-passing approach to recurrent state epidemics like susceptible-infectious-susceptible and susceptible-infectious-recovered-susceptible where nodes can return to previously inhabited states and multiple waves of infection can pass through the population. Given that message-passing has been applied exclusively to models with one-way state changes like susceptible-infectious and susceptible-infectious-recovered, we develop message-passing for recurrent epidemics based on a new class of differential equations and demonstrate that our approach is simple and efficiently approximates results obtained from Monte Carlo simulation, and that the accuracy of message-passing is often superior to the pair approximation (which also takes second-order correlations into account).
Verification of Faulty Message Passing Systems with Continuous State Space in PVS
NASA Technical Reports Server (NTRS)
Pilotto, Concetta; White, Jerome
2010-01-01
We present a library of Prototype Verification System (PVS) meta-theories that verifies a class of distributed systems in which agent commu nication is through message-passing. The theoretic work, outlined in, consists of iterative schemes for solving systems of linear equations , such as message-passing extensions of the Gauss and Gauss-Seidel me thods. We briefly review that work and discuss the challenges in formally verifying it.
The serial message-passing schedule for LDPC decoding algorithms
NASA Astrophysics Data System (ADS)
Liu, Mingshan; Liu, Shanshan; Zhou, Yuan; Jiang, Xue
2015-12-01
The conventional message-passing schedule for LDPC decoding algorithms is the so-called flooding schedule. It has the disadvantage that the updated messages cannot be used until next iteration, thus reducing the convergence speed . In this case, the Layered Decoding algorithm (LBP) based on serial message-passing schedule is proposed. In this paper the decoding principle of LBP algorithm is briefly introduced, and then proposed its two improved algorithms, the grouped serial decoding algorithm (Grouped LBP) and the semi-serial decoding algorithm .They can improve LBP algorithm's decoding speed while maintaining a good decoding performance.
The Raid distributed database system
NASA Technical Reports Server (NTRS)
Bhargava, Bharat; Riedl, John
1989-01-01
Raid, a robust and adaptable distributed database system for transaction processing (TP), is described. Raid is a message-passing system, with server processes on each site to manage concurrent processing, consistent replicated copies during site failures, and atomic distributed commitment. A high-level layered communications package provides a clean location-independent interface between servers. The latest design of the package delivers messages via shared memory in a configuration with several servers linked into a single process. Raid provides the infrastructure to investigate various methods for supporting reliable distributed TP. Measurements on TP and server CPU time are presented, along with data from experiments on communications software, consistent replicated copy control during site failures, and concurrent distributed checkpointing. A software tool for evaluating the implementation of TP algorithms in an operating-system kernel is proposed.
Collignon, Barbara; Schulz, Roland; Smith, Jeremy C; Baudry, Jerome
2011-04-30
A message passing interface (MPI)-based implementation (Autodock4.lga.MPI) of the grid-based docking program Autodock4 has been developed to allow simultaneous and independent docking of multiple compounds on up to thousands of central processing units (CPUs) using the Lamarkian genetic algorithm. The MPI version reads a single binary file containing precalculated grids that represent the protein-ligand interactions, i.e., van der Waals, electrostatic, and desolvation potentials, and needs only two input parameter files for the entire docking run. In comparison, the serial version of Autodock4 reads ASCII grid files and requires one parameter file per compound. The modifications performed result in significantly reduced input/output activity compared with the serial version. Autodock4.lga.MPI scales up to 8192 CPUs with a maximal overhead of 16.3%, of which two thirds is due to input/output operations and one third originates from MPI operations. The optimal docking strategy, which minimizes docking CPU time without lowering the quality of the database enrichments, comprises the docking of ligands preordered from the most to the least flexible and the assignment of the number of energy evaluations as a function of the number of rotatable bounds. In 24 h, on 8192 high-performance computing CPUs, the present MPI version would allow docking to a rigid protein of about 300K small flexible compounds or 11 million rigid compounds.
Parallel PAB3D: Experiences with a Prototype in MPI
NASA Technical Reports Server (NTRS)
Guerinoni, Fabio; Abdol-Hamid, Khaled S.; Pao, S. Paul
1998-01-01
PAB3D is a three-dimensional Navier Stokes solver that has gained acceptance in the research and industrial communities. It takes as computational domain, a set disjoint blocks covering the physical domain. This is the first report on the implementation of PAB3D using the Message Passing Interface (MPI), a standard for parallel processing. We discuss briefly the characteristics of tile code and define a prototype for testing. The principal data structure used for communication is derived from preprocessing "patching". We describe a simple interface (COMMSYS) for MPI communication, and some general techniques likely to be encountered when working on problems of this nature. Last, we identify levels of improvement from the current version and outline future work.
Applications Performance Under MPL and MPI on NAS IBM SP2
NASA Technical Reports Server (NTRS)
Saini, Subhash; Simon, Horst D.; Lasinski, T. A. (Technical Monitor)
1994-01-01
On July 5, 1994, an IBM Scalable POWER parallel System (IBM SP2) with 64 nodes, was installed at the Numerical Aerodynamic Simulation (NAS) Facility Each node of NAS IBM SP2 is a "wide node" consisting of a RISC 6000/590 workstation module with a clock of 66.5 MHz which can perform four floating point operations per clock with a peak performance of 266 Mflop/s. By the end of 1994, 64 nodes of IBM SP2 will be upgraded to 160 nodes with a peak performance of 42.5 Gflop/s. An overview of the IBM SP2 hardware is presented. The basic understanding of architectural details of RS 6000/590 will help application scientists the porting, optimizing, and tuning of codes from other machines such as the CRAY C90 and the Paragon to the NAS SP2. Optimization techniques such as quad-word loading, effective utilization of two floating point units, and data cache optimization of RS 6000/590 is illustrated, with examples giving performance gains at each optimization step. The conversion of codes using Intel's message passing library NX to codes using native Message Passing Library (MPL) and the Message Passing Interface (NMI) library available on the IBM SP2 is illustrated. In particular, we will present the performance of Fast Fourier Transform (FFT) kernel from NAS Parallel Benchmarks (NPB) under MPL and MPI. We have also optimized some of Fortran BLAS 2 and BLAS 3 routines, e.g., the optimized Fortran DAXPY runs at 175 Mflop/s and optimized Fortran DGEMM runs at 230 Mflop/s per node. The performance of the NPB (Class B) on the IBM SP2 is compared with the CRAY C90, Intel Paragon, TMC CM-5E, and the CRAY T3D.
Efficient Implementation of Multigrid Solvers on Message-Passing Parrallel Systems
NASA Technical Reports Server (NTRS)
Lou, John
1994-01-01
We discuss our implementation strategies for finite difference multigrid partial differential equation (PDE) solvers on message-passing systems. Our target parallel architecture is Intel parallel computers: the Delta and Paragon system.
Application Portable Parallel Library
NASA Technical Reports Server (NTRS)
Cole, Gary L.; Blech, Richard A.; Quealy, Angela; Townsend, Scott
1995-01-01
Application Portable Parallel Library (APPL) computer program is subroutine-based message-passing software library intended to provide consistent interface to variety of multiprocessor computers on market today. Minimizes effort needed to move application program from one computer to another. User develops application program once and then easily moves application program from parallel computer on which created to another parallel computer. ("Parallel computer" also include heterogeneous collection of networked computers). Written in C language with one FORTRAN 77 subroutine for UNIX-based computers and callable from application programs written in C language or FORTRAN 77.
Grebner, Christoph; Becker, Johannes; Weber, Daniel; Bellinger, Daniel; Tafipolski, Maxim; Brückner, Charlotte; Engels, Bernd
2014-09-15
The presented program package, Conformational Analysis and Search Tool (CAST) allows the accurate treatment of large and flexible (macro) molecular systems. For the determination of thermally accessible minima CAST offers the newly developed TabuSearch algorithm, but algorithms such as Monte Carlo (MC), MC with minimization, and molecular dynamics are implemented as well. For the determination of reaction paths, CAST provides the PathOpt, the Nudge Elastic band, and the umbrella sampling approach. Access to free energies is possible through the free energy perturbation approach. Along with a number of standard force fields, a newly developed symmetry-adapted perturbation theory-based force field is included. Semiempirical computations are possible through DFTB+ and MOPAC interfaces. For calculations based on density functional theory, a Message Passing Interface (MPI) interface to the Graphics Processing Unit (GPU)-accelerated TeraChem program is available. The program is available on request. Copyright © 2014 Wiley Periodicals, Inc.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Gorentla Venkata, Manjunath; Graham, Richard L; Ladd, Joshua S
This paper describes the design and implementation of InfiniBand (IB) CORE-Direct based blocking and nonblocking broadcast operations within the Cheetah collective operation framework. It describes a novel approach that fully ofFLoads collective operations and employs only user-supplied buffers. For a 64 rank communicator, the latency of CORE-Direct based hierarchical algorithm is better than production-grade Message Passing Interface (MPI) implementations, 150% better than the default Open MPI algorithm and 115% better than the shared memory optimized MVAPICH implementation for a one kilobyte (KB) message, and for eight mega-bytes (MB) it is 48% and 64% better, respectively. Flat-topology broadcast achieves 99.9% overlapmore » in a polling based communication-computation test, and 95.1% overlap for a wait based test, compared with 92.4% and 17.0%, respectively, for a similar Central Processing Unit (CPU) based implementation.« less
Pope, Bernard J; Fitch, Blake G; Pitman, Michael C; Rice, John J; Reumann, Matthias
2011-10-01
Future multiscale and multiphysics models that support research into human disease, translational medical science, and treatment can utilize the power of high-performance computing (HPC) systems. We anticipate that computationally efficient multiscale models will require the use of sophisticated hybrid programming models, mixing distributed message-passing processes [e.g., the message-passing interface (MPI)] with multithreading (e.g., OpenMP, Pthreads). The objective of this study is to compare the performance of such hybrid programming models when applied to the simulation of a realistic physiological multiscale model of the heart. Our results show that the hybrid models perform favorably when compared to an implementation using only the MPI and, furthermore, that OpenMP in combination with the MPI provides a satisfactory compromise between performance and code complexity. Having the ability to use threads within MPI processes enables the sophisticated use of all processor cores for both computation and communication phases. Considering that HPC systems in 2012 will have two orders of magnitude more cores than what was used in this study, we believe that faster than real-time multiscale cardiac simulations can be achieved on these systems.
Simulated Raman Spectral Analysis of Organic Molecules
NASA Astrophysics Data System (ADS)
Lu, Lu
The advent of the laser technology in the 1960s solved the main difficulty of Raman spectroscopy, resulted in simplified Raman spectroscopy instruments and also boosted the sensitivity of the technique. Up till now, Raman spectroscopy is commonly used in chemistry and biology. As vibrational information is specific to the chemical bonds, Raman spectroscopy provides fingerprints to identify the type of molecules in the sample. In this thesis, we simulate the Raman Spectrum of organic and inorganic materials by General Atomic and Molecular Electronic Structure System (GAMESS) and Gaussian, two computational codes that perform several general chemistry calculations. We run these codes on our CPU-based high-performance cluster (HPC). Through the message passing interface (MPI), a standardized and portable message-passing system which can make the codes run in parallel, we are able to decrease the amount of time for computation and increase the sizes and capacities of systems simulated by the codes. From our simulations, we will set up a database that allows search algorithm to quickly identify N-H and O-H bonds in different materials. Our ultimate goal is to analyze and identify the spectra of organic matter compositions from meteorites and compared these spectra with terrestrial biologically-produced amino acids and residues.
Pope, Bernard J; Fitch, Blake G; Pitman, Michael C; Rice, John J; Reumann, Matthias
2011-01-01
Future multiscale and multiphysics models must use the power of high performance computing (HPC) systems to enable research into human disease, translational medical science, and treatment. Previously we showed that computationally efficient multiscale models will require the use of sophisticated hybrid programming models, mixing distributed message passing processes (e.g. the message passing interface (MPI)) with multithreading (e.g. OpenMP, POSIX pthreads). The objective of this work is to compare the performance of such hybrid programming models when applied to the simulation of a lightweight multiscale cardiac model. Our results show that the hybrid models do not perform favourably when compared to an implementation using only MPI which is in contrast to our results using complex physiological models. Thus, with regards to lightweight multiscale cardiac models, the user may not need to increase programming complexity by using a hybrid programming approach. However, considering that model complexity will increase as well as the HPC system size in both node count and number of cores per node, it is still foreseeable that we will achieve faster than real time multiscale cardiac simulations on these systems using hybrid programming models.
Message passing with parallel queue traversal
Underwood, Keith D [Albuquerque, NM; Brightwell, Ronald B [Albuquerque, NM; Hemmert, K Scott [Albuquerque, NM
2012-05-01
In message passing implementations, associative matching structures are used to permit list entries to be searched in parallel fashion, thereby avoiding the delay of linear list traversal. List management capabilities are provided to support list entry turnover semantics and priority ordering semantics.
Neighbourhood-consensus message passing and its potentials in image processing applications
NASA Astrophysics Data System (ADS)
Ružic, Tijana; Pižurica, Aleksandra; Philips, Wilfried
2011-03-01
In this paper, a novel algorithm for inference in Markov Random Fields (MRFs) is presented. Its goal is to find approximate maximum a posteriori estimates in a simple manner by combining neighbourhood influence of iterated conditional modes (ICM) and message passing of loopy belief propagation (LBP). We call the proposed method neighbourhood-consensus message passing because a single joint message is sent from the specified neighbourhood to the central node. The message, as a function of beliefs, represents the agreement of all nodes within the neighbourhood regarding the labels of the central node. This way we are able to overcome the disadvantages of reference algorithms, ICM and LBP. On one hand, more information is propagated in comparison with ICM, while on the other hand, the huge amount of pairwise interactions is avoided in comparison with LBP by working with neighbourhoods. The idea is related to the previously developed iterated conditional expectations algorithm. Here we revisit it and redefine it in a message passing framework in a more general form. The results on three different benchmarks demonstrate that the proposed technique can perform well both for binary and multi-label MRFs without any limitations on the model definition. Furthermore, it manifests improved performance over related techniques either in terms of quality and/or speed.
Foundations for a healthy future.
Riley, R; Green, J; Willis, S; Soden, E; Rushby, C; Postle, D; Wakeling, S
1998-01-01
Health promotion activities with children and young people are important as they take messages about health seriously and can be influential in spreading messages about healthy living to their friends and families. Child health professionals have an important role to play in passing on messages of positive health to children and young people. Peer education is a useful way of passing on messages about health to young people. This article shares examples of three health promotion projects with children in a community trust, looking at asthma, sex education and testicular examination.
MPI_XSTAR: MPI-based parallelization of XSTAR program
NASA Astrophysics Data System (ADS)
Danehkar, A.
2017-12-01
MPI_XSTAR parallelizes execution of multiple XSTAR runs using Message Passing Interface (MPI). XSTAR (ascl:9910.008), part of the HEASARC's HEAsoft (ascl:1408.004) package, calculates the physical conditions and emission spectra of ionized gases. MPI_XSTAR invokes XSTINITABLE from HEASoft to generate a job list of XSTAR commands for given physical parameters. The job list is used to make directories in ascending order, where each individual XSTAR is spawned on each processor and outputs are saved. HEASoft's XSTAR2TABLE program is invoked upon the contents of each directory in order to produce table model FITS files for spectroscopy analysis tools.
''Towards a High-Performance and Robust Implementation of MPI-IO on Top of GPFS''
DOE Office of Scientific and Technical Information (OSTI.GOV)
Prost, J.P.; Tremann, R.; Blackwore, R.
2000-01-11
MPI-IO/GPFS is a prototype implementation of the I/O chapter of the Message Passing Interface (MPI) 2 standard. It uses the IBM General Parallel File System (GPFS), with prototyped extensions, as the underlying file system. this paper describes the features of this prototype which support its high performance and robustness. The use of hints at the file system level and at the MPI-IO level allows tailoring the use of the file system to the application needs. Error handling in collective operations provides robust error reporting and deadlock prevention in case of returning errors.
Distributed-Memory Computing With the Langley Aerothermodynamic Upwind Relaxation Algorithm (LAURA)
NASA Technical Reports Server (NTRS)
Riley, Christopher J.; Cheatwood, F. McNeil
1997-01-01
The Langley Aerothermodynamic Upwind Relaxation Algorithm (LAURA), a Navier-Stokes solver, has been modified for use in a parallel, distributed-memory environment using the Message-Passing Interface (MPI) standard. A standard domain decomposition strategy is used in which the computational domain is divided into subdomains with each subdomain assigned to a processor. Performance is examined on dedicated parallel machines and a network of desktop workstations. The effect of domain decomposition and frequency of boundary updates on performance and convergence is also examined for several realistic configurations and conditions typical of large-scale computational fluid dynamic analysis.
Dispatching packets on a global combining network of a parallel computer
Almasi, Gheorghe [Ardsley, NY; Archer, Charles J [Rochester, MN
2011-07-19
Methods, apparatus, and products are disclosed for dispatching packets on a global combining network of a parallel computer comprising a plurality of nodes connected for data communications using the network capable of performing collective operations and point to point operations that include: receiving, by an origin system messaging module on an origin node from an origin application messaging module on the origin node, a storage identifier and an operation identifier, the storage identifier specifying storage containing an application message for transmission to a target node, and the operation identifier specifying a message passing operation; packetizing, by the origin system messaging module, the application message into network packets for transmission to the target node, each network packet specifying the operation identifier and an operation type for the message passing operation specified by the operation identifier; and transmitting, by the origin system messaging module, the network packets to the target node.
GMSEC Interface Specification Document 2016 March
NASA Technical Reports Server (NTRS)
Handy, Matthew
2016-01-01
The GMSEC Interface Specification Document contains the standard set of defined messages. Each GMSEC standard message contains a GMSEC Information Bus Header section and a Message Contents section. Each message section identifies required fields, optional fields, data type and recommended use of the fields. Additionally, this document includes the message subjects associated with the standard messages. The system design of the operations center should ensure the components that are selected use both the API and the defined standard messages in order to achieve full interoperability from component to component.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Painter, J.; McCormick, P.; Krogh, M.
This paper presents the ACL (Advanced Computing Lab) Message Passing Library. It is a high throughput, low latency communications library, based on Thinking Machines Corp.`s CMMD, upon which message passing applications can be built. The library has been implemented on the Cray T3D, Thinking Machines CM-5, SGI workstations, and on top of PVM.
Toward Abstracting the Communication Intent in Applications to Improve Portability and Productivity
DOE Office of Scientific and Technical Information (OSTI.GOV)
Mintz, Tiffany M; Hernandez, Oscar R; Kartsaklis, Christos
Programming with communication libraries such as the Message Passing Interface (MPI) obscures the high-level intent of the communication in an application and makes static communication analysis difficult to do. Compilers are unaware of communication libraries specifics, leading to the exclusion of communication patterns from any automated analysis and optimizations. To overcome this, communication patterns can be expressed at higher-levels of abstraction and incrementally added to existing MPI applications. In this paper, we propose the use of directives to clearly express the communication intent of an application in a way that is not specific to a given communication library. Our communicationmore » directives allow programmers to express communication among processes in a portable way, giving hints to the compiler on regions of computations that can be overlapped with communication and relaxing communication constraints on the ordering, completion and synchronization of the communication imposed by specific libraries such as MPI. The directives can then be translated by the compiler into message passing calls that efficiently implement the intended pattern and be targeted to multiple communication libraries. Thus far, we have used the directives to express point-to-point communication patterns in C, C++ and Fortran applications, and have translated them to MPI and SHMEM.« less
NASA Astrophysics Data System (ADS)
Ramiller, Chuck; Taylor, Trey; Rafferty, Tom H.; Cornell, Mark E.; Rafal, Marc; Savage, Richard
2010-07-01
The Hobby-Eberly Telescope (HET) will be undergoing a major upgrade as a precursor to the HET Dark Energy Experiment (HETDEX‡). As part of this upgrade, the Prime Focus Instrument Package (PFIP) will be replaced with a new design that supports the HETDEX requirements along with the existing suite of instruments and anticipated future additions. This paper describes the new PFIP control system hardware plus the physical constraints and other considerations driving its design. Because of its location at the top end of the telescope, the new PFIP is essentially a stand-alone remote automation island containing over a dozen subsystems. Within the PFIP, motion controllers and modular IO systems are interconnected using a local Controller Area Network (CAN) bus and the CANOpen messaging protocol. CCD cameras that are equipped only with USB 2.0 interfaces are connected to a local Ethernet network via small microcontroller boards running embedded Linux. Links to ground-level systems pass through a 100 m cable bundle and use Ethernet over fiber optic cable exclusively; communications are either direct or through Ethernet/CAN gateways that pass CANOpen messages transparently. All of the control system hardware components are commercially available, designed for rugged industrial applications, and rated for extended temperature operation down to -10 °C.
LES of a ducted propeller with rotor and stator in crashback
NASA Astrophysics Data System (ADS)
Jang, Hyunchul; Mahesh, Krishnan
2012-11-01
A sliding interface method is developed for large eddy simulation (LES) of flow past ducted propellers with both rotor and stator. The method is developed for arbitrarily shaped unstructured elements on massively parallel computing platforms. Novel algorithms for searching sliding elements, interpolation at the sliding interface, and data structures for message passing are developed. We perform LES of flow past a ducted propeller with stator blades in the crashback mode of operation, where a marine vessel is quickly decelerated by rotating the propeller in reverse. The unsteady loads predicted by LES are in good agreement with experiments. A highly unsteady vortex ring is observed outside the duct. High pressure fluctuations are observed near the blade tips, which significantly contribute to the side-force. This work is supported by the United States Office of Naval Research.
Performance Analysis of GYRO: A Tool Evaluation
DOE Office of Scientific and Technical Information (OSTI.GOV)
Worley, P.; Roth, P.; Candy, J.
2005-06-26
The performance of the Eulerian gyrokinetic-Maxwell solver code GYRO is analyzed on five high performance computing systems. First, a manual approach is taken, using custom scripts to analyze the output of embedded wall clock timers, floating point operation counts collected using hardware performance counters, and traces of user and communication events collected using the profiling interface to Message Passing Interface (MPI) libraries. Parts of the analysis are then repeated or extended using a number of sophisticated performance analysis tools: IPM, KOJAK, SvPablo, TAU, and the PMaC modeling tool suite. The paper briefly discusses what has been discovered via this manualmore » analysis process, what performance analyses are inconvenient or infeasible to attempt manually, and to what extent the tools show promise in accelerating or significantly extending the manual performance analyses.« less
NASA Technical Reports Server (NTRS)
Juang, Hann-Ming Henry; Tao, Wei-Kuo; Zeng, Xi-Ping; Shie, Chung-Lin; Simpson, Joanne; Lang, Steve
2004-01-01
The capability for massively parallel programming (MPP) using a message passing interface (MPI) has been implemented into a three-dimensional version of the Goddard Cumulus Ensemble (GCE) model. The design for the MPP with MPI uses the concept of maintaining similar code structure between the whole domain as well as the portions after decomposition. Hence the model follows the same integration for single and multiple tasks (CPUs). Also, it provides for minimal changes to the original code, so it is easily modified and/or managed by the model developers and users who have little knowledge of MPP. The entire model domain could be sliced into one- or two-dimensional decomposition with a halo regime, which is overlaid on partial domains. The halo regime requires that no data be fetched across tasks during the computational stage, but it must be updated before the next computational stage through data exchange via MPI. For reproducible purposes, transposing data among tasks is required for spectral transform (Fast Fourier Transform, FFT), which is used in the anelastic version of the model for solving the pressure equation. The performance of the MPI-implemented codes (i.e., the compressible and anelastic versions) was tested on three different computing platforms. The major results are: 1) both versions have speedups of about 99% up to 256 tasks but not for 512 tasks; 2) the anelastic version has better speedup and efficiency because it requires more computations than that of the compressible version; 3) equal or approximately-equal numbers of slices between the x- and y- directions provide the fastest integration due to fewer data exchanges; and 4) one-dimensional slices in the x-direction result in the slowest integration due to the need for more memory relocation for computation.
Zhang, Xiaohua; Wong, Sergio E; Lightstone, Felice C
2013-04-30
A mixed parallel scheme that combines message passing interface (MPI) and multithreading was implemented in the AutoDock Vina molecular docking program. The resulting program, named VinaLC, was tested on the petascale high performance computing (HPC) machines at Lawrence Livermore National Laboratory. To exploit the typical cluster-type supercomputers, thousands of docking calculations were dispatched by the master process to run simultaneously on thousands of slave processes, where each docking calculation takes one slave process on one node, and within the node each docking calculation runs via multithreading on multiple CPU cores and shared memory. Input and output of the program and the data handling within the program were carefully designed to deal with large databases and ultimately achieve HPC on a large number of CPU cores. Parallel performance analysis of the VinaLC program shows that the code scales up to more than 15K CPUs with a very low overhead cost of 3.94%. One million flexible compound docking calculations took only 1.4 h to finish on about 15K CPUs. The docking accuracy of VinaLC has been validated against the DUD data set by the re-docking of X-ray ligands and an enrichment study, 64.4% of the top scoring poses have RMSD values under 2.0 Å. The program has been demonstrated to have good enrichment performance on 70% of the targets in the DUD data set. An analysis of the enrichment factors calculated at various percentages of the screening database indicates VinaLC has very good early recovery of actives. Copyright © 2013 Wiley Periodicals, Inc.
DMA engine for repeating communication patterns
Chen, Dong; Gara, Alan G.; Giampapa, Mark E.; Heidelberger, Philip; Steinmacher-Burow, Burkhard; Vranas, Pavlos
2010-09-21
A parallel computer system is constructed as a network of interconnected compute nodes to operate a global message-passing application for performing communications across the network. Each of the compute nodes includes one or more individual processors with memories which run local instances of the global message-passing application operating at each compute node to carry out local processing operations independent of processing operations carried out at other compute nodes. Each compute node also includes a DMA engine constructed to interact with the application via Injection FIFO Metadata describing multiple Injection FIFOs where each Injection FIFO may containing an arbitrary number of message descriptors in order to process messages with a fixed processing overhead irrespective of the number of message descriptors included in the Injection FIFO.
Development of a distributed control system for TOTEM experiment using ASIO Boost C++ libraries
NASA Astrophysics Data System (ADS)
Cafagna, F.; Mercadante, A.; Minafra, N.; Quinto, M.; Radicioni, E.
2014-06-01
The main goals of the TOTEM Experiment at the LHC are the measurements of the elastic and total p-p cross sections and the studies of the diffractive dissociation processes. Those scientific objectives are achieved by using three tracking detectors symmetrically arranged around the interaction point called IP5. The control system is based on a C++ software that allows the user, by means of a graphical interface, direct access to hardware and handling of devices configuration. A first release of the software was designed as a monolithic block, with all functionalities being merged together. Such approach showed soon its limits, mainly poor reusability and maintainability of the source code, evident not only in phase of bug-fixing, but also when one wants to extend functionalities or apply some other modifications. This led to the decision of a radical redesign of the software, now based on the dialogue (message-passing) among separate building blocks. Thanks to the acquired extensibility, the software gained new features and now is a complete tool by which it is possible not only to configure different devices interfacing with a large subset of buses like I2C and VME, but also to do data acquisition both for calibration and physics runs. Furthermore, the software lets the user set up a series of operations to be executed sequentially to handle complex operations. To achieve maximum flexibility, the program units may be run either as a single process or as separate processes on different PCs which exchange messages over the network, thus allowing remote control of the system. Portability is ensured by the adoption of the ASIO (Asynchronous Input Output) library of Boost, a cross-platform suite of libraries which is candidate to become part of the C++ 11 standard. We present the state of the art of this project and outline the future perspectives. In particular, we describe the system architecture and the message-passing scheme. We also report on the results obtained in a first complete test of the software both as a single process and on two PCs.
Computation of an Underexpanded 3-D Rectangular Jet by the CE/SE Method
NASA Technical Reports Server (NTRS)
Loh, Ching Y.; Himansu, Ananda; Wang, Xiao Y.; Jorgenson, Philip C. E.
2000-01-01
Recently, an unstructured three-dimensional space-time conservation element and solution element (CE/SE) Euler solver was developed. Now it is also developed for parallel computation using METIS for domain decomposition and MPI (message passing interface). The method is employed here to numerically study the near-field of a typical 3-D rectangular under-expanded jet. For the computed case-a jet with Mach number Mj = 1.6. with a very modest grid of 1.7 million tetrahedrons, the flow features such as the shock-cell structures and the axis switching, are in good qualitative agreement with experimental results.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Moryakov, A. V., E-mail: sailor@orc.ru
2016-12-15
An algorithm for solving the linear Cauchy problem for large systems of ordinary differential equations is presented. The algorithm for systems of first-order differential equations is implemented in the EDELWEISS code with the possibility of parallel computations on supercomputers employing the MPI (Message Passing Interface) standard for the data exchange between parallel processes. The solution is represented by a series of orthogonal polynomials on the interval [0, 1]. The algorithm is characterized by simplicity and the possibility to solve nonlinear problems with a correction of the operator in accordance with the solution obtained in the previous iterative process.
NASA Technical Reports Server (NTRS)
Lawson, Gary; Poteat, Michael; Sosonkina, Masha; Baurle, Robert; Hammond, Dana
2016-01-01
In this work, several mini-apps have been created to enhance a real-world application performance, namely the VULCAN code for complex flow analysis developed at the NASA Langley Research Center. These mini-apps explore hybrid parallel programming paradigms with Message Passing Interface (MPI) for distributed memory access and either Shared MPI (SMPI) or OpenMP for shared memory accesses. Performance testing shows that MPI+SMPI yields the best execution performance, while requiring the largest number of code changes. A maximum speedup of 23X was measured for MPI+SMPI, but only 10X was measured for MPI+OpenMP.
Ellingson, Sally R; Dakshanamurthy, Sivanesan; Brown, Milton; Smith, Jeremy C; Baudry, Jerome
2014-04-25
In this paper we give the current state of high-throughput virtual screening. We describe a case study of using a task-parallel MPI (Message Passing Interface) version of Autodock4 [1], [2] to run a virtual high-throughput screen of one-million compounds on the Jaguar Cray XK6 Supercomputer at Oak Ridge National Laboratory. We include a description of scripts developed to increase the efficiency of the predocking file preparation and postdocking analysis. A detailed tutorial, scripts, and source code for this MPI version of Autodock4 are available online at http://www.bio.utk.edu/baudrylab/autodockmpi.htm.
A multiprocessing architecture for real-time monitoring
NASA Technical Reports Server (NTRS)
Schmidt, James L.; Kao, Simon M.; Read, Jackson Y.; Weitzenkamp, Scott M.; Laffey, Thomas J.
1988-01-01
A multitasking architecture for performing real-time monitoring and analysis using knowledge-based problem solving techniques is described. To handle asynchronous inputs and perform in real time, the system consists of three or more distributed processes which run concurrently and communicate via a message passing scheme. The Data Management Process acquires, compresses, and routes the incoming sensor data to other processes. The Inference Process consists of a high performance inference engine that performs a real-time analysis on the state and health of the physical system. The I/O Process receives sensor data from the Data Management Process and status messages and recommendations from the Inference Process, updates its graphical displays in real time, and acts as the interface to the console operator. The distributed architecture has been interfaced to an actual spacecraft (NASA's Hubble Space Telescope) and is able to process the incoming telemetry in real-time (i.e., several hundred data changes per second). The system is being used in two locations for different purposes: (1) in Sunnyville, California at the Space Telescope Test Control Center it is used in the preflight testing of the vehicle; and (2) in Greenbelt, Maryland at NASA/Goddard it is being used on an experimental basis in flight operations for health and safety monitoring.
A Robust and Scalable Software Library for Parallel Adaptive Refinement on Unstructured Meshes
NASA Technical Reports Server (NTRS)
Lou, John Z.; Norton, Charles D.; Cwik, Thomas A.
1999-01-01
The design and implementation of Pyramid, a software library for performing parallel adaptive mesh refinement (PAMR) on unstructured meshes, is described. This software library can be easily used in a variety of unstructured parallel computational applications, including parallel finite element, parallel finite volume, and parallel visualization applications using triangular or tetrahedral meshes. The library contains a suite of well-designed and efficiently implemented modules that perform operations in a typical PAMR process. Among these are mesh quality control during successive parallel adaptive refinement (typically guided by a local-error estimator), parallel load-balancing, and parallel mesh partitioning using the ParMeTiS partitioner. The Pyramid library is implemented in Fortran 90 with an interface to the Message-Passing Interface (MPI) library, supporting code efficiency, modularity, and portability. An EM waveguide filter application, adaptively refined using the Pyramid library, is illustrated.
SpaceWire Driver Software for Special DSPs
NASA Technical Reports Server (NTRS)
Clark, Douglas; Lux, James; Nishimoto, Kouji; Lang, Minh
2003-01-01
A computer program provides a high-level C-language interface to electronics circuitry that controls a SpaceWire interface in a system based on a space qualified version of the ADSP-21020 digital signal processor (DSP). SpaceWire is a spacecraft-oriented standard for packet-switching data-communication networks that comprise nodes connected through bidirectional digital serial links that utilize low-voltage differential signaling (LVDS). The software is tailored to the SMCS-332 application-specific integrated circuit (ASIC) (also available as the TSS901E), which provides three highspeed (150 Mbps) serial point-to-point links compliant with the proposed Institute of Electrical and Electronics Engineers (IEEE) Standard 1355.2 and equivalent European Space Agency (ESA) Standard ECSS-E-50-12. In the specific application of this software, the SpaceWire ASIC was combined with the DSP processor, memory, and control logic in a Multi-Chip Module DSP (MCM-DSP). The software is a collection of low-level driver routines that provide a simple message-passing application programming interface (API) for software running on the DSP. Routines are provided for interrupt-driven access to the two styles of interface provided by the SMCS: (1) the "word at a time" conventional host interface (HOCI); and (2) a higher performance "dual port memory" style interface (COMI).
Message Passing vs. Shared Address Space on a Cluster of SMPs
NASA Technical Reports Server (NTRS)
Shan, Hongzhang; Singh, Jaswinder Pal; Oliker, Leonid; Biswas, Rupak
2000-01-01
The convergence of scalable computer architectures using clusters of PCs (or PC-SMPs) with commodity networking has become an attractive platform for high end scientific computing. Currently, message-passing and shared address space (SAS) are the two leading programming paradigms for these systems. Message-passing has been standardized with MPI, and is the most common and mature programming approach. However message-passing code development can be extremely difficult, especially for irregular structured computations. SAS offers substantial ease of programming, but may suffer from performance limitations due to poor spatial locality, and high protocol overhead. In this paper, we compare the performance of and programming effort, required for six applications under both programming models on a 32 CPU PC-SMP cluster. Our application suite consists of codes that typically do not exhibit high efficiency under shared memory programming. due to their high communication to computation ratios and complex communication patterns. Results indicate that SAS can achieve about half the parallel efficiency of MPI for most of our applications: however, on certain classes of problems SAS performance is competitive with MPI. We also present new algorithms for improving the PC cluster performance of MPI collective operations.
Xu, Lei; Jeavons, Peter
2015-11-01
Leader election in anonymous rings and complete networks is a very practical problem in distributed computing. Previous algorithms for this problem are generally designed for a classical message passing model where complex messages are exchanged. However, the need to send and receive complex messages makes such algorithms less practical for some real applications. We present some simple synchronous algorithms for distributed leader election in anonymous rings and complete networks that are inspired by the development of the neural system of the fruit fly. Our leader election algorithms all assume that only one-bit messages are broadcast by nodes in the network and processors are only able to distinguish between silence and the arrival of one or more messages. These restrictions allow implementations to use a simpler message-passing architecture. Even with these harsh restrictions our algorithms are shown to achieve good time and message complexity both analytically and experimentally.
Distributed parallel messaging for multiprocessor systems
Chen, Dong; Heidelberger, Philip; Salapura, Valentina; Senger, Robert M; Steinmacher-Burrow, Burhard; Sugawara, Yutaka
2013-06-04
A method and apparatus for distributed parallel messaging in a parallel computing system. The apparatus includes, at each node of a multiprocessor network, multiple injection messaging engine units and reception messaging engine units, each implementing a DMA engine and each supporting both multiple packet injection into and multiple reception from a network, in parallel. The reception side of the messaging unit (MU) includes a switch interface enabling writing of data of a packet received from the network to the memory system. The transmission side of the messaging unit, includes switch interface for reading from the memory system when injecting packets into the network.
ERIC Educational Resources Information Center
Schwan, Stephan; Straub, Daniela; Hesse, Friedrich W.
2002-01-01
Describes a study of computer conferencing where learners interacted over the course of four log-in sessions to acquire the knowledge sufficient to pass a learning test. Studied the number of messages irrelevant to the topic, explicit threading of messages, reading times of relevant messages, and learning outcomes. (LRW)
Sutton, Jeannette; Gibson, C. Ben; Spiro, Emma S.; League, Cedar; Fitzhugh, Sean M.; Butts, Carter T.
2015-01-01
Message retransmission is a central aspect of information diffusion. In a disaster context, the passing on of official warning messages by members of the public also serves as a behavioral indicator of message salience, suggesting that particular messages are (or are not) perceived by the public to be both noteworthy and valuable enough to share with others. This study provides the first examination of terse message retransmission of official warning messages in response to a domestic terrorist attack, the Boston Marathon Bombing in 2013. Using messages posted from public officials’ Twitter accounts that were active during the period of the Boston Marathon bombing and manhunt, we examine the features of messages that are associated with their retransmission. We focus on message content, style, and structure, as well as the networked relationships of message senders to answer the question: what characteristics of a terse message sent under conditions of imminent threat predict its retransmission among members of the public? We employ a negative binomial model to examine how message characteristics affect message retransmission. We find that, rather than any single effect dominating the process, retransmission of official Tweets during the Boston bombing response was jointly influenced by various message content, style, and sender characteristics. These findings suggest the need for more work that investigates impact of multiple factors on the allocation of attention and on message retransmission during hazard events. PMID:26295584
Simplifying HL7 Version 3 messages.
Worden, Robert; Scott, Philip
2011-01-01
HL7 Version 3 offers a semantically robust method for healthcare interoperability but has been criticized as overly complex to implement. This paper reviews initiatives to simplify HL7 Version 3 messaging and presents a novel approach based on semantic mapping. Based on user-defined definitions, precise transforms between simple and full messages are automatically generated. Systems can be interfaced with the simple messages and achieve interoperability with full Version 3 messages through the transforms. This reduces the costs of HL7 interfacing and will encourage better uptake of HL7 Version 3 and CDA.
The Edge-Disjoint Path Problem on Random Graphs by Message-Passing.
Altarelli, Fabrizio; Braunstein, Alfredo; Dall'Asta, Luca; De Bacco, Caterina; Franz, Silvio
2015-01-01
We present a message-passing algorithm to solve a series of edge-disjoint path problems on graphs based on the zero-temperature cavity equations. Edge-disjoint paths problems are important in the general context of routing, that can be defined by incorporating under a unique framework both traffic optimization and total path length minimization. The computation of the cavity equations can be performed efficiently by exploiting a mapping of a generalized edge-disjoint path problem on a star graph onto a weighted maximum matching problem. We perform extensive numerical simulations on random graphs of various types to test the performance both in terms of path length minimization and maximization of the number of accommodated paths. In addition, we test the performance on benchmark instances on various graphs by comparison with state-of-the-art algorithms and results found in the literature. Our message-passing algorithm always outperforms the others in terms of the number of accommodated paths when considering non trivial instances (otherwise it gives the same trivial results). Remarkably, the largest improvement in performance with respect to the other methods employed is found in the case of benchmarks with meshes, where the validity hypothesis behind message-passing is expected to worsen. In these cases, even though the exact message-passing equations do not converge, by introducing a reinforcement parameter to force convergence towards a sub optimal solution, we were able to always outperform the other algorithms with a peak of 27% performance improvement in terms of accommodated paths. On random graphs, we numerically observe two separated regimes: one in which all paths can be accommodated and one in which this is not possible. We also investigate the behavior of both the number of paths to be accommodated and their minimum total length.
The Edge-Disjoint Path Problem on Random Graphs by Message-Passing
2015-01-01
We present a message-passing algorithm to solve a series of edge-disjoint path problems on graphs based on the zero-temperature cavity equations. Edge-disjoint paths problems are important in the general context of routing, that can be defined by incorporating under a unique framework both traffic optimization and total path length minimization. The computation of the cavity equations can be performed efficiently by exploiting a mapping of a generalized edge-disjoint path problem on a star graph onto a weighted maximum matching problem. We perform extensive numerical simulations on random graphs of various types to test the performance both in terms of path length minimization and maximization of the number of accommodated paths. In addition, we test the performance on benchmark instances on various graphs by comparison with state-of-the-art algorithms and results found in the literature. Our message-passing algorithm always outperforms the others in terms of the number of accommodated paths when considering non trivial instances (otherwise it gives the same trivial results). Remarkably, the largest improvement in performance with respect to the other methods employed is found in the case of benchmarks with meshes, where the validity hypothesis behind message-passing is expected to worsen. In these cases, even though the exact message-passing equations do not converge, by introducing a reinforcement parameter to force convergence towards a sub optimal solution, we were able to always outperform the other algorithms with a peak of 27% performance improvement in terms of accommodated paths. On random graphs, we numerically observe two separated regimes: one in which all paths can be accommodated and one in which this is not possible. We also investigate the behavior of both the number of paths to be accommodated and their minimum total length. PMID:26710102
Development of mpi_EPIC model for global agroecosystem modeling
Kang, Shujiang; Wang, Dali; Jeff A. Nichols; ...
2014-12-31
Models that address policy-maker concerns about multi-scale effects of food and bioenergy production systems are computationally demanding. We integrated the message passing interface algorithm into the process-based EPIC model to accelerate computation of ecosystem effects. Simulation performance was further enhanced by applying the Vampir framework. When this enhanced mpi_EPIC model was tested, total execution time for a global 30-year simulation of a switchgrass cropping system was shortened to less than 0.5 hours on a supercomputer. The results illustrate that mpi_EPIC using parallel design can balance simulation workloads and facilitate large-scale, high-resolution analysis of agricultural production systems, management alternatives and environmentalmore » effects.« less
Eigensolver for a Sparse, Large Hermitian Matrix
NASA Technical Reports Server (NTRS)
Tisdale, E. Robert; Oyafuso, Fabiano; Klimeck, Gerhard; Brown, R. Chris
2003-01-01
A parallel-processing computer program finds a few eigenvalues in a sparse Hermitian matrix that contains as many as 100 million diagonal elements. This program finds the eigenvalues faster, using less memory, than do other, comparable eigensolver programs. This program implements a Lanczos algorithm in the American National Standards Institute/ International Organization for Standardization (ANSI/ISO) C computing language, using the Message Passing Interface (MPI) standard to complement an eigensolver in PARPACK. [PARPACK (Parallel Arnoldi Package) is an extension, to parallel-processing computer architectures, of ARPACK (Arnoldi Package), which is a collection of Fortran 77 subroutines that solve large-scale eigenvalue problems.] The eigensolver runs on Beowulf clusters of computers at the Jet Propulsion Laboratory (JPL).
Low Latency Messages on Distributed Memory Multiprocessors
Rosing, Matt; Saltz, Joel
1995-01-01
This article describes many of the issues in developing an efficient interface for communication on distributed memory machines. Although the hardware component of message latency is less than 1 ws on many distributed memory machines, the software latency associated with sending and receiving typed messages is on the order of 50 μs. The reason for this imbalance is that the software interface does not match the hardware. By changing the interface to match the hardware more closely, applications with fine grained communication can be put on these machines. This article describes several tests performed and many of the issues involvedmore » in supporting low latency messages on distributed memory machines.« less
Message passing with a limited number of DMA byte counters
Blocksome, Michael [Rochester, MN; Chen, Dong [Croton on Hudson, NY; Giampapa, Mark E [Irvington, NY; Heidelberger, Philip [Cortlandt Manor, NY; Kumar, Sameer [White Plains, NY; Parker, Jeffrey J [Rochester, MN
2011-10-04
A method for passing messages in a parallel computer system constructed as a plurality of compute nodes interconnected as a network where each compute node includes a DMA engine but includes only a limited number of byte counters for tracking a number of bytes that are sent or received by the DMA engine, where the byte counters may be used in shared counter or exclusive counter modes of operation. The method includes using rendezvous protocol, a source compute node deterministically sending a request to send (RTS) message with a single RTS descriptor using an exclusive injection counter to track both the RTS message and message data to be sent in association with the RTS message, to a destination compute node such that the RTS descriptor indicates to the destination compute node that the message data will be adaptively routed to the destination node. Using one DMA FIFO at the source compute node, the RTS descriptors are maintained for rendezvous messages destined for the destination compute node to ensure proper message data ordering thereat. Using a reception counter at a DMA engine, the destination compute node tracks reception of the RTS and associated message data and sends a clear to send (CTS) message to the source node in a rendezvous protocol form of a remote get to accept the RTS message and message data and processing the remote get (CTS) by the source compute node DMA engine to provide the message data to be sent.
NASA Technical Reports Server (NTRS)
Saini, Subash; Bailey, David; Chancellor, Marisa K. (Technical Monitor)
1997-01-01
High Performance Fortran (HPF), the high-level language for parallel Fortran programming, is based on Fortran 90. HALF was defined by an informal standards committee known as the High Performance Fortran Forum (HPFF) in 1993, and modeled on TMC's CM Fortran language. Several HPF features have since been incorporated into the draft ANSI/ISO Fortran 95, the next formal revision of the Fortran standard. HPF allows users to write a single parallel program that can execute on a serial machine, a shared-memory parallel machine, or a distributed-memory parallel machine. HPF eliminates the complex, error-prone task of explicitly specifying how, where, and when to pass messages between processors on distributed-memory machines, or when to synchronize processors on shared-memory machines. HPF is designed in a way that allows the programmer to code an application at a high level, and then selectively optimize portions of the code by dropping into message-passing or calling tuned library routines as 'extrinsics'. Compilers supporting High Performance Fortran features first appeared in late 1994 and early 1995 from Applied Parallel Research (APR) Digital Equipment Corporation, and The Portland Group (PGI). IBM introduced an HPF compiler for the IBM RS/6000 SP/2 in April of 1996. Over the past two years, these implementations have shown steady improvement in terms of both features and performance. The performance of various hardware/ programming model (HPF and MPI (message passing interface)) combinations will be compared, based on latest NAS (NASA Advanced Supercomputing) Parallel Benchmark (NPB) results, thus providing a cross-machine and cross-model comparison. Specifically, HPF based NPB results will be compared with MPI based NPB results to provide perspective on performance currently obtainable using HPF versus MPI or versus hand-tuned implementations such as those supplied by the hardware vendors. In addition we would also present NPB (Version 1.0) performance results for the following systems: DEC Alpha Server 8400 5/440, Fujitsu VPP Series (VX, VPP300, and VPP700), HP/Convex Exemplar SPP2000, IBM RS/6000 SP P2SC node (120 MHz) NEC SX-4/32, SGI/CRAY T3E, SGI Origin2000.
Public-channel cryptography based on mutual chaos pass filters.
Klein, Einat; Gross, Noam; Kopelowitz, Evi; Rosenbluh, Michael; Khaykovich, Lev; Kinzel, Wolfgang; Kanter, Ido
2006-10-01
We study the mutual coupling of chaotic lasers and observe both experimentally and in numeric simulations that there exists a regime of parameters for which two mutually coupled chaotic lasers establish isochronal synchronization, while a third laser coupled unidirectionally to one of the pair does not synchronize. We then propose a cryptographic scheme, based on the advantage of mutual coupling over unidirectional coupling, where all the parameters of the system are public knowledge. We numerically demonstrate that in such a scheme the two communicating lasers can add a message signal (compressed binary message) to the transmitted coupling signal and recover the message in both directions with high fidelity by using a mutual chaos pass filter procedure. An attacker, however, fails to recover an errorless message even if he amplifies the coupling signal.
An implementation and performance measurement of the progressive retry technique
NASA Technical Reports Server (NTRS)
Suri, Gaurav; Huang, Yennun; Wang, Yi-Min; Fuchs, W. Kent; Kintala, Chandra
1995-01-01
This paper describes a recovery technique called progressive retry for bypassing software faults in message-passing applications. The technique is implemented as reusable modules to provide application-level software fault tolerance. The paper describes the implementation of the technique and presents results from the application of progressive retry to two telecommunications systems. the results presented show that the technique is helpful in reducing the total recovery time for message-passing applications.
Belief propagation decoding of quantum channels by passing quantum messages
NASA Astrophysics Data System (ADS)
Renes, Joseph M.
2017-07-01
The belief propagation (BP) algorithm is a powerful tool in a wide range of disciplines from statistical physics to machine learning to computational biology, and is ubiquitous in decoding classical error-correcting codes. The algorithm works by passing messages between nodes of the factor graph associated with the code and enables efficient decoding of the channel, in some cases even up to the Shannon capacity. Here we construct the first BP algorithm which passes quantum messages on the factor graph and is capable of decoding the classical-quantum channel with pure state outputs. This gives explicit decoding circuits whose number of gates is quadratic in the code length. We also show that this decoder can be modified to work with polar codes for the pure state channel and as part of a decoder for transmitting quantum information over the amplitude damping channel. These represent the first explicit capacity-achieving decoders for non-Pauli channels.
LCAMP: Location Constrained Approximate Message Passing for Compressed Sensing MRI
Sung, Kyunghyun; Daniel, Bruce L; Hargreaves, Brian A
2016-01-01
Iterative thresholding methods have been extensively studied as faster alternatives to convex optimization methods for solving large-sized problems in compressed sensing. A novel iterative thresholding method called LCAMP (Location Constrained Approximate Message Passing) is presented for reducing computational complexity and improving reconstruction accuracy when a nonzero location (or sparse support) constraint can be obtained from view shared images. LCAMP modifies the existing approximate message passing algorithm by replacing the thresholding stage with a location constraint, which avoids adjusting regularization parameters or thresholding levels. This work is first compared with other conventional reconstruction methods using random 1D signals and then applied to dynamic contrast-enhanced breast MRI to demonstrate the excellent reconstruction accuracy (less than 2% absolute difference) and low computation time (5 - 10 seconds using Matlab) with highly undersampled 3D data (244 × 128 × 48; overall reduction factor = 10). PMID:23042658
A new deadlock resolution protocol and message matching algorithm for the extreme-scale simulator
Engelmann, Christian; Naughton, III, Thomas J.
2016-03-22
Investigating the performance of parallel applications at scale on future high-performance computing (HPC) architectures and the performance impact of different HPC architecture choices is an important component of HPC hardware/software co-design. The Extreme-scale Simulator (xSim) is a simulation toolkit for investigating the performance of parallel applications at scale. xSim scales to millions of simulated Message Passing Interface (MPI) processes. The overhead introduced by a simulation tool is an important performance and productivity aspect. This paper documents two improvements to xSim: (1)~a new deadlock resolution protocol to reduce the parallel discrete event simulation overhead and (2)~a new simulated MPI message matchingmore » algorithm to reduce the oversubscription management overhead. The results clearly show a significant performance improvement. The simulation overhead for running the NAS Parallel Benchmark suite was reduced from 102% to 0% for the embarrassingly parallel (EP) benchmark and from 1,020% to 238% for the conjugate gradient (CG) benchmark. xSim offers a highly accurate simulation mode for better tracking of injected MPI process failures. Furthermore, with highly accurate simulation, the overhead was reduced from 3,332% to 204% for EP and from 37,511% to 13,808% for CG.« less
Data communications in a parallel active messaging interface of a parallel computer
Davis, Kristan D.; Faraj, Daniel A.
2014-07-22
Algorithm selection for data communications in a parallel active messaging interface (`PAMI`) of a parallel computer, the PAMI composed of data communications endpoints, each endpoint including specifications of a client, a context, and a task, endpoints coupled for data communications through the PAMI, including associating in the PAMI data communications algorithms and ranges of message sizes so that each algorithm is associated with a separate range of message sizes; receiving in an origin endpoint of the PAMI a data communications instruction, the instruction specifying transmission of a data communications message from the origin endpoint to a target endpoint, the data communications message characterized by a message size; selecting, from among the associated algorithms and ranges, a data communications algorithm in dependence upon the message size; and transmitting, according to the selected data communications algorithm from the origin endpoint to the target endpoint, the data communications message.
Data communications in a parallel active messaging interface of a parallel computer
Davis, Kristan D; Faraj, Daniel A
2013-07-09
Algorithm selection for data communications in a parallel active messaging interface (`PAMI`) of a parallel computer, the PAMI composed of data communications endpoints, each endpoint including specifications of a client, a context, and a task, endpoints coupled for data communications through the PAMI, including associating in the PAMI data communications algorithms and ranges of message sizes so that each algorithm is associated with a separate range of message sizes; receiving in an origin endpoint of the PAMI a data communications instruction, the instruction specifying transmission of a data communications message from the origin endpoint to a target endpoint, the data communications message characterized by a message size; selecting, from among the associated algorithms and ranges, a data communications algorithm in dependence upon the message size; and transmitting, according to the selected data communications algorithm from the origin endpoint to the target endpoint, the data communications message.
Design of a network for concurrent message passing systems
NASA Astrophysics Data System (ADS)
Song, Paul Y.
1988-08-01
We describe the design of the network design frame (NDF), a self-timed routing chip for a message-passing concurrent computer. The NDF uses a partitioned data path, low-voltage output drivers, and a distributed token-passing arbiter to provide a bandwidth of 450 Mbits/sec into the network. Wormhole routing and bidirectional virtual channels are used to provide low latency communications, less than 2us latency to deliver a 216 bit message across the diameter of a 1K node mess-connected machine. To support concurrent software systems, the NDF provides two logical networks, one for user messages and one for system messages. The two networks share the same set of physical wires. To facilitate the development of network nodes, the NDF is a design frame. The NDF circuitry is integrated into the pad frame of a chip leaving the center of the chip uncommitted. We define an analytic framework in which to study the effects of network size, network buffering capacity, bidirectional channels, and traffic on this class of networks. The response of the network to various combinations of these parameters are obtained through extensive simulation of the network model. Through simulation, we are able to observe the macro behavior of the network as opposed to the micro behavior of the NDF routing controller.
Low latency messages on distributed memory multiprocessors
NASA Technical Reports Server (NTRS)
Rosing, Matthew; Saltz, Joel
1993-01-01
Many of the issues in developing an efficient interface for communication on distributed memory machines are described and a portable interface is proposed. Although the hardware component of message latency is less than one microsecond on many distributed memory machines, the software latency associated with sending and receiving typed messages is on the order of 50 microseconds. The reason for this imbalance is that the software interface does not match the hardware. By changing the interface to match the hardware more closely, applications with fine grained communication can be put on these machines. Based on several tests that were run on the iPSC/860, an interface that will better match current distributed memory machines is proposed. The model used in the proposed interface consists of a computation processor and a communication processor on each node. Communication between these processors and other nodes in the system is done through a buffered network. Information that is transmitted is either data or procedures to be executed on the remote processor. The dual processor system is better suited for efficiently handling asynchronous communications compared to a single processor system. The ability to send data or procedure is very flexible for minimizing message latency, based on the type of communication being performed. The test performed and the proposed interface are described.
Performance Evaluation of Communication Software Systems for Distributed Computing
NASA Technical Reports Server (NTRS)
Fatoohi, Rod
1996-01-01
In recent years there has been an increasing interest in object-oriented distributed computing since it is better quipped to deal with complex systems while providing extensibility, maintainability, and reusability. At the same time, several new high-speed network technologies have emerged for local and wide area networks. However, the performance of networking software is not improving as fast as the networking hardware and the workstation microprocessors. This paper gives an overview and evaluates the performance of the Common Object Request Broker Architecture (CORBA) standard in a distributed computing environment at NASA Ames Research Center. The environment consists of two testbeds of SGI workstations connected by four networks: Ethernet, FDDI, HiPPI, and ATM. The performance results for three communication software systems are presented, analyzed and compared. These systems are: BSD socket programming interface, IONA's Orbix, an implementation of the CORBA specification, and the PVM message passing library. The results show that high-level communication interfaces, such as CORBA and PVM, can achieve reasonable performance under certain conditions.
Load Balancing Strategies for Multiphase Flows on Structured Grids
NASA Astrophysics Data System (ADS)
Olshefski, Kristopher; Owkes, Mark
2017-11-01
The computation time required to perform large simulations of complex systems is currently one of the leading bottlenecks of computational research. Parallelization allows multiple processing cores to perform calculations simultaneously and reduces computational times. However, load imbalances between processors waste computing resources as processors wait for others to complete imbalanced tasks. In multiphase flows, these imbalances arise due to the additional computational effort required at the gas-liquid interface. However, many current load balancing schemes are only designed for unstructured grid applications. The purpose of this research is to develop a load balancing strategy while maintaining the simplicity of a structured grid. Several approaches are investigated including brute force oversubscription, node oversubscription through Message Passing Interface (MPI) commands, and shared memory load balancing using OpenMP. Each of these strategies are tested with a simple one-dimensional model prior to implementation into the three-dimensional NGA code. Current results show load balancing will reduce computational time by at least 30%.
HYDRA : High-speed simulation architecture for precision spacecraft formation simulation
NASA Technical Reports Server (NTRS)
Martin, Bryan J.; Sohl, Garett.
2003-01-01
e Hierarchical Distributed Reconfigurable Architecture- is a scalable simulation architecture that provides flexibility and ease-of-use which take advantage of modern computation and communication hardware. It also provides the ability to implement distributed - or workstation - based simulations and high-fidelity real-time simulation from a common core. Originally designed to serve as a research platform for examining fundamental challenges in formation flying simulation for future space missions, it is also finding use in other missions and applications, all of which can take advantage of the underlying Object-Oriented structure to easily produce distributed simulations. Hydra automates the process of connecting disparate simulation components (Hydra Clients) through a client server architecture that uses high-level descriptions of data associated with each client to find and forge desirable connections (Hydra Services) at run time. Services communicate through the use of Connectors, which abstract messaging to provide single-interface access to any desired communication protocol, such as from shared-memory message passing to TCP/IP to ACE and COBRA. Hydra shares many features with the HLA, although providing more flexibility in connectivity services and behavior overriding.
[Series: Medical Applications of the PHITS Code (2): Acceleration by Parallel Computing].
Furuta, Takuya; Sato, Tatsuhiko
2015-01-01
Time-consuming Monte Carlo dose calculation becomes feasible owing to the development of computer technology. However, the recent development is due to emergence of the multi-core high performance computers. Therefore, parallel computing becomes a key to achieve good performance of software programs. A Monte Carlo simulation code PHITS contains two parallel computing functions, the distributed-memory parallelization using protocols of message passing interface (MPI) and the shared-memory parallelization using open multi-processing (OpenMP) directives. Users can choose the two functions according to their needs. This paper gives the explanation of the two functions with their advantages and disadvantages. Some test applications are also provided to show their performance using a typical multi-core high performance workstation.
A Prize-Collecting Steiner Tree Approach for Transduction Network Inference
NASA Astrophysics Data System (ADS)
Bailly-Bechet, Marc; Braunstein, Alfredo; Zecchina, Riccardo
Into the cell, information from the environment is mainly propagated via signaling pathways which form a transduction network. Here we propose a new algorithm to infer transduction networks from heterogeneous data, using both the protein interaction network and expression datasets. We formulate the inference problem as an optimization task, and develop a message-passing, probabilistic and distributed formalism to solve it. We apply our algorithm to the pheromone response in the baker’s yeast S. cerevisiae. We are able to find the backbone of the known structure of the MAPK cascade of pheromone response, validating our algorithm. More importantly, we make biological predictions about some proteins whose role could be at the interface between pheromone response and other cellular functions.
Matching pursuit parallel decomposition of seismic data
NASA Astrophysics Data System (ADS)
Li, Chuanhui; Zhang, Fanchang
2017-07-01
In order to improve the computation speed of matching pursuit decomposition of seismic data, a matching pursuit parallel algorithm is designed in this paper. We pick a fixed number of envelope peaks from the current signal in every iteration according to the number of compute nodes and assign them to the compute nodes on average to search the optimal Morlet wavelets in parallel. With the help of parallel computer systems and Message Passing Interface, the parallel algorithm gives full play to the advantages of parallel computing to significantly improve the computation speed of the matching pursuit decomposition and also has good expandability. Besides, searching only one optimal Morlet wavelet by every compute node in every iteration is the most efficient implementation.
NASA Technical Reports Server (NTRS)
Quealy, Angela; Cole, Gary L.; Blech, Richard A.
1993-01-01
The Application Portable Parallel Library (APPL) is a subroutine-based library of communication primitives that is callable from applications written in FORTRAN or C. APPL provides a consistent programmer interface to a variety of distributed and shared-memory multiprocessor MIMD machines. The objective of APPL is to minimize the effort required to move parallel applications from one machine to another, or to a network of homogeneous machines. APPL encompasses many of the message-passing primitives that are currently available on commercial multiprocessor systems. This paper describes APPL (version 2.3.1) and its usage, reports the status of the APPL project, and indicates possible directions for the future. Several applications using APPL are discussed, as well as performance and overhead results.
PCTDSE: A parallel Cartesian-grid-based TDSE solver for modeling laser-atom interactions
NASA Astrophysics Data System (ADS)
Fu, Yongsheng; Zeng, Jiaolong; Yuan, Jianmin
2017-01-01
We present a parallel Cartesian-grid-based time-dependent Schrödinger equation (TDSE) solver for modeling laser-atom interactions. It can simulate the single-electron dynamics of atoms in arbitrary time-dependent vector potentials. We use a split-operator method combined with fast Fourier transforms (FFT), on a three-dimensional (3D) Cartesian grid. Parallelization is realized using a 2D decomposition strategy based on the Message Passing Interface (MPI) library, which results in a good parallel scaling on modern supercomputers. We give simple applications for the hydrogen atom using the benchmark problems coming from the references and obtain repeatable results. The extensions to other laser-atom systems are straightforward with minimal modifications of the source code.
NASA Astrophysics Data System (ADS)
Boyko, Oleksiy; Zheleznyak, Mark
2015-04-01
The original numerical code TOPKAPI-IMMS of the distributed rainfall-runoff model TOPKAPI ( Todini et al, 1996-2014) is developed and implemented in Ukraine. The parallel version of the code has been developed recently to be used on multiprocessors systems - multicore/processors PC and clusters. Algorithm is based on binary-tree decomposition of the watershed for the balancing of the amount of computation for all processors/cores. Message passing interface (MPI) protocol is used as a parallel computing framework. The numerical efficiency of the parallelization algorithms is demonstrated for the case studies for the flood predictions of the mountain watersheds of the Ukrainian Carpathian regions. The modeling results is compared with the predictions based on the lumped parameters models.
Simple, efficient allocation of modelling runs on heterogeneous clusters with MPI
Donato, David I.
2017-01-01
In scientific modelling and computation, the choice of an appropriate method for allocating tasks for parallel processing depends on the computational setting and on the nature of the computation. The allocation of independent but similar computational tasks, such as modelling runs or Monte Carlo trials, among the nodes of a heterogeneous computational cluster is a special case that has not been specifically evaluated previously. A simulation study shows that a method of on-demand (that is, worker-initiated) pulling from a bag of tasks in this case leads to reliably short makespans for computational jobs despite heterogeneity both within and between cluster nodes. A simple reference implementation in the C programming language with the Message Passing Interface (MPI) is provided.
NASA Technical Reports Server (NTRS)
Tezduyar, Tayfun E.
1998-01-01
This is a final report as far as our work at University of Minnesota is concerned. The report describes our research progress and accomplishments in development of high performance computing methods and tools for 3D finite element computation of aerodynamic characteristics and fluid-structure interactions (FSI) arising in airdrop systems, namely ram-air parachutes and round parachutes. This class of simulations involves complex geometries, flexible structural components, deforming fluid domains, and unsteady flow patterns. The key components of our simulation toolkit are a stabilized finite element flow solver, a nonlinear structural dynamics solver, an automatic mesh moving scheme, and an interface between the fluid and structural solvers; all of these have been developed within a parallel message-passing paradigm.
Optimization of the design of Gas Cherenkov Detectors for ICF diagnosis
NASA Astrophysics Data System (ADS)
Liu, Bin; Hu, Huasi; Han, Hetong; Lv, Huanwen; Li, Lan
2018-07-01
A design method, which combines a genetic algorithm (GA) with Monte-Carlo simulation, is established and applied to two different types of Cherenkov detectors, namely, Gas Cherenkov Detector (GCD) and Gamma Reaction History (GRH). For accelerating the optimization program, open Message Passing Interface (MPI) is used in the Geant4 simulation. Compared with the traditional optical ray-tracing method, the performances of these detectors have been improved with the optimization method. The efficiency for GCD system, with a threshold of 6.3 MeV, is enhanced by ∼20% and time response improved by ∼7.2%. For the GRH system, with threshold of 10 MeV, the efficiency is enhanced by ∼76% in comparison with previously published results.
SBML-PET-MPI: a parallel parameter estimation tool for Systems Biology Markup Language based models.
Zi, Zhike
2011-04-01
Parameter estimation is crucial for the modeling and dynamic analysis of biological systems. However, implementing parameter estimation is time consuming and computationally demanding. Here, we introduced a parallel parameter estimation tool for Systems Biology Markup Language (SBML)-based models (SBML-PET-MPI). SBML-PET-MPI allows the user to perform parameter estimation and parameter uncertainty analysis by collectively fitting multiple experimental datasets. The tool is developed and parallelized using the message passing interface (MPI) protocol, which provides good scalability with the number of processors. SBML-PET-MPI is freely available for non-commercial use at http://www.bioss.uni-freiburg.de/cms/sbml-pet-mpi.html or http://sites.google.com/site/sbmlpetmpi/.
A Multiple Sphere T-Matrix Fortran Code for Use on Parallel Computer Clusters
NASA Technical Reports Server (NTRS)
Mackowski, D. W.; Mishchenko, M. I.
2011-01-01
A general-purpose Fortran-90 code for calculation of the electromagnetic scattering and absorption properties of multiple sphere clusters is described. The code can calculate the efficiency factors and scattering matrix elements of the cluster for either fixed or random orientation with respect to the incident beam and for plane wave or localized- approximation Gaussian incident fields. In addition, the code can calculate maps of the electric field both interior and exterior to the spheres.The code is written with message passing interface instructions to enable the use on distributed memory compute clusters, and for such platforms the code can make feasible the calculation of absorption, scattering, and general EM characteristics of systems containing several thousand spheres.
The graphical brain: Belief propagation and active inference
Friston, Karl J.; Parr, Thomas; de Vries, Bert
2018-01-01
This paper considers functional integration in the brain from a computational perspective. We ask what sort of neuronal message passing is mandated by active inference—and what implications this has for context-sensitive connectivity at microscopic and macroscopic levels. In particular, we formulate neuronal processing as belief propagation under deep generative models. Crucially, these models can entertain both discrete and continuous states, leading to distinct schemes for belief updating that play out on the same (neuronal) architecture. Technically, we use Forney (normal) factor graphs to elucidate the requisite message passing in terms of its form and scheduling. To accommodate mixed generative models (of discrete and continuous states), one also has to consider link nodes or factors that enable discrete and continuous representations to talk to each other. When mapping the implicit computational architecture onto neuronal connectivity, several interesting features emerge. For example, Bayesian model averaging and comparison, which link discrete and continuous states, may be implemented in thalamocortical loops. These and other considerations speak to a computational connectome that is inherently state dependent and self-organizing in ways that yield to a principled (variational) account. We conclude with simulations of reading that illustrate the implicit neuronal message passing, with a special focus on how discrete (semantic) representations inform, and are informed by, continuous (visual) sampling of the sensorium. Author Summary This paper considers functional integration in the brain from a computational perspective. We ask what sort of neuronal message passing is mandated by active inference—and what implications this has for context-sensitive connectivity at microscopic and macroscopic levels. In particular, we formulate neuronal processing as belief propagation under deep generative models that can entertain both discrete and continuous states. This leads to distinct schemes for belief updating that play out on the same (neuronal) architecture. Technically, we use Forney (normal) factor graphs to characterize the requisite message passing, and link this formal characterization to canonical microcircuits and extrinsic connectivity in the brain. PMID:29417960
CSlib, a library to couple codes via Client/Server messaging
DOE Office of Scientific and Technical Information (OSTI.GOV)
Plimpton, Steve
The CSlib is a small, portable library which enables two (or more) independent simulation codes to be coupled, by exchanging messages with each other. Both codes link to the library when they are built, and can them communicate with each other as they run. The messages contain data or instructions that the two codes send back-and-forth to each other. The messaging can take place via files, sockets, or MPI. The latter is a standard distributed-memory message-passing library.
Design and Implementation of an Operations Module for the ARGOS paperless Ship System
1989-06-01
A. OPERATIONS STACK SCRIPTS SCRIPTS FOR STACK: operations * BACKGROUND #1: Operations * on openStack hide message box show menuBar pass openStack end... openStack ** CARD #1, BUTTON #1: Up ***** on mouseUp visual effect zoom out go to card id 10931 of stack argos end mouseUp ** CARD #1, BUTTON #2...STACK SCRIPTS SCRIPTS FOR STACK: Reports ** BACKGROUND #1: Operations * on openStack hie message box show menuBar pass openStack end openStack ** CARD #1
Experiences Using OpenMP Based on Compiler Directed Software DSM on a PC Cluster
NASA Technical Reports Server (NTRS)
Hess, Matthias; Jost, Gabriele; Mueller, Matthias; Ruehle, Roland; Biegel, Bryan (Technical Monitor)
2002-01-01
In this work we report on our experiences running OpenMP (message passing) programs on a commodity cluster of PCs (personal computers) running a software distributed shared memory (DSM) system. We describe our test environment and report on the performance of a subset of the NAS (NASA Advanced Supercomputing) Parallel Benchmarks that have been automatically parallelized for OpenMP. We compare the performance of the OpenMP implementations with that of their message passing counterparts and discuss performance differences.
Charon Message-Passing Toolkit for Scientific Computations
NASA Technical Reports Server (NTRS)
VanderWijngaart, Rob F.; Yan, Jerry (Technical Monitor)
2000-01-01
Charon is a library, callable from C and Fortran, that aids the conversion of structured-grid legacy codes-such as those used in the numerical computation of fluid flows-into parallel, high- performance codes. Key are functions that define distributed arrays, that map between distributed and non-distributed arrays, and that allow easy specification of common communications on structured grids. The library is based on the widely accepted MPI message passing standard. We present an overview of the functionality of Charon, and some representative results.
Couriers in the Inca Empire: Getting Your Message Across. [Lesson Plan].
ERIC Educational Resources Information Center
2002
This lesson shows how the Inca communicated across the vast stretches of their mountain realm, the largest empire of the pre-industrial world. The lesson explains how couriers carried messages along mountain-ridge roads, up and down stone steps, and over chasm-spanning footbridges. It states that couriers could pass a message from Quito (Ecuador)…
NASA Technical Reports Server (NTRS)
Baxley, Brian T.; Palmer, Michael T.; Swieringa, Kurt A.
2015-01-01
This document describes the IM cockpit interfaces, displays, and alerting capabilities that were developed for and used in the IMAC experiment, which was conducted at NASA Langley in the summer of 2015. Specifically, this document includes: (1) screen layouts for each page of the interface; (2) step-by-step instructions for data entry, data verification and input error correction; (3) algorithm state messages and error condition alerting messages; (4) aircraft speed guidance and deviation indications; and (5) graphical display of the spatial relationships between the Ownship aircraft and the Target aircraft. The controller displays for IM will be described in a separate document.
Inference in the brain: Statistics flowing in redundant population codes
Pitkow, Xaq; Angelaki, Dora E
2017-01-01
It is widely believed that the brain performs approximate probabilistic inference to estimate causal variables in the world from ambiguous sensory data. To understand these computations, we need to analyze how information is represented and transformed by the actions of nonlinear recurrent neural networks. We propose that these probabilistic computations function by a message-passing algorithm operating at the level of redundant neural populations. To explain this framework, we review its underlying concepts, including graphical models, sufficient statistics, and message-passing, and then describe how these concepts could be implemented by recurrently connected probabilistic population codes. The relevant information flow in these networks will be most interpretable at the population level, particularly for redundant neural codes. We therefore outline a general approach to identify the essential features of a neural message-passing algorithm. Finally, we argue that to reveal the most important aspects of these neural computations, we must study large-scale activity patterns during moderately complex, naturalistic behaviors. PMID:28595050
Critical field-exponents for secure message-passing in modular networks
NASA Astrophysics Data System (ADS)
Shekhtman, Louis M.; Danziger, Michael M.; Bonamassa, Ivan; Buldyrev, Sergey V.; Caldarelli, Guido; Zlatić, Vinko; Havlin, Shlomo
2018-05-01
We study secure message-passing in the presence of multiple adversaries in modular networks. We assume a dominant fraction of nodes in each module have the same vulnerability, i.e., the same entity spying on them. We find both analytically and via simulations that the links between the modules (interlinks) have effects analogous to a magnetic field in a spin-system in that for any amount of interlinks the system no longer undergoes a phase transition. We then define the exponents δ, which relates the order parameter (the size of the giant secure component) at the critical point to the field strength (average number of interlinks per node), and γ, which describes the susceptibility near criticality. These are found to be δ = 2 and γ = 1 (with the scaling of the order parameter near the critical point given by β = 1). When two or more vulnerabilities are equally present in a module we find δ = 1 and γ = 0 (with β ≥ 2). Apart from defining a previously unidentified universality class, these exponents show that increasing connections between modules is more beneficial for security than increasing connections within modules. We also measure the correlation critical exponent ν, and the upper critical dimension d c , finding that ν {d}c=3 as for ordinary percolation, suggesting that for secure message-passing d c = 6. These results provide an interesting analogy between secure message-passing in modular networks and the physics of magnetic spin-systems.
Social Interface Model: Theorizing Ecological Post-Delivery Processes for Intervention Effects.
Pettigrew, Jonathan; Segrott, Jeremy; Ray, Colter D; Littlecott, Hannah
2018-01-03
Successful prevention programs depend on a complex interplay among aspects of the intervention, the participant, the specific intervention setting, and the broader set of contexts with which a participant interacts. There is a need to theorize what happens as participants bring intervention ideas and behaviors into other life-contexts, and theory has not yet specified how social interactions about interventions may influence outcomes. To address this gap, we use an ecological perspective to develop the social interface model. This paper presents the key components of the model and its potential to aid the design and implementation of prevention interventions. The model is predicated on the idea that intervention message effectiveness depends not only on message aspects but also on the participants' adoption and adaptation of the message vis-à-vis their social ecology. The model depicts processes by which intervention messages are received and enacted by participants through social processes occurring within and between relevant microsystems. Mesosystem interfaces (negligible interface, transference, co-dependence, and interdependence) can facilitate or detract from intervention effects. The social interface model advances prevention science by theorizing that practitioners can create better quality interventions by planning for what occurs after interventions are delivered.
ERIC Educational Resources Information Center
Brown-Schmidt, Sarah; Konopka, Agnieszka E.
2008-01-01
During unscripted speech, speakers coordinate the formulation of pre-linguistic messages with the linguistic processes that implement those messages into speech. We examine the process of constructing a contextually appropriate message and interfacing that message with utterance planning in English ("the small butterfly") and Spanish ("la mariposa…
PRAIS: Distributed, real-time knowledge-based systems made easy
NASA Technical Reports Server (NTRS)
Goldstein, David G.
1990-01-01
This paper discusses an architecture for real-time, distributed (parallel) knowledge-based systems called the Parallel Real-time Artificial Intelligence System (PRAIS). PRAIS strives for transparently parallelizing production (rule-based) systems, even when under real-time constraints. PRAIS accomplishes these goals by incorporating a dynamic task scheduler, operating system extensions for fact handling, and message-passing among multiple copies of CLIPS executing on a virtual blackboard. This distributed knowledge-based system tool uses the portability of CLIPS and common message-passing protocols to operate over a heterogeneous network of processors.
Gene-network inference by message passing
NASA Astrophysics Data System (ADS)
Braunstein, A.; Pagnani, A.; Weigt, M.; Zecchina, R.
2008-01-01
The inference of gene-regulatory processes from gene-expression data belongs to the major challenges of computational systems biology. Here we address the problem from a statistical-physics perspective and develop a message-passing algorithm which is able to infer sparse, directed and combinatorial regulatory mechanisms. Using the replica technique, the algorithmic performance can be characterized analytically for artificially generated data. The algorithm is applied to genome-wide expression data of baker's yeast under various environmental conditions. We find clear cases of combinatorial control, and enrichment in common functional annotations of regulated genes and their regulators.
Optimal mapping of neural-network learning on message-passing multicomputers
NASA Technical Reports Server (NTRS)
Chu, Lon-Chan; Wah, Benjamin W.
1992-01-01
A minimization of learning-algorithm completion time is sought in the present optimal-mapping study of the learning process in multilayer feed-forward artificial neural networks (ANNs) for message-passing multicomputers. A novel approximation algorithm for mappings of this kind is derived from observations of the dominance of a parallel ANN algorithm over its communication time. Attention is given to both static and dynamic mapping schemes for systems with static and dynamic background workloads, as well as to experimental results obtained for simulated mappings on multicomputers with dynamic background workloads.
Trace-Driven Debugging of Message Passing Programs
NASA Technical Reports Server (NTRS)
Frumkin, Michael; Hood, Robert; Lopez, Louis; Bailey, David (Technical Monitor)
1998-01-01
In this paper we report on features added to a parallel debugger to simplify the debugging of parallel message passing programs. These features include replay, setting consistent breakpoints based on interprocess event causality, a parallel undo operation, and communication supervision. These features all use trace information collected during the execution of the program being debugged. We used a number of different instrumentation techniques to collect traces. We also implemented trace displays using two different trace visualization systems. The implementation was tested on an SGI Power Challenge cluster and a network of SGI workstations.
Usage Patterns of Communication Interfaces for Social Support among At-Risk Adolescents
ERIC Educational Resources Information Center
Passig, David
2014-01-01
Social and interpersonal support has mostly been carried out face-to-face. However, the internet was able, in the last couple of decades, to facilitate social interactions through a range of computer-mediated communication (CMC) interfaces--from email applications, chat-rooms, forums, instant messages (IM), short text messages (SMS), social…
Gauge-free cluster variational method by maximal messages and moment matching.
Domínguez, Eduardo; Lage-Castellanos, Alejandro; Mulet, Roberto; Ricci-Tersenghi, Federico
2017-04-01
We present an implementation of the cluster variational method (CVM) as a message passing algorithm. The kind of message passing algorithm used for CVM, usually named generalized belief propagation (GBP), is a generalization of the belief propagation algorithm in the same way that CVM is a generalization of the Bethe approximation for estimating the partition function. However, the connection between fixed points of GBP and the extremal points of the CVM free energy is usually not a one-to-one correspondence because of the existence of a gauge transformation involving the GBP messages. Our contribution is twofold. First, we propose a way of defining messages (fields) in a generic CVM approximation, such that messages arrive on a given region from all its ancestors, and not only from its direct parents, as in the standard parent-to-child GBP. We call this approach maximal messages. Second, we focus on the case of binary variables, reinterpreting the messages as fields enforcing the consistency between the moments of the local (marginal) probability distributions. We provide a precise rule to enforce all consistencies, avoiding any redundancy, that would otherwise lead to a gauge transformation on the messages. This moment matching method is gauge free, i.e., it guarantees that the resulting GBP is not gauge invariant. We apply our maximal messages and moment matching GBP to obtain an analytical expression for the critical temperature of the Ising model in general dimensions at the level of plaquette CVM. The values obtained outperform Bethe estimates, and are comparable with loop corrected belief propagation equations. The method allows for a straightforward generalization to disordered systems.
Gauge-free cluster variational method by maximal messages and moment matching
NASA Astrophysics Data System (ADS)
Domínguez, Eduardo; Lage-Castellanos, Alejandro; Mulet, Roberto; Ricci-Tersenghi, Federico
2017-04-01
We present an implementation of the cluster variational method (CVM) as a message passing algorithm. The kind of message passing algorithm used for CVM, usually named generalized belief propagation (GBP), is a generalization of the belief propagation algorithm in the same way that CVM is a generalization of the Bethe approximation for estimating the partition function. However, the connection between fixed points of GBP and the extremal points of the CVM free energy is usually not a one-to-one correspondence because of the existence of a gauge transformation involving the GBP messages. Our contribution is twofold. First, we propose a way of defining messages (fields) in a generic CVM approximation, such that messages arrive on a given region from all its ancestors, and not only from its direct parents, as in the standard parent-to-child GBP. We call this approach maximal messages. Second, we focus on the case of binary variables, reinterpreting the messages as fields enforcing the consistency between the moments of the local (marginal) probability distributions. We provide a precise rule to enforce all consistencies, avoiding any redundancy, that would otherwise lead to a gauge transformation on the messages. This moment matching method is gauge free, i.e., it guarantees that the resulting GBP is not gauge invariant. We apply our maximal messages and moment matching GBP to obtain an analytical expression for the critical temperature of the Ising model in general dimensions at the level of plaquette CVM. The values obtained outperform Bethe estimates, and are comparable with loop corrected belief propagation equations. The method allows for a straightforward generalization to disordered systems.
Multiple node remote messaging
Blumrich, Matthias A.; Chen, Dong; Gara, Alan G.; Giampapa, Mark E.; Heidelberger, Philip; Ohmacht, Martin; Salapura, Valentina; Steinmacher-Burow, Burkhard; Vranas, Pavlos
2010-08-31
A method for passing remote messages in a parallel computer system formed as a network of interconnected compute nodes includes that a first compute node (A) sends a single remote message to a remote second compute node (B) in order to control the remote second compute node (B) to send at least one remote message. The method includes various steps including controlling a DMA engine at first compute node (A) to prepare the single remote message to include a first message descriptor and at least one remote message descriptor for controlling the remote second compute node (B) to send at least one remote message, including putting the first message descriptor into an injection FIFO at the first compute node (A) and sending the single remote message and the at least one remote message descriptor to the second compute node (B).
Design considerations for parallel graphics libraries
NASA Technical Reports Server (NTRS)
Crockett, Thomas W.
1994-01-01
Applications which run on parallel supercomputers are often characterized by massive datasets. Converting these vast collections of numbers to visual form has proven to be a powerful aid to comprehension. For a variety of reasons, it may be desirable to provide this visual feedback at runtime. One way to accomplish this is to exploit the available parallelism to perform graphics operations in place. In order to do this, we need appropriate parallel rendering algorithms and library interfaces. This paper provides a tutorial introduction to some of the issues which arise in designing parallel graphics libraries and their underlying rendering algorithms. The focus is on polygon rendering for distributed memory message-passing systems. We illustrate our discussion with examples from PGL, a parallel graphics library which has been developed on the Intel family of parallel systems.
High Performance Fortran for Aerospace Applications
NASA Technical Reports Server (NTRS)
Mehrotra, Piyush; Zima, Hans; Bushnell, Dennis M. (Technical Monitor)
2000-01-01
This paper focuses on the use of High Performance Fortran (HPF) for important classes of algorithms employed in aerospace applications. HPF is a set of Fortran extensions designed to provide users with a high-level interface for programming data parallel scientific applications, while delegating to the compiler/runtime system the task of generating explicitly parallel message-passing programs. We begin by providing a short overview of the HPF language. This is followed by a detailed discussion of the efficient use of HPF for applications involving multiple structured grids such as multiblock and adaptive mesh refinement (AMR) codes as well as unstructured grid codes. We focus on the data structures and computational structures used in these codes and on the high-level strategies that can be expressed in HPF to optimally exploit the parallelism in these algorithms.
Speed and accuracy improvements in FLAASH atmospheric correction of hyperspectral imagery
NASA Astrophysics Data System (ADS)
Perkins, Timothy; Adler-Golden, Steven; Matthew, Michael W.; Berk, Alexander; Bernstein, Lawrence S.; Lee, Jamine; Fox, Marsha
2012-11-01
Remotely sensed spectral imagery of the earth's surface can be used to fullest advantage when the influence of the atmosphere has been removed and the measurements are reduced to units of reflectance. Here, we provide a comprehensive summary of the latest version of the Fast Line-of-sight Atmospheric Analysis of Spectral Hypercubes atmospheric correction algorithm. We also report some new code improvements for speed and accuracy. These include the re-working of the original algorithm in C-language code parallelized with message passing interface and containing a new radiative transfer look-up table option, which replaces executions of the MODTRAN model. With computation times now as low as ~10 s per image per computer processor, automated, real-time, on-board atmospheric correction of hyper- and multi-spectral imagery is within reach.
CFD Analysis and Design Optimization Using Parallel Computers
NASA Technical Reports Server (NTRS)
Martinelli, Luigi; Alonso, Juan Jose; Jameson, Antony; Reuther, James
1997-01-01
A versatile and efficient multi-block method is presented for the simulation of both steady and unsteady flow, as well as aerodynamic design optimization of complete aircraft configurations. The compressible Euler and Reynolds Averaged Navier-Stokes (RANS) equations are discretized using a high resolution scheme on body-fitted structured meshes. An efficient multigrid implicit scheme is implemented for time-accurate flow calculations. Optimum aerodynamic shape design is achieved at very low cost using an adjoint formulation. The method is implemented on parallel computing systems using the MPI message passing interface standard to ensure portability. The results demonstrate that, by combining highly efficient algorithms with parallel computing, it is possible to perform detailed steady and unsteady analysis as well as automatic design for complex configurations using the present generation of parallel computers.
A practical guide to replica-exchange Wang—Landau simulations
NASA Astrophysics Data System (ADS)
Vogel, Thomas; Li, Ying Wai; Landau, David P.
2018-04-01
This paper is based on a series of tutorial lectures about the replica-exchange Wang-Landau (REWL) method given at the IX Brazilian Meeting on Simulational Physics (BMSP 2017). It provides a practical guide for the implementation of the method. A complete example code for a model system is available online. In this paper, we discuss the main parallel features of this code after a brief introduction to the REWL algorithm. The tutorial section is mainly directed at users who have written a single-walker Wang–Landau program already but might have just taken their first steps in parallel programming using the Message Passing Interface (MPI). In the last section, we answer “frequently asked questions” from users about the implementation of REWL for different scientific problems.
MPI_XSTAR: MPI-based Parallelization of the XSTAR Photoionization Program
NASA Astrophysics Data System (ADS)
Danehkar, Ashkbiz; Nowak, Michael A.; Lee, Julia C.; Smith, Randall K.
2018-02-01
We describe a program for the parallel implementation of multiple runs of XSTAR, a photoionization code that is used to predict the physical properties of an ionized gas from its emission and/or absorption lines. The parallelization program, called MPI_XSTAR, has been developed and implemented in the C++ language by using the Message Passing Interface (MPI) protocol, a conventional standard of parallel computing. We have benchmarked parallel multiprocessing executions of XSTAR, using MPI_XSTAR, against a serial execution of XSTAR, in terms of the parallelization speedup and the computing resource efficiency. Our experience indicates that the parallel execution runs significantly faster than the serial execution, however, the efficiency in terms of the computing resource usage decreases with increasing the number of processors used in the parallel computing.
ms2: A molecular simulation tool for thermodynamic properties
NASA Astrophysics Data System (ADS)
Deublein, Stephan; Eckl, Bernhard; Stoll, Jürgen; Lishchuk, Sergey V.; Guevara-Carrion, Gabriela; Glass, Colin W.; Merker, Thorsten; Bernreuther, Martin; Hasse, Hans; Vrabec, Jadran
2011-11-01
This work presents the molecular simulation program ms2 that is designed for the calculation of thermodynamic properties of bulk fluids in equilibrium consisting of small electro-neutral molecules. ms2 features the two main molecular simulation techniques, molecular dynamics (MD) and Monte-Carlo. It supports the calculation of vapor-liquid equilibria of pure fluids and multi-component mixtures described by rigid molecular models on the basis of the grand equilibrium method. Furthermore, it is capable of sampling various classical ensembles and yields numerous thermodynamic properties. To evaluate the chemical potential, Widom's test molecule method and gradual insertion are implemented. Transport properties are determined by equilibrium MD simulations following the Green-Kubo formalism. ms2 is designed to meet the requirements of academia and industry, particularly achieving short response times and straightforward handling. It is written in Fortran90 and optimized for a fast execution on a broad range of computer architectures, spanning from single processor PCs over PC-clusters and vector computers to high-end parallel machines. The standard Message Passing Interface (MPI) is used for parallelization and ms2 is therefore easily portable to different computing platforms. Feature tools facilitate the interaction with the code and the interpretation of input and output files. The accuracy and reliability of ms2 has been shown for a large variety of fluids in preceding work. Program summaryProgram title:ms2 Catalogue identifier: AEJF_v1_0 Program summary URL:http://cpc.cs.qub.ac.uk/summaries/AEJF_v1_0.html Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland Licensing provisions: Special Licence supplied by the authors No. of lines in distributed program, including test data, etc.: 82 794 No. of bytes in distributed program, including test data, etc.: 793 705 Distribution format: tar.gz Programming language: Fortran90 Computer: The simulation tool ms2 is usable on a wide variety of platforms, from single processor machines over PC-clusters and vector computers to vector-parallel architectures. (Tested with Fortran compilers: gfortran, Intel, PathScale, Portland Group and Sun Studio.) Operating system: Unix/Linux, Windows Has the code been vectorized or parallelized?: Yes. Message Passing Interface (MPI) protocol Scalability. Excellent scalability up to 16 processors for molecular dynamics and >512 processors for Monte-Carlo simulations. RAM:ms2 runs on single processors with 512 MB RAM. The memory demand rises with increasing number of processors used per node and increasing number of molecules. Classification: 7.7, 7.9, 12 External routines: Message Passing Interface (MPI) Nature of problem: Calculation of application oriented thermodynamic properties for rigid electro-neutral molecules: vapor-liquid equilibria, thermal and caloric data as well as transport properties of pure fluids and multi-component mixtures. Solution method: Molecular dynamics, Monte-Carlo, various classical ensembles, grand equilibrium method, Green-Kubo formalism. Restrictions: No. The system size is user-defined. Typical problems addressed by ms2 can be solved by simulating systems containing typically 2000 molecules or less. Unusual features: Feature tools are available for creating input files, analyzing simulation results and visualizing molecular trajectories. Additional comments: Sample makefiles for multiple operation platforms are provided. Documentation is provided with the installation package and is available at http://www.ms-2.de. Running time: The running time of ms2 depends on the problem set, the system size and the number of processes used in the simulation. Running four processes on a "Nehalem" processor, simulations calculating VLE data take between two and twelve hours, calculating transport properties between six and 24 hours.
Performance and Application of Parallel OVERFLOW Codes on Distributed and Shared Memory Platforms
NASA Technical Reports Server (NTRS)
Djomehri, M. Jahed; Rizk, Yehia M.
1999-01-01
The presentation discusses recent studies on the performance of the two parallel versions of the aerodynamics CFD code, OVERFLOW_MPI and _MLP. Developed at NASA Ames, the serial version, OVERFLOW, is a multidimensional Navier-Stokes flow solver based on overset (Chimera) grid technology. The code has recently been parallelized in two ways. One is based on the explicit message-passing interface (MPI) across processors and uses the _MPI communication package. This approach is primarily suited for distributed memory systems and workstation clusters. The second, termed the multi-level parallel (MLP) method, is simple and uses shared memory for all communications. The _MLP code is suitable on distributed-shared memory systems. For both methods, the message passing takes place across the processors or processes at the advancement of each time step. This procedure is, in effect, the Chimera boundary conditions update, which is done in an explicit "Jacobi" style. In contrast, the update in the serial code is done in more of the "Gauss-Sidel" fashion. The programming efforts for the _MPI code is more complicated than for the _MLP code; the former requires modification of the outer and some inner shells of the serial code, whereas the latter focuses only on the outer shell of the code. The _MPI version offers a great deal of flexibility in distributing grid zones across a specified number of processors in order to achieve load balancing. The approach is capable of partitioning zones across multiple processors or sending each zone and/or cluster of several zones into a single processor. The message passing across the processors consists of Chimera boundary and/or an overlap of "halo" boundary points for each partitioned zone. The MLP version is a new coarse-grain parallel concept at the zonal and intra-zonal levels. A grouping strategy is used to distribute zones into several groups forming sub-processes which will run in parallel. The total volume of grid points in each group are approximately balanced. A proper number of threads are initially allocated to each group, and in subsequent iterations during the run-time, the number of threads are adjusted to achieve load balancing across the processes. Each process exploits the multitasking directives already established in Overflow.
NASA Technical Reports Server (NTRS)
Rasmussen, Robert D. (Inventor); Manning, Robert M. (Inventor); Lewis, Blair F. (Inventor); Bolotin, Gary S. (Inventor); Ward, Richard S. (Inventor)
1990-01-01
This is a distributed computing system providing flexible fault tolerance; ease of software design and concurrency specification; and dynamic balance of the loads. The system comprises a plurality of computers each having a first input/output interface and a second input/output interface for interfacing to communications networks each second input/output interface including a bypass for bypassing the associated computer. A global communications network interconnects the first input/output interfaces for providing each computer the ability to broadcast messages simultaneously to the remainder of the computers. A meshwork communications network interconnects the second input/output interfaces providing each computer with the ability to establish a communications link with another of the computers bypassing the remainder of computers. Each computer is controlled by a resident copy of a common operating system. Communications between respective ones of computers is by means of split tokens each having a moving first portion which is sent from computer to computer and a resident second portion which is disposed in the memory of at least one of computer and wherein the location of the second portion is part of the first portion. The split tokens represent both functions to be executed by the computers and data to be employed in the execution of the functions. The first input/output interfaces each include logic for detecting a collision between messages and for terminating the broadcasting of a message whereby collisions between messages are detected and avoided.
Thiha, Phyo; Pisani, Anthony R; Gurditta, Kunali; Cherry, Erin; Peterson, Derick R; Kautz, Henry; Wyman, Peter A
2016-11-09
Equipping members of a target population to deliver effective public health messaging to peers is an established approach in health promotion. The Sources of Strength program has demonstrated the promise of this approach for "upstream" youth suicide prevention. Text messaging is a well-established medium for promoting behavior change and is the dominant communication medium for youth. In order for peer 'opinion leader' programs like Sources of Strength to use scalable, wide-reaching media such as text messaging to spread peer-to-peer messages, they need techniques for assisting peer opinion leaders in creating effective testimonials to engage peers and match program goals. We developed a Web interface, called Stories of Personal Resilience in Managing Emotions (StoryPRIME), which helps peer opinion leaders write effective, short-form messages that can be delivered to the target population in youth suicide prevention program like Sources of Strength. To determine the efficacy of StoryPRIME, a Web-based interface for remotely eliciting high school peer leaders, and helping them produce high-quality, personal testimonials for use in a text messaging extension of an evidence-based, peer-led suicide prevention program. In a double-blind randomized controlled experiment, 36 high school students wrote testimonials with or without eliciting from the StoryPRIME interface. The interface was created in the context of Sources of Strength-an evidence-based youth suicide prevention program-and 24 ninth graders rated these testimonials on relatability, usefulness/relevance, intrigue, and likability. Testimonials written with the StoryPRIME interface were rated as more relatable, useful/relevant, intriguing, and likable than testimonials written without StoryPRIME, P=.054. StoryPRIME is a promising way to elicit high-quality, personal testimonials from youth for prevention programs that draw on members of a target population to spread public health messages. ©Phyo Thiha, Anthony R Pisani, Kunali Gurditta, Erin Cherry, Derick R Peterson, Henry Kautz, Peter A Wyman. Originally published in JMIR Public Health and Surveillance (http://publichealth.jmir.org), 09.11.2016.
Increasing the Automation and Autonomy for Spacecraft Operations with Criteria Action Table
NASA Technical Reports Server (NTRS)
Li, Zhen-Ping; Savki, Cetin
2005-01-01
The Criteria Action Table (CAT) is an automation tool developed for monitoring real time system messages for specific events and processes in order to take user defined actions based on a set of user-defined rules. CAT was developed by Lockheed Martin Space Operations as a part of a larger NASA effort at the Goddard Space Flight Center (GSFC) to create a component-based, middleware-based, and standard-based general purpose ground system architecture referred as GMSEC - the GSFC Mission Services Evolution Center. CAT has been integrated into the upgraded ground systems for Tropical Rainfall Measuring Mission (TRMM) and Small Explorer (SMEX) satellites and it plays the central role in their automation effort to reduce the cost and increase the reliability for spacecraft operations. The GMSEC architecture provides a standard communication interface and protocol for components to publish/describe messages to an information bus. It also provides a standard message definition so components can send and receive messages to the bus interface rather than each other, thus reducing the component-to-component coupling, interface, protocols, and link (socket) management. With the GMSEC architecture, components can publish standard event messages to the bus for all nominal, significant, and surprising events in regard to satellite, celestial, ground system, or any other activity. In addition to sending standard event messages, each GMSEC compliant component is required to accept and process GMSEC directive request messages.
Three-pass protocol scheme for bitmap image security by using vernam cipher algorithm
NASA Astrophysics Data System (ADS)
Rachmawati, D.; Budiman, M. A.; Aulya, L.
2018-02-01
Confidentiality, integrity, and efficiency are the crucial aspects of data security. Among the other digital data, image data is too prone to abuse of operation like duplication, modification, etc. There are some data security techniques, one of them is cryptography. The security of Vernam Cipher cryptography algorithm is very dependent on the key exchange process. If the key is leaked, security of this algorithm will collapse. Therefore, a method that minimizes key leakage during the exchange of messages is required. The method which is used, is known as Three-Pass Protocol. This protocol enables message delivery process without the key exchange. Therefore, the sending messages process can reach the receiver safely without fear of key leakage. The system is built by using Java programming language. The materials which are used for system testing are image in size 200×200 pixel, 300×300 pixel, 500×500 pixel, 800×800 pixel and 1000×1000 pixel. The result of experiments showed that Vernam Cipher algorithm in Three-Pass Protocol scheme could restore the original image.
Ordering Traces Logically to Identify Lateness in Message Passing Programs
Isaacs, Katherine E.; Gamblin, Todd; Bhatele, Abhinav; ...
2015-03-30
Event traces are valuable for understanding the behavior of parallel programs. However, automatically analyzing a large parallel trace is difficult, especially without a specific objective. We aid this endeavor by extracting a trace's logical structure, an ordering of trace events derived from happened-before relationships, while taking into account developer intent. Using this structure, we can calculate an operation's delay relative to its peers on other processes. The logical structure also serves as a platform for comparing and clustering processes as well as highlighting communication patterns in a trace visualization. We present an algorithm for determining this idealized logical structure frommore » traces of message passing programs, and we develop metrics to quantify delays and differences among processes. We implement our techniques in Ravel, a parallel trace visualization tool that displays both logical and physical timelines. Rather than showing the duration of each operation, we display where delays begin and end, and how they propagate. As a result, we apply our approach to the traces of several message passing applications, demonstrating the accuracy of our extracted structure and its utility in analyzing these codes.« less
TravelAid : in-vehicle signing and variable speed limit evaluation
DOT National Transportation Integrated Search
2001-12-01
This report discusses the effectiveness of using variable message signs (VMSs) and in-vehicle traffic advisory systems (IVUs) on a mountainous pass (Snoqualmie Pass on Interstate 90 in Washington State) for changing driver behavior. As part of this p...
Interface For Fault-Tolerant Control System
NASA Technical Reports Server (NTRS)
Shaver, Charles; Williamson, Michael
1989-01-01
Interface unit and controller emulator developed for research on electronic helicopter-flight-control systems equipped with artificial intelligence. Interface unit interrupt-driven system designed to link microprocessor-based, quadruply-redundant, asynchronous, ultra-reliable, fault-tolerant control system (controller) with electronic servocontrol unit that controls set of hydraulic actuators. Receives digital feedforward messages from, and transmits digital feedback messages to, controller through differential signal lines or fiber-optic cables (thus far only differential signal lines have been used). Analog signals transmitted to and from servocontrol unit via coaxial cables.
LBMD : a layer-based mesh data structure tailored for generic API infrastructures.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ebeida, Mohamed S.; Knupp, Patrick Michael
2010-11-01
A new mesh data structure is introduced for the purpose of mesh processing in Application Programming Interface (API) infrastructures. This data structure utilizes a reduced mesh representation to increase its ability to handle significantly larger meshes compared to full mesh representation. In spite of the reduced representation, each mesh entity (vertex, edge, face, and region) is represented using a unique handle, with no extra storage cost, which is a crucial requirement in most API libraries. The concept of mesh layers makes the data structure more flexible for mesh generation and mesh modification operations. This flexibility can have a favorable impactmore » in solver based queries of finite volume and multigrid methods. The capabilities of LBMD make it even more attractive for parallel implementations using Message Passing Interface (MPI) or Graphics Processing Units (GPUs). The data structure is associated with a new classification method to relate mesh entities to their corresponding geometrical entities. The classification technique stores the related information at the node level without introducing any ambiguities. Several examples are presented to illustrate the strength of this new data structure.« less
Addressing the challenges of standalone multi-core simulations in molecular dynamics
NASA Astrophysics Data System (ADS)
Ocaya, R. O.; Terblans, J. J.
2017-07-01
Computational modelling in material science involves mathematical abstractions of force fields between particles with the aim to postulate, develop and understand materials by simulation. The aggregated pairwise interactions of the material's particles lead to a deduction of its macroscopic behaviours. For practically meaningful macroscopic scales, a large amount of data are generated, leading to vast execution times. Simulation times of hours, days or weeks for moderately sized problems are not uncommon. The reduction of simulation times, improved result accuracy and the associated software and hardware engineering challenges are the main motivations for many of the ongoing researches in the computational sciences. This contribution is concerned mainly with simulations that can be done on a "standalone" computer based on Message Passing Interfaces (MPI), parallel code running on hardware platforms with wide specifications, such as single/multi- processor, multi-core machines with minimal reconfiguration for upward scaling of computational power. The widely available, documented and standardized MPI library provides this functionality through the MPI_Comm_size (), MPI_Comm_rank () and MPI_Reduce () functions. A survey of the literature shows that relatively little is written with respect to the efficient extraction of the inherent computational power in a cluster. In this work, we discuss the main avenues available to tap into this extra power without compromising computational accuracy. We also present methods to overcome the high inertia encountered in single-node-based computational molecular dynamics. We begin by surveying the current state of the art and discuss what it takes to achieve parallelism, efficiency and enhanced computational accuracy through program threads and message passing interfaces. Several code illustrations are given. The pros and cons of writing raw code as opposed to using heuristic, third-party code are also discussed. The growing trend towards graphical processor units and virtual computing clouds for high-performance computing is also discussed. Finally, we present the comparative results of vacancy formation energy calculations using our own parallelized standalone code called Verlet-Stormer velocity (VSV) operating on 30,000 copper atoms. The code is based on the Sutton-Chen implementation of the Finnis-Sinclair pairwise embedded atom potential. A link to the code is also given.
AMS -- The Unix ADAM Message System
NASA Astrophysics Data System (ADS)
Kelly, B. D.; Chipperfield, A. J.
The ADAM Message System (AMS) library, which implements the ADAM inter-task communications protocol under Unix, is described, along with its Fortran-callable interface (FAMS). The description of AMS is distinguished from the current implementation which uses the Message System Primitives (MSP).
A real-time MPEG software decoder using a portable message-passing library
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kwong, Man Kam; Tang, P.T. Peter; Lin, Biquan
1995-12-31
We present a real-time MPEG software decoder that uses message-passing libraries such as MPL, p4 and MPI. The parallel MPEG decoder currently runs on the IBM SP system but can be easil ported to other parallel machines. This paper discusses our parallel MPEG decoding algorithm as well as the parallel programming environment under which it uses. Several technical issues are discussed, including balancing of decoding speed, memory limitation, 1/0 capacities, and optimization of MPEG decoding components. This project shows that a real-time portable software MPEG decoder is feasible in a general-purpose parallel machine.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Gropp, W.D.; Keyes, D.E.
1988-03-01
The authors discuss the parallel implementation of preconditioned conjugate gradient (PCG)-based domain decomposition techniques for self-adjoint elliptic partial differential equations in two dimensions on several architectures. The complexity of these methods is described on a variety of message-passing parallel computers as a function of the size of the problem, number of processors and relative communication speeds of the processors. They show that communication startups are very important, and that even the small amount of global communication in these methods can significantly reduce the performance of many message-passing architectures.
n-body simulations using message passing parallel computers.
NASA Astrophysics Data System (ADS)
Grama, A. Y.; Kumar, V.; Sameh, A.
The authors present new parallel formulations of the Barnes-Hut method for n-body simulations on message passing computers. These parallel formulations partition the domain efficiently incurring minimal communication overhead. This is in contrast to existing schemes that are based on sorting a large number of keys or on the use of global data structures. The new formulations are augmented by alternate communication strategies which serve to minimize communication overhead. The impact of these communication strategies is experimentally studied. The authors report on experimental results obtained from an astrophysical simulation on an nCUBE2 parallel computer.
Um, Ki Sung; Kwak, Yun Sik; Cho, Hune; Kim, Il Kon
2005-11-01
A basic assumption of Health Level Seven (HL7) protocol is 'No limitation of message length'. However, most existing commercial HL7 interface engines do limit message length because they use the string array method, which is run in the main memory for the HL7 message parsing process. Specifically, messages with image and multi-media data create a long string array and thus cause the computer system to raise critical and fatal problem. Consequently, HL7 messages cannot handle the image and multi-media data necessary in modern medical records. This study aims to solve this problem with the 'streaming algorithm' method. This new method for HL7 message parsing applies the character-stream object which process character by character between the main memory and hard disk device with the consequence that the processing load on main memory could be alleviated. The main functions of this new engine are generating, parsing, validating, browsing, sending, and receiving HL7 messages. Also, the engine can parse and generate XML-formatted HL7 messages. This new HL7 engine successfully exchanged HL7 messages with 10 megabyte size images and discharge summary information between two university hospitals.
NASA Astrophysics Data System (ADS)
Pathak, Ashish; Raessi, Mehdi
2016-04-01
We present a three-dimensional (3D) and fully Eulerian approach to capturing the interaction between two fluids and moving rigid structures by using the fictitious domain and volume-of-fluid (VOF) methods. The solid bodies can have arbitrarily complex geometry and can pierce the fluid-fluid interface, forming contact lines. The three-phase interfaces are resolved and reconstructed by using a VOF-based methodology. Then, a consistent scheme is employed for transporting mass and momentum, allowing for simulations of three-phase flows of large density ratios. The Eulerian approach significantly simplifies numerical resolution of the kinematics of rigid bodies of complex geometry and with six degrees of freedom. The fluid-structure interaction (FSI) is computed using the fictitious domain method. The methodology was developed in a message passing interface (MPI) parallel framework accelerated with graphics processing units (GPUs). The computationally intensive solution of the pressure Poisson equation is ported to GPUs, while the remaining calculations are performed on CPUs. The performance and accuracy of the methodology are assessed using an array of test cases, focusing individually on the flow solver and the FSI in surface-piercing configurations. Finally, an application of the proposed methodology in simulations of the ocean wave energy converters is presented.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hansen, Timothy M.; Palmintier, Bryan; Suryanarayanan, Siddharth
As more Smart Grid technologies (e.g., distributed photovoltaic, spatially distributed electric vehicle charging) are integrated into distribution grids, static distribution simulations are no longer sufficient for performing modeling and analysis. GridLAB-D is an agent-based distribution system simulation environment that allows fine-grained end-user models, including geospatial and network topology detail. A problem exists in that, without outside intervention, once the GridLAB-D simulation begins execution, it will run to completion without allowing the real-time interaction of Smart Grid controls, such as home energy management systems and aggregator control. We address this lack of runtime interaction by designing a flexible communication interface, Bus.pymore » (pronounced bus-dot-pie), that uses Python to pass messages between one or more GridLAB-D instances and a Smart Grid simulator. This work describes the design and implementation of Bus.py, discusses its usefulness in terms of some Smart Grid scenarios, and provides an example of an aggregator-based residential demand response system interacting with GridLAB-D through Bus.py. The small scale example demonstrates the validity of the interface and shows that an aggregator using said interface is able to control residential loads in GridLAB-D during runtime to cause a reduction in the peak load on the distribution system in (a) peak reduction and (b) time-of-use pricing cases.« less
A visual interface for generic message translation
DOE Office of Scientific and Technical Information (OSTI.GOV)
Blattner, M.M.; Kou, L.T.; Carlson, J.W.
1988-06-21
This paper is concerned with the translation of data structures we call messages. Messages are an example of a type of data structure encountered in generic data translation. Our objective is to provide a system that the nonprogrammer can use to specify the nature of translations from one type to another. For this reason we selected a visual interface that uses interaction techniques that do not require a knowledge of programming or command languages. The translator must accomplish two tasks: create a mapping between fields in different message types that specifies which fields have similar semantic content, and reformat ormore » translate data specifications within those fields. The translations are accomplished with appropriate, but different, visual metaphors. 14 refs., 4 figs.« less
Cummings, M L
2004-12-01
In the recent development of a human-in-the-loop simulation test bed designed to examine human performance issues for supervisory control of the Navy's new Tactical Tomahawk missile, measurements of operator situation awareness (SA) and workload through secondary tasking were taken through an embedded instant messaging program. Instant message interfaces (otherwise known as "chat"), already a means of communication between Navy ships, allow researchers to query users in real-time in a natural, ecologic setting, and thus provide more realistic and unobtrusive measurements. However, in the course of this testing, results revealed that some subjects fixated on the real-time instant messaging secondary task instead of the primary task of missile control, leading to the overall degradation of mission performance as well as a loss of SA. While this research effort was the first to quantify command and control performance degradation as a result of instant messaging, the military has recognized that in its network centric warfare quest, instant messaging is a critical informal communication tool, but has associated problems. Recently, a military spokesman said that managing chat in current military operations was sometimes a "nightmare," because military personnel have difficulty in handling large amounts of information through chat, and then synthesizing knowledge from this information. This research highlights the need for further investigation of the role of instant messaging interfaces both on task performance and situation awareness, and how the associated problems could be ameliorated through adaptive display design.
Chain of Custody Item Monitor Message Viewer v.1.0 Beta
DOE Office of Scientific and Technical Information (OSTI.GOV)
Schwartz, Steven Robert; Fielder, Laura; Hymel, Ross W.
The CoCIM Message Viewer software allows users to connect to and download messages from a Chain of Custody Item Monitor (CoCIM) connected to a serial port on the user’s computer. The downloaded messages are authenticated and displayed in a Graphical User Interface that allows the user a limited degree of sorting and filtering of the downloaded messages as well as the ability to save downloaded files or to open previously downloaded message history files.
Comparative Implementation of High Performance Computing for Power System Dynamic Simulations
DOE Office of Scientific and Technical Information (OSTI.GOV)
Jin, Shuangshuang; Huang, Zhenyu; Diao, Ruisheng
Dynamic simulation for transient stability assessment is one of the most important, but intensive, computations for power system planning and operation. Present commercial software is mainly designed for sequential computation to run a single simulation, which is very time consuming with a single processer. The application of High Performance Computing (HPC) to dynamic simulations is very promising in accelerating the computing process by parallelizing its kernel algorithms while maintaining the same level of computation accuracy. This paper describes the comparative implementation of four parallel dynamic simulation schemes in two state-of-the-art HPC environments: Message Passing Interface (MPI) and Open Multi-Processing (OpenMP).more » These implementations serve to match the application with dedicated multi-processor computing hardware and maximize the utilization and benefits of HPC during the development process.« less
MPI Runtime Error Detection with MUST: Advances in Deadlock Detection
Hilbrich, Tobias; Protze, Joachim; Schulz, Martin; ...
2013-01-01
The widely used Message Passing Interface (MPI) is complex and rich. As a result, application developers require automated tools to avoid and to detect MPI programming errors. We present the Marmot Umpire Scalable Tool (MUST) that detects such errors with significantly increased scalability. We present improvements to our graph-based deadlock detection approach for MPI, which cover future MPI extensions. Our enhancements also check complex MPI constructs that no previous graph-based detection approach handled correctly. Finally, we present optimizations for the processing of MPI operations that reduce runtime deadlock detection overheads. Existing approaches often require ( p ) analysis time permore » MPI operation, for p processes. We empirically observe that our improvements lead to sub-linear or better analysis time per operation for a wide range of real world applications.« less
SKIRT: Hybrid parallelization of radiative transfer simulations
NASA Astrophysics Data System (ADS)
Verstocken, S.; Van De Putte, D.; Camps, P.; Baes, M.
2017-07-01
We describe the design, implementation and performance of the new hybrid parallelization scheme in our Monte Carlo radiative transfer code SKIRT, which has been used extensively for modelling the continuum radiation of dusty astrophysical systems including late-type galaxies and dusty tori. The hybrid scheme combines distributed memory parallelization, using the standard Message Passing Interface (MPI) to communicate between processes, and shared memory parallelization, providing multiple execution threads within each process to avoid duplication of data structures. The synchronization between multiple threads is accomplished through atomic operations without high-level locking (also called lock-free programming). This improves the scaling behaviour of the code and substantially simplifies the implementation of the hybrid scheme. The result is an extremely flexible solution that adjusts to the number of available nodes, processors and memory, and consequently performs well on a wide variety of computing architectures.
González-Domínguez, Jorge; Remeseiro, Beatriz; Martín, María J
2017-02-01
The analysis of the interference patterns on the tear film lipid layer is a useful clinical test to diagnose dry eye syndrome. This task can be automated with a high degree of accuracy by means of the use of tear film maps. However, the time required by the existing applications to generate them prevents a wider acceptance of this method by medical experts. Multithreading has been previously successfully employed by the authors to accelerate the tear film map definition on multicore single-node machines. In this work, we propose a hybrid message-passing and multithreading parallel approach that further accelerates the generation of tear film maps by exploiting the computational capabilities of distributed-memory systems such as multicore clusters and supercomputers. The algorithm for drawing tear film maps is parallelized using Message Passing Interface (MPI) for inter-node communications and the multithreading support available in the C++11 standard for intra-node parallelization. The original algorithm is modified to reduce the communications and increase the scalability. The hybrid method has been tested on 32 nodes of an Intel cluster (with two 12-core Haswell 2680v3 processors per node) using 50 representative images. Results show that maximum runtime is reduced from almost two minutes using the previous only-multithreaded approach to less than ten seconds using the hybrid method. The hybrid MPI/multithreaded implementation can be used by medical experts to obtain tear film maps in only a few seconds, which will significantly accelerate and facilitate the diagnosis of the dry eye syndrome. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
A Multi-Level Parallelization Concept for High-Fidelity Multi-Block Solvers
NASA Technical Reports Server (NTRS)
Hatay, Ferhat F.; Jespersen, Dennis C.; Guruswamy, Guru P.; Rizk, Yehia M.; Byun, Chansup; Gee, Ken; VanDalsem, William R. (Technical Monitor)
1997-01-01
The integration of high-fidelity Computational Fluid Dynamics (CFD) analysis tools with the industrial design process benefits greatly from the robust implementations that are transportable across a wide range of computer architectures. In the present work, a hybrid domain-decomposition and parallelization concept was developed and implemented into the widely-used NASA multi-block Computational Fluid Dynamics (CFD) packages implemented in ENSAERO and OVERFLOW. The new parallel solver concept, PENS (Parallel Euler Navier-Stokes Solver), employs both fine and coarse granularity in data partitioning as well as data coalescing to obtain the desired load-balance characteristics on the available computer platforms. This multi-level parallelism implementation itself introduces no changes to the numerical results, hence the original fidelity of the packages are identically preserved. The present implementation uses the Message Passing Interface (MPI) library for interprocessor message passing and memory accessing. By choosing an appropriate combination of the available partitioning and coalescing capabilities only during the execution stage, the PENS solver becomes adaptable to different computer architectures from shared-memory to distributed-memory platforms with varying degrees of parallelism. The PENS implementation on the IBM SP2 distributed memory environment at the NASA Ames Research Center obtains 85 percent scalable parallel performance using fine-grain partitioning of single-block CFD domains using up to 128 wide computational nodes. Multi-block CFD simulations of complete aircraft simulations achieve 75 percent perfect load-balanced executions using data coalescing and the two levels of parallelism. SGI PowerChallenge, SGI Origin 2000, and a cluster of workstations are the other platforms where the robustness of the implementation is tested. The performance behavior on the other computer platforms with a variety of realistic problems will be included as this on-going study progresses.
Space Reclamation for Uncoordinated Checkpointing in Message-Passing Systems. Ph.D. Thesis
NASA Technical Reports Server (NTRS)
Wang, Yi-Min
1993-01-01
Checkpointing and rollback recovery are techniques that can provide efficient recovery from transient process failures. In a message-passing system, the rollback of a message sender may cause the rollback of the corresponding receiver, and the system needs to roll back to a consistent set of checkpoints called recovery line. If the processes are allowed to take uncoordinated checkpoints, the above rollback propagation may result in the domino effect which prevents recovery line progression. Traditionally, only obsolete checkpoints before the global recovery line can be discarded, and the necessary and sufficient condition for identifying all garbage checkpoints has remained an open problem. A necessary and sufficient condition for achieving optimal garbage collection is derived and it is proved that the number of useful checkpoints is bounded by N(N+1)/2, where N is the number of processes. The approach is based on the maximum-sized antichain model of consistent global checkpoints and the technique of recovery line transformation and decomposition. It is also shown that, for systems requiring message logging to record in-transit messages, the same approach can be used to achieve optimal message log reclamation. As a final topic, a unifying framework is described by considering checkpoint coordination and exploiting piecewise determinism as mechanisms for bounding rollback propagation, and the applicability of the optimal garbage collection algorithm to domino-free recovery protocols is demonstrated.
ERIC Educational Resources Information Center
Flowers, Arthur; Crandell, Edwin W.
Three auditory perceptual processes (resistance to distortion, selective listening in the form of auditory dedifferentiation, and binaural synthesis) were evaluated by five assessment techniques: (1) low pass filtered speech, (2) accelerated speech, (3) competing messages, (4) accelerated plus competing messages, and (5) binaural synthesis.…
Quantum cluster variational method and message passing algorithms revisited
NASA Astrophysics Data System (ADS)
Domínguez, E.; Mulet, Roberto
2018-02-01
We present a general framework to study quantum disordered systems in the context of the Kikuchi's cluster variational method (CVM). The method relies in the solution of message passing-like equations for single instances or in the iterative solution of complex population dynamic algorithms for an average case scenario. We first show how a standard application of the Kikuchi's CVM can be easily translated to message passing equations for specific instances of the disordered system. We then present an "ad hoc" extension of these equations to a population dynamic algorithm representing an average case scenario. At the Bethe level, these equations are equivalent to the dynamic population equations that can be derived from a proper cavity ansatz. However, at the plaquette approximation, the interpretation is more subtle and we discuss it taking also into account previous results in classical disordered models. Moreover, we develop a formalism to properly deal with the average case scenario using a replica-symmetric ansatz within this CVM for quantum disordered systems. Finally, we present and discuss numerical solutions of the different approximations for the quantum transverse Ising model and the quantum random field Ising model in two-dimensional lattices.
Network device interface for digitally interfacing data channels to a controller a via network
NASA Technical Reports Server (NTRS)
Konz, Daniel W. (Inventor); Ellerbrock, Philip J. (Inventor); Grant, Robert L. (Inventor); Winkelmann, Joseph P. (Inventor)
2006-01-01
The present invention provides a network device interface and method for digitally connecting a plurality of data channels to a controller using a network bus. The network device interface interprets commands and data received from the controller and polls the data channels in accordance with these commands. Specifically, the network device interface receives digital commands and data from the controller, and based on these commands and data, communicates with the data channels to either retrieve data in the case of a sensor or send data to activate an actuator. In one embodiment, the bus controller transmits messages to the network device interface containing a plurality of bits having a value defined by a transition between first and second states in the bits. The network device interface determines timing of the data sequence of the message and uses the determined timing to communicate with the bus controller.
A Toolkit for Designing User Interfaces
1990-03-01
as the NPS IB can provide prototyping capability. Interface generators are available commercially for nearly every computing machine on the market ...structure which holds attributes of the message buffer window is shown in Figure 4.2. The variables nlines and nchars hold the number of lines in the...window its appearance of scrolling 46 /* define a type and structure for the message buffer */ struct messbuf( long nlines ; /* number of lines in the
Accelerating list management for MPI.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hemmert, K. Scott; Rodrigues, Arun F.; Underwood, Keith Douglas
2005-07-01
The latency and throughput of MPI messages are critically important to a range of parallel scientific applications. In many modern networks, both of these performance characteristics are largely driven by the performance of a processor on the network interface. Because of the semantics of MPI, this embedded processor is forced to traverse a linked list of posted receives each time a message is received. As this list grows long, the latency of message reception grows and the throughput of MPI messages decreases. This paper presents a novel hardware feature to handle list management functions on a network interface. By movingmore » functions such as list insertion, list traversal, and list deletion to the hardware unit, latencies are decreased by up to 20% in the zero length queue case with dramatic improvements in the presence of long queues. Similarly, the throughput is increased by up to 10% in the zero length queue case and by nearly 100% in the presence queues of 30 messages.« less
Performance Evaluation of Remote Memory Access (RMA) Programming on Shared Memory Parallel Computers
NASA Technical Reports Server (NTRS)
Jin, Hao-Qiang; Jost, Gabriele; Biegel, Bryan A. (Technical Monitor)
2002-01-01
The purpose of this study is to evaluate the feasibility of remote memory access (RMA) programming on shared memory parallel computers. We discuss different RMA based implementations of selected CFD application benchmark kernels and compare them to corresponding message passing based codes. For the message-passing implementation we use MPI point-to-point and global communication routines. For the RMA based approach we consider two different libraries supporting this programming model. One is a shared memory parallelization library (SMPlib) developed at NASA Ames, the other is the MPI-2 extensions to the MPI Standard. We give timing comparisons for the different implementation strategies and discuss the performance.
Comparing the OpenMP, MPI, and Hybrid Programming Paradigm on an SMP Cluster
NASA Technical Reports Server (NTRS)
Jost, Gabriele; Jin, Hao-Qiang; anMey, Dieter; Hatay, Ferhat F.
2003-01-01
Clusters of SMP (Symmetric Multi-Processors) nodes provide support for a wide range of parallel programming paradigms. The shared address space within each node is suitable for OpenMP parallelization. Message passing can be employed within and across the nodes of a cluster. Multiple levels of parallelism can be achieved by combining message passing and OpenMP parallelization. Which programming paradigm is the best will depend on the nature of the given problem, the hardware components of the cluster, the network, and the available software. In this study we compare the performance of different implementations of the same CFD benchmark application, using the same numerical algorithm but employing different programming paradigms.
ADAPTIVE TETRAHEDRAL GRID REFINEMENT AND COARSENING IN MESSAGE-PASSING ENVIRONMENTS
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hallberg, J.; Stagg, A.
2000-10-01
A grid refinement and coarsening scheme has been developed for tetrahedral and triangular grid-based calculations in message-passing environments. The element adaption scheme is based on an edge bisection of elements marked for refinement by an appropriate error indicator. Hash-table/linked-list data structures are used to store nodal and element formation. The grid along inter-processor boundaries is refined and coarsened consistently with the update of these data structures via MPI calls. The parallel adaption scheme has been applied to the solution of a transient, three-dimensional, nonlinear, groundwater flow problem. Timings indicate efficiency of the grid refinement process relative to the flow solvermore » calculations.« less
Accelerating atomistic calculations of quantum energy eigenstates on graphic cards
NASA Astrophysics Data System (ADS)
Rodrigues, Walter; Pecchia, A.; Lopez, M.; Auf der Maur, M.; Di Carlo, A.
2014-10-01
Electronic properties of nanoscale materials require the calculation of eigenvalues and eigenvectors of large matrices. This bottleneck can be overcome by parallel computing techniques or the introduction of faster algorithms. In this paper we report a custom implementation of the Lanczos algorithm with simple restart, optimized for graphical processing units (GPUs). The whole algorithm has been developed using CUDA and runs entirely on the GPU, with a specialized implementation that spares memory and reduces at most machine-to-device data transfers. Furthermore parallel distribution over several GPUs has been attained using the standard message passing interface (MPI). Benchmark calculations performed on a GaN/AlGaN wurtzite quantum dot with up to 600,000 atoms are presented. The empirical tight-binding (ETB) model with an sp3d5s∗+spin-orbit parametrization has been used to build the system Hamiltonian (H).
Ice-sheet modelling accelerated by graphics cards
NASA Astrophysics Data System (ADS)
Brædstrup, Christian Fredborg; Damsgaard, Anders; Egholm, David Lundbek
2014-11-01
Studies of glaciers and ice sheets have increased the demand for high performance numerical ice flow models over the past decades. When exploring the highly non-linear dynamics of fast flowing glaciers and ice streams, or when coupling multiple flow processes for ice, water, and sediment, researchers are often forced to use super-computing clusters. As an alternative to conventional high-performance computing hardware, the Graphical Processing Unit (GPU) is capable of massively parallel computing while retaining a compact design and low cost. In this study, we present a strategy for accelerating a higher-order ice flow model using a GPU. By applying the newest GPU hardware, we achieve up to 180× speedup compared to a similar but serial CPU implementation. Our results suggest that GPU acceleration is a competitive option for ice-flow modelling when compared to CPU-optimised algorithms parallelised by the OpenMP or Message Passing Interface (MPI) protocols.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Clough, Katy; Figueras, Pau; Finkel, Hal
In this work, we introduce GRChombo: a new numerical relativity code which incorporates full adaptive mesh refinement (AMR) using block structured Berger-Rigoutsos grid generation. The code supports non-trivial 'many-boxes-in-many-boxes' mesh hierarchies and massive parallelism through the message passing interface. GRChombo evolves the Einstein equation using the standard BSSN formalism, with an option to turn on CCZ4 constraint damping if required. The AMR capability permits the study of a range of new physics which has previously been computationally infeasible in a full 3 + 1 setting, while also significantly simplifying the process of setting up the mesh for these problems. Wemore » show that GRChombo can stably and accurately evolve standard spacetimes such as binary black hole mergers and scalar collapses into black holes, demonstrate the performance characteristics of our code, and discuss various physics problems which stand to benefit from the AMR technique.« less
DOE Office of Scientific and Technical Information (OSTI.GOV)
Shipman, Galen M.
These are the slides for a presentation on programming models in HPC, at the Los Alamos National Laboratory's Parallel Computing Summer School. The following topics are covered: Flynn's Taxonomy of computer architectures; single instruction single data; single instruction multiple data; multiple instruction multiple data; address space organization; definition of Trinity (Intel Xeon-Phi is a MIMD architecture); single program multiple data; multiple program multiple data; ExMatEx workflow overview; definition of a programming model, programming languages, runtime systems; programming model and environments; MPI (Message Passing Interface); OpenMP; Kokkos (Performance Portable Thread-Parallel Programming Model); Kokkos abstractions, patterns, policies, and spaces; RAJA, a systematicmore » approach to node-level portability and tuning; overview of the Legion Programming Model; mapping tasks and data to hardware resources; interoperability: supporting task-level models; Legion S3D execution and performance details; workflow, integration of external resources into the programming model.« less
DOE Office of Scientific and Technical Information (OSTI.GOV)
Sayan Ghosh, Jeff Hammond
OpenSHMEM is a community effort to unifyt and standardize the SHMEM programming model. MPI (Message Passing Interface) is a well-known community standard for parallel programming using distributed memory. The most recen t release of MPI, version 3.0, was designed in part to support programming models like SHMEM.OSHMPI is an implementation of the OpenSHMEM standard using MPI-3 for the Linux operating system. It is the first implementation of SHMEM over MPI one-sided communication and has the potential to be widely adopted due to the portability and widely availability of Linux and MPI-3. OSHMPI has been tested on a variety of systemsmore » and implementations of MPI-3, includingInfiniBand clusters using MVAPICH2 and SGI shared-memory supercomputers using MPICH. Current support is limited to Linux but may be extended to Apple OSX if there is sufficient interest. The code is opensource via https://github.com/jeffhammond/oshmpi« less
NASA Astrophysics Data System (ADS)
Qiang, Ji
2017-10-01
A three-dimensional (3D) Poisson solver with longitudinal periodic and transverse open boundary conditions can have important applications in beam physics of particle accelerators. In this paper, we present a fast efficient method to solve the Poisson equation using a spectral finite-difference method. This method uses a computational domain that contains the charged particle beam only and has a computational complexity of O(Nu(logNmode)) , where Nu is the total number of unknowns and Nmode is the maximum number of longitudinal or azimuthal modes. This saves both the computational time and the memory usage of using an artificial boundary condition in a large extended computational domain. The new 3D Poisson solver is parallelized using a message passing interface (MPI) on multi-processor computers and shows a reasonable parallel performance up to hundreds of processor cores.
Facilitating Co-Design for Extreme-Scale Systems Through Lightweight Simulation
DOE Office of Scientific and Technical Information (OSTI.GOV)
Engelmann, Christian; Lauer, Frank
This work focuses on tools for investigating algorithm performance at extreme scale with millions of concurrent threads and for evaluating the impact of future architecture choices to facilitate the co-design of high-performance computing (HPC) architectures and applications. The approach focuses on lightweight simulation of extreme-scale HPC systems with the needed amount of accuracy. The prototype presented in this paper is able to provide this capability using a parallel discrete event simulation (PDES), such that a Message Passing Interface (MPI) application can be executed at extreme scale, and its performance properties can be evaluated. The results of an initial prototype aremore » encouraging as a simple 'hello world' MPI program could be scaled up to 1,048,576 virtual MPI processes on a four-node cluster, and the performance properties of two MPI programs could be evaluated at up to 16,384 virtual MPI processes on the same system.« less
A 3D staggered-grid finite difference scheme for poroelastic wave equation
NASA Astrophysics Data System (ADS)
Zhang, Yijie; Gao, Jinghuai
2014-10-01
Three dimensional numerical modeling has been a viable tool for understanding wave propagation in real media. The poroelastic media can better describe the phenomena of hydrocarbon reservoirs than acoustic and elastic media. However, the numerical modeling in 3D poroelastic media demands significantly more computational capacity, including both computational time and memory. In this paper, we present a 3D poroelastic staggered-grid finite difference (SFD) scheme. During the procedure, parallel computing is implemented to reduce the computational time. Parallelization is based on domain decomposition, and communication between processors is performed using message passing interface (MPI). Parallel analysis shows that the parallelized SFD scheme significantly improves the simulation efficiency and 3D decomposition in domain is the most efficient. We also analyze the numerical dispersion and stability condition of the 3D poroelastic SFD method. Numerical results show that the 3D numerical simulation can provide a real description of wave propagation.
Parallel file system with metadata distributed across partitioned key-value store c
Bent, John M.; Faibish, Sorin; Grider, Gary; Torres, Aaron
2017-09-19
Improved techniques are provided for storing metadata associated with a plurality of sub-files associated with a single shared file in a parallel file system. The shared file is generated by a plurality of applications executing on a plurality of compute nodes. A compute node implements a Parallel Log Structured File System (PLFS) library to store at least one portion of the shared file generated by an application executing on the compute node and metadata for the at least one portion of the shared file on one or more object storage servers. The compute node is also configured to implement a partitioned data store for storing a partition of the metadata for the shared file, wherein the partitioned data store communicates with partitioned data stores on other compute nodes using a message passing interface. The partitioned data store can be implemented, for example, using Multidimensional Data Hashing Indexing Middleware (MDHIM).
Regularization with numerical extrapolation for finite and UV-divergent multi-loop integrals
NASA Astrophysics Data System (ADS)
de Doncker, E.; Yuasa, F.; Kato, K.; Ishikawa, T.; Kapenga, J.; Olagbemi, O.
2018-03-01
We give numerical integration results for Feynman loop diagrams such as those covered by Laporta (2000) and by Baikov and Chetyrkin (2010), and which may give rise to loop integrals with UV singularities. We explore automatic adaptive integration using multivariate techniques from the PARINT package for multivariate integration, as well as iterated integration with programs from the QUADPACK package, and a trapezoidal method based on a double exponential transformation. PARINT is layered over MPI (Message Passing Interface), and incorporates advanced parallel/distributed techniques including load balancing among processes that may be distributed over a cluster or a network/grid of nodes. Results are included for 2-loop vertex and box diagrams and for sets of 2-, 3- and 4-loop self-energy diagrams with or without UV terms. Numerical regularization of integrals with singular terms is achieved by linear and non-linear extrapolation methods.
Networking and AI systems: Requirements and benefits
NASA Technical Reports Server (NTRS)
1988-01-01
The price performance benefits of network systems is well documented. The ability to share expensive resources sold timesharing for mainframes, department clusters of minicomputers, and now local area networks of workstations and servers. In the process, other fundamental system requirements emerged. These have now been generalized with open system requirements for hardware, software, applications and tools. The ability to interconnect a variety of vendor products has led to a specification of interfaces that allow new techniques to extend existing systems for new and exciting applications. As an example of the message passing system, local area networks provide a testbed for many of the issues addressed by future concurrent architectures: synchronization, load balancing, fault tolerance and scalability. Gold Hill has been working with a number of vendors on distributed architectures that range from a network of workstations to a hypercube of microprocessors with distributed memory. Results from early applications are promising both for performance and scalability.
Secure UNIX socket-based controlling system for high-throughput protein crystallography experiments.
Gaponov, Yurii; Igarashi, Noriyuki; Hiraki, Masahiko; Sasajima, Kumiko; Matsugaki, Naohiro; Suzuki, Mamoru; Kosuge, Takashi; Wakatsuki, Soichi
2004-01-01
A control system for high-throughput protein crystallography experiments has been developed based on a multilevel secure (SSL v2/v3) UNIX socket under the Linux operating system. Main features of protein crystallography experiments (purification, crystallization, loop preparation, data collecting, data processing) are dealt with by the software. All information necessary to perform protein crystallography experiments is stored (except raw X-ray data, that are stored in Network File Server) in a relational database (MySQL). The system consists of several servers and clients. TCP/IP secure UNIX sockets with four predefined behaviors [(a) listening to a request followed by a reply, (b) sending a request and waiting for a reply, (c) listening to a broadcast message, and (d) sending a broadcast message] support communications between all servers and clients allowing one to control experiments, view data, edit experimental conditions and perform data processing remotely. The usage of the interface software is well suited for developing well organized control software with a hierarchical structure of different software units (Gaponov et al., 1998), which will pass and receive different types of information. All communication is divided into two parts: low and top levels. Large and complicated control tasks are split into several smaller ones, which can be processed by control clients independently. For communicating with experimental equipment (beamline optical elements, robots, and specialized experimental equipment etc.), the STARS server, developed at the Photon Factory, is used (Kosuge et al., 2002). The STARS server allows any application with an open socket to be connected with any other clients that control experimental equipment. Majority of the source code is written in C/C++. GUI modules of the system were built mainly using Glade user interface builder for GTK+ and Gnome under Red Hat Linux 7.1 operating system.
PELE web server: atomistic study of biomolecular systems at your fingertips.
Madadkar-Sobhani, Armin; Guallar, Victor
2013-07-01
PELE, Protein Energy Landscape Exploration, our novel technology based on protein structure prediction algorithms and a Monte Carlo sampling, is capable of modelling the all-atom protein-ligand dynamical interactions in an efficient and fast manner, with two orders of magnitude reduced computational cost when compared with traditional molecular dynamics techniques. PELE's heuristic approach generates trial moves based on protein and ligand perturbations followed by side chain sampling and global/local minimization. The collection of accepted steps forms a stochastic trajectory. Furthermore, several processors may be run in parallel towards a collective goal or defining several independent trajectories; the whole procedure has been parallelized using the Message Passing Interface. Here, we introduce the PELE web server, designed to make the whole process of running simulations easier and more practical by minimizing input file demand, providing user-friendly interface and producing abstract outputs (e.g. interactive graphs and tables). The web server has been implemented in C++ using Wt (http://www.webtoolkit.eu) and MySQL (http://www.mysql.com). The PELE web server, accessible at http://pele.bsc.es, is free and open to all users with no login requirement.
1992-10-01
5 2.1 Network Overview ................................................ 5 2.2 Network Access Methods...to TAC and ?4ini-TAC users, such as common error messages, TAC commands, and instructions for performing file tranders. Section 5 , Network Use...originally known as Interface Message Processors, or IMPs. 5 THE DEFENSE DATA NETWORK DRAFt NIC 60001, October 1992 message do not necessarily take the same
NASA Astrophysics Data System (ADS)
Huang, Haiping
2017-05-01
Revealing hidden features in unlabeled data is called unsupervised feature learning, which plays an important role in pretraining a deep neural network. Here we provide a statistical mechanics analysis of the unsupervised learning in a restricted Boltzmann machine with binary synapses. A message passing equation to infer the hidden feature is derived, and furthermore, variants of this equation are analyzed. A statistical analysis by replica theory describes the thermodynamic properties of the model. Our analysis confirms an entropy crisis preceding the non-convergence of the message passing equation, suggesting a discontinuous phase transition as a key characteristic of the restricted Boltzmann machine. Continuous phase transition is also confirmed depending on the embedded feature strength in the data. The mean-field result under the replica symmetric assumption agrees with that obtained by running message passing algorithms on single instances of finite sizes. Interestingly, in an approximate Hopfield model, the entropy crisis is absent, and a continuous phase transition is observed instead. We also develop an iterative equation to infer the hyper-parameter (temperature) hidden in the data, which in physics corresponds to iteratively imposing Nishimori condition. Our study provides insights towards understanding the thermodynamic properties of the restricted Boltzmann machine learning, and moreover important theoretical basis to build simplified deep networks.
Principles for problem aggregation and assignment in medium scale multiprocessors
NASA Technical Reports Server (NTRS)
Nicol, David M.; Saltz, Joel H.
1987-01-01
One of the most important issues in parallel processing is the mapping of workload to processors. This paper considers a large class of problems having a high degree of potential fine grained parallelism, and execution requirements that are either not predictable, or are too costly to predict. The main issues in mapping such a problem onto medium scale multiprocessors are those of aggregation and assignment. We study a method of parameterized aggregation that makes few assumptions about the workload. The mapping of aggregate units of work onto processors is uniform, and exploits locality of workload intensity to balance the unknown workload. In general, a finer aggregate granularity leads to a better balance at the price of increased communication/synchronization costs; the aggregation parameters can be adjusted to find a reasonable granularity. The effectiveness of this scheme is demonstrated on three model problems: an adaptive one-dimensional fluid dynamics problem with message passing, a sparse triangular linear system solver on both a shared memory and a message-passing machine, and a two-dimensional time-driven battlefield simulation employing message passing. Using the model problems, the tradeoffs are studied between balanced workload and the communication/synchronization costs. Finally, an analytical model is used to explain why the method balances workload and minimizes the variance in system behavior.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Faraj, Daniel A.
Algorithm selection for data communications in a parallel active messaging interface (`PAMI`) of a parallel computer, the PAMI composed of data communications endpoints, each endpoint including specifications of a client, a context, and a task, endpoints coupled for data communications through the PAMI, including associating in the PAMI data communications algorithms and bit masks; receiving in an origin endpoint of the PAMI a collective instruction, the instruction specifying transmission of a data communications message from the origin endpoint to a target endpoint; constructing a bit mask for the received collective instruction; selecting, from among the associated algorithms and bit masks,more » a data communications algorithm in dependence upon the constructed bit mask; and executing the collective instruction, transmitting, according to the selected data communications algorithm from the origin endpoint to the target endpoint, the data communications message.« less
Faraj, Daniel A
2013-07-16
Algorithm selection for data communications in a parallel active messaging interface (`PAMI`) of a parallel computer, the PAMI composed of data communications endpoints, each endpoint including specifications of a client, a context, and a task, endpoints coupled for data communications through the PAMI, including associating in the PAMI data communications algorithms and bit masks; receiving in an origin endpoint of the PAMI a collective instruction, the instruction specifying transmission of a data communications message from the origin endpoint to a target endpoint; constructing a bit mask for the received collective instruction; selecting, from among the associated algorithms and bit masks, a data communications algorithm in dependence upon the constructed bit mask; and executing the collective instruction, transmitting, according to the selected data communications algorithm from the origin endpoint to the target endpoint, the data communications message.
Defining the ATC Controller Interface for Data Link Clearances
NASA Technical Reports Server (NTRS)
Rankin, James
1998-01-01
The Controller Interface (CI) is the primary method for Air Traffic Controllers to communicate with aircraft via Controller-Pilot Data Link Communications (CPDLC). The controller, wearing a microphone/headset, aurally gives instructions to aircraft as he/she would with today's voice radio systems. The CI's voice recognition system converts the instructions to digitized messages that are formatted according to the RTCA DO-219 Operational Performance Standards for ATC Two-Way Data Link Communications. The DO-219 messages are transferred via RS-232 to the ATIDS system for uplink using a Mode-S datalink. Pilot acknowledgments of controller messages are downlinked to the ATIDS system and transferred to the Cl. A computer monitor is used to convey information to the controller. Aircraft data from the ARTS database are displayed on flight strips. The flight strips are electronic versions of the strips currently used in the ATC system. Outgoing controller messages cause the respective strip to change color to indicate an unacknowledged transmission. The message text is shown on the flight strips for reference. When the pilot acknowledges the message, the strip returns to its normal color. A map of the airport can also be displayed on the monitor. In addition to voice recognition, the controller can enter messages using the monitor's touch screen or by mouse/keyboard.
Compiled MPI: Cost-Effective Exascale Applications Development
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bronevetsky, G; Quinlan, D; Lumsdaine, A
2012-04-10
The complexity of petascale and exascale machines makes it increasingly difficult to develop applications that can take advantage of them. Future systems are expected to feature billion-way parallelism, complex heterogeneous compute nodes and poor availability of memory (Peter Kogge, 2008). This new challenge for application development is motivating a significant amount of research and development on new programming models and runtime systems designed to simplify large-scale application development. Unfortunately, DoE has significant multi-decadal investment in a large family of mission-critical scientific applications. Scaling these applications to exascale machines will require a significant investment that will dwarf the costs of hardwaremore » procurement. A key reason for the difficulty in transitioning today's applications to exascale hardware is their reliance on explicit programming techniques, such as the Message Passing Interface (MPI) programming model to enable parallelism. MPI provides a portable and high performance message-passing system that enables scalable performance on a wide variety of platforms. However, it also forces developers to lock the details of parallelization together with application logic, making it very difficult to adapt the application to significant changes in the underlying system. Further, MPI's explicit interface makes it difficult to separate the application's synchronization and communication structure, reducing the amount of support that can be provided by compiler and run-time tools. This is in contrast to the recent research on more implicit parallel programming models such as Chapel, OpenMP and OpenCL, which promise to provide significantly more flexibility at the cost of reimplementing significant portions of the application. We are developing CoMPI, a novel compiler-driven approach to enable existing MPI applications to scale to exascale systems with minimal modifications that can be made incrementally over the application's lifetime. It includes: (1) New set of source code annotations, inserted either manually or automatically, that will clarify the application's use of MPI to the compiler infrastructure, enabling greater accuracy where needed; (2) A compiler transformation framework that leverages these annotations to transform the original MPI source code to improve its performance and scalability; (3) Novel MPI runtime implementation techniques that will provide a rich set of functionality extensions to be used by applications that have been transformed by our compiler; and (4) A novel compiler analysis that leverages simple user annotations to automatically extract the application's communication structure and synthesize most complex code annotations.« less
3-Dimensional Marine CSEM Modeling by Employing TDFEM with Parallel Solvers
NASA Astrophysics Data System (ADS)
Wu, X.; Yang, T.
2013-12-01
In this paper, parallel fulfillment is developed for forward modeling of the 3-Dimensional controlled source electromagnetic (CSEM) by using time-domain finite element method (TDFEM). Recently, a greater attention rises on research of hydrocarbon (HC) reservoir detection mechanism in the seabed. Since China has vast ocean resources, seeking hydrocarbon reservoirs become significant in the national economy. However, traditional methods of seismic exploration shown a crucial obstacle to detect hydrocarbon reservoirs in the seabed with a complex structure, due to relatively high acquisition costs and high-risking exploration. In addition, the development of EM simulations typically requires both a deep knowledge of the computational electromagnetics (CEM) and a proper use of sophisticated techniques and tools from computer science. However, the complexity of large-scale EM simulations often requires large memory because of a large amount of data, or solution time to address problems concerning matrix solvers, function transforms, optimization, etc. The objective of this paper is to present parallelized implementation of the time-domain finite element method for analysis of three-dimensional (3D) marine controlled source electromagnetic problems. Firstly, we established a three-dimensional basic background model according to the seismic data, then electromagnetic simulation of marine CSEM was carried out by using time-domain finite element method, which works on a MPI (Message Passing Interface) platform with exact orientation to allow fast detecting of hydrocarbons targets in ocean environment. To speed up the calculation process, SuperLU of an MPI (Message Passing Interface) version called SuperLU_DIST is employed in this approach. Regarding the representation of three-dimension seabed terrain with sense of reality, the region is discretized into an unstructured mesh rather than a uniform one in order to reduce the number of unknowns. Moreover, high-order Whitney vector basis functions are used for spatial discretization within the finite element approach to approximate the electric field. A horizontal electric dipole was used as a source, and an array of the receiver located at the seabed. To capture the presence of the hydrocarbon layer, the forward responses at water depths from 100m to 3000m are calculated. The normalized Magnitude Versus Offset (N-MVO) and Phase Versus Offset (PVO) curve can reflect resistive characteristics of hydrocarbon layers. For future work, Graphics Process Unit (GPU) acceleration algorithm would be carried out to multiply the calculation efficiency greatly.
2013-08-20
Earth observation taken during day pass by an Expedition 36 crew member on board the International Space Station (ISS). Per Twitter message: Looking southwest over northern Africa. Libya, Algeria, Niger.
Application of a distributed network in computational fluid dynamic simulations
NASA Technical Reports Server (NTRS)
Deshpande, Manish; Feng, Jinzhang; Merkle, Charles L.; Deshpande, Ashish
1994-01-01
A general-purpose 3-D, incompressible Navier-Stokes algorithm is implemented on a network of concurrently operating workstations using parallel virtual machine (PVM) and compared with its performance on a CRAY Y-MP and on an Intel iPSC/860. The problem is relatively computationally intensive, and has a communication structure based primarily on nearest-neighbor communication, making it ideally suited to message passing. Such problems are frequently encountered in computational fluid dynamics (CDF), and their solution is increasingly in demand. The communication structure is explicitly coded in the implementation to fully exploit the regularity in message passing in order to produce a near-optimal solution. Results are presented for various grid sizes using up to eight processors.
Approximate message passing with restricted Boltzmann machine priors
NASA Astrophysics Data System (ADS)
Tramel, Eric W.; Drémeau, Angélique; Krzakala, Florent
2016-07-01
Approximate message passing (AMP) has been shown to be an excellent statistical approach to signal inference and compressed sensing problems. The AMP framework provides modularity in the choice of signal prior; here we propose a hierarchical form of the Gauss-Bernoulli prior which utilizes a restricted Boltzmann machine (RBM) trained on the signal support to push reconstruction performance beyond that of simple i.i.d. priors for signals whose support can be well represented by a trained binary RBM. We present and analyze two methods of RBM factorization and demonstrate how these affect signal reconstruction performance within our proposed algorithm. Finally, using the MNIST handwritten digit dataset, we show experimentally that using an RBM allows AMP to approach oracle-support performance.
Passing messages between biological networks to refine predicted interactions.
Glass, Kimberly; Huttenhower, Curtis; Quackenbush, John; Yuan, Guo-Cheng
2013-01-01
Regulatory network reconstruction is a fundamental problem in computational biology. There are significant limitations to such reconstruction using individual datasets, and increasingly people attempt to construct networks using multiple, independent datasets obtained from complementary sources, but methods for this integration are lacking. We developed PANDA (Passing Attributes between Networks for Data Assimilation), a message-passing model using multiple sources of information to predict regulatory relationships, and used it to integrate protein-protein interaction, gene expression, and sequence motif data to reconstruct genome-wide, condition-specific regulatory networks in yeast as a model. The resulting networks were not only more accurate than those produced using individual data sets and other existing methods, but they also captured information regarding specific biological mechanisms and pathways that were missed using other methodologies. PANDA is scalable to higher eukaryotes, applicable to specific tissue or cell type data and conceptually generalizable to include a variety of regulatory, interaction, expression, and other genome-scale data. An implementation of the PANDA algorithm is available at www.sourceforge.net/projects/panda-net.
NASA Technical Reports Server (NTRS)
Long, M. S.; Yantosca, R.; Nielsen, J. E; Keller, C. A.; Da Silva, A.; Sulprizio, M. P.; Pawson, S.; Jacob, D. J.
2015-01-01
The GEOS-Chem global chemical transport model (CTM), used by a large atmospheric chemistry research community, has been re-engineered to also serve as an atmospheric chemistry module for Earth system models (ESMs). This was done using an Earth System Modeling Framework (ESMF) interface that operates independently of the GEOSChem scientific code, permitting the exact same GEOSChem code to be used as an ESM module or as a standalone CTM. In this manner, the continual stream of updates contributed by the CTM user community is automatically passed on to the ESM module, which remains state of science and referenced to the latest version of the standard GEOS-Chem CTM. A major step in this re-engineering was to make GEOS-Chem grid independent, i.e., capable of using any geophysical grid specified at run time. GEOS-Chem data sockets were also created for communication between modules and with external ESM code. The grid-independent, ESMF-compatible GEOS-Chem is now the standard version of the GEOS-Chem CTM. It has been implemented as an atmospheric chemistry module into the NASA GEOS- 5 ESM. The coupled GEOS-5-GEOS-Chem system was tested for scalability and performance with a tropospheric oxidant-aerosol simulation (120 coupled species, 66 transported tracers) using 48-240 cores and message-passing interface (MPI) distributed-memory parallelization. Numerical experiments demonstrate that the GEOS-Chem chemistry module scales efficiently for the number of cores tested, with no degradation as the number of cores increases. Although inclusion of atmospheric chemistry in ESMs is computationally expensive, the excellent scalability of the chemistry module means that the relative cost goes down with increasing number of cores in a massively parallel environment.
Earth observation taken by the Expedition 46 crew
2016-01-23
ISS046e021993 (01/23/2016) --- Earth observation of the coast of Oman taken during a night pass by the Expedition 46 crew aboard the International Space Station. NASA astronaut Tim Kopra tweeted this image out with this message: "Passing over the Gulf of #Oman at night -- city lights of #Muscat #Dubai #AbuDhabi and #Doha in the distance".
Recommended Practices for Interactive Video Portability
1990-10-01
3-9 4. Implementation details 4-1 4.1 Installation issues ....................... 4-1 April 15, 1990 Release R 1.0 vii contents 4.1.1 VDI ...passed via an ASCII or binary application interface to the Virtual Device Interface ( VDI ) Management Software. ’ VDI Management, in turn, executes...the commands by calling appropriate low-level services and passes responses back to the application via the application interface. VDI Manage- ment is
Mean-field message-passing equations in the Hopfield model and its generalizations
NASA Astrophysics Data System (ADS)
Mézard, Marc
2017-02-01
Motivated by recent progress in using restricted Boltzmann machines as preprocessing algorithms for deep neural network, we revisit the mean-field equations [belief-propagation and Thouless-Anderson Palmer (TAP) equations] in the best understood of such machines, namely the Hopfield model of neural networks, and we explicit how they can be used as iterative message-passing algorithms, providing a fast method to compute the local polarizations of neurons. In the "retrieval phase", where neurons polarize in the direction of one memorized pattern, we point out a major difference between the belief propagation and TAP equations: The set of belief propagation equations depends on the pattern which is retrieved, while one can use a unique set of TAP equations. This makes the latter method much better suited for applications in the learning process of restricted Boltzmann machines. In the case where the patterns memorized in the Hopfield model are not independent, but are correlated through a combinatorial structure, we show that the TAP equations have to be modified. This modification can be seen either as an alteration of the reaction term in TAP equations or, more interestingly, as the consequence of message passing on a graphical model with several hidden layers, where the number of hidden layers depends on the depth of the correlations in the memorized patterns. This layered structure is actually necessary when one deals with more general restricted Boltzmann machines.
2013-08-03
Earth observation taken during day pass by an Expedition 36 crew member on board the International Space Station (ISS). Per Twitter message: Perhaps a dandelion losing its seeds in the wind? Love clouds!
Paralysis is the loss of muscle function in part of your body. It happens when something goes ... way messages pass between your brain and muscles. Paralysis can be complete or partial. It can occur ...
Advanced Numerical Techniques of Performance Evaluation. Volume 2
1990-06-01
multiprocessor environment. This factor is determined by the overhead of the primitives available in the system ( semaphore , monitor , or message... semaphore , monitor , or message passing primitives ) and U the programming ability of the user who implements the simulation. " t,: the sequential...Warp Operating System . i Pro" lftevcnth ACM Symposum on Operating Systems Princlplcs, pages 77 9:3, Auslin, TX, Nov wicr 1987. ACM. [121 D.R. Jefferson
Benchmarking hypercube hardware and software
NASA Technical Reports Server (NTRS)
Grunwald, Dirk C.; Reed, Daniel A.
1986-01-01
It was long a truism in computer systems design that balanced systems achieve the best performance. Message passing parallel processors are no different. To quantify the balance of a hypercube design, an experimental methodology was developed and the associated suite of benchmarks was applied to several existing hypercubes. The benchmark suite includes tests of both processor speed in the absence of internode communication and message transmission speed as a function of communication patterns.
Data transfer using complete bipartite graph
NASA Astrophysics Data System (ADS)
Chandrasekaran, V. M.; Praba, B.; Manimaran, A.; Kailash, G.
2017-11-01
Information exchange extent is an estimation of the amount of information sent between two focuses on a framework in a given time period. It is an extremely significant perception in present world. There are many ways of message passing in the present situations. Some of them are through encryption, decryption, by using complete bipartite graph. In this paper, we recommend a method for communication using messages through encryption of a complete bipartite graph.
Interface Message Processors for the ARPA Computer Network
1975-04-01
Pluribus IMP construction and checkout; sizeable changes to the i*4P message-processing algorithms: and Satellite IMP issues. The IMP message...extremely low cost modification design. We have begun to consider changes to the MLC design which would enable the MLC to suppress continuous breaks...existing authentication mechanisms need not make these changes . 2.7 Other Topics During the first quarter BBN constructed an environmental test chamber
Trellis Tone Modulation Multiple-Access for Peer Discovery in D2D Networks
Lim, Chiwoo; Kim, Sang-Hyo
2018-01-01
In this paper, a new non-orthogonal multiple-access scheme, trellis tone modulation multiple-access (TTMMA), is proposed for peer discovery of distributed device-to-device (D2D) communication. The range and capacity of discovery are important performance metrics in peer discovery. The proposed trellis tone modulation uses single-tone transmission and achieves a long discovery range due to its low Peak-to-Average Power Ratio (PAPR). The TTMMA also exploits non-orthogonal resource assignment to increase the discovery capacity. For the multi-user detection of superposed multiple-access signals, a message-passing algorithm with supplementary schemes are proposed. With TTMMA and its message-passing demodulation, approximately 1.5 times the number of devices are discovered compared to the conventional frequency division multiple-access (FDMA)-based discovery. PMID:29673167
Statistical physics of hard combinatorial optimization: Vertex cover problem
NASA Astrophysics Data System (ADS)
Zhao, Jin-Hua; Zhou, Hai-Jun
2014-07-01
Typical-case computation complexity is a research topic at the boundary of computer science, applied mathematics, and statistical physics. In the last twenty years, the replica-symmetry-breaking mean field theory of spin glasses and the associated message-passing algorithms have greatly deepened our understanding of typical-case computation complexity. In this paper, we use the vertex cover problem, a basic nondeterministic-polynomial (NP)-complete combinatorial optimization problem of wide application, as an example to introduce the statistical physical methods and algorithms. We do not go into the technical details but emphasize mainly the intuitive physical meanings of the message-passing equations. A nonfamiliar reader shall be able to understand to a large extent the physics behind the mean field approaches and to adjust the mean field methods in solving other optimization problems.
Trellis Tone Modulation Multiple-Access for Peer Discovery in D2D Networks.
Lim, Chiwoo; Jang, Min; Kim, Sang-Hyo
2018-04-17
In this paper, a new non-orthogonal multiple-access scheme, trellis tone modulation multiple-access (TTMMA), is proposed for peer discovery of distributed device-to-device (D2D) communication. The range and capacity of discovery are important performance metrics in peer discovery. The proposed trellis tone modulation uses single-tone transmission and achieves a long discovery range due to its low Peak-to-Average Power Ratio (PAPR). The TTMMA also exploits non-orthogonal resource assignment to increase the discovery capacity. For the multi-user detection of superposed multiple-access signals, a message-passing algorithm with supplementary schemes are proposed. With TTMMA and its message-passing demodulation, approximately 1.5 times the number of devices are discovered compared to the conventional frequency division multiple-access (FDMA)-based discovery.
Moghadasi, Mohammad; Kozakov, Dima; Mamonov, Artem B.; Vakili, Pirooz; Vajda, Sandor; Paschalidis, Ioannis Ch.
2013-01-01
We introduce a message-passing algorithm to solve the Side Chain Positioning (SCP) problem. SCP is a crucial component of protein docking refinement, which is a key step of an important class of problems in computational structural biology called protein docking. We model SCP as a combinatorial optimization problem and formulate it as a Maximum Weighted Independent Set (MWIS) problem. We then employ a modified and convergent belief-propagation algorithm to solve a relaxation of MWIS and develop randomized estimation heuristics that use the relaxed solution to obtain an effective MWIS feasible solution. Using a benchmark set of protein complexes we demonstrate that our approach leads to more accurate docking predictions compared to a baseline algorithm that does not solve the SCP. PMID:23515575
Multi-partitioning for ADI-schemes on message passing architectures
NASA Technical Reports Server (NTRS)
Vanderwijngaart, Rob F.
1994-01-01
A kind of discrete-operator splitting called Alternating Direction Implicit (ADI) has been found to be useful in simulating fluid flow problems. In particular, it is being used to study the effects of hot exhaust jets from high performance aircraft on landing surfaces. Decomposition techniques that minimize load imbalance and message-passing frequency are described. Three strategies that are investigated for implementing the NAS Scalar Penta-diagonal Parallel Benchmark (SP) are transposition, pipelined Gaussian elimination, and multipartitioning. The multipartitioning strategy, which was used on Ethernet, was found to be the most efficient, although it was considered only a moderate success because of Ethernet's limited communication properties. The efficiency derived largely from the coarse granularity of the strategy, which reduced latencies and allowed overlap of communication and computation.
Parallel and fault-tolerant algorithms for hypercube multiprocessors
DOE Office of Scientific and Technical Information (OSTI.GOV)
Aykanat, C.
1988-01-01
Several techniques for increasing the performance of parallel algorithms on distributed-memory message-passing multi-processor systems are investigated. These techniques are effectively implemented for the parallelization of the Scaled Conjugate Gradient (SCG) algorithm on a hypercube connected message-passing multi-processor. Significant performance improvement is achieved by using these techniques. The SCG algorithm is used for the solution phase of an FE modeling system. Almost linear speed-up is achieved, and it is shown that hypercube topology is scalable for an FE class of problem. The SCG algorithm is also shown to be suitable for vectorization, and near supercomputer performance is achieved on a vectormore » hypercube multiprocessor by exploiting both parallelization and vectorization. Fault-tolerance issues for the parallel SCG algorithm and for the hypercube topology are also addressed.« less
Ultrascalable petaflop parallel supercomputer
Blumrich, Matthias A [Ridgefield, CT; Chen, Dong [Croton On Hudson, NY; Chiu, George [Cross River, NY; Cipolla, Thomas M [Katonah, NY; Coteus, Paul W [Yorktown Heights, NY; Gara, Alan G [Mount Kisco, NY; Giampapa, Mark E [Irvington, NY; Hall, Shawn [Pleasantville, NY; Haring, Rudolf A [Cortlandt Manor, NY; Heidelberger, Philip [Cortlandt Manor, NY; Kopcsay, Gerard V [Yorktown Heights, NY; Ohmacht, Martin [Yorktown Heights, NY; Salapura, Valentina [Chappaqua, NY; Sugavanam, Krishnan [Mahopac, NY; Takken, Todd [Brewster, NY
2010-07-20
A massively parallel supercomputer of petaOPS-scale includes node architectures based upon System-On-a-Chip technology, where each processing node comprises a single Application Specific Integrated Circuit (ASIC) having up to four processing elements. The ASIC nodes are interconnected by multiple independent networks that optimally maximize the throughput of packet communications between nodes with minimal latency. The multiple networks may include three high-speed networks for parallel algorithm message passing including a Torus, collective network, and a Global Asynchronous network that provides global barrier and notification functions. These multiple independent networks may be collaboratively or independently utilized according to the needs or phases of an algorithm for optimizing algorithm processing performance. The use of a DMA engine is provided to facilitate message passing among the nodes without the expenditure of processing resources at the node.
Incremental Parallelization of Non-Data-Parallel Programs Using the Charon Message-Passing Library
NASA Technical Reports Server (NTRS)
VanderWijngaart, Rob F.
2000-01-01
Message passing is among the most popular techniques for parallelizing scientific programs on distributed-memory architectures. The reasons for its success are wide availability (MPI), efficiency, and full tuning control provided to the programmer. A major drawback, however, is that incremental parallelization, as offered by compiler directives, is not generally possible, because all data structures have to be changed throughout the program simultaneously. Charon remedies this situation through mappings between distributed and non-distributed data. It allows breaking up the parallelization into small steps, guaranteeing correctness at every stage. Several tools are available to help convert legacy codes into high-performance message-passing programs. They usually target data-parallel applications, whose loops carrying most of the work can be distributed among all processors without much dependency analysis. Others do a full dependency analysis and then convert the code virtually automatically. Even more toolkits are available that aid construction from scratch of message passing programs. None, however, allows piecemeal translation of codes with complex data dependencies (i.e. non-data-parallel programs) into message passing codes. The Charon library (available in both C and Fortran) provides incremental parallelization capabilities by linking legacy code arrays with distributed arrays. During the conversion process, non-distributed and distributed arrays exist side by side, and simple mapping functions allow the programmer to switch between the two in any location in the program. Charon also provides wrapper functions that leave the structure of the legacy code intact, but that allow execution on truly distributed data. Finally, the library provides a rich set of communication functions that support virtually all patterns of remote data demands in realistic structured grid scientific programs, including transposition, nearest-neighbor communication, pipelining, gather/scatter, and redistribution. At the end of the conversion process most intermediate Charon function calls will have been removed, the non-distributed arrays will have been deleted, and virtually the only remaining Charon functions calls are the high-level, highly optimized communications. Distribution of the data is under complete control of the programmer, although a wide range of useful distributions is easily available through predefined functions. A crucial aspect of the library is that it does not allocate space for distributed arrays, but accepts programmer-specified memory. This has two major consequences. First, codes parallelized using Charon do not suffer from encapsulation; user data is always directly accessible. This provides high efficiency, and also retains the possibility of using message passing directly for highly irregular communications. Second, non-distributed arrays can be interpreted as (trivial) distributions in the Charon sense, which allows them to be mapped to truly distributed arrays, and vice versa. This is the mechanism that enables incremental parallelization. In this paper we provide a brief introduction of the library and then focus on the actual steps in the parallelization process, using some representative examples from, among others, the NAS Parallel Benchmarks. We show how a complicated two-dimensional pipeline-the prototypical non-data-parallel algorithm- can be constructed with ease. To demonstrate the flexibility of the library, we give examples of the stepwise, efficient parallel implementation of nonlocal boundary conditions common in aircraft simulations, as well as the construction of the sequence of grids required for multigrid.
Testing New Programming Paradigms with NAS Parallel Benchmarks
NASA Technical Reports Server (NTRS)
Jin, H.; Frumkin, M.; Schultz, M.; Yan, J.
2000-01-01
Over the past decade, high performance computing has evolved rapidly, not only in hardware architectures but also with increasing complexity of real applications. Technologies have been developing to aim at scaling up to thousands of processors on both distributed and shared memory systems. Development of parallel programs on these computers is always a challenging task. Today, writing parallel programs with message passing (e.g. MPI) is the most popular way of achieving scalability and high performance. However, writing message passing programs is difficult and error prone. Recent years new effort has been made in defining new parallel programming paradigms. The best examples are: HPF (based on data parallelism) and OpenMP (based on shared memory parallelism). Both provide simple and clear extensions to sequential programs, thus greatly simplify the tedious tasks encountered in writing message passing programs. HPF is independent of memory hierarchy, however, due to the immaturity of compiler technology its performance is still questionable. Although use of parallel compiler directives is not new, OpenMP offers a portable solution in the shared-memory domain. Another important development involves the tremendous progress in the internet and its associated technology. Although still in its infancy, Java promisses portability in a heterogeneous environment and offers possibility to "compile once and run anywhere." In light of testing these new technologies, we implemented new parallel versions of the NAS Parallel Benchmarks (NPBs) with HPF and OpenMP directives, and extended the work with Java and Java-threads. The purpose of this study is to examine the effectiveness of alternative programming paradigms. NPBs consist of five kernels and three simulated applications that mimic the computation and data movement of large scale computational fluid dynamics (CFD) applications. We started with the serial version included in NPB2.3. Optimization of memory and cache usage was applied to several benchmarks, noticeably BT and SP, resulting in better sequential performance. In order to overcome the lack of an HPF performance model and guide the development of the HPF codes, we employed an empirical performance model for several primitives found in the benchmarks. We encountered a few limitations of HPF, such as lack of supporting the "REDISTRIBUTION" directive and no easy way to handle irregular computation. The parallelization with OpenMP directives was done at the outer-most loop level to achieve the largest granularity. The performance of six HPF and OpenMP benchmarks is compared with their MPI counterparts for the Class-A problem size in the figure in next page. These results were obtained on an SGI Origin2000 (195MHz) with MIPSpro-f77 compiler 7.2.1 for OpenMP and MPI codes and PGI pghpf-2.4.3 compiler with MPI interface for HPF programs.
Parallel image registration with a thin client interface
NASA Astrophysics Data System (ADS)
Saiprasad, Ganesh; Lo, Yi-Jung; Plishker, William; Lei, Peng; Ahmad, Tabassum; Shekhar, Raj
2010-03-01
Despite its high significance, the clinical utilization of image registration remains limited because of its lengthy execution time and a lack of easy access. The focus of this work was twofold. First, we accelerated our course-to-fine, volume subdivision-based image registration algorithm by a novel parallel implementation that maintains the accuracy of our uniprocessor implementation. Second, we developed a thin-client computing model with a user-friendly interface to perform rigid and nonrigid image registration. Our novel parallel computing model uses the message passing interface model on a 32-core cluster. The results show that, compared with the uniprocessor implementation, the parallel implementation of our image registration algorithm is approximately 5 times faster for rigid image registration and approximately 9 times faster for nonrigid registration for the images used. To test the viability of such systems for clinical use, we developed a thin client in the form of a plug-in in OsiriX, a well-known open source PACS workstation and DICOM viewer, and used it for two applications. The first application registered the baseline and follow-up MR brain images, whose subtraction was used to track progression of multiple sclerosis. The second application registered pretreatment PET and intratreatment CT of radiofrequency ablation patients to demonstrate a new capability of multimodality imaging guidance. The registration acceleration coupled with the remote implementation using a thin client should ultimately increase accuracy, speed, and access of image registration-based interpretations in a number of diagnostic and interventional applications.
Cluster Computing For Real Time Seismic Array Analysis.
NASA Astrophysics Data System (ADS)
Martini, M.; Giudicepietro, F.
A seismic array is an instrument composed by a dense distribution of seismic sen- sors that allow to measure the directional properties of the wavefield (slowness or wavenumber vector) radiated by a seismic source. Over the last years arrays have been widely used in different fields of seismological researches. In particular they are applied in the investigation of seismic sources on volcanoes where they can be suc- cessfully used for studying the volcanic microtremor and long period events which are critical for getting information on the volcanic systems evolution. For this reason arrays could be usefully employed for the volcanoes monitoring, however the huge amount of data produced by this type of instruments and the processing techniques which are quite time consuming limited their potentiality for this application. In order to favor a direct application of arrays techniques to continuous volcano monitoring we designed and built a small PC cluster able to near real time computing the kinematics properties of the wavefield (slowness or wavenumber vector) produced by local seis- mic source. The cluster is composed of 8 Intel Pentium-III bi-processors PC working at 550 MHz, and has 4 Gigabytes of RAM memory. It runs under Linux operating system. The developed analysis software package is based on the Multiple SIgnal Classification (MUSIC) algorithm and is written in Fortran. The message-passing part is based upon the LAM programming environment package, an open-source imple- mentation of the Message Passing Interface (MPI). The developed software system includes modules devote to receiving date by internet and graphical applications for the continuous displaying of the processing results. The system has been tested with a data set collected during a seismic experiment conducted on Etna in 1999 when two dense seismic arrays have been deployed on the northeast and the southeast flanks of this volcano. A real time continuous acquisition system has been simulated by a pro- gram which reads data from disk files and send them to a remote host by using the Internet protocols.
Message Design Guidelines For Screen-Based Programs.
ERIC Educational Resources Information Center
Rimar, G. I.
1996-01-01
Effective message design for screen-based computer or video instructional programs requires knowledge from many disciplines. Evaluates current conventions and suggests a new set of guidelines for screen-based designers. Discusses screen layout, highlighting and cueing, text font and style, text positioning, color, and graphical user interfaces for…
NASA Astrophysics Data System (ADS)
Bellerby, Tim
2015-04-01
PM (Parallel Models) is a new parallel programming language specifically designed for writing environmental and geophysical models. The language is intended to enable implementers to concentrate on the science behind the model rather than the details of running on parallel hardware. At the same time PM leaves the programmer in control - all parallelisation is explicit and the parallel structure of any given program may be deduced directly from the code. This paper describes a PM implementation based on the Message Passing Interface (MPI) and Open Multi-Processing (OpenMP) standards, looking at issues involved with translating the PM parallelisation model to MPI/OpenMP protocols and considering performance in terms of the competing factors of finer-grained parallelisation and increased communication overhead. In order to maximise portability, the implementation stays within the MPI 1.3 standard as much as possible, with MPI-2 MPI-IO file handling the only significant exception. Moreover, it does not assume a thread-safe implementation of MPI. PM adopts a two-tier abstract representation of parallel hardware. A PM processor is a conceptual unit capable of efficiently executing a set of language tasks, with a complete parallel system consisting of an abstract N-dimensional array of such processors. PM processors may map to single cores executing tasks using cooperative multi-tasking, to multiple cores or even to separate processing nodes, efficiently sharing tasks using algorithms such as work stealing. While tasks may move between hardware elements within a PM processor, they may not move between processors without specific programmer intervention. Tasks are assigned to processors using a nested parallelism approach, building on ideas from Reyes et al. (2009). The main program owns all available processors. When the program enters a parallel statement then either processors are divided out among the newly generated tasks (number of new tasks < number of processors) or tasks are divided out among the available processors (number of tasks > number of processors). Nested parallel statements may further subdivide the processor set owned by a given task. Tasks or processors are distributed evenly by default, but uneven distributions are possible under programmer control. It is also possible to explicitly enable child tasks to migrate within the processor set owned by their parent task, reducing load unbalancing at the potential cost of increased inter-processor message traffic. PM incorporates some programming structures from the earlier MIST language presented at a previous EGU General Assembly, while adopting a significantly different underlying parallelisation model and type system. PM code is available at www.pm-lang.org under an unrestrictive MIT license. Reference Ruymán Reyes, Antonio J. Dorta, Francisco Almeida, Francisco de Sande, 2009. Automatic Hybrid MPI+OpenMP Code Generation with llc, Recent Advances in Parallel Virtual Machine and Message Passing Interface, Lecture Notes in Computer Science Volume 5759, 185-195
Sharma, Parichit; Mantri, Shrikant S
2014-01-01
The function of a newly sequenced gene can be discovered by determining its sequence homology with known proteins. BLAST is the most extensively used sequence analysis program for sequence similarity search in large databases of sequences. With the advent of next generation sequencing technologies it has now become possible to study genes and their expression at a genome-wide scale through RNA-seq and metagenome sequencing experiments. Functional annotation of all the genes is done by sequence similarity search against multiple protein databases. This annotation task is computationally very intensive and can take days to obtain complete results. The program mpiBLAST, an open-source parallelization of BLAST that achieves superlinear speedup, can be used to accelerate large-scale annotation by using supercomputers and high performance computing (HPC) clusters. Although many parallel bioinformatics applications using the Message Passing Interface (MPI) are available in the public domain, researchers are reluctant to use them due to lack of expertise in the Linux command line and relevant programming experience. With these limitations, it becomes difficult for biologists to use mpiBLAST for accelerating annotation. No web interface is available in the open-source domain for mpiBLAST. We have developed WImpiBLAST, a user-friendly open-source web interface for parallel BLAST searches. It is implemented in Struts 1.3 using a Java backbone and runs atop the open-source Apache Tomcat Server. WImpiBLAST supports script creation and job submission features and also provides a robust job management interface for system administrators. It combines script creation and modification features with job monitoring and management through the Torque resource manager on a Linux-based HPC cluster. Use case information highlights the acceleration of annotation analysis achieved by using WImpiBLAST. Here, we describe the WImpiBLAST web interface features and architecture, explain design decisions, describe workflows and provide a detailed analysis.
Sharma, Parichit; Mantri, Shrikant S.
2014-01-01
The function of a newly sequenced gene can be discovered by determining its sequence homology with known proteins. BLAST is the most extensively used sequence analysis program for sequence similarity search in large databases of sequences. With the advent of next generation sequencing technologies it has now become possible to study genes and their expression at a genome-wide scale through RNA-seq and metagenome sequencing experiments. Functional annotation of all the genes is done by sequence similarity search against multiple protein databases. This annotation task is computationally very intensive and can take days to obtain complete results. The program mpiBLAST, an open-source parallelization of BLAST that achieves superlinear speedup, can be used to accelerate large-scale annotation by using supercomputers and high performance computing (HPC) clusters. Although many parallel bioinformatics applications using the Message Passing Interface (MPI) are available in the public domain, researchers are reluctant to use them due to lack of expertise in the Linux command line and relevant programming experience. With these limitations, it becomes difficult for biologists to use mpiBLAST for accelerating annotation. No web interface is available in the open-source domain for mpiBLAST. We have developed WImpiBLAST, a user-friendly open-source web interface for parallel BLAST searches. It is implemented in Struts 1.3 using a Java backbone and runs atop the open-source Apache Tomcat Server. WImpiBLAST supports script creation and job submission features and also provides a robust job management interface for system administrators. It combines script creation and modification features with job monitoring and management through the Torque resource manager on a Linux-based HPC cluster. Use case information highlights the acceleration of annotation analysis achieved by using WImpiBLAST. Here, we describe the WImpiBLAST web interface features and architecture, explain design decisions, describe workflows and provide a detailed analysis. PMID:24979410
Beyond Linear Syntax: An Image-Oriented Communication Aid
ERIC Educational Resources Information Center
Patel, Rupal; Pilato, Sam; Roy, Deb
2004-01-01
This article presents a novel AAC communication aid based on semantic rather than syntactic schema, leading to more natural message construction. Users interact with a two-dimensional spatially organized image schema, which depicts the semantic structure and contents of the message. An overview of the interface design is presented followed by…
Passing Messages between Biological Networks to Refine Predicted Interactions
Glass, Kimberly; Huttenhower, Curtis; Quackenbush, John; Yuan, Guo-Cheng
2013-01-01
Regulatory network reconstruction is a fundamental problem in computational biology. There are significant limitations to such reconstruction using individual datasets, and increasingly people attempt to construct networks using multiple, independent datasets obtained from complementary sources, but methods for this integration are lacking. We developed PANDA (Passing Attributes between Networks for Data Assimilation), a message-passing model using multiple sources of information to predict regulatory relationships, and used it to integrate protein-protein interaction, gene expression, and sequence motif data to reconstruct genome-wide, condition-specific regulatory networks in yeast as a model. The resulting networks were not only more accurate than those produced using individual data sets and other existing methods, but they also captured information regarding specific biological mechanisms and pathways that were missed using other methodologies. PANDA is scalable to higher eukaryotes, applicable to specific tissue or cell type data and conceptually generalizable to include a variety of regulatory, interaction, expression, and other genome-scale data. An implementation of the PANDA algorithm is available at www.sourceforge.net/projects/panda-net. PMID:23741402
Applications of Probabilistic Combiners on Linear Feedback Shift Register Sequences
2016-12-01
on the resulting output strings show a drastic increase in complexity, while simultaneously passing the stringent randomness tests required by the...a three-variable function. Our tests on the resulting output strings show a drastic increase in complex- ity, while simultaneously passing the...10001101 01000010 11101001 Decryption of a message that has been encrypted using bitwise XOR is quite simple. Since each bit is its own additive inverse
Sujansky, Walter; Wilson, Tom
2015-04-01
This report describes a grant-funded project to explore the use of DIRECT secure messaging for the electronic delivery of laboratory test results to outpatient physicians and electronic health record systems. The project seeks to leverage the inherent attributes of DIRECT secure messaging and electronic provider directories to overcome certain barriers to the delivery of lab test results in the outpatient setting. The described system enables laboratories that generate test results as HL7 messages to deliver these results as structured or unstructured documents attached to DIRECT secure messages. The system automatically analyzes generated HL7 messages and consults an electronic provider directory to determine the appropriate DIRECT address and delivery format for each indicated recipient. The system also enables lab results delivered to providers as structured attachments to be consumed by HL7 interface engines and incorporated into electronic health record systems. Lab results delivered as unstructured attachments may be printed or incorporated into patient records as PDF files. The system receives and logs acknowledgement messages to document the status of each transmitted lab result, and a graphical interface allows searching and review of this logged information. The described system is a fully implemented prototype that has been tested in a laboratory setting. Although this approach is promising, further work is required to pilot test the system in production settings with clinical laboratories and outpatient provider organizations. Copyright © 2015 Elsevier Inc. All rights reserved.
Method and systems for a radiation tolerant bus interface circuit
NASA Technical Reports Server (NTRS)
Kinstler, Gary A. (Inventor)
2007-01-01
A bus management tool that allows communication to be maintained between a group of nodes operatively connected on two busses in the presence of radiation by transmitting periodically a first message from one to another of the nodes on one of the busses, determining whether the first message was received by the other of the nodes on the first bus, and when it is determined that the first message was not received by the other of the nodes, transmitting a recovery command to the other of the nodes on a second of the of busses. Methods, systems, and articles of manufacture consistent with the present invention also provide for a bus recovery tool on the other node that re-initializes a bus interface circuit operatively connecting the other node to the first bus in response to the recovery command.
1984-10-29
Photoelectron energy analysis was done with Si s states 10--14 eV below E,. For CoSi2 and NiSi2, a commercial double-pass electron energy analyzer . The...Collaborative studies with theorists gave rise to modeling ,f interfaces and calculation of electronic energy states for ordered silicides. (i, ,I... analyzed by a double-pass cylindrical mirror energy A. There is no evidence of Cr outdiffusion into the Au analyzer , and the overall resolution
Regional-scale calculation of the LS factor using parallel processing
NASA Astrophysics Data System (ADS)
Liu, Kai; Tang, Guoan; Jiang, Ling; Zhu, A.-Xing; Yang, Jianyi; Song, Xiaodong
2015-05-01
With the increase of data resolution and the increasing application of USLE over large areas, the existing serial implementation of algorithms for computing the LS factor is becoming a bottleneck. In this paper, a parallel processing model based on message passing interface (MPI) is presented for the calculation of the LS factor, so that massive datasets at a regional scale can be processed efficiently. The parallel model contains algorithms for calculating flow direction, flow accumulation, drainage network, slope, slope length and the LS factor. According to the existence of data dependence, the algorithms are divided into local algorithms and global algorithms. Parallel strategy are designed according to the algorithm characters including the decomposition method for maintaining the integrity of the results, optimized workflow for reducing the time taken for exporting the unnecessary intermediate data and a buffer-communication-computation strategy for improving the communication efficiency. Experiments on a multi-node system show that the proposed parallel model allows efficient calculation of the LS factor at a regional scale with a massive dataset.
Tough2{_}MP: A parallel version of TOUGH2
DOE Office of Scientific and Technical Information (OSTI.GOV)
Zhang, Keni; Wu, Yu-Shu; Ding, Chris
2003-04-09
TOUGH2{_}MP is a massively parallel version of TOUGH2. It was developed for running on distributed-memory parallel computers to simulate large simulation problems that may not be solved by the standard, single-CPU TOUGH2 code. The new code implements an efficient massively parallel scheme, while preserving the full capacity and flexibility of the original TOUGH2 code. The new software uses the METIS software package for grid partitioning and AZTEC software package for linear-equation solving. The standard message-passing interface is adopted for communication among processors. Numerical performance of the current version code has been tested on CRAY-T3E and IBM RS/6000 SP platforms. Inmore » addition, the parallel code has been successfully applied to real field problems of multi-million-cell simulations for three-dimensional multiphase and multicomponent fluid and heat flow, as well as solute transport. In this paper, we will review the development of the TOUGH2{_}MP, and discuss the basic features, modules, and their applications.« less
MPI, HPF or OpenMP: A Study with the NAS Benchmarks
NASA Technical Reports Server (NTRS)
Jin, Hao-Qiang; Frumkin, Michael; Hribar, Michelle; Waheed, Abdul; Yan, Jerry; Saini, Subhash (Technical Monitor)
1999-01-01
Porting applications to new high performance parallel and distributed platforms is a challenging task. Writing parallel code by hand is time consuming and costly, but the task can be simplified by high level languages and would even better be automated by parallelizing tools and compilers. The definition of HPF (High Performance Fortran, based on data parallel model) and OpenMP (based on shared memory parallel model) standards has offered great opportunity in this respect. Both provide simple and clear interfaces to language like FORTRAN and simplify many tedious tasks encountered in writing message passing programs. In our study we implemented the parallel versions of the NAS Benchmarks with HPF and OpenMP directives. Comparison of their performance with the MPI implementation and pros and cons of different approaches will be discussed along with experience of using computer-aided tools to help parallelize these benchmarks. Based on the study,potentials of applying some of the techniques to realistic aerospace applications will be presented
Evaluation of a new parallel numerical parameter optimization algorithm for a dynamical system
NASA Astrophysics Data System (ADS)
Duran, Ahmet; Tuncel, Mehmet
2016-10-01
It is important to have a scalable parallel numerical parameter optimization algorithm for a dynamical system used in financial applications where time limitation is crucial. We use Message Passing Interface parallel programming and present such a new parallel algorithm for parameter estimation. For example, we apply the algorithm to the asset flow differential equations that have been developed and analyzed since 1989 (see [3-6] and references contained therein). We achieved speed-up for some time series to run up to 512 cores (see [10]). Unlike [10], we consider more extensive financial market situations, for example, in presence of low volatility, high volatility and stock market price at a discount/premium to its net asset value with varying magnitude, in this work. Moreover, we evaluated the convergence of the model parameter vector, the nonlinear least squares error and maximum improvement factor to quantify the success of the optimization process depending on the number of initial parameter vectors.
A De-centralized Scheduling and Load Balancing Algorithm for Heterogeneous Grid Environments
NASA Technical Reports Server (NTRS)
Arora, Manish; Das, Sajal K.; Biswas, Rupak
2002-01-01
In the past two decades, numerous scheduling and load balancing techniques have been proposed for locally distributed multiprocessor systems. However, they all suffer from significant deficiencies when extended to a Grid environment: some use a centralized approach that renders the algorithm unscalable, while others assume the overhead involved in searching for appropriate resources to be negligible. Furthermore, classical scheduling algorithms do not consider a Grid node to be N-resource rich and merely work towards maximizing the utilization of one of the resources. In this paper, we propose a new scheduling and load balancing algorithm for a generalized Grid model of N-resource nodes that not only takes into account the node and network heterogeneity, but also considers the overhead involved in coordinating among the nodes. Our algorithm is decentralized, scalable, and overlaps the node coordination time with that of the actual processing of ready jobs, thus saving valuable clock cycles needed for making decisions. The proposed algorithm is studied by conducting simulations using the Message Passing Interface (MPI) paradigm.
A De-Centralized Scheduling and Load Balancing Algorithm for Heterogeneous Grid Environments
NASA Technical Reports Server (NTRS)
Arora, Manish; Das, Sajal K.; Biswas, Rupak; Biegel, Bryan (Technical Monitor)
2002-01-01
In the past two decades, numerous scheduling and load balancing techniques have been proposed for locally distributed multiprocessor systems. However, they all suffer from significant deficiencies when extended to a Grid environment: some use a centralized approach that renders the algorithm unscalable, while others assume the overhead involved in searching for appropriate resources to be negligible. Furthermore, classical scheduling algorithms do not consider a Grid node to be N-resource rich and merely work towards maximizing the utilization of one of the resources. In this paper we propose a new scheduling and load balancing algorithm for a generalized Grid model of N-resource nodes that not only takes into account the node and network heterogeneity, but also considers the overhead involved in coordinating among the nodes. Our algorithm is de-centralized, scalable, and overlaps the node coordination time of the actual processing of ready jobs, thus saving valuable clock cycles needed for making decisions. The proposed algorithm is studied by conducting simulations using the Message Passing Interface (MPI) paradigm.
MPI, HPF or OpenMP: A Study with the NAS Benchmarks
NASA Technical Reports Server (NTRS)
Jin, H.; Frumkin, M.; Hribar, M.; Waheed, A.; Yan, J.; Saini, Subhash (Technical Monitor)
1999-01-01
Porting applications to new high performance parallel and distributed platforms is a challenging task. Writing parallel code by hand is time consuming and costly, but this task can be simplified by high level languages and would even better be automated by parallelizing tools and compilers. The definition of HPF (High Performance Fortran, based on data parallel model) and OpenMP (based on shared memory parallel model) standards has offered great opportunity in this respect. Both provide simple and clear interfaces to language like FORTRAN and simplify many tedious tasks encountered in writing message passing programs. In our study, we implemented the parallel versions of the NAS Benchmarks with HPF and OpenMP directives. Comparison of their performance with the MPI implementation and pros and cons of different approaches will be discussed along with experience of using computer-aided tools to help parallelize these benchmarks. Based on the study, potentials of applying some of the techniques to realistic aerospace applications will be presented.
High Performance Geostatistical Modeling of Biospheric Resources
NASA Astrophysics Data System (ADS)
Pedelty, J. A.; Morisette, J. T.; Smith, J. A.; Schnase, J. L.; Crosier, C. S.; Stohlgren, T. J.
2004-12-01
We are using parallel geostatistical codes to study spatial relationships among biospheric resources in several study areas. For example, spatial statistical models based on large- and small-scale variability have been used to predict species richness of both native and exotic plants (hot spots of diversity) and patterns of exotic plant invasion. However, broader use of geostastics in natural resource modeling, especially at regional and national scales, has been limited due to the large computing requirements of these applications. To address this problem, we implemented parallel versions of the kriging spatial interpolation algorithm. The first uses the Message Passing Interface (MPI) in a master/slave paradigm on an open source Linux Beowulf cluster, while the second is implemented with the new proprietary Xgrid distributed processing system on an Xserve G5 cluster from Apple Computer, Inc. These techniques are proving effective and provide the basis for a national decision support capability for invasive species management that is being jointly developed by NASA and the US Geological Survey.
MPI implementation of PHOENICS: A general purpose computational fluid dynamics code
NASA Astrophysics Data System (ADS)
Simunovic, S.; Zacharia, T.; Baltas, N.; Spalding, D. B.
1995-03-01
PHOENICS is a suite of computational analysis programs that are used for simulation of fluid flow, heat transfer, and dynamical reaction processes. The parallel version of the solver EARTH for the Computational Fluid Dynamics (CFD) program PHOENICS has been implemented using Message Passing Interface (MPI) standard. Implementation of MPI version of PHOENICS makes this computational tool portable to a wide range of parallel machines and enables the use of high performance computing for large scale computational simulations. MPI libraries are available on several parallel architectures making the program usable across different architectures as well as on heterogeneous computer networks. The Intel Paragon NX and MPI versions of the program have been developed and tested on massively parallel supercomputers Intel Paragon XP/S 5, XP/S 35, and Kendall Square Research, and on the multiprocessor SGI Onyx computer at Oak Ridge National Laboratory. The preliminary testing results of the developed program have shown scalable performance for reasonably sized computational domains.
MPI implementation of PHOENICS: A general purpose computational fluid dynamics code
DOE Office of Scientific and Technical Information (OSTI.GOV)
Simunovic, S.; Zacharia, T.; Baltas, N.
1995-04-01
PHOENICS is a suite of computational analysis programs that are used for simulation of fluid flow, heat transfer, and dynamical reaction processes. The parallel version of the solver EARTH for the Computational Fluid Dynamics (CFD) program PHOENICS has been implemented using Message Passing Interface (MPI) standard. Implementation of MPI version of PHOENICS makes this computational tool portable to a wide range of parallel machines and enables the use of high performance computing for large scale computational simulations. MPI libraries are available on several parallel architectures making the program usable across different architectures as well as on heterogeneous computer networks. Themore » Intel Paragon NX and MPI versions of the program have been developed and tested on massively parallel supercomputers Intel Paragon XP/S 5, XP/S 35, and Kendall Square Research, and on the multiprocessor SGI Onyx computer at Oak Ridge National Laboratory. The preliminary testing results of the developed program have shown scalable performance for reasonably sized computational domains.« less
NASA Technical Reports Server (NTRS)
Lawson, Gary; Sosonkina, Masha; Baurle, Robert; Hammond, Dana
2017-01-01
In many fields, real-world applications for High Performance Computing have already been developed. For these applications to stay up-to-date, new parallel strategies must be explored to yield the best performance; however, restructuring or modifying a real-world application may be daunting depending on the size of the code. In this case, a mini-app may be employed to quickly explore such options without modifying the entire code. In this work, several mini-apps have been created to enhance a real-world application performance, namely the VULCAN code for complex flow analysis developed at the NASA Langley Research Center. These mini-apps explore hybrid parallel programming paradigms with Message Passing Interface (MPI) for distributed memory access and either Shared MPI (SMPI) or OpenMP for shared memory accesses. Performance testing shows that MPI+SMPI yields the best execution performance, while requiring the largest number of code changes. A maximum speedup of 23 was measured for MPI+SMPI, but only 11 was measured for MPI+OpenMP.
NASA Astrophysics Data System (ADS)
Rizki, Permata Nur Miftahur; Lee, Heezin; Lee, Minsu; Oh, Sangyoon
2017-01-01
With the rapid advance of remote sensing technology, the amount of three-dimensional point-cloud data has increased extraordinarily, requiring faster processing in the construction of digital elevation models. There have been several attempts to accelerate the computation using parallel methods; however, little attention has been given to investigating different approaches for selecting the most suited parallel programming model for a given computing environment. We present our findings and insights identified by implementing three popular high-performance parallel approaches (message passing interface, MapReduce, and GPGPU) on time demanding but accurate kriging interpolation. The performances of the approaches are compared by varying the size of the grid and input data. In our empirical experiment, we demonstrate the significant acceleration by all three approaches compared to a C-implemented sequential-processing method. In addition, we also discuss the pros and cons of each method in terms of usability, complexity infrastructure, and platform limitation to give readers a better understanding of utilizing those parallel approaches for gridding purposes.
Zhan, X.
2005-01-01
A parallel Fortran-MPI (Message Passing Interface) software for numerical inversion of the Laplace transform based on a Fourier series method is developed to meet the need of solving intensive computational problems involving oscillatory water level's response to hydraulic tests in a groundwater environment. The software is a parallel version of ACM (The Association for Computing Machinery) Transactions on Mathematical Software (TOMS) Algorithm 796. Running 38 test examples indicated that implementation of MPI techniques with distributed memory architecture speedups the processing and improves the efficiency. Applications to oscillatory water levels in a well during aquifer tests are presented to illustrate how this package can be applied to solve complicated environmental problems involved in differential and integral equations. The package is free and is easy to use for people with little or no previous experience in using MPI but who wish to get off to a quick start in parallel computing. ?? 2004 Elsevier Ltd. All rights reserved.
Nagaoka, Tomoaki; Watanabe, Soichi
2012-01-01
Electromagnetic simulation with anatomically realistic computational human model using the finite-difference time domain (FDTD) method has recently been performed in a number of fields in biomedical engineering. To improve the method's calculation speed and realize large-scale computing with the computational human model, we adapt three-dimensional FDTD code to a multi-GPU cluster environment with Compute Unified Device Architecture and Message Passing Interface. Our multi-GPU cluster system consists of three nodes. The seven GPU boards (NVIDIA Tesla C2070) are mounted on each node. We examined the performance of the FDTD calculation on multi-GPU cluster environment. We confirmed that the FDTD calculation on the multi-GPU clusters is faster than that on a multi-GPU (a single workstation), and we also found that the GPU cluster system calculate faster than a vector supercomputer. In addition, our GPU cluster system allowed us to perform the large-scale FDTD calculation because were able to use GPU memory of over 100 GB.
Porting the AVS/Express scientific visualization software to Cray XT4.
Leaver, George W; Turner, Martin J; Perrin, James S; Mummery, Paul M; Withers, Philip J
2011-08-28
Remote scientific visualization, where rendering services are provided by larger scale systems than are available on the desktop, is becoming increasingly important as dataset sizes increase beyond the capabilities of desktop workstations. Uptake of such services relies on access to suitable visualization applications and the ability to view the resulting visualization in a convenient form. We consider five rules from the e-Science community to meet these goals with the porting of a commercial visualization package to a large-scale system. The application uses message-passing interface (MPI) to distribute data among data processing and rendering processes. The use of MPI in such an interactive application is not compatible with restrictions imposed by the Cray system being considered. We present details, and performance analysis, of a new MPI proxy method that allows the application to run within the Cray environment yet still support MPI communication required by the application. Example use cases from materials science are considered.
The AI Bus architecture for distributed knowledge-based systems
NASA Technical Reports Server (NTRS)
Schultz, Roger D.; Stobie, Iain
1991-01-01
The AI Bus architecture is layered, distributed object oriented framework developed to support the requirements of advanced technology programs for an order of magnitude improvement in software costs. The consequent need for highly autonomous computer systems, adaptable to new technology advances over a long lifespan, led to the design of an open architecture and toolbox for building large scale, robust, production quality systems. The AI Bus accommodates a mix of knowledge based and conventional components, running on heterogeneous, distributed real world and testbed environment. The concepts and design is described of the AI Bus architecture and its current implementation status as a Unix C++ library or reusable objects. Each high level semiautonomous agent process consists of a number of knowledge sources together with interagent communication mechanisms based on shared blackboards and message passing acquaintances. Standard interfaces and protocols are followed for combining and validating subsystems. Dynamic probes or demons provide an event driven means for providing active objects with shared access to resources, and each other, while not violating their security.
NASA Technical Reports Server (NTRS)
Becker, Jeffrey C.
1995-01-01
The Thinking Machines CM-5 platform was designed to run single program, multiple data (SPMD) applications, i.e., to run a single binary across all nodes of a partition, with each node possibly operating on different data. Certain classes of applications, such as multi-disciplinary computational fluid dynamics codes, are facilitated by the ability to have subsets of the partition nodes running different binaries. In order to extend the CM-5 system software to permit such applications, a multi-program loader was developed. This system is based on the dld loader which was originally developed for workstations. This paper provides a high level description of dld, and describes how it was ported to the CM-5 to provide support for multi-binary applications. Finally, it elaborates how the loader has been used to implement the CM-5 version of MPIRUN, a portable facility for running multi-disciplinary/multi-zonal MPI (Message-Passing Interface Standard) codes.
Spectral Element Method for the Simulation of Unsteady Compressible Flows
NASA Technical Reports Server (NTRS)
Diosady, Laslo Tibor; Murman, Scott M.
2013-01-01
This work uses a discontinuous-Galerkin spectral-element method (DGSEM) to solve the compressible Navier-Stokes equations [1{3]. The inviscid ux is computed using the approximate Riemann solver of Roe [4]. The viscous fluxes are computed using the second form of Bassi and Rebay (BR2) [5] in a manner consistent with the spectral-element approximation. The method of lines with the classical 4th-order explicit Runge-Kutta scheme is used for time integration. Results for polynomial orders up to p = 15 (16th order) are presented. The code is parallelized using the Message Passing Interface (MPI). The computations presented in this work are performed using the Sandy Bridge nodes of the NASA Pleiades supercomputer at NASA Ames Research Center. Each Sandy Bridge node consists of 2 eight-core Intel Xeon E5-2670 processors with a clock speed of 2.6Ghz and 2GB per core memory. On a Sandy Bridge node the Tau Benchmark [6] runs in a time of 7.6s.
Design and implementation of a hybrid MPI-CUDA model for the Smith-Waterman algorithm.
Khaled, Heba; Faheem, Hossam El Deen Mostafa; El Gohary, Rania
2015-01-01
This paper provides a novel hybrid model for solving the multiple pair-wise sequence alignment problem combining message passing interface and CUDA, the parallel computing platform and programming model invented by NVIDIA. The proposed model targets homogeneous cluster nodes equipped with similar Graphical Processing Unit (GPU) cards. The model consists of the Master Node Dispatcher (MND) and the Worker GPU Nodes (WGN). The MND distributes the workload among the cluster working nodes and then aggregates the results. The WGN performs the multiple pair-wise sequence alignments using the Smith-Waterman algorithm. We also propose a modified implementation to the Smith-Waterman algorithm based on computing the alignment matrices row-wise. The experimental results demonstrate a considerable reduction in the running time by increasing the number of the working GPU nodes. The proposed model achieved a performance of about 12 Giga cell updates per second when we tested against the SWISS-PROT protein knowledge base running on four nodes.
A hybrid neurogenetic approach for stock forecasting.
Kwon, Yung-Keun; Moon, Byung-Ro
2007-05-01
In this paper, we propose a hybrid neurogenetic system for stock trading. A recurrent neural network (NN) having one hidden layer is used for the prediction model. The input features are generated from a number of technical indicators being used by financial experts. The genetic algorithm (GA) optimizes the NN's weights under a 2-D encoding and crossover. We devised a context-based ensemble method of NNs which dynamically changes on the basis of the test day's context. To reduce the time in processing mass data, we parallelized the GA on a Linux cluster system using message passing interface. We tested the proposed method with 36 companies in NYSE and NASDAQ for 13 years from 1992 to 2004. The neurogenetic hybrid showed notable improvement on the average over the buy-and-hold strategy and the context-based ensemble further improved the results. We also observed that some companies were more predictable than others, which implies that the proposed neurogenetic hybrid can be used for financial portfolio construction.
Guo, L-X; Li, J; Zeng, H
2009-11-01
We present an investigation of the electromagnetic scattering from a three-dimensional (3-D) object above a two-dimensional (2-D) randomly rough surface. A Message Passing Interface-based parallel finite-difference time-domain (FDTD) approach is used, and the uniaxial perfectly matched layer (UPML) medium is adopted for truncation of the FDTD lattices, in which the finite-difference equations can be used for the total computation domain by properly choosing the uniaxial parameters. This makes the parallel FDTD algorithm easier to implement. The parallel performance with different number of processors is illustrated for one rough surface realization and shows that the computation time of our parallel FDTD algorithm is dramatically reduced relative to a single-processor implementation. Finally, the composite scattering coefficients versus scattered and azimuthal angle are presented and analyzed for different conditions, including the surface roughness, the dielectric constants, the polarization, and the size of the 3-D object.
Performance Characteristics of the Multi-Zone NAS Parallel Benchmarks
NASA Technical Reports Server (NTRS)
Jin, Haoqiang; VanderWijngaart, Rob F.
2003-01-01
We describe a new suite of computational benchmarks that models applications featuring multiple levels of parallelism. Such parallelism is often available in realistic flow computations on systems of grids, but had not previously been captured in bench-marks. The new suite, named NPB Multi-Zone, is extended from the NAS Parallel Benchmarks suite, and involves solving the application benchmarks LU, BT and SP on collections of loosely coupled discretization meshes. The solutions on the meshes are updated independently, but after each time step they exchange boundary value information. This strategy provides relatively easily exploitable coarse-grain parallelism between meshes. Three reference implementations are available: one serial, one hybrid using the Message Passing Interface (MPI) and OpenMP, and another hybrid using a shared memory multi-level programming model (SMP+OpenMP). We examine the effectiveness of hybrid parallelization paradigms in these implementations on three different parallel computers. We also use an empirical formula to investigate the performance characteristics of the multi-zone benchmarks.
Employing multi-GPU power for molecular dynamics simulation: an extension of GALAMOST
NASA Astrophysics Data System (ADS)
Zhu, You-Liang; Pan, Deng; Li, Zhan-Wei; Liu, Hong; Qian, Hu-Jun; Zhao, Yang; Lu, Zhong-Yuan; Sun, Zhao-Yan
2018-04-01
We describe the algorithm of employing multi-GPU power on the basis of Message Passing Interface (MPI) domain decomposition in a molecular dynamics code, GALAMOST, which is designed for the coarse-grained simulation of soft matters. The code of multi-GPU version is developed based on our previous single-GPU version. In multi-GPU runs, one GPU takes charge of one domain and runs single-GPU code path. The communication between neighbouring domains takes a similar algorithm of CPU-based code of LAMMPS, but is optimised specifically for GPUs. We employ a memory-saving design which can enlarge maximum system size at the same device condition. An optimisation algorithm is employed to prolong the update period of neighbour list. We demonstrate good performance of multi-GPU runs on the simulation of Lennard-Jones liquid, dissipative particle dynamics liquid, polymer and nanoparticle composite, and two-patch particles on workstation. A good scaling of many nodes on cluster for two-patch particles is presented.
2013-07-26
Earth observation taken during day pass by an Expedition 36 crew member on board the International Space Station (ISS). Per Twitter message: Never tire of finding shapes in the clouds! These look very botanical to me. Simply perfect.
Sen. Alexander, Lamar [R-TN
2011-07-28
Senate - 12/19/2012 Message on House action received in Senate and at desk: House amendments to Senate bill. (All Actions) Tracker: This bill has the status Passed HouseHere are the steps for Status of Legislation:
Electronic Cigarettes on Twitter – Spreading the Appeal of Flavors
Chu, Kar-Hai; Unger, Jennifer B.; Cruz, Tess Boley; Soto, Daniel W.
2016-01-01
Objectives Social media platforms are used by tobacco companies to promote products. This study examines message content on Twitter from e-cigarette brands and determines if messages about flavors are more likely than non-flavor messages to be passed along to other viewers. Methods We examined Twitter data from 2 e-cigarette brands and identified messages that contained terms related to e-cigarette flavors. Results Flavor-related posts were retweeted at a significantly higher rate by e-cigarette brands (p = .04) and other Twitter users (p < .01) than non-flavor posts. Conclusions E-cigarette brands and other Twitter users pay attention to flavor-related posts and retweet them often. These findings suggest flavors continue to be an attractive characteristic and their marketing should be monitored closely. PMID:27853734
NASA Astrophysics Data System (ADS)
Min, Byungjoon
2018-01-01
Identifying the most influential spreaders is one of outstanding problems in physics of complex systems. So far, many approaches have attempted to rank the influence of nodes but there is still the lack of accuracy to single out influential spreaders. Here, we directly tackle the problem of finding important spreaders by solving analytically the expected size of epidemic outbreaks when spreading originates from a single seed. We derive and validate a theory for calculating the size of epidemic outbreaks with a single seed using a message-passing approach. In addition, we find that the probability to occur epidemic outbreaks is highly dependent on the location of the seed but the size of epidemic outbreaks once it occurs is insensitive to the seed. We also show that our approach can be successfully adapted into weighted networks.
The design of multi-core DSP parallel model based on message passing and multi-level pipeline
NASA Astrophysics Data System (ADS)
Niu, Jingyu; Hu, Jian; He, Wenjing; Meng, Fanrong; Li, Chuanrong
2017-10-01
Currently, the design of embedded signal processing system is often based on a specific application, but this idea is not conducive to the rapid development of signal processing technology. In this paper, a parallel processing model architecture based on multi-core DSP platform is designed, and it is mainly suitable for the complex algorithms which are composed of different modules. This model combines the ideas of multi-level pipeline parallelism and message passing, and summarizes the advantages of the mainstream model of multi-core DSP (the Master-Slave model and the Data Flow model), so that it has better performance. This paper uses three-dimensional image generation algorithm to validate the efficiency of the proposed model by comparing with the effectiveness of the Master-Slave and the Data Flow model.
Generalized hypercube structures and hyperswitch communication network
NASA Technical Reports Server (NTRS)
Young, Steven D.
1992-01-01
This paper discusses an ongoing study that uses a recent development in communication control technology to implement hybrid hypercube structures. These architectures are similar to binary hypercubes, but they also provide added connectivity between the processors. This added connectivity increases communication reliability while decreasing the latency of interprocessor message passing. Because these factors directly determine the speed that can be obtained by multiprocessor systems, these architectures are attractive for applications such as remote exploration and experimentation, where high performance and ultrareliability are required. This paper describes and enumerates these architectures and discusses how they can be implemented with a modified version of the hyperswitch communication network (HCN). The HCN is analyzed because it has three attractive features that enable these architectures to be effective: speed, fault tolerance, and the ability to pass multiple messages simultaneously through the same hyperswitch controller.
Parallel multiscale simulations of a brain aneurysm
Grinberg, Leopold; Fedosov, Dmitry A.; Karniadakis, George Em
2012-01-01
Cardiovascular pathologies, such as a brain aneurysm, are affected by the global blood circulation as well as by the local microrheology. Hence, developing computational models for such cases requires the coupling of disparate spatial and temporal scales often governed by diverse mathematical descriptions, e.g., by partial differential equations (continuum) and ordinary differential equations for discrete particles (atomistic). However, interfacing atomistic-based with continuum-based domain discretizations is a challenging problem that requires both mathematical and computational advances. We present here a hybrid methodology that enabled us to perform the first multi-scale simulations of platelet depositions on the wall of a brain aneurysm. The large scale flow features in the intracranial network are accurately resolved by using the high-order spectral element Navier-Stokes solver εκ αr. The blood rheology inside the aneurysm is modeled using a coarse-grained stochastic molecular dynamics approach (the dissipative particle dynamics method) implemented in the parallel code LAMMPS. The continuum and atomistic domains overlap with interface conditions provided by effective forces computed adaptively to ensure continuity of states across the interface boundary. A two-way interaction is allowed with the time-evolving boundary of the (deposited) platelet clusters tracked by an immersed boundary method. The corresponding heterogeneous solvers ( εκ αr and LAMMPS) are linked together by a computational multilevel message passing interface that facilitates modularity and high parallel efficiency. Results of multiscale simulations of clot formation inside the aneurysm in a patient-specific arterial tree are presented. We also discuss the computational challenges involved and present scalability results of our coupled solver on up to 300K computer processors. Validation of such coupled atomistic-continuum models is a main open issue that has to be addressed in future work. PMID:23734066
Parallel multiscale simulations of a brain aneurysm.
Grinberg, Leopold; Fedosov, Dmitry A; Karniadakis, George Em
2013-07-01
Cardiovascular pathologies, such as a brain aneurysm, are affected by the global blood circulation as well as by the local microrheology. Hence, developing computational models for such cases requires the coupling of disparate spatial and temporal scales often governed by diverse mathematical descriptions, e.g., by partial differential equations (continuum) and ordinary differential equations for discrete particles (atomistic). However, interfacing atomistic-based with continuum-based domain discretizations is a challenging problem that requires both mathematical and computational advances. We present here a hybrid methodology that enabled us to perform the first multi-scale simulations of platelet depositions on the wall of a brain aneurysm. The large scale flow features in the intracranial network are accurately resolved by using the high-order spectral element Navier-Stokes solver εκ αr . The blood rheology inside the aneurysm is modeled using a coarse-grained stochastic molecular dynamics approach (the dissipative particle dynamics method) implemented in the parallel code LAMMPS. The continuum and atomistic domains overlap with interface conditions provided by effective forces computed adaptively to ensure continuity of states across the interface boundary. A two-way interaction is allowed with the time-evolving boundary of the (deposited) platelet clusters tracked by an immersed boundary method. The corresponding heterogeneous solvers ( εκ αr and LAMMPS) are linked together by a computational multilevel message passing interface that facilitates modularity and high parallel efficiency. Results of multiscale simulations of clot formation inside the aneurysm in a patient-specific arterial tree are presented. We also discuss the computational challenges involved and present scalability results of our coupled solver on up to 300K computer processors. Validation of such coupled atomistic-continuum models is a main open issue that has to be addressed in future work.
Parallel multiscale simulations of a brain aneurysm
DOE Office of Scientific and Technical Information (OSTI.GOV)
Grinberg, Leopold; Fedosov, Dmitry A.; Karniadakis, George Em, E-mail: george_karniadakis@brown.edu
2013-07-01
Cardiovascular pathologies, such as a brain aneurysm, are affected by the global blood circulation as well as by the local microrheology. Hence, developing computational models for such cases requires the coupling of disparate spatial and temporal scales often governed by diverse mathematical descriptions, e.g., by partial differential equations (continuum) and ordinary differential equations for discrete particles (atomistic). However, interfacing atomistic-based with continuum-based domain discretizations is a challenging problem that requires both mathematical and computational advances. We present here a hybrid methodology that enabled us to perform the first multiscale simulations of platelet depositions on the wall of a brain aneurysm.more » The large scale flow features in the intracranial network are accurately resolved by using the high-order spectral element Navier–Stokes solver NεκTαr. The blood rheology inside the aneurysm is modeled using a coarse-grained stochastic molecular dynamics approach (the dissipative particle dynamics method) implemented in the parallel code LAMMPS. The continuum and atomistic domains overlap with interface conditions provided by effective forces computed adaptively to ensure continuity of states across the interface boundary. A two-way interaction is allowed with the time-evolving boundary of the (deposited) platelet clusters tracked by an immersed boundary method. The corresponding heterogeneous solvers (NεκTαr and LAMMPS) are linked together by a computational multilevel message passing interface that facilitates modularity and high parallel efficiency. Results of multiscale simulations of clot formation inside the aneurysm in a patient-specific arterial tree are presented. We also discuss the computational challenges involved and present scalability results of our coupled solver on up to 300 K computer processors. Validation of such coupled atomistic-continuum models is a main open issue that has to be addressed in future work.« less
MESA: Message-Based System Analysis Using Runtime Verification
NASA Technical Reports Server (NTRS)
Shafiei, Nastaran; Tkachuk, Oksana; Mehlitz, Peter
2017-01-01
In this paper, we present a novel approach and framework for run-time verication of large, safety critical messaging systems. This work was motivated by verifying the System Wide Information Management (SWIM) project of the Federal Aviation Administration (FAA). SWIM provides live air traffic, site and weather data streams for the whole National Airspace System (NAS), which can easily amount to several hundred messages per second. Such safety critical systems cannot be instrumented, therefore, verification and monitoring has to happen using a nonintrusive approach, by connecting to a variety of network interfaces. Due to a large number of potential properties to check, the verification framework needs to support efficient formulation of properties with a suitable Domain Specific Language (DSL). Our approach is to utilize a distributed system that is geared towards connectivity and scalability and interface it at the message queue level to a powerful verification engine. We implemented our approach in the tool called MESA: Message-Based System Analysis, which leverages the open source projects RACE (Runtime for Airspace Concept Evaluation) and TraceContract. RACE is a platform for instantiating and running highly concurrent and distributed systems and enables connectivity to SWIM and scalability. TraceContract is a runtime verication tool that allows for checking traces against properties specified in a powerful DSL. We applied our approach to verify a SWIM service against several requirements.We found errors such as duplicate and out-of-order messages.
2013-08-03
Earth observation taken during day pass by an Expedition 36 crew member on board the International Space Station (ISS). Per Twitter message: From southernmost point of orbit over the South Pacific- all clouds seemed to be leading to the South Pole.
2013-07-21
Earth observation taken during night pass by an Expedition 36 crew member on board the International Space Station (ISS). Per Twitter message this is labeled as : Tehran, Iran. Lights along the coast of the Caspian Sea visible through clouds. July 21.
Whistleblower Protection Enhancement Act of 2010
Sen. Akaka, Daniel K. [D-HI
2009-02-03
Senate - 12/22/2010 Message on House action received in Senate and at desk: House amendments to Senate bill. (All Actions) Tracker: This bill has the status Passed HouseHere are the steps for Status of Legislation:
2014-05-31
Earth Observation taken during a day pass by the Expedition 40 crew aboard the International Space Station (ISS). Folder lists this as: CEO - Arena de Sao Paolo. View used for Twitter message: Cloudy skies over São Paulo Brazil
Kim, Hwa Sun; Cho, Hune
2011-01-01
Objectives The Health Level Seven Interface Engine (HL7 IE), developed by Kyungpook National University, has been employed in health information systems, however users without a background in programming have reported difficulties in using it. Therefore, we developed a graphical user interface (GUI) engine to make the use of the HL7 IE more convenient. Methods The GUI engine was directly connected with the HL7 IE to handle the HL7 version 2.x messages. Furthermore, the information exchange rules (called the mapping data), represented by a conceptual graph in the GUI engine, were transformed into program objects that were made available to the HL7 IE; the mapping data were stored as binary files for reuse. The usefulness of the GUI engine was examined through information exchange tests between an HL7 version 2.x message and a health information database system. Results Users could easily create HL7 version 2.x messages by creating a conceptual graph through the GUI engine without requiring assistance from programmers. In addition, time could be saved when creating new information exchange rules by reusing the stored mapping data. Conclusions The GUI engine was not able to incorporate information types (e.g., extensible markup language, XML) other than the HL7 version 2.x messages and the database, because it was designed exclusively for the HL7 IE protocol. However, in future work, by including additional parsers to manage XML-based information such as Continuity of Care Documents (CCD) and Continuity of Care Records (CCR), we plan to ensure that the GUI engine will be more widely accessible for the health field. PMID:22259723
Kim, Hwa Sun; Cho, Hune; Lee, In Keun
2011-12-01
The Health Level Seven Interface Engine (HL7 IE), developed by Kyungpook National University, has been employed in health information systems, however users without a background in programming have reported difficulties in using it. Therefore, we developed a graphical user interface (GUI) engine to make the use of the HL7 IE more convenient. The GUI engine was directly connected with the HL7 IE to handle the HL7 version 2.x messages. Furthermore, the information exchange rules (called the mapping data), represented by a conceptual graph in the GUI engine, were transformed into program objects that were made available to the HL7 IE; the mapping data were stored as binary files for reuse. The usefulness of the GUI engine was examined through information exchange tests between an HL7 version 2.x message and a health information database system. Users could easily create HL7 version 2.x messages by creating a conceptual graph through the GUI engine without requiring assistance from programmers. In addition, time could be saved when creating new information exchange rules by reusing the stored mapping data. The GUI engine was not able to incorporate information types (e.g., extensible markup language, XML) other than the HL7 version 2.x messages and the database, because it was designed exclusively for the HL7 IE protocol. However, in future work, by including additional parsers to manage XML-based information such as Continuity of Care Documents (CCD) and Continuity of Care Records (CCR), we plan to ensure that the GUI engine will be more widely accessible for the health field.
Data communications in a parallel active messaging interface of a parallel computer
Blocksome, Michael A.; Ratterman, Joseph D.; Smith, Brian E.
2014-09-02
Eager send data communications in a parallel active messaging interface (`PAMI`) of a parallel computer, the PAMI composed of data communications endpoints that specify a client, a context, and a task, including receiving an eager send data communications instruction with transfer data disposed in a send buffer characterized by a read/write send buffer memory address in a read/write virtual address space of the origin endpoint; determining for the send buffer a read-only send buffer memory address in a read-only virtual address space, the read-only virtual address space shared by both the origin endpoint and the target endpoint, with all frames of physical memory mapped to pages of virtual memory in the read-only virtual address space; and communicating by the origin endpoint to the target endpoint an eager send message header that includes the read-only send buffer memory address.
Data communications in a parallel active messaging interface of a parallel computer
Blocksome, Michael A.; Ratterman, Joseph D.; Smith, Brian E.
2014-09-16
Eager send data communications in a parallel active messaging interface (`PAMI`) of a parallel computer, the PAMI composed of data communications endpoints that specify a client, a context, and a task, including receiving an eager send data communications instruction with transfer data disposed in a send buffer characterized by a read/write send buffer memory address in a read/write virtual address space of the origin endpoint; determining for the send buffer a read-only send buffer memory address in a read-only virtual address space, the read-only virtual address space shared by both the origin endpoint and the target endpoint, with all frames of physical memory mapped to pages of virtual memory in the read-only virtual address space; and communicating by the origin endpoint to the target endpoint an eager send message header that includes the read-only send buffer memory address.
Data communications in a parallel active messaging interface of a parallel computer
Archer, Charles J; Blocksome, Michael A; Ratterman, Joseph D; Smith, Brian E
2015-02-03
Data communications in a parallel active messaging interface (`PAMI`) of a parallel computer, the PAMI composed of data communications endpoints, each endpoint including a specification of data communications parameters for a thread of execution on a compute node, including specifications of a client, a context, and a task, endpoints coupled for data communications through the PAMI and through data communications resources, including receiving in an origin endpoint of the PAMI a SEND instruction, the SEND instruction specifying a transmission of transfer data from the origin endpoint to a first target endpoint; transmitting from the origin endpoint to the first target endpoint a Request-To-Send (`RTS`) message advising the first target endpoint of the location and size of the transfer data; assigning by the first target endpoint to each of a plurality of target endpoints separate portions of the transfer data; and receiving by the plurality of target endpoints the transfer data.
Data communications in a parallel active messaging interface of a parallel computer
Archer, Charles J; Blocksome, Michael A; Ratterman, Joseph D; Smith, Brian E
2014-11-18
Data communications in a parallel active messaging interface (`PAMI`) of a parallel computer, the PAMI composed of data communications endpoints, each endpoint including a specification of data communications parameters for a thread of execution on a compute node, including specifications of a client, a context, and a task, endpoints coupled for data communications through the PAMI and through data communications resources, including receiving in an origin endpoint of the PAMI a SEND instruction, the SEND instruction specifying a transmission of transfer data from the origin endpoint to a first target endpoint; transmitting from the origin endpoint to the first target endpoint a Request-To-Send (`RTS`) message advising the first target endpoint of the location and size of the transfer data; assigning by the first target endpoint to each of a plurality of target endpoints separate portions of the transfer data; and receiving by the plurality of target endpoints the transfer data.
System and method for transferring telemetry data between a ground station and a control center
NASA Technical Reports Server (NTRS)
Ray, Timothy J. (Inventor); Ly, Vuong T. (Inventor)
2012-01-01
Disclosed herein are systems, computer-implemented methods, and tangible computer-readable media for coordinating communications between a ground station, a control center, and a spacecraft. The method receives a call to a simple, unified application programmer interface implementing communications protocols related to outer space, when instruction relates to receiving a command at the control center for the ground station generate an abstract message by agreeing upon a format for each type of abstract message with the ground station and using a set of message definitions to configure the command in the agreed upon format, encode the abstract message to generate an encoded message, and transfer the encoded message to the ground station, and perform similar actions when the instruction relates to receiving a second command as a second encoded message at the ground station from the control center and when the determined instruction type relates to transmitting information to the control center.
1979-12-01
Links between processes can be aLlocated strictLy to controL functions. In fact, the degree of separation of control and data is an important research is...delays or Loss of control messages. Cognoscienti agree that message-passing IPC schemes are equivalent in "power" to schemes which employ shared...THEORETICAL WORK Page 55 SECTION 6 THEORETICAL WORK 6.1 WORKING GRUP JIM REP.OR STRUCTURE of Discussion: Distributed system without central (or any) control
ACR/NEMA Digital Image Interface Standard (An Illustrated Protocol Overview)
NASA Astrophysics Data System (ADS)
Lawrence, G. Robert
1985-09-01
The American College of Radiologists (ACR) and the National Electrical Manufacturers Association (NEMA) have sponsored a joint standards committee mandated to develop a universal interface standard for the transfer of radiology images among a variety of PACS imaging devicesl. The resulting standard interface conforms to the ISO/OSI standard reference model for network protocol layering. The standard interface specifies the lower layers of the reference model (Physical, Data Link, Transport and Session) and implies a requirement of the Network Layer should a requirement for a network exist. The message content has been considered and a flexible message and image format specified. The following Imaging Equipment modalities are supported by the standard interface... CT Computed Tomograpy DS Digital Subtraction NM Nuclear Medicine US Ultrasound MR Magnetic Resonance DR Digital Radiology The following data types are standardized over the transmission interface media.... IMAGE DATA DIGITIZED VOICE HEADER DATA RAW DATA TEXT REPORTS GRAPHICS OTHERS This paper consists of text supporting the illustrated protocol data flow. Each layer will be individually treated. Particular emphasis will be given to the Data Link layer (Frames) and the Transport layer (Packets). The discussion utilizes a finite state sequential machine model for the protocol layers.
NASA Astrophysics Data System (ADS)
Arkhipkin, D.; Lauret, J.
2017-10-01
One of the STAR experiment’s modular Messaging Interface and Reliable Architecture framework (MIRA) integration goals is to provide seamless and automatic connections with the existing control systems. After an initial proof of concept and operation of the MIRA system as a parallel data collection system for online use and real-time monitoring, the STAR Software and Computing group is now working on the integration of Experimental Physics and Industrial Control System (EPICS) with MIRA’s interfaces. This integration goals are to allow functional interoperability and, later on, to replace the existing/legacy Detector Control System components at the service level. In this report, we describe the evolutionary integration process and, as an example, will discuss the EPICS Alarm Handler conversion. We review the complete upgrade procedure starting with the integration of EPICS-originated alarm signals propagation into MIRA, followed by the replacement of the existing operator interface based on Motif Editor and Display Manager (MEDM) with modern portable web-based Alarm Handler interface. To achieve this aim, we have built an EPICS-to-MQTT [8] bridging service, and recreated the functionality of the original Alarm Handler using low-latency web messaging technologies. The integration of EPICS alarm handling into our messaging framework allowed STAR to improve the DCS alarm awareness of existing STAR DAQ and RTS services, which use MIRA as a primary source of experiment control information.
The Application of Current User Interface Technology to Interactive Wargaming Systems.
1987-09-01
components is essential to the Macintosh interface. Apple states that "Consistent visual communication is very powerful in delivering complex messages...interface. A visual interface uses visual objects as the basis of communication. "A visual communication object is some combination S. of text and...graphics used for communication under a system of inter- pretation, or visual language." The benefit of visual communication is V 45 "When humans are faced
Terrorism Risk Insurance Program Reauthorization Act of 2014
Sen. Schumer, Charles E. [D-NY
2014-04-10
Senate - 12/11/2014 Message on House action received in Senate and at desk: House amendment to Senate bill. (All Actions) Tracker: This bill has the status Passed HouseHere are the steps for Status of Legislation:
Electronic Message Preservation Act
Rep. Hodes, Paul W. [D-NH-2
2009-03-09
Senate - 03/18/2010 Received in the Senate and Read twice and referred to the Committee on Homeland Security and Governmental Affairs. (All Actions) Tracker: This bill has the status Passed HouseHere are the steps for Status of Legislation:
Federal Financial Assistance Management Improvement Act of 2009
Sen. Voinovich, George V. [R-OH
2009-01-22
Senate - 12/15/2009 Message on House action received in Senate and at desk: House amendment to Senate bill. (All Actions) Tracker: This bill has the status Passed HouseHere are the steps for Status of Legislation:
Bachhuber, Marcus A; McGinty, Emma E; Kennedy-Hendricks, Alene; Niederdeppe, Jeff; Barry, Colleen L
2015-01-01
Barriers to public support for naloxone distribution include lack of knowledge, concerns about potential unintended consequences, and lack of sympathy for people at risk of overdose. A randomized survey experiment was conducted with a nationally-representative web-based survey research panel (GfK KnowledgePanel). Participants were randomly assigned to read different messages alone or in combination: 1) factual information about naloxone; 2) pre-emptive refutation of potential concerns about naloxone distribution; and 3) a sympathetic narrative about a mother whose daughter died of an opioid overdose. Participants were then asked if they support or oppose policies related to naloxone distribution. For each policy item, logistic regression models were used to test the effect of each message exposure compared with the no-exposure control group. The final sample consisted of 1,598 participants (completion rate: 72.6%). Factual information and the sympathetic narrative alone each led to higher support for training first responders to use naloxone, providing naloxone to friends and family members of people using opioids, and passing laws to protect people who administer naloxone. Participants receiving the combination of the sympathetic narrative and factual information, compared to factual information alone, were more likely to support all policies: providing naloxone to friends and family members (OR: 2.0 [95% CI: 1.4 to 2.9]), training first responders to use naloxone (OR: 2.0 [95% CI: 1.2 to 3.4]), passing laws to protect people if they administer naloxone (OR: 1.5 [95% CI: 1.04 to 2.2]), and passing laws to protect people if they call for medical help for an overdose (OR: 1.7 [95% CI: 1.2 to 2.5]). All messages increased public support, but combining factual information and the sympathetic narrative was most effective. Public support for naloxone distribution can be improved through education and sympathetic portrayals of the population who stands to benefit from these policies.
Statistical analysis of loopy belief propagation in random fields
NASA Astrophysics Data System (ADS)
Yasuda, Muneki; Kataoka, Shun; Tanaka, Kazuyuki
2015-10-01
Loopy belief propagation (LBP), which is equivalent to the Bethe approximation in statistical mechanics, is a message-passing-type inference method that is widely used to analyze systems based on Markov random fields (MRFs). In this paper, we propose a message-passing-type method to analytically evaluate the quenched average of LBP in random fields by using the replica cluster variation method. The proposed analytical method is applicable to general pairwise MRFs with random fields whose distributions differ from each other and can give the quenched averages of the Bethe free energies over random fields, which are consistent with numerical results. The order of its computational cost is equivalent to that of standard LBP. In the latter part of this paper, we describe the application of the proposed method to Bayesian image restoration, in which we observed that our theoretical results are in good agreement with the numerical results for natural images.
NASA Technical Reports Server (NTRS)
Goldstein, David
1991-01-01
Extensions to an architecture for real-time, distributed (parallel) knowledge-based systems called the Parallel Real-time Artificial Intelligence System (PRAIS) are discussed. PRAIS strives for transparently parallelizing production (rule-based) systems, even under real-time constraints. PRAIS accomplished these goals (presented at the first annual C Language Integrated Production System (CLIPS) conference) by incorporating a dynamic task scheduler, operating system extensions for fact handling, and message-passing among multiple copies of CLIPS executing on a virtual blackboard. This distributed knowledge-based system tool uses the portability of CLIPS and common message-passing protocols to operate over a heterogeneous network of processors. Results using the original PRAIS architecture over a network of Sun 3's, Sun 4's and VAX's are presented. Mechanisms using the producer-consumer model to extend the architecture for fault-tolerance and distributed truth maintenance initiation are also discussed.
ERIC Educational Resources Information Center
Warfvinge, Per
2008-01-01
The ECTS grade transfer scale is an interface grade scale to help European universities, students and employers to understand the level of student achievement. Hence, the ECTS scale can be seen as an interface, transforming local scales to a common system where A-E denote passing grades. By definition, ECTS should distribute the passing students…
Systems and methods for separating particles and/or substances from a sample fluid
Mariella, Jr., Raymond P.; Dougherty, George M.; Dzenitis, John M.; Miles, Robin R.; Clague, David S.
2016-11-01
Systems and methods for separating particles and/or toxins from a sample fluid. A method according to one embodiment comprises simultaneously passing a sample fluid and a buffer fluid through a chamber such that a fluidic interface is formed between the sample fluid and the buffer fluid as the fluids pass through the chamber, the sample fluid having particles of interest therein; applying a force to the fluids for urging the particles of interest to pass through the interface into the buffer fluid; and substantially separating the buffer fluid from the sample fluid.
A Software Implementation of a Satellite Interface Message Processor.
ERIC Educational Resources Information Center
Eastwood, Margaret A.; Eastwood, Lester F., Jr.
A design for network control software for a computer network is described in which some nodes are linked by a communications satellite channel. It is assumed that the network has an ARPANET-like configuration; that is, that specialized processors at each node are responsible for message switching and network control. The purpose of the control…
Fernandes, Natália Maria da Silva; Bastos, Marcus Gomes; Oliveira, Nivalda A C de; Costa, Alex do Vale; Bernardino, Heder Soares
2015-01-01
The focus in the treatment of CKD is to prevent its progression through optimal medical control. The large number of patients with CKD has pressed nephrologists to assess more patients into ever-smaller periods of consultation. The use of light technologies as a promising form of health care. The internet offers the opportunity to manipulate the doctor in his professional contact with the user. To develop a web system to attend the patients with CKD not on dialysis and clinically stable stages at distance. Developed a system using the Java language, MySQL database and PrimeFaces framework; available on a Glassfish application server. The initial access is performed by the nephrologist, which registers the patients with their personal information and access data. After being registered, the patient (or family doctor) can enter the data of your query and these will be following, passed on to the nephrologist for evaluation. The form with the data of interest is pre-determined, but there is possibility to add free-form information. The system enables, in addition, there is exchange of messages between doctors and patients. In addition, users receive messages via e-mail alerting them of their duties. Confidentiality is guaranteed by individual passwords for doctors and patients. This tool will enable to increase the coverage area of nephrologists, reduce costs and bring the patient to the primary care physician, using the Family Health Program as an interface between the patient and the nephrology secondary care.
Effect of two layouts on high technology AAC navigation and content location by people with aphasia.
Wallace, Sarah E; Hux, Karen
2014-03-01
Navigating high-technology augmentative and alternative communication (AAC) devices with dynamic displays can be challenging for people with aphasia. The purpose of this study was to determine which of two AAC interfaces two people with aphasia could use most efficiently and accurately. The researchers used a BCB'C' alternating treatment design to provide device-use instruction to two people with severe aphasia regarding two personalised AAC interfaces that had different navigation layouts but identical content. One interface had static buttons for homepage and go-back features, and the other interface had static buttons in a navigation ring layout. Throughout treatment, the researchers monitored participants' mastery patterns regarding navigation efficiency and accuracy when locating target messages. Participants' accuracy and efficiency improved with both interfaces given intervention; however, the navigation ring layout appeared more transparent and better facilitated navigation than the homepage layout. People with aphasia can learn to navigate computerised devices; however, interface layout can substantially affect the efficiency and accuracy with which they locate messages. Given intervention incorporating errorless learning principles, people with chronic aphasia can learn to navigate across multiple device levels to locate target sentences. Both navigation ring and homepage interfaces may be used by people with aphasia. Some people with aphasia may be more consistent and efficient in finding target sentences using the navigation ring interface than the homepage interface. Additionally, the navigation ring interface may be more transparent and easier for people with aphasia to master--that is, they may require fewer intervention sessions to learn to navigate the navigation ring interface. Generalisation of learning may result from use of the navigation ring interface. Specifically, people with aphasia may improve navigation with the homepage interface as a result of instruction on the navigation interface, but not vice versa.
Message passing with queues and channels
DOE Office of Scientific and Technical Information (OSTI.GOV)
Dozsa, Gabor J; Heidelberger, Philip; Kumar, Sameer
In an embodiment, a send thread receives an identifier that identifies a destination node and a pointer to data. The send thread creates a first send request in response to the receipt of the identifier and the data pointer. The send thread selects a selected channel from among a plurality of channels. The selected channel comprises a selected hand-off queue and an identification of a selected message unit. Each of the channels identifies a different message unit. The selected hand-off queue is randomly accessible. If the selected hand-off queue contains an available entry, the send thread adds the first sendmore » request to the selected hand-off queue. If the selected hand-off queue does not contain an available entry, the send thread removes a second send request from the selected hand-off queue and sends the second send request to the selected message unit.« less
Assisted entry mitigates text messaging-based driving detriment.
Sawyer, Benjamin D; Hancock, Peter A
2012-01-01
Previous research using cell phones indicates that manual manipulation is not a principal component of text messaging relating driving detriment. This paper suggests that manipulation of a phone in conjunction with the cognitive need to compose the message itself co-act to contribute to driving degradation. This being so, drivers sending text messages might experience reduced interference to the driving task if the text messaging itself were assisted through the predictive T9 system. We evaluated undergraduate drivers in a simulator who drove and texted using either Assisted Text entry, via Nokia's T9 system, or unassisted entry via the multitap interface. Results supported the superiority of the T9 system over the multitap system implying that specific assistive technologies can modulate the degradation of capacity which texting tragically induces.
DOT National Transportation Integrated Search
2002-10-01
The success of automation for intelligent transportation systems is ultimately contingent upon the Interface between the users (humans) and the system (ITS). The issues of variable message signs (VMS) and traffic signal device (TSD) design were studi...
Ten Design Points for the Human Interface to Instructional Multimedia.
ERIC Educational Resources Information Center
McFarland, Ronald D.
1995-01-01
Ten ways to design an effective Human-Computer Interface are explained. Highlights include material delivery that relates to user knowledge; appropriate screen presentations; attention value versus learning and recall; the relationship of packaging and message; the effectiveness of visuals and text; the use of color to enhance communication; the…
Signaling communication events in a computer network
Bender, Carl A.; DiNicola, Paul D.; Gildea, Kevin J.; Govindaraju, Rama K.; Kim, Chulho; Mirza, Jamshed H.; Shah, Gautam H.; Nieplocha, Jaroslaw
2000-01-01
A method, apparatus and program product for detecting a communication event in a distributed parallel data processing system in which a message is sent from an origin to a target. A low-level application programming interface (LAPI) is provided which has an operation for associating a counter with a communication event to be detected. The LAPI increments the counter upon the occurrence of the communication event. The number in the counter is monitored, and when the number increases, the event is detected. A completion counter in the origin is associated with the completion of a message being sent from the origin to the target. When the message is completed, LAPI increments the completion counter such that monitoring the completion counter detects the completion of the message. The completion counter may be used to insure that a first message has been sent from the origin to the target and completed before a second message is sent.
Scalable Optical-Fiber Communication Networks
NASA Technical Reports Server (NTRS)
Chow, Edward T.; Peterson, John C.
1993-01-01
Scalable arbitrary fiber extension network (SAFEnet) is conceptual fiber-optic communication network passing digital signals among variety of computers and input/output devices at rates from 200 Mb/s to more than 100 Gb/s. Intended for use with very-high-speed computers and other data-processing and communication systems in which message-passing delays must be kept short. Inherent flexibility makes it possible to match performance of network to computers by optimizing configuration of interconnections. In addition, interconnections made redundant to provide tolerance to faults.
Domestic Minor Sex Trafficking Deterrence and Victims Support Act of 2010
Sen. Wyden, Ron [D-OR
2009-12-22
Senate - 12/21/2010 Message on House action received in Senate and at desk: House amendment to Senate bill. (All Actions) Tracker: This bill has the status Passed HouseHere are the steps for Status of Legislation:
Data communications in a parallel active messaging interface of a parallel computer
Archer, Charles J; Blocksome, Michael A; Ratterman, Joseph D; Smith, Brian E
2014-02-11
Data communications in a parallel active messaging interface ('PAMI') or a parallel computer, the parallel computer including a plurality of compute nodes that execute a parallel application, the PAMI composed of data communications endpoints, each endpoint including a specification of data communications parameters for a thread of execution of a compute node, including specification of a client, a context, and a task, the compute nodes and the endpoints coupled for data communications instruction, the instruction characterized by instruction type, the instruction specifying a transmission of transfer data from the origin endpoint to a target endpoint and transmitting, in accordance witht the instruction type, the transfer data from the origin endpoin to the target endpoint.
Digital Systems Validation Handbook. Volume 2. Chapter 19. Pilot - Vehicle Interface
1993-11-01
checklists, and other status messages. Voice interactive systems are defi-ed as "the interface between a cooperative human and a machine, which involv -he...Pilot-Vehicle Interface 19-85 5.6.1 Crew Interaction and the Cockpit 19-85 5.6.2 Crew Resource Management and Safety 19-87 5.6.3 Pilot and Crew Training...systems was a "stand-alone" component performing its intended function. Systems and their cockpit interfaces were added as technological advances were
Chaos synchronization communication using extremely unsymmetrical bidirectional injections.
Zhang, Wei Li; Pan, Wei; Luo, Bin; Zou, Xi Hua; Wang, Meng Yao; Zhou, Zhi
2008-02-01
Chaos synchronization and message transmission between two semiconductor lasers with extremely unsymmetrical bidirectional injections (EUBIs) are discussed. By using EUBIs, synchronization is realized through injection locking. Numerical results show that if the laser subjected to strong injection serves as the receiver, chaos pass filtering (CPF) of the system is similar to that of unidirectional coupled systems. Moreover, if the other laser serves as the receiver, a stronger CPF can be obtained. Finally, we demonstrate that messages can be extracted successfully from either of the two transmission directions of the system.
Monitoring tools of COMPASS experiment at CERN
NASA Astrophysics Data System (ADS)
Bodlak, M.; Frolov, V.; Huber, S.; Jary, V.; Konorov, I.; Levit, D.; Novy, J.; Salac, R.; Tomsa, J.; Virius, M.
2015-12-01
This paper briefly introduces the data acquisition system of the COMPASS experiment and is mainly focused on the part that is responsible for the monitoring of the nodes in the whole newly developed data acquisition system of this experiment. The COMPASS is a high energy particle experiment with a fixed target located at the SPS of the CERN laboratory in Geneva, Switzerland. The hardware of the data acquisition system has been upgraded to use FPGA cards that are responsible for data multiplexing and event building. The software counterpart of the system includes several processes deployed in heterogenous network environment. There are two processes, namely Message Logger and Message Browser, taking care of monitoring. These tools handle messages generated by nodes in the system. While Message Logger collects and saves messages to the database, the Message Browser serves as a graphical interface over the database containing these messages. For better performance, certain database optimizations have been used. Lastly, results of performance tests are presented.
NASA Astrophysics Data System (ADS)
Vijayanand, V. D.; Vasudevan, M.; Ganesan, V.; Parameswaran, P.; Laha, K.; Bhaduri, A. K.
2016-06-01
Creep deformation and rupture behavior of single-pass and dual-pass 316LN stainless steel (SS) weld joints fabricated by an autogenous activated tungsten inert gas welding process have been assessed by performing metallography, hardness, and conventional and impression creep tests. The fusion zone of the single-pass joint consisted of columnar zones adjacent to base metals with a central equiaxed zone, which have been modified extensively by the thermal cycle of the second pass in the dual-pass joint. The equiaxed zone in the single-pass joint, as well as in the second pass of the dual-pass joint, displayed the lowest hardness in the joints. In the dual-pass joint, the equiaxed zone of the first pass had hardness comparable to the columnar zone. The hardness variations in the joints influenced the creep deformation. The equiaxed and columnar zone in the first pass of the dual-pass joint was more creep resistant than that of the second pass. Both joints possessed lower creep rupture life than the base metal. However, the creep rupture life of the dual-pass joint was about twofolds more than that of the single-pass joint. Creep failure in the single-pass joint occurred in the central equiaxed fusion zone, whereas creep cavitation that originated in the second pass was blocked at the weld pass interface. The additional interface and strength variation between two passes in the dual-pass joint provides more restraint to creep deformation and crack propagation in the fusion zone, resulting in an increase in the creep rupture life of the dual-pass joint over the single-pass joint. Furthermore, the differences in content, morphology, and distribution of delta ferrite in the fusion zone of the joints favors more creep cavitation resistance in the dual-pass joint over the single-pass joint with the enhancement of creep rupture life.
Scalable Integrated Multi-Mission Support System (SIMSS) Simulator Release 2.0 for GMSEC
NASA Technical Reports Server (NTRS)
Kim, John; Velamuri, Sarma; Casey, Taylor; Bemann, Travis
2012-01-01
Scalable Integrated Multi-Mission Support System (SIMSS) Simulator Release 2.0 software is designed to perform a variety of test activities related to spacecraft simulations and ground segment checks. This innovation uses the existing SIMSS framework, which interfaces with the GMSEC (Goddard Mission Services Evolution Center) Application Programming Interface (API) Version 3.0 message middleware, and allows SIMSS to accept GMSEC standard messages via the GMSEC message bus service. SIMSS is a distributed, component-based, plug-and-play client-server system that is useful for performing real-time monitoring and communications testing. SIMSS runs on one or more workstations, and is designed to be user-configurable, or to use predefined configurations for routine operations. SIMSS consists of more than 100 modules that can be configured to create, receive, process, and/or transmit data. The SIMSS/GMSEC innovation is intended to provide missions with a low-cost solution for implementing their ground systems, as well as to significantly reduce a mission s integration time and risk.
Disaster Relief Appropriations Act, 2013
Rep. Rogers, Harold [R-KY-5
2011-02-11
Senate - 12/30/2012 Message on Senate action sent to the House. (All Actions) Notes: This bill was for a time the Senate legislative vehicle for Hurricane Sandy supplemental appropriations. Tracker: This bill has the status Passed SenateHere are the steps for Status of Legislation:
Speech-based E-mail and driver behavior: effects of an in-vehicle message system interface.
Jamson, A Hamish; Westerman, Stephen J; Hockey, G Robert J; Carsten, Oliver M J
2004-01-01
As mobile office technology becomes more advanced, drivers have increased opportunity to process information "on the move." Although speech-based interfaces can minimize direct interference with driving, the cognitive demands associated with such systems may still cause distraction. We studied the effects on driving performance of an in-vehicle simulated "E-mail" message system; E-mails were either system controlled or driver controlled. A high-fidelity, fixed-base driving simulator was used to test 19 participants on a car-following task. Virtual traffic scenarios varying in driving demand. Drivers compensated for the secondary task by adopting longer headways but showed reduced anticipation of braking requirements and shorter time to collision. Drivers were also less reactive when processing E-mails, demonstrated by a reduction in steering wheel inputs. In most circumstances, there were advantages in providing drivers with control over when E-mails were opened. However, during periods without E-mail interaction in demanding traffic scenarios, drivers showed reduced braking anticipation. This may be a result of increased cognitive costs associated with the decision making process when using a driver-controlled interface when the task of scheduling E-mail acceptance is added to those of driving and E-mail response. Actual or potential applications of this research include the design of speech-based in-vehicle messaging systems.
Software Architecture Evolution
2013-12-01
system’s major components occurring via a Java Message Service message bus [69]. This architecture was designed to promote loose coupling of soft- ware...play reconfiguration of the system. The components were Java -based and platform-independent; the interfaces by which they communicated were based on...The MPCS database, a MySQL database used for storing telemetry as well as some other information, such as logs and commanding data [68]. This
NASA Technical Reports Server (NTRS)
Lupton, W. F.; Conrad, A. R.
1992-01-01
KTL is a set of routines which eases the job of writing applications which must interact with a variety of underlying sub-systems (known as services). A typical application is an X Window user interface coordinating telescope and instruments. In order to connect to a service, application code specifies a service name--typically an instrument name--and a style, which defines the way in which the application will interact with the service. Two styles are currently supported: keyword, where the application reads and writes named keywords and the resulting inter-task message traffic is hidden; and message, where the application deals directly with messages. The keyword style is intended mainly for user interfaces, and the message style is intended mainly for lower-level applications. KTL applications are event driven: a typical application first connects to all its desired services, then expresses interest in specified events. The application then enters an event dispatch loop in which it waits for events and calls the appropriate service's event-handling routine. Each event is associated with a call-back routine which is invoked when the event occurs. Call-back routines may (and typically do) interact with other sub-systems and KTL provides the means of doing so without blocking the application (vital for X Window user interfaces). This approach is a marriage of ideas culled from the X window, ADAM, Keck instrument, and Keck telescope control systems. A novel feature of KTL is that it knows nothing about any services or styles. Instead it defines a generic set of routines which must be implemented by all services and styles (essentially open(), ioctl(), read(), write(), event(), and close()) and activates sharable libraries at run-time. Services have been implemented (in both keyword and message styles) for HIRES (the Keck high resolution echelle spectrograph built by Lick Observatory), LWS (the Keck long wavelength spectrometer built by UC San Diego), and the Keck telescope. Each of these implementations uses different underlying message systems: the Lick MUSIC system, RPC's, and direct sockets (respectively). Services for the remaining three front-line Keck instruments will be implemented over the next few months.
Interactive Supercomputing’s Star-P Platform
DOE Office of Scientific and Technical Information (OSTI.GOV)
Edelman, Alan; Husbands, Parry; Leibman, Steve
2006-09-19
The thesis of this extended abstract is simple. High productivity comes from high level infrastructures. To measure this, we introduce a methodology that goes beyond the tradition of timing software in serial and tuned parallel modes. We perform a classroom productivity study involving 29 students who have written a homework exercise in a low level language (MPI message passing) and a high level language (Star-P with MATLAB client). Our conclusions indicate what perhaps should be of little surprise: (1) the high level language is always far easier on the students than the low level language. (2) The early versions ofmore » the high level language perform inadequately compared to the tuned low level language, but later versions substantially catch up. Asymptotically, the analogy must hold that message passing is to high level language parallel programming as assembler is to high level environments such as MATLAB, Mathematica, Maple, or even Python. We follow the Kepner method that correctly realizes that traditional speedup numbers without some discussion of the human cost of reaching these numbers can fail to reflect the true human productivity cost of high performance computing. Traditional data compares low level message passing with serial computation. With the benefit of a high level language system in place, in our case Star-P running with MATLAB client, and with the benefit of a large data pool: 29 students, each running the same code ten times on three evolutions of the same platform, we can methodically demonstrate the productivity gains. To date we are not aware of any high level system as extensive and interoperable as Star-P, nor are we aware of an experiment of this kind performed with this volume of data.« less
Combining Distributed and Shared Memory Models: Approach and Evolution of the Global Arrays Toolkit
DOE Office of Scientific and Technical Information (OSTI.GOV)
Nieplocha, Jarek; Harrison, Robert J.; Kumar, Mukul
2002-07-29
Both shared memory and distributed memory models have advantages and shortcomings. Shared memory model is much easier to use but it ignores data locality/placement. Given the hierarchical nature of the memory subsystems in the modern computers this characteristic might have a negative impact on performance and scalability. Various techniques, such as code restructuring to increase data reuse and introducing blocking in data accesses, can address the problem and yield performance competitive with message passing[Singh], however at the cost of compromising the ease of use feature. Distributed memory models such as message passing or one-sided communication offer performance and scalability butmore » they compromise the ease-of-use. In this context, the message-passing model is sometimes referred to as?assembly programming for the scientific computing?. The Global Arrays toolkit[GA1, GA2] attempts to offer the best features of both models. It implements a shared-memory programming model in which data locality is managed explicitly by the programmer. This management is achieved by explicit calls to functions that transfer data between a global address space (a distributed array) and local storage. In this respect, the GA model has similarities to the distributed shared-memory models that provide an explicit acquire/release protocol. However, the GA model acknowledges that remote data is slower to access than local data and allows data locality to be explicitly specified and hence managed. The GA model exposes to the programmer the hierarchical memory of modern high-performance computer systems, and by recognizing the communication overhead for remote data transfer, it promotes data reuse and locality of reference. This paper describes the characteristics of the Global Arrays programming model, capabilities of the toolkit, and discusses its evolution.« less
Performance Measurement, Visualization and Modeling of Parallel and Distributed Programs
NASA Technical Reports Server (NTRS)
Yan, Jerry C.; Sarukkai, Sekhar R.; Mehra, Pankaj; Lum, Henry, Jr. (Technical Monitor)
1994-01-01
This paper presents a methodology for debugging the performance of message-passing programs on both tightly coupled and loosely coupled distributed-memory machines. The AIMS (Automated Instrumentation and Monitoring System) toolkit, a suite of software tools for measurement and analysis of performance, is introduced and its application illustrated using several benchmark programs drawn from the field of computational fluid dynamics. AIMS includes (i) Xinstrument, a powerful source-code instrumentor, which supports both Fortran77 and C as well as a number of different message-passing libraries including Intel's NX Thinking Machines' CMMD, and PVM; (ii) Monitor, a library of timestamping and trace -collection routines that run on supercomputers (such as Intel's iPSC/860, Delta, and Paragon and Thinking Machines' CM5) as well as on networks of workstations (including Convex Cluster and SparcStations connected by a LAN); (iii) Visualization Kernel, a trace-animation facility that supports source-code clickback, simultaneous visualization of computation and communication patterns, as well as analysis of data movements; (iv) Statistics Kernel, an advanced profiling facility, that associates a variety of performance data with various syntactic components of a parallel program; (v) Index Kernel, a diagnostic tool that helps pinpoint performance bottlenecks through the use of abstract indices; (vi) Modeling Kernel, a facility for automated modeling of message-passing programs that supports both simulation -based and analytical approaches to performance prediction and scalability analysis; (vii) Intrusion Compensator, a utility for recovering true performance from observed performance by removing the overheads of monitoring and their effects on the communication pattern of the program; and (viii) Compatibility Tools, that convert AIMS-generated traces into formats used by other performance-visualization tools, such as ParaGraph, Pablo, and certain AVS/Explorer modules.
Comparison of neuronal spike exchange methods on a Blue Gene/P supercomputer.
Hines, Michael; Kumar, Sameer; Schürmann, Felix
2011-01-01
For neural network simulations on parallel machines, interprocessor spike communication can be a significant portion of the total simulation time. The performance of several spike exchange methods using a Blue Gene/P (BG/P) supercomputer has been tested with 8-128 K cores using randomly connected networks of up to 32 M cells with 1 k connections per cell and 4 M cells with 10 k connections per cell, i.e., on the order of 4·10(10) connections (K is 1024, M is 1024(2), and k is 1000). The spike exchange methods used are the standard Message Passing Interface (MPI) collective, MPI_Allgather, and several variants of the non-blocking Multisend method either implemented via non-blocking MPI_Isend, or exploiting the possibility of very low overhead direct memory access (DMA) communication available on the BG/P. In all cases, the worst performing method was that using MPI_Isend due to the high overhead of initiating a spike communication. The two best performing methods-the persistent Multisend method using the Record-Replay feature of the Deep Computing Messaging Framework DCMF_Multicast; and a two-phase multisend in which a DCMF_Multicast is used to first send to a subset of phase one destination cores, which then pass it on to their subset of phase two destination cores-had similar performance with very low overhead for the initiation of spike communication. Departure from ideal scaling for the Multisend methods is almost completely due to load imbalance caused by the large variation in number of cells that fire on each processor in the interval between synchronization. Spike exchange time itself is negligible since transmission overlaps with computation and is handled by a DMA controller. We conclude that ideal performance scaling will be ultimately limited by imbalance between incoming processor spikes between synchronization intervals. Thus, counterintuitively, maximization of load balance requires that the distribution of cells on processors should not reflect neural net architecture but be randomly distributed so that sets of cells which are burst firing together should be on different processors with their targets on as large a set of processors as possible.
NASA Astrophysics Data System (ADS)
Zapata, M. A. Uh; Van Bang, D. Pham; Nguyen, K. D.
2016-05-01
This paper presents a parallel algorithm for the finite-volume discretisation of the Poisson equation on three-dimensional arbitrary geometries. The proposed method is formulated by using a 2D horizontal block domain decomposition and interprocessor data communication techniques with message passing interface. The horizontal unstructured-grid cells are reordered according to the neighbouring relations and decomposed into blocks using a load-balanced distribution to give all processors an equal amount of elements. In this algorithm, two parallel successive over-relaxation methods are presented: a multi-colour ordering technique for unstructured grids based on distributed memory and a block method using reordering index following similar ideas of the partitioning for structured grids. In all cases, the parallel algorithms are implemented with a combination of an acceleration iterative solver. This solver is based on a parabolic-diffusion equation introduced to obtain faster solutions of the linear systems arising from the discretisation. Numerical results are given to evaluate the performances of the methods showing speedups better than linear.
New Parallel computing framework for radiation transport codes
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kostin, M.A.; /Michigan State U., NSCL; Mokhov, N.V.
A new parallel computing framework has been developed to use with general-purpose radiation transport codes. The framework was implemented as a C++ module that uses MPI for message passing. The module is significantly independent of radiation transport codes it can be used with, and is connected to the codes by means of a number of interface functions. The framework was integrated with the MARS15 code, and an effort is under way to deploy it in PHITS. Besides the parallel computing functionality, the framework offers a checkpoint facility that allows restarting calculations with a saved checkpoint file. The checkpoint facility canmore » be used in single process calculations as well as in the parallel regime. Several checkpoint files can be merged into one thus combining results of several calculations. The framework also corrects some of the known problems with the scheduling and load balancing found in the original implementations of the parallel computing functionality in MARS15 and PHITS. The framework can be used efficiently on homogeneous systems and networks of workstations, where the interference from the other users is possible.« less
Roofline model toolkit: A practical tool for architectural and program analysis
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lo, Yu Jung; Williams, Samuel; Van Straalen, Brian
We present preliminary results of the Roofline Toolkit for multicore, many core, and accelerated architectures. This paper focuses on the processor architecture characterization engine, a collection of portable instrumented micro benchmarks implemented with Message Passing Interface (MPI), and OpenMP used to express thread-level parallelism. These benchmarks are specialized to quantify the behavior of different architectural features. Compared to previous work on performance characterization, these microbenchmarks focus on capturing the performance of each level of the memory hierarchy, along with thread-level parallelism, instruction-level parallelism and explicit SIMD parallelism, measured in the context of the compilers and run-time environments. We also measuremore » sustained PCIe throughput with four GPU memory managed mechanisms. By combining results from the architecture characterization with the Roofline model based solely on architectural specifications, this work offers insights for performance prediction of current and future architectures and their software systems. To that end, we instrument three applications and plot their resultant performance on the corresponding Roofline model when run on a Blue Gene/Q architecture.« less
Dewaraja, Yuni K; Ljungberg, Michael; Majumdar, Amitava; Bose, Abhijit; Koral, Kenneth F
2002-02-01
This paper reports the implementation of the SIMIND Monte Carlo code on an IBM SP2 distributed memory parallel computer. Basic aspects of running Monte Carlo particle transport calculations on parallel architectures are described. Our parallelization is based on equally partitioning photons among the processors and uses the Message Passing Interface (MPI) library for interprocessor communication and the Scalable Parallel Random Number Generator (SPRNG) to generate uncorrelated random number streams. These parallelization techniques are also applicable to other distributed memory architectures. A linear increase in computing speed with the number of processors is demonstrated for up to 32 processors. This speed-up is especially significant in Single Photon Emission Computed Tomography (SPECT) simulations involving higher energy photon emitters, where explicit modeling of the phantom and collimator is required. For (131)I, the accuracy of the parallel code is demonstrated by comparing simulated and experimental SPECT images from a heart/thorax phantom. Clinically realistic SPECT simulations using the voxel-man phantom are carried out to assess scatter and attenuation correction.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Sitaraman, Hariswaran; Grout, Ray
2015-10-30
The load balancing strategies for hybrid solvers that involve grid based partial differential equation solution coupled with particle tracking are presented in this paper. A typical Message Passing Interface (MPI) based parallelization of grid based solves are done using a spatial domain decomposition while particle tracking is primarily done using either of the two techniques. One of the techniques is to distribute the particles to MPI ranks to whose grid they belong to while the other is to share the particles equally among all ranks, irrespective of their spatial location. The former technique provides spatial locality for field interpolation butmore » cannot assure load balance in terms of number of particles, which is achieved by the latter. The two techniques are compared for a case of particle tracking in a homogeneous isotropic turbulence box as well as a turbulent jet case. We performed a strong scaling study for more than 32,000 cores, which results in particle densities representative of anticipated exascale machines. The use of alternative implementations of MPI collectives and efficient load equalization strategies are studied to reduce data communication overheads.« less
Shahmohammadi Beni, Mehrdad; Ng, C Y P; Krstic, D; Nikezic, D; Yu, K N
2017-01-01
Radiotherapy is a common cancer treatment module, where a certain amount of dose will be delivered to the targeted organ. This is achieved usually by photons generated by linear accelerator units. However, radiation scattering within the patient's body and the surrounding environment will lead to dose dispersion to healthy tissues which are not targets of the primary radiation. Determination of the dispersed dose would be important for assessing the risk and biological consequences in different organs or tissues. In the present work, the concept of conversion coefficient (F) of the dispersed dose was developed, in which F = (Dd/Dt), where Dd was the dispersed dose in a non-targeted tissue and Dt is the absorbed dose in the targeted tissue. To quantify Dd and Dt, a comprehensive model was developed using the Monte Carlo N-Particle (MCNP) package to simulate the linear accelerator head, the human phantom, the treatment couch and the radiotherapy treatment room. The present work also demonstrated the feasibility and power of parallel computing through the use of the Message Passing Interface (MPI) version of MCNP5.
Krstic, D.; Nikezic, D.
2017-01-01
Radiotherapy is a common cancer treatment module, where a certain amount of dose will be delivered to the targeted organ. This is achieved usually by photons generated by linear accelerator units. However, radiation scattering within the patient’s body and the surrounding environment will lead to dose dispersion to healthy tissues which are not targets of the primary radiation. Determination of the dispersed dose would be important for assessing the risk and biological consequences in different organs or tissues. In the present work, the concept of conversion coefficient (F) of the dispersed dose was developed, in which F = (Dd/Dt), where Dd was the dispersed dose in a non-targeted tissue and Dt is the absorbed dose in the targeted tissue. To quantify Dd and Dt, a comprehensive model was developed using the Monte Carlo N-Particle (MCNP) package to simulate the linear accelerator head, the human phantom, the treatment couch and the radiotherapy treatment room. The present work also demonstrated the feasibility and power of parallel computing through the use of the Message Passing Interface (MPI) version of MCNP5. PMID:28362837
User's guide to the Reliability Estimation System Testbed (REST)
NASA Technical Reports Server (NTRS)
Nicol, David M.; Palumbo, Daniel L.; Rifkin, Adam
1992-01-01
The Reliability Estimation System Testbed is an X-window based reliability modeling tool that was created to explore the use of the Reliability Modeling Language (RML). RML was defined to support several reliability analysis techniques including modularization, graphical representation, Failure Mode Effects Simulation (FMES), and parallel processing. These techniques are most useful in modeling large systems. Using modularization, an analyst can create reliability models for individual system components. The modules can be tested separately and then combined to compute the total system reliability. Because a one-to-one relationship can be established between system components and the reliability modules, a graphical user interface may be used to describe the system model. RML was designed to permit message passing between modules. This feature enables reliability modeling based on a run time simulation of the system wide effects of a component's failure modes. The use of failure modes effects simulation enhances the analyst's ability to correctly express system behavior when using the modularization approach to reliability modeling. To alleviate the computation bottleneck often found in large reliability models, REST was designed to take advantage of parallel processing on hypercube processors.
Shehzad, Danish; Bozkuş, Zeki
2016-01-01
Increase in complexity of neuronal network models escalated the efforts to make NEURON simulation environment efficient. The computational neuroscientists divided the equations into subnets amongst multiple processors for achieving better hardware performance. On parallel machines for neuronal networks, interprocessor spikes exchange consumes large section of overall simulation time. In NEURON for communication between processors Message Passing Interface (MPI) is used. MPI_Allgather collective is exercised for spikes exchange after each interval across distributed memory systems. The increase in number of processors though results in achieving concurrency and better performance but it inversely affects MPI_Allgather which increases communication time between processors. This necessitates improving communication methodology to decrease the spikes exchange time over distributed memory systems. This work has improved MPI_Allgather method using Remote Memory Access (RMA) by moving two-sided communication to one-sided communication, and use of recursive doubling mechanism facilitates achieving efficient communication between the processors in precise steps. This approach enhanced communication concurrency and has improved overall runtime making NEURON more efficient for simulation of large neuronal network models.
Katouda, Michio; Naruse, Akira; Hirano, Yukihiko; Nakajima, Takahito
2016-11-15
A new parallel algorithm and its implementation for the RI-MP2 energy calculation utilizing peta-flop-class many-core supercomputers are presented. Some improvements from the previous algorithm (J. Chem. Theory Comput. 2013, 9, 5373) have been performed: (1) a dual-level hierarchical parallelization scheme that enables the use of more than 10,000 Message Passing Interface (MPI) processes and (2) a new data communication scheme that reduces network communication overhead. A multi-node and multi-GPU implementation of the present algorithm is presented for calculations on a central processing unit (CPU)/graphics processing unit (GPU) hybrid supercomputer. Benchmark results of the new algorithm and its implementation using the K computer (CPU clustering system) and TSUBAME 2.5 (CPU/GPU hybrid system) demonstrate high efficiency. The peak performance of 3.1 PFLOPS is attained using 80,199 nodes of the K computer. The peak performance of the multi-node and multi-GPU implementation is 514 TFLOPS using 1349 nodes and 4047 GPUs of TSUBAME 2.5. © 2016 Wiley Periodicals, Inc. © 2016 Wiley Periodicals, Inc.
A Comparison of Ffowcs Williams-Hawkings Solvers for Airframe Noise Applications
NASA Technical Reports Server (NTRS)
Lockard, David P.
2002-01-01
This paper presents a comparison between two implementations of the Ffowcs Williams and Hawkings equation for airframe noise applications. Airframe systems are generally moving at constant speed and not rotating, so these conditions are used in the current investigation. Efficient and easily implemented forms of the equations applicable to subsonic, rectilinear motion of all acoustic sources are used. The assumptions allow the derivation of a simple form of the equations in the frequency-domain, and the time-domain method uses the restrictions on the motion to reduce the work required to find the emission time. The comparison between the frequency domain method and the retarded time formulation reveals some of the advantages of the different approaches. Both methods are still capable of predicting the far-field noise from nonlinear near-field flow quantities. Because of the large input data sets and potentially large numbers of observer positions of interest in three-dimensional problems, both codes utilize the message passing interface to divide the problem among different processors. Example problems are used to demonstrate the usefulness and efficiency of the two schemes.
Bozkuş, Zeki
2016-01-01
Increase in complexity of neuronal network models escalated the efforts to make NEURON simulation environment efficient. The computational neuroscientists divided the equations into subnets amongst multiple processors for achieving better hardware performance. On parallel machines for neuronal networks, interprocessor spikes exchange consumes large section of overall simulation time. In NEURON for communication between processors Message Passing Interface (MPI) is used. MPI_Allgather collective is exercised for spikes exchange after each interval across distributed memory systems. The increase in number of processors though results in achieving concurrency and better performance but it inversely affects MPI_Allgather which increases communication time between processors. This necessitates improving communication methodology to decrease the spikes exchange time over distributed memory systems. This work has improved MPI_Allgather method using Remote Memory Access (RMA) by moving two-sided communication to one-sided communication, and use of recursive doubling mechanism facilitates achieving efficient communication between the processors in precise steps. This approach enhanced communication concurrency and has improved overall runtime making NEURON more efficient for simulation of large neuronal network models. PMID:27413363
A hybrid parallel framework for the cellular Potts model simulations
DOE Office of Scientific and Technical Information (OSTI.GOV)
Jiang, Yi; He, Kejing; Dong, Shoubin
2009-01-01
The Cellular Potts Model (CPM) has been widely used for biological simulations. However, most current implementations are either sequential or approximated, which can't be used for large scale complex 3D simulation. In this paper we present a hybrid parallel framework for CPM simulations. The time-consuming POE solving, cell division, and cell reaction operation are distributed to clusters using the Message Passing Interface (MPI). The Monte Carlo lattice update is parallelized on shared-memory SMP system using OpenMP. Because the Monte Carlo lattice update is much faster than the POE solving and SMP systems are more and more common, this hybrid approachmore » achieves good performance and high accuracy at the same time. Based on the parallel Cellular Potts Model, we studied the avascular tumor growth using a multiscale model. The application and performance analysis show that the hybrid parallel framework is quite efficient. The hybrid parallel CPM can be used for the large scale simulation ({approx}10{sup 8} sites) of complex collective behavior of numerous cells ({approx}10{sup 6}).« less
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kusoglu Sarikaya, C.; Rafatov, I., E-mail: rafatov@metu.edu.tr; Kudryavtsev, A. A.
2016-06-15
The work deals with the Particle in Cell/Monte Carlo Collision (PIC/MCC) analysis of the problem of detection and identification of impurities in the nonlocal plasma of gas discharge using the Plasma Electron Spectroscopy (PLES) method. For this purpose, 1d3v PIC/MCC code for numerical simulation of glow discharge with nonlocal electron energy distribution function is developed. The elastic, excitation, and ionization collisions between electron-neutral pairs and isotropic scattering and charge exchange collisions between ion-neutral pairs and Penning ionizations are taken into account. Applicability of the numerical code is verified under the Radio-Frequency capacitively coupled discharge conditions. The efficiency of the codemore » is increased by its parallelization using Open Message Passing Interface. As a demonstration of the PLES method, parallel PIC/MCC code is applied to the direct current glow discharge in helium doped with a small amount of argon. Numerical results are consistent with the theoretical analysis of formation of nonlocal EEDF and existing experimental data.« less
Power flow prediction in vibrating systems via model reduction
NASA Astrophysics Data System (ADS)
Li, Xianhui
This dissertation focuses on power flow prediction in vibrating systems. Reduced order models (ROMs) are built based on rational Krylov model reduction which preserve power flow information in the original systems over a specified frequency band. Stiffness and mass matrices of the ROMs are obtained by projecting the original system matrices onto the subspaces spanned by forced responses. A matrix-free algorithm is designed to construct ROMs directly from the power quantities at selected interpolation frequencies. Strategies for parallel implementation of the algorithm via message passing interface are proposed. The quality of ROMs is iteratively refined according to the error estimate based on residual norms. Band capacity is proposed to provide a priori estimate of the sizes of good quality ROMs. Frequency averaging is recast as ensemble averaging and Cauchy distribution is used to simplify the computation. Besides model reduction for deterministic systems, details of constructing ROMs for parametric and nonparametric random systems are also presented. Case studies have been conducted on testbeds from Harwell-Boeing collections. Input and coupling power flow are computed for the original systems and the ROMs. Good agreement is observed in all cases.
Procacci, Piero
2016-06-27
We present a new release (6.0β) of the ORAC program [Marsili et al. J. Comput. Chem. 2010, 31, 1106-1116] with a hybrid OpenMP/MPI (open multiprocessing message passing interface) multilevel parallelism tailored for generalized ensemble (GE) and fast switching double annihilation (FS-DAM) nonequilibrium technology aimed at evaluating the binding free energy in drug-receptor system on high performance computing platforms. The production of the GE or FS-DAM trajectories is handled using a weak scaling parallel approach on the MPI level only, while a strong scaling force decomposition scheme is implemented for intranode computations with shared memory access at the OpenMP level. The efficiency, simplicity, and inherent parallel nature of the ORAC implementation of the FS-DAM algorithm, project the code as a possible effective tool for a second generation high throughput virtual screening in drug discovery and design. The code, along with documentation, testing, and ancillary tools, is distributed under the provisions of the General Public License and can be freely downloaded at www.chim.unifi.it/orac .
FAST: a framework for simulation and analysis of large-scale protein-silicon biosensor circuits.
Gu, Ming; Chakrabartty, Shantanu
2013-08-01
This paper presents a computer aided design (CAD) framework for verification and reliability analysis of protein-silicon hybrid circuits used in biosensors. It is envisioned that similar to integrated circuit (IC) CAD design tools, the proposed framework will be useful for system level optimization of biosensors and for discovery of new sensing modalities without resorting to laborious fabrication and experimental procedures. The framework referred to as FAST analyzes protein-based circuits by solving inverse problems involving stochastic functional elements that admit non-linear relationships between different circuit variables. In this regard, FAST uses a factor-graph netlist as a user interface and solving the inverse problem entails passing messages/signals between the internal nodes of the netlist. Stochastic analysis techniques like density evolution are used to understand the dynamics of the circuit and estimate the reliability of the solution. As an example, we present a complete design flow using FAST for synthesis, analysis and verification of our previously reported conductometric immunoassay that uses antibody-based circuits to implement forward error-correction (FEC).
PRATHAM: Parallel Thermal Hydraulics Simulations using Advanced Mesoscopic Methods
DOE Office of Scientific and Technical Information (OSTI.GOV)
Joshi, Abhijit S; Jain, Prashant K; Mudrich, Jaime A
2012-01-01
At the Oak Ridge National Laboratory, efforts are under way to develop a 3D, parallel LBM code called PRATHAM (PaRAllel Thermal Hydraulic simulations using Advanced Mesoscopic Methods) to demonstrate the accuracy and scalability of LBM for turbulent flow simulations in nuclear applications. The code has been developed using FORTRAN-90, and parallelized using the message passing interface MPI library. Silo library is used to compact and write the data files, and VisIt visualization software is used to post-process the simulation data in parallel. Both the single relaxation time (SRT) and multi relaxation time (MRT) LBM schemes have been implemented in PRATHAM.more » To capture turbulence without prohibitively increasing the grid resolution requirements, an LES approach [5] is adopted allowing large scale eddies to be numerically resolved while modeling the smaller (subgrid) eddies. In this work, a Smagorinsky model has been used, which modifies the fluid viscosity by an additional eddy viscosity depending on the magnitude of the rate-of-strain tensor. In LBM, this is achieved by locally varying the relaxation time of the fluid.« less
Enabling Energy Saving Innovations Act
Rep. Aderholt, Robert B. [R-AL-4
2012-04-26
Senate - 09/24/2012 Message on Senate action sent to the House. (All Actions) Notes: For further action, see H.R.6582, which became Public Law 112-210 on 12/18/2012. Tracker: This bill has the status Passed SenateHere are the steps for Status of Legislation:
A Model for Speedup of Parallel Programs
1997-01-01
Sanjeev. K Setia . The interaction between mem- ory allocation and adaptive partitioning in message- passing multicomputers. In IPPS Workshop on Job...Scheduling Strategies for Parallel Processing, pages 89{99, 1995. [15] Sanjeev K. Setia and Satish K. Tripathi. A compar- ative analysis of static
Interstate Recognition of Notarizations Act of 2010
Rep. Aderholt, Robert B. [R-AL-4
2009-10-14
House - 11/17/2010 On motion to refer the bill and the accompanying veto message to the Committee on the Judiciary. Agreed to without objection. (All Actions) Tracker: This bill has the status Failed to pass over vetoHere are the steps for Status of Legislation:
NASA Astrophysics Data System (ADS)
Sakata, Ayaka; Xu, Yingying
2018-03-01
We analyse a linear regression problem with nonconvex regularization called smoothly clipped absolute deviation (SCAD) under an overcomplete Gaussian basis for Gaussian random data. We propose an approximate message passing (AMP) algorithm considering nonconvex regularization, namely SCAD-AMP, and analytically show that the stability condition corresponds to the de Almeida-Thouless condition in spin glass literature. Through asymptotic analysis, we show the correspondence between the density evolution of SCAD-AMP and the replica symmetric (RS) solution. Numerical experiments confirm that for a sufficiently large system size, SCAD-AMP achieves the optimal performance predicted by the replica method. Through replica analysis, a phase transition between replica symmetric and replica symmetry breaking (RSB) region is found in the parameter space of SCAD. The appearance of the RS region for a nonconvex penalty is a significant advantage that indicates the region of smooth landscape of the optimization problem. Furthermore, we analytically show that the statistical representation performance of the SCAD penalty is better than that of \
Message passing for integrating and assessing renewable generation in a redundant power grid
DOE Office of Scientific and Technical Information (OSTI.GOV)
Zdeborova, Lenka; Backhaus, Scott; Chertkov, Michael
2009-01-01
A simplified model of a redundant power grid is used to study integration of fluctuating renewable generation. The grid consists of large number of generator and consumer nodes. The net power consumption is determined by the difference between the gross consumption and the level of renewable generation. The gross consumption is drawn from a narrow distribution representing the predictability of aggregated loads, and we consider two different distributions representing wind and solar resources. Each generator is connected to D consumers, and redundancy is built in by connecting R {le} D of these consumers to other generators. The lines are switchablemore » so that at any instance each consumer is connected to a single generator. We explore the capacity of the renewable generation by determining the level of 'firm' generation capacity that can be displaced for different levels of redundancy R. We also develop message-passing control algorithm for finding switch sellings where no generator is overloaded.« less
Mapping a battlefield simulation onto message-passing parallel architectures
NASA Technical Reports Server (NTRS)
Nicol, David M.
1987-01-01
Perhaps the most critical problem in distributed simulation is that of mapping: without an effective mapping of workload to processors the speedup potential of parallel processing cannot be realized. Mapping a simulation onto a message-passing architecture is especially difficult when the computational workload dynamically changes as a function of time and space; this is exactly the situation faced by battlefield simulations. This paper studies an approach where the simulated battlefield domain is first partitioned into many regions of equal size; typically there are more regions than processors. The regions are then assigned to processors; a processor is responsible for performing all simulation activity associated with the regions. The assignment algorithm is quite simple and attempts to balance load by exploiting locality of workload intensity. The performance of this technique is studied on a simple battlefield simulation implemented on the Flex/32 multiprocessor. Measurements show that the proposed method achieves reasonable processor efficiencies. Furthermore, the method shows promise for use in dynamic remapping of the simulation.
Remote Asynchronous Message Service Gateway
NASA Technical Reports Server (NTRS)
Wang, Shin-Ywan; Burleigh, Scott C.
2011-01-01
The Remote Asynchronous Message Service (RAMS) gateway is a special-purpose AMS application node that enables exchange of AMS messages between nodes residing in different AMS "continua," notionally in different geographical locations. JPL s implementation of RAMS gateway functionality is integrated with the ION (Interplanetary Overlay Network) implementation of the DTN (Delay-Tolerant Networking) bundle protocol, and with JPL s implementation of AMS itself. RAMS protocol data units are encapsulated in ION bundles and are forwarded to the neighboring RAMS gateways identified in the source gateway s AMS management information base. Each RAMS gateway has interfaces in two communication environments: the AMS message space it serves, and the RAMS network - the grid or tree of mutually aware RAMS gateways - that enables AMS messages produced in one message space to be forwarded to other message spaces of the same venture. Each gateway opens persistent, private RAMS network communication channels to the RAMS gateways of other message spaces for the same venture, in other continua. The interconnected RAMS gateways use these communication channels to forward message petition assertions and cancellations among themselves. Each RAMS gateway subscribes locally to all subjects that are of interest in any of the linked message spaces. On receiving its copy of a message on any of these subjects, the RAMS gateway node uses the RAMS network to forward the message to every other RAMS gateway whose message space contains at least one node that has subscribed to messages on that subject. On receiving a message via the RAMS network from some other RAMS gateway, the RAMS gateway node forwards the message to all subscribers in its own message space.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Sadayappan, Ponnuswamy
Exascale computing systems will provide a thousand-fold increase in parallelism and a proportional increase in failure rate relative to today's machines. Systems software for exascale machines must provide the infrastructure to support existing applications while simultaneously enabling efficient execution of new programming models that naturally express dynamic, adaptive, irregular computation; coupled simulations; and massive data analysis in a highly unreliable hardware environment with billions of threads of execution. We propose a new approach to the data and work distribution model provided by system software based on the unifying formalism of an abstract file system. The proposed hierarchical data model providesmore » simple, familiar visibility and access to data structures through the file system hierarchy, while providing fault tolerance through selective redundancy. The hierarchical task model features work queues whose form and organization are represented as file system objects. Data and work are both first class entities. By exposing the relationships between data and work to the runtime system, information is available to optimize execution time and provide fault tolerance. The data distribution scheme provides replication (where desirable and possible) for fault tolerance and efficiency, and it is hierarchical to make it possible to take advantage of locality. The user, tools, and applications, including legacy applications, can interface with the data, work queues, and one another through the abstract file model. This runtime environment will provide multiple interfaces to support traditional Message Passing Interface applications, languages developed under DARPA's High Productivity Computing Systems program, as well as other, experimental programming models. We will validate our runtime system with pilot codes on existing platforms and will use simulation to validate for exascale-class platforms. In this final report, we summarize research results from the work done at the Ohio State University towards the larger goals of the project listed above.« less
User Interface Developed for Controls/CFD Interdisciplinary Research
NASA Technical Reports Server (NTRS)
1996-01-01
The NASA Lewis Research Center, in conjunction with the University of Akron, is developing analytical methods and software tools to create a cross-discipline "bridge" between controls and computational fluid dynamics (CFD) technologies. Traditionally, the controls analyst has used simulations based on large lumping techniques to generate low-order linear models convenient for designing propulsion system controls. For complex, high-speed vehicles such as the High Speed Civil Transport (HSCT), simulations based on CFD methods are required to capture the relevant flow physics. The use of CFD should also help reduce the development time and costs associated with experimentally tuning the control system. The initial application for this research is the High Speed Civil Transport inlet control problem. A major aspect of this research is the development of a controls/CFD interface for non-CFD experts, to facilitate the interactive operation of CFD simulations and the extraction of reduced-order, time-accurate models from CFD results. A distributed computing approach for implementing the interface is being explored. Software being developed as part of the Integrated CFD and Experiments (ICE) project provides the basis for the operating environment, including run-time displays and information (data base) management. Message-passing software is used to communicate between the ICE system and the CFD simulation, which can reside on distributed, parallel computing systems. Initially, the one-dimensional Large-Perturbation Inlet (LAPIN) code is being used to simulate a High Speed Civil Transport type inlet. LAPIN can model real supersonic inlet features, including bleeds, bypasses, and variable geometry, such as translating or variable-ramp-angle centerbodies. Work is in progress to use parallel versions of the multidimensional NPARC code.
The SMART MIL-STD-1553 bus adapter hardware manual
NASA Technical Reports Server (NTRS)
Ton, T. T.
1981-01-01
The SMART Multiplexer Interface Adapter, (SMIA) a complete system interface for message structure of the MIL-STD-1553, is described. It provides buffering and storage for transmitted and received data and handles all the necessary handshaking to interface between parallel 8-bit data bus and a MIL-STD serial bit stream. The bus adapter is configured as either a bus controller of a remote terminal interface. It is coupled directly to the multiplex bus, or stub coupled through an additional isolation transformer located at the connection point. Fault isolation resistors provide short circuit protection.
Federal Register 2010, 2011, 2012, 2013, 2014
2013-09-05
... its virtual contact interface be made mandatory as soon as possible for the many beneficial features... messaging and the virtual contact interface in the Standard, some Federal departments and agencies have... Laboratory Programs. [FR Doc. 2013-21491 Filed 9-4-13; 8:45 am] BILLING CODE 3510-13-P ...
DOE Office of Scientific and Technical Information (OSTI.GOV)
Perry, Marcia; Agarwal, Deb
2003-03-17
The PCCEServer application is a server that should be used in conjunction with the LBNLSecureMessaging user interface to enable secure synchronous and asynchronous messaging. It provides authentication and authorization services for members of a collaboration group via PKI/SSL and maintains an access control list. Members of collaboration groups using the LBNLSecureMessaging client must register identifying information. including usemame and password and an optional X.509 certificate, with the PCCEServer. This registration not only restricts access to instant messaging, but augments the LBNLSecureMessaging's IRC-based chat facility with persistence. Users register permanent unique user ids by which they are knowTl to other usersmore » in the system and create permanent venues intended for group conversations on a tong-term or continuous basis. In addition, the PCCEServer enhances instant messaging with presence and awareness information such as user availability, and it allows users to leave notes asynchronously for other users who are online or offline. Written in Java, it is a standalone application that can run on any platform that supports a Java Virtual Machine.« less
Design of Instant Messaging System of Multi-language E-commerce Platform
NASA Astrophysics Data System (ADS)
Yang, Heng; Chen, Xinyi; Li, Jiajia; Cao, Yaru
2017-09-01
This paper aims at researching the message system in the instant messaging system based on the multi-language e-commerce platform in order to design the instant messaging system in multi-language environment and exhibit the national characteristics based information as well as applying national languages to e-commerce. In order to develop beautiful and friendly system interface for the front end of the message system and reduce the development cost, the mature jQuery framework is adopted in this paper. The high-performance server Tomcat is adopted at the back end to process user requests, and MySQL database is adopted for data storage to persistently store user data, and meanwhile Oracle database is adopted as the message buffer for system optimization. Moreover, AJAX technology is adopted for the client to actively pull the newest data from the server at the specified time. In practical application, the system has strong reliability, good expansibility, short response time, high system throughput capacity and high user concurrency.
Kingman and Heritage Islands Act of 2009
Rep. Norton, Eleanor Holmes [D-DC-At Large
2009-04-23
Senate - 09/28/2010 Message on Senate action sent to the House. (All Actions) Notes: For further action, see H.R.6278, which became Public Law 111-328 on 12/22/2010. Tracker: This bill has the status Passed SenateHere are the steps for Status of Legislation:
Distance Learning and Public School Finance.
ERIC Educational Resources Information Center
Monahan, Brian; Wimber, Charles
This discussion of the application of computers and telecommunications technology to distance learning begins by describing a workstation equipped to produce "lessonware" in such formats as audiocassettes, videocassettes, diskettes, and voice messages. Such workstations would be located in classrooms equipped with two-way reverse passes to a cable…
Reducing Flight Delays Act of 2013
Sen. Collins, Susan M. [R-ME
2013-04-25
Senate - 04/30/2013 Message on Senate action sent to the House. (All Actions) Notes: For further action, see H.R.1765, which became Public Law 113-9 on 5/1/2013. Tracker: This bill has the status Passed SenateHere are the steps for Status of Legislation:
Temporary Bankruptcy Judgeships Extension Act of 2011
Rep. Smith, Lamar [R-TX-21
2011-03-10
Senate - 04/23/2012 Message on Senate action sent to the House. (All Actions) Notes: For further action, see H.R.4967, which became Public Law 112-121 on 5/25/2012. Tracker: This bill has the status Passed SenateHere are the steps for Status of Legislation:
NASA Astrophysics Data System (ADS)
Wedgwood, Kyle C. A.; Satin, Leslie S.
2018-03-01
In 1967, the social psychologist Stanley Milgram asked how many intermediates would be required for a letter from participants in various U.S. states to be received by a contact in Boston, given that participants were given only basic information about the contact, and were instructed only to pass along the message to someone that might know the intended recipient [27]. The surprising answer was that, in spite of the social and geographical distances covered, on average, only 6 people were required for the message to reach its intended target.
Compositional schedulability analysis of real-time actor-based systems.
Jaghoori, Mohammad Mahdi; de Boer, Frank; Longuet, Delphine; Chothia, Tom; Sirjani, Marjan
2017-01-01
We present an extension of the actor model with real-time, including deadlines associated with messages, and explicit application-level scheduling policies, e.g.,"earliest deadline first" which can be associated with individual actors. Schedulability analysis in this setting amounts to checking whether, given a scheduling policy for each actor, every task is processed within its designated deadline. To check schedulability, we introduce a compositional automata-theoretic approach, based on maximal use of model checking combined with testing. Behavioral interfaces define what an actor expects from the environment, and the deadlines for messages given these assumptions. We use model checking to verify that actors match their behavioral interfaces. We extend timed automata refinement with the notion of deadlines and use it to define compatibility of actor environments with the behavioral interfaces. Model checking of compatibility is computationally hard, so we propose a special testing process. We show that the analyses are decidable and automate the process using the Uppaal model checker.
Schreiber, Richard; Sittig, Dean F; Ash, Joan; Wright, Adam
2017-09-01
In this report, we describe 2 instances in which expert use of an electronic health record (EHR) system interfaced to an external clinical laboratory information system led to unintended consequences wherein 2 patients failed to have laboratory tests drawn in a timely manner. In both events, user actions combined with the lack of an acknowledgment message describing the order cancellation from the external clinical system were the root causes. In 1 case, rapid, near-simultaneous order entry was the culprit; in the second, astute order management by a clinician, unaware of the lack of proper 2-way interface messaging from the external clinical system, led to the confusion. Although testing had shown that the laboratory system would cancel duplicate laboratory orders, it was thought that duplicate alerting in the new order entry system would prevent such events. © The Author 2017. Published by Oxford University Press on behalf of the American Medical Informatics Association. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
NASA Technical Reports Server (NTRS)
White, James H. (Inventor); Schwartz, Michael (Inventor); Sammells, Anthony F. (Inventor)
1997-01-01
An electrolytic cell for generating hydrogen peroxide is provided including a cathode containing a catalyst for the reduction of oxygen, and an anode containing a catalyst for the oxidation of water. A polymer membrane, semipermeable to either protons or hydroxide ions is also included and has a first face interfacing to the cathode and a second face interfacing to the anode so that when a stream of water containing dissolved oxygen or oxygen bubbles is passed over the cathode and a stream of water is passed over the anode, and an electric current is passed between the anode and the cathode, hydrogen peroxide is generated at the cathode and oxygen is generated at the anode.
NASA Astrophysics Data System (ADS)
Liang, Haijun; Ren, Jialong; Song, Tao
2017-05-01
Operating requirement of air traffic control system, the multi-platform real-time message-oriented middleware was studied and realized, which is composed of CDCC and CDCS. The former provides application process interface, while the latter realizes data synchronism of CDCC and data exchange. MQM, as one important part of it, provides message queue management and, encrypt and compress data during transmitting procedure. The practical system application verifies that the middleware can simplify the development of air traffic control system, enhance its stability, improve its systematic function and make it convenient for maintenance and reuse.
1981-03-01
m u z r- Z > H < m ff -< C 2 C) 2 3 r- c is o 1 L_ • o > • en • • • Ŕ O > 00 o 30 n 30 T r- 1> - n o > IDE N T S E LIN E D IT...responsibilities to the Defense Intelligence Agency ( DIA ), the Services, and the unified and specified commands for carrying out that guidance. 3 - JCS...Testing 1-2 1.4 JINTACCS Message Text Eorinats (MTF) and TADIL Messages 1-2 1.4.1 JINTACCS Message Text Formats (MTF) 1- 3 1.4.2 Developmental TADIL
Information processing in the CNS: a supramolecular chemistry?
Tozzi, Arturo
2015-10-01
How does central nervous system process information? Current theories are based on two tenets: (a) information is transmitted by action potentials, the language by which neurons communicate with each other-and (b) homogeneous neuronal assemblies of cortical circuits operate on these neuronal messages where the operations are characterized by the intrinsic connectivity among neuronal populations. In this view, the size and time course of any spike is stereotypic and the information is restricted to the temporal sequence of the spikes; namely, the "neural code". However, an increasing amount of novel data point towards an alternative hypothesis: (a) the role of neural code in information processing is overemphasized. Instead of simply passing messages, action potentials play a role in dynamic coordination at multiple spatial and temporal scales, establishing network interactions across several levels of a hierarchical modular architecture, modulating and regulating the propagation of neuronal messages. (b) Information is processed at all levels of neuronal infrastructure from macromolecules to population dynamics. For example, intra-neuronal (changes in protein conformation, concentration and synthesis) and extra-neuronal factors (extracellular proteolysis, substrate patterning, myelin plasticity, microbes, metabolic status) can have a profound effect on neuronal computations. This means molecular message passing may have cognitive connotations. This essay introduces the concept of "supramolecular chemistry", involving the storage of information at the molecular level and its retrieval, transfer and processing at the supramolecular level, through transitory non-covalent molecular processes that are self-organized, self-assembled and dynamic. Finally, we note that the cortex comprises extremely heterogeneous cells, with distinct regional variations, macromolecular assembly, receptor repertoire and intrinsic microcircuitry. This suggests that every neuron (or group of neurons) embodies different molecular information that hands an operational effect on neuronal computation.
Archer, Charles J; Blocksome, Michael A; Ratterman, Joseph D; Smith, Brian E
2013-10-22
Processing data communications events in a parallel active messaging interface (`PAMI`) of a parallel computer that includes compute nodes that execute a parallel application, with the PAMI including data communications endpoints, and the endpoints are coupled for data communications through the PAMI and through other data communications resources, including determining by an advance function that there are no actionable data communications events pending for its context, placing by the advance function its thread of execution into a wait state, waiting for a subsequent data communications event for the context; responsive to occurrence of a subsequent data communications event for the context, awakening by the thread from the wait state; and processing by the advance function the subsequent data communications event now pending for the context.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Archer, Charles J; Blocksome, Michael A; Cernohous, Bob R
Methods, apparatuses, and computer program products for endpoint-based parallel data processing with non-blocking collective instructions in a parallel active messaging interface (`PAMI`) of a parallel computer are provided. Embodiments include establishing by a parallel application a data communications geometry, the geometry specifying a set of endpoints that are used in collective operations of the PAMI, including associating with the geometry a list of collective algorithms valid for use with the endpoints of the geometry. Embodiments also include registering in each endpoint in the geometry a dispatch callback function for a collective operation and executing without blocking, through a single onemore » of the endpoints in the geometry, an instruction for the collective operation.« less
Method and system for providing work machine multi-functional user interface
Hoff, Brian D [Peoria, IL; Akasam, Sivaprasad [Peoria, IL; Baker, Thomas M [Peoria, IL
2007-07-10
A method is performed to provide a multi-functional user interface on a work machine for displaying suggested corrective action. The process includes receiving status information associated with the work machine and analyzing the status information to determine an abnormal condition. The process also includes displaying a warning message on the display device indicating the abnormal condition and determining one or more corrective actions to handle the abnormal condition. Further, the process includes determining an appropriate corrective action among the one or more corrective actions and displaying a recommendation message on the display device reflecting the appropriate corrective action. The process may also include displaying a list including the remaining one or more corrective actions on the display device to provide alternative actions to an operator.
MaROS Strategic Relay Planning and Coordination Interfaces
NASA Technical Reports Server (NTRS)
Allard, Daniel A.
2010-01-01
The Mars Relay Operations Service (MaROS) is designed to provide planning and analysis tools in support of ongoing Mars Network relay operations. Strategic relay planning requires coordination between lander and orbiter mission ground data system (GDS) teams to schedule and execute relay communications passes. MaROS centralizes this process, correlating all data relevant to relay coordination to provide a cohesive picture of the relay state. Service users interact with the system through thin-layer command line and web user interface client applications. Users provide and utilize data such as lander view periods of orbiters, Deep Space Network (DSN) antenna tracks, and reports of relay pass performance. Users upload and download relevant relay data via formally defined and documented file structures including some described in Extensible Markup Language (XML). Clients interface with the system via an http-based Representational State Transfer (ReST) pattern using Javascript Object Notation (JSON) formats. This paper will provide a general overview of the service architecture and detail the software interfaces and considerations for interface design.
Increasing the Operational Value of Event Messages
NASA Technical Reports Server (NTRS)
Li, Zhenping; Savkli, Cetin; Smith, Dan
2003-01-01
Assessing the health of a space mission has traditionally been performed using telemetry analysis tools. Parameter values are compared to known operational limits and are plotted over various time periods. This presentation begins with the notion that there is an incredible amount of untapped information contained within the mission s event message logs. Through creative advancements in message handling tools, the event message logs can be used to better assess spacecraft and ground system status and to highlight and report on conditions not readily apparent when messages are evaluated one-at-a-time during a real-time pass. Work in this area is being funded as part of a larger NASA effort at the Goddard Space Flight Center to create component-based, middleware-based, standards-based general purpose ground system architecture referred to as GMSEC - the GSFC Mission Services Evolution Center. The new capabilities and operational concepts for event display, event data analyses and data mining are being developed by Lockheed Martin and the new subsystem has been named GREAT - the GMSEC Reusable Event Analysis Toolkit. Planned for use on existing and future missions, GREAT has the potential to increase operational efficiency in areas of problem detection and analysis, general status reporting, and real-time situational awareness.
Scalable Replay with Partial-Order Dependencies for Message-Logging Fault Tolerance
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lifflander, Jonathan; Meneses, Esteban; Menon, Harshita
2014-09-22
Deterministic replay of a parallel application is commonly used for discovering bugs or to recover from a hard fault with message-logging fault tolerance. For message passing programs, a major source of overhead during forward execution is recording the order in which messages are sent and received. During replay, this ordering must be used to deterministically reproduce the execution. Previous work in replay algorithms often makes minimal assumptions about the programming model and application in order to maintain generality. However, in many cases, only a partial order must be recorded due to determinism intrinsic in the code, ordering constraints imposed bymore » the execution model, and events that are commutative (their relative execution order during replay does not need to be reproduced exactly). In this paper, we present a novel algebraic framework for reasoning about the minimum dependencies required to represent the partial order for different concurrent orderings and interleavings. By exploiting this theory, we improve on an existing scalable message-logging fault tolerance scheme. The improved scheme scales to 131,072 cores on an IBM BlueGene/P with up to 2x lower overhead than one that records a total order.« less
2016-03-01
Representational state transfer Java messaging service Java application programming interface (API) Internet relay chat (IRC)/extensible messaging and...JBoss application server or an Apache Tomcat servlet container instance. The relational database management system can be either PostgreSQL or MySQL ... Java library called direct web remoting. This library has been part of the core CACE architecture for quite some time; however, there have not been
FIM Avionics Operations Manual
NASA Technical Reports Server (NTRS)
Alves, Erin E.
2017-01-01
This document describes the operation and use of the Flight Interval Management (FIM) Application installed on an electronic flight bag (EFB). Specifically, this document includes: 1) screen layouts for each page of the interface; 2) step-by-step instructions for data entry, data verification, and input error correction; 3) algorithm state messages and error condition alerting messages; 4) aircraft speed guidance and deviation indications; and 5) graphical display of the spatial relationships between the Ownship aircraft and the Target aircraft.
NASA Technical Reports Server (NTRS)
Hinton, David A.; Lohr, Gary W.
1988-01-01
Studies have shown that radio communications between pilots and air traffic control contribute to high pilot workload and are subject to various errors. These errors result from congestion on the voice radio channel, and missed and misunderstood messages. The use of digital data link has been proposed as a means of reducing this workload and error rate. A critical factor, however, in determining the potential benefit of data link will be the interface between future data link systems and the operator of those systems, both in the air and on the ground. The purpose of this effort was to evaluate the pilot interface with various levels of data link capability, in simulated general aviation, single-pilot instrument flight rule operations. Results show that the data link reduced demands on pilots' short-term memory, reduced the number of communication transmissions, and permitted the pilots to more easily allocate time to critical cockpit tasks while receiving air traffic control messages. The pilots who participated unanimously indicated a preference for data link communications over voice-only communications. There were, however, situations in which the pilot preferred the use of voice communications, and the ability for pilots to delay processing the data link messages, during high workload events, caused delays in the acknowledgement of messages to air traffic control.
Lightweight SIP/SDP compression scheme (LSSCS)
NASA Astrophysics Data System (ADS)
Wu, Jian J.; Demetrescu, Cristian
2001-10-01
In UMTS new IP based services with tight delay constraints will be deployed over the W-CDMA air interface such as IP multimedia and interactive services. To integrate the wireline and wireless IP services, 3GPP standard forum adopted the Session Initiation Protocol (SIP) as the call control protocol for the UMTS Release 5, which will implement next generation, all IP networks for real-time QoS services. In the current form the SIP protocol is not suitable for wireless transmission due to its large message size which will need either a big radio pipe for transmission or it will take far much longer to transmit than the current GSM Call Control (CC) message sequence. In this paper we present a novel compression algorithm called Lightweight SIP/SDP Compression Scheme (LSSCS), which acts at the SIP application layer and therefore removes the information redundancy before it is sent to the network and transport layer. A binary octet-aligned header is added to the compressed SIP/SDP message before sending it to the network layer. The receiver uses this binary header as well as the pre-cached information to regenerate the original SIP/SDP message. The key features of the LSSCS compression scheme are presented in this paper along with implementation examples. It is shown that this compression algorithm makes SIP transmission efficient over the radio interface without losing the SIP generality and flexibility.
Lang, Jun
2012-01-30
In this paper, we propose a novel secure image sharing scheme based on Shamir's three-pass protocol and the multiple-parameter fractional Fourier transform (MPFRFT), which can safely exchange information with no advance distribution of either secret keys or public keys between users. The image is encrypted directly by the MPFRFT spectrum without the use of phase keys, and information can be shared by transmitting the encrypted image (or message) three times between users. Numerical simulation results are given to verify the performance of the proposed algorithm.
E-mail, decisional styles, and rest breaks.
Baker, James R; Phillips, James G
2007-10-01
E-mail is a common but problematic work application. A scale was created to measure tendencies to use e-mail to take breaks (e-breaking); and self-esteem and decisional style (vigilance, procrastination, buck-passing, hypervigilance) were used to predict the self-reported and actual e-mail behaviors of 133 participants (students and marketing employees). Individuals who were low in defensive avoidance (buck-passing) engaged in more e-mailing per week, both in time spent on e-mail and message volume. E-breakers were more likely to engage in behavioral procrastination and spent more time on personal e-mail.
PyPanda: a Python package for gene regulatory network reconstruction
van IJzendoorn, David G.P.; Glass, Kimberly; Quackenbush, John; Kuijjer, Marieke L.
2016-01-01
Summary: PANDA (Passing Attributes between Networks for Data Assimilation) is a gene regulatory network inference method that uses message-passing to integrate multiple sources of ‘omics data. PANDA was originally coded in C ++. In this application note we describe PyPanda, the Python version of PANDA. PyPanda runs considerably faster than the C ++ version and includes additional features for network analysis. Availability and implementation: The open source PyPanda Python package is freely available at http://github.com/davidvi/pypanda. Contact: mkuijjer@jimmy.harvard.edu or d.g.p.van_ijzendoorn@lumc.nl PMID:27402905
PyPanda: a Python package for gene regulatory network reconstruction.
van IJzendoorn, David G P; Glass, Kimberly; Quackenbush, John; Kuijjer, Marieke L
2016-11-01
PANDA (Passing Attributes between Networks for Data Assimilation) is a gene regulatory network inference method that uses message-passing to integrate multiple sources of 'omics data. PANDA was originally coded in C ++. In this application note we describe PyPanda, the Python version of PANDA. PyPanda runs considerably faster than the C ++ version and includes additional features for network analysis. The open source PyPanda Python package is freely available at http://github.com/davidvi/pypanda CONTACT: mkuijjer@jimmy.harvard.edu or d.g.p.van_ijzendoorn@lumc.nl. © The Author 2016. Published by Oxford University Press.
Cooperative runtime monitoring
NASA Astrophysics Data System (ADS)
Hallé, Sylvain
2013-11-01
Requirements on message-based interactions can be formalised as an interface contract that specifies constraints on the sequence of possible messages that can be exchanged by multiple parties. At runtime, each peer can monitor incoming messages and check that the contract is correctly being followed by their respective senders. We introduce cooperative runtime monitoring, where a recipient 'delegates' its monitoring task to the sender, which is required to provide evidence that the message it sends complies with the contract. In turn, this evidence can be quickly checked by the recipient, which is then guaranteed of the sender's compliance to the contract without doing the monitoring computation by itself. A particular application of this concept is shown on web services, where service providers can monitor and enforce contract compliance of third-party clients at a small cost on the server side, while avoiding to certify or digitally sign them.
Sen. Cornyn, John [R-TX
2012-05-24
Senate - 01/02/2013 Message on House action received in Senate and at desk: House amendments to Senate bill. (All Actions) Tracker: This bill has the status Passed HouseHere are the steps for Status of Legislation:
Support for Debugging Automatically Parallelized Programs
NASA Technical Reports Server (NTRS)
Hood, Robert; Jost, Gabriele
2001-01-01
This viewgraph presentation provides information on support sources available for the automatic parallelization of computer program. CAPTools, a support tool developed at the University of Greenwich, transforms, with user guidance, existing sequential Fortran code into parallel message passing code. Comparison routines are then run for debugging purposes, in essence, ensuring that the code transformation was accurate.
Earth observation taken by the Expedition 43 crew.
2015-03-13
Earth observation taken during a day pass by the Expedition 43 crew aboard the International Space Station (ISS). Sent as part of Twitter message: #HappyStPatrickDay with best wishes from the #E43 crew! From space you can see the âEmerald Isleâ is very green!
New Frontier Congressional Gold Medal Act
Sen. Nelson, Bill [D-FL
2009-05-01
Senate - 09/23/2010 Message received in the Senate: Returned to the Senate pursuant to the provisions of H.Res. 1653. (All Actions) Notes: For further action, see H.R.2245, which became Public Law 111-44 on 8/7/2009. Tracker: This bill has the status Passed SenateHere are the steps for Status of Legislation:
Sen. Dorgan, Byron L. [D-ND
2009-05-12
Senate - 09/23/2010 Message received in the Senate: Returned to the Senate pursuant to the provisions of H.Res. 1653. (All Actions) Notes: For further action, see H.R.1299, which became Public Law 111-145 on 3/4/2010. Tracker: This bill has the status Passed SenateHere are the steps for Status of Legislation:
ERIC Educational Resources Information Center
Honawar, Vaishali
2006-01-01
A decade has passed since a few union leaders formed the network known as Teacher Union Reform Network (TURN) to search for innovative ways to enhance education. Selling their message has not always been easy. Created in 1995, TURN was the brain child of Adam Urbanski, the president of the Rochester (N.Y.) Teachers Association for the past 25…
Monitoring Data-Structure Evolution in Distributed Message-Passing Programs
NASA Technical Reports Server (NTRS)
Sarukkai, Sekhar R.; Beers, Andrew; Woodrow, Thomas S. (Technical Monitor)
1996-01-01
Monitoring the evolution of data structures in parallel and distributed programs, is critical for debugging its semantics and performance. However, the current state-of-art in tracking and presenting data-structure information on parallel and distributed environments is cumbersome and does not scale. In this paper we present a methodology that automatically tracks memory bindings (not the actual contents) of static and dynamic data-structures of message-passing C programs, using PVM. With the help of a number of examples we show that in addition to determining the impact of memory allocation overheads on program performance, graphical views can help in debugging the semantics of program execution. Scalable animations of virtual address bindings of source-level data-structures are used for debugging the semantics of parallel programs across all processors. In conjunction with light-weight core-files, this technique can be used to complement traditional debuggers on single processors. Detailed information (such as data-structure contents), on specific nodes, can be determined using traditional debuggers after the data structure evolution leading to the semantic error is observed graphically.
Parallelized direct execution simulation of message-passing parallel programs
NASA Technical Reports Server (NTRS)
Dickens, Phillip M.; Heidelberger, Philip; Nicol, David M.
1994-01-01
As massively parallel computers proliferate, there is growing interest in findings ways by which performance of massively parallel codes can be efficiently predicted. This problem arises in diverse contexts such as parallelizing computers, parallel performance monitoring, and parallel algorithm development. In this paper we describe one solution where one directly executes the application code, but uses a discrete-event simulator to model details of the presumed parallel machine such as operating system and communication network behavior. Because this approach is computationally expensive, we are interested in its own parallelization specifically the parallelization of the discrete-event simulator. We describe methods suitable for parallelized direct execution simulation of message-passing parallel programs, and report on the performance of such a system, Large Application Parallel Simulation Environment (LAPSE), we have built on the Intel Paragon. On all codes measured to date, LAPSE predicts performance well typically within 10 percent relative error. Depending on the nature of the application code, we have observed low slowdowns (relative to natively executing code) and high relative speedups using up to 64 processors.
Message Passing and Shared Address Space Parallelism on an SMP Cluster
NASA Technical Reports Server (NTRS)
Shan, Hongzhang; Singh, Jaswinder P.; Oliker, Leonid; Biswas, Rupak; Biegel, Bryan (Technical Monitor)
2002-01-01
Currently, message passing (MP) and shared address space (SAS) are the two leading parallel programming paradigms. MP has been standardized with MPI, and is the more common and mature approach; however, code development can be extremely difficult, especially for irregularly structured computations. SAS offers substantial ease of programming, but may suffer from performance limitations due to poor spatial locality and high protocol overhead. In this paper, we compare the performance of and the programming effort required for six applications under both programming models on a 32-processor PC-SMP cluster, a platform that is becoming increasingly attractive for high-end scientific computing. Our application suite consists of codes that typically do not exhibit scalable performance under shared-memory programming due to their high communication-to-computation ratios and/or complex communication patterns. Results indicate that SAS can achieve about half the parallel efficiency of MPI for most of our applications, while being competitive for the others. A hybrid MPI+SAS strategy shows only a small performance advantage over pure MPI in some cases. Finally, improved implementations of two MPI collective operations on PC-SMP clusters are presented.
Local area network with fault-checking, priorities, and redundant backup
NASA Technical Reports Server (NTRS)
Morales, Sergio (Inventor); Friedman, Gary L. (Inventor)
1989-01-01
This invention is a redundant error detecting and correcting local area networked computer system having a plurality of nodes each including a network connector board within the node for connecting to an interfacing transceiver operably attached to a network cable. There is a first network cable disposed along a path to interconnect the nodes. The first network cable includes a plurality of first interfacing transceivers attached thereto. A second network cable is disposed in parallel with the first cable and, in like manner, includes a plurality of second interfacing transceivers attached thereto. There are a plurality of three position switches each having a signal input, three outputs for individual selective connection to the input, and a control input for receiving signals designating which of the outputs is to be connected to the signal input. Each of the switches includes means for designating a response address for responding to addressed signals appearing at the control input and each of the switches further has its signal input connected to a respective one of the input/output lines from the nodes. Also, one of the three outputs is connected to a repective one of the plurality of first interfacing transceivers. There is master switch control means having an output connected to the control inputs of the plurality of three position switches and an input for receiving directive signals for outputting addressed switch position signals to the three position switches as well as monitor and control computer means having a pair of network connector boards therein connected to respective ones of one of the first interfacing transceivers and one of the second interfacing transceivers and an output connected to the input of the master switch means for monitoring the status of the networked computer system by sending messages to the nodes and receiving and verifying messages therefrom and for sending control signals to the master switch to cause the master switch to cause respective ones of the nodes to use a desired one of the first and second cables for transmitting and receiving messages and for disconnecting desired ones of the nodes from both cables.
The data acquisition system for the Anglo-Australian Observatory 2-degree field project
NASA Technical Reports Server (NTRS)
Shortridge, K.; Farrell, T. J.; Bailey, J.
1992-01-01
The Anglo-Australian Observatory (AAO) is building a system that will provide a two-degree field of view at prime focus. A robot positioner will be used to locate up to 400 optical fibers at pre-determined positions in this field. While observations are being made using one set of 400 fibers, the robot will be positioning a second set of fibers in a background field that can be moved in to replace the first when the telescope is moved to a new position. The fibers feed two spectrographs each with a 1024 square CCD detector. The software system being produced to control this involves Vaxes for overall control and data recording, UNIX workstations for fiber configuration calculations and on-line data reduction, and VME systems running VxWorks for real-time control of critical parts such as the positioner robot. The system has to be able to interact with the observatory's present data acquisition systems, which use the ADAM system. As yet, the real-time parts of ADAM have not been ported to Unix, and so we are having to produce a smaller-scale system that is similar but inherently distributed (which ADAM is not). We are using this system as a testbed for ideas that we hope may eventually influence an ADAM II system. The system we are producing is based on a message system that is designed to be able to handle inter-process and inter-processor messages of any length, efficiently, and without ever requiring a task to block (i.e., be unresponsive to 'cancel' messages, enquiry messages), other than when deliberately waiting for external input - all of which will be through such messages. The essential requirement is that a message 'send' operation should never be able to block. The messages will be hierarchical, self-defining, machine-independent data structures. This allows us to provide very simple monitoring of messages for diagnostic purposes, and allows general purpose interface programs to be written without needing to share precise byte by byte message format definitions. Programs in this system have interfaces defined simply in terms of named actions and their parameters. Real-time control programs are required to be able to handle a number of such actions concurrently; data reduction programs will normally only need to handle one action at a time ('process an image', 'display a spectrum', etc).
Low-pass filtering of noisy Schlumberger sounding curves. Part I: Theory
DOE Office of Scientific and Technical Information (OSTI.GOV)
Patella, D.
1986-02-01
A contribution is given to the solution of the problem of filtering noise-degraded Schlumberger sounding curves. It is shown that the transformation to the pole-pole system is actually a smoothing operation that filters high-frequency noise. In the case of residual noise contamination in the transformed pole-pole curve, it is demonstrated that a subsequent application of a conventional rectangular low-pass filter, with cut-off frequency not less than the right-hand frequency limit of the main message pass-band, may satisfactorily solve the problem by leaving a pole-pole curve available for interpretation. An attempt is also made to understand the essential peculiarities of themore » pole-pole system as far as penetration depth, resolving power and selectivity power are concerned.« less
Message passing with queues and channels
DOE Office of Scientific and Technical Information (OSTI.GOV)
Dozsa, Gabor J; Heidelberger, Philip; Kumar, Sameer
In an embodiment, a reception thread receives a source node identifier, a type, and a data pointer from an application and, in response, creates a receive request. If the source node identifier specifies a source node, the reception thread adds the receive request to a fast-post queue. If a message received from a network does not match a receive request on a posted queue, a polling thread adds a receive request that represents the message to an unexpected queue. If the fast-post queue contains the receive request, the polling thread removes the receive request from the fast-post queue. If themore » receive request that was removed from the fast-post queue does not match the receive request on the unexpected queue, the polling thread adds the receive request that was removed from the fast-post queue to the posted queue. The reception thread and the polling thread execute asynchronously from each other.« less
Secure Web-Site Access with Tickets and Message-Dependent Digests
Donato, David I.
2008-01-01
Although there are various methods for restricting access to documents stored on a World Wide Web (WWW) site (a Web site), none of the widely used methods is completely suitable for restricting access to Web applications hosted on an otherwise publicly accessible Web site. A new technique, however, provides a mix of features well suited for restricting Web-site or Web-application access to authorized users, including the following: secure user authentication, tamper-resistant sessions, simple access to user state variables by server-side applications, and clean session terminations. This technique, called message-dependent digests with tickets, or MDDT, maintains secure user sessions by passing single-use nonces (tickets) and message-dependent digests of user credentials back and forth between client and server. Appendix 2 provides a working implementation of MDDT with PHP server-side code and JavaScript client-side code.
Versteeg, Roelof J; Few, Douglas A; Kinoshita, Robert A; Johnson, Doug; Linda, Ondrej
2015-02-24
Methods, computer readable media, and apparatuses provide robotic explosive hazard detection. A robot intelligence kernel (RIK) includes a dynamic autonomy structure with two or more autonomy levels between operator intervention and robot initiative A mine sensor and processing module (ESPM) operating separately from the RIK perceives environmental variables indicative of a mine using subsurface perceptors. The ESPM processes mine information to determine a likelihood of a presence of a mine. A robot can autonomously modify behavior responsive to an indication of a detected mine. The behavior is modified between detection of mines, detailed scanning and characterization of the mine, developing mine indication parameters, and resuming detection. Real time messages are passed between the RIK and the ESPM. A combination of ESPM bound messages and RIK bound messages cause the robot platform to switch between modes including a calibration mode, the mine detection mode, and the mine characterization mode.
Versteeg, Roelof J.; Few, Douglas A.; Kinoshita, Robert A.; Johnson, Douglas; Linda, Ondrej
2015-12-15
Methods, computer readable media, and apparatuses provide robotic explosive hazard detection. A robot intelligence kernel (RIK) includes a dynamic autonomy structure with two or more autonomy levels between operator intervention and robot initiative A mine sensor and processing module (ESPM) operating separately from the RIK perceives environmental variables indicative of a mine using subsurface perceptors. The ESPM processes mine information to determine a likelihood of a presence of a mine. A robot can autonomously modify behavior responsive to an indication of a detected mine. The behavior is modified between detection of mines, detailed scanning and characterization of the mine, developing mine indication parameters, and resuming detection. Real time messages are passed between the RIK and the ESPM. A combination of ESPM bound messages and RIK bound messages cause the robot platform to switch between modes including a calibration mode, the mine detection mode, and the mine characterization mode.
Machine-checked proofs of the design and implementation of a fault-tolerant circuit
NASA Technical Reports Server (NTRS)
Bevier, William R.; Young, William D.
1990-01-01
A formally verified implementation of the 'oral messages' algorithm of Pease, Shostak, and Lamport is described. An abstract implementation of the algorithm is verified to achieve interactive consistency in the presence of faults. This abstract characterization is then mapped down to a hardware level implementation which inherits the fault-tolerant characteristics of the abstract version. All steps in the proof were checked with the Boyer-Moore theorem prover. A significant results is the demonstration of a fault-tolerant device that is formally specified and whose implementation is proved correct with respect to this specification. A significant simplifying assumption is that the redundant processors behave synchronously. A mechanically checked proof that the oral messages algorithm is 'optimal' in the sense that no algorithm which achieves agreement via similar message passing can tolerate a larger proportion of faulty processor is also described.
47 CFR 11.52 - EAS code and Attention Signal Monitoring requirements.
Code of Federal Regulations, 2012 CFR
2012-10-01
... technologies, such as instant messaging and email) the distribution of Common Alert Protocol (CAP)-formatted... Integrated Public Alert and Warning System (IPAWS) to enable (whether through “pull” interface technologies...
47 CFR 11.52 - EAS code and Attention Signal Monitoring requirements.
Code of Federal Regulations, 2013 CFR
2013-10-01
... technologies, such as instant messaging and email) the distribution of Common Alert Protocol (CAP)-formatted... Integrated Public Alert and Warning System (IPAWS) to enable (whether through “pull” interface technologies...
47 CFR 11.52 - EAS code and Attention Signal Monitoring requirements.
Code of Federal Regulations, 2014 CFR
2014-10-01
... technologies, such as instant messaging and email) the distribution of Common Alert Protocol (CAP)-formatted... Integrated Public Alert and Warning System (IPAWS) to enable (whether through “pull” interface technologies...
DOE Office of Scientific and Technical Information (OSTI.GOV)
Archer, Charles J.; Blocksome, Michael A.; Ratterman, Joseph D.
Processing data communications events in a parallel active messaging interface (`PAMI`) of a parallel computer that includes compute nodes that execute a parallel application, with the PAMI including data communications endpoints, and the endpoints are coupled for data communications through the PAMI and through other data communications resources, including determining by an advance function that there are no actionable data communications events pending for its context, placing by the advance function its thread of execution into a wait state, waiting for a subsequent data communications event for the context; responsive to occurrence of a subsequent data communications event for themore » context, awakening by the thread from the wait state; and processing by the advance function the subsequent data communications event now pending for the context.« less
Data communications in a parallel active messaging interface of a parallel computer
Archer, Charles J; Blocksome, Michael A; Ratterman, Joseph D; Smith, Brian E
2013-10-29
Data communications in a parallel active messaging interface (`PAMI`) of a parallel computer, the parallel computer including a plurality of compute nodes that execute a parallel application, the PAMI composed of data communications endpoints, each endpoint including a specification of data communications parameters for a thread of execution on a compute node, including specifications of a client, a context, and a task, the compute nodes and the endpoints coupled for data communications through the PAMI and through data communications resources, including receiving in an origin endpoint of the PAMI a data communications instruction, the instruction characterized by an instruction type, the instruction specifying a transmission of transfer data from the origin endpoint to a target endpoint and transmitting, in accordance with the instruction type, the transfer data from the origin endpoint to the target endpoint.
Parallel community climate model: Description and user`s guide
DOE Office of Scientific and Technical Information (OSTI.GOV)
Drake, J.B.; Flanery, R.E.; Semeraro, B.D.
This report gives an overview of a parallel version of the NCAR Community Climate Model, CCM2, implemented for MIMD massively parallel computers using a message-passing programming paradigm. The parallel implementation was developed on an Intel iPSC/860 with 128 processors and on the Intel Delta with 512 processors, and the initial target platform for the production version of the code is the Intel Paragon with 2048 processors. Because the implementation uses a standard, portable message-passing libraries, the code has been easily ported to other multiprocessors supporting a message-passing programming paradigm. The parallelization strategy used is to decompose the problem domain intomore » geographical patches and assign each processor the computation associated with a distinct subset of the patches. With this decomposition, the physics calculations involve only grid points and data local to a processor and are performed in parallel. Using parallel algorithms developed for the semi-Lagrangian transport, the fast Fourier transform and the Legendre transform, both physics and dynamics are computed in parallel with minimal data movement and modest change to the original CCM2 source code. Sequential or parallel history tapes are written and input files (in history tape format) are read sequentially by the parallel code to promote compatibility with production use of the model on other computer systems. A validation exercise has been performed with the parallel code and is detailed along with some performance numbers on the Intel Paragon and the IBM SP2. A discussion of reproducibility of results is included. A user`s guide for the PCCM2 version 2.1 on the various parallel machines completes the report. Procedures for compilation, setup and execution are given. A discussion of code internals is included for those who may wish to modify and use the program in their own research.« less
High Performance Programming Using Explicit Shared Memory Model on Cray T3D1
NASA Technical Reports Server (NTRS)
Simon, Horst D.; Saini, Subhash; Grassi, Charles
1994-01-01
The Cray T3D system is the first-phase system in Cray Research, Inc.'s (CRI) three-phase massively parallel processing (MPP) program. This system features a heterogeneous architecture that closely couples DEC's Alpha microprocessors and CRI's parallel-vector technology, i.e., the Cray Y-MP and Cray C90. An overview of the Cray T3D hardware and available programming models is presented. Under Cray Research adaptive Fortran (CRAFT) model four programming methods (data parallel, work sharing, message-passing using PVM, and explicit shared memory model) are available to the users. However, at this time data parallel and work sharing programming models are not available to the user community. The differences between standard PVM and CRI's PVM are highlighted with performance measurements such as latencies and communication bandwidths. We have found that the performance of neither standard PVM nor CRI s PVM exploits the hardware capabilities of the T3D. The reasons for the bad performance of PVM as a native message-passing library are presented. This is illustrated by the performance of NAS Parallel Benchmarks (NPB) programmed in explicit shared memory model on Cray T3D. In general, the performance of standard PVM is about 4 to 5 times less than obtained by using explicit shared memory model. This degradation in performance is also seen on CM-5 where the performance of applications using native message-passing library CMMD on CM-5 is also about 4 to 5 times less than using data parallel methods. The issues involved (such as barriers, synchronization, invalidating data cache, aligning data cache etc.) while programming in explicit shared memory model are discussed. Comparative performance of NPB using explicit shared memory programming model on the Cray T3D and other highly parallel systems such as the TMC CM-5, Intel Paragon, Cray C90, IBM-SP1, etc. is presented.
TiD-Introducing and Benchmarking an Event-Delivery System for Brain-Computer Interfaces.
Breitwieser, Christian; Tavella, Michele; Schreuder, Martijn; Cincotti, Febo; Leeb, Robert; Muller-Putz, Gernot R
2017-12-01
In this paper, we present and analyze an event distribution system for brain-computer interfaces. Events are commonly used to mark and describe incidents during an experiment and are therefore critical for later data analysis or immediate real-time processing. The presented approach, called Tools for brain-computer interaction interface D (TiD), delivers messages in XML format via a buslike system using transmission control protocol connections or shared memory. A dedicated server dispatches TiD messages to distributed or local clients. The TiD message is designed to be flexible and contains time stamps for event synchronization, whereas events describe incidents, which occur during an experiment. TiD was tested extensively toward stability and latency. The effect of an occurring event jitter was analyzed and benchmarked on a reference implementation under different conditions as gigabit and 100-Mb Ethernet or Wi-Fi with a different number of event receivers. A 3-dB signal attenuation, which occurs when averaging jitter influenced trials aligned by events, is starting to become visible at around 1-2 kHz in the case of a gigabit connection. Mean event distribution times across operating systems are ranging from 0.3 to 0.5ms for a gigabit network connection for 10 6 events. Results for other environmental conditions are available in this paper. References already using TiD for event distribution are provided showing the applicability of TiD for event delivery with distributed or local clients.
Performance issues in management of the Space Station Information System
NASA Technical Reports Server (NTRS)
Johnson, Marjory J.
1988-01-01
The onboard segment of the Space Station Information System (SSIS), called the Data Management System (DMS), will consist of a Fiber Distributed Data Interface (FDDI) token-ring network. The performance of the DMS in scenarios involving two kinds of network management is analyzed. In the first scenario, how the transmission of routine management messages impacts performance of the DMS is examined. In the second scenario, techniques for ensuring low latency of real-time control messages in an emergency are examined.
Non-RF Chain of Custody Item Monitor (CoCIM) User Manual.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Brotz, Jay Kristoffer; Wade, James Rokwel; Schwartz, Steven Robert
This User Manual contains a description of the wired and infrared (IR) variants of the Chain of Custody Item Monitor (CoCIM), the Coordinator for reading stored messages, and the inspector Message Viewer user interface (UI) software, as well as instructions for use. This manual does not include descriptions or use instructions for the radio frequency (RF) variant of the CoCIM. The intended audience is planners and participants in treaty verification exercises where chain of custody (CoC) elements are required.
LEGION: Lightweight Expandable Group of Independently Operating Nodes
NASA Technical Reports Server (NTRS)
Burl, Michael C.
2012-01-01
LEGION is a lightweight C-language software library that enables distributed asynchronous data processing with a loosely coupled set of compute nodes. Loosely coupled means that a node can offer itself in service to a larger task at any time and can withdraw itself from service at any time, provided it is not actively engaged in an assignment. The main program, i.e., the one attempting to solve the larger task, does not need to know up front which nodes will be available, how many nodes will be available, or at what times the nodes will be available, which is normally the case in a "volunteer computing" framework. The LEGION software accomplishes its goals by providing message-based, inter-process communication similar to MPI (message passing interface), but without the tight coupling requirements. The software is lightweight and easy to install as it is written in standard C with no exotic library dependencies. LEGION has been demonstrated in a challenging planetary science application in which a machine learning system is used in closed-loop fashion to efficiently explore the input parameter space of a complex numerical simulation. The machine learning system decides which jobs to run through the simulator; then, through LEGION calls, the system farms those jobs out to a collection of compute nodes, retrieves the job results as they become available, and updates a predictive model of how the simulator maps inputs to outputs. The machine learning system decides which new set of jobs would be most informative to run given the results so far; this basic loop is repeated until sufficient insight into the physical system modeled by the simulator is obtained.
Adding Fault Tolerance to NPB Benchmarks Using ULFM
DOE Office of Scientific and Technical Information (OSTI.GOV)
Parchman, Zachary W; Vallee, Geoffroy R; Naughton III, Thomas J
2016-01-01
In the world of high-performance computing, fault tolerance and application resilience are becoming some of the primary concerns because of increasing hardware failures and memory corruptions. While the research community has been investigating various options, from system-level solutions to application-level solutions, standards such as the Message Passing Interface (MPI) are also starting to include such capabilities. The current proposal for MPI fault tolerant is centered around the User-Level Failure Mitigation (ULFM) concept, which provides means for fault detection and recovery of the MPI layer. This approach does not address application-level recovery, which is currently left to application developers. In thismore » work, we present a mod- ification of some of the benchmarks of the NAS parallel benchmark (NPB) to include support of the ULFM capabilities as well as application-level strategies and mechanisms for application-level failure recovery. As such, we present: (i) an application-level library to checkpoint and restore data, (ii) extensions of NPB benchmarks for fault tolerance based on different strategies, (iii) a fault injection tool, and (iv) some preliminary results that show the impact of such fault tolerant strategies on the application execution.« less
Accelerating Climate and Weather Simulations through Hybrid Computing
NASA Technical Reports Server (NTRS)
Zhou, Shujia; Cruz, Carlos; Duffy, Daniel; Tucker, Robert; Purcell, Mark
2011-01-01
Unconventional multi- and many-core processors (e.g. IBM (R) Cell B.E.(TM) and NVIDIA (R) GPU) have emerged as effective accelerators in trial climate and weather simulations. Yet these climate and weather models typically run on parallel computers with conventional processors (e.g. Intel, AMD, and IBM) using Message Passing Interface. To address challenges involved in efficiently and easily connecting accelerators to parallel computers, we investigated using IBM's Dynamic Application Virtualization (TM) (IBM DAV) software in a prototype hybrid computing system with representative climate and weather model components. The hybrid system comprises two Intel blades and two IBM QS22 Cell B.E. blades, connected with both InfiniBand(R) (IB) and 1-Gigabit Ethernet. The system significantly accelerates a solar radiation model component by offloading compute-intensive calculations to the Cell blades. Systematic tests show that IBM DAV can seamlessly offload compute-intensive calculations from Intel blades to Cell B.E. blades in a scalable, load-balanced manner. However, noticeable communication overhead was observed, mainly due to IP over the IB protocol. Full utilization of IB Sockets Direct Protocol and the lower latency production version of IBM DAV will reduce this overhead.
PEGASUS 5: An Automated Pre-Processor for Overset-Grid CFD
NASA Technical Reports Server (NTRS)
Suhs, Norman E.; Rogers, Stuart E.; Dietz, William E.; Kwak, Dochan (Technical Monitor)
2002-01-01
An all new, automated version of the PEGASUS software has been developed and tested. PEGASUS provides the hole-cutting and connectivity information between overlapping grids, and is used as the final part of the grid generation process for overset-grid computational fluid dynamics approaches. The new PEGASUS code (Version 5) has many new features: automated hole cutting; a projection scheme for fixing gaps in overset surfaces; more efficient interpolation search methods using an alternating digital tree; hole-size optimization based on adding additional layers of fringe points; and an automatic restart capability. The new code has also been parallelized using the Message Passing Interface standard. The parallelization performance provides efficient speed-up of the execution time by an order of magnitude, and up to a factor of 30 for very large problems. The results of three example cases are presented: a three-element high-lift airfoil, a generic business jet configuration, and a complete Boeing 777-200 aircraft in a high-lift landing configuration. Comparisons of the computed flow fields for the airfoil and 777 test cases between the old and new versions of the PEGASUS codes show excellent agreement with each other and with experimental results.
SPOTting Model Parameters Using a Ready-Made Python Package
NASA Astrophysics Data System (ADS)
Houska, Tobias; Kraft, Philipp; Chamorro-Chavez, Alejandro; Breuer, Lutz
2017-04-01
The choice for specific parameter estimation methods is often more dependent on its availability than its performance. We developed SPOTPY (Statistical Parameter Optimization Tool), an open source python package containing a comprehensive set of methods typically used to calibrate, analyze and optimize parameters for a wide range of ecological models. SPOTPY currently contains eight widely used algorithms, 11 objective functions, and can sample from eight parameter distributions. SPOTPY has a model-independent structure and can be run in parallel from the workstation to large computation clusters using the Message Passing Interface (MPI). We tested SPOTPY in five different case studies to parameterize the Rosenbrock, Griewank and Ackley functions, a one-dimensional physically based soil moisture routine, where we searched for parameters of the van Genuchten-Mualem function and a calibration of a biogeochemistry model with different objective functions. The case studies reveal that the implemented SPOTPY methods can be used for any model with just a minimal amount of code for maximal power of parameter optimization. They further show the benefit of having one package at hand that includes number of well performing parameter search methods, since not every case study can be solved sufficiently with every algorithm or every objective function.
Determination of performance characteristics of scientific applications on IBM Blue Gene/Q
DOE Office of Scientific and Technical Information (OSTI.GOV)
Evangelinos, C.; Walkup, R. E.; Sachdeva, V.
The IBM Blue Gene®/Q platform presents scientists and engineers with a rich set of hardware features such as 16 cores per chip sharing a Level 2 cache, a wide SIMD (single-instruction, multiple-data) unit, a five-dimensional torus network, and hardware support for collective operations. Especially important is the feature related to cores that have four “hardware threads,” which makes it possible to hide latencies and obtain a high fraction of the peak issue rate from each core. All of these hardware resources present unique performance-tuning opportunities on Blue Gene/Q. We provide an overview of several important applications and solvers and studymore » them on Blue Gene/Q using performance counters and Message Passing Interface profiles. We also discuss how Blue Gene/Q tools help us understand the interaction of the application with the hardware and software layers and provide guidance for optimization. Furthermore, on the basis of our analysis, we discuss code improvement strategies targeting Blue Gene/Q. Information about how these algorithms map to the Blue Gene® architecture is expected to have an impact on future system design as we move to the exascale era.« less
Toward Petascale Biologically Plausible Neural Networks
NASA Astrophysics Data System (ADS)
Long, Lyle
This talk will describe an approach to achieving petascale neural networks. Artificial intelligence has been oversold for many decades. Computers in the beginning could only do about 16,000 operations per second. Computer processing power, however, has been doubling every two years thanks to Moore's law, and growing even faster due to massively parallel architectures. Finally, 60 years after the first AI conference we have computers on the order of the performance of the human brain (1016 operations per second). The main issues now are algorithms, software, and learning. We have excellent models of neurons, such as the Hodgkin-Huxley model, but we do not know how the human neurons are wired together. With careful attention to efficient parallel computing, event-driven programming, table lookups, and memory minimization massive scale simulations can be performed. The code that will be described was written in C + + and uses the Message Passing Interface (MPI). It uses the full Hodgkin-Huxley neuron model, not a simplified model. It also allows arbitrary network structures (deep, recurrent, convolutional, all-to-all, etc.). The code is scalable, and has, so far, been tested on up to 2,048 processor cores using 107 neurons and 109 synapses.
MPI Enhancements in John the Ripper
NASA Astrophysics Data System (ADS)
Sykes, Edward R.; Lin, Michael; Skoczen, Wesley
2010-11-01
John the Ripper (JtR) is an open source software package commonly used by system administrators to enforce password policy. JtR is designed to attack (i.e., crack) passwords encrypted in a wide variety of commonly used formats. While parallel implementations of JtR exist, there are several limitations to them. This research reports on two distinct algorithms that enhance this password cracking tool using the Message Passing Interface. The first algorithm is a novel approach that uses numerous processors to crack one password by using an innovative approach to workload distribution. In this algorithm the candidate password is distributed to all participating processors and the word list is divided based on probability so that each processor has the same likelihood of cracking the password while eliminating overlapping operations. The second algorithm developed in this research involves dividing the passwords within a password file equally amongst available processors while ensuring load-balanced and fault-tolerant behavior. This paper describes John the Ripper, the design of these two algorithms and preliminary results. Given the same amount of time, the original JtR can crack 29 passwords, whereas our algorithms 1 and 2 can crack an additional 35 and 45 passwords respectively.
Earth Observing System Data Gateway
NASA Technical Reports Server (NTRS)
Pfister, Robin; McMahon, Joe; Amrhein, James; Sefert, Ed; Marsans, Lorena; Solomon, Mark; Nestler, Mark
2006-01-01
The Earth Observing System Data Gateway (EDG) software provides a "one-stop-shopping" standard interface for exploring and ordering Earth-science data stored at geographically distributed sites. EDG enables a user to do the following: 1) Search for data according to high-level criteria (e.g., geographic location, time, or satellite that acquired the data); 2) Browse the results of a search, viewing thumbnail sketches of data that satisfy the user s criteria; and 3) Order selected data for delivery to a specified address on a chosen medium (e.g., compact disk or magnetic tape). EDG consists of (1) a component that implements a high-level client/server protocol, and (2) a collection of C-language libraries that implement the passing of protocol messages between an EDG client and one or more EDG servers. EDG servers are located at sites usually called "Distributed Active Archive Centers" (DAACs). Each DAAC may allow access to many individual data items, called "granules" (e.g., single Landsat images). Related granules are grouped into collections called "data sets." EDG enables a user to send a search query to multiple DAACs simultaneously, inspect the resulting information, select browseable granules, and then order selected data from the different sites in a seamless fashion.
Parallel distributed, reciprocal Monte Carlo radiation in coupled, large eddy combustion simulations
NASA Astrophysics Data System (ADS)
Hunsaker, Isaac L.
Radiation is the dominant mode of heat transfer in high temperature combustion environments. Radiative heat transfer affects the gas and particle phases, including all the associated combustion chemistry. The radiative properties are in turn affected by the turbulent flow field. This bi-directional coupling of radiation turbulence interactions poses a major challenge in creating parallel-capable, high-fidelity combustion simulations. In this work, a new model was developed in which reciprocal monte carlo radiation was coupled with a turbulent, large-eddy simulation combustion model. A technique wherein domain patches are stitched together was implemented to allow for scalable parallelism. The combustion model runs in parallel on a decomposed domain. The radiation model runs in parallel on a recomposed domain. The recomposed domain is stored on each processor after information sharing of the decomposed domain is handled via the message passing interface. Verification and validation testing of the new radiation model were favorable. Strong scaling analyses were performed on the Ember cluster and the Titan cluster for the CPU-radiation model and GPU-radiation model, respectively. The model demonstrated strong scaling to over 1,700 and 16,000 processing cores on Ember and Titan, respectively.
Quan, Guotao; Gong, Hui; Deng, Yong; Fu, Jianwei; Luo, Qingming
2011-02-01
High-speed fluorescence molecular tomography (FMT) reconstruction for 3-D heterogeneous media is still one of the most challenging problems in diffusive optical fluorescence imaging. In this paper, we propose a fast FMT reconstruction method that is based on Monte Carlo (MC) simulation and accelerated by a cluster of graphics processing units (GPUs). Based on the Message Passing Interface standard, we modified the MC code for fast FMT reconstruction, and different Green's functions representing the flux distribution in media are calculated simultaneously by different GPUs in the cluster. A load-balancing method was also developed to increase the computational efficiency. By applying the Fréchet derivative, a Jacobian matrix is formed to reconstruct the distribution of the fluorochromes using the calculated Green's functions. Phantom experiments have shown that only 10 min are required to get reconstruction results with a cluster of 6 GPUs, rather than 6 h with a cluster of multiple dual opteron CPU nodes. Because of the advantages of high accuracy and suitability for 3-D heterogeneity media with refractive-index-unmatched boundaries from the MC simulation, the GPU cluster-accelerated method provides a reliable approach to high-speed reconstruction for FMT imaging.
Solar wind interaction with Venus and Mars in a parallel hybrid code
NASA Astrophysics Data System (ADS)
Jarvinen, Riku; Sandroos, Arto
2013-04-01
We discuss the development and applications of a new parallel hybrid simulation, where ions are treated as particles and electrons as a charge-neutralizing fluid, for the interaction between the solar wind and Venus and Mars. The new simulation code under construction is based on the algorithm of the sequential global planetary hybrid model developed at the Finnish Meteorological Institute (FMI) and on the Corsair parallel simulation platform also developed at the FMI. The FMI's sequential hybrid model has been used for studies of plasma interactions of several unmagnetized and weakly magnetized celestial bodies for more than a decade. Especially, the model has been used to interpret in situ particle and magnetic field observations from plasma environments of Mars, Venus and Titan. Further, Corsair is an open source MPI (Message Passing Interface) particle and mesh simulation platform, mainly aimed for simulations of diffusive shock acceleration in solar corona and interplanetary space, but which is now also being extended for global planetary hybrid simulations. In this presentation we discuss challenges and strategies of parallelizing a legacy simulation code as well as possible applications and prospects of a scalable parallel hybrid model for the solar wind interactions of Venus and Mars.
SPOTting Model Parameters Using a Ready-Made Python Package.
Houska, Tobias; Kraft, Philipp; Chamorro-Chavez, Alejandro; Breuer, Lutz
2015-01-01
The choice for specific parameter estimation methods is often more dependent on its availability than its performance. We developed SPOTPY (Statistical Parameter Optimization Tool), an open source python package containing a comprehensive set of methods typically used to calibrate, analyze and optimize parameters for a wide range of ecological models. SPOTPY currently contains eight widely used algorithms, 11 objective functions, and can sample from eight parameter distributions. SPOTPY has a model-independent structure and can be run in parallel from the workstation to large computation clusters using the Message Passing Interface (MPI). We tested SPOTPY in five different case studies to parameterize the Rosenbrock, Griewank and Ackley functions, a one-dimensional physically based soil moisture routine, where we searched for parameters of the van Genuchten-Mualem function and a calibration of a biogeochemistry model with different objective functions. The case studies reveal that the implemented SPOTPY methods can be used for any model with just a minimal amount of code for maximal power of parameter optimization. They further show the benefit of having one package at hand that includes number of well performing parameter search methods, since not every case study can be solved sufficiently with every algorithm or every objective function.
SPOTting Model Parameters Using a Ready-Made Python Package
Houska, Tobias; Kraft, Philipp; Chamorro-Chavez, Alejandro; Breuer, Lutz
2015-01-01
The choice for specific parameter estimation methods is often more dependent on its availability than its performance. We developed SPOTPY (Statistical Parameter Optimization Tool), an open source python package containing a comprehensive set of methods typically used to calibrate, analyze and optimize parameters for a wide range of ecological models. SPOTPY currently contains eight widely used algorithms, 11 objective functions, and can sample from eight parameter distributions. SPOTPY has a model-independent structure and can be run in parallel from the workstation to large computation clusters using the Message Passing Interface (MPI). We tested SPOTPY in five different case studies to parameterize the Rosenbrock, Griewank and Ackley functions, a one-dimensional physically based soil moisture routine, where we searched for parameters of the van Genuchten-Mualem function and a calibration of a biogeochemistry model with different objective functions. The case studies reveal that the implemented SPOTPY methods can be used for any model with just a minimal amount of code for maximal power of parameter optimization. They further show the benefit of having one package at hand that includes number of well performing parameter search methods, since not every case study can be solved sufficiently with every algorithm or every objective function. PMID:26680783
NASA Astrophysics Data System (ADS)
Sudarmaji; Rudianto, Indra; Eka Nurcahya, Budi
2018-04-01
A strong tectonic earthquake with a magnitude of 5.9 Richter scale has been occurred in Yogyakarta and Central Java on May 26, 2006. The earthquake has caused severe damage in Yogyakarta and the southern part of Central Java, Indonesia. The understanding of seismic response of earthquake among ground shaking and the level of building damage is important. We present numerical modeling of 3D seismic wave propagation around Yogyakarta and the southern part of Central Java using spectral-element method on MPI-GPU (Graphics Processing Unit) computer cluster to observe its seismic response due to the earthquake. The homogeneous 3D realistic model is generated with detailed topography surface. The influences of free surface topography and layer discontinuity of the 3D model among the seismic response are observed. The seismic wave field is discretized using spectral-element method. The spectral-element method is solved on a mesh of hexahedral elements that is adapted to the free surface topography and the internal discontinuity of the model. To increase the data processing capabilities, the simulation is performed on a GPU cluster with implementation of MPI (Message Passing Interface).
Progress in Computational Simulation of Earthquakes
NASA Technical Reports Server (NTRS)
Donnellan, Andrea; Parker, Jay; Lyzenga, Gregory; Judd, Michele; Li, P. Peggy; Norton, Charles; Tisdale, Edwin; Granat, Robert
2006-01-01
GeoFEST(P) is a computer program written for use in the QuakeSim project, which is devoted to development and improvement of means of computational simulation of earthquakes. GeoFEST(P) models interacting earthquake fault systems from the fault-nucleation to the tectonic scale. The development of GeoFEST( P) has involved coupling of two programs: GeoFEST and the Pyramid Adaptive Mesh Refinement Library. GeoFEST is a message-passing-interface-parallel code that utilizes a finite-element technique to simulate evolution of stress, fault slip, and plastic/elastic deformation in realistic materials like those of faulted regions of the crust of the Earth. The products of such simulations are synthetic observable time-dependent surface deformations on time scales from days to decades. Pyramid Adaptive Mesh Refinement Library is a software library that facilitates the generation of computational meshes for solving physical problems. In an application of GeoFEST(P), a computational grid can be dynamically adapted as stress grows on a fault. Simulations on workstations using a few tens of thousands of stress and displacement finite elements can now be expanded to multiple millions of elements with greater than 98-percent scaled efficiency on over many hundreds of parallel processors (see figure).
A Matter of Urgency: Reducing Clinical Text Message Interruptions During Educational Sessions.
Mendel, Arielle; Lott, Anthony; Lo, Lisha; Wu, Robert
2018-04-25
Text messaging is increasingly replacing paging as a tool to reach physicians on medical wards. However, this phenomenon has resulted in high volumes of nonurgent messages that can disrupt the learning climate. Our objective was to reduce nonurgent educational interruptions to residents on general internal medicine. This was a quality improvement project conducted at an academic hospital network. Measurements and interventions took place on 8 general internal medicine inpatient teaching teams. Interventions included (1) refining the clinical communication process in collaboration with nursing leadership; (2) disseminating guidelines with posters at nursing stations; (3) introducing a noninterrupting option for message senders; (4) audit and feedback of messages; (5) adding an alert for message senders advising if a message would interrupt educational sessions; and (6) training and support to nurses and residents. Interruptions (text messages, phone calls, emails) received by institution-supplied team smartphones were tracked during educational hours using statistical process control charts. A 1-month record of text message content was analyzed for urgency at baseline and following the interventions. The interruption frequency decreased from a mean of 0.92 (95% CI, 0.88 to 0.97) to 0.59 (95% CI, 0.51 to0.67) messages per team per educational hour from January 2014 to December 2016. The proportion of nonurgent educational interruptions decreased from 223/273 (82%) messages over one month to 123/182 (68%; P < .01). Creation of communication guidelines and modification of text message interface with feedback from end-users were associated with a reduction in nonurgent educational interruptions. Continuous audit and feedback may be necessary to minimize nonurgent messages that disrupt educational sessions. © 2018 Society of Hospital Medicine.
NASA Astrophysics Data System (ADS)
Gaik Tay, Kim; Cheong, Tau Han; Foong Lee, Ming; Kek, Sie Long; Abdul-Kahar, Rosmila
2017-08-01
In the previous work on Euler’s spreadsheet calculator for solving an ordinary differential equation, the Visual Basic for Application (VBA) programming was used, however, a graphical user interface was not developed to capture users input. This weakness may make users confuse on the input and output since those input and output are displayed in the same worksheet. Besides, the existing Euler’s spreadsheet calculator is not interactive as there is no prompt message if there is a mistake in inputting the parameters. On top of that, there are no users’ instructions to guide users to input the derivative function. Hence, in this paper, we improved previous limitations by developing a user-friendly and interactive graphical user interface. This improvement is aimed to capture users’ input with users’ instructions and interactive prompt error messages by using VBA programming. This Euler’s graphical user interface spreadsheet calculator is not acted as a black box as users can click on any cells in the worksheet to see the formula used to implement the numerical scheme. In this way, it could enhance self-learning and life-long learning in implementing the numerical scheme in a spreadsheet and later in any programming language.
Sujansky, Walter V; Overhage, J Marc; Chang, Sophia; Frohlich, Jonah; Faus, Samuel A
2009-01-01
Electronic laboratory interfaces can significantly increase the value of ambulatory electronic health record (EHR) systems by providing laboratory result data automatically and in a computable form. However, many ambulatory EHRs cannot implement electronic laboratory interfaces despite the existence of messaging standards, such as Health Level 7, version 2 (HL7). Among several barriers to implementing laboratory interfaces is the extensive optionality within the HL7 message standard. This paper describes the rationale for and development of an HL7 implementation guide that seeks to eliminate most of the optionality inherent in HL7, but retain the information content required for reporting outpatient laboratory results. A work group of heterogeneous stakeholders developed the implementation guide based on a set of design principles that emphasized parsimony, practical requirements, and near-term adoption. The resulting implementation guide contains 93% fewer optional data elements than HL7. This guide was successfully implemented by 15 organizations during an initial testing phase and has been approved by the HL7 standards body as an implementation guide for outpatient laboratory reporting. Further testing is required to determine whether widespread adoption of the implementation guide by laboratories and EHR systems can facilitate the implementation of electronic laboratory interfaces.
Realization of Real-Time Clinical Data Integration Using Advanced Database Technology
Yoo, Sooyoung; Kim, Boyoung; Park, Heekyong; Choi, Jinwook; Chun, Jonghoon
2003-01-01
As information & communication technologies have advanced, interest in mobile health care systems has grown. In order to obtain information seamlessly from distributed and fragmented clinical data from heterogeneous institutions, we need solutions that integrate data. In this article, we introduce a method for information integration based on real-time message communication using trigger and advanced database technologies. Messages were devised to conform to HL7, a standard for electronic data exchange in healthcare environments. The HL7 based system provides us with an integrated environment in which we are able to manage the complexities of medical data. We developed this message communication interface to generate and parse HL7 messages automatically from the database point of view. We discuss how easily real time data exchange is performed in the clinical information system, given the requirement for minimum loading of the database system. PMID:14728271
Clinical data exchange standards and vocabularies for messages.
Huff, S. M.
1998-01-01
Motivation for the creation of electronic data interchange (message) standards is discussed. The ISO Open Systems Interface model is described. Clinical information models, message syntax and structure, and the need for a standardized coded vocabulary are explained. The HIPAA legislation and subsequent HHS transaction recommendations are reviewed. The history and mission statements of six of the most popular message development organizations (MDOs) are summarized, and the data exchange standards developed by these organizations are listed. The organizations described include Health Level Seven (HL7), American Standards for Testing and Materials (ASTM) E31, Digital Image Communication in Medicine (DICOM), European Committee for Standardization (Comité Européen de Normalisation), Technical Committee for Health Informatics (CEN/TC 251), the National Council for Prescription Drug Programs (NCPDP), and Accredited Standards Committee X12 Insurance Subcommittee (X12N). The locations of Internet web sites for the six organizations are provided as resources for further information. PMID:9929183
DMA shared byte counters in a parallel computer
Chen, Dong; Gara, Alan G.; Heidelberger, Philip; Vranas, Pavlos
2010-04-06
A parallel computer system is constructed as a network of interconnected compute nodes. Each of the compute nodes includes at least one processor, a memory and a DMA engine. The DMA engine includes a processor interface for interfacing with the at least one processor, DMA logic, a memory interface for interfacing with the memory, a DMA network interface for interfacing with the network, injection and reception byte counters, injection and reception FIFO metadata, and status registers and control registers. The injection FIFOs maintain memory locations of the injection FIFO metadata memory locations including its current head and tail, and the reception FIFOs maintain the reception FIFO metadata memory locations including its current head and tail. The injection byte counters and reception byte counters may be shared between messages.
Rep. Crowley, Joseph [D-NY-7
2011-05-26
Senate - 09/15/2011 Message on Senate action sent to the House. (All Actions) Notes: For further action, see H.R.2017, which became Public Law 112-33 on 9/30/2011. Tracker: This bill has the status Passed SenateHere are the steps for Status of Legislation:
Emergency Border Security Supplemental Appropriations Act, 2010
Rep. Price, David E. [D-NC-4
2010-07-27
Senate - 09/23/2010 Message received in the Senate: Returned to the Senate pursuant to the provisions of H.Res. 1653. (All Actions) Notes: For further action, see H.R.6080, which became Public Law 111-230 on 8/13/2010. Tracker: This bill has the status Passed SenateHere are the steps for Status of Legislation:
A Down-to-Earth Educational Operating System for Up-in-the-Cloud Many-Core Architectures
ERIC Educational Resources Information Center
Ziwisky, Michael; Persohn, Kyle; Brylow, Dennis
2013-01-01
We present "Xipx," the first port of a major educational operating system to a processor in the emerging class of many-core architectures. Through extensions to the proven Embedded Xinu operating system, Xipx gives students hands-on experience with system programming in a distributed message-passing environment. We expose the software primitives…
LDPC Codes--Structural Analysis and Decoding Techniques
ERIC Educational Resources Information Center
Zhang, Xiaojie
2012-01-01
Low-density parity-check (LDPC) codes have been the focus of much research over the past decade thanks to their near Shannon limit performance and to their efficient message-passing (MP) decoding algorithms. However, the error floor phenomenon observed in MP decoding, which manifests itself as an abrupt change in the slope of the error-rate curve,…
Comprehensive Iran Sanctions, Accountability, and Divestment Act of 2009
Sen. Dodd, Christopher J. [D-CT
2009-11-19
Senate - 09/23/2010 Message received in the Senate: Returned to the Senate pursuant to the provisions of H.Res. 1653. (All Actions) Notes: For further action, see H.R.2194, which became Public Law 111-195 on 7/1/2010. Tracker: This bill has the status Passed SenateHere are the steps for Status of Legislation:
Federal Aviation Administration Extension Act of 2010
Sen. Rockefeller, John D., IV [D-WV
2010-03-25
Senate - 09/23/2010 Message received in the Senate: Returned to the Senate pursuant to the provisions of H.Res. 1653. (All Actions) Notes: For further action, see H.R.5147, which became Public Law 111-161 on 4/30/2010. Tracker: This bill has the status Passed SenateHere are the steps for Status of Legislation:
Minority Journalists Push Media to Maintain Diversity Commitment
ERIC Educational Resources Information Center
Anderson, Michelle D.
2008-01-01
In a media advisory released earlier this month, the National Association of Black Journalists (NABJ) had an urgent message for the newspaper industry: Diversity should not be treated like a passing fad and it should continue to be a top priority. The advocacy organization, where a majority of its 4,000 members are Black print journalists, warned…
From Our Practices to Yours: Key Messages for the Journey to Integrated Behavioral Health.
Gold, Stephanie B; Green, Larry A; Peek, C J
The historic, cultural separation of primary care and behavioral health has caused the spread of integrated care to lag behind other practice transformation efforts. The Advancing Care Together study was a 3-year evaluation of how practices implemented integrated care in their local contexts; at its culmination, practice leaders ("innovators") identified lessons learned to pass on to others. Individual feedback from innovators, key messages created by workgroups of innovators and the study team, and a synthesis of key messages from a facilitated discussion were analyzed for themes via immersion/crystallization. Five key themes were captured: (1) frame integrated care as a necessary paradigm shift to patient-centered, whole-person health care; (2) initialize: define relationships and protocols up-front, understanding they will evolve; (3) build inclusive, empowered teams to provide the foundation for integration; (4) develop a change management strategy of continuous evaluation and course-correction; and (5) use targeted data collection pertinent to integrated care to drive improvement and impart accountability. Innovators integrating primary care and behavioral health discerned key messages from their practical experience that they felt were worth sharing with others. Their messages present insight into the challenges unique to integrating care beyond other practice transformation efforts. © Copyright 2017 by the American Board of Family Medicine.
Concurrent hypercube system with improved message passing
NASA Technical Reports Server (NTRS)
Peterson, John C. (Inventor); Tuazon, Jesus O. (Inventor); Lieberman, Don (Inventor); Pniel, Moshe (Inventor)
1989-01-01
A network of microprocessors, or nodes, are interconnected in an n-dimensional cube having bidirectional communication links along the edges of the n-dimensional cube. Each node's processor network includes an I/O subprocessor dedicated to controlling communication of message packets along a bidirectional communication link with each end thereof terminating at an I/O controlled transceiver. Transmit data lines are directly connected from a local FIFO through each node's communication link transceiver. Status and control signals from the neighboring nodes are delivered over supervisory lines to inform the local node that the neighbor node's FIFO is empty and the bidirectional link between the two nodes is idle for data communication. A clocking line between neighbors, clocks a message into an empty FIFO at a neighbor's node and vica versa. Either neighbor may acquire control over the bidirectional communication link at any time, and thus each node has circuitry for checking whether or not the communication link is busy or idle, and whether or not the receive FIFO is empty. Likewise, each node can empty its own FIFO and in turn deliver a status signal to a neighboring node indicating that the local FIFO is empty. The system includes features of automatic message rerouting, block message transfer and automatic parity checking and generation.
Mechanic: The MPI/HDF code framework for dynamical astronomy
NASA Astrophysics Data System (ADS)
Słonina, Mariusz; Goździewski, Krzysztof; Migaszewski, Cezary
2015-01-01
We introduce the Mechanic, a new open-source code framework. It is designed to reduce the development effort of scientific applications by providing unified API (Application Programming Interface) for configuration, data storage and task management. The communication layer is based on the well-established Message Passing Interface (MPI) standard, which is widely used on variety of parallel computers and CPU-clusters. The data storage is performed within the Hierarchical Data Format (HDF5). The design of the code follows core-module approach which allows to reduce the user’s codebase and makes it portable for single- and multi-CPU environments. The framework may be used in a local user’s environment, without administrative access to the cluster, under the PBS or Slurm job schedulers. It may become a helper tool for a wide range of astronomical applications, particularly focused on processing large data sets, such as dynamical studies of long-term orbital evolution of planetary systems with Monte Carlo methods, dynamical maps or evolutionary algorithms. It has been already applied in numerical experiments conducted for Kepler-11 (Migaszewski et al., 2012) and νOctantis planetary systems (Goździewski et al., 2013). In this paper we describe the basics of the framework, including code listings for the implementation of a sample user’s module. The code is illustrated on a model Hamiltonian introduced by (Froeschlé et al., 2000) presenting the Arnold diffusion. The Arnold web is shown with the help of the MEGNO (Mean Exponential Growth of Nearby Orbits) fast indicator (Goździewski et al., 2008a) applied onto symplectic SABAn integrators family (Laskar and Robutel, 2001).
Performance issues for domain-oriented time-driven distributed simulations
NASA Technical Reports Server (NTRS)
Nicol, David M.
1987-01-01
It has long been recognized that simulations form an interesting and important class of computations that may benefit from distributed or parallel processing. Since the point of parallel processing is improved performance, the recent proliferation of multiprocessors requires that we consider the performance issues that naturally arise when attempting to implement a distributed simulation. Three such issues are: (1) the problem of mapping the simulation onto the architecture, (2) the possibilities for performing redundant computation in order to reduce communication, and (3) the avoidance of deadlock due to distributed contention for message-buffer space. These issues are discussed in the context of a battlefield simulation implemented on a medium-scale multiprocessor message-passing architecture.
Parallel integer sorting with medium and fine-scale parallelism
NASA Technical Reports Server (NTRS)
Dagum, Leonardo
1993-01-01
Two new parallel integer sorting algorithms, queue-sort and barrel-sort, are presented and analyzed in detail. These algorithms do not have optimal parallel complexity, yet they show very good performance in practice. Queue-sort designed for fine-scale parallel architectures which allow the queueing of multiple messages to the same destination. Barrel-sort is designed for medium-scale parallel architectures with a high message passing overhead. The performance results from the implementation of queue-sort on a Connection Machine CM-2 and barrel-sort on a 128 processor iPSC/860 are given. The two implementations are found to be comparable in performance but not as good as a fully vectorized bucket sort on the Cray YMP.
Reducing Interprocessor Dependence in Recoverable Distributed Shared Memory
NASA Technical Reports Server (NTRS)
Janssens, Bob; Fuchs, W. Kent
1994-01-01
Checkpointing techniques in parallel systems use dependency tracking and/or message logging to ensure that a system rolls back to a consistent state. Traditional dependency tracking in distributed shared memory (DSM) systems is expensive because of high communication frequency. In this paper we show that, if designed correctly, a DSM system only needs to consider dependencies due to the transfer of blocks of data, resulting in reduced dependency tracking overhead and reduced potential for rollback propagation. We develop an ownership timestamp scheme to tolerate the loss of block state information and develop a passive server model of execution where interactions between processors are considered atomic. With our scheme, dependencies are significantly reduced compared to the traditional message-passing model.
Oddone, María Julieta
2013-04-01
This article presents the content (discourse) analysis of messages transmitted by primary school readers in the period between 1880 to 2012. This study allowed us to explore the image of old age and aging that society has and passes on to new generations as well as the role assigned to this generational group. The historical periods that provide the context for the data were defined according to the continuity of or the turning points in the social values transmitted in the reading materials. The role assigned to elderly people and the image of old age that the Argentine society passed on and continues to pass on to younger generations demonstrate that each period described has its own model of aging.
Data communications in a parallel active messaging interface of a parallel computer
Archer, Charles J; Blocksome, Michael A; Ratterman, Joseph D; Smith, Brian E
2013-11-12
Data communications in a parallel active messaging interface (`PAMI`) of a parallel computer composed of compute nodes that execute a parallel application, each compute node including application processors that execute the parallel application and at least one management processor dedicated to gathering information regarding data communications. The PAMI is composed of data communications endpoints, each endpoint composed of a specification of data communications parameters for a thread of execution on a compute node, including specifications of a client, a context, and a task, the compute nodes and the endpoints coupled for data communications through the PAMI and through data communications resources. Embodiments function by gathering call site statistics describing data communications resulting from execution of data communications instructions and identifying in dependence upon the call cite statistics a data communications algorithm for use in executing a data communications instruction at a call site in the parallel application.
Archer, Charles J; Blocksome, Michael E; Ratterman, Joseph D; Smith, Brian E
2014-02-11
Endpoint-based parallel data processing in a parallel active messaging interface ('PAMI') of a parallel computer, the PAMI composed of data communications endpoints, each endpoint including a specification of data communications parameters for a thread of execution on a compute node, including specifications of a client, a context, and a task, the compute nodes coupled for data communications through the PAMI, including establishing a data communications geometry, the geometry specifying, for tasks representing processes of execution of the parallel application, a set of endpoints that are used in collective operations of the PAMI including a plurality of endpoints for one of the tasks; receiving in endpoints of the geometry an instruction for a collective operation; and executing the instruction for a collective opeartion through the endpoints in dependence upon the geometry, including dividing data communications operations among the plurality of endpoints for one of the tasks.
Monitoring activities of satellite data processing services in real-time with SDDS Live Monitor
NASA Astrophysics Data System (ADS)
Duc Nguyen, Minh
2017-10-01
This work describes Live Monitor, the monitoring subsystem of SDDS - an automated system for space experiment data processing, storage, and distribution created at SINP MSU. Live Monitor allows operators and developers of satellite data centers to identify errors occurred in data processing quickly and to prevent further consequences caused by the errors. All activities of the whole data processing cycle are illustrated via a web interface in real-time. Notification messages are delivered to responsible people via emails and Telegram messenger service. The flexible monitoring mechanism implemented in Live Monitor allows us to dynamically change and control events being shown on the web interface on our demands. Physicists, whose space weather analysis models are functioning upon satellite data provided by SDDS, can use the developed RESTful API to monitor their own events and deliver customized notification messages by their needs.
NASA Astrophysics Data System (ADS)
Marshall, Stuart; Thaler, Jon; Schalk, Terry; Huffer, Michael
2006-06-01
The LSST Camera Control System (CCS) will manage the activities of the various camera subsystems and coordinate those activities with the LSST Observatory Control System (OCS). The CCS comprises a set of modules (nominally implemented in software) which are each responsible for managing one camera subsystem. Generally, a control module will be a long lived "server" process running on an embedded computer in the subsystem. Multiple control modules may run on a single computer or a module may be implemented in "firmware" on a subsystem. In any case control modules must exchange messages and status data with a master control module (MCM). The main features of this approach are: (1) control is distributed to the local subsystem level; (2) the systems follow a "Master/Slave" strategy; (3) coordination will be achieved by the exchange of messages through the interfaces between the CCS and its subsystems. The interface between the camera data acquisition system and its downstream clients is also presented.
Archer, Charles J.; Blocksome, Michael A.; Ratterman, Joseph D.; Smith, Brian E.
2014-08-12
Endpoint-based parallel data processing in a parallel active messaging interface (`PAMI`) of a parallel computer, the PAMI composed of data communications endpoints, each endpoint including a specification of data communications parameters for a thread of execution on a compute node, including specifications of a client, a context, and a task, the compute nodes coupled for data communications through the PAMI, including establishing a data communications geometry, the geometry specifying, for tasks representing processes of execution of the parallel application, a set of endpoints that are used in collective operations of the PAMI including a plurality of endpoints for one of the tasks; receiving in endpoints of the geometry an instruction for a collective operation; and executing the instruction for a collective operation through the endpoints in dependence upon the geometry, including dividing data communications operations among the plurality of endpoints for one of the tasks.
Fencing data transfers in a parallel active messaging interface of a parallel computer
Blocksome, Michael A.; Mamidala, Amith R.
2015-06-02
Fencing data transfers in a parallel active messaging interface (`PAMI`) of a parallel computer, the PAMI including data communications endpoints, each endpoint including a specification of data communications parameters for a thread of execution on a compute node, including specifications of a client, a context, and a task; the compute nodes coupled for data communications through the PAMI and through data communications resources including at least one segment of shared random access memory; including initiating execution through the PAMI of an ordered sequence of active SEND instructions for SEND data transfers between two endpoints, effecting deterministic SEND data transfers through a segment of shared memory; and executing through the PAMI, with no FENCE accounting for SEND data transfers, an active FENCE instruction, the FENCE instruction completing execution only after completion of all SEND instructions initiated prior to execution of the FENCE instruction for SEND data transfers between the two endpoints.
Blocksome, Michael A.; Mamidala, Amith R.
2013-09-03
Fencing direct memory access (`DMA`) data transfers in a parallel active messaging interface (`PAMI`) of a parallel computer, the PAMI including data communications endpoints, each endpoint including specifications of a client, a context, and a task, the endpoints coupled for data communications through the PAMI and through DMA controllers operatively coupled to segments of shared random access memory through which the DMA controllers deliver data communications deterministically, including initiating execution through the PAMI of an ordered sequence of active DMA instructions for DMA data transfers between two endpoints, effecting deterministic DMA data transfers through a DMA controller and a segment of shared memory; and executing through the PAMI, with no FENCE accounting for DMA data transfers, an active FENCE instruction, the FENCE instruction completing execution only after completion of all DMA instructions initiated prior to execution of the FENCE instruction for DMA data transfers between the two endpoints.
Fencing data transfers in a parallel active messaging interface of a parallel computer
Blocksome, Michael A.; Mamidala, Amith R.
2015-06-09
Fencing data transfers in a parallel active messaging interface (`PAMI`) of a parallel computer, the PAMI including data communications endpoints, each endpoint including a specification of data communications parameters for a thread of execution on a compute node, including specifications of a client, a context, and a task; the compute nodes coupled for data communications through the PAMI and through data communications resources including at least one segment of shared random access memory; including initiating execution through the PAMI of an ordered sequence of active SEND instructions for SEND data transfers between two endpoints, effecting deterministic SEND data transfers through a segment of shared memory; and executing through the PAMI, with no FENCE accounting for SEND data transfers, an active FENCE instruction, the FENCE instruction completing execution only after completion of all SEND instructions initiated prior to execution of the FENCE instruction for SEND data transfers between the two endpoints.
Fencing data transfers in a parallel active messaging interface of a parallel computer
Blocksome, Michael A.; Mamidala, Amith R.
2015-08-11
Fencing data transfers in a parallel active messaging interface (`PAMI`) of a parallel computer, the PAMI including data communications endpoints, each endpoint comprising a specification of data communications parameters for a thread of execution on a compute node, including specifications of a client, a context, and a task, the compute nodes coupled for data communications through the PAMI and through data communications resources including a deterministic data communications network, including initiating execution through the PAMI of an ordered sequence of active SEND instructions for SEND data transfers between two endpoints, effecting deterministic SEND data transfers; and executing through the PAMI, with no FENCE accounting for SEND data transfers, an active FENCE instruction, the FENCE instruction completing execution only after completion of all SEND instructions initiated prior to execution of the FENCE instruction for SEND data transfers between the two endpoints.
Blocksome, Michael A; Mamidala, Amith R
2014-02-11
Fencing direct memory access (`DMA`) data transfers in a parallel active messaging interface (`PAMI`) of a parallel computer, the PAMI including data communications endpoints, each endpoint including specifications of a client, a context, and a task, the endpoints coupled for data communications through the PAMI and through DMA controllers operatively coupled to segments of shared random access memory through which the DMA controllers deliver data communications deterministically, including initiating execution through the PAMI of an ordered sequence of active DMA instructions for DMA data transfers between two endpoints, effecting deterministic DMA data transfers through a DMA controller and a segment of shared memory; and executing through the PAMI, with no FENCE accounting for DMA data transfers, an active FENCE instruction, the FENCE instruction completing execution only after completion of all DMA instructions initiated prior to execution of the FENCE instruction for DMA data transfers between the two endpoints.
Fencing data transfers in a parallel active messaging interface of a parallel computer
Blocksome, Michael A.; Mamidala, Amith R.
2015-06-30
Fencing data transfers in a parallel active messaging interface (`PAMI`) of a parallel computer, the PAMI including data communications endpoints, each endpoint comprising a specification of data communications parameters for a thread of execution on a compute node, including specifications of a client, a context, and a task, the compute nodes coupled for data communications through the PAMI and through data communications resources including a deterministic data communications network, including initiating execution through the PAMI of an ordered sequence of active SEND instructions for SEND data transfers between two endpoints, effecting deterministic SEND data transfers; and executing through the PAMI, with no FENCE accounting for SEND data transfers, an active FENCE instruction, the FENCE instruction completing execution only after completion of all SEND instructions initiated prior to execution of the FENCE instruction for SEND data transfers between the two endpoints.
Blocksome, Michael A.; Mamidala, Amith R.
2015-07-07
Fencing direct memory access (`DMA`) data transfers in a parallel active messaging interface (`PAMI`) of a parallel computer, the PAMI including data communications endpoints, each endpoint including specifications of a client, a context, and a task, the endpoints coupled for data communications through the PAMI and through DMA controllers operatively coupled to a deterministic data communications network through which the DMA controllers deliver data communications deterministically, including initiating execution through the PAMI of an ordered sequence of active DMA instructions for DMA data transfers between two endpoints, effecting deterministic DMA data transfers through a DMA controller and the deterministic data communications network; and executing through the PAMI, with no FENCE accounting for DMA data transfers, an active FENCE instruction, the FENCE instruction completing execution only after completion of all DMA instructions initiated prior to execution of the FENCE instruction for DMA data transfers between the two endpoints.
Blocksome, Michael A.; Mamidala, Amith R.
2015-07-14
Fencing direct memory access (`DMA`) data transfers in a parallel active messaging interface (`PAMI`) of a parallel computer, the PAMI including data communications endpoints, each endpoint including specifications of a client, a context, and a task, the endpoints coupled for data communications through the PAMI and through DMA controllers operatively coupled to a deterministic data communications network through which the DMA controllers deliver data communications deterministically, including initiating execution through the PAMI of an ordered sequence of active DMA instructions for DMA data transfers between two endpoints, effecting deterministic DMA data transfers through a DMA controller and the deterministic data communications network; and executing through the PAMI, with no FENCE accounting for DMA data transfers, an active FENCE instruction, the FENCE instruction completing execution only after completion of all DMA instructions initiated prior to execution of the FENCE instruction for DMA data transfers between the two endpoints.
Cross-language Babel structs—making scientific interfaces more efficient
NASA Astrophysics Data System (ADS)
Prantl, Adrian; Ebner, Dietmar; Epperly, Thomas G. W.
2013-01-01
Babel is an open-source language interoperability framework tailored to the needs of high-performance scientific computing. As an integral element of the Common Component Architecture, it is employed in a wide range of scientific applications where it is used to connect components written in different programming languages. In this paper we describe how we extended Babel to support interoperable tuple data types (structs). Structs are a common idiom in (mono-lingual) scientific application programming interfaces (APIs); they are an efficient way to pass tuples of nonuniform data between functions, and are supported natively by most programming languages. Using our extended version of Babel, developers of scientific codes can now pass structs as arguments between functions implemented in any of the supported languages. In C, C++, Fortran 2003/2008 and Chapel, structs can be passed without the overhead of data marshaling or copying, providing language interoperability at minimal cost. Other supported languages are Fortran 77, Fortran 90/95, Java and Python. We will show how we designed a struct implementation that is interoperable with all of the supported languages and present benchmark data to compare the performance of all language bindings, highlighting the differences between languages that offer native struct support and an object-oriented interface with getter/setter methods. A case study shows how structs can help simplify the interfaces of scientific codes significantly.
Google glass: a driver distraction cause or cure?
Sawyer, Ben D; Finomore, Victor S; Calvo, Andres A; Hancock, P A
2014-11-01
We assess the driving distraction potential of texting with Google Glass (Glass), a mobile wearable platform capable of receiving and sending short-message-service and other messaging formats. A known roadway danger, texting while driving has been targeted by legislation and widely banned. Supporters of Glass claim the head-mounted wearable computer is designed to deliver information without concurrent distraction. Existing literature supports the supposition that design decisions incorporated in Glass might facilitate messaging for drivers. We asked drivers in a simulator to drive and use either Glass or a smartphone-based messaging interface, then interrupted them with an emergency brake event. Both the response event and subsequent recovery were analyzed. Glass-delivered messages served to moderate but did not eliminate distracting cognitive demands. A potential passive cost to drivers merely wearing Glass was also observed. Messaging using either device impaired driving as compared to driving without multitasking. Glass in not a panacea as some supporters claim, but it does point the way to design interventions that effect reduced load in multitasking. Discussions of these identified benefits are framed within the potential of new in-vehicle systems that bring both novel forms of distraction and tools for mitigation into the driver's seat.
Chang, Sun-Il; Yoon, Euisik
2009-01-01
We report an energy efficient pseudo open-loop amplifier with programmable band-pass filter developed for neural interface systems. The proposed amplifier consumes 400nA at 2.5V power supply. The measured thermal noise level is 85nV/ radicalHz and input-referred noise is 1.69microV(rms) from 0.3Hz to 1 kHz. The amplifier has a noise efficiency factor of 2.43, the lowest in the differential topologies reported up to date to our knowledge. By programming the switched-capacitor frequency and bias current, we could control the bandwidth of the preamplifier from 138 mHz to 2.2 kHz to meet various application requirements. The entire preamplifier including band-pass filters has been realized in a small area of 0.043mm(2) using a 0.25microm CMOS technology.
Soft real-time alarm messages for ATLAS TDAQ
NASA Astrophysics Data System (ADS)
Darlea, G.; Al Shabibi, A.; Martin, B.; Lehmann Miotto, G.
2010-05-01
The ATLAS TDAQ network consists of three separate Ethernet-based networks (Data, Control and Management) with over 2000 end-nodes. The TDAQ system has to be aware of the meaningful network failures and events in order for it to take effective recovery actions. The first stage of the process is implemented with Spectrum, a commercial network management tool. Spectrum detects and registers all network events, then it publishes the information via a CORBA programming interface. A gateway program (called NSG—Network Service Gateway) connects to Spectrum through CORBA and exposes to its clients a Java RMI interface. This interface implements a callback mechanism that allows the clients to subscribe for monitoring "interesting" parts of the network. The last stage of the TDAQ network monitoring tool is implemented in a module named DNC (DAQ to Network Connection), which filters the events that are to be reported to the TDAQ system: it subscribes to the gateway only for the machines that are currently active in the system and it forwards only the alarms that are considered important for the current TDAQ data taking session. The network information is then synthesized and presented in a human-readable format. These messages can be further processed either by the shifter who is in charge, the network expert or the Online Expert System. This article aims to describe the different mechanisms of the chain that transports the network events to the front-end user, as well as the constraints and rules that govern the filtering and the final format of the alarm messages.
Yucca Mountain licensing support network archive assistant.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Dunlavy, Daniel M.; Bauer, Travis L.; Verzi, Stephen J.
2008-03-01
This report describes the Licensing Support Network (LSN) Assistant--a set of tools for categorizing e-mail messages and documents, and investigating and correcting existing archives of categorized e-mail messages and documents. The two main tools in the LSN Assistant are the LSN Archive Assistant (LSNAA) tool for recategorizing manually labeled e-mail messages and documents and the LSN Realtime Assistant (LSNRA) tool for categorizing new e-mail messages and documents. This report focuses on the LSNAA tool. There are two main components of the LSNAA tool. The first is the Sandia Categorization Framework, which is responsible for providing categorizations for documents in anmore » archive and storing them in an appropriate Categorization Database. The second is the actual user interface, which primarily interacts with the Categorization Database, providing a way for finding and correcting categorizations errors in the database. A procedure for applying the LSNAA tool and an example use case of the LSNAA tool applied to a set of e-mail messages are provided. Performance results of the categorization model designed for this example use case are presented.« less
Network device interface for digitally interfacing data channels to a controller via a network
NASA Technical Reports Server (NTRS)
Konz, Daniel W. (Inventor); Winkelmann, Joseph P. (Inventor); Ellerbrock, Philip J. (Inventor); Grant, Robert L. (Inventor)
2007-01-01
The present invention provides a network device interface and method for digitally connecting a plurality of data channels, such as sensors, actuators, and subsystems, to a controller using a network bus. The network device interface interprets commands and data received from the controller and polls the data channels in accordance with these commands. Specifically, the network device interface receives digital commands and data from the controller, and based on these commands and data, communicates with the data channels to either retrieve data in the case of a sensor or send data to activate an actuator. Data retrieved from the sensor is converted into digital signals and transmitted to the controller. In some embodiments, network device interfaces associated with different data channels coordinate communications with the other interfaces based on either a transition in a command message sent by the bus controller or a synchronous clock signal.
Best bang for your buck: GPU nodes for GROMACS biomolecular simulations
Páll, Szilárd; Fechner, Martin; Esztermann, Ansgar; de Groot, Bert L.; Grubmüller, Helmut
2015-01-01
The molecular dynamics simulation package GROMACS runs efficiently on a wide variety of hardware from commodity workstations to high performance computing clusters. Hardware features are well‐exploited with a combination of single instruction multiple data, multithreading, and message passing interface (MPI)‐based single program multiple data/multiple program multiple data parallelism while graphics processing units (GPUs) can be used as accelerators to compute interactions off‐loaded from the CPU. Here, we evaluate which hardware produces trajectories with GROMACS 4.6 or 5.0 in the most economical way. We have assembled and benchmarked compute nodes with various CPU/GPU combinations to identify optimal compositions in terms of raw trajectory production rate, performance‐to‐price ratio, energy efficiency, and several other criteria. Although hardware prices are naturally subject to trends and fluctuations, general tendencies are clearly visible. Adding any type of GPU significantly boosts a node's simulation performance. For inexpensive consumer‐class GPUs this improvement equally reflects in the performance‐to‐price ratio. Although memory issues in consumer‐class GPUs could pass unnoticed as these cards do not support error checking and correction memory, unreliable GPUs can be sorted out with memory checking tools. Apart from the obvious determinants for cost‐efficiency like hardware expenses and raw performance, the energy consumption of a node is a major cost factor. Over the typical hardware lifetime until replacement of a few years, the costs for electrical power and cooling can become larger than the costs of the hardware itself. Taking that into account, nodes with a well‐balanced ratio of CPU and consumer‐class GPU resources produce the maximum amount of GROMACS trajectory over their lifetime. © 2015 The Authors. Journal of Computational Chemistry Published by Wiley Periodicals, Inc. PMID:26238484
Best bang for your buck: GPU nodes for GROMACS biomolecular simulations.
Kutzner, Carsten; Páll, Szilárd; Fechner, Martin; Esztermann, Ansgar; de Groot, Bert L; Grubmüller, Helmut
2015-10-05
The molecular dynamics simulation package GROMACS runs efficiently on a wide variety of hardware from commodity workstations to high performance computing clusters. Hardware features are well-exploited with a combination of single instruction multiple data, multithreading, and message passing interface (MPI)-based single program multiple data/multiple program multiple data parallelism while graphics processing units (GPUs) can be used as accelerators to compute interactions off-loaded from the CPU. Here, we evaluate which hardware produces trajectories with GROMACS 4.6 or 5.0 in the most economical way. We have assembled and benchmarked compute nodes with various CPU/GPU combinations to identify optimal compositions in terms of raw trajectory production rate, performance-to-price ratio, energy efficiency, and several other criteria. Although hardware prices are naturally subject to trends and fluctuations, general tendencies are clearly visible. Adding any type of GPU significantly boosts a node's simulation performance. For inexpensive consumer-class GPUs this improvement equally reflects in the performance-to-price ratio. Although memory issues in consumer-class GPUs could pass unnoticed as these cards do not support error checking and correction memory, unreliable GPUs can be sorted out with memory checking tools. Apart from the obvious determinants for cost-efficiency like hardware expenses and raw performance, the energy consumption of a node is a major cost factor. Over the typical hardware lifetime until replacement of a few years, the costs for electrical power and cooling can become larger than the costs of the hardware itself. Taking that into account, nodes with a well-balanced ratio of CPU and consumer-class GPU resources produce the maximum amount of GROMACS trajectory over their lifetime. © 2015 The Authors. Journal of Computational Chemistry Published by Wiley Periodicals, Inc.
Commanding and Controlling Satellite Clusters (IEEE Intelligent Systems, November/December 2000)
2000-01-01
real - time operating system , a message-passing OS well suited for distributed...ground Flight processors ObjectAgent RTOS SCL RTOS RDMS Space command language Real - time operating system Rational database management system TS-21 RDMS...engineer with Princeton Satellite Systems. She is working with others to develop ObjectAgent software to run on the OSE Real Time Operating System .
Sharing Content and Experiences in Smart Environments
NASA Astrophysics Data System (ADS)
Plomp, Johan; Heinilä, Juhani; Ikonen, Veikko; Kaasinen, Eija; Välkkynen, Pasi
Once upon a time… Stories used to be the only way to pass a message. The story teller would take his audience through the events by mere oration. Here and there he would hesitate, whisper, or gesticulate to emphasise his story or induce the right emotions in his audience. No doubt troubadourswere loved, they both brought news of the world as well as entertainment.
Dhamodharan, Udaya Suriya Raj Kumar; Vayanaperumal, Rajamani
2015-01-01
Wireless sensor networks are highly indispensable for securing network protection. Highly critical attacks of various kinds have been documented in wireless sensor network till now by many researchers. The Sybil attack is a massive destructive attack against the sensor network where numerous genuine identities with forged identities are used for getting an illegal entry into a network. Discerning the Sybil attack, sinkhole, and wormhole attack while multicasting is a tremendous job in wireless sensor network. Basically a Sybil attack means a node which pretends its identity to other nodes. Communication to an illegal node results in data loss and becomes dangerous in the network. The existing method Random Password Comparison has only a scheme which just verifies the node identities by analyzing the neighbors. A survey was done on a Sybil attack with the objective of resolving this problem. The survey has proposed a combined CAM-PVM (compare and match-position verification method) with MAP (message authentication and passing) for detecting, eliminating, and eventually preventing the entry of Sybil nodes in the network. We propose a scheme of assuring security for wireless sensor network, to deal with attacks of these kinds in unicasting and multicasting.
Dhamodharan, Udaya Suriya Raj Kumar; Vayanaperumal, Rajamani
2015-01-01
Wireless sensor networks are highly indispensable for securing network protection. Highly critical attacks of various kinds have been documented in wireless sensor network till now by many researchers. The Sybil attack is a massive destructive attack against the sensor network where numerous genuine identities with forged identities are used for getting an illegal entry into a network. Discerning the Sybil attack, sinkhole, and wormhole attack while multicasting is a tremendous job in wireless sensor network. Basically a Sybil attack means a node which pretends its identity to other nodes. Communication to an illegal node results in data loss and becomes dangerous in the network. The existing method Random Password Comparison has only a scheme which just verifies the node identities by analyzing the neighbors. A survey was done on a Sybil attack with the objective of resolving this problem. The survey has proposed a combined CAM-PVM (compare and match-position verification method) with MAP (message authentication and passing) for detecting, eliminating, and eventually preventing the entry of Sybil nodes in the network. We propose a scheme of assuring security for wireless sensor network, to deal with attacks of these kinds in unicasting and multicasting. PMID:26236773
Streaming data analytics via message passing with application to graph algorithms
Plimpton, Steven J.; Shead, Tim
2014-05-06
The need to process streaming data, which arrives continuously at high-volume in real-time, arises in a variety of contexts including data produced by experiments, collections of environmental or network sensors, and running simulations. Streaming data can also be formulated as queries or transactions which operate on a large dynamic data store, e.g. a distributed database. We describe a lightweight, portable framework named PHISH which enables a set of independent processes to compute on a stream of data in a distributed-memory parallel manner. Datums are routed between processes in patterns defined by the application. PHISH can run on top of eithermore » message-passing via MPI or sockets via ZMQ. The former means streaming computations can be run on any parallel machine which supports MPI; the latter allows them to run on a heterogeneous, geographically dispersed network of machines. We illustrate how PHISH can support streaming MapReduce operations, and describe streaming versions of three algorithms for large, sparse graph analytics: triangle enumeration, subgraph isomorphism matching, and connected component finding. Lastly, we also provide benchmark timings for MPI versus socket performance of several kernel operations useful in streaming algorithms.« less
Teaching Controversial Topics to Skeptical High School Students
NASA Astrophysics Data System (ADS)
Ford, K. S.
2012-12-01
Tennessee passes the "Monkey Bill" (HB 0368, SB 0893), North Carolina's state government passes a law to criminalize reference in state documents to scientific models that predict sea level rise to reach at least one meter by the next century, and public concern still lags far behind the scientific community's concern on climate change. The American public and even science teachers across the country seem to have lost faith in the ability of the scientific community to unify a strong message about several important scientific lessons, including global warming in particular. This lack of a unified message has weakened the ability of science teachers to effectively teach the lesson of global warming. For science teachers in strongly conservative areas of the country, it is much easier to omit difficult topics and avoid angering parents and school board members. Teachers who do feel strongly about scientifically proven, yet publically controversial topics CAN teach these topics in conservative areas by confirming students' belief systems by being honest and open about motivations surrounding both sides of controversial topics, and by using vocabulary that avoids triggering negative perceptions about these controversial topics. For true learning and change of preconceived opinion to take place, it is important for students to come to the understanding in their own minds in an open and safe learning environment instead of having the message "preached" to them, which only serves to make them feel unintelligent and defensive if they disagree. This presentation will include lessons learned from a practicing science teacher who works in a community that overwhelmingly disputes the validity of human impacts on climate change.
Applications of Multi-Channel Safety Authentication Protocols in Wireless Networks.
Chen, Young-Long; Liau, Ren-Hau; Chang, Liang-Yu
2016-01-01
People can use their web browser or mobile devices to access web services and applications which are built into these servers. Users have to input their identity and password to login the server. The identity and password may be appropriated by hackers when the network environment is not safe. The multiple secure authentication protocol can improve the security of the network environment. Mobile devices can be used to pass the authentication messages through Wi-Fi or 3G networks to serve as a second communication channel. The content of the message number is not considered in a multiple secure authentication protocol. The more excessive transmission of messages would be easier to collect and decode by hackers. In this paper, we propose two schemes which allow the server to validate the user and reduce the number of messages using the XOR operation. Our schemes can improve the security of the authentication protocol. The experimental results show that our proposed authentication protocols are more secure and effective. In regard to applications of second authentication communication channels for a smart access control system, identity identification and E-wallet, our proposed authentication protocols can ensure the safety of person and property, and achieve more effective security management mechanisms.
Bad News Has Wings: Dread Risk Mediates Social Amplification in Risk Communication.
Jagiello, Robert D; Hills, Thomas T
2018-05-29
Social diffusion of information amplifies risk through processes of birth, death, and distortion of message content. Dread risk-involving uncontrollable, fatal, involuntary, and catastrophic outcomes (e.g., terrorist attacks and nuclear accidents)-may be particularly susceptible to amplification because of the psychological biases inherent in dread risk avoidance. To test this, initially balanced information about high or low dread topics was given to a set of individuals who then communicated this information through diffusion chains, each person passing a message to the next. A subset of these chains were also reexposed to the original information. We measured prior knowledge, perceived risk before and after transmission, and, at each link, number of positive and negative statements. Results showed that the more a message was transmitted the more negative statements it contained. This was highest for the high dread topic. Increased perceived risk and production of negative messages was closely related to the amount of negative information that was received, with domain knowledge mitigating this effect. Reexposue to the initial information was ineffectual in reducing bias, demonstrating the enhanced danger of socially transmitted information. © 2018 Society for Risk Analysis.
Keyboard and message evaluation for cockpit input to data link
DOT National Transportation Integrated Search
1971-11-01
The project reported-herein studied some methods for implementation of the man-machine interface of Digital Data Link for Air Traffic Control. An analysis of information transfer requirements indicated that a vocabulary or less than 200 words could y...
NASA Technical Reports Server (NTRS)
Begault, Durand R.; Bittner, Rachel M.; Anderson, Mark R.
2012-01-01
Auditory communication displays within the NextGen data link system may use multiple synthetic speech messages replacing traditional ATC and company communications. The design of an interface for selecting amongst multiple incoming messages can impact both performance (time to select, audit and release a message) and preference. Two design factors were evaluated: physical pressure-sensitive switches versus flat panel "virtual switches", and the presence or absence of auditory feedback from switch contact. Performance with stimuli using physical switches was 1.2 s faster than virtual switches (2.0 s vs. 3.2 s); auditory feedback provided a 0.54 s performance advantage (2.33 s vs. 2.87 s). There was no interaction between these variables. Preference data were highly correlated with performance.
Information and the Origin of Qualia
Orpwood, Roger
2017-01-01
This article argues that qualia are a likely outcome of the processing of information in local cortical networks. It uses an information-based approach and makes a distinction between information structures (the physical embodiment of information in the brain, primarily patterns of action potentials), and information messages (the meaning of those structures to the brain, and the basis of qualia). It develops formal relationships between these two kinds of information, showing how information structures can represent messages, and how information messages can be identified from structures. The article applies this perspective to basic processing in cortical networks or ensembles, showing how networks can transform between the two kinds of information. The article argues that an input pattern of firing is identified by a network as an information message, and that the output pattern of firing generated is a representation of that message. If a network is encouraged to develop an attractor state through attention or other re-entrant processes, then the message identified each time physical information is cycled through the network becomes “representation of the previous message”. Using an example of olfactory perception, it is shown how this piggy-backing of messages on top of previous messages could lead to olfactory qualia. The message identified on each pass of information could evolve from inner identity, to inner form, to inner likeness or image. The outcome is an olfactory quale. It is shown that the same outcome could result from information cycled through a hierarchy of networks in a resonant state. The argument for qualia generation is applied to other sensory modalities, showing how, through a process of brain-wide constraint satisfaction, a particular state of consciousness could develop at any given moment. Evidence for some of the key predictions of the theory is presented, using ECoG data and studies of gamma oscillations and attractors, together with an outline of what further evidence is needed to provide support for the theory. PMID:28484376
NCC Simulation Model: Simulating the operations of the network control center, phase 2
NASA Technical Reports Server (NTRS)
Benjamin, Norman M.; Paul, Arthur S.; Gill, Tepper L.
1992-01-01
The simulation of the network control center (NCC) is in the second phase of development. This phase seeks to further develop the work performed in phase one. Phase one concentrated on the computer systems and interconnecting network. The focus of phase two will be the implementation of the network message dialogues and the resources controlled by the NCC. These resources are requested, initiated, monitored and analyzed via network messages. In the NCC network messages are presented in the form of packets that are routed across the network. These packets are generated, encoded, decoded and processed by the network host processors that generate and service the message traffic on the network that connects these hosts. As a result, the message traffic is used to characterize the work done by the NCC and the connected network. Phase one of the model development represented the NCC as a network of bi-directional single server queues and message generating sources. The generators represented the external segment processors. The served based queues represented the host processors. The NCC model consists of the internal and external processors which generate message traffic on the network that links these hosts. To fully realize the objective of phase two it is necessary to identify and model the processes in each internal processor. These processes live in the operating system of the internal host computers and handle tasks such as high speed message exchanging, ISN and NFE interface, event monitoring, network monitoring, and message logging. Inter process communication is achieved through the operating system facilities. The overall performance of the host is determined by its ability to service messages generated by both internal and external processors.
Network device interface for digitally interfacing data channels to a controller via a network
NASA Technical Reports Server (NTRS)
Ellerbrock, Philip J. (Inventor); Grant, Robert L. (Inventor); Winkelmann, Joseph P. (Inventor); Konz, Daniel W. (Inventor)
2009-01-01
A communications system and method are provided for digitally connecting a plurality of data channels, such as sensors, actuators, and subsystems, to a controller using a network bus. The network device interface interprets commands and data received from the controller and polls the data channels in accordance with these commands. Specifically, the network device interface receives digital commands and data from the controller, and based on these commands and data, communicates with the data channels to either retrieve data in the case of a sensor or send data to activate an actuator. Data retrieved from the sensor is converted into digital signals and transmitted to the controller. Network device interfaces associated with different data channels can coordinate communications with the other interfaces based on either a transition in a command message sent by the bus controller or a synchronous clock signal.
Portable Life Support System: PLSS 101
NASA Technical Reports Server (NTRS)
Thomas, Gretchen A.
2011-01-01
This presentation reviewed basic interfaces and considerations necessary for prototype suit hardware integration from an advanced spacesuit engineer perspective during the early design and test phases. The discussion included such topics such as the human interface, suit pass-throughs, keep-out zones, hardware form factors, subjective feedback from suit tests, and electricity in the suit.
Clandestine Message Passing in Virtual Environments
2008-09-01
accessed April 4, 2008). Weir, Laila. “Boring Game? Outsorce It.” (August 24, 2004). http://www.wired.com/ entertainment / music /news/2004/08/ 64638...Multiplayer Online MOVES - Modeling Virtual Environments and Simulation MTV – Music Television NPS - Naval Postgraduate School PAN – Personal Area...Network PSP - PlayStation Portable RPG – Role-playing Game SL - Second Life SVN - Subversion VE – Virtual Environments vMTV – Virtual Music
ERIC Educational Resources Information Center
FitzGerald Murphy, Maggie
2016-01-01
Despite the fact that over fifty years have passed since its publication, Stuart Hall's article "The Supply of Demand" (1960), is remarkably relevant today. The central message that society must not be blinded by "prosperity" such that it no longer envisions and demands a better world is especially pertinent in light of the…
The Guardian. Volume 9, Number 3, Winter 2007
2007-01-01
message correctly. Muslims will ultimately determine whether the ideology of al Qaeda, its affiliates, franchisees , and fellow travelers represents...opined that the effort would not work or would not pass the cost– benefit comparison. Others wanted to develop comprehensive NATO doctrine to guide...full BAT capability, the access-control benefits of BAT/HIIDE are a significant improvement to overall ISAF force protection, even in stand-alone mode
PETSc Users Manual Revision 3.7
DOE Office of Scientific and Technical Information (OSTI.GOV)
Balay, Satish; Abhyankar, S.; Adams, M.
This manual describes the use of PETSc for the numerical solution of partial differential equations and related problems on high-performance computers. The Portable, Extensible Toolkit for Scientific Computation (PETSc) is a suite of data structures and routines that provide the building blocks for the implementation of large-scale application codes on parallel (and serial) computers. PETSc uses the MPI standard for all message-passing communication.
PETSc Users Manual Revision 3.8
DOE Office of Scientific and Technical Information (OSTI.GOV)
Balay, S.; Abhyankar, S.; Adams, M.
This manual describes the use of PETSc for the numerical solution of partial differential equations and related problems on high-performance computers. The Portable, Extensible Toolkit for Scientific Computation (PETSc) is a suite of data structures and routines that provide the building blocks for the implementation of large-scale application codes on parallel (and serial) computers. PETSc uses the MPI standard for all message-passing communication.
Statistical Inference in Graphical Models
2008-06-17
fuse probability theory and graph theory in such a way as to permit efficient rep- resentation and computation with probability distributions. They...message passing. 59 viii 1. INTRODUCTION In approaching real-world problems, we often need to deal with uncertainty. Probability and statis- tics provide a...dynamic programming methods. However, for many sensors of interest, the signal-to-noise ratio does not allow such a treatment. Another source of
Extending Word Highlighting in Multiparticipant Chat
2013-05-01
PAGE unclassified Standard Form 298 (Rev. 8-98) Prescribed by ANSI Std Z39-18 [05:40] < daddy > is there a way to change 11.04 interface back to...10.10 [05:40] <DrFrankenstein> daddy : launching programs from a drop down menu instead of the screen with the icons? [05:40] <Soupermanito> yes, log out...user daddy asks a question about Unity, referring to it as the “11.04 interface.” For the corpus, we labeled only those messages that concern topics
Architectural impact of FDDI network on scheduling hard real-time traffic
NASA Technical Reports Server (NTRS)
Agrawal, Gopal; Chen, Baio; Zhao, Wei; Davari, Sadegh
1991-01-01
The architectural impact on guaranteeing synchronous message deadlines in FDDI (Fiber Distributed Data Interface) token ring networks is examined. The FDDI network does not have facility to support (global) priority arbitration which is a useful facility for scheduling hard real time activities. As a result, it was found that the worst case utilization of synchronous traffic in an FDDI network can be far less than that in a centralized single processor system. Nevertheless, it is proposed and analyzed that a scheduling method can guarantee deadlines of synchronous messages having traffic utilization up to 33 pct., the highest to date.
High-Performance Design Patterns for Modern Fortran
Haveraaen, Magne; Morris, Karla; Rouson, Damian; ...
2015-01-01
This paper presents ideas for using coordinate-free numerics in modern Fortran to achieve code flexibility in the partial differential equation (PDE) domain. We also show how Fortran, over the last few decades, has changed to become a language well-suited for state-of-the-art software development. Fortran’s new coarray distributed data structure, the language’s class mechanism, and its side-effect-free, pure procedure capability provide the scaffolding on which we implement HPC software. These features empower compilers to organize parallel computations with efficient communication. We present some programming patterns that support asynchronous evaluation of expressions comprised of parallel operations on distributed data. We implemented thesemore » patterns using coarrays and the message passing interface (MPI). We compared the codes’ complexity and performance. The MPI code is much more complex and depends on external libraries. The MPI code on Cray hardware using the Cray compiler is 1.5–2 times faster than the coarray code on the same hardware. The Intel compiler implements coarrays atop Intel’s MPI library with the result apparently being 2–2.5 times slower than manually coded MPI despite exhibiting nearly linear scaling efficiency. As compilers mature and further improvements to coarrays comes in Fortran 2015, we expect this performance gap to narrow.« less
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kostin, Mikhail; Mokhov, Nikolai; Niita, Koji
A parallel computing framework has been developed to use with general-purpose radiation transport codes. The framework was implemented as a C++ module that uses MPI for message passing. It is intended to be used with older radiation transport codes implemented in Fortran77, Fortran 90 or C. The module is significantly independent of radiation transport codes it can be used with, and is connected to the codes by means of a number of interface functions. The framework was developed and tested in conjunction with the MARS15 code. It is possible to use it with other codes such as PHITS, FLUKA andmore » MCNP after certain adjustments. Besides the parallel computing functionality, the framework offers a checkpoint facility that allows restarting calculations with a saved checkpoint file. The checkpoint facility can be used in single process calculations as well as in the parallel regime. The framework corrects some of the known problems with the scheduling and load balancing found in the original implementations of the parallel computing functionality in MARS15 and PHITS. The framework can be used efficiently on homogeneous systems and networks of workstations, where the interference from the other users is possible.« less
An Efficient Multiblock Method for Aerodynamic Analysis and Design on Distributed Memory Systems
NASA Technical Reports Server (NTRS)
Reuther, James; Alonso, Juan Jose; Vassberg, John C.; Jameson, Antony; Martinelli, Luigi
1997-01-01
The work presented in this paper describes the application of a multiblock gridding strategy to the solution of aerodynamic design optimization problems involving complex configurations. The design process is parallelized using the MPI (Message Passing Interface) Standard such that it can be efficiently run on a variety of distributed memory systems ranging from traditional parallel computers to networks of workstations. Substantial improvements to the parallel performance of the baseline method are presented, with particular attention to their impact on the scalability of the program as a function of the mesh size. Drag minimization calculations at a fixed coefficient of lift are presented for a business jet configuration that includes the wing, body, pylon, aft-mounted nacelle, and vertical and horizontal tails. An aerodynamic design optimization is performed with both the Euler and Reynolds Averaged Navier-Stokes (RANS) equations governing the flow solution and the results are compared. These sample calculations establish the feasibility of efficient aerodynamic optimization of complete aircraft configurations using the RANS equations as the flow model. There still exists, however, the need for detailed studies of the importance of a true viscous adjoint method which holds the promise of tackling the minimization of not only the wave and induced components of drag, but also the viscous drag.
Homemade Buckeye-Pi: A Learning Many-Node Platform for High-Performance Parallel Computing
NASA Astrophysics Data System (ADS)
Amooie, M. A.; Moortgat, J.
2017-12-01
We report on the "Buckeye-Pi" cluster, the supercomputer developed in The Ohio State University School of Earth Sciences from 128 inexpensive Raspberry Pi (RPi) 3 Model B single-board computers. Each RPi is equipped with fast Quad Core 1.2GHz ARMv8 64bit processor, 1GB of RAM, and 32GB microSD card for local storage. Therefore, the cluster has a total RAM of 128GB that is distributed on the individual nodes and a flash capacity of 4TB with 512 processors, while it benefits from low power consumption, easy portability, and low total cost. The cluster uses the Message Passing Interface protocol to manage the communications between each node. These features render our platform the most powerful RPi supercomputer to date and suitable for educational applications in high-performance-computing (HPC) and handling of large datasets. In particular, we use the Buckeye-Pi to implement optimized parallel codes in our in-house simulator for subsurface media flows with the goal of achieving a massively-parallelized scalable code. We present benchmarking results for the computational performance across various number of RPi nodes. We believe our project could inspire scientists and students to consider the proposed unconventional cluster architecture as a mainstream and a feasible learning platform for challenging engineering and scientific problems.
A Comparison of PETSC Library and HPF Implementations of an Archetypal PDE Computation
NASA Technical Reports Server (NTRS)
Hayder, M. Ehtesham; Keyes, David E.; Mehrotra, Piyush
1997-01-01
Two paradigms for distributed-memory parallel computation that free the application programmer from the details of message passing are compared for an archetypal structured scientific computation a nonlinear, structured-grid partial differential equation boundary value problem using the same algorithm on the same hardware. Both paradigms, parallel libraries represented by Argonne's PETSC, and parallel languages represented by the Portland Group's HPF, are found to be easy to use for this problem class, and both are reasonably effective in exploiting concurrency after a short learning curve. The level of involvement required by the application programmer under either paradigm includes specification of the data partitioning (corresponding to a geometrically simple decomposition of the domain of the PDE). Programming in SPAM style for the PETSC library requires writing the routines that discretize the PDE and its Jacobian, managing subdomain-to-processor mappings (affine global- to-local index mappings), and interfacing to library solver routines. Programming for HPF requires a complete sequential implementation of the same algorithm, introducing concurrency through subdomain blocking (an effort similar to the index mapping), and modest experimentation with rewriting loops to elucidate to the compiler the latent concurrency. Correctness and scalability are cross-validated on up to 32 nodes of an IBM SP2.
The Forest Method as a New Parallel Tree Method with the Sectional Voronoi Tessellation
NASA Astrophysics Data System (ADS)
Yahagi, Hideki; Mori, Masao; Yoshii, Yuzuru
1999-09-01
We have developed a new parallel tree method which will be called the forest method hereafter. This new method uses the sectional Voronoi tessellation (SVT) for the domain decomposition. The SVT decomposes a whole space into polyhedra and allows their flat borders to move by assigning different weights. The forest method determines these weights based on the load balancing among processors by means of the overload diffusion (OLD). Moreover, since all the borders are flat, before receiving the data from other processors, each processor can collect enough data to calculate the gravity force with precision. Both the SVT and the OLD are coded in a highly vectorizable manner to accommodate on vector parallel processors. The parallel code based on the forest method with the Message Passing Interface is run on various platforms so that a wide portability is guaranteed. Extensive calculations with 15 processors of Fujitsu VPP300/16R indicate that the code can calculate the gravity force exerted on 105 particles in each second for some ideal dark halo. This code is found to enable an N-body simulation with 107 or more particles for a wide dynamic range and is therefore a very powerful tool for the study of galaxy formation and large-scale structure in the universe.
Porting marine ecosystem model spin-up using transport matrices to GPUs
NASA Astrophysics Data System (ADS)
Siewertsen, E.; Piwonski, J.; Slawig, T.
2013-01-01
We have ported an implementation of the spin-up for marine ecosystem models based on transport matrices to graphics processing units (GPUs). The original implementation was designed for distributed-memory architectures and uses the Portable, Extensible Toolkit for Scientific Computation (PETSc) library that is based on the Message Passing Interface (MPI) standard. The spin-up computes a steady seasonal cycle of ecosystem tracers with climatological ocean circulation data as forcing. Since the transport is linear with respect to the tracers, the resulting operator is represented by matrices. Each iteration of the spin-up involves two matrix-vector multiplications and the evaluation of the used biogeochemical model. The original code was written in C and Fortran. On the GPU, we use the Compute Unified Device Architecture (CUDA) standard, a customized version of PETSc and a commercial CUDA Fortran compiler. We describe the extensions to PETSc and the modifications of the original C and Fortran codes that had to be done. Here we make use of freely available libraries for the GPU. We analyze the computational effort of the main parts of the spin-up for two exemplar ecosystem models and compare the overall computational time to those necessary on different CPUs. The results show that a consumer GPU can compete with a significant number of cluster CPUs without further code optimization.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Sreepathi, Sarat; D'Azevedo, Eduardo; Philip, Bobby
On large supercomputers, the job scheduling systems may assign a non-contiguous node allocation for user applications depending on available resources. With parallel applications using MPI (Message Passing Interface), the default process ordering does not take into account the actual physical node layout available to the application. This contributes to non-locality in terms of physical network topology and impacts communication performance of the application. In order to mitigate such performance penalties, this work describes techniques to identify suitable task mapping that takes the layout of the allocated nodes as well as the application's communication behavior into account. During the first phasemore » of this research, we instrumented and collected performance data to characterize communication behavior of critical US DOE (United States - Department of Energy) applications using an augmented version of the mpiP tool. Subsequently, we developed several reordering methods (spectral bisection, neighbor join tree etc.) to combine node layout and application communication data for optimized task placement. We developed a tool called mpiAproxy to facilitate detailed evaluation of the various reordering algorithms without requiring full application executions. This work presents a comprehensive performance evaluation (14,000 experiments) of the various task mapping techniques in lowering communication costs on Titan, the leadership class supercomputer at Oak Ridge National Laboratory.« less
Liwo, Adam; Ołdziej, Stanisław; Czaplewski, Cezary; Kleinerman, Dana S.; Blood, Philip; Scheraga, Harold A.
2010-01-01
We report the implementation of our united-residue UNRES force field for simulations of protein structure and dynamics with massively parallel architectures. In addition to coarse-grained parallelism already implemented in our previous work, in which each conformation was treated by a different task, we introduce a fine-grained level in which energy and gradient evaluation are split between several tasks. The Message Passing Interface (MPI) libraries have been utilized to construct the parallel code. The parallel performance of the code has been tested on a professional Beowulf cluster (Xeon Quad Core), a Cray XT3 supercomputer, and two IBM BlueGene/P supercomputers with canonical and replica-exchange molecular dynamics. With IBM BlueGene/P, about 50 % efficiency and 120-fold speed-up of the fine-grained part was achieved for a single trajectory of a 767-residue protein with use of 256 processors/trajectory. Because of averaging over the fast degrees of freedom, UNRES provides an effective 1000-fold speed-up compared to the experimental time scale and, therefore, enables us to effectively carry out millisecond-scale simulations of proteins with 500 and more amino-acid residues in days of wall-clock time. PMID:20305729
NASA Astrophysics Data System (ADS)
Wu, J.; Yang, Y.; Luo, Q.; Wu, J.
2012-12-01
This study presents a new hybrid multi-objective evolutionary algorithm, the niched Pareto tabu search combined with a genetic algorithm (NPTSGA), whereby the global search ability of niched Pareto tabu search (NPTS) is improved by the diversification of candidate solutions arose from the evolving nondominated sorting genetic algorithm II (NSGA-II) population. Also, the NPTSGA coupled with the commonly used groundwater flow and transport codes, MODFLOW and MT3DMS, is developed for multi-objective optimal design of groundwater remediation systems. The proposed methodology is then applied to a large-scale field groundwater remediation system for cleanup of large trichloroethylene (TCE) plume at the Massachusetts Military Reservation (MMR) in Cape Cod, Massachusetts. Furthermore, a master-slave (MS) parallelization scheme based on the Message Passing Interface (MPI) is incorporated into the NPTSGA to implement objective function evaluations in distributed processor environment, which can greatly improve the efficiency of the NPTSGA in finding Pareto-optimal solutions to the real-world application. This study shows that the MS parallel NPTSGA in comparison with the original NPTS and NSGA-II can balance the tradeoff between diversity and optimality of solutions during the search process and is an efficient and effective tool for optimizing the multi-objective design of groundwater remediation systems under complicated hydrogeologic conditions.
NASA Astrophysics Data System (ADS)
Destefano, Anthony; Heerikhuisen, Jacob
2015-04-01
Fully 3D particle simulations can be a computationally and memory expensive task, especially when high resolution grid cells are required. The problem becomes further complicated when parallelization is needed. In this work we focus on computational methods to solve these difficulties. Hilbert curves are used to map the 3D particle space to the 1D contiguous memory space. This method of organization allows for minimized cache misses on the GPU as well as a sorted structure that is equivalent to an octal tree data structure. This type of sorted structure is attractive for uses in adaptive mesh implementations due to the logarithm search time. Implementations using the Message Passing Interface (MPI) library and NVIDIA's parallel computing platform CUDA will be compared, as MPI is commonly used on server nodes with many CPU's. We will also compare static grid structures with those of adaptive mesh structures. The physical test bed will be simulating heavy interstellar atoms interacting with a background plasma, the heliosphere, simulated from fully consistent coupled MHD/kinetic particle code. It is known that charge exchange is an important factor in space plasmas, specifically it modifies the structure of the heliosphere itself. We would like to thank the Alabama Supercomputer Authority for the use of their computational resources.
NASA Technical Reports Server (NTRS)
Smith, Dan
2007-01-01
The Goddard Mission Services Evolution Center, or GMSEC, was started in 2001 to create a new standard approach for managing GSFC missions. Standardized approaches in the past involved selecting and then integrating the most appropriate set of functional tools. Assumptions were made that "one size fits all" and that tool changes would not be necessary for many years. GMSEC took a very different approach and has proven to be very successful. The core of the GMSEC architecture consists of a publish/subscribe message bus, standardized message formats, and an Applications Programming Interface (API). The API supports multiple operating systems, programming languages and messaging middleware products. We use a GMSEC-developed free middleware for low-cost development. A high capacity, robust middleware is used for operations and a messaging system with a very small memory footprint is used for on-board flight software. Software components can use the standard message formats or develop adapters to convert from their native formats to the GMSEC formats. We do not want vendors to modify their core products. Over 50 software components are now available for use with the GMSEC architecture. Most available commercial telemetry and command systems, including the GMV hifly Satellite Control System, have been adapted to run in the GMSEC labs.
Sandia Compact Sensor Node (SCSN) v. 1.0
DOE Office of Scientific and Technical Information (OSTI.GOV)
HARRINGTON, JOHN
2009-01-07
The SCSN communication protocol is implemented in software and incorporates elements of Frequency Division Multiple Access (FDMA), Time Division Multiple Access (TDMA), and Carrier Sense Multiple Access (CSMA) to reduce radio message collisions, latency, and power consumption. Alarm messages are expeditiously routed to a central node as a 'star' network with minimum overhead. Other messages can be routed along network links between any two nodes so that peer-to-peer communication is possible. Broadcast messages can be composed that flood the entire network or just specific portions with minimal radio traffic and latency. Two-way communication with sensor nodes, which sleep most ofmore » the time to conserve battery life, can occur at seven second intervals. SCSN software also incorporates special algorithms to minimize superfluous radio traffic that can result from excessive intrusion alarm messages. A built-in seismic detector is implemented with a geophone and software that distinguishes between pedestrian and vehicular targets. Other external sensors can be attached to a SCSN using supervised interface lines that are controlled by software. All software is written in the ANSI C language for ease of development, maintenance, and portability.« less
A Case for Application Oblivious Energy-Efficient MPI Runtime
DOE Office of Scientific and Technical Information (OSTI.GOV)
Venkatesh, Akshay; Vishnu, Abhinav; Hamidouche, Khaled
Power has become the major impediment in designing large scale high-end systems. Message Passing Interface (MPI) is the {\\em de facto} communication interface used as the back-end for designing applications, programming models and runtime for these systems. Slack --- the time spent by an MPI process in a single MPI call --- provides a potential for energy and power savings, if an appropriate power reduction technique such as core-idling/Dynamic Voltage and Frequency Scaling (DVFS) can be applied without perturbing application's execution time. Existing techniques that exploit slack for power savings assume that application behavior repeats across iterations/executions. However, an increasingmore » use of adaptive, data-dependent workloads combined with system factors (OS noise, congestion) makes this assumption invalid. This paper proposes and implements Energy Aware MPI (EAM) --- an application-oblivious energy-efficient MPI runtime. EAM uses a combination of communication models of common MPI primitives (point-to-point, collective, progress, blocking/non-blocking) and an online observation of slack for maximizing energy efficiency. Each power lever incurs time overhead, which must be amortized over slack to minimize degradation. When predicted communication time exceeds a lever overhead, the lever is used {\\em as soon as possible} --- to maximize energy efficiency. When mis-prediction occurs, the lever(s) are used automatically at specific intervals for amortization. We implement EAM using MVAPICH2 and evaluate it on ten applications using up to 4096 processes. Our performance evaluation on an InfiniBand cluster indicates that EAM can reduce energy consumption by 5--41\\% in comparison to the default approach, with negligible (less than 4\\% in all cases) performance loss.« less
EON: software for long time simulations of atomic scale systems
NASA Astrophysics Data System (ADS)
Chill, Samuel T.; Welborn, Matthew; Terrell, Rye; Zhang, Liang; Berthet, Jean-Claude; Pedersen, Andreas; Jónsson, Hannes; Henkelman, Graeme
2014-07-01
The EON software is designed for simulations of the state-to-state evolution of atomic scale systems over timescales greatly exceeding that of direct classical dynamics. States are defined as collections of atomic configurations from which a minimization of the potential energy gives the same inherent structure. The time evolution is assumed to be governed by rare events, where transitions between states are uncorrelated and infrequent compared with the timescale of atomic vibrations. Several methods for calculating the state-to-state evolution have been implemented in EON, including parallel replica dynamics, hyperdynamics and adaptive kinetic Monte Carlo. Global optimization methods, including simulated annealing, basin hopping and minima hopping are also implemented. The software has a client/server architecture where the computationally intensive evaluations of the interatomic interactions are calculated on the client-side and the state-to-state evolution is managed by the server. The client supports optimization for different computer architectures to maximize computational efficiency. The server is written in Python so that developers have access to the high-level functionality without delving into the computationally intensive components. Communication between the server and clients is abstracted so that calculations can be deployed on a single machine, clusters using a queuing system, large parallel computers using a message passing interface, or within a distributed computing environment. A generic interface to the evaluation of the interatomic interactions is defined so that empirical potentials, such as in LAMMPS, and density functional theory as implemented in VASP and GPAW can be used interchangeably. Examples are given to demonstrate the range of systems that can be modeled, including surface diffusion and island ripening of adsorbed atoms on metal surfaces, molecular diffusion on the surface of ice and global structural optimization of nanoparticles.
PhreeqcRM: A reaction module for transport simulators based on the geochemical model PHREEQC
Parkhurst, David L.; Wissmeier, Laurin
2015-01-01
PhreeqcRM is a geochemical reaction module designed specifically to perform equilibrium and kinetic reaction calculations for reactive transport simulators that use an operator-splitting approach. The basic function of the reaction module is to take component concentrations from the model cells of the transport simulator, run geochemical reactions, and return updated component concentrations to the transport simulator. If multicomponent diffusion is modeled (e.g., Nernst–Planck equation), then aqueous species concentrations can be used instead of component concentrations. The reaction capabilities are a complete implementation of the reaction capabilities of PHREEQC. In each cell, the reaction module maintains the composition of all of the reactants, which may include minerals, exchangers, surface complexers, gas phases, solid solutions, and user-defined kinetic reactants.PhreeqcRM assigns initial and boundary conditions for model cells based on standard PHREEQC input definitions (files or strings) of chemical compositions of solutions and reactants. Additional PhreeqcRM capabilities include methods to eliminate reaction calculations for inactive parts of a model domain, transfer concentrations and other model properties, and retrieve selected results. The module demonstrates good scalability for parallel processing by using multiprocessing with MPI (message passing interface) on distributed memory systems, and limited scalability using multithreading with OpenMP on shared memory systems. PhreeqcRM is written in C++, but interfaces allow methods to be called from C or Fortran. By using the PhreeqcRM reaction module, an existing multicomponent transport simulator can be extended to simulate a wide range of geochemical reactions. Results of the implementation of PhreeqcRM as the reaction engine for transport simulators PHAST and FEFLOW are shown by using an analytical solution and the reactive transport benchmark of MoMaS.
DOT National Transportation Integrated Search
2013-05-01
Cognitively oriented in-vehicle activities (cell-phone calls, speech interfaces, audio translations of text : messages, etc.) increasingly place non-visual demands on a drivers attention. While a drivers eyes may : remain oriented towards the r...
DOE Office of Scientific and Technical Information (OSTI.GOV)
Blocksome, Michael A.; Mamidala, Amith R.
2013-09-03
Fencing direct memory access (`DMA`) data transfers in a parallel active messaging interface (`PAMI`) of a parallel computer, the PAMI including data communications endpoints, each endpoint including specifications of a client, a context, and a task, the endpoints coupled for data communications through the PAMI and through DMA controllers operatively coupled to segments of shared random access memory through which the DMA controllers deliver data communications deterministically, including initiating execution through the PAMI of an ordered sequence of active DMA instructions for DMA data transfers between two endpoints, effecting deterministic DMA data transfers through a DMA controller and a segmentmore » of shared memory; and executing through the PAMI, with no FENCE accounting for DMA data transfers, an active FENCE instruction, the FENCE instruction completing execution only after completion of all DMA instructions initiated prior to execution of the FENCE instruction for DMA data transfers between the two endpoints.« less