Science.gov

Sample records for shared memory multiprocessors

  1. Shared versus distributed memory multiprocessors

    NASA Technical Reports Server (NTRS)

    Jordan, Harry F.

    1991-01-01

    The question of whether multiprocessors should have shared or distributed memory has attracted a great deal of attention. Some researchers argue strongly for building distributed memory machines, while others argue just as strongly for programming shared memory multiprocessors. A great deal of research is underway on both types of parallel systems. Special emphasis is placed on systems with a very large number of processors for computation-intensive tasks, and research and implementation trends are considered. It appears that the two types of systems will likely converge to a common form for large-scale multiprocessors.

  2. Dynamic Program Phase Detection in Distributed Shared-Memory Multiprocessors

    SciTech Connect

    Ipek, E; Martinez, J F; de Supinski, B R; McKee, S A; Schulz, M

    2006-03-06

    We present a novel hardware mechanism for dynamic program phase detection in distributed shared-memory (DSM) multiprocessors. We show that successful hardware mechanisms for phase detection in uniprocessors do not necessarily work well in DSM systems, since they lack the ability to incorporate the parallel application's global execution information and memory access behavior based on data distribution. We then propose a hardware extension to a well-known uniprocessor mechanism that significantly improves phase detection in the context of DSM multiprocessors. The resulting mechanism is modest in size and complexity, and is transparent to the parallel application.
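
    A sketch makes the detection test concrete. Successful uniprocessor mechanisms of the kind the authors extend typically compare per-interval execution signatures (e.g., basic-block or working-set vectors) against a threshold; the C fragment below shows only that generic comparison and is our illustration, not the paper's hardware design.

        /* Generic interval-based phase detection: signal a phase change when
         * the Manhattan distance between successive normalized execution
         * signatures exceeds a threshold.  Illustrative only. */
        #include <math.h>

        int phase_changed(const float *sig_prev, const float *sig_cur,
                          int n, float threshold)
        {
            float d = 0.0f;
            for (int i = 0; i < n; i++)
                d += fabsf(sig_cur[i] - sig_prev[i]);
            return d > threshold;
        }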

  3. Efficient ICCG on a shared memory multiprocessor

    NASA Technical Reports Server (NTRS)

    Hammond, Steven W.; Schreiber, Robert

    1989-01-01

    Different approaches are discussed for exploiting parallelism in the ICCG (Incomplete Cholesky Conjugate Gradient) method for solving large sparse symmetric positive definite systems of equations on a shared memory parallel computer. Techniques for efficiently solving triangular systems and computing sparse matrix-vector products are explored. Three methods for scheduling the tasks in solving triangular systems are implemented on the Sequent Balance 21000. Sample problems that are representative of a large class of problems solved using iterative methods are used. We show that a static analysis to determine data dependences in the triangular solve can greatly improve its parallel efficiency. We also show that ignoring symmetry and storing the whole matrix can reduce solution time substantially.
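
    The "static analysis to determine data dependences in the triangular solve" is commonly realized as level scheduling: rows are grouped into levels such that rows within a level have no mutual dependences and may be solved concurrently. The C sketch below illustrates that idea under assumed CSR storage with an implicit unit diagonal; it uses an OpenMP directive for brevity and is not the authors' Sequent Balance code.

        /* Assign each row of a unit lower-triangular CSR matrix to a level. */
        void compute_levels(int n, const int *rowptr, const int *col,
                            int *level, int *maxlevel)
        {
            *maxlevel = 0;
            for (int i = 0; i < n; i++) {
                int lv = 0;
                for (int k = rowptr[i]; k < rowptr[i + 1]; k++)
                    if (level[col[k]] + 1 > lv)   /* row i reads x[col[k]] */
                        lv = level[col[k]] + 1;
                level[i] = lv;
                if (lv > *maxlevel) *maxlevel = lv;
            }
        }

        /* Solve Lx = b level by level; rows of one level run in parallel. */
        void trisolve(int n, const int *rowptr, const int *col,
                      const double *val, const double *b, double *x,
                      const int *level, int maxlevel)
        {
            for (int lv = 0; lv <= maxlevel; lv++) {
                #pragma omp parallel for
                for (int i = 0; i < n; i++) {
                    if (level[i] != lv) continue;  /* a real code buckets rows */
                    double s = b[i];
                    for (int k = rowptr[i]; k < rowptr[i + 1]; k++)
                        s -= val[k] * x[col[k]];
                    x[i] = s;                      /* unit diagonal assumed */
                }
            }
        }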

  4. Performance and scalability aspects of directory-based cache coherence in shared-memory multiprocessors

    SciTech Connect

    Picano, S.; Meyer, D.G.; Brooks, E.D. III; Hoag, J.E.

    1993-05-01

    We present a study that accentuates the performance and scalability aspects of directory-based cache coherence in multiprocessor systems. On a multiprocessor with a software-based coherence scheme, efficient implementations rely heavily on the programmer's ability to explicitly manage the memory system, a task typically handled by hardware support on other bus-based, shared memory multiprocessors. We describe a scalable, shared memory, cache coherent multiprocessor and present simulation results obtained on three parallel programs. This multiprocessor configuration exhibits high performance at no additional parallel programming cost.
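
    For reference, a full-map directory entry and its action on a write miss look roughly like the following C sketch. This is the generic textbook scheme, not the specific protocol simulated in the paper, and the 64-processor limit is an assumption of the illustration.

        #include <stdint.h>

        enum dir_state { DIR_UNCACHED, DIR_SHARED, DIR_EXCLUSIVE };

        struct dir_entry {
            uint64_t presence;     /* bit p set => processor p holds a copy */
            enum dir_state state;
            int owner;             /* valid when state == DIR_EXCLUSIVE */
        };

        /* On a write miss by processor p, invalidate every other sharer
         * with point-to-point messages, then record p as exclusive owner. */
        void dir_write_miss(struct dir_entry *e, int p,
                            void (*send_inval)(int proc))
        {
            uint64_t sharers = e->presence & ~(1ULL << p);
            for (int q = 0; q < 64; q++)
                if (sharers & (1ULL << q))
                    send_inval(q);
            e->presence = 1ULL << p;
            e->state = DIR_EXCLUSIVE;
            e->owner = p;
        }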

  5. A robot arm simulation with a shared memory multiprocessor machine

    NASA Technical Reports Server (NTRS)

    Kim, Sung-Soo; Chuang, Li-Ping

    1989-01-01

    A parallel processing scheme for a single chain robot arm is presented for high speed computation on a shared memory multiprocessor. A recursive formulation that is derived from a virtual work form of the d'Alembert equations of motion is utilized for robot arm dynamics. A joint drive system that consists of a motor rotor and gears is included in the arm dynamics model, in order to take into account gyroscopic effects due to the spinning of the rotor. The fine grain parallelism of mechanical and control subsystem models is exploited, based on independent computation associated with bodies, joint drive systems, and controllers. Efficiency and effectiveness of the parallel scheme are demonstrated through simulations of a telerobotic manipulator arm. Two different mechanical subsystem models, i.e., with and without gyroscopic effects, are compared, to show the trade-off between efficiency and accuracy.

  6. Dynamic programming on a shared-memory multiprocessor

    NASA Technical Reports Server (NTRS)

    Edmonds, Phil; Chu, Eleanor; George, Alan

    1993-01-01

    Three new algorithms for solving dynamic programming problems on a shared-memory parallel computer are described. All three algorithms attempt to balance work load, while keeping synchronization cost low. In particular, for a multiprocessor having $p$ processors, an analysis of the best algorithm shows that the arithmetic cost is $O(n^3/6p)$ and that the synchronization cost is $O(|\log_C n|)$ if $p \ll n$, where $C = (2p-1)/(2p+1)$ and $n$ is the size of the problem. The low synchronization cost is important for machines where synchronization is expensive. Analysis and experiments show that the best algorithm is effective in balancing the work load and producing high efficiency.
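
    Restated in display form, with one elementary consequence of the stated base $C$ (this reading is ours, inferred from the abstract):

        \[
          T_{\mathrm{arith}} = O\!\left(\tfrac{n^{3}}{6p}\right), \qquad
          T_{\mathrm{sync}}  = O\!\left(\lvert\log_{C} n\rvert\right), \qquad
          C = \frac{2p-1}{2p+1}, \qquad p \ll n .
        \]
        % Since 0 < C < 1, log_C n is negative for n > 1, hence the absolute
        % value.  Because ln(1/C) = ln((2p+1)/(2p-1)) is approximately 1/p for
        % large p, |log_C n| is roughly p ln n: synchronization cost grows
        % about linearly with p, which is why the p << n regime matters.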

  7. MPF: A portable message passing facility for shared memory multiprocessors

    NASA Technical Reports Server (NTRS)

    Malony, Allen D.; Reed, Daniel A.; Mcguire, Patrick J.

    1987-01-01

    The design, implementation, and performance evaluation of a message passing facility (MPF) for shared memory multiprocessors are presented. The MPF is based on a message passing model conceptually similar to conversations. Participants (parallel processors) can enter or leave a conversation at any time. The message passing primitives for this model are implemented as a portable library of C function calls. The MPF is currently operational on a Sequent Balance 21000, and several parallel applications were developed and tested. Several simple benchmark programs are presented to establish interprocess communication performance for common patterns of interprocess communication. Finally, performance figures are presented for two parallel applications, linear systems solution, and iterative solution of partial differential equations.
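
    The abstract does not list the MPF primitives, so the C sketch below uses hypothetical names (conversation_t, conv_send, conv_recv) to show how a conversation-style channel can be layered on shared memory with a mutex and condition variable; it illustrates the model, not the library's actual API.

        #include <pthread.h>
        #include <string.h>

        #define CONV_CAP  64          /* bounded queue of fixed-size messages */
        #define MSG_BYTES 256

        typedef struct {
            pthread_mutex_t lock;
            pthread_cond_t  nonempty;
            char buf[CONV_CAP][MSG_BYTES];
            int  head, tail, count;
        } conversation_t;
        /* e.g. conversation_t conv = { PTHREAD_MUTEX_INITIALIZER,
         *                              PTHREAD_COND_INITIALIZER };          */

        /* Any participant may send into the conversation at any time. */
        void conv_send(conversation_t *c, const void *msg, size_t len)
        {
            pthread_mutex_lock(&c->lock);
            /* a full queue would need a second condition variable; elided */
            memcpy(c->buf[c->tail], msg, len);
            c->tail = (c->tail + 1) % CONV_CAP;
            c->count++;
            pthread_cond_signal(&c->nonempty);
            pthread_mutex_unlock(&c->lock);
        }

        /* Any participant may likewise receive, blocking until data arrives. */
        void conv_recv(conversation_t *c, void *msg, size_t len)
        {
            pthread_mutex_lock(&c->lock);
            while (c->count == 0)
                pthread_cond_wait(&c->nonempty, &c->lock);
            memcpy(msg, c->buf[c->head], len);
            c->head = (c->head + 1) % CONV_CAP;
            c->count--;
            pthread_mutex_unlock(&c->lock);
        }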

  8. Parallelization of NAS Benchmarks for Shared Memory Multiprocessors

    NASA Technical Reports Server (NTRS)

    Waheed, Abdul; Yan, Jerry C.; Saini, Subhash (Technical Monitor)

    1998-01-01

    This paper presents our experiences in parallelizing the sequential implementation of the NAS benchmarks using compiler directives on the SGI Origin2000 distributed shared memory (DSM) system. Porting existing applications to new high performance parallel and distributed computing platforms is a challenging task. Ideally, a user develops a sequential version of the application, leaving the task of porting to new generations of high performance computing systems to parallelization tools and compilers. Due to the simplicity of programming shared-memory multiprocessors, compiler developers have provided various facilities to allow users to exploit parallelism. Native compilers on the SGI Origin2000 support multiprocessing directives that allow users to exploit loop-level parallelism in their programs. Additionally, supporting tools can accomplish this process automatically and present the results of parallelization to the users. We experimented with these compiler directives and supporting tools by parallelizing the sequential implementation of the NAS benchmarks. Results reported in this paper indicate that with minimal effort, the performance gain is comparable with that of the hand-parallelized, carefully optimized, message-passing implementations of the same benchmarks.
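
    The directives in the paper are SGI's native Fortran multiprocessing directives; the same loop-level model is conveyed by this minimal C/OpenMP analogue (our illustration, not the paper's code):

        /* Loop-level parallelism via a compiler directive: each iteration is
         * independent, so the compiler distributes iterations over threads. */
        void daxpy(int n, double a, const double *x, double *y)
        {
            #pragma omp parallel for
            for (int i = 0; i < n; i++)
                y[i] += a * x[i];
        }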

  9. Simulation Analysis of Data Sharing in Shared Memory Multiprocessors

    DTIC Science & Technology

    2016-06-14

    Only fragments of this report's text are indexed. They indicate that data sharing is relatively immune to the benefits of increasing cache size, and that Table 4-11 compares the coherency overhead predicted by a write-run model against realistic simulation for the write-broadcast protocols (Berkeley Ownership, Firefly), reporting the percentage difference between the two.

  10. Evolution of an Operating System for Large-Scale Shared-Memory Multiprocessors

    DTIC Science & Technology

    1989-03-01

  11. Data Type Coherency in Heterogeneous Shared Memory Multiprocessors

    DTIC Science & Technology

    1990-12-01

  12. Parallel-vector algorithms for particle simulations on shared-memory multiprocessors

    SciTech Connect

    Nishiura, Daisuke; Sakaguchi, Hide

    2011-03-01

    Over the last few decades, the computational demands of massive particle-based simulations for both scientific and industrial purposes have been continuously increasing. Hence, considerable efforts are being made to develop parallel computing techniques on various platforms. In such simulations, particles move freely within a given space, so on a distributed-memory system load balancing, i.e., assigning an equal number of particles to each processor, is not guaranteed. Shared-memory systems, in contrast, achieve better load balancing for particle models, but suffer from the intrinsic drawback of memory access competition, particularly during (1) pairing of contact candidates from among neighboring particles and (2) force summation for each particle. Here, novel algorithms are proposed to overcome these two problems. For the first problem, the key is a pre-conditioning process during which particle labels are sorted by the label of the cell to which each particle belongs. Then, a list of contact candidates is constructed by pairing the sorted particle labels. For the latter problem, a table comprising the list indexes of the contact candidate pairs is created and used to sum the contact forces acting on each particle for all contacts according to Newton's third law. With just these methods, memory access competition is avoided without additional redundant procedures. The parallel efficiency and compatibility of these two algorithms were evaluated in discrete element method (DEM) simulations on four types of shared-memory parallel computers: a multicore multiprocessor computer, a scalar supercomputer, a vector supercomputer, and a graphics processing unit. The computational efficiency of a DEM code was found to be drastically improved with our algorithms on all but the scalar supercomputer. Thus, the developed parallel algorithms are useful on shared-memory parallel computers with sufficient memory bandwidth.
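
    The pre-conditioning step can be pictured as a counting sort of particle indices by cell, followed by pairing within each cell's contiguous slice. The C sketch below is our simplified illustration (same-cell pairs only; real DEM codes also pair across neighboring cells):

        #include <stdlib.h>

        typedef struct { int a, b; } pair_t;

        /* After the sort, sorted[start[c] .. start[c+1]-1] holds the indices
         * of the particles in cell c. */
        void sort_by_cell(int np, int nc, const int *cell_of,
                          int *start, int *sorted)
        {
            for (int c = 0; c <= nc; c++) start[c] = 0;
            for (int i = 0; i < np; i++) start[cell_of[i] + 1]++;
            for (int c = 0; c < nc; c++) start[c + 1] += start[c];
            int *fill = calloc(nc, sizeof *fill);
            for (int i = 0; i < np; i++)
                sorted[start[cell_of[i]] + fill[cell_of[i]]++] = i;
            free(fill);
        }

        /* Build the contact-candidate list by pairing sorted labels. */
        int make_pairs(int nc, const int *start, const int *sorted, pair_t *out)
        {
            int npairs = 0;
            for (int c = 0; c < nc; c++)
                for (int i = start[c]; i < start[c + 1]; i++)
                    for (int j = i + 1; j < start[c + 1]; j++)
                        out[npairs++] = (pair_t){ sorted[i], sorted[j] };
            return npairs;
        }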

  13. Avoiding and tolerating latency in large-scale next-generation shared-memory multiprocessors

    NASA Technical Reports Server (NTRS)

    Probst, David K.

    1993-01-01

    A scalable solution to the memory-latency problem is necessary to keep the large latencies of synchronization and memory operations inherent in large-scale shared-memory multiprocessors from limiting performance. We distinguish latency avoidance and latency tolerance. Latency is avoided when data is brought to nearby locales for future reference. Latency is tolerated when references are overlapped with other computation. Latency-avoiding locales include processor registers, data caches used temporally, and nearby memory modules. Tolerating communication latency requires parallelism, allowing the overlap of communication and computation. Latency-tolerating techniques include vector pipelining, data caches used spatially, prefetching in various forms, and multithreading in various forms. Relaxing the consistency model permits increased use of avoidance and tolerance techniques. Each model is a mapping from the program text to sets of partial orders on program operations; it is a convention about which temporal precedences among program operations are necessary. Information about temporal locality and parallelism constrains the use of avoidance and tolerance techniques. Suitable architectural primitives and compiler technology are required to exploit the increased freedom to reorder and overlap operations in relaxed models.

  14. Performance Modeling and Measurement of Parallelized Code for Distributed Shared Memory Multiprocessors

    NASA Technical Reports Server (NTRS)

    Waheed, Abdul; Yan, Jerry

    1998-01-01

    This paper presents a model to evaluate the performance and overhead of parallelizing sequential code using compiler directives for multiprocessing on distributed shared memory (DSM) systems. With increasing popularity of shared address space architectures, it is essential to understand their performance impact on programs that benefit from shared memory multiprocessing. We present a simple model to characterize the performance of programs that are parallelized using compiler directives for shared memory multiprocessing. We parallelized the sequential implementation of NAS benchmarks using native Fortran77 compiler directives for an Origin2000, which is a DSM system based on a cache-coherent Non Uniform Memory Access (ccNUMA) architecture. We report measurement based performance of these parallelized benchmarks from four perspectives: efficacy of parallelization process; scalability; parallelization overhead; and comparison with hand-parallelized and -optimized version of the same benchmarks. Our results indicate that sequential programs can conveniently be parallelized for DSM systems using compiler directives but realizing performance gains as predicted by the performance model depends primarily on minimizing architecture-specific data locality overhead.

  15. Reader set encoding for directory of shared cache memory in multiprocessor system

    DOEpatents

    Ahn, Daniel; Ceze, Luis H.; Gara, Alan; Ohmacht, Martin; Zhuang, Xiaotong

    2014-06-10

    In a parallel processing system with speculative execution, conflict checking occurs in a directory lookup of a cache memory that is shared by all processors. In each case, the same physical memory address will map to the same set of that cache, no matter which processor originated that access. The directory includes a dynamic reader set encoding, indicating what speculative threads have read a particular line. This reader set encoding is used in conflict checking. A bitset encoding is used to specify particular threads that have read the line.
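
    A bitset reader-set encoding is compact because conflict checking reduces to mask operations, as in this C sketch (our illustration of the idea, assuming at most 64 speculative threads; the patent's actual encoding is dynamic):

        #include <stdbool.h>
        #include <stdint.h>

        typedef struct {
            uint64_t readers;  /* bit t set => speculative thread t read the line */
        } reader_set_t;

        void record_read(reader_set_t *rs, int thread)
        {
            rs->readers |= (uint64_t)1 << thread;
        }

        /* A write by `writer` conflicts if any other thread has read the line. */
        bool write_conflicts(const reader_set_t *rs, int writer)
        {
            return (rs->readers & ~((uint64_t)1 << writer)) != 0;
        }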

  16. Shared performance monitor in a multiprocessor system

    DOEpatents

    Chiu, George; Gara, Alan G; Salapura, Valentina

    2014-12-02

    A performance monitoring unit (PMU) and method for monitoring performance of events occurring in a multiprocessor system. The multiprocessor system comprises a plurality of processor devices, each processor device for generating signals representing occurrences of events in the processor device, and a single shared counter resource for performance monitoring. The performance monitor unit is shared by all processor cores in the multiprocessor system. The PMU is further programmed to monitor event signals issued from non-processor devices.

  17. Layer-by-layer ordering in parallel finite element composition on shared-memory multiprocessors

    NASA Astrophysics Data System (ADS)

    Novikov, A. K.; Piminova, N. K.; Kopysov, S. P.; Sagdeeva, YA

    2016-11-01

    In this paper, we present new partitioning algorithms for unstructured meshes that prevent conflicts during parallel assembling of FEM matrices and vectors in shared memory. These algorithms use a ratio which we introduce to determine if any two mesh cells are adjacent. This adjacency ratio defines mesh layers, which are combined into domains and assigned to different parallel processes/threads. The proposed partitioning algorithms are compared with the existing algorithms on quasi-structured and unstructured meshes by the number of potential conflicts and by the load imbalance.

  18. Sparse Gaussian elimination with controlled fill-in on a shared memory multiprocessor

    NASA Technical Reports Server (NTRS)

    Alaghband, Gita; Jordan, Harry F.

    1989-01-01

    It is shown that in sparse matrices arising from electronic circuits, it is possible to do computations on many diagonal elements simultaneously. A technique for obtaining an ordered compatible set directly from the ordered incompatible table is given. The ordering is based on the Markowitz number of the pivot candidates. This technique generates a set of compatible pivots with the property of generating few fills. A novel heuristic algorithm is presented that combines the idea of an order-compatible set with a limited binary tree search to generate several sets of compatible pivots in linear time. An elimination set for reducing the matrix is generated and selected on the basis of a minimum Markowitz sum number. The parallel pivoting technique presented is a stepwise algorithm and can be applied to any submatrix of the original matrix. Thus, it is not a preordering of the sparse matrix and is applied dynamically as the decomposition proceeds. Parameters are suggested to obtain a balance between parallelism and fill-ins. Results of applying the proposed algorithms on several large application matrices using the HEP multiprocessor (Kowalik, 1985) are presented and analyzed.
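
    As a point of reference, the Markowitz number of a diagonal pivot candidate i is (r_i - 1)(c_i - 1), where r_i and c_i count the nonzeros in its row and column; a compatible set is built by accepting candidates, in order of ascending Markowitz number, that do not structurally overlap pivots already accepted. The C sketch below shows that greedy core only (our illustration; the paper's limited binary tree search is more elaborate):

        #include <stdbool.h>

        long markowitz(int r, int c) { return (long)(r - 1) * (c - 1); }

        /* cand: candidates sorted by ascending Markowitz number.
         * incompatible(a, b): reports structural overlap of pivots a and b. */
        int select_compatible(int ncand, const int *cand,
                              bool (*incompatible)(int, int), int *set)
        {
            int k = 0;
            for (int i = 0; i < ncand; i++) {
                bool ok = true;
                for (int j = 0; j < k && ok; j++)
                    if (incompatible(cand[i], set[j]))
                        ok = false;
                if (ok)
                    set[k++] = cand[i];  /* set[0..k-1] eliminates in parallel */
            }
            return k;
        }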

  19. Shared performance monitor in a multiprocessor system

    DOEpatents

    Chiu, George; Gara, Alan G.; Salapura, Valentina

    2012-07-24

    A performance monitoring unit (PMU) and method for monitoring performance of events occurring in a multiprocessor system. The multiprocessor system comprises a plurality of processor devices, each processor device for generating signals representing occurrences of events in the processor device, and a single shared counter resource for performance monitoring. The performance monitor unit is shared by all processor cores in the multiprocessor system. The PMU comprises: a plurality of performance counters, each for counting signals representing occurrences of events from one or more of the plurality of processor units in the multiprocessor system; and a plurality of input devices for receiving the event signals from one or more processor devices of the plurality of processor units, the plurality of input devices programmable to select event signals for receipt by one or more of the plurality of performance counters for counting, wherein the PMU is shared between multiple processing units, or within a group of processors in the multiprocessing system. The PMU is further programmed to monitor event signals issued from non-processor devices.

  20. A general model for memory interference in a multiprocessor system with memory hierarchy

    NASA Technical Reports Server (NTRS)

    Taha, Badie A.; Standley, Hilda M.

    1989-01-01

    The problem of memory interference in a multiprocessor system with a hierarchy of shared buses and memories is addressed. The behavior of the processors is represented by a sequence of memory requests with each followed by a determined amount of processing time. A statistical queuing network model for determining the extent of memory interference in multiprocessor systems with clusters of memory hierarchies is presented. The performance of the system is measured by the expected number of busy memory clusters. The results of the analytic model are compared with simulation results, and the correlation between them is found to be very high.
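
    In the simplest synchronous, uniform-access version of such a model (a textbook approximation, not the authors' hierarchical queuing network), the expected number of busy modules has a closed form: with p processors each directing one request to one of m independent memory modules chosen uniformly at random,

        \[
          E[\text{busy modules}] \;=\; m \left( 1 - \left( 1 - \frac{1}{m} \right)^{p} \right),
        \]

    since a given module stays idle only if all p requests miss it.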

  21. Preliminary basic performance analysis of the Cedar multiprocessor memory system

    NASA Technical Reports Server (NTRS)

    Gallivan, K.; Jalby, W.; Turner, S.; Veidenbaum, A.; Wijshoff, H.

    1991-01-01

    Some preliminary basic results on the performance of the Cedar multiprocessor memory system are presented. Empirical results are presented and used to calibrate a memory system simulator which is then used to discuss the scalability of the system.

  22. Relaxing consistency in recoverable distributed shared memory

    NASA Technical Reports Server (NTRS)

    Janssens, Bob; Fuchs, W. K.

    1993-01-01

    Relaxed memory consistency models have recently been proposed to tolerate memory access latency in both hardware and software distributed shared memory systems. In recoverable shared memory multiprocessors, relaxing consistency has the added benefit of reducing the number of checkpoints needed to avoid rollback propagation. In this paper, we introduce new checkpointing algorithms that take advantage of relaxed consistency to reduce the performance overhead of checkpointing. We also introduce a scheme based on lazy relaxed consistency that reduces both checkpointing overhead and the overhead of avoiding error propagation in systems with error latency. Multiprocessor address traces are used to evaluate the relaxed consistency approach to checkpointing with distributed shared memory.

  23. Optimal eigenvalue computation on distributed-memory MIMD multiprocessors

    SciTech Connect

    Crivelli, S.; Jessup, E. R.

    1992-10-01

    Simon proves that bisection is not the optimal method for computing an eigenvalue on a single vector processor. In this paper, we show that his analysis does not extend in a straightforward way to the computation of an eigenvalue on a distributed-memory MIMD multiprocessor. In particular, we show how the optimal number of sections (and processors) to use for multisection depends on variables such as the matrix size and certain parameters inherent to the machine. We also show that parallel multisection outperforms the variant of parallel bisection proposed by Swarztrauber for this problem on a distributed-memory MIMD multiprocessor. We present the results of experiments on the 64-processor Intel iPSC/2 hypercube and the 512-processor Intel Touchstone Delta mesh multiprocessor.
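
    The underlying trade-off can be stated in one line (a standard argument that the paper's machine-dependent analysis refines): to shrink an eigenvalue-containing interval of width L down to tolerance eps,

        \[
          \text{bisection needs } \left\lceil \log_{2}(L/\varepsilon) \right\rceil \text{ steps}, \qquad
          \text{multisection with } k \text{ interior points needs } \left\lceil \log_{k+1}(L/\varepsilon) \right\rceil .
        \]
        % Using k processors therefore saves only a factor log2(k+1) in steps
        % while spending k Sturm-sequence evaluations per step, so the optimal
        % section count depends on matrix size and per-step machine costs.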

  24. Scalable Triadic Analysis of Large-Scale Graphs: Multi-Core vs. Multi-Processor vs. Multi-Threaded Shared Memory Architectures

    SciTech Connect

    Chin, George; Marquez, Andres; Choudhury, Sutanay; Feo, John T.

    2012-09-01

    Triadic analysis encompasses a useful set of graph mining methods that is centered on the concept of a triad, which is a subgraph of three nodes and the configuration of directed edges across the nodes. Such methods are often applied in the social sciences as well as many other diverse fields. Triadic methods commonly operate on a triad census that counts the number of triads of every possible edge configuration in a graph. Like other graph algorithms, triadic census algorithms do not scale well when graphs reach tens of millions to billions of nodes. To enable the triadic analysis of large-scale graphs, we developed and optimized a triad census algorithm to efficiently execute on shared memory architectures. We will retrace the development and evolution of a parallel triad census algorithm. Over the course of several versions, we continually adapted the code’s data structures and program logic to expose more opportunities to exploit parallelism on shared memory that would translate into improved computational performance. We will recall the critical steps and modifications that occurred during code development and optimization. Furthermore, we will compare the performances of triad census algorithm versions on three specific systems: Cray XMT, HP Superdome, and AMD multi-core NUMA machine. These three systems have shared memory architectures but with markedly different hardware capabilities to manage parallelism.
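
    To make the computational pattern concrete, the sketch below counts triangles (the closed-triad case) over a CSR graph with sorted neighbor lists, using an OpenMP reduction so threads never contend on a shared counter. It is our simplified stand-in: a full census classifies all 16 directed triad types, and the paper targets machines such as the Cray XMT rather than OpenMP.

        /* Binary search for v in adj[lo..hi). */
        static int has_edge(const int *adj, int lo, int hi, int v)
        {
            while (lo < hi) {
                int mid = lo + (hi - lo) / 2;
                if (adj[mid] < v)      lo = mid + 1;
                else if (adj[mid] > v) hi = mid;
                else return 1;
            }
            return 0;
        }

        long long count_triangles(int n, const int *rowptr, const int *adj)
        {
            long long total = 0;
            #pragma omp parallel for reduction(+:total) schedule(dynamic, 64)
            for (int u = 0; u < n; u++)
                for (int k = rowptr[u]; k < rowptr[u + 1]; k++) {
                    int v = adj[k];
                    if (v <= u) continue;          /* enumerate u < v < w once */
                    for (int m = k + 1; m < rowptr[u + 1]; m++)
                        if (has_edge(adj, rowptr[v], rowptr[v + 1], adj[m]))
                            total++;
                }
            return total;
        }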

  25. Using Pin as a Memory Reference Generator for Multiprocessor Simulation

    SciTech Connect

    McCurdy, C

    2005-10-22

    In this paper we describe how we have used Pin to generate a multithreaded reference stream for simulation of a multiprocessor on a uniprocessor. We have taken special care to model as accurately as possible the effects of cache coherence protocol state, and lock and barrier synchronization on the performance of multithreaded applications running on multiprocessor hardware. We first describe a simplified version of the algorithm, which uses semaphores to synchronize instrumented application threads and the simulator on every memory reference. We then describe modifications to that algorithm to model the microarchitectural features of the Itanium2 that affect the timing of memory reference issue. An experimental evaluation determines that while cycle-accurate multithreaded simulation is possible using our approach, the use of semaphores has a negative impact on the performance of the simulator.
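
    The per-reference handshake can be pictured with POSIX semaphores, as below. This generic C sketch is ours: actual Pin tools are written against Pin's C++ instrumentation interface, and the structure here only illustrates why synchronizing on every memory reference is expensive.

        #include <semaphore.h>
        #include <stdint.h>

        typedef struct {
            sem_t ref_ready;  /* posted by the app thread, awaited by simulator */
            sem_t go_ahead;   /* posted by the simulator when access may retire */
            uintptr_t addr;
            int is_write;
        } channel_t;

        /* Called by instrumentation before every load/store of one thread. */
        void on_mem_ref(channel_t *ch, uintptr_t addr, int is_write)
        {
            ch->addr = addr;
            ch->is_write = is_write;
            sem_post(&ch->ref_ready);  /* hand the reference to the simulator */
            sem_wait(&ch->go_ahead);   /* block until it has been simulated */
        }

        /* Simulator side: consume one reference from a thread's channel. */
        void simulate_one(channel_t *ch)
        {
            sem_wait(&ch->ref_ready);
            /* ... update simulated cache/coherence state for ch->addr ... */
            sem_post(&ch->go_ahead);
        }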

  26. Relaxing consistency in recoverable distributed shared memory

    NASA Technical Reports Server (NTRS)

    Janssens, Bob; Fuchs, W. K.

    1993-01-01

    Relaxed memory consistency models tolerate increased memory access latency in both hardware and software distributed shared memory systems. In recoverable systems, relaxing consistency has the added benefit of reducing the number of checkpoints needed to avoid rollback propagation. In this paper, we introduce new checkpointing algorithms that take advantage of relaxed consistency to reduce the performance overhead of checkpointing. We also introduce a scheme based on lazy relaxed consistency that reduces both checkpointing overhead and the overhead of avoiding error propagation in systems with error latency. We use multiprocessor address traces to evaluate the relaxed consistency approach to checkpointing with distributed shared memory.

  27. Low Latency Messages on Distributed Memory Multiprocessors

    DOE PAGES

    Rosing, Matt; Saltz, Joel

    1995-01-01

    This article describes many of the issues in developing an efficient interface for communication on distributed memory machines. Although the hardware component of message latency is less than 1 μs on many distributed memory machines, the software latency associated with sending and receiving typed messages is on the order of 50 μs. The reason for this imbalance is that the software interface does not match the hardware. By changing the interface to match the hardware more closely, applications with fine grained communication can be put on these machines. This article describes several tests performed and many of the issues involved in supporting low latency messages on distributed memory machines.

  28. Low latency messages on distributed memory multiprocessors

    NASA Technical Reports Server (NTRS)

    Rosing, Matthew; Saltz, Joel

    1993-01-01

    Many of the issues in developing an efficient interface for communication on distributed memory machines are described and a portable interface is proposed. Although the hardware component of message latency is less than one microsecond on many distributed memory machines, the software latency associated with sending and receiving typed messages is on the order of 50 microseconds. The reason for this imbalance is that the software interface does not match the hardware. By changing the interface to match the hardware more closely, applications with fine grained communication can be put on these machines. Based on several tests that were run on the iPSC/860, an interface that will better match current distributed memory machines is proposed. The model used in the proposed interface consists of a computation processor and a communication processor on each node. Communication between these processors and other nodes in the system is done through a buffered network. Information that is transmitted is either data or procedures to be executed on the remote processor. The dual processor system is better suited than a single processor system to handling asynchronous communications efficiently. The ability to send either data or procedures provides flexibility in minimizing message latency, depending on the type of communication being performed. The tests performed and the proposed interface are described.

  29. Software Coherence in Multiprocessor Memory Systems. Ph.D. Thesis

    NASA Technical Reports Server (NTRS)

    Bolosky, William Joseph

    1993-01-01

    Processors are becoming faster and multiprocessor memory interconnection systems are not keeping up. Therefore, it is necessary to have threads and the memory they access as near one another as possible. Typically, this involves putting memory or caches with the processors, which gives rise to the problem of coherence: if one processor writes an address, any other processor reading that address must see the new value. This coherence can be maintained by the hardware or with software intervention. Systems of both types have been built in the past; the hardware-based systems tended to outperform the software ones. However, the ratio of processor to interconnect speed is now so high that the extra overhead of the software systems may no longer be significant. This issue is explored both by implementing a software-maintained system and by introducing and using the technique of offline optimal analysis of memory reference traces. The thesis finds that in properly built systems, software-maintained coherence can perform comparably to, or even better than, hardware-maintained coherence. The architectural features necessary for efficient software coherence to be profitable include a small page size, a fast trap mechanism, and the ability to execute instructions while remote memory references are outstanding.
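
    The "small page size, fast trap mechanism" requirement reflects the standard implementation technique: read-protect pages that may be stale and do the coherence work in the fault handler. A minimal POSIX sketch of that mechanism follows (our illustration; fetch_page_from_home is a hypothetical stand-in for the DSM runtime, and real systems need far more care):

        #include <signal.h>
        #include <stdint.h>
        #include <sys/mman.h>
        #include <unistd.h>

        /* Hypothetical runtime call: bring the page up to date from its home. */
        static void fetch_page_from_home(void *page) { (void)page; }

        static void on_fault(int sig, siginfo_t *si, void *ctx)
        {
            (void)sig; (void)ctx;
            size_t ps = (size_t)getpagesize();
            void *page = (void *)((uintptr_t)si->si_addr & ~(uintptr_t)(ps - 1));
            fetch_page_from_home(page);
            mprotect(page, ps, PROT_READ | PROT_WRITE);  /* allow the retry */
        }

        void install_coherence_handler(void)
        {
            struct sigaction sa = { 0 };
            sa.sa_sigaction = on_fault;
            sa.sa_flags = SA_SIGINFO;
            sigaction(SIGSEGV, &sa, NULL);
        }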

  30. Multi-ring performance of the Kendall square multiprocessor

    SciTech Connect

    Dunigan, T.H.

    1994-03-01

    Performance of the hierarchical shared-memory system of the Kendall Square Research multiprocessor is measured and characterized. The performance of prefetch is measured. Latency, bandwidth, and contention are analyzed on a 4-ring, 128 processor system. Scalability comparisons are made with other shared-memory and distributed-memory multiprocessors.

  31. Generation-based memory synchronization in a multiprocessor system with weakly consistent memory accesses

    DOEpatents

    Ohmacht, Martin

    2017-08-15

    In a multiprocessor system, a central memory synchronization module coordinates memory synchronization requests responsive to memory access requests in flight, a generation counter, and a reclaim pointer. The central module communicates via point-to-point communication. The module includes a global OR reduce tree for each memory access requesting device, for detecting memory access requests in flight. An interface unit is implemented associated with each processor requesting synchronization. The interface unit includes multiple generation completion detectors. The generation count and reclaim pointer do not pass one another.

  32. Generation-based memory synchronization in a multiprocessor system with weakly consistent memory accesses

    DOEpatents

    Ohmacht, Martin

    2014-09-09

    In a multiprocessor system, a central memory synchronization module coordinates memory synchronization requests responsive to memory access requests in flight, a generation counter, and a reclaim pointer. The central module communicates via point-to-point communication. The module includes a global OR reduce tree for each memory access requesting device, for detecting memory access requests in flight. An interface unit is implemented associated with each processor requesting synchronization. The interface unit includes multiple generation completion detectors. The generation count and reclaim pointer do not pass one another.

  33. A Tool for Efficient Execution and Development of Repetitive Task Graphs on a Distributed Memory Multiprocessor.

    DTIC Science & Technology

    1995-09-01

    Master's thesis by Charles Brian Koman (thesis advisor: Amr Zaky), Naval Postgraduate School, Monterey, CA, September 1995; approved for public release, distribution unlimited. Only fragments of the report documentation page are indexed in place of an abstract.

  34. Kendall Square multiprocessor: Early experiences and performance

    SciTech Connect

    Dunigan, T.H.

    1992-04-01

    Initial performance results and early experiences are reported for the Kendall Square Research multiprocessor. The basic architecture of the shared-memory multiprocessor is described, and computational and I/O performance is measured for both serial and parallel programs. Experiences in porting various applications are described.

  35. Optical RAM-enabled cache memory and optical routing for chip multiprocessors: technologies and architectures

    NASA Astrophysics Data System (ADS)

    Pleros, Nikos; Maniotis, Pavlos; Alexoudi, Theonitsa; Fitsios, Dimitris; Vagionas, Christos; Papaioannou, Sotiris; Vyrsokinos, K.; Kanellos, George T.

    2014-03-01

    The processor-memory performance gap, commonly referred to as "Memory Wall" problem, owes to the speed mismatch between processor and electronic RAM clock frequencies, forcing current Chip Multiprocessor (CMP) configurations to consume more than 50% of the chip real-estate for caching purposes. In this article, we present our recent work spanning from Si-based integrated optical RAM cell architectures up to complete optical cache memory architectures for Chip Multiprocessor configurations. Moreover, we discuss on e/o router subsystems with up to Tb/s routing capacity for cache interconnection purposes within CMP configurations, currently pursued within the FP7 PhoxTrot project.

  36. Vienna FORTRAN: A FORTRAN language extension for distributed memory multiprocessors

    NASA Technical Reports Server (NTRS)

    Chapman, Barbara; Mehrotra, Piyush; Zima, Hans

    1991-01-01

    Exploiting the performance potential of distributed memory machines requires a careful distribution of data across the processors. Vienna FORTRAN is a language extension of FORTRAN which provides the user with a wide range of facilities for such mapping of data structures. However, programs in Vienna FORTRAN are written using global data references. Thus, the user has the advantage of a shared memory programming paradigm while explicitly controlling the placement of data. The basic features of Vienna FORTRAN are presented along with a set of examples illustrating the use of these features.

  37. A single-assignment language in a distributed memory multiprocessor

    NASA Technical Reports Server (NTRS)

    Evripidou, P.; Najjar, W.; Gaudiot, J.-L.

    1989-01-01

    The implementation of the single-assignment programming language SISAL (McGraw et al., 1985) on a Symult 2010 parallel computer is described. The advantages of single-assignment languages over imperative languages in a multiprocessor environment are reviewed; the characteristics of SISAL are summarized; the program-graph generation and dynamic data partitioning procedures are explained; and the application of SISAL in constructing a concurrent iterative multigrid algorithm is discussed in detail and illustrated with diagrams.

  38. Communications Patterns in a Symbolic Multiprocessor.

    DTIC Science & Technology

    1987-06-01

    Indexed fragments describe processors tied to memory through a 'butterfly' switching network, with all system memory shared among the processors; the remaining text consists of citations, including BBN's Butterfly Multiprocessor Test Bed reports and the Monarch multiprocessor.

  39. Direct Deposit -- When Message Passing Meets Shared Memory

    DTIC Science & Technology

    2000-05-19

    Indexed fragments consist of citations only, including H. Karl's work on bridging the gap between distributed shared memory and message passing, which implements pure DSM code, pure message-passing code, and a few intermediate forms on the Charlotte DSM system.

  40. Conditional load and store in a shared memory

    DOEpatents

    Blumrich, Matthias A; Ohmacht, Martin

    2015-02-03

    A method, system and computer program product for implementing load-reserve and store-conditional instructions in a multi-processor computing system. The computing system includes a multitude of processor units and a shared memory cache, and each of the processor units has access to the memory cache. In one embodiment, the method comprises providing the memory cache with a series of reservation registers, and storing in these registers addresses reserved in the memory cache for the processor units as a result of issuing load-reserve requests. In this embodiment, when one of the processor units makes a request to store data in the memory cache using a store-conditional request, the reservation registers are checked to determine if an address in the memory cache is reserved for that processor unit. If an address in the memory cache is reserved for that processor, the data are stored at this address.
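
    The classic use of a load-reserve/store-conditional pair is a retry loop. Portable C has no LL/SC, so the sketch below uses a C11 compare-exchange as a stand-in for the reservation check that the patent performs in the cache's reservation registers; it illustrates the usage pattern, not the patented mechanism.

        #include <stdatomic.h>

        /* Atomic fetch-and-add built in the LL/SC style: "reserve" (load),
         * compute, then conditionally store; if the conditional store fails,
         * the reservation was lost, so retry with the refreshed value. */
        int fetch_add_llsc_style(_Atomic int *p, int delta)
        {
            int old = atomic_load(p);
            while (!atomic_compare_exchange_weak(p, &old, old + delta))
                ;   /* old is reloaded by the failed compare-exchange */
            return old;
        }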

  41. Multiprocessor execution of functional programs

    SciTech Connect

    Goldberg, B.

    1988-10-01

    Functional languages have recently gained attention as vehicles for programming in a concise and elegant manner. In addition, it has been suggested that functional programming provides a natural methodology for programming multiprocessor computers. This paper describes research that was performed to demonstrate that multiprocessor execution of functional programs on current multiprocessors is feasible, and results in a significant reduction in their execution times. Two implementations of the functional language ALFL were built on commercially available multiprocessors. Alfalfa is an implementation on the Intel iPSC hypercube multiprocessor, and Buckwheat is an implementation on the Encore Multimax shared-memory multiprocessor. Each implementation includes a compiler that performs automatic decomposition of ALFL programs and a run-time system that supports their execution. The compiler is responsible for detecting the inherent parallelism in a program, and decomposing the program into a collection of tasks, called serial combinators, that can be executed in parallel. The abstract machine model supported by Alfalfa and Buckwheat is called heterogeneous graph reduction, which is a hybrid of graph reduction and conventional stack-oriented execution. This model supports parallelism, lazy evaluation, and higher order functions while at the same time making efficient use of the processors in the system. The Alfalfa and Buckwheat runtime systems support dynamic load balancing, interprocessor communication (if required), and storage management. A large number of experiments were performed on Alfalfa and Buckwheat for a variety of programs. The results of these experiments, as well as the conclusions drawn from them, are presented.

  42. Dynamically reconfigurable multiprocessor system for high-order-bidirectional-associative-memory-based image recognition

    NASA Astrophysics Data System (ADS)

    Wu, Chwan-Hwa; Roland, David A.

    1991-08-01

    In this paper a high-order bidirectional associative memory (HOBAM) based image recognition system and a dynamically reconfigurable multiprocessor system that achieves real-time response are reported. The HOBAM has been utilized to recognize corrupted images of human faces (with hats, glasses, masks, and slight translation and scaling effects). In addition, the HOBAM, incorporated with edge detection techniques, has been used to recognize isolated objects within multiple-object images. Successful recognition rates have been achieved. A dynamically reconfigurable multiprocessor system and parallel software have been developed to achieve real-time response for image recognition. The system consists of Inmos transputers and crossbar switches (IMS C004). The communication links can be dynamically connected by circuit switching. This is the first time that transputers and crossbar switches have been reported to form a low-cost multiprocessor system connected by a switching network. Moreover, the switching network simplifies the design of communication in the parallel software, since no message routing must be handled. Although the HOBAM is a fully connected network, the algorithm minimizes the amount of information that needs to be exchanged between processors using a data compression technique. The detailed design of both hardware and software is discussed in the paper. Significant speedup through parallel processing is accomplished. The architecture of the experimental system is a cost-effective design for an embedded system for neural network applications in computer vision.

  43. Solution of large nonlinear quasistatic structural mechanics problems on distributed-memory multiprocessor computers

    SciTech Connect

    Blanford, M.

    1997-12-31

    Most commercially-available quasistatic finite element programs assemble element stiffnesses into a global stiffness matrix, then use a direct linear equation solver to obtain nodal displacements. However, for large problems (greater than a few hundred thousand degrees of freedom), the memory size and computation time required for this approach becomes prohibitive. Moreover, direct solution does not lend itself to the parallel processing needed for today's multiprocessor systems. This talk gives an overview of the iterative solution strategy of JAS3D, the nonlinear large-deformation quasistatic finite element program. Because its architecture is derived from an explicit transient-dynamics code, it does not ever assemble a global stiffness matrix. The author describes the approach he used to implement the solver on multiprocessor computers, and shows examples of problems run on hundreds of processors and more than a million degrees of freedom. Finally, he describes some of the work he is presently doing to address the challenges of iterative convergence for ill-conditioned problems.
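
    The key property, never assembling a global stiffness matrix, means the solver only needs the operator's action y = K x, formed element by element. Below is a C sketch of that matrix-free product, with assumed dense element blocks and a connectivity array; JAS3D's actual data structures are not described in the abstract.

        /* y = K x without a global K: loop over elements, apply each dense
         * element stiffness Ke, and scatter-add into the global vector. */
        void apply_stiffness(int nelem, int ndof_e, const int *conn,
                             const double *Ke,  /* nelem blocks, ndof_e^2 each */
                             const double *x, double *y, int ndof)
        {
            for (int i = 0; i < ndof; i++) y[i] = 0.0;
            for (int e = 0; e < nelem; e++) {
                const int    *dofs = conn + (long)e * ndof_e;
                const double *K    = Ke   + (long)e * ndof_e * ndof_e;
                for (int a = 0; a < ndof_e; a++) {
                    double s = 0.0;
                    for (int b = 0; b < ndof_e; b++)
                        s += K[a * ndof_e + b] * x[dofs[b]];
                    y[dofs[a]] += s;  /* scatter-add; needs coloring or atomics
                                         when elements are processed in parallel */
                }
            }
        }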

  44. Compiler-directed cache management in multiprocessors

    NASA Technical Reports Server (NTRS)

    Cheong, Hoichi; Veidenbaum, Alexander V.

    1990-01-01

    The necessity of finding alternatives to hardware-based cache coherence strategies for large-scale multiprocessor systems is discussed. Three different software-based strategies sharing the same goals and general approach are presented. They consist of a simple invalidation approach, a fast selective invalidation scheme, and a version control scheme. The strategies are suitable for shared-memory multiprocessor systems with interconnection networks and a large number of processors. Results of trace-driven simulations conducted on numerical benchmark routines to compare the performance of the three schemes are presented.

  45. A multiprocessor computer simulation model employing a feedback scheduler/allocator for memory space and bandwidth matching and TMR processing

    NASA Technical Reports Server (NTRS)

    Bradley, D. B.; Irwin, J. D.

    1974-01-01

    A computer simulation model for a multiprocessor computer is developed that is useful for studying the problem of matching a multiprocessor's memory space, memory bandwidth, and numbers and speeds of processors with aggregate job set characteristics. The model assumes an input work load of a set of recurrent jobs. The model includes a feedback scheduler/allocator which attempts to improve system performance through higher memory bandwidth utilization by matching individual job requirements for space and bandwidth with space availability and estimates of bandwidth availability at the times of memory allocation. The simulation model includes provisions for specifying precedence relations among the jobs in a job set, and provisions for specifying precedence execution of TMR (Triple Modular Redundant) and SIMPLEX (non-redundant) jobs.

  46. A Parallel Crossbar Routing Chip for a Shared Memory Multiprocessor

    DTIC Science & Technology

    1991-03-01

    Sponsored by the Office of Naval Research. Indexed fragments come from the report documentation page and table of contents, which covers direct versus indirect networks, non-blocking circuit-switched (Clos) networks, P and S control signals, and an independent 4x4 routing mode.

  47. Shared Memory Parallelization of an Implicit ADI-type CFD Code

    NASA Technical Reports Server (NTRS)

    Hauser, Th.; Huang, P. G.

    1999-01-01

    A parallelization study designed for ADI-type algorithms is presented using the OpenMP specification for shared-memory multiprocessor programming. Details of optimizations specifically addressed to cache-based computer architectures are described and performance measurements for the single and multiprocessor implementation are summarized. The paper demonstrates that optimization of memory access on a cache-based computer architecture controls the performance of the computational algorithm. A hybrid MPI/OpenMP approach is proposed for clusters of shared memory machines to further enhance the parallel performance. The method is applied to develop a new LES/DNS code, named LESTool. A preliminary DNS calculation of a fully developed channel flow at a Reynolds number $Re_\tau = 180$ has shown good agreement with existing data.
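
    The proposed hybrid structure pairs MPI across nodes with OpenMP threads within each shared-memory node; here is a minimal C skeleton of that pattern (ours, not LESTool code):

        #include <mpi.h>
        #include <omp.h>
        #include <stdio.h>

        int main(int argc, char **argv)
        {
            int provided, rank;
            /* FUNNELED: only the master thread makes MPI calls. */
            MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
            MPI_Comm_rank(MPI_COMM_WORLD, &rank);

            #pragma omp parallel
            {
                printf("rank %d, thread %d of %d\n", rank,
                       omp_get_thread_num(), omp_get_num_threads());
                /* node-local compute goes here; halo exchange stays on the
                 * master thread between parallel regions */
            }

            MPI_Finalize();
            return 0;
        }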

  48. Memory access in shared virtual memory

    SciTech Connect

    Berrendorf, R.

    1992-01-01

    Shared virtual memory (SVM) is a virtual memory layer with a single address space on top of a distributed real memory on parallel computers. We examine the behavior and performance of SVM running a parallel program with medium-grained, loop-level parallelism on top of it. A simulator for the underlying parallel architecture can be used to examine the behavior of SVM more deeply. The influence of several parameters, such as the number of processors, page size, cold or warm start, and restricted page replication, is studied.

  49. A simple modern correctness condition for a space-based high-performance multiprocessor

    NASA Technical Reports Server (NTRS)

    Probst, David K.; Li, Hon F.

    1992-01-01

    A number of U.S. national programs, including space-based detection of ballistic missile launches, envisage putting significant computing power into space. Given sufficient progress in low-power VLSI, multichip-module packaging and liquid-cooling technologies, we will see design of high-performance multiprocessors for individual satellites. In very high speed implementations, performance depends critically on tolerating large latencies in interprocessor communication; without latency tolerance, performance is limited by the vastly differing time scales in processor and data-memory modules, including interconnect times. The modern approach to tolerating remote-communication cost in scalable, shared-memory multiprocessors is to use a multithreaded architecture, and alter the semantics of shared memory slightly, at the price of forcing the programmer either to reason about program correctness in a relaxed consistency model or to agree to program in a constrained style. The literature on multiprocessor correctness conditions has become increasingly complex, and sometimes confusing, which may hinder its practical application. We propose a simple modern correctness condition for a high-performance, shared-memory multiprocessor; the correctness condition is based on a simple interface between the multiprocessor architecture and the parallel programming system.

  50. The performance of disk arrays in shared-memory database machines

    NASA Technical Reports Server (NTRS)

    Katz, Randy H.; Hong, Wei

    1993-01-01

    In this paper, we examine how disk arrays and shared memory multiprocessors lead to an effective method for constructing database machines for general-purpose complex query processing. We show that disk arrays can lead to cost-effective storage systems if they are configured from suitably small form-factor disk drives. We introduce the storage system metric data temperature as a way to evaluate how well a disk configuration can sustain its workload, and we show that disk arrays can sustain the same data temperature as a more expensive mirrored-disk configuration. We use the metric to evaluate the performance of disk arrays in XPRS, an operational shared-memory multiprocessor database system being developed at the University of California, Berkeley.
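
    As commonly defined (our gloss; the paper gives the precise formulation), data temperature is an access-rate density,

        \[
          \text{temperature} \;=\; \frac{\text{accesses per second the storage system sustains}}{\text{gigabytes of data stored}} ,
        \]

    which suggests why many small form-factor drives can match the temperature of a costlier mirrored configuration: they multiply the numerator's actuators while holding stored capacity fixed.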

  51. Multiprocessor execution of functional programs

    SciTech Connect

    Goldberg, B.F.

    1988-01-01

    Functional languages have recently gained attention as vehicles for programming in a concise and elegant manner. In addition, it has been suggested that functional programming provides a natural methodology for programming multiprocessor computers. This dissertation demonstrates that multiprocessor execution of functional programs is feasible, and results in a significant reduction in their execution times. Two implementations of the functional language ALFL were built on commercially available multiprocessors. Alfalfa is an implementation on the Intel iPSC hypercube multiprocessor, and Buckwheat is an implementation on the Encore Multimax shared-memory multiprocessor. Each implementation includes a compiler that performs automatic decomposition of ALFL programs. The compiler is responsible for detecting the inherent parallelism in a program, and decomposing the program into a collection of tasks, called serial combinators, that can be executed in parallel. One of the primary goals of the compiler is to generate serial combinators exhibiting the coarsest granularity possible without sacrificing useful parallelism. This dissertation describes the algorithms used by the compiler to analyze, decompose, and optimize functional programs. The abstract machine model supported by Alfalfa and Buckwheat is called heterogeneous graph reduction, which is a hybrid of graph reduction and conventional stack-oriented execution. This model supports parallelism, lazy evaluation, and higher order functions while at the same time making efficient use of the processors in the system. The Alfalfa and Buckwheat run-time systems support dynamic load balancing, interprocessor communication (if required) and storage management. A large number of experiments were performed on Alfalfa and Buckwheat for a variety of programs. The results of these experiments, as well as the conclusions drawn from them, are presented.

  52. Multiprocessor architectural study

    NASA Technical Reports Server (NTRS)

    Kosmala, A. L.; Stanten, S. F.; Vandever, W. H.

    1972-01-01

    An architectural design study was made of a multiprocessor computing system intended to meet functional and performance specifications appropriate to a manned space station application. Intermetrics' previous experience and accumulated knowledge of the multiprocessor field are used to generate a baseline philosophy for the design of a future SUMC multiprocessor. Interrupts are defined and the crucial questions of interrupt structure, such as processor selection and response time, are discussed. Memory hierarchy and performance are discussed extensively, with particular attention to the design approach which utilizes a cache memory associated with each processor. The ability of an individual processor to approach its theoretical maximum performance is then analyzed in terms of a hit ratio. Memory management is envisioned as a virtual memory system implemented either through segmentation or paging. Addressing is discussed in terms of the various register designs adopted by current computers and those of advanced design.
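
    The hit-ratio analysis mentioned above conventionally rests on the effective-access-time relation; the numbers here are illustrative, not the study's:

        \[
          t_{\mathrm{eff}} = h\,t_{\mathrm{cache}} + (1-h)\,t_{\mathrm{mem}} ,
          \qquad\text{e.g. } h = 0.95,\ t_{\mathrm{cache}} = 1,\ t_{\mathrm{mem}} = 10
          \ \Rightarrow\ t_{\mathrm{eff}} = 1.45 \text{ cycles},
        \]

    so a processor approaches its theoretical maximum performance only as the hit ratio h approaches 1.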

  17. Debugging in a multi-processor environment

    SciTech Connect

    Spann, J.M.

    1981-09-29

    The Supervisory Control and Diagnostic System (SCDS) for the Mirror Fusion Test Facility (MFTF) consists of nine 32-bit minicomputers arranged in a tightly coupled distributed computer system utilizing a shared memory as the data exchange medium. Debugging more than one program in the multi-processor environment is a difficult process. This paper describes the new tools that were developed and how software testing is performed in the SCDS for the MFTF project.

  18. Shared-memory parallel programming in C++

    SciTech Connect

    Beck, B.

    1990-07-01

    This paper discusses how researchers have produced a set of portable parallel-programming constructs for C, implemented in M4 macros. These parallel-programming macros are available under the name Parmacs. The Parmacs macros let one write parallel C programs for shared-memory, distributed-memory, and mixed-memory (shared and distributed) systems, and they have been implemented on several machines. Although Parmacs offers useful parallel-programming features, it also has shortcomings; the author considered how these problems might be overcome or avoided, concluded that using C++ rather than C would address them adequately, and describes the C++ features exploited. The work described addresses shared-memory constructs.

  19. Performing an allreduce operation using shared memory

    DOEpatents

    Archer, Charles J. [Rochester, MN]; Dozsa, Gabor [Ardsley, NY]; Ratterman, Joseph D. [Rochester, MN]; Smith, Brian E. [Rochester, MN]

    2012-04-17

    Methods, apparatus, and products are disclosed for performing an allreduce operation using shared memory that include: receiving, by at least one of a plurality of processing cores on a compute node, an instruction to perform an allreduce operation; establishing, by the core that received the instruction, a job status object for specifying a plurality of shared memory allreduce work units, the plurality of shared memory allreduce work units together performing the allreduce operation on the compute node; determining, by an available core on the compute node, a next shared memory allreduce work unit in the job status object; and performing, by that available core on the compute node, that next shared memory allreduce work unit.

  20. Performing an allreduce operation using shared memory

    DOEpatents

    Archer, Charles J; Dozsa, Gabor; Ratterman, Joseph D; Smith, Brian E

    2014-06-10

    Methods, apparatus, and products are disclosed for performing an allreduce operation using shared memory that include: receiving, by at least one of a plurality of processing cores on a compute node, an instruction to perform an allreduce operation; establishing, by the core that received the instruction, a job status object for specifying a plurality of shared memory allreduce work units, the plurality of shared memory allreduce work units together performing the allreduce operation on the compute node; determining, by an available core on the compute node, a next shared memory allreduce work unit in the job status object; and performing, by that available core on the compute node, that next shared memory allreduce work unit.
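
    As an illustration of the work-unit scheme, here is a minimal C sketch in which the "job status object" is reduced to a single atomic counter and each work unit is a fixed-size chunk of the reduction buffer; these are simplifications of the patent's structures, and all names are illustrative.

      /* Intra-node allreduce (sum) driven by a queue of shared-memory
         work units; any available core pulls the next unit. */
      #include <pthread.h>
      #include <stdatomic.h>
      #include <stdio.h>

      #define CORES 4
      #define N     1024
      #define CHUNK 64                 /* one work unit = one chunk of indices */

      static double inbuf[CORES][N];   /* each core's contribution */
      static double result[N];         /* shared allreduce result */
      static atomic_int next_unit;     /* stand-in for the job status object */

      static void *worker(void *arg)
      {
          (void)arg;
          int u;
          while ((u = atomic_fetch_add(&next_unit, 1)) < N / CHUNK) {
              for (int i = u * CHUNK; i < (u + 1) * CHUNK; i++) {
                  double s = 0.0;
                  for (int c = 0; c < CORES; c++)
                      s += inbuf[c][i];
                  result[i] = s;       /* chunks are disjoint: no locking */
              }
          }
          return NULL;
      }

      int main(void)
      {
          for (int c = 0; c < CORES; c++)
              for (int i = 0; i < N; i++)
                  inbuf[c][i] = c + 1; /* expected elementwise sum: 1+2+3+4 = 10 */

          pthread_t t[CORES];
          for (int c = 0; c < CORES; c++)
              pthread_create(&t[c], NULL, worker, NULL);
          for (int c = 0; c < CORES; c++)
              pthread_join(t[c], NULL);

          printf("result[0] = %g\n", result[0]);   /* prints 10 */
          return 0;
      }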

  1. Rollback-recovery techniques and architectural support for multiprocessor systems

    SciTech Connect

    Chiang Chungyang.

    1991-01-01

    The author proposes efficient and robust fault-diagnosis and rollback-recovery techniques to enhance system availability as well as performance in both distributed-memory and shared-bus shared-memory multiprocessor systems. Architectural support for the proposed rollback-recovery technique in a bus-based shared-memory multiprocessor system is also investigated to adaptively fine-tune the proposed rollback-recovery technique in this type of system. A comparison of the performance of the proposed techniques with other existing techniques is made, a topic on which little quantitative information is available in the literature. New diagnosis concepts are introduced to show that the author's diagnosis technique yields higher diagnosis coverage and facilitates the performance evaluation of various fault-diagnosis techniques.

  2. Recoverable distributed shared virtual memory - Memory coherence and storage structures

    NASA Technical Reports Server (NTRS)

    Wu, Kun-Lung; Fuchs, W. Kent

    1989-01-01

    This paper examines the problem of implementing rollback recovery in multicomputer distributed shared virtual memory environments, in which the shared memory is implemented in software and exists only virtually. A user-transparent checkpointing recovery scheme and new twin-page disk storage management are presented to implement a recoverable distributed shared virtual memory. The checkpointing scheme is integrated with the shared virtual memory management. The twin-page disk approach allows incremental checkpointing without an explicit undo at the time of recovery. A single consistent checkpoint state is maintained on stable disk storage. The recoverable distributed shared virtual memory allows the system to restart computation from a previous checkpoint after a processor failure, without a global restart.
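
    The twin-page idea can be sketched in a few lines of C. Here the two page copies live in ordinary memory rather than on stable disk storage, which is a deliberate simplification, and all names are invented for illustration.

      /* Illustrative twin-page cell: writes go to the non-current slot,
         so the last checkpoint is never overwritten and recovery needs
         no explicit undo. */
      #include <string.h>

      #define PAGE 4096

      struct twin_page {
          char slot[2][PAGE];   /* the two "twin" copies */
          int  current;         /* slot holding the last committed version */
      };

      /* write a new version without touching the committed copy */
      void twin_write(struct twin_page *p, const char *data)
      {
          memcpy(p->slot[1 - p->current], data, PAGE);
      }

      /* checkpoint commit: flip which slot is current */
      void twin_commit(struct twin_page *p)
      {
          p->current = 1 - p->current;
      }

      /* recovery: simply read the current slot; no undo log required */
      const char *twin_recover(const struct twin_page *p)
      {
          return p->slot[p->current];
      }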

  3. Multiprocessor architecture: Synthesis and evaluation

    NASA Technical Reports Server (NTRS)

    Standley, Hilda M.

    1990-01-01

    Multiprocessor computer architecture evaluation for structural computations is the focus of the research effort described. Results obtained are expected to lead to more efficient use of existing architectures and to suggest designs for new, application-specific architectures. The brief descriptions given outline a number of related efforts directed toward this purpose. The difficulty in analyzing an existing architecture or in designing a new computer architecture lies in the fact that the performance of a particular architecture, within the context of a given application, is determined by a number of factors. These include, but are not limited to, the efficiency of the computation algorithm, the programming language and support environment, the quality of the program written in the programming language, the multiplicity of the processing elements, the characteristics of the individual processing elements, the interconnection network connecting processors and non-local memories, and the shared memory organization, covering the spectrum from no shared memory (all local memory) to one global access memory. These performance determiners may be loosely classified as being software or hardware related; the distinction is not clear or even appropriate in many cases. The effect of the choice of algorithm is ignored by assuming that the algorithm is specified as given. Effort directed toward removing the effect of the programming language and program resulted in the design of a high-level parallel programming language. Two characteristics of the fundamental structure of the architecture (memory organization and interconnection network) are examined.

  4. Comparative Study of Message Passing and Shared Memory Parallel Programming Models in Neural Network Training

    SciTech Connect

    Vitela, J.; Gordillo, J.; Cortina, L; Hanebutte, U.

    1999-12-14

    A comparative performance study is presented of a coarse-grained parallel neural-network training code implemented in both OpenMP and MPI, the standards for shared-memory and message-passing parallel programming environments, respectively. In addition, these versions of the parallel training code are compared to an implementation utilizing SHMEM, the native SGI/Cray environment for shared-memory programming. The multiprocessor platform used is an SGI/Cray Origin 2000 with up to 32 processors. It is shown that in this study the native Cray environment outperforms MPI for the entire range of processors used, while OpenMP shows better performance than the other two environments when more than 19 processors are used. In this study, the efficiency always exceeds 60%, regardless of the parallel programming environment and the number of processors.
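
    The shared-memory side of such a comparison often amounts to a single worksharing construct. Below is a hedged OpenMP sketch of batch error accumulation over training patterns; the kernel, names, and sizes are placeholders, not the paper's code.

      #include <omp.h>
      #include <stdio.h>

      #define PATTERNS 100000

      /* stand-in for a per-pattern forward pass returning its error */
      double forward_error(int p) { return 1.0 / (p + 1); }

      int main(void)
      {
          double err = 0.0;
          /* coarse-grained parallel loop: each thread handles a block of
             patterns; the reduction combines the partial error sums */
      #pragma omp parallel for reduction(+ : err) schedule(static)
          for (int p = 0; p < PATTERNS; p++)
              err += forward_error(p);
          printf("total error %g on up to %d threads\n",
                 err, omp_get_max_threads());
          return 0;
      }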

  5. Distributed job scheduling in SCI Local Area MultiProcessors

    SciTech Connect

    Agasaveeran, S.; Li, Qiang

    1996-12-31

    Local Area MultiProcessors (LAMP) is a network of personal workstations with distributed shared physical memory provided by high-performance technologies such as SCI. LAMP is more tightly coupled than traditional local area networks (LANs) but more loosely coupled than bus-based multiprocessors. This paper presents a distributed scheduling algorithm which exploits the distributed shared memory in SCI-LAMP to schedule the idle remote processors among the requesting workstations. It considers fairness by allocating remote processing capacity to the requesting workstations based on their priorities, according to the decay-usage scheduling approach. The performance of the algorithm in scheduling both sequential and parallel jobs is evaluated by simulation. It is found that the higher-priority nodes achieve faster job response times and higher speedups than the lower-priority nodes. Low scheduling overhead allows finer-grained sharing of remote processors than in a LAN.

  6. Direct access inter-process shared memory

    DOEpatents

    Brightwell, Ronald B; Pedretti, Kevin; Hudson, Trammell B

    2013-10-22

    A technique for directly sharing physical memory between processes executing on processor cores is described. The technique includes loading a plurality of processes into the physical memory for execution on a corresponding plurality of processor cores sharing the physical memory. An address space is mapped to each of the processes by populating a first entry in a top level virtual address table for each of the processes. The address space of each of the processes is cross-mapped into each of the processes by populating one or more subsequent entries of the top level virtual address table with the first entry in the top level virtual address table from other processes.
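
    The patent operates at the page-table level, below what an application can normally reach. A user-level analogue with POSIX shared memory conveys the effect, two processes addressing the same physical pages; the object name is arbitrary and error checks are omitted for brevity.

      #include <fcntl.h>
      #include <stdio.h>
      #include <sys/mman.h>
      #include <sys/wait.h>
      #include <unistd.h>

      int main(void)
      {
          /* create a shared object and map it; both processes will see
             the same physical memory through their own page tables */
          int fd = shm_open("/xmap_demo", O_CREAT | O_RDWR, 0600);
          ftruncate(fd, 4096);
          int *shared = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
                             MAP_SHARED, fd, 0);

          if (fork() == 0) {        /* child: write through its mapping */
              shared[0] = 42;
              _exit(0);
          }
          wait(NULL);
          printf("parent sees %d\n", shared[0]);   /* prints 42 */
          shm_unlink("/xmap_demo");
          return 0;
      }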

  7. C-MOS array design techniques: SUMC multiprocessor system study

    NASA Technical Reports Server (NTRS)

    Clapp, W. A.; Helbig, W. A.; Merriam, A. S.

    1972-01-01

    The current capabilities of LSI techniques for speed and reliability, plus the possibilities of assembling large configurations of LSI logic and storage elements, have demanded the study of multiprocessors and multiprocessing techniques, problems, and potentialities. Evaluated are three previous systems studies for a space ultrareliable modular computer multiprocessing system, and a new multiprocessing system is proposed that is flexibly configured with up to four central processors, four I/O processors, and 16 main memory units, plus auxiliary memory and peripheral devices. This multiprocessor system features a multilevel interrupt, qualified S/360 compatibility for ground-based generation of programs, virtual memory management of a storage hierarchy through I/O processors, and multiport access to multiple and shared memory units.

  8. Development and evaluation of a fault-tolerant multiprocessor (FTMP) computer. Volume 1: FTMP principles of operation

    NASA Technical Reports Server (NTRS)

    Smith, T. B., Jr.; Lala, J. H.

    1983-01-01

    The basic organization of the fault-tolerant multiprocessor (FTMP) is that of a general-purpose homogeneous multiprocessor. Three processors operate on a shared system (memory and I/O) bus. Replication and tight synchronization of all elements, together with hardware voting, are employed to detect and correct any single fault. Reconfiguration is then employed to repair a fault. Multiple faults may be tolerated as a sequence of single faults, with repair between fault occurrences.
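
    The voting element at the heart of such designs is easy to state in software terms. The C sketch below computes a bitwise two-of-three majority, which corrects any single corrupted replica; FTMP votes in hardware, so this only illustrates the function computed.

      #include <stdint.h>
      #include <assert.h>

      /* bitwise 2-of-3 majority: each output bit takes the value held
         by at least two of the three replicas */
      static uint32_t vote3(uint32_t a, uint32_t b, uint32_t c)
      {
          return (a & b) | (b & c) | (a & c);
      }

      int main(void)
      {
          /* one fully corrupted replica is outvoted */
          assert(vote3(0xDEADBEEF, 0xDEADBEEF, 0x00000000) == 0xDEADBEEF);
          return 0;
      }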

  9. Hybrid Memory Management for Parallel Execution of Prolog on Shared Memory Multiprocessors

    DTIC Science & Technology

    1990-06-01

  10. Shared memories reveal shared structure in neural activity across individuals

    PubMed Central

    Chen, J.; Leong, Y.C.; Honey, C.J.; Yong, C.H.; Norman, K.A.; Hasson, U.

    2016-01-01

    Our lives revolve around sharing experiences and memories with others. When different people recount the same events, how similar are their underlying neural representations? Participants viewed a fifty-minute movie, then verbally described the events during functional MRI, producing unguided detailed descriptions lasting up to forty minutes. As each person spoke, event-specific spatial patterns were reinstated in default-network, medial-temporal, and high-level visual areas. Individual event patterns were both highly discriminable from one another and similar between people, suggesting consistent spatial organization. In many high-order areas, patterns were more similar between people recalling the same event than between recall and perception, indicating systematic reshaping of percept into memory. These results reveal the existence of a common spatial organization for memories in high-level cortical areas, where encoded information is largely abstracted beyond sensory constraints; and that neural patterns during perception are altered systematically across people into shared memory representations for real-life events. PMID:27918531

  11. Parallel Navier-Stokes computations on shared and distributed memory architectures

    NASA Technical Reports Server (NTRS)

    Hayder, M. Ehtesham; Jayasimha, D. N.; Pillay, Sasi Kumar

    1995-01-01

    We study a high-order finite difference scheme to solve the time-accurate flow field of a jet using the compressible Navier-Stokes equations. As part of our ongoing efforts, we have implemented our numerical model on three parallel computing platforms to study the computational, communication, and scalability characteristics. The platforms chosen for this study are a cluster of workstations connected through fast networks (the LACE experimental testbed at NASA Lewis), a shared memory multiprocessor (the Cray YMP), and a distributed memory multiprocessor (the IBM SP1). Our focus in this study is on the LACE testbed; we present some results for the Cray YMP and the IBM SP1 mainly for comparison purposes. On the LACE testbed, we study: (1) the communication characteristics of Ethernet, FDDI, and the ALLNODE networks and (2) the overheads induced by the PVM message-passing library used for parallelizing the application. We demonstrate that clustering of workstations is effective and has the potential to be computationally competitive with supercomputers at a fraction of the cost.

  12. An optical simulation of shared memory

    SciTech Connect

    Goldberg, L.A.; Matias, Y.; Rao, S.

    1994-06-01

    We present a work-optimal randomized algorithm for simulating a shared memory machine (PRAM) on an optical communication parallel computer (OCPC). The OCPC model is motivated by the potential of optical communication for parallel computation. The memory of an OCPC is divided into modules, one module per processor. Each memory module only services a request on a timestep if it receives exactly one memory request. Our algorithm simulates each step of an (n lg lg n)-processor EREW PRAM on an n-processor OCPC in O(lg lg n) expected delay. (The probability that the delay is longer than this is at most n^(-α) for any constant α.) The best previous simulation, due to Valiant, required Θ(lg n) expected delay.

  13. Checkpointing Shared Memory Programs at the Application-level

    SciTech Connect

    Bronevetsky, G; Schulz, M; Szwed, P; Marques, D; Pingali, K

    2004-09-08

    Trends in high-performance computing are making it necessary for long-running applications to tolerate hardware faults. The most commonly used approach is checkpoint and restart (CPR): the state of the computation is saved periodically on disk, and when a failure occurs, the computation is restarted from the last saved state. At present, it is the responsibility of the programmer to instrument applications for CPR. Our group is investigating the use of compiler technology to instrument codes to make them self-checkpointing and self-restarting, thereby providing an automatic solution to the problem of making long-running scientific applications resilient to hardware faults. Our previous work focused on message-passing programs. In this paper, we describe such a system for shared-memory programs running on symmetric multiprocessors. The system has two components: (i) a pre-compiler for source-to-source modification of applications, and (ii) a runtime system that implements a protocol for coordinating CPR among the threads of the parallel application. For the sake of concreteness, we focus on a non-trivial subset of OpenMP that includes barriers and locks. One of the advantages of this approach is that the ability to tolerate faults becomes embedded within the application itself, so applications become self-checkpointing and self-restarting on any platform. We demonstrate this by showing that our transformed benchmarks can checkpoint and restart on three different platforms (Windows/x86, Linux/x86, and Tru64/Alpha). Our experiments show that the overhead introduced by this approach is usually quite small; they also suggest ways in which the current implementation can be tuned to reduce overheads further.
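
    The coordination protocol itself is not spelled out in the abstract, but its core idea, quiescing all threads so the saved state is consistent, can be sketched with a POSIX barrier. The checkpoint file format and the placement logic of the real system are omitted, and all names here are illustrative.

      #include <pthread.h>
      #include <stdio.h>

      #define THREADS 4
      static pthread_barrier_t bar;
      static double state[THREADS];        /* application state to save */

      static void checkpoint(void)
      {
          /* first barrier: every thread is stopped, state is consistent;
             the "serial" thread writes while the rest wait below */
          if (pthread_barrier_wait(&bar) == PTHREAD_BARRIER_SERIAL_THREAD) {
              FILE *f = fopen("ckpt.dat", "wb");
              if (f) { fwrite(state, sizeof state, 1, f); fclose(f); }
          }
          /* second barrier: nobody resumes until the save is complete */
          pthread_barrier_wait(&bar);
      }

      static void *worker(void *arg)
      {
          long id = (long)arg;
          for (int step = 0; step < 100; step++) {
              state[id] += 1.0;            /* compute */
              if (step % 10 == 9)
                  checkpoint();            /* periodic coordinated save */
          }
          return NULL;
      }

      int main(void)
      {
          pthread_barrier_init(&bar, NULL, THREADS);
          pthread_t t[THREADS];
          for (long i = 0; i < THREADS; i++)
              pthread_create(&t[i], NULL, worker, (void *)i);
          for (int i = 0; i < THREADS; i++)
              pthread_join(t[i], NULL);
          pthread_barrier_destroy(&bar);
          return 0;
      }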

  14. Exploring Shared Memory Protocols in FLASH

    SciTech Connect

    Horowitz, Mark; Kunz, Robert; Hall, Mary; Lucas, Robert; Chame, Jacqueline

    2007-04-01

    The goal of this project was to improve the performance of large scientific and engineering applications through collaborative hardware and software mechanisms to manage the memory hierarchy of non-uniform memory access time (NUMA) shared-memory machines, as well as their component individual processors. In spite of the programming advantages of shared-memory platforms, obtaining good performance for large scientific and engineering applications on such machines can be challenging. Because communication between processors is managed implicitly by the hardware, rather than expressed by the programmer, application performance may suffer from unintended communication – communication that the programmer did not consider when developing his/her application. In this project, we developed and evaluated a collection of hardware, compiler, language, and performance-monitoring tools to obtain high performance on scientific and engineering applications on NUMA platforms by managing communication through alternative coherence mechanisms. Alternative coherence mechanisms have often been discussed as a means for reducing unintended communication, although architecture implementations of such mechanisms are quite rare. This report describes an actual implementation of a set of coherence protocols that support coherent, non-coherent and write-update accesses for a CC-NUMA shared-memory architecture, the Stanford FLASH machine. Such an approach has the advantage of using alternative coherence only where it is beneficial, and also provides an evolutionary migration path for improving application performance. We present data on two computations, RandomAccess from the HPC Challenge benchmarks and a forward solver derived from LS-DYNA, showing the performance advantages of the alternative coherence mechanisms. For RandomAccess, the non-coherent and write-update versions can outperform the coherent version by factors of 5 and 2.5, respectively. In LS-DYNA, we obtain

  15. Reducing communication costs in the conjugate gradient algorithm on distributed memory multiprocessors

    SciTech Connect

    D'Azevedo, E.F.; Romine, C.H.

    1992-09-01

    The standard formulation of the conjugate gradient algorithm involves two inner product computations. The results of these two inner products are needed to update the search direction and the computed solution. In a distributed memory parallel environment, the computation and subsequent distribution of these two values requires two separate communication and synchronization phases. In this paper, we present a mathematically equivalent rearrangement of the standard algorithm that reduces the number of communication phases. We give a second derivation of the modified conjugate gradient algorithm in terms of the natural relationship with the underlying Lanczos process. We also present empirical evidence of the stability of this modified algorithm.
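
    To make the communication pattern concrete, the following MPI sketch performs one allreduce per iteration: the two inner products <p,q> and <q,q> are fused into a single reduction of a two-element vector, and <r,r> is carried by the exact-arithmetic recurrence rho' = alpha^2 <q,q> - rho. This is one realization in the spirit of the paper's rearrangement, not a transcription of it; a diagonal matrix keeps the sketch self-contained.

      #include <mpi.h>
      #include <stdio.h>

      #define NLOC 1000                     /* rows owned by this rank */

      int main(int argc, char **argv)
      {
          MPI_Init(&argc, &argv);
          int rank;
          MPI_Comm_rank(MPI_COMM_WORLD, &rank);

          double a[NLOC], x[NLOC], r[NLOC], p[NLOC], q[NLOC];
          for (int i = 0; i < NLOC; i++) {
              a[i] = 2.0 + i % 7;           /* SPD diagonal test matrix */
              x[i] = 0.0;
              r[i] = 1.0;                   /* r = b - A*x0 with b = 1  */
              p[i] = r[i];
          }
          double rho_loc = 0.0, rho;
          for (int i = 0; i < NLOC; i++) rho_loc += r[i] * r[i];
          MPI_Allreduce(&rho_loc, &rho, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);

          for (int it = 0; it < 50 && rho > 1e-24; it++) {
              double loc[2] = {0.0, 0.0}, glb[2];
              for (int i = 0; i < NLOC; i++) {
                  q[i] = a[i] * p[i];       /* q = A p (local: A diagonal) */
                  loc[0] += p[i] * q[i];    /* partial <p,q> */
                  loc[1] += q[i] * q[i];    /* partial <q,q> */
              }
              /* the single communication phase of the iteration */
              MPI_Allreduce(loc, glb, 2, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);

              double alpha   = rho / glb[0];
              double rho_new = alpha * alpha * glb[1] - rho;  /* <r',r'> */
              double beta    = rho_new / rho;
              for (int i = 0; i < NLOC; i++) {
                  x[i] += alpha * p[i];
                  r[i] -= alpha * q[i];
                  p[i]  = r[i] + beta * p[i];
              }
              rho = rho_new;
          }
          if (rank == 0) printf("final <r,r> = %g\n", rho);
          MPI_Finalize();
          return 0;
      }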

  17. Impact of Load Balancing on Unstructured Adaptive Grid Computations for Distributed-Memory Multiprocessors

    NASA Technical Reports Server (NTRS)

    Sohn, Andrew; Biswas, Rupak; Simon, Horst D.

    1996-01-01

    The computational requirements for an adaptive solution of unsteady problems change as the simulation progresses. This causes workload imbalance among processors on a parallel machine which, in turn, requires significant data movement at runtime. We present a new dynamic load-balancing framework, called JOVE, that balances the workload across all processors with a global view. Whenever the computational mesh is adapted, JOVE is activated to eliminate the load imbalance. JOVE has been implemented on an IBM SP2 distributed-memory machine in MPI for portability. Experimental results for two model meshes demonstrate that mesh adaption with load balancing gives more than a sixfold improvement over one without load balancing. We also show that JOVE gives a 24-fold speedup on 64 processors compared to sequential execution.

  18. Shared memory for a fault-tolerant computer

    NASA Technical Reports Server (NTRS)

    Gilley, G. C. (Inventor)

    1976-01-01

    A system is described for sharing a memory in a fault-tolerant computer. The memory is under the direct control and monitoring of error-detecting and error-diagnostic units in the fault-tolerant computer. This computer verifies that data to and from the memory are legally encoded and that words read from the memory at a desired address are, in fact, actually delivered from that desired address. Means are provided for a second processor, independent of the direct control and monitoring of the error-checking and diagnostic units of the fault-tolerant computer, to share the memory of the fault-tolerant computer. Circuitry is included to verify that: (1) the processor has properly accessed a desired memory location in the memory; (2) a data word read out from the memory is properly coded; and (3) no inactive memory is erroneously outputting data onto the shared memory bus.

  19. Development of an MM5-Based Four Dimensional Variational Analysis System for Distributed Memory Multiprocessor Computers

    NASA Astrophysics Data System (ADS)

    Nehrkorn, T.; Modica, G. D.; Cerniglia, M.; Ruggiero, F. H.; Michalakes, J. G.; Zou, X.

    2001-05-01

    The MM5 four-dimensional variational analysis system (4DVAR) is being updated to allow its efficient execution on parallel distributed memory computers. The previous version of the MM5 4DVAR system (Zou et al. 1998 [3]) is coded for single processor computer architectures, and its nonlinear, tangent-linear, and adjoint components are based on version 1 of the MM5. In order to take advantage of the parallelization mechanisms (Michalakes 2000 [2]) already in place for the latest release (Version 3.4) of the MM5 nonlinear model (NLM), the existing (Version 1) tangent linear (TLM) and adjoint model codes are also being updated to Version 3.4. We are using the Tangent Linear and Adjoint Model Compiler (TAMC; Giering and Kaminski 1998 [1]) in this process. The TAMC is a source-to-source translator that generates Fortran code for the TLM or adjoint from the Fortran code of the NLM. While it would be possible to incorporate the TAMC as part of a pre-compilation process--thus requiring the maintenance of the NLM code only--this would require that the NLM code first be modified as needed to result in the correct TLM and adjoint code output by TAMC. For the development of the MM5 adjoint, we have chosen instead to use TAMC as a development tool only, and separately maintain the TLM and adjoint versions of the model code. This approach makes it possible to minimize changes to the MM5 code as supported by NCAR. The TLM and adjoint are tested for correctness, using the standard comparison of the TLM and finite difference gradients to check for correctness of the former, and the definition of the adjoint to check for consistency of the TLM and adjoint. This testing is performed for individual subroutines (unit testing) as well as the complete model integration (unit integration testing), with objective functions designed to test different parts of the model state vector. Testing can be done for the entire model domain, or for selected model grid points. Finally, the TLM and

  20. Shared virtual memory and generalized speedup

    NASA Technical Reports Server (NTRS)

    Sun, Xian-He; Zhu, Jianping

    1994-01-01

    Generalized speedup is defined as parallel speed over sequential speed. The generalized speedup and its relation with other existing performance metrics, such as traditional speedup, efficiency, scalability, etc., are carefully studied. In terms of the introduced asymptotic speed, it was shown that the difference between the generalized speedup and the traditional speedup lies in the definition of the efficiency of uniprocessor processing, which is a very important issue in shared virtual memory machines. A scientific application was implemented on a KSR-1 parallel computer. Experimental and theoretical results show that the generalized speedup is distinct from the traditional speedup and provides a more reasonable measurement. In the study of different speedups, various causes of superlinear speedup are also presented.
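
    In symbols, with speed defined as work per unit time, the two metrics can be restated compactly; the notation below is ours, introduced only for illustration.

      \[
        S_{\mathrm{gen}}(p) \;=\; \frac{\text{parallel speed}}{\text{sequential speed}}
                            \;=\; \frac{W_p / T_p}{W_1 / T_1},
        \qquad
        S_{\mathrm{trad}}(p) \;=\; \frac{T_1(W)}{T_p(W)} .
      \]

    When the parallel and sequential runs execute the same work (W_p = W_1), the two definitions coincide; they diverge when scaled problem sizes or memory effects, as on shared virtual memory machines, change uniprocessor efficiency.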

  1. Parallel discrete event simulation using shared memory

    NASA Technical Reports Server (NTRS)

    Reed, Daniel A.; Malony, Allen D.; Mccredie, Bradley D.

    1988-01-01

    With traditional event-list techniques, evaluating a detailed discrete-event simulation model can often require hours or even days of computation time. By eliminating the event list and maintaining only sufficient synchronization to ensure causality, parallel simulation can potentially provide speedups that are linear in the number of processors. A set of shared-memory experiments, using the Chandy-Misra distributed-simulation algorithm to simulate networks of queues, is presented. Parameters of the study include queueing network topology and routing probabilities, number of processors, and assignment of network nodes to processors. These experiments show that Chandy-Misra distributed simulation is a questionable alternative to sequential simulation of most queueing network models.

  2. Validation of multiprocessor systems

    NASA Technical Reports Server (NTRS)

    Siewiorek, D. P.; Segall, Z.; Kong, T.

    1982-01-01

    Experiments that can be used to validate fault-free performance of multiprocessor systems in aerospace systems integrating flight controls and avionics are discussed. Engineering prototypes for two fault-tolerant multiprocessors are tested.

  3. Study of performance on SMP and distributed memory architectures using a shared memory programming model

    SciTech Connect

    Brooks, E.D.; Warren, K.H.

    1997-08-08

    In this paper we examine the use of a shared memory programming model to address the problem of portability of application codes between distributed memory and shared memory architectures. We do this with an extension of the Parallel C Preprocessor. The extension, borrowed from Split-C and AC, uses type qualifiers instead of storage class modifiers to declare variables that are shared among processors. The type qualifier declaration supports an abstract shared memory facility on distributed memory machines while making direct use of hardware support on shared memory architectures. Our benchmarking study spans a wide range of shared memory and distributed memory platforms. Benchmarks include Gaussian elimination with back substitution, a two-dimensional fast Fourier transform, and a matrix-matrix multiply. We find that the type-qualifier-based shared memory programming model is capable of efficiently spanning both distributed memory and shared memory architectures. Although the resulting shared memory programming model is portable, it does not remove the need to arrange for overlapped or blocked remote memory references on platforms that require these tuning measures in order to obtain good performance.
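
    Standard C cannot express the distributed-memory half of this design, but the declaration-level distinction the paper relies on can be mimicked on a shared-memory node: file-scope objects are shared among POSIX threads, while _Thread_local objects are private. The `shared` spelling below is illustrative and is not PCP's actual syntax.

      #include <pthread.h>
      #include <stdio.h>

      #define shared   /* globals are already shared under POSIX threads */

      shared double grid[1024];          /* one copy, visible to all threads */
      _Thread_local double scratch[64];  /* per-thread private storage */

      static void *kernel(void *arg)
      {
          long id = (long)arg;
          scratch[0] = id * 2.0;         /* private: no interference */
          grid[id]   = scratch[0];       /* shared: distinct index per thread */
          return NULL;
      }

      int main(void)
      {
          pthread_t t[4];
          for (long i = 0; i < 4; i++)
              pthread_create(&t[i], NULL, kernel, (void *)i);
          for (int i = 0; i < 4; i++)
              pthread_join(t[i], NULL);
          printf("grid[3] = %g\n", grid[3]);   /* prints 6 */
          return 0;
      }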

  4. A parallel numerical simulation for supersonic flows using zonal overlapped grids and local time steps for common and distributed memory multiprocessors

    SciTech Connect

    Patel, N.R.; Sturek, W.B.; Hiromoto, R.

    1989-01-01

    Parallel Navier-Stokes codes are developed to solve both two- dimensional and three-dimensional flow fields in and around ramjet and nose tip configurations. A multi-zone overlapped grid technique is used to extend an explicit finite-difference method to more complicated geometries. Parallel implementations are developed for execution on both distributed and common-memory multiprocessor architectures. For the steady-state solutions, the use of the local time-step method has the inherent advantage of reducing the communications overhead commonly incurred by parallel implementations. Computational results of the codes are given for a series of test problems. The parallel partitioning of computational zones is also discussed. 5 refs., 18 figs.

  5. Solution of the Euler and Navier-Stokes equations on MIMD distributed memory multiprocessors using cyclic reduction

    SciTech Connect

    Curchitser, E.N.; Pelz, R.B.; Marconi, F. (Grumman Aerospace Corp., Bethpage, NY)

    1992-01-01

    The Euler and Navier-Stokes equations are solved for the steady, two-dimensional flow over a NACA 0012 airfoil using a 1024-node nCUBE/2 multiprocessor. Second-order, upwind-discretized difference equations are solved implicitly using ADI factorization. Parallel cyclic reduction is employed to solve the block tridiagonal systems. For realistic problems, communication times are negligible compared to calculation times. The processors are tightly synchronized, and their loads are well balanced. When the flux Jacobians are frozen, the wall-clock time for one implicit timestep is about equal to that of a multistage explicit scheme. 10 refs.

  6. The hierarchical spatial decomposition of three-dimensional particle- in-cell plasma simulations on MIMD distributed memory multiprocessors

    SciTech Connect

    Walker, D.W.

    1992-07-01

    The hierarchical spatial decomposition method is a promising approach to decomposing the particles and computational grid in parallel particle-in-cell application codes, since it is able to maintain approximate dynamic load balance while keeping communication costs low. In this paper we investigate issues in implementing a hierarchical spatial decomposition on a hypercube multiprocessor. Particular attention is focused on the communication needed to update guard ring data, and on the load balancing method. The hierarchical approach is compared with other dynamic load balancing schemes.

  7. Parallel implementation and evaluation of motion estimation system algorithms on a distributed memory multiprocessor using knowledge based mappings

    NASA Technical Reports Server (NTRS)

    Choudhary, Alok Nidhi; Leung, Mun K.; Huang, Thomas S.; Patel, Janak H.

    1989-01-01

    Several techniques for static and dynamic load balancing in vision systems are presented. These techniques are novel in the sense that they capture the computational requirements of a task by examining the data when it is produced. Furthermore, they can be applied to many vision systems because many algorithms in different systems are either the same or have similar computational characteristics. The techniques are evaluated by applying them to a parallel implementation of the algorithms in a motion estimation system on a hypercube multiprocessor system. The motion estimation system consists of the following steps: (1) extraction of features; (2) stereo match of images in one time instant; (3) time match of images from different time instants; (4) stereo match to compute final unambiguous points; and (5) computation of motion parameters. It is shown that the performance gains from these data-decomposition and load-balancing techniques are significant, and that the overhead of using them is minimal.

  8. Shared Memory versus Message Passing Architectures: An Application Based Study

    DTIC Science & Technology

    1988-11-09

    ...Driven Processor [3,9]. Things are rapidly changing: using specialized routing chips and the technique of wormhole routing [6], the network latencies... force tradeoffs between the two architectures. The shared-memory architecture considered in this paper has a single global... application programmer to control the degree of consistency explicitly. In this paper, we explore several such tradeoffs between shared-memory and message-passing architectures.

  9. PANDA: A distributed multiprocessor operating system

    SciTech Connect

    Chubb, P.

    1989-01-01

    PANDA is a design for a distributed multiprocessor and an operating system. PANDA is designed to allow easy expansion of both hardware and software. As such, the PANDA kernel provides only message passing and memory and process management. The other features needed for the system (device drivers, secondary storage management, etc.) are provided as replaceable user tasks. The thesis presents PANDA's design and implementation, both hardware and software. PANDA uses multiple 68010 processors sharing memory on a VME bus, each such node potentially connected to others via a high speed network. The machine is completely homogeneous: there are no differences between processors that are detectable by programs running on the machine. A single two-processor node has been constructed. Each processor contains memory management circuits designed to allow processors to share page tables safely. PANDA presents a programmers' model similar to the hardware model: a job is divided into multiple tasks, each having its own address space. Within each task, multiple processes share code and data. Tasks can send messages to each other, and set up virtual circuits between themselves. Peripheral devices such as disc drives are represented within PANDA by tasks. PANDA divides secondary storage into volumes, each volume being accessed by a volume access task, or VAT. All knowledge about the way that data is stored on a disc is kept in its volume's VAT. The design is such that PANDA should provide a useful testbed for file systems and device drivers, as these can be installed without recompiling PANDA itself, and without rebooting the machine.

  10. Rollback Hardware For Time Warp Multiprocessor Systems

    NASA Technical Reports Server (NTRS)

    Robb, Michael J.; Buzzell, Calvin A.

    1996-01-01

    Rollback Chip (RBC) module is computer circuit board containing special-purpose memory circuits for use in multiprocessor computer system. Designed to help realize speedup potential of parallel processing for simulation of discrete events by use of Time Warp operating system.

  12. Neural networks and MIMD-multiprocessors

    NASA Technical Reports Server (NTRS)

    Vanhala, Jukka; Kaski, Kimmo

    1990-01-01

    Two artificial neural network models are compared. They are the Hopfield Neural Network Model and the Sparse Distributed Memory model. Distributed algorithms for both of them are designed and implemented. The run time characteristics of the algorithms are analyzed theoretically and tested in practice. The storage capacities of the networks are compared. Implementations are done using a distributed multiprocessor system.

  13. Multi-processor including data flow accelerator module

    DOEpatents

    Davidson, George S.; Pierce, Paul E.

    1990-01-01

    An accelerator module for a data flow computer includes an intelligent memory. The module is added to a multiprocessor arrangement and uses a shared tagged memory architecture in the data flow computer. The intelligent memory module assigns locations for holding data values in correspondence with arcs leading to a node in a data dependency graph. Each primitive computation is associated with a corresponding memory cell, including a number of slots for operands needed to execute a primitive computation, a primitive identifying pointer, and linking slots for distributing the result of the cell computation to other cells requiring that result as an operand. Circuitry is provided for utilizing tag bits to determine automatically when all operands required by a processor are available and for scheduling the primitive for execution in a queue. Each memory cell of the module may be associated with any of the primitives, and the particular primitive to be executed by the processor associated with the cell is identified by providing an index, such as the cell number for the primitive, to the primitive lookup table of starting addresses. The module thus serves to perform functions previously performed by a number of sections of data flow architectures and coexists with conventional shared memory therein. A multiprocessing system including the module operates in a hybrid mode, wherein the same processing modules are used to perform some processing in a sequential mode, under immediate control of an operating system, while performing other processing in a data flow mode.
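
    A condensed C model of the memory cell described above helps fix the idea: operand slots, tag (presence) bits, a primitive index, and consumer links. The field names are guesses for illustration, not the patent's layout.

      #include <stdbool.h>
      #include <stdint.h>

      #define MAX_OPERANDS  2
      #define MAX_CONSUMERS 4

      struct dataflow_cell {
          double   operand[MAX_OPERANDS];   /* arcs into the node */
          uint8_t  present;                 /* tag bits: one per operand */
          uint8_t  needed;                  /* operand count required to fire */
          uint16_t primitive;               /* index into primitive lookup table */
          uint16_t consumer[MAX_CONSUMERS]; /* cells receiving the result */
      };

      /* store an operand; report whether the cell is now ready to schedule */
      static bool deliver(struct dataflow_cell *c, int slot, double value)
      {
          c->operand[slot] = value;
          c->present |= (uint8_t)(1u << slot);
          return c->present == (uint8_t)((1u << c->needed) - 1);
      }

    A scheduler would test the return value of deliver() and, when it becomes true, enqueue the cell's primitive for execution, which corresponds to the tag-bit firing rule the patent describes.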

  14. Supporting shared data structures on distributed memory architectures

    NASA Technical Reports Server (NTRS)

    Koelbel, Charles; Mehrotra, Piyush; Vanrosendale, John

    1990-01-01

    Programming nonshared memory systems is more difficult than programming shared memory systems, since there is no support for shared data structures. Current programming languages for distributed memory architectures force the user to decompose all data structures into separate pieces, with each piece owned by one of the processors in the machine, and with all communication explicitly specified by low-level message-passing primitives. A new programming environment is presented for distributed memory architectures, providing a global name space and allowing direct access to remote parts of data values. The analysis and program transformations required to implement this environment are described, and the efficiency of the resulting code on the NCUBE/7 and iPSC/2 hypercubes is evaluated.

  15. Externalising the autobiographical self: sharing personal memories online facilitated memory retention.

    PubMed

    Wang, Qi; Lee, Dasom; Hou, Yubo

    2017-07-01

    Internet technology provides a new means of recalling and sharing personal memories in the digital age. What is the mnemonic consequence of posting personal memories online? Theories of transactive memory and autobiographical memory would make contrasting predictions. In the present study, college students completed a daily diary for a week, listing at the end of each day all the events that happened to them on that day. They also reported whether they posted any of the events online. Participants received a surprise memory test after the completion of the diary recording and then another test a week later. At both tests, events posted online were significantly more likely than those not posted online to be recalled. It appears that sharing memories online may provide unique opportunities for rehearsal and meaning-making that facilitate memory retention.

  16. High Performance, Dependable Multiprocessor

    NASA Technical Reports Server (NTRS)

    Ramos, Jeremy; Samson, John R.; Troxel, Ian; Subramaniyan, Rajagopal; Jacobs, Adam; Greco, James; Cieslewski, Grzegorz; Curreri, John; Fischer, Michael; Grobelny, Eric; George, Alan; Aggarwal, Vikas; Patel, Minesh; Some, Raphael

    2006-01-01

    With the ever-increasing demand for higher bandwidth and processing capacity of today's space exploration, space science, and defense missions, the ability to efficiently apply commercial-off-the-shelf (COTS) processors for on-board computing is now a critical need. In response to this need, NASA's New Millennium Program office has commissioned the development of Dependable Multiprocessor (DM) technology for use in payload and robotic missions. The Dependable Multiprocessor technology is a COTS-based, power-efficient, high-performance, highly dependable, fault-tolerant cluster computer. To date, Honeywell has successfully demonstrated a TRL4 prototype of the Dependable Multiprocessor [1], and is now working on the development of a TRL5 prototype. For the present effort, Honeywell has teamed up with the University of Florida's High-performance Computing and Simulation (HCS) Lab, and together the team has demonstrated major elements of the Dependable Multiprocessor TRL5 system.

  17. Distributed simulation using a real-time shared memory network

    NASA Technical Reports Server (NTRS)

    Simon, Donald L.; Mattern, Duane L.; Wong, Edmond; Musgrave, Jeffrey L.

    1993-01-01

    The Advanced Control Technology Branch of the NASA Lewis Research Center performs research in the area of advanced digital controls for aeronautic and space propulsion systems. This work requires the real-time implementation of both control software and complex dynamical models of the propulsion system. We are implementing these systems in a distributed, multi-vendor computer environment. Therefore, a need exists for real-time communication and synchronization between the distributed multi-vendor computers. A shared memory network is a potential solution which offers several advantages over other real-time communication approaches. A candidate shared memory network was tested for basic performance. The shared memory network was then used to implement a distributed simulation of a ramjet engine. The accuracy and execution time of the distributed simulation were measured and compared to the performance of the non-partitioned simulation. The ease of partitioning the simulation, the minimal development time required for communication between the processors, and the resulting execution time all indicate that the shared memory network is a real-time communication technique worthy of serious consideration.

  18. Sharing Memory Robustly in Message-Passing Systems

    DTIC Science & Technology

    1990-02-16

  19. Graphical Visualization on Computational Simulation Using Shared Memory

    NASA Astrophysics Data System (ADS)

    Lima, A. B.; Correa, Eberth

    2014-03-01

    The Shared Memory technique is a powerful tool for parallelizing computer codes. In particular, it can be used to visualize the results "on the fly" without stopping the simulation. In this presentation we discuss and show how to use the technique in conjunction with a visualization code using OpenGL.
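
    One common shape for this technique: the simulator and the viewer are separate processes sharing a mapped region, with a version counter to signal completed frames. The sketch below is a minimal, illustrative C version; the OpenGL drawing call is replaced by a printf, and a production viewer would double-buffer to avoid reading a frame mid-update.

      #include <stdatomic.h>
      #include <stdio.h>
      #include <sys/mman.h>
      #include <sys/wait.h>
      #include <unistd.h>

      struct frame {
          _Atomic long version;   /* bumped by the simulator after each step */
          double field[256];      /* simulation state to be drawn */
      };

      int main(void)
      {
          struct frame *f = mmap(NULL, sizeof *f, PROT_READ | PROT_WRITE,
                                 MAP_SHARED | MAP_ANONYMOUS, -1, 0);
          atomic_store(&f->version, 0);

          if (fork() == 0) {                     /* viewer process */
              long seen = 0;
              while (seen < 3) {
                  long v = atomic_load(&f->version);
                  if (v != seen) {               /* fresh frame: "draw" it */
                      printf("drawing frame %ld, field[0]=%g\n",
                             v, f->field[0]);
                      seen = v;
                  }
              }
              _exit(0);
          }
          for (long step = 1; step <= 3; step++) { /* simulator process */
              f->field[0] = step * 0.5;            /* compute a new state */
              atomic_store(&f->version, step);     /* publish it */
              usleep(1000);
          }
          wait(NULL);
          return 0;
      }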

  20. Multiprocessors and runtime compilation

    NASA Technical Reports Server (NTRS)

    Saltz, Joel; Berryman, Harry; Wu, Janet

    1990-01-01

    Runtime preprocessing plays a major role in many efficient algorithms in computer science, as well as playing an important role in exploiting multiprocessor architectures. Examples are given that elucidate the importance of runtime preprocessing and show how these optimizations can be integrated into compilers. To support the arguments, transformations implemented in prototype multiprocessor compilers are described and benchmarks from the iPSC2/860, the CM-2, and the Encore Multimax/320 are presented.

  1. Performance Evaluation of Remote Memory Access (RMA) Programming on Shared Memory Parallel Computers

    NASA Technical Reports Server (NTRS)

    Jin, Hao-Qiang; Jost, Gabriele; Biegel, Bryan A. (Technical Monitor)

    2002-01-01

    The purpose of this study is to evaluate the feasibility of remote memory access (RMA) programming on shared memory parallel computers. We discuss different RMA based implementations of selected CFD application benchmark kernels and compare them to corresponding message passing based codes. For the message-passing implementation we use MPI point-to-point and global communication routines. For the RMA based approach we consider two different libraries supporting this programming model. One is a shared memory parallelization library (SMPlib) developed at NASA Ames, the other is the MPI-2 extensions to the MPI Standard. We give timing comparisons for the different implementation strategies and discuss the performance.
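
    Of the two RMA libraries, MPI-2 is the standardized one; its window-and-fence style looks as follows in a minimal neighbor-exchange example. This is generic MPI-2 usage, not the paper's benchmark kernels.

      #include <mpi.h>
      #include <stdio.h>

      int main(int argc, char **argv)
      {
          int rank, size;
          MPI_Init(&argc, &argv);
          MPI_Comm_rank(MPI_COMM_WORLD, &rank);
          MPI_Comm_size(MPI_COMM_WORLD, &size);

          double local = (double)rank, remote = -1.0;
          MPI_Win win;
          /* expose one double per rank as an RMA window */
          MPI_Win_create(&remote, sizeof remote, sizeof remote,
                         MPI_INFO_NULL, MPI_COMM_WORLD, &win);

          MPI_Win_fence(0, win);
          /* one-sided put: write our value into the right neighbor */
          MPI_Put(&local, 1, MPI_DOUBLE, (rank + 1) % size,
                  0, 1, MPI_DOUBLE, win);
          MPI_Win_fence(0, win);   /* completes all pending RMA */

          printf("rank %d received %g from rank %d\n",
                 rank, remote, (rank - 1 + size) % size);
          MPI_Win_free(&win);
          MPI_Finalize();
          return 0;
      }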

  2. Efficient partitioning and assignment on programs for multiprocessor execution

    NASA Technical Reports Server (NTRS)

    Standley, Hilda M.

    1993-01-01

    The general problem studied is that of segmenting or partitioning programs for distribution across a multiprocessor system. Efficient partitioning and the assignment of program elements are of great importance, since the time consumed in this overhead activity may easily dominate the computation, effectively eliminating any gains made by the use of parallelism. In this study, the partitioning of sequentially structured programs (written in FORTRAN) is evaluated. Heuristics developed for similar applications are examined. Finally, a model for queueing networks with finite queues is developed which may be used to analyze multiprocessor system architectures with a shared memory approach to the problem of partitioning. The properties of sequentially written programs form obstacles to large-scale (at the procedure or subroutine level) parallelization. Data dependencies of even the minutest nature, reflecting the sequential development of the program, severely limit parallelism. The design of heuristic algorithms is tied to the experience gained in the parallel splitting. Parallelism obtained through the physical separation of data has seen some success, especially at the data-element level. Data parallelism on a grander scale requires models that accurately reflect the effects of blocking caused by finite queues. A model for approximating the performance of finite queueing networks is developed. This model makes use of the decomposition approach combined with the efficiency of product-form solutions.

  3. Parallel Memory Addressing Using Coincident Optical Pulses

    DTIC Science & Technology

    1989-09-15

    ...Donald M. Chiarulli, Rami G. Melhem, and Steven P. Levitan, University of Pittsburgh. Common-bus, shared-memory multiprocessors are the most widely used parallel processing systems... a decoder can process only a single encoded address, thus limiting memory access to a single location; memory interleaving tech... at the interface between the electronic memory structure controlling processing units and the optical system...

  4. Multiprocessor programming environment

    SciTech Connect

    Smith, M.B.; Fornaro, R.

    1988-12-01

    Programming tools and techniques have been well developed for traditional uniprocessor computer systems. The focus of this research project is on the development of a programming environment for a high speed real time heterogeneous multiprocessor system, with special emphasis on languages and compilers. The new tools and techniques will allow a smooth transition for programmers with experience only on single processor systems.

  5. Collaboration changes both the content and the structure of memory: Building the architecture of shared representations.

    PubMed

    Congleton, Adam R; Rajaram, Suparna

    2014-08-01

    Memory research has primarily focused on how individuals form and maintain memories across time. However, less is known about how groups of people working together can create and maintain shared memories of the past. Recent studies have focused on understanding the processes behind the formation of such shared memories, but none has investigated the structure of shared memory. This study investigated the circumstances under which collaboration would influence the likelihood that participants come to share both a similar content and a similar organization of the past by aligning their individual representations into a shared rendering. We tested how the frequency and the timing of collaboration affect participants' retrieval organization, and how this in turn influences the formation of shared memory and its persistence over time. Across numerous foundational and novel analyses, we observed that as the size of the collaborative inhibition effect (a counterintuitive finding that collaboration reduces group recall) increased, so did the amount of shared memory and the shared organization of memories. These findings reveal the interconnected relationship between collaborative inhibition, retrieval disruption, shared memory, and shared organization. Together, these relationships have intriguing implications for research across a wide variety of domains, including the formation of collective memory, beliefs and attitudes, parent-child narratives and the development of autobiographical memory, and the emergence of shared representations in educational settings.

  6. Performance Analysis of Multilevel Parallel Applications on Shared Memory Architectures

    NASA Technical Reports Server (NTRS)

    Jost, Gabriele; Jin, Haoqiang; Labarta, Jesus; Gimenez, Judit; Caubet, Jordi; Biegel, Bryan A. (Technical Monitor)

    2002-01-01

    In this paper we describe how to apply powerful performance analysis techniques to understand the behavior of multilevel parallel applications. We use the Paraver/OMPItrace performance analysis system for our study. This system consists of two major components: the OMPItrace dynamic instrumentation mechanism, which allows the tracing of processes and threads, and the Paraver graphical user interface for inspection and analysis of the generated traces. We describe how to use the system to conduct a detailed comparative study of a benchmark code implemented in five different programming paradigms applicable for shared memory

  7. Distributed parallel messaging for multiprocessor systems

    DOEpatents

    Chen, Dong; Heidelberger, Philip; Salapura, Valentina; Senger, Robert M.; Steinmacher-Burow, Burkhard; Sugawara, Yutaka

    2013-06-04

    A method and apparatus for distributed parallel messaging in a parallel computing system. The apparatus includes, at each node of a multiprocessor network, multiple injection messaging engine units and reception messaging engine units, each implementing a DMA engine and each supporting both multiple packet injection into and multiple reception from a network, in parallel. The reception side of the messaging unit (MU) includes a switch interface enabling writing of data of a packet received from the network to the memory system. The transmission side of the messaging unit includes a switch interface for reading from the memory system when injecting packets into the network.

  8. Power-Aware Compiler Controllable Chip Multiprocessor

    NASA Astrophysics Data System (ADS)

    Shikano, Hiroaki; Shirako, Jun; Wada, Yasutaka; Kimura, Keiji; Kasahara, Hironori

    A power-aware, compiler-controllable chip multiprocessor (CMP) is presented, and its performance and power consumption are evaluated with the optimally scheduled advanced multiprocessor (OSCAR) parallelizing compiler. The CMP is equipped with power control registers that change the clock frequency and power supply voltage of functional units, including processor cores, memories, and an interconnection network. The OSCAR compiler carries out coarse-grain task parallelization of programs and reduces power consumption using architectural power control support and the compiler's power saving scheme. The performance evaluation shows that MPEG-2 encoding on the proposed CMP with four CPUs results in an 82.6% power reduction in real-time execution mode with a deadline constraint on its sequential execution time. Furthermore, MP3 encoding on a heterogeneous CMP with four CPUs and four accelerators results in a 53.9% power reduction at a 21.1-fold speed-up over its sequential execution in the fastest execution mode.

  9. A Multiprocessor Emulation Facility.

    DTIC Science & Technology

    1983-09-01

    ...make the facility usable. We considered three classes of machines: 1. Commercially available Motorola M68000-based single-board computers. 2. Our own... One of the most interesting multiprocessor systems built to date is the BBN Butterfly machine [17]. It currently consists of 10 M68000 boards connected by a circuit-switched network of butterfly (i.e., FFT or shuffle-exchange) topology. The machine can be extended to several hundred M68000s because the...

  10. Considerations for Multiprocessor Topologies

    NASA Technical Reports Server (NTRS)

    Byrd, Gregory T.; Delagi, Bruce A.

    1987-01-01

    Choosing a multiprocessor interconnection topology may depend on high-level considerations, such as the intended application domain and the expected number of processors. It certainly depends on low-level implementation details, such as packaging and communications protocols. The authors first use rough measures of cost and performance to characterize several topologies. They then examine how implementation details can affect the realizable performance of a topology.

  11. Cache as point of coherence in multiprocessor system

    DOEpatents

    Blumrich, Matthias A.; Ceze, Luis H.; Chen, Dong; Gara, Alan; Heidelberger, Philip; Ohmacht, Martin; Steinmacher-Burow, Burkhard; Zhuang, Xiaotong

    2016-11-29

    In a multiprocessor system, a conflict checking mechanism is implemented in the L2 cache memory. Different versions of speculative writes are maintained in different ways of the cache. A record of speculative writes is maintained in the cache directory. Conflict checking occurs as part of directory lookup. Speculative versions that do not conflict are aggregated into an aggregated version in a different way of the cache. Speculative memory access requests do not go to main memory.

  12. A Parallel Saturation Algorithm on Shared Memory Architectures

    NASA Technical Reports Server (NTRS)

    Ezekiel, Jonathan; Siminiceanu

    2007-01-01

    Symbolic state-space generators are notoriously hard to parallelize. However, the Saturation algorithm implemented in the SMART verification tool differs from other sequential symbolic state-space generators in that it exploits the locality of firing events in asynchronous system models. This paper explores whether event locality can be utilized to efficiently parallelize Saturation on shared-memory architectures. Conceptually, we propose to parallelize the firing of events within a decision diagram node, which is technically realized via a thread pool. We discuss the challenges involved in our parallel design and conduct experimental studies on its prototypical implementation. On a dual-processor, dual-core PC, our studies show speed-ups for several example models, e.g., of up to 50% for a Kanban model, when compared to running our algorithm only on a single core.

  13. Ensuring correct rollback recovery in distributed shared memory systems

    NASA Technical Reports Server (NTRS)

    Janssens, Bob; Fuchs, W. Kent

    1995-01-01

    Distributed shared memory (DSM) implemented on a cluster of workstations is an increasingly attractive platform for executing parallel scientific applications. Checkpointing and rollback techniques can be used in such a system to allow the computation to progress in spite of the temporary failure of one or more processing nodes. This paper presents the design of an independent checkpointing method for DSM that takes advantage of DSM's specific properties to reduce error-free and rollback overhead. The scheme reduces the dependencies that need to be considered for correct rollback to those resulting from transfers of pages. Furthermore, in-transit messages can be recovered without the use of logging. We extend the scheme to a DSM implementation using lazy release consistency, where the frequency of dependencies is further reduced.

  14. Reducing Interprocessor Dependence in Recoverable Distributed Shared Memory

    NASA Technical Reports Server (NTRS)

    Janssens, Bob; Fuchs, W. Kent

    1994-01-01

    Checkpointing techniques in parallel systems use dependency tracking and/or message logging to ensure that a system rolls back to a consistent state. Traditional dependency tracking in distributed shared memory (DSM) systems is expensive because of high communication frequency. In this paper we show that, if designed correctly, a DSM system only needs to consider dependencies due to the transfer of blocks of data, resulting in reduced dependency tracking overhead and reduced potential for rollback propagation. We develop an ownership timestamp scheme to tolerate the loss of block state information and develop a passive server model of execution where interactions between processors are considered atomic. With our scheme, dependencies are significantly reduced compared to the traditional message-passing model.

  15. Parallel discrete event simulation: A shared memory approach

    NASA Technical Reports Server (NTRS)

    Reed, Daniel A.; Malony, Allen D.; Mccredie, Bradley D.

    1987-01-01

    With traditional event list techniques, evaluating a detailed discrete event simulation model can often require hours or even days of computation time. Parallel simulation mimics the interacting servers and queues of a real system by assigning each simulated entity to a processor. By eliminating the event list and maintaining only sufficient synchronization to ensure causality, parallel simulation can potentially provide speedups that are linear in the number of processors. A set of shared memory experiments is presented using the Chandy-Misra distributed simulation algorithm to simulate networks of queues. Parameters include queueing network topology and routing probabilities, number of processors, and assignment of network nodes to processors. These experiments show that Chandy-Misra distributed simulation is a questionable alternative to sequential simulation of most queueing network models.

  16. Parallel k-means++ for Multiple Shared-Memory Architectures

    SciTech Connect

    Mackey, Patrick S.; Lewis, Robert R.

    2016-09-22

    In recent years k-means++ has become a popular initialization technique for improved k-means clustering. To date, most of the work done to improve its performance has involved parallelizing algorithms that are only approximations of k-means++. In this paper we present a parallelization of the exact k-means++ algorithm, with a proof of its correctness. We develop implementations for three distinct shared-memory architectures: multicore CPU, high performance GPU, and the massively multithreaded Cray XMT platform. We demonstrate the scalability of the algorithm on each platform. In addition we present a visual approach for showing which platform performed k-means++ the fastest for varying data sizes.
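
    The exact algorithm being parallelized is compact enough to state. Below is a minimal sequential C sketch of k-means++ seeding (D²-weighted sampling); the function and variable names are ours, and the paper's contribution is distributing the distance-update and sampling loops across threads:

      /* Sequential k-means++ seeding: pick the first center uniformly,
         then pick each next center with probability proportional to the
         squared distance to the nearest center chosen so far. */
      #include <stdlib.h>

      static double sqdist(const double *a, const double *b, int d) {
          double s = 0.0;
          for (int i = 0; i < d; i++) { double t = a[i] - b[i]; s += t * t; }
          return s;
      }

      /* points: n x d row-major; centers: k x d row-major, filled on return */
      void kmeanspp_seed(const double *points, int n, int d, int k, double *centers) {
          double *dist = malloc(n * sizeof *dist);
          int first = rand() % n;                       /* first center: uniform */
          for (int j = 0; j < d; j++) centers[j] = points[first * d + j];
          for (int i = 0; i < n; i++) dist[i] = sqdist(&points[i * d], centers, d);

          for (int c = 1; c < k; c++) {
              double total = 0.0;                       /* D^2-weighted sampling */
              for (int i = 0; i < n; i++) total += dist[i];
              double r = ((double)rand() / RAND_MAX) * total, acc = 0.0;
              int next = n - 1;
              for (int i = 0; i < n; i++) { acc += dist[i]; if (acc >= r) { next = i; break; } }
              for (int j = 0; j < d; j++) centers[c * d + j] = points[next * d + j];
              for (int i = 0; i < n; i++) {             /* update nearest distances */
                  double dd = sqdist(&points[i * d], &centers[c * d], d);
                  if (dd < dist[i]) dist[i] = dd;
              }
          }
          free(dist);
      }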

  17. Translation techniques for distributed-shared memory programming models

    SciTech Connect

    Fuller, Douglas James

    2005-01-01

    The high performance computing community has experienced an explosive improvement in distributed-shared memory hardware. Driven by increasing real-world problem complexity, this explosion has ushered in vast numbers of new systems. Each new system presents new challenges to programmers and application developers. Part of the challenge is adapting to new architectures with new performance characteristics. Different vendors release systems with widely varying architectures that perform differently in different situations. Furthermore, since vendors need only provide a single performance number (total MFLOPS, typically for a single benchmark), they initially have a strong incentive to optimize only the API of their choice. Consequently, only a fraction of the available APIs are well optimized on most systems. This complicates porting and maintaining software, and burdens programmers with mastering each new API as it is released. Also, programmers wishing to use a certain machine must choose their API based on the underlying hardware instead of the application. This thesis argues that a flexible, extensible translator for distributed-shared memory APIs can help address some of these issues. For example, a translator might take as input code in one API and output an equivalent program in another. Such a translator could provide instant porting for applications to new systems that do not support the application's library or language natively. While open-source APIs are abundant, they do not perform optimally everywhere. A translator would also allow performance testing using a single base code translated to a number of different APIs. Most significantly, this type of translator frees programmers to select the most appropriate API for a given application based on the application (and developer) itself instead of the underlying hardware.

  18. Low latency memory access and synchronization

    DOEpatents

    Blumrich, Matthias A.; Chen, Dong; Coteus, Paul W.; Gara, Alan G.; Giampapa, Mark E.; Heidelberger, Philip; Hoenicke, Dirk; Ohmacht, Martin; Steinmacher-Burow, Burkhard D.; Takken, Todd E.; Vranas, Pavlos M.

    2010-10-19

    A low latency memory system access is provided in association with a weakly-ordered multiprocessor system. Each processor in the multiprocessor shares resources, and each shared resource has an associated lock within a locking device that provides support for synchronization between the multiple processors in the multiprocessor and the orderly sharing of the resources. A processor only has permission to access a resource when it owns the lock associated with that resource, and an attempt by a processor to own a lock requires only a single load operation, rather than a traditional atomic load followed by store, such that the processor only performs a read operation and the hardware locking device performs a subsequent write operation rather than the processor. A simple prefetching for non-contiguous data structures is also disclosed. A memory line is redefined so that in addition to the normal physical memory data, every line includes a pointer that is large enough to point to any other line in the memory, wherein the pointers are used to determine which memory line to prefetch rather than some other predictive algorithm. This enables hardware to effectively prefetch memory access patterns that are non-contiguous, but repetitive.

  19. Low latency memory access and synchronization

    DOEpatents

    Blumrich, Matthias A.; Chen, Dong; Coteus, Paul W.; Gara, Alan G.; Giampapa, Mark E.; Heidelberger, Philip; Hoenicke, Dirk; Ohmacht, Martin; Steinmacher-Burow, Burkhard D.; Takken, Todd E.; Vranas, Pavlos M.

    2007-02-06

    A low latency memory system access is provided in association with a weakly-ordered multiprocessor system. Each processor in the multiprocessor shares resources, and each shared resource has an associated lock within a locking device that provides support for synchronization between the multiple processors in the multiprocessor and the orderly sharing of the resources. A processor only has permission to access a resource when it owns the lock associated with that resource, and an attempt by a processor to own a lock requires only a single load operation, rather than a traditional atomic load followed by store, such that the processor only performs a read operation and the hardware locking device performs a subsequent write operation rather than the processor. A simple prefetching for non-contiguous data structures is also disclosed. A memory line is redefined so that in addition to the normal physical memory data, every line includes a pointer that is large enough to point to any other line in the memory, wherein the pointers are used to determine which memory line to prefetch rather than some other predictive algorithm. This enables hardware to effectively prefetch memory access patterns that are non-contiguous, but repetitive.
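
    The essential semantics of these patents' single-load lock acquisition can be mimicked in software: one atomic read-modify-write stands in for the load that the hardware locking device services, with the device (not the processor) performing the write. A minimal C11 sketch, with names of our own choosing:

      /* Software analogue of lock acquisition by a single load: the
         atomic exchange models the combined processor-read plus
         device-write that the patents perform in hardware. */
      #include <stdatomic.h>
      #include <stdbool.h>

      typedef struct { atomic_int word; } hw_lock_t;   /* stands in for the lock register */

      static bool lock_try_acquire(hw_lock_t *l) {
          return atomic_exchange(&l->word, 1) == 0;    /* reading 0 means we now own it */
      }

      static void lock_release(hw_lock_t *l) {
          atomic_store(&l->word, 0);
      }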

  1. Large-grain pipelining on hypercube multiprocessors

    SciTech Connect

    King, Chung-Ta; Ni, Lionel M.

    1988-01-01

    A new paradigm, called large-grain pipelining, for developing efficient parallel algorithms on distributed-memory multiprocessors, e.g., hypercube machines, is introduced. Large-grain pipelining attempts to maximize the degree of overlapping and minimize the effect of communication overhead in a multiprocessor system through macro-pipelining between the nodes. Algorithms developed through large-grain pipelining to perform matrix multiplication are presented. To model the pipelined computations, an analytic model is introduced, which takes into account both the underlying architecture and algorithm behavior. Through the analytic model, important design parameters, such as data partition sizes, can be determined. Experiments were conducted on a 64-node NCUBE multiprocessor. The measured results match the analyzed results closely, which establishes the analytic model as an integral part of algorithm design. Comparison with an algorithm which does not use large-grain pipelining also shows that large-grain pipelining is an efficient scheme for achieving greater parallelism. 14 refs., 12 figs.

  2. Back propagation parameter analysis on multiprocessors

    SciTech Connect

    Cerf, G.; Mokry, R.; Weintraub, J.

    1988-09-01

    In order to develop systems of artificial neural networks which can be scaled up to perform practical tasks such as pattern recognition or speech processing, the use of powerful computing tools is essential. Multiprocessors are becoming increasingly popular in the simulation and study of large networks, as the inherent parallelism of many neural architectures and learning algorithms lends itself quite naturally to implementation on concurrent processors. In this study, a multiprocessor system based on the Inmos transputer has been used to examine the stability and convergence rates of the back propagation algorithm as a function of changes in parameters such as activation values, number of hidden units, learning rate, momentum, and initial weight and bias configurations. The Victor V32 is a prototype low-cost message-passing multiprocessor system, designed and implemented by the Victor project in the Microsystems and VLSI group at the IBM T.J. Watson Research Center. A sample topology for the system is 32 nodes in a fixed 4 x 8 mesh. A host processor interfaced to a PC AT and connected to one of the corners of the mesh provides screen and disc I/O. Each of the 32 nodes consists of an INMOS T414 transputer and 4 Megabytes of local memory. Four high-speed (20 Mbits/sec) serial links provide communication among the nodes.

  3. A Comparison of Shared Memory Parallel Programming Models

    SciTech Connect

    Mogill, Jace A; Haglin, David J

    2010-05-24

    The dominant parallel programming models for shared memory computers, Pthreads and OpenMP, are both thread-centric in that they are based on explicit management of tasks and enforce data dependencies and output ordering through task management. By comparison, the Cray XMT programming model is data-centric where the primary concern of the programmer is managing data dependencies, allowing threads to progress in a data flow fashion. The XMT implements this programming model by associating tag bits with each word of memory, affording efficient fine grained data synchronization independent of the number of processors or how tasks are scheduled. When task management is implicit and synchronization is abundant, efficient, and easy to use, programmers have viable alternatives to traditional thread-centric algorithms. In this paper we compare the amount of available parallelism relative to the amount of work in a variety of different algorithms and data structures when synchronization does not need to be rationed, as well as identify opportunities for platform and performance portability of the data-centric programming model on multi-core processors.
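
    The XMT's per-word tag bits make every memory word a synchronization variable; a rough software approximation of one such "full/empty" word, using a mutex and condition variable (our emulation, orders of magnitude heavier than the single-instruction hardware form):

      /* Emulated full/empty word: a write waits for empty and fills the
         word; a read waits for full and empties it. Initialize m and cv
         with PTHREAD_MUTEX_INITIALIZER / PTHREAD_COND_INITIALIZER, full = 0. */
      #include <pthread.h>

      typedef struct {
          pthread_mutex_t m;
          pthread_cond_t  cv;
          int  full;                               /* the emulated tag bit */
          long value;
      } sync_word_t;

      void sw_write(sync_word_t *w, long v) {
          pthread_mutex_lock(&w->m);
          while (w->full) pthread_cond_wait(&w->cv, &w->m);
          w->value = v; w->full = 1;
          pthread_cond_broadcast(&w->cv);
          pthread_mutex_unlock(&w->m);
      }

      long sw_read(sync_word_t *w) {
          pthread_mutex_lock(&w->m);
          while (!w->full) pthread_cond_wait(&w->cv, &w->m);
          long v = w->value; w->full = 0;
          pthread_cond_broadcast(&w->cv);
          pthread_mutex_unlock(&w->m);
          return v;
      }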

  4. Coupled cluster algorithms for networks of shared memory parallel processors

    NASA Astrophysics Data System (ADS)

    Bentz, Jonathan L.; Olson, Ryan M.; Gordon, Mark S.; Schmidt, Michael W.; Kendall, Ricky A.

    2007-05-01

    As the popularity of using SMP systems as the building blocks for high performance supercomputers increases, so too increases the need for applications that can utilize the multiple levels of parallelism available in clusters of SMPs. This paper presents a dual-layer distributed algorithm, using both shared-memory and distributed-memory techniques to parallelize a very important algorithm (often called the "gold standard") used in computational chemistry, the single and double excitation coupled cluster method with perturbative triples, i.e. CCSD(T). The algorithm is presented within the framework of the GAMESS (General Atomic and Molecular Electronic Structure System) program suite [M.W. Schmidt, K.K. Baldridge, J.A. Boatz, S.T. Elbert, M.S. Gordon, J.J. Jensen, S. Koseki, N. Matsunaga, K.A. Nguyen, S. Su, T.L. Windus, M. Dupuis, J.A. Montgomery, General atomic and molecular electronic structure system, J. Comput. Chem. 14 (1993) 1347-1363] and the Distributed Data Interface (DDI) [M.W. Schmidt, G.D. Fletcher, B.M. Bode, M.S. Gordon, The distributed data interface in GAMESS, Comput. Phys. Comm. 128 (2000) 190]; however, the essential features of the algorithm (data distribution, load-balancing and communication overhead) can be applied to more general computational problems. Timing and performance data for our dual-level algorithm are presented on several large-scale clusters of SMPs.

  5. Matrix factorization on a hypercube multiprocessor

    SciTech Connect

    Geist, G.A.; Heath, M.T.

    1985-08-01

    This paper is concerned with parallel algorithms for matrix factorization on distributed-memory, message-passing multiprocessors, with special emphasis on the hypercube. Both Cholesky factorization of symmetric positive definite matrices and LU factorization of nonsymmetric matrices using partial pivoting are considered. The use of the resulting triangular factors to solve systems of linear equations by forward and back substitutions is also considered. Efficiencies of various parallel computational approaches are compared in terms of empirical results obtained on an Intel iPSC hypercube. 19 refs., 6 figs., 2 tabs.

  6. Is sharing specific autobiographical memories a distinct form of self-disclosure?

    PubMed

    Beike, Denise R; Brandon, Nicole R; Cole, Holly E

    2016-04-01

    Theories of autobiographical memory posit a social function, meaning that recollecting and sharing memories of specific discrete events creates and maintains relationship intimacy. Eight studies with 1,271 participants tested whether sharing specific autobiographical memories in conversations increases feelings of closeness among conversation partners, relative to sharing other self-related information. The first 2 studies revealed that conversations in which specific autobiographical memories were shared were also accompanied by feelings of closeness among conversation partners. The next 5 studies experimentally introduced specific autobiographical memories versus general information about the self into conversations between mostly unacquainted pairs of participants. Discussing specific autobiographical memories led to greater closeness among conversation partners than discussing nonself-related topics, but no greater closeness than discussing other, more general self-related information. In the final study unacquainted pairs in whom feelings of closeness had been experimentally induced through shared humor were more likely to discuss specific autobiographical memories than unacquainted control participant pairs. We conclude that sharing specific autobiographical memories may express more than create relationship closeness, and discuss how relationship closeness may afford sharing of specific autobiographical memories by providing common ground, a social display, or a safety signal.

  7. The C.mmp Multiprocessor

    DTIC Science & Technology

    1978-10-27

    minicomputers connected to a large shared memory through a central crosspoint switch. The system was constructed beginning in 1971, and for several years has... [The remainder of this record is table-of-contents residue; the recoverable section titles are: The Processor-Memory Switch; Memory Mapping and the Relocation Unit; Caches; Processor Extensions; The Present C.mmp Configuration; Processors; Memory; Switch and IP Bus; Peripheral Devices; Links to Other...]

  8. Implementation of Parallel Dynamic Simulation on Shared-Memory vs. Distributed-Memory Environments

    SciTech Connect

    Jin, Shuangshuang; Chen, Yousu; Wu, Di; Diao, Ruisheng; Huang, Zhenyu

    2015-12-09

    Power system dynamic simulation computes the system response to a sequence of large disturbances, such as sudden changes in generation or load, or a network short circuit followed by protective branch switching operations. It consists of a large set of differential and algebraic equations, which is computationally intensive and challenging to solve using a single-processor based dynamic simulation solution. High-performance computing (HPC) based parallel computing is a very promising technology to speed up the computation and facilitate the simulation process. This paper presents two different parallel implementations of power grid dynamic simulation using Open Multi-Processing (OpenMP) on a shared-memory platform, and Message Passing Interface (MPI) on distributed-memory clusters, respectively. The differences between the parallel simulation algorithms and architectures of the two HPC technologies are illustrated, and their performances for running parallel dynamic simulation are compared and demonstrated.
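
    The shared-memory variant's core idea is that, within a time step, the differential equations of the individual generators can be evaluated independently before the network solution. A schematic C/OpenMP sketch follows; the classical swing-equation model and all names here are ours, not the paper's code:

      /* Schematic OpenMP parallelization of the per-generator derivative
         evaluation in one integration step. The algebraic network solve
         that follows each step is where the MPI version communicates. */
      typedef struct { double delta, omega, pm, pe, M, D; } gen_t;

      void integrate_step(gen_t *g, int n, double dt) {
          #pragma omp parallel for schedule(static)
          for (int i = 0; i < n; i++) {
              /* swing equation: M * domega/dt = Pm - Pe - D * omega */
              double domega = (g[i].pm - g[i].pe - g[i].D * g[i].omega) / g[i].M;
              g[i].delta += dt * g[i].omega;    /* forward Euler, for illustration */
              g[i].omega += dt * domega;
          }
      }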

  9. Parallelization of the NAS Conjugate Gradient Benchmark Using the Global Arrays Shared Memory Programming Model

    SciTech Connect

    Zhang, Yeliang; Tipparaju, Vinod; Nieplocha, Jarek; Hariri, Salim

    2005-04-08

    The NAS Conjugate Gradient (CG) benchmark is an important scientific kernel used to evaluate machine performance and compare characteristics of different programming models. The Global Arrays (GA) toolkit supports a shared memory programming paradigm, even on distributed memory systems, and offers the programmer control over the distribution and locality that are important for optimizing performance on scalable architectures. In this paper, we describe and compare two different parallelization strategies of the CG benchmark using GA and report performance results on a shared-memory system as well as on a cluster. Performance benefits of using shared memory for irregular/sparse computations have been demonstrated before in the context of the CG benchmark using OpenMP. Similarly, the GA implementation outperforms the standard MPI implementation on a shared-memory system, in our case the SGI Altix. However, with GA these benefits are extended to distributed memory systems and demonstrated on a Linux cluster with Myrinet.
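
    The irregular accesses discussed here originate in CG's sparse matrix-vector product; in compressed sparse row (CSR) form the kernel looks as follows (a generic sketch, not the NPB or GA source):

      /* CSR sparse matrix-vector product, the kernel at the heart of the
         CG benchmark. The gather through col[] is the irregular access
         that shared-memory get/put operations must service. */
      void spmv_csr(int n, const int *rowptr, const int *col,
                    const double *val, const double *x, double *y) {
          #pragma omp parallel for schedule(static)
          for (int i = 0; i < n; i++) {
              double s = 0.0;
              for (int j = rowptr[i]; j < rowptr[i + 1]; j++)
                  s += val[j] * x[col[j]];      /* irregular gather from x */
              y[i] = s;
          }
      }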

  10. Programmable partitioning for high-performance coherence domains in a multiprocessor system

    DOEpatents

    Blumrich, Matthias A [Ridgefield, CT; Salapura, Valentina [Chappaqua, NY

    2011-01-25

    A multiprocessor computing system and a method of logically partitioning a multiprocessor computing system are disclosed. The multiprocessor computing system comprises a multitude of processing units, and a multitude of snoop units. Each of the processing units includes a local cache, and the snoop units are provided for supporting cache coherency in the multiprocessor system. Each of the snoop units is connected to a respective one of the processing units and to all of the other snoop units. The multiprocessor computing system further includes a partitioning system for using the snoop units to partition the multitude of processing units into a plurality of independent, memory-consistent, adjustable-size processing groups. Preferably, when the processor units are partitioned into these processing groups, the partitioning system also configures the snoop units to maintain cache coherency within each of said groups.

  11. Shared Memory Parallelism for 3D Cartesian Discrete Ordinates Solver

    NASA Astrophysics Data System (ADS)

    Moustafa, Salli; Dutka-Malen, Ivan; Plagne, Laurent; Ponçot, Angélique; Ramet, Pierre

    2014-06-01

    This paper describes the design and the performance of DOMINO, a 3D Cartesian SN solver that implements two nested levels of parallelism (multicore+SIMD) on shared memory computation nodes. DOMINO is written in C++, a multi-paradigm programming language that enables the use of powerful and generic parallel programming tools such as Intel TBB and Eigen. These two libraries allow us to combine multi-thread parallelism with vector operations in an efficient and yet portable way. As a result, DOMINO can exploit the full power of modern multi-core processors and is able to tackle very large simulations, which usually require large HPC clusters, using a single computing node. For example, DOMINO solves a 3D full core PWR eigenvalue problem involving 26 energy groups, 288 angular directions (S16), 46 × 10^6 spatial cells, and 1 × 10^12 DoFs within 11 hours on a single 32-core SMP node. This represents a sustained performance of 235 GFlops and 40.74% of the SMP node peak performance for the DOMINO sweep implementation. The very high Flops/Watt ratio of DOMINO makes it a very interesting building block for a future many-nodes nuclear simulation tool.
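
    DOMINO's two nested levels of parallelism can be pictured with portable OpenMP pragmas: threads across spatial cells, vector lanes across angular directions. The sketch below is a schematic stand-in for the TBB+Eigen implementation and omits the wavefront ordering that a real transport sweep must respect:

      /* Two nested parallelism levels of a discrete ordinates sweep:
         multicore across cells, SIMD across angular directions.
         Real sweeps process cells in wavefront order; that dependence
         handling is omitted here. */
      void sweep_update(int ncells, int ndirs,
                        const double *restrict src, double *restrict psi) {
          #pragma omp parallel for schedule(static)   /* multicore level */
          for (int c = 0; c < ncells; c++) {
              #pragma omp simd                        /* SIMD level */
              for (int m = 0; m < ndirs; m++)
                  psi[c * ndirs + m] += src[c * ndirs + m];  /* placeholder update */
          }
      }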

  12. Sharing specific "We" autobiographical memories in close relationships: the role of contact frequency.

    PubMed

    Beike, Denise R; Cole, Holly E; Merrick, Carmen R

    2017-04-10

    Sharing memories in conversations with close others is posited to be part of the social function of autobiographical memory. The present research focused on the sharing of a particular type of memory: Specific memories about one-time co-experienced events, which we termed Specific We memories. Two studies with 595 total participants examined the factors that lead to and/or are influenced by the sharing of Specific We memories. In Study 1, participants reported on their most recent conversation. Specific We memories were reportedly discussed most often in conversations with others who were close and with whom the participant had frequent communication. In Study 2, participants were randomly assigned either to increase or to simply record the frequency of communication with a close other (parent). Increases in the frequency of reported sharing of Specific We memories as well as closeness to the parent resulted. Mediation analyses of both studies revealed causal relationships among reported sharing of Specific We memories and closeness. We discuss the relevance of these results for understanding the social function of autobiographical memory.

  13. Spaceborne autonomous multiprocessor systems

    NASA Technical Reports Server (NTRS)

    Fernquist, Alan

    1990-01-01

    The goal of this task is to provide technology for the specification and integration of advanced processors into the Space Station Freedom data management system environment through computer performance measurement tools, simulators, and an extended testbed facility. The approach focuses on five categories: (1) user requirements--determine the suitability of existing computer technologies and systems for real-time requirements of NASA missions; (2) system performance analysis--characterize the effects of languages, architectures, and commercially available hardware on real-time benchmarks; (3) system architecture--expand NASA's capability to solve problems with integrated numeric and symbolic requirements using advanced multiprocessor architectures; (4) parallel Ada technology--extend Ada software technology to utilize parallel architectures more efficiently; and (5) testbed--extend in-house testbed to support system performance and system analysis studies.

  14. Method and apparatus for single-stepping coherence events in a multiprocessor system under software control

    DOEpatents

    Blumrich, Matthias A.; Salapura, Valentina

    2010-11-02

    An apparatus and method are disclosed for single-stepping coherence events in a multiprocessor system under software control in order to monitor the behavior of a memory coherence mechanism. Single-stepping coherence events in a multiprocessor system is made possible by adding one or more step registers. By accessing these step registers, one or more coherence requests are processed by the multiprocessor system. The step registers determine whether the snoop unit proceeds in a normal execution mode or operates in a single-step mode.

  15. DiFX: A Software Correlator for Very Long Baseline Interferometry Using Multiprocessor Computing Environments

    NASA Astrophysics Data System (ADS)

    Deller, A. T.; Tingay, S. J.; Bailes, M.; West, C.

    2007-03-01

    We describe the development of an FX-style correlator for very long baseline interferometry (VLBI), implemented in software and intended to run in multiprocessor computing environments, such as large clusters of commodity machines (Beowulf clusters) or computers specifically designed for high-performance computing, such as multiprocessor shared-memory machines. We outline the scientific and practical benefits for VLBI correlation, these chiefly being due to the inherent flexibility of software and the fact that the highly parallel and scalable nature of the correlation task is well suited to a multiprocessor computing environment. We suggest scientific applications where such an approach to VLBI correlation is most suited and will give the best returns. We report detailed results from the Distributed FX (DiFX) software correlator running on the Swinburne supercomputer (a Beowulf cluster of ~300 commodity processors), including measures of the performance of the system. For example, to correlate all Stokes products for a 10 antenna array with an aggregate bandwidth of 64 MHz per station, and using typical time and frequency resolution, currently requires on the order of 100 desktop-class compute nodes. Due to the effect of Moore's law on commodity computing performance, the total number and cost of compute nodes required to meet a given correlation task continues to decrease rapidly with time. We show detailed comparisons between DiFX and two existing hardware-based correlators: the Australian Long Baseline Array S2 correlator and the NRAO Very Long Baseline Array correlator. In both cases, excellent agreement was found between the correlators. Finally, we describe plans for the future operation of DiFX on the Swinburne supercomputer for both astrophysical and geodetic science.
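
    The "X" step that gives an FX correlator its name is a per-channel cross-multiply-accumulate of station spectra after the "F" (Fourier transform) step. A minimal C99 sketch of that inner operation (our own, not DiFX code):

      /* X step of an FX correlator: form visibilities by multiplying one
         station's spectrum by the conjugate of the other's and
         accumulating over successive data blocks. */
      #include <complex.h>

      void xmac(int nchan, const double complex *s1,
                const double complex *s2, double complex *vis) {
          for (int c = 0; c < nchan; c++)
              vis[c] += s1[c] * conj(s2[c]);   /* accumulate cross-power spectrum */
      }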

  16. Principles for problem aggregation and assignment in medium scale multiprocessors

    NASA Technical Reports Server (NTRS)

    Nicol, David M.; Saltz, Joel H.

    1987-01-01

    One of the most important issues in parallel processing is the mapping of workload to processors. This paper considers a large class of problems having a high degree of potential fine grained parallelism, and execution requirements that are either not predictable, or are too costly to predict. The main issues in mapping such a problem onto medium scale multiprocessors are those of aggregation and assignment. We study a method of parameterized aggregation that makes few assumptions about the workload. The mapping of aggregate units of work onto processors is uniform, and exploits locality of workload intensity to balance the unknown workload. In general, a finer aggregate granularity leads to a better balance at the price of increased communication/synchronization costs; the aggregation parameters can be adjusted to find a reasonable granularity. The effectiveness of this scheme is demonstrated on three model problems: an adaptive one-dimensional fluid dynamics problem with message passing, a sparse triangular linear system solver on both a shared memory and a message-passing machine, and a two-dimensional time-driven battlefield simulation employing message passing. Using the model problems, the tradeoffs are studied between balanced workload and the communication/synchronization costs. Finally, an analytical model is used to explain why the method balances workload and minimizes the variance in system behavior.

  17. Implementation of a parallel unstructured Euler solver on shared and distributed memory architectures

    NASA Technical Reports Server (NTRS)

    Mavriplis, D. J.; Das, Raja; Saltz, Joel; Vermeland, R. E.

    1992-01-01

    An efficient three dimensional unstructured Euler solver is parallelized on a Cray Y-MP C90 shared memory computer and on an Intel Touchstone Delta distributed memory computer. This paper relates the experiences gained and describes the software tools and hardware used in this study. Performance comparisons between two differing architectures are made.

  18. Embedded Multiprocessor Technology for VHSIC Insertion

    NASA Technical Reports Server (NTRS)

    Hayes, Paul J.

    1990-01-01

    Viewgraphs on embedded multiprocessor technology for VHSIC insertion are presented. The objective was to develop multiprocessor system technology providing user-selectable fault tolerance, increased throughput, and ease of application representation for concurrent operation. The approach was to develop graph management mapping theory for proper performance, model multiprocessor performance, and demonstrate performance in selected hardware systems.

  19. Sequoia: A fault-tolerant tightly coupled multiprocessor for transaction processing

    SciTech Connect

    Bernstein, P.A.

    1988-02-01

    The Sequoia computer is a tightly coupled multiprocessor, and thus attains the performance advantages of this style of architecture. It avoids most of the fault-tolerance disadvantages of tight coupling by using a new fault-tolerance design. The Sequoia architecture is similar to other multimicroprocessor architectures, such as those of Encore and Sequent, in that it gives dozens of microprocessors shared access to a large main memory. It resembles the Stratus architecture in its extensive use of hardware fault-detection techniques. It resembles Stratus and Auragen in its ability to quickly recover all processes after a single point failure, transparently to the user. However, Sequoia is unique in its combination of a large-scale tightly coupled architecture with a hardware approach to fault tolerance. This article gives an overview of how the hardware architecture and operating systems (OS) work together to provide a high degree of fault tolerance with good system performance.

  20. HyperForest: A high performance multi-processor architecture for real-time intelligent systems

    SciTech Connect

    Garcia, P. Jr.; Rebeil, J.P.; Pollard, H.

    1997-04-01

    Intelligent Systems are characterized by the intensive use of computer power. The computer revolution of the last few years is what has made possible the development of the first generation of Intelligent Systems. Software for second generation Intelligent Systems will be more complex and will require more powerful computing engines in order to meet real-time constraints imposed by new robots, sensors, and applications. A multiprocessor architecture was developed that merges the advantages of message-passing and shared-memory structures: expandability and real-time compliance. The HyperForest architecture will provide an expandable real-time computing platform for computationally intensive Intelligent Systems and open the doors for the application of these systems to more complex tasks in environmental restoration and cleanup projects, flexible manufacturing systems, and DOE's own production and disassembly activities.

  1. A shared neural ensemble links distinct contextual memories encoded close in time

    NASA Astrophysics Data System (ADS)

    Cai, Denise J.; Aharoni, Daniel; Shuman, Tristan; Shobe, Justin; Biane, Jeremy; Song, Weilin; Wei, Brandon; Veshkini, Michael; La-Vu, Mimi; Lou, Jerry; Flores, Sergio E.; Kim, Isaac; Sano, Yoshitake; Zhou, Miou; Baumgaertel, Karsten; Lavi, Ayal; Kamata, Masakazu; Tuszynski, Mark; Mayford, Mark; Golshani, Peyman; Silva, Alcino J.

    2016-06-01

    Recent studies suggest that a shared neural ensemble may link distinct memories encoded close in time. According to the memory allocation hypothesis, learning triggers a temporary increase in neuronal excitability that biases the representation of a subsequent memory to the neuronal ensemble encoding the first memory, such that recall of one memory increases the likelihood of recalling the other memory. Here we show in mice that the overlap between the hippocampal CA1 ensembles activated by two distinct contexts acquired within a day is higher than when they are separated by a week. Several findings indicate that this overlap of neuronal ensembles links two contextual memories. First, fear paired with one context is transferred to a neutral context when the two contexts are acquired within a day but not across a week. Second, the first memory strengthens the second memory within a day but not across a week. Older mice, known to have lower CA1 excitability, do not show the overlap between ensembles, the transfer of fear between contexts, or the strengthening of the second memory. Finally, in aged mice, increasing cellular excitability and activating a common ensemble of CA1 neurons during two distinct context exposures rescued the deficit in linking memories. Taken together, these findings demonstrate that contextual memories encoded close in time are linked by directing storage into overlapping ensembles. Alteration of these processes by ageing could affect the temporal structure of memories, thus impairing efficient recall of related information.

  2. Shared visual attention and memory systems in the Drosophila brain.

    PubMed

    van Swinderen, Bruno; McCartney, Amber; Kauffman, Sarah; Flores, Kris; Agrawal, Kunal; Wagner, Jenée; Paulk, Angelique

    2009-06-19

    Selective attention and memory seem to be related in human experience. This appears to be the case as well in simple model organisms such as the fly Drosophila melanogaster. Mutations affecting olfactory and visual memory formation in Drosophila, such as in dunce and rutabaga, also affect short-term visual processes relevant to selective attention. In particular, increased optomotor responsiveness appears to be predictive of visual attention defects in these mutants. To further explore the possible overlap between memory and visual attention systems in the fly brain, we screened a panel of 36 olfactory long term memory (LTM) mutants for visual attention-like defects using an optomotor maze paradigm. Three of these mutants yielded high dunce-like optomotor responsiveness. We characterized these three strains by examining their visual distraction in the maze, their visual learning capabilities, and their brain activity responses to visual novelty. We found that one of these mutants, D0067, was almost completely identical to dunce(1) for all measures, while another, D0264, was more like wild type. Exploiting the fact that the LTM mutants are also Gal4 enhancer traps, we explored the sufficiency for the cells subserved by these elements to rescue dunce attention defects and found overlap at the level of the mushroom bodies. Finally, we demonstrate that control of synaptic function in these Gal4 expressing cells specifically modulates a 20-30 Hz local field potential associated with attention-like effects in the fly brain. Our study uncovers genetic and neuroanatomical systems in the fly brain affecting both visual attention and odor memory phenotypes. A common component to these systems appears to be the mushroom bodies, brain structures which have been traditionally associated with odor learning but which we propose might be also involved in generating oscillatory brain activity required for attention-like processes in the fly brain.

  3. Interference due to shared features between action plans is influenced by working memory span.

    PubMed

    Fournier, Lisa R; Behmer, Lawrence P; Stubblefield, Alexandra M

    2014-12-01

    In this study, we examined the interactions between the action plans that we hold in memory and the actions that we carry out, asking whether the interference due to shared features between action plans is due to selection demands imposed on working memory. Individuals with low and high working memory spans learned arbitrary motor actions in response to two different visual events (A and B), presented in a serial order. They planned a response to the first event (A) and while maintaining this action plan in memory they then executed a speeded response to the second event (B). Afterward, they executed the action plan for the first event (A) maintained in memory. Speeded responses to the second event (B) were delayed when it shared an action feature (feature overlap) with the first event (A), relative to when it did not (no feature overlap). The size of the feature-overlap delay was greater for low-span than for high-span participants. This indicates that interference due to overlapping action plans is greater when fewer working memory resources are available, suggesting that this interference is due to selection demands imposed on working memory. Thus, working memory plays an important role in managing current and upcoming action plans, at least for newly learned tasks. Also, managing multiple action plans is compromised in individuals who have low versus high working memory spans.

  4. Shared mushroom body circuits underlie visual and olfactory memories in Drosophila.

    PubMed

    Vogt, Katrin; Schnaitmann, Christopher; Dylla, Kristina V; Knapek, Stephan; Aso, Yoshinori; Rubin, Gerald M; Tanimoto, Hiromu

    2014-08-19

    In nature, animals form memories associating reward or punishment with stimuli from different sensory modalities, such as smells and colors. It is unclear, however, how distinct sensory memories are processed in the brain. We established appetitive and aversive visual learning assays for Drosophila that are comparable to the widely used olfactory learning assays. These assays share critical features, such as reinforcing stimuli (sugar reward and electric shock punishment), and allow direct comparison of the cellular requirements for visual and olfactory memories. We found that the same subsets of dopamine neurons drive formation of both sensory memories. Furthermore, distinct yet partially overlapping subsets of mushroom body intrinsic neurons are required for visual and olfactory memories. Thus, our results suggest that distinct sensory memories are processed in a common brain center. Such centralization of related brain functions is an economical design that avoids the repetition of similar circuit motifs.

  5. A system for simulating shared memory in heterogeneous distributed-memory networks with specialization for robotics applications

    SciTech Connect

    Jones, J.P.; Bangs, A.L.; Butler, P.L.

    1991-01-01

    Hetero Helix is a programming environment which simulates shared memory on a heterogeneous network of distributed-memory computers. The machines in the network may vary with respect to their native operating systems and internal representation of numbers. Hetero Helix presents a simple programming model to developers, and also considers the needs of designers, system integrators, and maintainers. The key software technology underlying Hetero Helix is the use of a "compiler" which analyzes the data structures in shared memory and automatically generates code which translates data representations from the format native to each machine into a common format, and vice versa. The design of Hetero Helix was motivated in particular by the requirements of robotics applications. Hetero Helix has been used successfully in an integration effort involving 27 CPUs in a heterogeneous network and a body of software totaling roughly 100,000 lines of code. 25 refs., 6 figs.

  6. A Preliminary Evaluation of Cache-Miss-Initiated Prefetching Techniques in Scalable Multiprocessors

    DTIC Science & Technology

    1994-05-01

    additional tradeoff between lower miss rates and the potential for network and memory contention. In this paper we use execution-driven simulation of... prefetching is the only strategy that can offer significant performance improvements for scalable multiprocessors. The remainder of this paper is... with that node. Throughout this paper we refer to the ensemble of addressable local memory and directory memory at each node as a "memory module"...

  7. Automatic Generation of Directive-Based Parallel Programs for Shared Memory Parallel Systems

    NASA Technical Reports Server (NTRS)

    Jin, Hao-Qiang; Yan, Jerry; Frumkin, Michael

    2000-01-01

    The shared-memory programming model is a very effective way to achieve parallelism on shared memory parallel computers. As hardware and software technologies have advanced, the performance of parallel programs written with compiler directives has improved substantially. The introduction of OpenMP directives, the industry standard for shared-memory programming, has minimized the issue of portability. Due to its ease of programming and its good performance, the technique has become very popular. In this study, we have extended CAPTools, a computer-aided parallelization toolkit, to automatically generate directive-based, OpenMP, parallel programs. We outline techniques used in the implementation of the tool and present test results on the NAS parallel benchmarks and ARC3D, a CFD application. This work demonstrates the great potential of using computer-aided tools to quickly port parallel programs and also achieve good performance.
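
    The tool's output is ordinary directive-based code: for a loop whose iterations it proves independent, it emits an OpenMP work-sharing directive and leaves the loop body untouched. An illustrative example of that style (not actual CAPTools output):

      /* Directive-based parallelization of a dependence-free loop: the
         only change from the serial code is the inserted pragma. */
      void scale_add(int n, double a, const double *x, double *y) {
          #pragma omp parallel for default(none) shared(n, a, x, y)
          for (int i = 0; i < n; i++)
              y[i] += a * x[i];
      }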

  8. A Byzantine resilient processor with an encoded fault-tolerant shared memory

    NASA Technical Reports Server (NTRS)

    Butler, Bryan; Harper, Richard

    1990-01-01

    The memory requirements for ultra-reliable computers are expected to increase due to future increases in mission functionality and operating-system requirements. This increase will have a negative effect on the reliability and cost of the system. Increased memory size will also reduce the ability to reintegrate a channel after a transient fault, since the time required to reintegrate a channel in a conventional fault-tolerant processor is dominated by memory realignment time. A Byzantine Resilient Fault-Tolerant Processor with Fault-Tolerant Shared Memory (FTP/FTSM) is presented as a solution to these problems. The FTSM uses an encoded memory system, which reduces the memory requirement by one-half compared to a conventional quad-FTP design. This increases the reliability and decreases the cost of the system. The realignment problem is also addressed by the FTSM. Because any single error is corrected upon a read from the FTSM, a faulty channel's corrupted memory does not need realignment before reintegration of the faulty channel. A combination of correct-on-access and background scrubbing is proposed to prevent the accumulation of transient errors in the memory. With a hardware-implemented scrubber, the scrubbing cycle time, and therefore the memory fault latency, can be upper-bounded at a small value. This technique increases the reliability of the memory system and facilitates validation of its reliability model.

  9. Socially shared mourning: construction and consumption of collective memory

    NASA Astrophysics Data System (ADS)

    Harju, Anu

    2015-04-01

    Social media, such as YouTube, is increasingly a site of collective remembering where personal tributes to celebrity figures become sites of public mourning. YouTube, especially, is rife with celebrity commemorations. Examining fans' online mourning practices on YouTube, this paper examines video tributes dedicated to the late Steve Jobs, with a focus on collective remembering and collective construction of memory. Combining netnography with critical discourse analysis, the analysis focuses on the user comments where the past unfolds in interaction and meanings are negotiated and contested. The paper argues that celebrity death may, for avid fans, be a source of disenfranchised grief, a type of grief characterised by inadequate social support, usually arising from lack of empathy for the loss. The paper sheds light on the functions digital memorials have for mourning fans (and fandom) and argues that social media sites have come to function as spaces of negotiation, legitimisation and alleviation of disenfranchised grief. It is also suggested that when it comes to disenfranchised grief, and grief work generally, the concept of community be widened to include communities of weak ties, a typical form of communal belonging on social media.

  10. High Performance Programming Using Explicit Shared Memory Model on Cray T3D

    NASA Technical Reports Server (NTRS)

    Simon, Horst D.; Saini, Subhash; Grassi, Charles

    1994-01-01

    The Cray T3D system is the first-phase system in Cray Research, Inc.'s (CRI) three-phase massively parallel processing (MPP) program. This system features a heterogeneous architecture that closely couples DEC's Alpha microprocessors and CRI's parallel-vector technology, i.e., the Cray Y-MP and Cray C90. An overview of the Cray T3D hardware and available programming models is presented. Under the Cray Research adaptive Fortran (CRAFT) model, four programming methods (data parallel, work sharing, message-passing using PVM, and explicit shared memory model) are available to users. However, at this time data parallel and work sharing programming models are not available to the user community. The differences between standard PVM and CRI's PVM are highlighted with performance measurements such as latencies and communication bandwidths. We have found that the performance of neither standard PVM nor CRI's PVM exploits the hardware capabilities of the T3D. The reasons for the poor performance of PVM as a native message-passing library are presented. This is illustrated by the performance of the NAS Parallel Benchmarks (NPB) programmed in the explicit shared memory model on the Cray T3D. In general, the performance of standard PVM is about 4 to 5 times lower than that obtained using the explicit shared memory model. This degradation in performance is also seen on the CM-5, where the performance of applications using the native message-passing library CMMD is about 4 to 5 times lower than with data parallel methods. The issues involved in programming in the explicit shared memory model (such as barriers, synchronization, invalidating the data cache, aligning the data cache, etc.) are discussed. Comparative performance of the NPB using the explicit shared memory programming model on the Cray T3D and other highly parallel systems such as the TMC CM-5, Intel Paragon, Cray C90, IBM-SP1, etc. is presented.

  11. A new shared-memory programming paradigm for molecular dynamics simulations on the Intel Paragon

    SciTech Connect

    D'Azevedo, E.F.; Romine, C.H.

    1994-12-01

    This report describes the use of shared memory emulation with DOLIB (Distributed Object Library) to simplify parallel programming on the Intel Paragon. A molecular dynamics application is used as an example to illustrate the use of the DOLIB shared memory library. SOTON-PAR, a parallel molecular dynamics code with explicit message-passing using a Lennard-Jones 6-12 potential, is rewritten using DOLIB primitives. The resulting code has no explicit message primitives and resembles a serial code. The new code can perform dynamic load balancing and achieves better performance than the original parallel code with explicit message-passing.
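
    For reference, the Lennard-Jones 6-12 pair potential evaluated by the benchmark is V(r) = 4*eps*((sigma/r)^12 - (sigma/r)^6); a direct C rendering:

      /* Lennard-Jones 6-12 pair potential; eps is the well depth and
         sigma the zero-crossing distance. */
      double lj_potential(double r, double eps, double sigma) {
          double sr2 = (sigma / r) * (sigma / r);
          double sr6 = sr2 * sr2 * sr2;        /* (sigma/r)^6 */
          return 4.0 * eps * (sr6 * sr6 - sr6);
      }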

  12. Forgetting our personal past: socially shared retrieval-induced forgetting of autobiographical memories.

    PubMed

    Stone, Charles B; Barnier, Amanda J; Sutton, John; Hirst, William

    2013-11-01

    People often talk to others about their personal past. These discussions are inherently selective. Selective retrieval of memories in the course of a conversation may induce forgetting of unmentioned but related memories for both speakers and listeners (Cuc, Koppel, & Hirst, 2007). Cuc et al. (2007) defined the forgetting on the part of the speaker as within-individual retrieval-induced forgetting (WI-RIF) and the forgetting on the part of the listener as socially shared retrieval-induced forgetting (SS-RIF). However, if the forgetting associated with WI-RIF and SS-RIF is to be taken seriously as a mechanism that shapes both individual and shared memories, this mechanism must be demonstrated with meaningful material and in ecologically valid groups. In our first 2 experiments we extended SS-RIF from unemotional, experimenter-contrived material to the emotional and unemotional autobiographical memories of strangers (Experiment 1) and intimate couples (Experiment 2) when merely overhearing the speaker selectively practice memories. We then extended these results to the context of a free-flowing conversation (Experiments 3 and 4). In all 4 experiments we found WI-RIF and SS-RIF regardless of the emotional valence or individual ownership of the memories. We discuss our findings in terms of the role of conversational silence in shaping both our personal and shared pasts.

  13. A shared neural ensemble links distinct contextual memories encoded close in time

    PubMed Central

    Cai, Denise J.; Aharoni, Daniel; Shuman, Tristan; Shobe, Justin; Biane, Jeremy; Song, Weilin; Wei, Brandon; Veshkini, Michael; La-Vu, Mimi; Lou, Jerry; Flores, Sergio; Kim, Isaac; Sano, Yoshitake; Zhou, Miou; Baumgaertel, Karsten; Lavi, Ayal; Kamata, Masakazu; Tuszynski, Mark; Mayford, Mark; Golshani, Peyman; Silva, Alcino J.

    2016-01-01

    Recent studies suggest the hypothesis that a shared neural ensemble may link distinct memories encoded close in time. According to the memory allocation hypothesis, learning triggers a temporary increase in neuronal excitability that biases the representation of a subsequent memory to the neuronal ensemble encoding the first memory, such that recall of one memory increases the likelihood of recalling the other memory. Accordingly, we report that the overlap between the hippocampal CA1 ensembles activated by two distinct contexts acquired within a day is higher than when they are separated by a week. Multiple convergent findings indicate that this overlap of neuronal ensembles links two contextual memories. First, fear paired with one context is transferred to a neutral context when the two are acquired within a day but not across a week. Second, the first memory strengthens the second memory within a day but not across a week. Older mice, known to have lower CA1 excitability, do not show the overlap between ensembles, the transfer of fear between contexts, or the strengthening of the second memory. Finally, in aged animals, increasing cellular excitability and activating a common ensemble of CA1 neurons during two distinct context exposures rescued the deficit in linking memories. Taken together, these findings demonstrate that contextual memories encoded close in time are linked by directing storage into overlapping ensembles. Alteration of these processes by aging could affect the temporal structure of memories, thus impairing efficient recall of related information. PMID:27251287

  16. Using memory in the Cedar system

    SciTech Connect

    McGrath, R.E.; Emrath, P.

    1987-01-01

    The design of the virtual memory system for the Cedar multiprocessor under construction at the University of Illinois is discussed. The Cedar architecture features a hierarchy of memory, some shared by all processors, and some shared by subsets of processors. The Xylem operating system is based on Alliant Computer Systems CONCENTRIX(TM) operating system, which is based on 4.2BSD UNIX(TM). Xylem supports multi-tasking and demand paging of parts of the memory hierarchy into a linear virtual address space. Memory may be private to a task or shared between all the tasks. The locality and attributes of a page may be modified during the execution of a program. Examples of how these mechanisms can be used are discussed. 14 figs.
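
    For illustration, the private-versus-shared distinction Xylem draws can be sketched with POSIX mappings (a minimal sketch using standard UNIX calls, not the Xylem interface itself):

      #include <sys/mman.h>
      #include <sys/wait.h>
      #include <unistd.h>
      #include <cstdio>

      int main() {
          // A shared anonymous mapping is visible to every task that inherits
          // it (cf. Xylem memory shared between all the tasks of a program).
          int *shared = static_cast<int *>(mmap(nullptr, sizeof(int),
              PROT_READ | PROT_WRITE, MAP_SHARED | MAP_ANONYMOUS, -1, 0));
          // A private mapping is copy-on-write: each task gets its own page.
          int *priv = static_cast<int *>(mmap(nullptr, sizeof(int),
              PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0));
          *shared = 0;
          *priv = 0;
          if (fork() == 0) {   // child task
              *shared = 1;     // seen by the parent
              *priv = 1;       // seen only by the child
              _exit(0);
          }
          wait(nullptr);
          std::printf("shared=%d private=%d\n", *shared, *priv);  // shared=1 private=0
          return 0;
      }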

  17. A Multiprocessor Operating System Simulator

    NASA Technical Reports Server (NTRS)

    Johnston, Gary M.; Campbell, Roy H.

    1988-01-01

    This paper describes a multiprocessor operating system simulator that was developed by the authors in the Fall semester of 1987. The simulator was built in response to the need to provide students with an environment in which to build and test operating system concepts as part of the coursework of a third-year undergraduate operating systems course. Written in C++, the simulator uses the co-routine style task package that is distributed with the AT&T C++ Translator to provide a hierarchy of classes that represents a broad range of operating system software and hardware components. The class hierarchy closely follows that of the 'Choices' family of operating systems for loosely- and tightly-coupled multiprocessors. During an operating system course, these classes are refined and specialized by students in homework assignments to facilitate experimentation with different aspects of operating system design and policy decisions. The current implementation runs on the IBM RT PC under 4.3bsd UNIX.

  18. A multiprocessor operating system simulator

    SciTech Connect

    Johnston, G.M.; Campbell, R.H. (Dept. of Computer Science)

    1988-01-01

    This paper describes a multiprocessor operating system simulator that was developed by the authors in the Fall of 1987. The simulator was built in response to the need to provide students with an environment in which to build and test operating system concepts as part of the coursework of a third-year undergraduate operating systems course. Written in C++, the simulator uses the co-routine style task package that is distributed with the AT&T C++ Translator to provide a hierarchy of classes that represents a broad range of operating system software and hardware components. The class hierarchy closely follows that of the Choices family of operating systems for loosely and tightly coupled multiprocessors. During an operating system course, these classes are refined and specialized by students in homework assignments to facilitate experimentation with different aspects of operating system design and policy decisions. The current implementation runs on the IBM RT PC under 4.3bsd UNIX.
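
    As a rough illustration of the co-routine-style task approach (an illustrative C++ sketch, not the actual Choices or AT&T task package classes):

      #include <cstdio>
      #include <memory>
      #include <vector>

      // Cooperative task in the style of a co-routine task package: run() does
      // a bounded amount of work and returns control to the scheduler.
      class Task {
      public:
          virtual ~Task() = default;
          virtual bool run() = 0;   // false when the task has finished
      };

      // A specialization of the kind students might write, e.g. a device handler.
      class DiskHandler : public Task {
          int pending = 3;
      public:
          bool run() override {
              std::printf("servicing disk request, %d left\n", --pending);
              return pending > 0;
          }
      };

      // Round-robin scheduler over the task hierarchy.
      void schedule(std::vector<std::unique_ptr<Task>> &tasks) {
          while (!tasks.empty())
              for (auto it = tasks.begin(); it != tasks.end();)
                  it = (*it)->run() ? it + 1 : tasks.erase(it);
      }

      int main() {
          std::vector<std::unique_ptr<Task>> tasks;
          tasks.push_back(std::make_unique<DiskHandler>());
          schedule(tasks);
          return 0;
      }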

  19. Reproducibility in a multiprocessor system

    DOEpatents

    Bellofatto, Ralph A; Chen, Dong; Coteus, Paul W; Eisley, Noel A; Gara, Alan; Gooding, Thomas M; Haring, Rudolf A; Heidelberger, Philip; Kopcsay, Gerard V; Liebsch, Thomas A; Ohmacht, Martin; Reed, Don D; Senger, Robert M; Steinmacher-Burow, Burkhard; Sugawara, Yutaka

    2013-11-26

    Fixing a problem is usually greatly aided if the problem is reproducible. To ensure reproducibility of a multiprocessor system, the following aspects are proposed: a deterministic system start state, a single system clock, phase alignment of clocks in the system, system-wide synchronization events, reproducible execution of system components, deterministic chip interfaces, zero-impact communication with the system, precise stop of the system, and a scan of the system state.

  20. Data traffic reduction schemes for Cholesky factorization on asynchronous multiprocessor systems

    NASA Technical Reports Server (NTRS)

    Naik, Vijay K.; Patrick, Merrell L.

    1989-01-01

    Communication requirements of Cholesky factorization of dense and sparse symmetric, positive definite matrices are analyzed. The communication requirement is characterized by the data traffic generated on multiprocessor systems with local and shared memory. Lower bound proofs are given to show that when the load is uniformly distributed, the data traffic associated with factoring an n x n dense matrix using n^alpha (alpha <= 2) processors is Omega(n^(2 + alpha/2)). For n x n sparse matrices representing a sqrt(n) x sqrt(n) regular grid graph, the data traffic is shown to be Omega(n^(1 + alpha/2)), alpha <= 1. Partitioning schemes that are variations of the block assignment scheme are described, and it is shown that the data traffic generated by these schemes is asymptotically optimal. The schemes allow efficient use of up to O(n^2) processors in the dense case and up to O(n) processors in the sparse case before the total data traffic reaches the maximum values of O(n^3) and O(n^(3/2)), respectively. It is shown that the block-based partitioning schemes allow better utilization of the data accessed from shared memory, and thus generate less data traffic, than schemes based on column-wise wrap-around assignment.
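
    In standard notation, with p = n^alpha processors, the bounds above read:

      \[
        T_{\mathrm{dense}}(n) = \Omega\!\left(n^{\,2+\alpha/2}\right),\ \alpha \le 2;
        \qquad
        T_{\mathrm{sparse}}(n) = \Omega\!\left(n^{\,1+\alpha/2}\right),\ \alpha \le 1.
      \]

    At the largest admissible alpha (alpha = 2 dense, alpha = 1 sparse), these lower bounds meet the stated traffic maxima of O(n^3) and O(n^(3/2)).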

  1. Shared Representations in Language Processing and Verbal Short-Term Memory: The Case of Grammatical Gender

    ERIC Educational Resources Information Center

    Schweppe, Judith; Rummer, Ralf

    2007-01-01

    The general idea of language-based accounts of short-term memory is that retention of linguistic materials is based on representations within the language processing system. In the present sentence recall study, we address the question whether the assumption of shared representations holds for morphosyntactic information (here: grammatical gender…

  2. LU Factorization with Partial Pivoting for a Multi-CPU, Multi-GPU Shared Memory System

    SciTech Connect

    Kurzak, Jakub; Luszczek, Piotr; Faverge, Mathieu; Dongarra, Jack

    2012-03-01

    LU factorization with partial pivoting is a canonical numerical procedure and the main component of the High Performance LINPACK benchmark. This article presents an implementation of the algorithm for a hybrid, shared memory, system with standard CPU cores and GPU accelerators. Performance in excess of one TeraFLOPS is achieved using four AMD Magny Cours CPUs and four NVIDIA Fermi GPUs.
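
    For reference, the unblocked sequential form of the algorithm is sketched below (the hybrid CPU/GPU implementation applies the same elimination pattern to blocked panels; this sketch is not the authors' code):

      #include <cmath>
      #include <utility>
      #include <vector>

      // Unblocked, sequential LU with partial pivoting: A is overwritten by L
      // and U (unit lower triangle implicit); piv records the row interchanges.
      void lu_partial_pivot(std::vector<std::vector<double>> &A, std::vector<int> &piv) {
          const int n = static_cast<int>(A.size());
          piv.assign(n, 0);
          for (int k = 0; k < n; ++k) {
              int p = k;                              // select the pivot row:
              for (int i = k + 1; i < n; ++i)         // largest |A[i][k]|, i >= k
                  if (std::fabs(A[i][k]) > std::fabs(A[p][k])) p = i;
              std::swap(A[k], A[p]);
              piv[k] = p;
              for (int i = k + 1; i < n; ++i) {       // eliminate below the pivot
                  A[i][k] /= A[k][k];
                  for (int j = k + 1; j < n; ++j)
                      A[i][j] -= A[i][k] * A[k][j];
              }
          }
      }

      int main() {
          std::vector<std::vector<double>> A{{2, 1}, {4, 3}};
          std::vector<int> piv;
          lu_partial_pivot(A, piv);   // pivots on the 4, then eliminates
          return 0;
      }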

  3. Functions of Memory Sharing and Mother-Child Reminiscing Behaviors: Individual and Cultural Variations

    ERIC Educational Resources Information Center

    Kulkofsky, Sarah; Wang, Qi; Koh, Jessie Bee Kim

    2009-01-01

    This study examined maternal beliefs about the functions of memory sharing and the relations between these beliefs and mother-child reminiscing behaviors in a cross-cultural context. Sixty-three European American and 47 Chinese mothers completed an open-ended questionnaire concerning their beliefs about the functions of parent-child memory…

  4. Visual and Spatial Working Memory Are Not that Dissociated after All: A Time-Based Resource-Sharing Account

    ERIC Educational Resources Information Center

    Vergauwe, Evie; Barrouillet, Pierre; Camos, Valerie

    2009-01-01

    Examinations of interference between visual and spatial materials in working memory have suggested domain- and process-based fractionations of visuo-spatial working memory. The present study examined the role of central time-based resource sharing in visuo-spatial working memory and assessed its role in obtained interference patterns. Visual and…

  5. Parallel calculations on shared memory, NUMA-based computers using MATLAB

    NASA Astrophysics Data System (ADS)

    Krotkiewski, Marcin; Dabrowski, Marcin

    2014-05-01

    Achieving satisfactory computational performance in numerical simulations on modern computer architectures can be a complex task. Multi-core design makes it necessary to parallelize the code. Efficient parallelization on NUMA (Non-Uniform Memory Access) shared memory architectures necessitates explicit placement of the data in the memory close to the CPU that uses it. In addition, using more than 8 CPUs (~100 cores) requires a cluster solution of interconnected nodes, which involves (expensive) communication between the processors. It takes significant effort to overcome these challenges even when programming in low-level languages, which give the programmer full control over data placement and work distribution. Instead, many modelers use high-level tools such as MATLAB, which severely limit the optimization/tuning options available. Nonetheless, the advantage of programming simplicity and a large available code base can tip the scale in favor of MATLAB. We investigate whether MATLAB can be used for efficient, parallel computations on modern shared memory architectures. A common approach to performance optimization of MATLAB programs is to identify a bottleneck and migrate the corresponding code block to a MEX file implemented in, e.g., C. Instead, we aim at achieving scalable parallel performance of MATLAB's core functionality. Some of MATLAB's internal functions (e.g., bsxfun, sort, BLAS3, operations on vectors) are multi-threaded. Achieving high parallel efficiency of those may potentially improve the performance of a significant portion of MATLAB's code base. Since we do not have MATLAB's source code, our performance tuning relies on the tools provided by the operating system alone. Most importantly, we use custom memory allocation routines, thread-to-CPU binding, and memory page migration. The performance tests are carried out on multi-socket shared memory systems (2- and 4-way Intel-based computers), as well as a Distributed Shared Memory machine with 96 CPUs.
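
    The thread-to-CPU binding mentioned above can be sketched with the Linux/glibc affinity interface (operating-system tooling, not MATLAB code):

      #include <pthread.h>
      #include <sched.h>
      #include <cstdio>

      // Pin the calling thread to one CPU so that pages it first touches are
      // allocated on the local NUMA node. Compile with: g++ -pthread
      static bool bind_to_cpu(int cpu) {
          cpu_set_t set;
          CPU_ZERO(&set);
          CPU_SET(cpu, &set);
          return pthread_setaffinity_np(pthread_self(), sizeof(set), &set) == 0;
      }

      int main() {
          if (bind_to_cpu(0))
              std::printf("pinned to CPU 0; first-touch allocation is now local\n");
          return 0;
      }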

  6. Multiprocessor computer overset grid method and apparatus

    DOEpatents

    Barnette, Daniel W.; Ober, Curtis C.

    2003-01-01

    A multiprocessor computer overset grid method and apparatus comprises associating points in each overset grid with processors and using mapped interpolation transformations to communicate intermediate values between processors assigned base and target points of the interpolation transformations. The method allows a multiprocessor computer to operate with effective load balance on overset grid applications.

  7. Vascular system modeling in parallel environment - distributed and shared memory approaches.

    PubMed

    Jurczuk, Krzysztof; Kretowski, Marek; Bezy-Wendling, Johanne

    2011-07-01

    This paper presents two approaches in parallel modeling of vascular system development in internal organs. In the first approach, new parts of tissue are distributed among processors and each processor is responsible for perfusing its assigned parts of tissue to all vascular trees. Communication between processors is accomplished by passing messages, and therefore, this algorithm is perfectly suited for distributed memory architectures. The second approach is designed for shared memory machines. It parallelizes the perfusion process during which individual processing units perform calculations concerning different vascular trees. The experimental results, performed on a computing cluster and multicore machines, show that both algorithms provide a significant speedup.
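
    A minimal sketch of the second, shared-memory approach, assuming a hypothetical per-tree perfuse() routine that stands in for the paper's algorithm:

      #include <vector>

      struct VascularTree { /* vessels, flows, perfused tissue regions, ... */ };

      // Hypothetical per-tree perfusion step (stand-in for the paper's algorithm).
      void perfuse(VascularTree &) { /* tree-specific computation elided */ }

      // Shared-memory variant sketched with OpenMP: during the perfusion
      // process, each thread handles a different vascular tree. Compile with -fopenmp.
      void perfuse_all(std::vector<VascularTree> &trees) {
          #pragma omp parallel for schedule(dynamic)
          for (int t = 0; t < static_cast<int>(trees.size()); ++t)
              perfuse(trees[t]);
      }

      int main() {
          std::vector<VascularTree> trees(4);   // e.g. arterial and venous trees
          perfuse_all(trees);
          return 0;
      }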

  8. Coscheduling Technique for Symmetric Multiprocessor Clusters

    SciTech Connect

    Yoo, A B; Jette, M A

    2000-09-18

    Coscheduling is essential for obtaining good performance in a time-shared symmetric multiprocessor (SMP) cluster environment. However, the most common technique, gang scheduling, has limitations such as poor scalability and vulnerability to faults mainly due to explicit synchronization between its components. A decentralized approach called dynamic coscheduling (DCS) has been shown to be effective for networks of workstations (NOW), but this technique is not suitable for the workloads on a very large SMP-cluster with thousands of processors. Furthermore, its implementation can be prohibitively expensive for such a large-scale machine. In this paper, they propose a novel coscheduling technique based on the DCS approach which can achieve coscheduling on very large SMP-clusters in a scalable, efficient, and cost-effective way. In the proposed technique, each local scheduler achieves coscheduling based upon message traffic between the components of parallel jobs. Message trapping is carried out at the user-level, eliminating the need for unsupported hardware or device-level programming. A sending process attaches its status to outgoing messages so local schedulers on remote nodes can make more intelligent scheduling decisions. Once scheduled, processes are guaranteed some minimum period of time to execute. This provides an opportunity to synchronize the parallel job's components across all nodes and achieve good program performance. The results from a performance study reveal that the proposed technique is a promising approach that can reduce response time significantly over uncoordinated time-sharing and batch scheduling.

  9. Shared mushroom body circuits underlie visual and olfactory memories in Drosophila

    PubMed Central

    Vogt, Katrin; Schnaitmann, Christopher; Dylla, Kristina V; Knapek, Stephan; Aso, Yoshinori; Rubin, Gerald M; Tanimoto, Hiromu

    2014-01-01

    In nature, animals form memories associating reward or punishment with stimuli from different sensory modalities, such as smells and colors. It is unclear, however, how distinct sensory memories are processed in the brain. We established appetitive and aversive visual learning assays for Drosophila that are comparable to the widely used olfactory learning assays. These assays share critical features, such as reinforcing stimuli (sugar reward and electric shock punishment), and allow direct comparison of the cellular requirements for visual and olfactory memories. We found that the same subsets of dopamine neurons drive formation of both sensory memories. Furthermore, distinct yet partially overlapping subsets of mushroom body intrinsic neurons are required for visual and olfactory memories. Thus, our results suggest that distinct sensory memories are processed in a common brain center. Such centralization of related brain functions is an economical design that avoids the repetition of similar circuit motifs. DOI: http://dx.doi.org/10.7554/eLife.02395.001 PMID:25139953

  10. Parallel Reduction of Large Radar Interferometry Scenes on a Mid-scale, Symmetric Multiprocessor Mainframe Computer

    NASA Astrophysics Data System (ADS)

    Harcke, L. J.; Zebker, H. A.

    2006-12-01

    We report on experiences in processing repeat-orbit interferometry data sets on a mid-scale multiprocessor mainframe computer. Newer applications of interferometric and polarimetric data processing, such as permanent scatterer deformation monitoring, require the generation of many tens of repeat-pass interferometry data pairs, perhaps 30 to 50, to provide sufficient input to the deformation model. Moving existing radar processing techniques toward massively parallel computation provides a path to coping with such large data sets, which can consist of 30 to 50 gigabytes (GB) of raw data. In June 2006, the Stanford School of Earth Sciences dedicated a new computation center for general research use. Two large machines compose the center: a single-node, symmetric multiprocessor (SMP) machine with 48 processor cores and a single 192 GB memory, and a 64-node distributed cluster containing 128 processor cores with at least 2 GB of memory per node. Distributed processing of the matched filter for synthetic aperture radar image formation requires a high communication-to-computation ratio. Experiments performed over a decade ago on distributed memory supercomputers, and repeated a half-decade ago on commodity workstation clusters, both demonstrated saturation of inter-node communication links. For this reason, we chose to parallelize the interferometric processor on the shared memory computer using the OpenMP programming standard. We find, not unexpectedly, that the input/output stage of processing standard 100-by-100 kilometer ERS-1 scenes quickly dominates the total computation time, and that only modest increases in processing time are achieved after 8 to 16 processor cores are brought to bear on a single data set. The input and output data sit in single, serially accessed disk files, creating a bottleneck for overall throughput. This points to a scheme for efficient partitioning of mid-size (24- to 48-core) machines for reducing large Earth science data sets, where 3 to

  11. Simulating an aerospace multiprocessor. [for space guidance computers

    NASA Technical Reports Server (NTRS)

    Mallach, E. G.

    1976-01-01

    The paper describes a simulator which was used to evaluate the architecture of an aerospace multiprocessor. The simulator models interactions among the processors, memories, the central data bus, and a possible 'job stack'. Special features of the simulator are discussed, including the use of explicitly coded and individually distinguishable 'job models' instead of a statistically defined 'job mix' and a specialized Job Model Definition Language to automate the detailed coding of the models. Some results are presented which show that when the simulator was employed in conjunction with queuing theory and Markov-process analysis, more insight into system behavior was obtained than would have been with any one technique alone.

  12. Multiprocessor system with multiple concurrent modes of execution

    DOEpatents

    Ahn, Daniel; Ceze, Luis H.; Chen, Dong; Gara, Alan; Heidelberger, Philip; Ohmacht, Martin

    2016-11-22

    A multiprocessor system supports multiple concurrent modes of speculative execution. Speculation identification numbers (IDs) are allocated to speculative threads from a pool of available numbers. The pool is divided into domains, with each domain being assigned to a mode of speculation. Modes of speculation include TM, TLS, and rollback. Allocation of the IDs is carried out with respect to a central state table and using hardware pointers. The IDs are used for writing different versions of speculative results in different ways of a set in a cache memory.
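
    An illustrative software analogue of the domain-partitioned ID pool might look as follows (the patent describes a hardware mechanism with a central state table; the domain size of 42 here is arbitrary):

      #include <array>
      #include <cstdint>
      #include <optional>

      // The ID space is split into domains, one per speculation mode, and IDs
      // are allocated from the domain matching the requested mode.
      enum class Mode { TM = 0, TLS = 1, Rollback = 2 };

      class SpeculationIdAllocator {
          static constexpr int kPerDomain = 42;
          std::array<std::uint64_t, 3> used{};   // one occupancy bitmask per domain
      public:
          std::optional<int> allocate(Mode m) {
              std::uint64_t &mask = used[static_cast<int>(m)];
              for (int i = 0; i < kPerDomain; ++i)
                  if (!(mask & (1ULL << i))) {
                      mask |= 1ULL << i;         // claim the slot
                      return static_cast<int>(m) * kPerDomain + i;
                  }
              return std::nullopt;               // this domain is exhausted
          }
          void release(int id) {
              used[id / kPerDomain] &= ~(1ULL << (id % kPerDomain));
          }
      };

      int main() {
          SpeculationIdAllocator alloc;
          auto id = alloc.allocate(Mode::TLS);   // first ID in the TLS domain
          if (id) alloc.release(*id);
          return 0;
      }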

  13. Multiprocessor system with multiple concurrent modes of execution

    DOEpatents

    Ahn, Daniel; Ceze, Luis H; Chen, Dong; Gara, Alan; Heidelberger, Philip; Ohmacht, Martin

    2013-12-31

    A multiprocessor system supports multiple concurrent modes of speculative execution. Speculation identification numbers (IDs) are allocated to speculative threads from a pool of available numbers. The pool is divided into domains, with each domain being assigned to a mode of speculation. Modes of speculation include TM, TLS, and rollback. Allocation of the IDs is carried out with respect to a central state table and using hardware pointers. The IDs are used for writing different versions of speculative results in different ways of a set in a cache memory.

  14. IMPACC: A Tightly Integrated MPI+OpenACC Framework Exploiting Shared Memory Parallelism

    SciTech Connect

    Lee, Seyong; Vetter, Jeffrey S

    2016-01-01

    We propose IMPACC, an MPI+OpenACC framework for heterogeneous accelerator clusters. IMPACC tightly integrates MPI and OpenACC, while exploiting the shared memory parallelism in the target system. IMPACC dynamically adapts the input MPI+OpenACC applications on the target heterogeneous accelerator clusters to fully exploit target system-specific features. IMPACC provides the programmers with the unified virtual address space, automatic NUMA-friendly task-device mapping, efficient integrated communication routines, seamless streamlining of asynchronous executions, and transparent memory sharing. We have implemented IMPACC and evaluated its performance using three heterogeneous accelerator systems, including the Titan supercomputer. Results show that IMPACC can achieve easier programming, higher performance, and better scalability than the current MPI+OpenACC model.

  15. Fault tolerant onboard packet switch architecture for communication satellites: Shared memory per beam approach

    NASA Technical Reports Server (NTRS)

    Shalkhauser, Mary Jo; Quintana, Jorge A.; Soni, Nitin J.

    1994-01-01

    The NASA Lewis Research Center is developing a multichannel communication signal processing satellite (MCSPS) system which will provide low data rate, direct to user, commercial communications services. The focus of current space segment developments is a flexible, high-throughput, fault tolerant onboard information switching processor. This information switching processor (ISP) is a destination-directed packet switch which performs both space and time switching to route user information among numerous user ground terminals. Through both industry study contracts and in-house investigations, several packet switching architectures were examined. A contention-free approach, the shared memory per beam architecture, was selected for implementation. The shared memory per beam architecture, fault tolerance insertion, implementation, and demonstration plans are described.

  16. Automatic Data Partitioning on Distributed Memory Multiprocessors

    DTIC Science & Technology

    1990-10-01

    Selecting a suitable data partitioning scheme for a program is a tedious problem that most current projects leave almost entirely to the user. In this paper, we present a novel approach to the problem of automatic data partitioning: we introduce the notion of constraints on data partitioning, and we propose a strategy that instead allows a parallelizing compiler to come up with a suitable data distribution pattern.

  17. Multiprocessor Performance Debugging and Memory Bottlenecks

    DTIC Science & Technology

    1992-05-01

  18. Immune and nervous systems share molecular and functional similarities: memory storage mechanism.

    PubMed

    Habibi, L; Ebtekar, M; Jameie, S B

    2009-04-01

    One of the most complex and important features of both the nervous and immune systems is their data storage and retrieval capability. Both systems encounter a common and complex challenge on how to overcome the cumbersome task of data management. Because each neuron makes many synapses with other neurons, they are capable of receiving data from thousands of synaptic connections. The immune system B and T cells have to deal with a similar level of complexity because of their unlimited task of recognizing foreign antigens. As for the complexity of memory storage, it has been proposed that both systems may share a common set of molecular mechanisms. Here, we review the molecular bases of memory storage in neurons and immune cells based on recent studies and findings. The expression of certain molecules and mechanisms shared between the two systems, including cytokine networks and cell surface receptors, is reviewed. Intracellular signaling similarities and certain mechanisms such as diversity, memory storage, and their related molecular properties are briefly discussed. Moreover, two similar genetic mechanisms used by both systems are discussed, putting forward the idea that DNA recombination may be an underlying mechanism involved in CNS memory storage.

  19. Exploring the use of I/O nodes for computation in a MIMD multiprocessor

    NASA Technical Reports Server (NTRS)

    Kotz, David; Cai, Ting

    1995-01-01

    As parallel systems move into the production scientific-computing world, the emphasis will be on cost-effective solutions that provide high throughput for a mix of applications. Cost effective solutions demand that a system make effective use of all of its resources. Many MIMD multiprocessors today, however, distinguish between 'compute' and 'I/O' nodes, the latter having attached disks and being dedicated to running the file-system server. This static division of responsibilities simplifies system management but does not necessarily lead to the best performance in workloads that need a different balance of computation and I/O. Of course, computational processes sharing a node with a file-system service may receive less CPU time, network bandwidth, and memory bandwidth than they would on a computation-only node. In this paper we begin to examine this issue experimentally. We found that high performance I/O does not necessarily require substantial CPU time, leaving plenty of time for application computation. There were some complex file-system requests, however, which left little CPU time available to the application. (The impact on network and memory bandwidth still needs to be determined.) For applications (or users) that cannot tolerate an occasional interruption, we recommend that they continue to use only compute nodes. For tolerant applications needing more cycles than those provided by the compute nodes, we recommend that they take full advantage of both compute and I/O nodes for computation, and that operating systems should make this possible.

  1. Parallel optical interconnects may reduce the communication bottleneck in symmetric multiprocessors.

    PubMed

    Collet, J H; Hlayhel, W; Litaize, D

    2001-07-10

    We start with a detailed analysis of the communication issues in today's symmetric multiprocessor (SMP) architectures to study the benefits of implementing optical interconnects (OI) in these machines. We show that the transmission of block addresses is the most critical communication bottleneck of future large SMPs owing to the need to preserve the coherence of data duplicated in caches. An address transmission bandwidth as high as 200-300 Gb/s may be necessary in ten years from now; this requirement will represent a difficult challenge for shared electric buses. In this context we suggest the introduction of simple point-to-point OIs for a SMP cache-coherent switch, i.e., for a VLSI switch that would emulate the shared-bus function. The operation might require as much as 10,000 input-outputs (IOs) to connect 100 processors, particularly if one maintains the present parallelism of transmissions to preserve a large bandwidth and a short memory access latency. The interest for OIs comes from the potential increase of the transmission frequency and from the possible integration of such a high density of IOs on top of electronic chips to overcome packaging issues. Then we consider the implementation of an optical bus that is a multipoint optical line involving more optical technology. This solution allows multiple simultaneous accesses to the bus, but the preservation of the coherence of caches can no longer be maintained with the usual fast snooping protocols.

  2. Global arrays: A portable "shared-memory" programming model for distributed memory computers

    SciTech Connect

    Harrison, R.J.; Nieplocha, J.; Littlefield, R.J.

    1994-11-01

    Portability, efficiency, and ease of coding are all important considerations in choosing the programming model for a scalable parallel application. The message-passing programming model is widely used because of its portability, yet some applications are too complex to code in it while also trying to maintain a balanced computation load and avoid redundant computations. The shared-memory programming model simplifies coding, but it is not portable and often provides little control over interprocessor data transfer costs. This paper describes a new approach, called Global Arrays (GA), that combines the better features of both other models, leading to both simple coding and efficient execution. The key concept of GA is that it provides a portable interface through which each process in a MIMD parallel program can asynchronously access logical blocks of physically distributed matrices, with no need for explicit cooperation by other processes. The authors have implemented GA libraries on a variety of computer systems, including the Intel DELTA and Paragon, the IBM SP-1 (all message-passers), the Kendall Square KSR-2 (a nonuniform access shared-memory machine), and networks of Unix workstations. They discuss the design and implementation of these libraries, report their performance, illustrate the use of GA in the context of computational chemistry applications, and describe the use of a GA performance visualization tool.
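
    The get/put style of access GA provides can be sketched with standard MPI one-sided operations (an analogy to illustrate the programming model, not the GA API itself; build with mpicxx and run under mpirun):

      #include <mpi.h>
      #include <vector>

      // Each process exposes its block of a logically global array; any
      // process can then read a remote block without explicit cooperation
      // from the owner.
      int main(int argc, char **argv) {
          MPI_Init(&argc, &argv);
          int rank, nprocs;
          MPI_Comm_rank(MPI_COMM_WORLD, &rank);
          MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

          const int block = 1000;                  // elements owned per process
          std::vector<double> mine(block, rank);   // this process's part
          MPI_Win win;
          MPI_Win_create(mine.data(), block * sizeof(double), sizeof(double),
                         MPI_INFO_NULL, MPI_COMM_WORLD, &win);

          std::vector<double> remote(block);       // fetch the neighbour's block
          int target = (rank + 1) % nprocs;
          MPI_Win_fence(0, win);
          MPI_Get(remote.data(), block, MPI_DOUBLE, target, 0, block, MPI_DOUBLE, win);
          MPI_Win_fence(0, win);                   // remote[] is valid after the fence

          MPI_Win_free(&win);
          MPI_Finalize();
          return 0;
      }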

  3. ATAMM enhancement and multiprocessor performance evaluation

    NASA Technical Reports Server (NTRS)

    Stoughton, John W.; Mielke, Roland R.; Som, Sukhamoy; Obando, Rodrigo; Malekpour, Mahyar R.; Jones, Robert L., III; Mandala, Brij Mohan V.

    1991-01-01

    ATAMM (Algorithm To Architecture Mapping Model) enhancement and multiprocessor performance evaluation is discussed. The following topics are included: the ATAMM model; ATAMM enhancement; ADM (Advanced Development Model) implementation of ATAMM; and ATAMM support tools.

  4. Multiprocessor performance modeling with ADAS

    NASA Technical Reports Server (NTRS)

    Hayes, Paul J.; Andrews, Asa M.

    1989-01-01

    A graph managing strategy referred to as the Algorithm to Architecture Mapping Model (ATAMM) appears useful for the time-optimized execution of application algorithm graphs in embedded multiprocessors and for the performance prediction of graph designs. This paper reports the modeling of ATAMM in the Architecture Design and Assessment System (ADAS) to make an independent verification of ATAMM's performance prediction capability and to provide a user framework for the evaluation of arbitrary algorithm graphs. Following an overview of ATAMM and its major functional rules are descriptions of the ADAS model of ATAMM, methods to enter an arbitrary graph into the model, and techniques to analyze the simulation results. The performance of a 7-node graph example is evaluated using the ADAS model and verifies the ATAMM concept by substantiating previously published performance results.

  5. Multiprocessor Neural Network in Healthcare.

    PubMed

    Godó, Zoltán Attila; Kiss, Gábor; Kocsis, Dénes

    2015-01-01

    A possible way of creating a multiprocessor artificial neural network is by the use of microcontrollers. The RISC processors' high performance and the large number of I/O ports mean they are greatly suitable for creating such a system. During our research, we wanted to see if it is possible to efficiently create interaction between the artificial neural network and the natural nervous system. To achieve as much analogy to the living nervous system as possible, we created a frequency-modulated analog connection between the units. Our system is connected to the living nervous system through 128 microelectrodes. Two-way communication is provided through A/D transformation, which is even capable of testing psychopharmacons. The microcontroller-based analog artificial neural network can play a great role in medical signal processing, such as ECG and EEG.

  6. Optimal eigenvalue computation on a mesh multiprocessor

    SciTech Connect

    Crivelli, S.; Jessup, E. R.

    1993-01-01

    In this paper, we compare the costs of computing a single eigenvalue of a symmetric tridiagonal matrix by serial bisection and by parallel multisection on a mesh multiprocessor. We show how the optimal method for computing one eigenvalue depends on such variables as the matrix order and parameters of the multiprocessor used. We present the results of experiments on the 520-processor Intel Touchstone Delta to support our analysis.
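
    The serial-bisection kernel being compared is standard; a minimal sketch based on Sturm sequence counts (not the authors' code) is:

      #include <cmath>
      #include <cstdio>
      #include <vector>

      // Sturm count: number of eigenvalues of the symmetric tridiagonal matrix
      // (diagonal d, off-diagonal e) that are smaller than x.
      static int count_below(const std::vector<double> &d,
                             const std::vector<double> &e, double x) {
          int count = 0;
          double q = 1.0;
          for (std::size_t i = 0; i < d.size(); ++i) {
              q = d[i] - x - (i ? e[i - 1] * e[i - 1] / q : 0.0);
              if (q == 0.0) q = 1e-300;   // guard against an exact zero pivot
              if (q < 0.0) ++count;
          }
          return count;
      }

      // Serial bisection for the k-th smallest eigenvalue (k = 1, 2, ...),
      // given an enclosing interval [lo, hi] (e.g. from Gershgorin's theorem).
      static double kth_eigenvalue(const std::vector<double> &d,
                                   const std::vector<double> &e,
                                   int k, double lo, double hi) {
          while (hi - lo > 1e-12 * (std::fabs(lo) + std::fabs(hi) + 1.0)) {
              double mid = 0.5 * (lo + hi);
              (count_below(d, e, mid) >= k ? hi : lo) = mid;
          }
          return 0.5 * (lo + hi);
      }

      int main() {
          std::vector<double> d{2.0, 2.0, 2.0}, e{-1.0, -1.0};  // 1D Laplacian, n = 3
          std::printf("%f\n", kth_eigenvalue(d, e, 1, 0.0, 4.0));  // 2 - sqrt(2)
          return 0;
      }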

  7. Shared and Distributed Memory Parallel Security Analysis of Large-Scale Source Code and Binary Applications

    SciTech Connect

    Quinlan, D; Barany, G; Panas, T

    2007-08-30

    Many forms of security analysis on large scale applications can be substantially automated but the size and complexity can exceed the time and memory available on conventional desktop computers. Most commercial tools are understandably focused on such conventional desktop resources. This paper presents research work on the parallelization of security analysis of both source code and binaries within our Compass tool, which is implemented using the ROSE source-to-source open compiler infrastructure. We have focused on both shared and distributed memory parallelization of the evaluation of rules implemented as checkers for a wide range of secure programming rules, applicable to desktop machines, networks of workstations and dedicated clusters. While Compass as a tool focuses on source code analysis and reports violations of an extensible set of rules, the binary analysis work uses the exact same infrastructure but is less well developed into an equivalent final tool.

  8. An implementation of SISAL for distributed-memory architectures

    SciTech Connect

    Beard, Patrick C.

    1995-06-01

    This thesis describes a new implementation of the implicitly parallel functional programming language SISAL, for massively parallel processor supercomputers. The Optimizing SISAL Compiler (OSC), developed at Lawrence Livermore National Laboratory, was originally designed for shared-memory multiprocessor machines and has been adapted to distributed-memory architectures. OSC has been relatively portable between shared-memory architectures, because they are architecturally similar, and OSC generates portable C code. However, distributed-memory architectures are not standardized -- each has a different programming model. Distributed-memory SISAL depends on a layer of software that provides a portable, distributed, shared-memory abstraction. This layer is provided by Split-C, a dialect of the C programming language developed at U.C. Berkeley, which has demonstrated good performance on distributed-memory architectures. Split-C provides important capabilities for good performance: support for program-specific distributed data structures, and split-phase memory operations. Distributed data structures help achieve good memory locality, while split-phase memory operations help tolerate the longer communication latencies inherent in distributed-memory architectures. The distributed-memory SISAL compiler and run-time system take advantage of these capabilities. The result of these efforts is a compiler that runs identically on the Thinking Machines Connection Machine (CM-5) and the Meiko Computing Surface (CS-2).

  9. Exploiting Processor Groups to Extend Scalability of the GA Shared Memory Programming Model

    SciTech Connect

    Nieplocha, Jarek; Krishnan, Manoj Kumar; Palmer, Bruce J.; Tipparaju, Vinod; Zhang, Yeliang

    2005-05-04

    Exploiting processor groups is becoming increasingly important for programming next-generation high-end systems composed of tens or hundreds of thousands of processors. This paper discusses the requirements, functionality and development of multilevel-parallelism based on processor groups in the context of the Global Array (GA) shared memory programming model. The main effort involves management of shared data, rather than interprocessor communication. Experimental results for the NAS NPB Conjugate Gradient benchmark and a molecular dynamics (MD) application are presented for a Linux cluster with Myrinet and illustrate the value of the proposed approach for improving scalability. While the original GA version of the CG benchmark lagged MPI, the processor-group version outperforms MPI in all cases, except for a few points on the smallest problem size. Similarly, the group version of the MD application improves execution time by 58% on 32 processors.

  10. Testing and operating a multiprocessor chip with processor redundancy

    DOEpatents

    Bellofatto, Ralph E; Douskey, Steven M; Haring, Rudolf A; McManus, Moyra K; Ohmacht, Martin; Schmunkamp, Dietmar; Sugavanam, Krishnan; Weatherford, Bryan J

    2014-10-21

    A system and method for improving the yield rate of a multiprocessor semiconductor chip that includes primary processor cores and one or more redundant processor cores. A first tester conducts a first test on one or more processor cores, and encodes results of the first test in an on-chip non-volatile memory. A second tester conducts a second test on the processor cores, and encodes results of the second test in an external non-volatile storage device. An override bit of a multiplexer is set if a processor core fails the second test. In response to the override bit, the multiplexer selects a physical-to-logical mapping of processor IDs according to one of: the encoded results in the memory device or the encoded results in the external storage device. On-chip logic configures the processor cores according to the selected physical-to-logical mapping.

  11. Iterative algorithms for tridiagonal matrices on a WSI-multiprocessor

    SciTech Connect

    Gajski, D.D.; Sameh, A.H.; Wisniewski, J.A.

    1982-01-01

    With the rapid advances in semiconductor technology, the construction of Wafer Scale Integration (WSI)-multiprocessors consisting of a large number of processors is now feasible. We illustrate the implementation of some basic linear algebra algorithms on such multiprocessors.

  12. Computational performance of a smoothed particle hydrodynamics simulation for shared-memory parallel computing

    NASA Astrophysics Data System (ADS)

    Nishiura, Daisuke; Furuichi, Mikito; Sakaguchi, Hide

    2015-09-01

    The computational performance of a smoothed particle hydrodynamics (SPH) simulation is investigated for three types of current shared-memory parallel computer devices: many integrated core (MIC) processors, graphics processing units (GPUs), and multi-core CPUs. We are especially interested in efficient shared-memory allocation methods for each chipset, because the efficient data access patterns differ between compute unified device architecture (CUDA) programming for GPUs and OpenMP programming for MIC processors and multi-core CPUs. We first introduce several parallel implementation techniques for the SPH code, and then examine these on our target computer architectures to determine the most effective algorithms for each processor unit. In addition, we evaluate the effective computing performance and power efficiency of the SPH simulation on each architecture, as these are critical metrics for overall performance in a multi-device environment. In our benchmark test, the GPU is found to produce the best arithmetic performance as a standalone device unit, and gives the most efficient power consumption. The multi-core CPU obtains the most effective computing performance. The computational speed of the MIC processor on Xeon Phi approached that of two Xeon CPUs. This indicates that using MICs is an attractive choice for existing SPH codes on multi-core CPUs parallelized by OpenMP, as it gains computational acceleration without the need for significant changes to the source code.

  13. An experimental distributed microprocessor implementation with a shared memory communications and control medium

    NASA Technical Reports Server (NTRS)

    Mejzak, R. S.

    1980-01-01

    The distributed processing concept is defined in terms of control primitives, variables, and structures and their use in performing a decomposed discrete Fourier transform (DFT) application function. The design assumes interprocessor communications to be anonymous. In this scheme, all processors can access an entire common database by employing control primitives. Access to selected areas within the common database is random, enforced by a hardware lock, and determined by task and subtask pointers. This enables the number of processors to be varied in the configuration without any modifications to the control structure. Decompositional elements of the DFT application function in terms of tasks and subtasks are also described. The experimental hardware configuration consists of IMSAI 8080 chassis which are independent, 8 bit microcomputer units. These chassis are linked together to form a multiple processing system by means of a shared memory facility. This facility consists of hardware which provides a bus structure to enable up to six microcomputers to be interconnected. It provides polling and arbitration logic so that only one processor has access to shared memory at any one time.
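
    The lock-enforced access to the common database can be sketched with modern threads (a minimal sketch: std::mutex stands in for the hardware lock, and a pool of 16 subtasks is assumed for illustration; compile with -pthread):

      #include <cstdio>
      #include <mutex>
      #include <thread>
      #include <vector>

      // A common database holds a subtask pointer; each processor locks the
      // database, claims the next subtask, and releases the lock. The number
      // of processors can vary with no change to this control structure.
      struct CommonDatabase {
          std::mutex lock;        // hardware-lock stand-in
          int next_subtask = 0;   // subtask pointer
          int total = 16;
      };

      static void processor(CommonDatabase &db, int id) {
          for (;;) {
              int task;
              {
                  std::lock_guard<std::mutex> guard(db.lock);  // exclusive access
                  if (db.next_subtask >= db.total) return;
                  task = db.next_subtask++;
              }
              std::printf("processor %d computes subtask %d\n", id, task);
          }
      }

      int main() {
          CommonDatabase db;
          std::vector<std::thread> cpus;
          for (int i = 0; i < 4; ++i)   // vary freely, as in the paper
              cpus.emplace_back(processor, std::ref(db), i);
          for (auto &t : cpus) t.join();
          return 0;
      }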

  14. Multiprocessor architecture to handle TJ-II VXI-based digitization channels

    NASA Astrophysics Data System (ADS)

    Crémy, C.; Vega, J.; Sánchez, E.; Dulya, C. M.; Portas, A.

    1999-01-01

    The data acquisition system (DAS) of the TJ-II stellarator provides up to 300 digitization channels integrated in register-based VXI modules designed at CIEMAT Laboratories. The modules are embedded into six 13-slot VXI chassis connected to the TJ-II DAS central computer by means of a dual LAN topology. During normal operation, remote control of the VXI systems and channel setup are accomplished through an Ethernet LAN, while two FDDI rings are dedicated to postdischarge fast data transfer. The former network link is handled by the bus controller, whereas the latter is provided through an FDDI node controller installed in the mainframe, thus creating a multiprocessor architecture. Dedicated software, running on the VxWorks operating system, has been developed to provide handling of the VXI systems, including the following facilities: mainframe information readout, channel setup, real-time digitization handling, and data transfer. This software, implemented in C++, is distributed over the two CPUs. Interprocessor communication for synchronization purposes is based on a backplane shared memory pool.

  15. Multiprocessor smalltalk: Implementation, performance, and analysis

    SciTech Connect

    Pallas, J.I.

    1990-01-01

    Multiprocessor Smalltalk demonstrates the value of object-oriented programming on a multiprocessor. Its implementation and analysis shed light on three areas: concurrent programming in an object-oriented language without special extensions, implementation techniques for adapting to multiprocessors, and performance factors in the resulting system. Adding parallelism to Smalltalk code is easy, because programs already use control abstractions like iterators. Smalltalk's basic control and concurrency primitives (lambda expressions, processes and semaphores) can be used to build parallel control abstractions, including parallel iterators, parallel objects, atomic objects, and futures. Language extensions for concurrency are not required. This implementation demonstrates that it is possible to build an efficient parallel object-oriented programming system and illustrates techniques for doing so. Three modification tools (serialization, replication, and reorganization) adapted the Berkeley Smalltalk interpreter to the Firefly multiprocessor. Multiprocessor Smalltalk's performance shows that the combination of multiprocessing and object-oriented programming can be effective: speedups (relative to the original serial version) exceed 2.0 for five processors on all the benchmarks; the median efficiency is 48%. Analysis shows both where performance is lost and how to improve and generalize the experimental results. Changes in the interpreter to support concurrency add at most 12% overhead; better access to per-process variables could eliminate much of that. Changes in the user code to express concurrency add as much as 70% overhead; this overhead could be reduced to 54% if blocks (lambda expressions) were reentrant. Performance is also lost when the program cannot keep all five processors busy.
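
    The future abstraction referred to above, sketched in C++ (Smalltalk builds the equivalent from blocks, processes, and semaphores without language extensions):

      #include <cstdio>
      #include <future>

      // A future starts a computation now and synchronizes only when the value
      // is demanded, letting the caller proceed with independent work meanwhile.
      static long fib(int n) { return n < 2 ? n : fib(n - 1) + fib(n - 2); }

      int main() {
          std::future<long> f = std::async(std::launch::async, fib, 30);
          long other = fib(28);                        // overlapped work
          std::printf("%ld %ld\n", f.get(), other);    // f.get() blocks if needed
          return 0;
      }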

  16. Improvement of multiprocessing performance by using optical centralized shared bus

    NASA Astrophysics Data System (ADS)

    Han, Xuliang; Chen, Ray T.

    2004-06-01

    With the ever-increasing need to solve larger and more complex problems, multiprocessing is attracting more and more research efforts. One of the challenges facing the multiprocessor designers is to fulfill in an effective manner the communications among the processes running in parallel on multiple multiprocessors. The conventional electrical backplane bus provides narrow bandwidth as restricted by the physical limitations of electrical interconnects. In the electrical domain, in order to operate at high frequency, the backplane topology has been changed from the simple shared bus to the complicated switched medium. However, the switched medium is an indirect network. It cannot support multicast/broadcast as effectively as the shared bus. Besides the additional latency of going through the intermediate switching nodes, signal routing introduces substantial delay and considerable system complexity. Alternatively, optics has been well known for its interconnect capability. Therefore, it has become imperative to investigate how to improve multiprocessing performance by utilizing optical interconnects. From the implementation standpoint, the existing optical technologies still cannot fulfill the intelligent functions that a switch fabric should provide as effectively as their electronic counterparts. Thus, an innovative optical technology that can provide sufficient bandwidth capacity, while at the same time, retaining the essential merits of the shared bus topology, is highly desirable for the multiprocessing performance improvement. In this paper, the optical centralized shared bus is proposed for use in the multiprocessing systems. This novel optical interconnect architecture not only utilizes the beneficial characteristics of optics, but also retains the desirable properties of the shared bus topology. Meanwhile, from the architecture standpoint, it fits well in the centralized shared-memory multiprocessing scheme. Therefore, a smooth migration with substantial

  17. Memory T and memory B cells share a transcriptional program of self-renewal with long-term hematopoietic stem cells

    PubMed Central

    Luckey, Chance John; Bhattacharya, Deepta; Goldrath, Ananda W.; Weissman, Irving L.; Benoist, Christophe; Mathis, Diane

    2006-01-01

    The only cells of the hematopoietic system that undergo self-renewal for the lifetime of the organism are long-term hematopoietic stem cells and memory T and B cells. To determine whether there is a shared transcriptional program among these self-renewing populations, we first compared the gene-expression profiles of naïve, effector and memory CD8+ T cells with those of long-term hematopoietic stem cells, short-term hematopoietic stem cells, and lineage-committed progenitors. Transcripts augmented in memory CD8+ T cells relative to naïve and effector T cells were selectively enriched in long-term hematopoietic stem cells and were progressively lost in their short-term and lineage-committed counterparts. Furthermore, transcripts selectively decreased in memory CD8+ T cells were selectively down-regulated in long-term hematopoietic stem cells and progressively increased with differentiation. To confirm that this pattern was a general property of immunologic memory, we turned to independently generated gene expression profiles of memory, naïve, germinal center, and plasma B cells. Once again, memory-enriched and -depleted transcripts were also appropriately augmented and diminished in long-term hematopoietic stem cells, and their expression correlated with progressive loss of self-renewal function. Thus, there appears to be a common signature of both up- and down-regulated transcripts shared between memory T cells, memory B cells, and long-term hematopoietic stem cells. This signature was not consistently enriched in neural or embryonic stem cell populations and, therefore, appears to be restricted to the hematopoeitic system. These observations provide evidence that the shared phenotype of self-renewal in the hematopoietic system is linked at the molecular level. PMID:16492737

  18. A combined PLC and CPU approach to multiprocessor control

    SciTech Connect

    Harris, J.J.; Broesch, J.D.; Coon, R.M.

    1995-10-01

    A sophisticated multiprocessor control system has been developed for use in the E-Power Supply System Integrated Control (EPSSIC) on the DIII-D tokamak. EPSSIC provides control and interlocks for the ohmic heating coil power supply and its associated systems. Of particular interest is the architecture of this system: both a Programmable Logic Controller (PLC) and a Central Processor Unit (CPU) have been combined on a standard VME bus. The PLC and CPU input and output signals are routed through signal conditioning modules, which provide the necessary voltage and ground isolation. Additionally these modules adapt the signal levels to that of the VME I/O boards. One set of I/O signals is shared between the two processors. The resulting multiprocessor system provides a number of advantages: redundant operation for mission critical situations, flexible communications using conventional TCP/IP protocols, the simplicity of ladder logic programming for the majority of the control code, and an easily maintained and expandable non-proprietary system.

  19. Techniques for Improving the Performance of Sparse Matrix Factorization on Multiprocessor Workstations

    DTIC Science & Technology

    1990-06-01

    This paper looks at the problem of factoring large sparse systems of equations on high-performance multiprocessor workstations. ... factorization codes achieve only a small fraction of this potential. A major limiting factor is the cost of memory accesses performed during the factorization.

  1. gpuSPHASE-A shared memory caching implementation for 2D SPH using CUDA

    NASA Astrophysics Data System (ADS)

    Winkler, Daniel; Meister, Michael; Rezavand, Massoud; Rauch, Wolfgang

    2017-04-01

    Smoothed particle hydrodynamics (SPH) is a meshless Lagrangian method that has been successfully applied to computational fluid dynamics (CFD), solid mechanics and many other multi-physics problems. Using the method to solve transport phenomena in process engineering requires the simulation of several days to weeks of physical time. Given the high computational demand of CFD, such simulations in 3D would need years of computation time, so a reduction to a 2D domain is inevitable. In this paper, gpuSPHASE, a new open-source 2D SPH solver implementation for graphics devices, is developed. It is optimized for simulations that must be executed with thousands of frames per second to be computed in reasonable time. A novel caching algorithm for Compute Unified Device Architecture (CUDA) shared memory is proposed and implemented. The software is validated and the performance is evaluated for the well-established dam-break test case.
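
    The general caching idea can be shown in portable C++ rather than the paper's CUDA implementation: stage a tile of candidate neighbours in a small local buffer (played by CUDA shared memory on the GPU, where the threads of a block load the tile cooperatively) and reuse the whole tile in the inner pair loop. The interaction term below is a placeholder, not an SPH kernel:

      #include <algorithm>
      #include <cstddef>
      #include <vector>

      struct Particle { float x, y, m; };

      float total_interaction(const std::vector<Particle> &p) {
          constexpr std::size_t TILE = 128;   // sized to fit fast local storage
          Particle cache[TILE];
          float acc = 0.0f;
          for (std::size_t base = 0; base < p.size(); base += TILE) {
              const std::size_t n = std::min(TILE, p.size() - base);
              for (std::size_t j = 0; j < n; ++j)   // load the tile once
                  cache[j] = p[base + j];
              for (const Particle &pi : p)          // reuse it for every particle
                  for (std::size_t j = 0; j < n; ++j)
                      acc += pi.m * cache[j].m;     // placeholder pair term
          }
          return acc;
      }

      int main() {
          std::vector<Particle> p(1000, Particle{0.0f, 0.0f, 1.0f});
          return total_interaction(p) > 0.0f ? 0 : 1;
      }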

  2. Real-time topological image smoothing on shared memory parallel machines

    NASA Astrophysics Data System (ADS)

    Mahmoudi, Ramzi; Akil, Mohamed

    2011-03-01

    Smoothing filters are the method of choice for image preprocessing and pattern recognition. We present a new concurrent method for smoothing 2D objects in the binary case. The proposed method provides parallel computation while preserving the topology by using homotopic transformations. We introduce an adapted parallelization strategy called split, distribute and merge (SDM), which allows efficient parallelization of a large class of topological operators including, mainly, smoothing, skeletonization, and watershed algorithms. To achieve a good speedup, particular attention was paid to task scheduling: work during the smoothing process is distributed across a variable number of threads. Tests on a 2D binary image (512*512), using a shared memory parallel machine (SMPM) with 8 CPU cores (2× Xeon E5405 processors running at 2 GHz), showed a speedup of 5.2; thus a rate of 32 images per second is achieved.

  3. Multiprocessor switch with selective pairing

    DOEpatents

    Gara, Alan; Gschwind, Michael K; Salapura, Valentina

    2014-03-11

    System, method and computer program product for a multiprocessing system to offer selective pairing of processor cores for increased processing reliability. A selective pairing facility is provided that selectively connects, i.e., pairs, multiple microprocessor or processor cores to provide one highly reliable thread (or thread group). Each pair of cores that provides one highly reliable thread connects with system components such as a memory "nest" (or memory hierarchy), an optional system controller, an optional interrupt controller, and optional I/O or peripheral devices. The memory nest is attached to the selective pairing facility via a switch or a bus.

  4. Maintaining Packet Order in Reservation-Based Shared-Memory Optical Packet Switch

    NASA Astrophysics Data System (ADS)

    Wang, Xiaoliang; Jiang, Xiaohong; Horiguchi, Susumu

    Shared-Memory Optical Packet (SMOP) switch architecture is very promising for significantly reducing the amount of required optical memory, which is typically constructed from fiber delay lines (FDLs). The current reservation-based scheduling algorithms for SMOP switches can effectively utilize the FDLs and achieve a low packet loss rate by simply reserving the departure time for each arriving packet. It is notable, however, that such a simple scheduling scheme may introduce a significant packet out-of-order problem. In this paper, we first identify the two main sources of the packet out-of-order problem in current reservation-based SMOP switches. We then show that by introducing a “last-timestamp” variable and modifying the corresponding FDL arrangement, as well as the scheduling process, in the current reservation-based SMOP switches, it is possible to keep packets in sequence while still maintaining delay and packet loss performance similar to the previous design. Finally, we further extend our work to support variable-length burst switching.
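
    A minimal sketch of the last-timestamp rule, assuming the usual reservation search supplies the earliest free slot (illustrative only, not the paper's scheduler):

      #include <algorithm>
      #include <cstdint>
      #include <unordered_map>

      // A packet's reserved departure slot is never earlier than the slot
      // promised to the previous packet of the same flow, so packets cannot
      // leave the FDL buffer out of order.
      struct OrderPreservingScheduler {
          std::unordered_map<std::uint64_t, std::uint64_t> last_ts;  // flow -> last slot

          // 'earliest' is the first slot with a free delay line and output
          // port, as found by the usual reservation search (elided here).
          std::uint64_t reserve(std::uint64_t flow, std::uint64_t earliest) {
              auto it = last_ts.find(flow);
              std::uint64_t slot = (it == last_ts.end())
                                       ? earliest
                                       : std::max(earliest, it->second + 1);
              last_ts[flow] = slot;
              return slot;
          }
      };

      int main() {
          OrderPreservingScheduler s;
          std::uint64_t a = s.reserve(7, 10);  // slot 10
          std::uint64_t b = s.reserve(7, 9);   // slot 11, not 9: order preserved
          return a < b ? 0 : 1;
      }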

  5. How We Transmit Memories to Other Brains: Constructing Shared Neural Representations Via Communication.

    PubMed

    Zadbood, A; Chen, J; Leong, Y C; Norman, K A; Hasson, U

    2017-10-01

    Humans are able to mentally construct an episode when listening to another person's recollection, even though they themselves did not experience the events. However, it is unknown how strongly the neural patterns elicited by mental construction resemble those found in the brain of the individual who experienced the original events. Using fMRI and a verbal communication task, we traced how neural patterns associated with viewing specific scenes in a movie are encoded, recalled, and then transferred to a group of naïve listeners. By comparing neural patterns across the 3 conditions, we report, for the first time, that event-specific neural patterns observed in the default mode network are shared across the encoding, recall, and construction of the same real-life episode. This study uncovers the intimate correspondences between memory encoding and event construction, and highlights the essential role our common language plays in the process of transmitting one's memories to other brains. © The Author 2017. Published by Oxford University Press.

  6. Aho-Corasick String Matching on Shared and Distributed Memory Parallel Architectures

    SciTech Connect

    Tumeo, Antonino; Villa, Oreste; Chavarría-Miranda, Daniel

    2012-03-01

    String matching is at the core of many critical applications, including network intrusion detection systems, search engines, virus scanners, spam filters, DNA and protein sequencing, and data mining. For all of these applications string matching requires a combination of (sometimes all) the following characteristics: high and/or predictable performance, support for large data sets and flexibility of integration and customization. Many software based implementations targeting conventional cache-based microprocessors fail to achieve high and predictable performance requirements, while Field-Programmable Gate Array (FPGA) implementations and dedicated hardware solutions fail to support large data sets (dictionary sizes) and are difficult to integrate and customize. The advent of multicore, multithreaded, and GPU-based systems is opening the possibility for software based solutions to reach very high performance at a sustained rate. This paper compares several software-based implementations of the Aho-Corasick string searching algorithm for high performance systems. We discuss the implementation of the algorithm on several types of shared-memory high-performance architectures (Niagara 2, large x86 SMPs and Cray XMT), distributed memory with homogeneous processing elements (InfiniBand cluster of x86 multicores) and heterogeneous processing elements (InfiniBand cluster of x86 multicores with NVIDIA Tesla C10 GPUs). We describe in detail how each solution achieves the objectives of supporting large dictionaries, sustaining high performance, and enabling customization and flexibility using various data sets.
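
    For reference, a compact textbook Aho-Corasick automaton in C (restricted to the alphabet a-z; the paper's implementations are heavily optimized per target architecture and differ in data layout):

    #include <stdio.h>

    #define MAXN 1000   /* total trie nodes */
    #define SIGMA 26

    static int nxt[MAXN][SIGMA], fail[MAXN], out[MAXN], nodes = 1;

    static void ac_add(const char *pat) {          /* insert one pattern */
        int v = 0;
        for (; *pat; ++pat) {
            int c = *pat - 'a';
            if (!nxt[v][c]) nxt[v][c] = nodes++;
            v = nxt[v][c];
        }
        out[v]++;                    /* patterns ending at this node */
    }

    static void ac_build(void) {     /* BFS to set failure links */
        int queue[MAXN], head = 0, tail = 0;
        for (int c = 0; c < SIGMA; ++c)
            if (nxt[0][c]) queue[tail++] = nxt[0][c];
        while (head < tail) {
            int v = queue[head++];
            out[v] += out[fail[v]];  /* inherit matches via suffix link */
            for (int c = 0; c < SIGMA; ++c) {
                int u = nxt[v][c];
                if (u) { fail[u] = nxt[fail[v]][c]; queue[tail++] = u; }
                else nxt[v][c] = nxt[fail[v]][c];   /* goto compression */
            }
        }
    }

    static long ac_count(const char *text) {  /* text must be lowercase a-z */
        long hits = 0;
        for (int v = 0; *text; ++text) {
            v = nxt[v][*text - 'a'];
            hits += out[v];          /* total occurrences of all patterns */
        }
        return hits;
    }

    int main(void) {
        ac_add("he"); ac_add("she"); ac_add("his"); ac_add("hers");
        ac_build();
        printf("%ld\n", ac_count("ushers"));   /* prints 3: she, he, hers */
        return 0;
    }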

  7. Parallel Fock matrix construction with distributed shared memory model for the FMO-MO method.

    PubMed

    Umeda, Hiroaki; Inadomi, Yuichi; Watanabe, Toshio; Yagi, Toru; Ishimoto, Takayoshi; Ikegami, Tsutomu; Tadano, Hiroto; Sakurai, Tetsuya; Nagashima, Umpei

    2010-10-01

    A parallel Fock matrix construction program for the FMO-MO method has been developed with the distributed shared memory model. To construct a large-sized Fock matrix during FMO-MO calculations, a distributed parallel algorithm was designed to make full use of local memory and reduce communication, and was implemented on the Global Array toolkit. A benchmark calculation for a small system indicates that the parallelization efficiency of the matrix construction portion is as high as 93% on 1,024 processors. A large FMO-MO application on the epidermal growth factor receptor (EGFR) protein (17,246 atoms and 96,234 basis functions) was also carried out at the HF/6-31G level of theory, with the frontier orbitals being extracted by a Sakurai-Sugiura eigensolver. It takes 11.3 h for the FMO calculation, 49.1 h for the Fock matrix construction, and 10 min to extract 94 eigencomponents on a PC cluster system using 256 processors.
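
    The pattern described (compute contributions into local memory, then fold them into a globally distributed matrix) can be sketched with one-sided MPI as a stand-in for the Global Array accumulate operations the paper actually uses; the toy sizes and single target rank below are illustrative assumptions:

    #include <mpi.h>
    #include <stdlib.h>

    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);
        int rank, size;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        const int rows_per_rank = 4, n = 8;      /* tiny demo matrix */
        double *local = calloc((size_t)rows_per_rank * n, sizeof *local);
        MPI_Win win;
        MPI_Win_create(local, (MPI_Aint)(rows_per_rank * n * sizeof *local),
                       sizeof *local, MPI_INFO_NULL, MPI_COMM_WORLD, &win);

        /* Each rank "computes" contributions destined for one row on rank 0. */
        double contrib[8];
        for (int j = 0; j < n; ++j) contrib[j] = rank + 1.0;

        MPI_Win_fence(0, win);
        MPI_Accumulate(contrib, n, MPI_DOUBLE, /* target rank */ 0,
                       /* displacement */ 0, n, MPI_DOUBLE, MPI_SUM, win);
        MPI_Win_fence(0, win);

        /* Row 0 on rank 0 now holds the sum of all ranks' contributions. */
        MPI_Win_free(&win);
        free(local);
        MPI_Finalize();
        return 0;
    }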

  8. Parallel simulated annealing algorithms for cell placement on hypercube multiprocessors

    NASA Technical Reports Server (NTRS)

    Banerjee, Prithviraj; Jones, Mark Howard; Sargent, Jeff S.

    1990-01-01

    Two parallel algorithms for standard cell placement using simulated annealing are developed to run on distributed-memory message-passing hypercube multiprocessors. The cells can be mapped in a two-dimensional area of a chip onto processors in an n-dimensional hypercube in two ways, such that both small and large cell exchange and displacement moves can be applied. The computation of the cost function in parallel among all the processors in the hypercube is described, along with a distributed data structure that needs to be stored in the hypercube to support the parallel cost evaluation. A novel tree broadcasting strategy is used extensively for updating cell locations in the parallel environment. A dynamic parallel annealing schedule estimates the errors due to interacting parallel moves and adapts the rate of synchronization automatically. Two novel approaches in controlling error in parallel algorithms are described: heuristic cell coloring and adaptive sequence control.
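
    The sequential kernel that the parallel algorithms preserve is the Metropolis acceptance test; a generic C version of that rule (the textbook form, not the paper's parallel error-control machinery):

    #include <math.h>
    #include <stdlib.h>

    /* Accept a placement move with cost change dcost at temperature T.
     * Seed the generator with srand() once in real use. */
    static int accept_move(double dcost, double T) {
        if (dcost <= 0.0) return 1;                 /* downhill: always accept */
        double u = (double)rand() / ((double)RAND_MAX + 1.0);
        return u < exp(-dcost / T);                 /* uphill: Boltzmann test */
    }

    In the parallel versions, concurrent moves evaluate dcost against partially stale state; bounding the resulting error is exactly what the cell-coloring and adaptive sequence-control techniques above address.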

  9. Shared neuroanatomical substrates of impaired phonological working memory across reading disability and autism.

    PubMed

    Lu, Chunming; Qi, Zhenghan; Harris, Adrianne; Weil, Lisa Wisman; Han, Michelle; Halverson, Kelly; Perrachione, Tyler K; Kjelgaard, Margaret; Wexler, Kenneth; Tager-Flusberg, Helen; Gabrieli, John D E

    2016-03-01

    Individuals with reading disability or individuals with autism spectrum disorder (ASD) are characterized, respectively, by their difficulties in reading or social communication, but both groups often have impaired phonological working memory (PWM). It is not known whether the impaired PWM reflects distinct or shared neuroanatomical abnormalities in these two diagnostic groups. White-matter structural connectivity via diffusion weighted imaging was examined in sixty-four children, ages 5-17 years, with reading disability, ASD, or typical development (TD), who were matched in age, gender, intelligence, and diffusion data quality. Children with reading disability and children with ASD exhibited reduced PWM compared to children with TD. The two diagnostic groups showed altered white-matter microstructure in the temporo-parietal portion of the left arcuate fasciculus (AF) and in the temporo-occipital portion of the right inferior longitudinal fasciculus (ILF), as indexed by reduced fractional anisotropy and increased radial diffusivity. Moreover, the structural integrity of the right ILF was positively correlated with PWM ability in the two diagnostic groups, but not in the TD group. These findings suggest that impaired PWM is transdiagnostically associated with shared neuroanatomical abnormalities in ASD and reading disability. Microstructural characteristics in left AF and right ILF may play important roles in the development of PWM. The right ILF may support a compensatory mechanism for children with impaired PWM.

  10. Performance and Application of Parallel OVERFLOW Codes on Distributed and Shared Memory Platforms

    NASA Technical Reports Server (NTRS)

    Djomehri, M. Jahed; Rizk, Yehia M.

    1999-01-01

    The presentation discusses recent studies on the performance of the two parallel versions of the aerodynamics CFD code, OVERFLOW_MPI and _MLP. Developed at NASA Ames, the serial version, OVERFLOW, is a multidimensional Navier-Stokes flow solver based on overset (Chimera) grid technology. The code has recently been parallelized in two ways. One is based on the explicit message-passing interface (MPI) across processors and uses the _MPI communication package. This approach is primarily suited for distributed memory systems and workstation clusters. The second, termed the multi-level parallel (MLP) method, is simple and uses shared memory for all communications. The _MLP code is suitable on distributed-shared memory systems. For both methods, the message passing takes place across the processors or processes at the advancement of each time step. This procedure is, in effect, the Chimera boundary conditions update, which is done in an explicit "Jacobi" style. In contrast, the update in the serial code is done in more of a "Gauss-Seidel" fashion. The programming effort for the _MPI code is greater than for the _MLP code; the former requires modification of the outer and some inner shells of the serial code, whereas the latter touches only the outer shell. The _MPI version offers a great deal of flexibility in distributing grid zones across a specified number of processors in order to achieve load balancing: it can partition zones across multiple processors or assign each zone and/or cluster of several zones to a single processor. The message passing across the processors consists of Chimera boundary and/or an overlap of "halo" boundary points for each partitioned zone. The MLP version is a new coarse-grain parallel concept at the zonal and intra-zonal levels. A grouping strategy is used to distribute zones into several groups forming sub-processes which run in parallel, with the total volume of grid points in each group kept roughly equal for load balance.
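
    The per-timestep boundary update is easiest to picture as a halo exchange; a hedged 1-D MPI sketch (the production codes exchange Chimera interpolation data between overset zones rather than simple edge points):

    #include <mpi.h>

    #define NLOC 100                      /* interior points per rank */

    void halo_exchange(double u[NLOC + 2], MPI_Comm comm) {
        int rank, size;
        MPI_Comm_rank(comm, &rank);
        MPI_Comm_size(comm, &size);
        int left  = (rank > 0)        ? rank - 1 : MPI_PROC_NULL;
        int right = (rank < size - 1) ? rank + 1 : MPI_PROC_NULL;

        /* Send the rightmost interior point right while receiving the left
         * halo, then the mirror-image exchange. */
        MPI_Sendrecv(&u[NLOC], 1, MPI_DOUBLE, right, 0,
                     &u[0],    1, MPI_DOUBLE, left,  0, comm, MPI_STATUS_IGNORE);
        MPI_Sendrecv(&u[1],        1, MPI_DOUBLE, left,  1,
                     &u[NLOC + 1], 1, MPI_DOUBLE, right, 1, comm, MPI_STATUS_IGNORE);
    }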

  11. Spaceborne VHSIC multiprocessor system for AI applications

    NASA Technical Reports Server (NTRS)

    Lum, Henry, Jr.; Shrobe, Howard E.; Aspinall, John G.

    1988-01-01

    A multiprocessor system, under design for space-station applications, makes use of the latest generation symbolic processor and packaging technology. The result will be a compact, space-qualified system two to three orders of magnitude more powerful than present-day symbolic processing systems.

  12. Fault detection, isolation and reconfiguration in FTMP: methods and experimental results [fault-tolerant multiprocessor]

    NASA Technical Reports Server (NTRS)

    Lala, J. H.

    1983-01-01

    The Fault-Tolerant Multiprocessor (FTMP) is a highly reliable computer designed to meet a goal of 10 to the -10th failures per hour and built with the objective of flying an active-control transport aircraft. Fault detection, identification, and recovery software is described, and experimental results obtained by injecting faults at the pin level in the FTMP are presented. Over 21,000 faults were injected in the CPU, memory, bus interface circuits, and error detection, masking, and error reporting circuits of one LRU of the multiprocessor. Detection, isolation, and reconfiguration times were recorded for each fault, and the results were found to agree well with earlier assumptions made in reliability modeling.

  13. Design and evaluation of a fault-tolerant multiprocessor using hardware recovery blocks

    NASA Technical Reports Server (NTRS)

    Lee, Y. H.; Shin, K. G.

    1982-01-01

    A fault-tolerant multiprocessor with a rollback recovery mechanism is discussed. The rollback mechanism is based on the hardware recovery block which is a hardware equivalent to the software recovery block. The hardware recovery block is constructed by consecutive state-save operations and several state-save units in every processor and memory module. When a fault is detected, the multiprocessor reconfigures itself to replace the faulty component and then the process originally assigned to the faulty component retreats to one of the previously saved states in order to resume fault-free execution. A mathematical model is proposed to calculate both the coverage of multi-step rollback recovery and the risk of restart. A performance evaluation in terms of task execution time is also presented.
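
    In software terms, the hardware recovery block behaves like a small ring of checkpoints with multi-step rollback; a C sketch under assumed simplifications (fixed-size opaque state; all names are hypothetical, and a RecoveryBlock must be zero-initialized before use):

    #include <string.h>

    #define K 4                 /* state-save depth, like multiple state-save units */
    #define STATE_BYTES 256

    typedef struct {
        unsigned char save[K][STATE_BYTES];
        int head;               /* index of the most recent save */
        int count;              /* number of valid saves in the ring */
    } RecoveryBlock;

    void state_save(RecoveryBlock *rb, const void *state) {
        rb->head = (rb->head + 1) % K;
        memcpy(rb->save[rb->head], state, STATE_BYTES);
        if (rb->count < K) rb->count++;
    }

    /* Roll back 'steps' saves (multi-step rollback); returns 0 if the
     * history is too shallow and a restart is required instead. */
    int rollback(RecoveryBlock *rb, void *state, int steps) {
        if (steps >= rb->count) return 0;      /* the "risk of restart" case */
        rb->head = (rb->head - steps + K) % K;
        rb->count -= steps;
        memcpy(state, rb->save[rb->head], STATE_BYTES);
        return 1;
    }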

  14. Satisfiability Test with Synchronous Simulated Annealing on the Fujitsu AP1000 Massively-Parallel Multiprocessor

    NASA Technical Reports Server (NTRS)

    Sohn, Andrew; Biswas, Rupak

    1996-01-01

    Solving the hard Satisfiability Problem is time consuming even for modest-sized problem instances. Solving the Random L-SAT Problem is especially difficult due to the ratio of clauses to variables. This report presents a parallel synchronous simulated annealing method for solving the Random L-SAT Problem on a large-scale distributed-memory multiprocessor. In particular, we use a parallel synchronous simulated annealing procedure, called Generalized Speculative Computation, which guarantees the same decision sequence as sequential simulated annealing. To demonstrate the performance of the parallel method, we have selected problem instances varying in size from 100-variables/425-clauses to 5000-variables/21,250-clauses. Experimental results on the AP1000 multiprocessor indicate that our approach can satisfy 99.9 percent of the clauses while giving almost a 70-fold speedup on 500 processors.

  15. MLP: A Parallel Programming Alternative to MPI for New Shared Memory Parallel Systems

    NASA Technical Reports Server (NTRS)

    Taft, James R.

    1999-01-01

    Recent developments at the NASA Ames Research Center's NAS Division have demonstrated that the new generation of NUMA based Symmetric Multi-Processing systems (SMPs), such as the Silicon Graphics Origin 2000, can successfully execute legacy vector oriented CFD production codes at sustained rates far exceeding processing rates possible on dedicated 16 CPU Cray C90 systems. This high level of performance is achieved via shared memory based Multi-Level Parallelism (MLP). This programming approach, developed at NAS and outlined below, is distinct from the message passing paradigm of MPI. It offers parallelism at both the fine and coarse grained level, with communication latencies that are approximately 50-100 times lower than typical MPI implementations on the same platform. Such latency reductions offer the promise of performance scaling to very large CPU counts. The method draws on, but is also distinct from, the newly defined OpenMP specification, which uses compiler directives to support a limited subset of multi-level parallel operations. The NAS MLP method is general, and applicable to a large class of NASA CFD codes.
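
    The essence of MLP (coarse-grain parallel processes communicating through shared memory rather than messages) can be sketched with fork and mmap; this is a minimal illustration, not the NAS MLP library, which adds synchronization primitives, process groups, and nested loop-level parallelism:

    #include <stdio.h>
    #include <sys/mman.h>
    #include <sys/wait.h>
    #include <unistd.h>

    #define NPROC 4
    #define N 1000

    int main(void) {
        /* Shared arena visible to the parent and all forked children. */
        double *sum = mmap(NULL, NPROC * sizeof(double), PROT_READ | PROT_WRITE,
                           MAP_SHARED | MAP_ANONYMOUS, -1, 0);
        for (int p = 0; p < NPROC; ++p) {
            if (fork() == 0) {                 /* child: one coarse-grain group */
                double s = 0.0;
                for (int i = p; i < N; i += NPROC) s += (double)i;
                sum[p] = s;                    /* communicate via shared memory */
                _exit(0);
            }
        }
        for (int p = 0; p < NPROC; ++p) wait(NULL);
        double total = 0.0;
        for (int p = 0; p < NPROC; ++p) total += sum[p];
        printf("total = %g\n", total);         /* 499500 = sum of 0..999 */
        return 0;
    }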

  16. Serial order working memory and numerical ordinal processing share common processes and predict arithmetic abilities.

    PubMed

    Attout, Lucie; Majerus, Steve

    2017-09-12

    Recent studies have demonstrated that both ordinal number processing and serial order working memory (WM) abilities predict calculation achievement. This raises the question of shared ordinal processes operating in both the numerical and WM domains. We explored this question by assessing the interrelations between numerical ordinal, serial order WM, and arithmetic abilities in 102 7- to 9-year-old children. We replicated previous studies showing that ordinal numerical judgement and serial order WM predict arithmetic abilities. Furthermore, we showed that ordinal numerical judgement abilities predict arithmetic abilities after controlling for serial order WM abilities, while the relationship between serial order WM and arithmetic abilities was mediated by numerical ordinal judgement performance. We discuss these results in the light of recent theoretical frameworks which consider that numerical ordinal codes support the coding of order information in verbal WM. Statement of contribution: What is already known on this subject? Numerical ordinal processes predict mathematical achievement in adults, and order WM processing predicts early mathematical abilities. What does the present study add? Numerical ordinal processes predict mathematical achievement in children independently of order WM, and the link between order WM and mathematical abilities is mediated by long-term ordinal processes. © 2017 The British Psychological Society.

  1. Parallel computational steering for HPC applications using HDF5 files in distributed shared memory.

    PubMed

    Biddiscombe, John; Soumagne, Jerome; Oger, Guillaume; Guibert, David; Piccinali, Jean-Guillaume

    2012-06-01

    Interfacing a GUI-driven visualization/analysis package to an HPC application enables a supercomputer to be used as an interactive instrument. We achieve this by replacing the IO layer in the HDF5 library with a custom driver which transfers data in parallel between simulation and analysis. Our implementation, using ParaView as the interface, allows a flexible combination of parallel simulation, concurrent parallel analysis, and GUI client, either on the same or separate machines. Each MPI job may use different core counts or hardware configurations, allowing fine tuning of the amount of resources dedicated to each part of the workload. By making use of a distributed shared memory file, one may read data from the simulation, modify it using ParaView pipelines, and write it back to be reused by the simulation (or vice versa). This allows not only simple parameter changes, but complete remeshing of grids, or operations involving regeneration of field values over the entire domain. To avoid the problem of manually customizing the GUI for each application that is to be steered, we make use of XML templates that describe outputs from the simulation (and inputs back to it) to automatically generate GUI controls for manipulating the simulation.
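
    The mechanism relies on the fact that HDF5 selects its IO layer through a file-access property list, so a custom driver can be slotted in where a stock one is named. In the hedged sketch below, the standard in-memory "core" driver stands in for the authors' custom DSM driver, which is not shown:

    #include <hdf5.h>

    int main(void) {
        hid_t fapl = H5Pcreate(H5P_FILE_ACCESS);
        /* Swap the IO layer: 1 MiB allocation increments, no disk backing.
         * A custom virtual file driver would be installed here instead. */
        H5Pset_fapl_core(fapl, 1 << 20, 0);

        hid_t file = H5Fcreate("steering.h5", H5F_ACC_TRUNC, H5P_DEFAULT, fapl);
        hsize_t dims[1] = {8};
        hid_t space = H5Screate_simple(1, dims, NULL);
        hid_t dset = H5Dcreate2(file, "/field", H5T_NATIVE_DOUBLE, space,
                                H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT);
        double field[8] = {0};
        H5Dwrite(dset, H5T_NATIVE_DOUBLE, H5S_ALL, H5S_ALL, H5P_DEFAULT, field);

        H5Dclose(dset); H5Sclose(space); H5Fclose(file); H5Pclose(fapl);
        return 0;
    }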

  2. Modeling the Performance of the Concert Multiprocessor.

    DTIC Science & Technology

    1987-05-01

  3. The fault-tolerant multiprocessor computer

    NASA Technical Reports Server (NTRS)

    Smith, T. B., III (Editor); Lala, J. H. (Editor); Goldberg, J. (Editor); Kautz, W. H. (Editor); Melliar-Smith, P. M. (Editor); Green, M. W. (Editor); Levitt, K. N. (Editor); Schwartz, R. L. (Editor); Weinstock, C. B. (Editor); Palumbo, D. L. (Editor)

    1986-01-01

    The development and evaluation of fault-tolerant computer architectures and software-implemented fault tolerance (SIFT) for use in advanced NASA vehicles and potentially in flight-control systems are described in a collection of previously published reports prepared for NASA. Topics addressed include the principles of fault-tolerant multiprocessor (FTMP) operation; processor and slave regional designs; FTMP executive, facilities, acceptance-test/diagnostic, applications, and support software; FTMP reliability and availability models; SIFT hardware design; and SIFT validation and verification.

  4. Bibliography On Multiprocessors And Distributed Processing

    NASA Technical Reports Server (NTRS)

    Miya, Eugene N.

    1988-01-01

    The Multiprocessor and Distributed Processing Bibliography package consists of a large machine-readable bibliographic database which, in addition to the usual keyword searches, is used for producing citations, indexes, and cross-references. The database contains UNIX(R) "refer"-formatted ASCII data, is implemented on any computer running the UNIX(R) operating system, and is easily convertible to other operating systems. It requires approximately one megabyte of secondary storage. The bibliography was compiled in 1985.
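
    For readers unfamiliar with the "refer" format, each record is a block of percent-keyed fields; the entry below is an invented illustration, not one taken from the actual database:

        %A Doe, J.
        %T An example survey of multiprocessor scheduling
        %J Example Journal of Parallel Processing
        %D 1985
        %K multiprocessor, distributed processing, bibliography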

  5. Thread mapping using system-level model for shared memory multicores

    NASA Astrophysics Data System (ADS)

    Mitra, Reshmi

    Exploring thread-to-core mapping options for a parallel application on a multicore architecture is computationally very expensive. For the same algorithm, the mapping strategy (MS) with the best response time may change with data size and thread count. The primary challenge is to design a fast, accurate and automatic framework for exploring these MSs for large data-intensive applications, so that users can explore the design space within reasonable machine hours without a thorough understanding of how the code interacts with the platform. Response time is related to the cycles per instruction retired (CPI), taking into account both active and sleep states of the pipeline. This work establishes a hybrid approach, based on a Markov Chain Model (MCM) and a Model Tree (MT), for system-level steady-state CPI prediction. It is designed for shared memory multicore processors with coarse-grained multithreading. Thread status is represented by the MCM states, and program characteristics are modeled as the transition probabilities, representing the system moving between active and suspended thread states. The MT model extrapolates these probabilities for the actual application size (AS) from the performance at smaller sizes. This aspect of the framework, along with the use of mathematical expressions for the actual-AS performance information, results in a tremendous reduction in CPI prediction time. The framework is validated using an electromagnetics application. The average prediction error for steady-state CPI across 12 different MSs is less than 1%. The total run time of the model is on the order of minutes, whereas the actual application execution time is in terms of days.
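
    One simple reading of such a model, reduced to a toy two-state chain (the transition probabilities and CPI value below are invented for illustration): the stationary occupancy of the active state follows directly from the transition probabilities, and suspended cycles inflate the effective CPI.

    #include <stdio.h>

    int main(void) {
        double p_as = 0.10;         /* P(active -> suspended) per cycle */
        double p_sa = 0.40;         /* P(suspended -> active) per cycle */
        double cpi_active = 1.25;   /* CPI while the pipeline is active */

        /* Stationary distribution of the two-state Markov chain. */
        double pi_active = p_sa / (p_as + p_sa);

        /* Instructions retire only in the active state, so suspended
         * cycles inflate the effective CPI. */
        double cpi_eff = cpi_active / pi_active;
        printf("active fraction %.3f, effective CPI %.3f\n", pi_active, cpi_eff);
        return 0;
    }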

  6. Energy efficient low power shared-memory Fast Fourier Transform (FFT) processor with dynamic voltage scaling

    NASA Astrophysics Data System (ADS)

    Fitrio, D.; Singh, J.; Stojcevski, A.

    2005-12-01

    Reduction of power dissipation in CMOS circuits needs to be addressed for portable battery-powered devices. Selection of an appropriate transistor library to minimise leakage current, implementation of low-power design architectures, power management, and the choice of chip packaging all affect power dissipation and are important considerations in the design and implementation of integrated circuits for low-power applications. An energy-efficient architecture is highly desirable for battery-operated systems, which operate under a wide variety of scenarios. An energy-efficient design aims to reconfigure its own architecture to scale down energy consumption depending upon the throughput and quality requirement: the system should be able to decide its minimum power requirements by dynamically scaling its operating frequency, supply voltage or threshold voltage according to the operating scenario. The increasing demand for application-specific integrated circuits and processors for independent portable devices has led designers to implement dedicated processors with ultra-low power requirements. One such dedicated processor is the Fast Fourier Transform (FFT) processor, which is widely used in signal processing for numerous applications, such as wireless telecommunication and biomedical applications, where the demand for extended battery life is extremely high. This paper presents the design and performance analysis of a low-power shared-memory FFT processor incorporating dynamic voltage scaling, which enables the power supply to be scaled across various voltage levels. The concept behind the proposed solution is that the speed of the main logic core can be adjusted according to the input load, so that the processor computes "just enough" to meet the requirement. The design was implemented using 0.12 μm ST-Microelectronics 6-metal-layer CMOS dual-process technology in the Cadence analogue design environment.
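
    The leverage behind dynamic voltage scaling is that dynamic CMOS power goes as P = C·V²·f, so lowering voltage and frequency together pays off cubically; a back-of-envelope C calculation (all values illustrative, not from the paper):

    #include <stdio.h>

    int main(void) {
        double C = 1e-9;            /* switched capacitance (F), illustrative */
        double V = 1.2, f = 200e6;  /* nominal supply (V) and clock (Hz)     */
        double P_full = C * V * V * f;
        /* Halve both voltage and frequency: */
        double P_dvs = C * (V / 2) * (V / 2) * (f / 2);
        printf("P_full = %.1f mW, P_dvs = %.1f mW (%.0fx lower)\n",
               P_full * 1e3, P_dvs * 1e3, P_full / P_dvs);  /* 8x lower */
        return 0;
    }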

  7. An Analysis of an Improved Bus-Based Multiprocessor Architecture

    NASA Technical Reports Server (NTRS)

    Ricks, Kenneth G.; Wells, B. Earl

    1998-01-01

    This paper analyses the effectiveness of a hybrid multiprocessing/multicomputing architecture that is based upon a single-board-computer multiprocessor (SBCM) architecture. Based upon empirical analysis using discrete event simulations and Monte Carlo techniques, this hybrid architecture, called the enhanced single-board-computer multiprocessor (ESBCM), is shown to have improved performance and scalability characteristics over current SBCM designs.

  8. Development and Validation of a Hierarchical Memory Model Incorporating CPU- and Memory-Operation Overlap

    SciTech Connect

    Lubeck, Olaf M.; Luo, Yong; Wasserman, Harvey J.; Bassetti, Federico

    1997-12-31

    Distributed shared memory architectures (DSMs) such as the Origin 2000 are being implemented which extend the concept of single-processor cache hierarchies across an entire physically-distributed multiprocessor machine. The scalability of a DSM machine is inherently tied to memory hierarchy performance, including such issues as latency hiding techniques in the architecture, global cache-coherence protocols, memory consistency models and, of course, the inherent locality of reference in algorithms of interest. In this paper, we characterize application performance with a "memory-centric" view. Using a simple mean value analysis (MVA) strategy and empirical performance data, we infer the contribution of each level in the memory system to the application's overall cycles per instruction (cpi). We account for the overlap of processor execution with memory accesses - a key parameter which is not directly measurable on the Origin systems. We infer the separate contributions of three major architecture features in the memory subsystem of the Origin 2000: cache size, outstanding loads-under-miss, and memory latency.
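
    The accounting style described can be sketched as follows (illustrative numbers; the paper infers the per-level terms and the overlap factor from measured data rather than assuming them):

    #include <stdio.h>

    int main(void) {
        double cpi0 = 1.0;               /* ideal (infinite-cache) CPI       */
        double miss_per_instr[3] = {0.05, 0.01, 0.002};  /* L1, L2, memory   */
        double latency[3]        = {10.0, 60.0, 300.0};  /* cycles per miss  */
        double overlap = 0.3;            /* fraction of miss latency hidden
                                            under processor execution        */
        double cpi = cpi0;
        for (int lvl = 0; lvl < 3; ++lvl)
            cpi += miss_per_instr[lvl] * latency[lvl] * (1.0 - overlap);
        printf("cpi = %.3f\n", cpi);     /* each level's contribution is
                                            visible as one term of the sum   */
        return 0;
    }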

  9. Communication-Driven Codesign for Multiprocessor Systems

    DTIC Science & Technology

    2004-01-01

    Two related graph-theoretic models are examined: the interprocessor communication graph (IPC graph) GIPC [101, 102] and a companion synchronization model. Given a multiprocessor schedule for an application graph G, GIPC is derived by instantiating a vertex for each task and connecting an edge from each task to the task that succeeds it on the same processor; for each edge (x, y) in G that connects tasks executing on different processors, an IPC edge is instantiated in GIPC from x to y.

  10. Partitioning of regular computation on multiprocessor systems

    NASA Technical Reports Server (NTRS)

    Lee, Fung Fung

    1988-01-01

    Problem partitioning of regular computation over two dimensional meshes on multiprocessor systems is examined. The regular computation model considered involves repetitive evaluation of values at each mesh point with local communication. The computational workload and the communication pattern are the same at each mesh point. The regular computation model arises in numerical solutions of partial differential equations and simulations of cellular automata. Given a communication pattern, a systematic way to generate a family of partitions is presented. The influence of various partitioning schemes on performance is compared on the basis of computation to communication ratio.
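
    A worked instance of the computation-to-communication comparison for a nearest-neighbor (5-point) pattern, a standard textbook calculation consistent with, but not taken from, the paper: square blocks communicate over four edges of length N/sqrt(P), while row strips communicate over two full-width boundaries.

    #include <math.h>
    #include <stdio.h>

    int main(void) {
        double N = 1024, P = 64;
        double work = N * N / P;                 /* mesh points per processor  */
        double comm_block = 4.0 * (N / sqrt(P)); /* perimeter of a square block */
        double comm_strip = 2.0 * N;             /* two full-width boundaries   */
        printf("block ratio %.1f, strip ratio %.1f\n",
               work / comm_block, work / comm_strip);   /* 32.0 vs 8.0 */
        return 0;
    }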

  11. Pipeline multiprocessor architecture for high speed cell image analysis

    SciTech Connect

    Castleman, K.R.; Price, K.H.; Eskenazi, R.; Ovadya, M.M.; Navon, M.A.

    1983-10-01

    A pipeline multiple-microprocessor architecture for high-speed digital image processing is being developed. The goal is a compact, fast, and low-cost pap smear analyzer for cervical cancer detection. Each processor communicates with one or two upstream processors and from one to 13 downstream processors via shared memory. Each of the identical pipeline modules (PC boards) has a Motorola MC6809 microprocessor with a 2-megabyte memory management unit, two 64-kbyte dual-port image memories (shared with upstream processors) and one 64-kbyte dual-port program memory (shared with a host computer). Intermodule communication is achieved by ribbon cables connected to connectors at the top of the boards. This allows considerable flexibility in configuring the system. This architecture should facilitate efficient (fast, low-cost) implementations of complex single-purpose image processing systems.

  12. Parallel algorithms for geometric connected component labeling on a hypercube multiprocessor

    NASA Technical Reports Server (NTRS)

    Belkhale, K. P.; Banerjee, P.

    1992-01-01

    Different algorithms for the geometric connected component labeling (GCCL) problem are defined, each of which involves d stages of message passing for a d-dimensional hypercube. The major idea is that in each stage a hypercube multiprocessor increases its knowledge of the domain. The algorithms under consideration include the QUAD algorithm for a small number of processors and the Overlap Quad algorithm for a large number of processors, subject to the locality of the connected sets. These algorithms differ in their run time, memory requirements, and message complexity. They were implemented on an Intel iPSC2/D4/MX hypercube.
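
    The d-stage structure common to such algorithms pairs each node, in stage k, with the neighbor whose id differs in bit k; a minimal C illustration of the pairing (the per-stage merge of partial labelings is elided):

    #include <stdio.h>

    int main(void) {
        int d = 4;                         /* 16-node hypercube */
        int rank = 5;                      /* example node id   */
        for (int k = 0; k < d; ++k) {
            int partner = rank ^ (1 << k); /* flip bit k */
            printf("stage %d: node %d exchanges with node %d\n", k, rank, partner);
            /* merge_components(rank, partner); -- grows knowledge of domain */
        }
        return 0;
    }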

  13. A multiprocessor airborne lidar data system

    NASA Technical Reports Server (NTRS)

    Wright, C. W.; Bailey, S. A.; Heath, G. E.; Piazza, C. R.

    1988-01-01

    A new multiprocessor data acquisition system was developed for the existing Airborne Oceanographic Lidar (AOL). This implementation simultaneously utilizes five single-board 68010 microcomputers, the UNIX System V operating system, and the real-time executive VRTX. The original data acquisition system was implemented on a Hewlett Packard HP 21-MX 16-bit minicomputer using a multi-tasking real-time operating system and a mixture of assembly and FORTRAN languages. The present collection of data sources produces data at widely varied rates and requires varied amounts of burdensome real-time processing and formatting. It was decided to replace the aging HP 21-MX minicomputer with a multiprocessor system. A new and flexible recording format was devised and implemented to accommodate the constantly changing sensor configuration. A central feature of this data system is the minimization of non-remote-sensing bus traffic. Therefore, it is highly desirable that each micro be capable of functioning as much as possible on-card or via private peripherals. The bus is used primarily for the transfer of remote sensing data to or from the buffer queue.

  14. Generalized multiprocessor scheduling for directed acyclic graphs

    SciTech Connect

    Prasanna, G.N.S.; Musicus, B.R.

    1994-12-31

    This paper considerably extends the multiprocessor scheduling techniques in the authors' previous work, and applies them to matrix arithmetic compilation. In that paper, they presented several new results in the theory of homogeneous multiprocessor scheduling. A directed acyclic graph (DAG) of tasks is to be scheduled. Tasks are assumed to be parallelizable - as more processors are applied to a task, the time taken to compute it decreases, yielding some speedup. Because of communication, synchronization, and task scheduling overhead, this speedup increases less than linearly with the number of processors applied. The optimal scheduling problem is to determine the number of processors assigned to each task, and the task sequencing, to minimize the finishing time. Using optimal control theory, in the special case where the speedup function of each task is p^alpha, where p is the amount of processing power applied to the task, a closed form solution for task graphs formed from parallel and series connections was derived. This paper extends these results to arbitrary DAGs. The optimality conditions impose nonlinear constraints on the flow of processing power from predecessors to successors, and on the finishing times of siblings. This paper presents a fast algorithm for determining and solving these nonlinear equations. The algorithm utilizes the structure of the finishing time equations to efficiently run a conjugate gradient minimization, leading to the optimal solution. The algorithm has been tested on a variety of DAGs. The results presented show that it is superior to alternative heuristic approaches.
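
    A worked instance of the series-parallel special case (illustrative numbers, not from the paper): for two tasks run side by side with execution time w/p^alpha, equalizing finishing times fixes the processor split at p1/p2 = (w1/w2)^(1/alpha).

    #include <math.h>
    #include <stdio.h>

    int main(void) {
        double P = 64.0, alpha = 0.8;     /* processors to share, speedup exponent */
        double w1 = 300.0, w2 = 100.0;    /* work of the two parallel tasks        */

        double r = pow(w1 / w2, 1.0 / alpha);   /* optimal ratio p1 = r * p2 */
        double p2 = P / (1.0 + r), p1 = P - p2;
        printf("p1 = %.2f, p2 = %.2f\n", p1, p2);
        printf("t1 = %.3f, t2 = %.3f (equal by construction)\n",
               w1 / pow(p1, alpha), w2 / pow(p2, alpha));
        return 0;
    }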

  15. Distinct and shared cognitive functions mediate event- and time-based prospective memory impairment in normal ageing

    PubMed Central

    Gonneaud, Julie; Kalpouzos, Grégoria; Bon, Laetitia; Viader, Fausto; Eustache, Francis; Desgranges, Béatrice

    2011-01-01

    Prospective memory (PM) is the ability to remember to perform an action at a specific point in the future. Regarded as multidimensional, PM involves several cognitive functions that are known to be impaired in normal aging. In the present study, we set out to investigate the cognitive correlates of PM impairment in normal aging. Manipulating cognitive load, we assessed event- and time-based PM, as well as several cognitive functions, including executive functions, working memory and retrospective episodic memory, in healthy subjects covering the entire adulthood. We found that normal aging was characterized by PM decline in all conditions and that event-based PM was more sensitive to the effects of aging than time-based PM. Whatever the conditions, PM was linked to inhibition and processing speed. However, while event-based PM was mainly mediated by binding and retrospective memory processes, time-based PM was mainly related to inhibition. The only distinction between high- and low-load PM cognitive correlates lays in an additional, but marginal, correlation between updating and the high-load PM condition. The association of distinct cognitive functions, as well as shared mechanisms with event- and time-based PM confirms that each type of PM relies on a different set of processes. PMID:21678154

  16. Associative-memory representations emerge as shared spatial patterns of theta activity spanning the primate temporal cortex

    PubMed Central

    Nakahara, Kiyoshi; Adachi, Ken; Kawasaki, Keisuke; Matsuo, Takeshi; Sawahata, Hirohito; Majima, Kei; Takeda, Masaki; Sugiyama, Sayaka; Nakata, Ryota; Iijima, Atsuhiko; Tanigawa, Hisashi; Suzuki, Takafumi; Kamitani, Yukiyasu; Hasegawa, Isao

    2016-01-01

    Highly localized neuronal spikes in primate temporal cortex can encode associative memory; however, whether memory formation involves area-wide reorganization of ensemble activity, which often accompanies rhythmicity, or just local microcircuit-level plasticity, remains elusive. Using high-density electrocorticography, we capture local-field potentials spanning the monkey temporal lobes, and show that the visual pair-association (PA) memory is encoded in spatial patterns of theta activity in areas TE, 36, and, partially, in the parahippocampal cortex, but not in the entorhinal cortex. The theta patterns elicited by learned paired associates are distinct between pairs, but similar within pairs. This pattern similarity, emerging through novel PA learning, allows a machine-learning decoder trained on theta patterns elicited by a particular visual item to correctly predict the identity of those elicited by its paired associate. Our results suggest that the formation and sharing of widespread cortical theta patterns via learning-induced reorganization are involved in the mechanisms of associative memory representation. PMID:27282247

  17. Insertion of coherence requests for debugging a multiprocessor

    DOEpatents

    Blumrich, Matthias A.; Salapura, Valentina

    2010-02-23

    A method and system are disclosed to insert coherence events in a multiprocessor computer system, and to present those coherence events to the processors of the multiprocessor computer system for analysis and debugging purposes. The coherence events are inserted in the computer system by adding one or more special insert registers. By writing into the insert registers, coherence events are inserted in the multiprocessor system as if they were generated by the normal coherence protocol. Once these coherence events are processed, the processing of coherence events can continue in the normal operation mode.

  1. Multi-core and Many-core Shared-memory Parallel Raycasting Volume Rendering Optimization and Tuning

    SciTech Connect

    Howison, Mark

    2012-01-31

    Given the computing industry trend of increasing processing capacity by adding more cores to a chip, the focus of this work is tuning the performance of a staple visualization algorithm, raycasting volume rendering, for shared-memory parallelism on multi-core CPUs and many-core GPUs. Our approach is to vary tunable algorithmic settings, along with known algorithmic optimizations and two different memory layouts, and measure performance in terms of absolute runtime and L2 memory cache misses. Our results indicate there is a wide variation in runtime performance on all platforms, as much as 254% for the tunable parameters we test on multi-core CPUs and 265% on many-core GPUs, and the optimal configurations vary across platforms, often in a non-obvious way. For example, our results indicate the optimal configurations on the GPU occur at a crossover point between those that maintain good cache utilization and those that saturate computational throughput. This result is likely to be extremely difficult to predict with an empirical performance model for this particular algorithm because it has an unstructured memory access pattern that varies locally for individual rays and globally for the selected viewpoint. Our results also show that optimal parameters on modern architectures are markedly different from those in previous studies run on older architectures. And, given the dramatic performance variation across platforms for both optimal algorithm settings and performance results, there is a clear benefit for production visualization and analysis codes to adopt a strategy for performance optimization through auto-tuning. These benefits will likely become more pronounced in the future as the number of cores per chip and the cost of moving data through the memory hierarchy both increase.

  2. Shared and distinct contributions of rostrolateral prefrontal cortex to analogical reasoning and episodic memory retrieval.

    PubMed

    Westphal, Andrew J; Reggente, Nicco; Ito, Kaori L; Rissman, Jesse

    2016-03-01

    Rostrolateral prefrontal cortex (RLPFC) is widely appreciated to support higher cognitive functions, including analogical reasoning and episodic memory retrieval. However, these tasks have typically been studied in isolation, and thus it is unclear whether they involve common or distinct RLPFC mechanisms. Here, we introduce a novel functional magnetic resonance imaging (fMRI) task paradigm to compare brain activity during reasoning and memory tasks while holding bottom-up perceptual stimulation and response demands constant. Univariate analyses on fMRI data from twenty participants identified a large swath of left lateral prefrontal cortex, including RLPFC, that showed common engagement on reasoning trials with valid analogies and memory trials with accurately retrieved source details. Despite broadly overlapping recruitment, multi-voxel activity patterns within left RLPFC reliably differentiated these two trial types, highlighting the presence of at least partially distinct information processing modes. Functional connectivity analyses demonstrated that while left RLPFC showed consistent coupling with the fronto-parietal control network across tasks, its coupling with other cortical areas varied in a task-dependent manner. During the memory task, this region strengthened its connectivity with the default mode and memory retrieval networks, whereas during the reasoning task it coupled more strongly with a nearby left prefrontal region (BA 45) associated with semantic processing, as well as with a superior parietal region associated with visuospatial processing. Taken together, these data suggest a domain-general role for left RLPFC in monitoring and/or integrating task-relevant knowledge representations and showcase how its function cannot solely be attributed to episodic memory or analogical reasoning computations. © 2015 Wiley Periodicals, Inc.

  3. Real-time implementations of image segmentation algorithms on shared memory multicore architecture: a survey (Conference Presentation)

    NASA Astrophysics Data System (ADS)

    Akil, Mohamed

    2017-05-01

    Real-time processing is getting more and more important in many image processing applications, and image segmentation is one of the most fundamental tasks in image analysis. As a consequence, many different approaches to image segmentation have been proposed. The watershed transform is a well-known image segmentation tool and a very data-intensive task. To accelerate watershed algorithms toward real-time processing, parallel architectures and programming models for multicore computing have been developed. This paper presents a survey of approaches for parallel implementation of sequential watershed algorithms on multicore general-purpose CPUs: homogeneous multicore processors with shared memory. To achieve an efficient parallel implementation, it is necessary to explore different strategies (parallelization/distribution/distributed scheduling) combined with different acceleration and optimization techniques to enhance parallelism. We compare various parallelizations of sequential watershed algorithms on shared memory multicore architectures, analyze the performance measurements of each parallel implementation, and assess the impact of the different sources of overhead on performance. In this comparison study, we also discuss the advantages and disadvantages of the parallel programming models, comparing OpenMP (an application programming interface for multiprocessing) with Pthreads (POSIX Threads) to illustrate the impact of each parallel programming model on the performance of the parallel implementations.

  4. SHARE and Share Alike

    ERIC Educational Resources Information Center

    Baird, Jeffrey Marshall

    2006-01-01

    This article describes a reading comprehension program adopted at J. E. Cosgriff Memorial Catholic School in Salt Lake City, Utah. The program is called SHARE: Students Helping Achieve Reading Excellence, and involves seventh and eighth grade students teaching first and second graders reading comprehension strategies learned in middle school…

  5. Shared Etiology of Phonological Memory and Vocabulary Deficits in School-Age Children

    ERIC Educational Resources Information Center

    Peterson, Robin L.; Pennington, Bruce F.; Samuelsson, Stefan; Byrne, Brian; Olson, Richard K.

    2013-01-01

    Purpose: The goal of this study was to investigate the etiologic basis for the association between deficits in phonological memory (PM) and vocabulary in school-age children. Method: Children with deficits in PM or vocabulary were identified within the International Longitudinal Twin Study (ILTS; Samuelsson et al., 2005). The ILTS includes 1,045…

  6. VME rollback hardware for time warp multiprocessor systems

    NASA Technical Reports Server (NTRS)

    Robb, Michael J.; Buzzell, Calvin A.

    1992-01-01

    The purpose of the research effort is to develop and demonstrate innovative hardware to implement specific rollback and timing functions required for efficient queue management and precision timekeeping in multiprocessor discrete event simulations. The previously completed phase 1 effort demonstrated the technical feasibility of building hardware modules which eliminate the state saving overhead of the Time Warp paradigm used in distributed simulations on multiprocessor systems. The current phase 2 effort will build multiple pre-production rollback hardware modules integrated with a network of Sun workstations, and the integrated system will be tested by executing a Time Warp simulation. The rollback hardware will be designed to interface with the greatest number of multiprocessor systems possible. The authors believe that the rollback hardware will provide for significant speedup of large scale discrete event simulation problems and allow multiprocessors using Time Warp to dramatically increase performance.

  7. Hardware for a real-time multiprocessor simulator

    NASA Technical Reports Server (NTRS)

    Blech, R. A.; Arpasi, D. J.

    1984-01-01

    The hardware for a real time multiprocessor simulator (RTMPS) developed at the NASA Lewis Research Center is described. The RTMPS is a multiple microprocessor system used to investigate the application of parallel processing concepts to real time simulation. It is designed to provide flexible data exchange paths between processors by using off the shelf microcomputer boards and minimal customized interfacing. A dedicated operator interface allows easy setup of the simulator and quick interpreting of simulation data. Simulations for the RTMPS are coded in a NASA designed real time multiprocessor language (RTMPL). This language is high level and geared to the multiprocessor environment. A real time multiprocessor operating system (RTMPOS) has also been developed that provides a user friendly operator interface. The RTMPS and supporting software are currently operational and are being evaluated at Lewis. The results of this evaluation will be used to specify the design of an optimized parallel processing system for real time simulation of dynamic systems.

  8. Fault diagnosis in sparse multiprocessor systems

    NASA Technical Reports Server (NTRS)

    Blough, Douglas M.; Sullivan, Gregory F.; Masson, Gerald M.

    1988-01-01

    The problem of fault diagnosis in multiprocessor systems is considered under a uniformly probabilistic model in which processors are faulty with probability p. This work focuses on minimizing the number of tests that must be conducted in order to correctly diagnose the state of every processor in the system with high probability. A diagnosis algorithm is presented that can correctly diagnose the state of every processor with probability approaching one in a class of systems performing slightly more than a linear number of tests. A nearly matching lower bound on the number of tests required to achieve correct diagnosis in arbitrary systems is also proven. The number of tests required under this probabilistic model is shown to be significantly less than under a bounded-size fault set model. Because the number of tests that must be conducted is a measure of the diagnosis overhead, these results represent a dramatic improvement in the performance of system-level diagnosis techniques.

  9. Neural substrates of shared attention as social memory: A hyperscanning functional magnetic resonance imaging study.

    PubMed

    Koike, Takahiko; Tanabe, Hiroki C; Okazaki, Shuntaro; Nakagawa, Eri; Sasaki, Akihiro T; Shimada, Koji; Sugawara, Sho K; Takahashi, Haruka K; Yoshihara, Kazufumi; Bosch-Bayard, Jorge; Sadato, Norihiro

    2016-01-15

    During a dyadic social interaction, two individuals can share visual attention through gaze, directed to each other (mutual gaze) or to a third person or an object (joint attention). Shared attention is fundamental to dyadic face-to-face interaction, but how attention is shared, retained, and neurally represented in a pair-specific manner has not been well studied. Here, we conducted a two-day hyperscanning functional magnetic resonance imaging study in which pairs of participants performed a real-time mutual gaze task followed by a joint attention task on the first day, and mutual gaze tasks several days later. The joint attention task enhanced eye-blink synchronization, which is believed to be a behavioral index of shared attention. When the same participant pairs underwent mutual gaze without joint attention on the second day, enhanced eye-blink synchronization persisted, and this was positively correlated with inter-individual neural synchronization within the right inferior frontal gyrus. Neural synchronization was also positively correlated with enhanced eye-blink synchronization during the previous joint attention task session. Consistent with the Hebbian association hypothesis, the right inferior frontal gyrus had been activated both by initiating and responding to joint attention. These results indicate that shared attention is represented and retained by pair-specific neural synchronization that cannot be reduced to the individual level. Copyright © 2015 The Authors. Published by Elsevier Inc. All rights reserved.

  11. File-System Workload on a Scientific Multiprocessor

    NASA Technical Reports Server (NTRS)

    Kotz, David; Nieuwejaar, Nils

    1995-01-01

    Many scientific applications have intense computational and I/O requirements. Although multiprocessors have permitted astounding increases in computational performance, the formidable I/O needs of these applications cannot be met by current multiprocessors and their I/O subsystems. To prevent I/O subsystems from forever bottlenecking multiprocessors and limiting the range of feasible applications, new I/O subsystems must be designed. The successful design of computer systems (both hardware and software) depends on a thorough understanding of their intended use. A system designer optimizes the policies and mechanisms for the cases expected to be most common in the user's workload. In the case of multiprocessor file systems, however, designers have been forced to build file systems based only on speculation about how they would be used, extrapolating from file-system characterizations of general-purpose workloads on uniprocessor and distributed systems or scientific workloads on vector supercomputers (see sidebar on related work). To help these system designers, in June 1993 we began the Charisma Project, so named because the project sought to characterize I/O in scientific multiprocessor applications from a variety of production parallel computing platforms and sites. The Charisma project is unique in recording individual read and write requests in live, multiprogramming, parallel workloads (rather than from selected or nonparallel applications). In this article, we present the first results from the project: a characterization of the file-system workload on an iPSC/860 multiprocessor running production, parallel scientific applications at NASA's Ames Research Center.

  12. LDRD final report: managing shared memory data distribution in hybrid HPC applications.

    SciTech Connect

    Merritt, Alexander M.; Pedretti, Kevin Thomas Tauke

    2010-09-01

    MPI is the dominant programming model for distributed memory parallel computers, and is often used as the intra-node programming model on multi-core compute nodes. However, application developers are increasingly turning to hybrid models that use threading within a node and MPI between nodes. In contrast to MPI, most current threaded models do not require application developers to deal explicitly with data locality. With increasing core counts and deeper NUMA hierarchies seen in the upcoming LANL/SNL 'Cielo' capability supercomputer, data distribution imposes an upper bound on intra-node scalability within threaded applications. Data locality therefore has to be identified at runtime using static memory allocation policies such as first-touch or next-touch, or specified by the application user at launch time. We evaluate several existing techniques for managing data distribution using micro-benchmarks on an AMD 'Magny-Cours' system with 24 cores among 4 NUMA domains and argue for the adoption of a dynamic runtime system implemented at the kernel level, employing a novel page table replication scheme to gather per-NUMA domain memory access traces.
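
    As a concrete illustration of the first-touch policy mentioned above: a page is physically placed on the NUMA domain of the thread that first writes it, so initializing an array with the same loop schedule as the later compute phase keeps most accesses local. The OpenMP sketch below is a minimal example of the idiom, not code from the report.

        /* First-touch placement sketch: initialize with the same static
           schedule used by the compute loop, so each page is first
           touched (and therefore allocated) on the NUMA domain of the
           thread that will later use it.  Compile with -fopenmp. */
        #include <stdlib.h>

        #define N (1 << 24)

        int main(void)
        {
            double *a = malloc(N * sizeof *a);

            /* Parallel first touch: pages land near their future users. */
            #pragma omp parallel for schedule(static)
            for (long i = 0; i < N; i++)
                a[i] = 0.0;

            /* Compute phase, same schedule: accesses stay mostly local. */
            #pragma omp parallel for schedule(static)
            for (long i = 0; i < N; i++)
                a[i] = 2.0 * a[i] + 1.0;

            free(a);
            return 0;
        }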

  13. Autobiographical Memory Sharing in Everyday Life: Characteristics of a Good Story

    ERIC Educational Resources Information Center

    Baron, Jacqueline M.; Bluck, Susan

    2009-01-01

    Storytelling is a ubiquitous human activity that occurs across the lifespan as part of everyday life. Studies from three disparate literatures suggest that older adults (as compared to younger adults) are (a) less likely to recall story details, (b) more likely to go off-target when sharing stories, and, in contrast, (c) more likely to receive…

  14. Shared Memory Performance of Multi-Computer Terminals in Distributed Information Systems.

    ERIC Educational Resources Information Center

    Reddi, Arumalla V.

    1984-01-01

    Presents a system model for transmission of input data that is coming from terminals of users in a limited user resource-sharing environment. Performance of a mini/microcomputer receiving mixture of picture-phone terminal data is analyzed with constant service times, synchronous transmission, and single-server interruptions through first-order…

  15. Multis: a new class of multiprocessor computers.

    PubMed

    Bell, C G

    1985-04-26

    Multis are a new class of computers based on multiple microprocessors. The small size, low cost, and high performance of microprocessors allow the design and construction of computer structures that offer significant advantages in manufacture, price-performance ratio, and reliability over traditional computer families. Currently, commercial multis consist of 4 to 28 modules, which include microprocessors, common memories, and input-output devices, all of which communicate through a single set of wires called a bus. Adding microprocessors together increases the performance of multis in direct proportion to their price and allows multis to offer a performance range spanning small minicomputers to mainframe computers. Multis are commercially available for applications ranging from real-time industrial control to transaction processing. Traditional batch, time-sharing, and transaction systems process a number of independent jobs that can be distributed among the microprocessors of a multi with a resulting increased throughput (number of jobs completed per unit of time). Many scientific applications (such as the solving of partial differential equations) and engineering applications (such as the checking of integrated circuit designs) are sped up by this parallel computation; thus, multis produce results at supercomputer speed but at a fraction of the cost. Multis are likely to be the basis for the next (fifth) generation of computers: a generation based on parallel processing.

  16. A class hierarchical, object-oriented approach to virtual memory management

    NASA Technical Reports Server (NTRS)

    Russo, Vincent F.; Campbell, Roy H.; Johnston, Gary M.

    1989-01-01

    The Choices family of operating systems exploits class hierarchies and object-oriented programming to facilitate the construction of customized operating systems for shared memory and networked multiprocessors. The software is being used in the Tapestry laboratory to study the performance of algorithms, mechanisms, and policies for parallel systems. Described here are the architectural design and class hierarchy of the Choices virtual memory management system. The software and hardware mechanisms and policies of a virtual memory system implement a memory hierarchy that exploits the trade-off between response times and storage capacities. In Choices, the notion of a memory hierarchy is captured by abstract classes. Concrete subclasses of those abstractions implement a virtual address space, segmentation, paging, physical memory management, secondary storage, and remote (that is, networked) storage. Captured in the notion of a memory hierarchy are classes that represent memory objects. These classes provide a storage mechanism that contains encapsulated data and have methods to read or write the memory object. Each of these classes provides specializations to represent the memory hierarchy.
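
    Choices itself is written in C++, but the encapsulated memory-object style described above can be approximated in plain C with explicit method pointers. The sketch below, with invented names, shows one concrete "subclass" of the hierarchy backed by physical memory; disk- or network-backed objects would supply different read/write implementations behind the same interface.

        /* Rough C analogue of the memory-object abstraction: storage is
           encapsulated behind read/write methods, and each concrete
           "subclass" supplies its own implementations.  Names invented. */
        #include <stddef.h>
        #include <string.h>

        typedef struct MemoryObject MemoryObject;
        struct MemoryObject {
            int (*read)(MemoryObject *, size_t off, void *buf, size_t len);
            int (*write)(MemoryObject *, size_t off, const void *buf, size_t len);
            void  *store;    /* backing store: RAM, disk, remote node... */
            size_t size;
        };

        /* Concrete "subclass": an object backed by physical memory. */
        static int ram_read(MemoryObject *o, size_t off, void *buf, size_t len)
        {
            if (off + len > o->size) return -1;
            memcpy(buf, (char *)o->store + off, len);
            return 0;
        }

        static int ram_write(MemoryObject *o, size_t off, const void *buf, size_t len)
        {
            if (off + len > o->size) return -1;
            memcpy((char *)o->store + off, buf, len);
            return 0;
        }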

  17. Shared representations for working memory and mental imagery in early visual cortex.

    PubMed

    Albers, Anke Marit; Kok, Peter; Toni, Ivan; Dijkerman, H Chris; de Lange, Floris P

    2013-08-05

    Early visual areas contain specific information about visual items maintained in working memory, suggesting a role for early visual cortex in more complex cognitive functions [1-4]. It is an open question, however, whether these areas also underlie the ability to internally generate images de novo (i.e., mental imagery). Research on mental imagery has to this point focused mostly on whether mental images activate early sensory areas, with mixed results [5-7]. Recent studies suggest that multivariate pattern analysis of neural activity patterns in visual regions can reveal content-specific representations during cognitive processes, even though overall activation levels are low [1-4]. Here, we used this approach [8, 9] to study item-specific activity patterns in early visual areas (V1-V3) when these items are internally generated. We could reliably decode stimulus identity from neural activity patterns in early visual cortex during both working memory and mental imagery. Crucially, these activity patterns resembled those evoked by bottom-up visual stimulation, suggesting that mental images are indeed "perception-like" in nature. These findings suggest that the visual cortex serves as a dynamic "blackboard" [10, 11] that is used during both bottom-up stimulus processing and top-down internal generation of mental content. Copyright © 2013 Elsevier Ltd. All rights reserved.

  18. Exploration of SMP-Aware DAO Memory Performance Issues: Final Report 2002

    SciTech Connect

    de Supinski, B R; Yoo, A; McKee, S A; Schulz, M; Mohan, T

    2003-02-04

    The performance of many LLNL applications is dominated by the cost of main memory accesses. Worse, many current trends in computer architecture will lead to substantial degradation of the percentage of peak performance obtained by these codes. This project yields novel techniques that alleviate this problem in SMP-based systems, which are common at LLNL. Further, our techniques will complement other emerging mechanisms for improving memory system performance, such as processor-in-memory. The exploration of existing dynamic access ordering (DAO) mechanisms adapted to SMPs and the development of new memory performance optimization techniques will lead to significant improvements in run times for LLNL applications on future computing platforms, effectively increasing the size of the platform. In this project, we have focused on a range of techniques to overcome the performance bottleneck of current multiprocessor systems and to increase single-node efficiency. These efforts include the design and implementation of a toolset to analyze memory access patterns of applications, the exploration of regularity metrics and their use to classify code behavior, and a set of microbenchmarks to assess and quantify the performance of SMP memory systems. We will make these tools available to the general laboratory user community to help evaluate and optimize LLNL applications. In addition, we explored the use of DAO techniques in the realm of shared-memory multiprocessors. The most critical part of the latter is the need to maintain coherence among reordered accesses due to possible aliasing. We have worked on several design alternatives to guarantee consistency in such systems without changing the user environment. This guarantees that such novel memory systems will be directly applicable to existing and future HPC codes at LLNL.

  19. Memory performance of Prolog architectures

    SciTech Connect

    Tick, E.

    1988-01-01

    Memory Performance of Prolog Architectures addresses these problems, reporting the dynamic data- and instruction-referencing characteristics of both sequential and parallel Prolog architectures and the corresponding uniprocessor and multiprocessor memory-hierarchy performance tradeoffs. Computer designers and logic programmers will find this work to be a valuable reference with many practical applications. Memory Performance of Prolog Architectures will also serve as an important textbook for graduate-level courses in computer architecture and/or performance analysis.

  1. A fault-tolerant multiprocessor architecture for aircraft, volume 1. [autopilot configuration]

    NASA Technical Reports Server (NTRS)

    Smith, T. B.; Hopkins, A. L.; Taylor, W.; Ausrotas, R. A.; Lala, J. H.; Hanley, L. D.; Martin, J. H.

    1978-01-01

    A fault-tolerant multiprocessor architecture is reported. This architecture, together with a comprehensive information system architecture, has important potential for future aircraft applications. A preliminary definition and assessment of a suitable multiprocessor architecture for such applications is developed.

  2. Memory Benchmarks for SMP-Based High Performance Parallel Computers

    SciTech Connect

    Yoo, A B; de Supinski, B; Mueller, F; Mckee, S A

    2001-11-20

    As the speed gap between CPU and main memory continues to grow, memory accesses increasingly dominate the performance of many applications. The problem is particularly acute for symmetric multiprocessor (SMP) systems, where the shared memory may be accessed concurrently by a group of threads running on separate CPUs. Unfortunately, several key issues governing memory system performance in current systems are not well understood. Complex interactions between the levels of the memory hierarchy, buses or switches, DRAM back-ends, system software, and application access patterns can make it difficult to pinpoint bottlenecks and determine appropriate optimizations, and the situation is even more complex for SMP systems. To partially address this problem, we formulated a set of multi-threaded microbenchmarks for characterizing and measuring the performance of the underlying memory system in SMP-based high-performance computers. We report our use of these microbenchmarks on two important SMP-based machines. This paper has four primary contributions. First, we introduce a microbenchmark suite to systematically assess and compare the performance of different levels in SMP memory hierarchies. Second, we present a new tool based on hardware performance monitors to determine a wide array of memory system characteristics, such as cache sizes, quickly and easily; by using this tool, memory performance studies can be targeted to the full spectrum of performance regimes with many fewer data points than is otherwise required. Third, we present experimental results indicating that the performance of applications with large memory footprints remains largely constrained by memory. Fourth, we demonstrate that thread-level parallelism further degrades memory performance, even for the latest SMPs with hardware prefetching and switch-based memory interconnects.
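
    A classic building block of such microbenchmarks is dependent pointer chasing: every load's address depends on the previous load, which defeats hardware prefetching and exposes the latency of whichever level of the hierarchy holds the working set. The single-threaded C sketch below illustrates the idea; the actual suite is multi-threaded and far more careful about timing.

        /* Pointer-chasing latency sketch: walk a randomly permuted
           cyclic ring so each load depends on the previous one.
           Varying WORKING_SET exposes successive cache levels. */
        #include <stdio.h>
        #include <stdlib.h>
        #include <time.h>

        #define WORKING_SET (1 << 24)                /* bytes */
        #define NODES (WORKING_SET / sizeof(void *))

        int main(void)
        {
            void   **ring = malloc(NODES * sizeof *ring);
            size_t  *perm = malloc(NODES * sizeof *perm);

            /* Fisher-Yates shuffle builds a random cyclic permutation. */
            for (size_t i = 0; i < NODES; i++) perm[i] = i;
            srand(1);
            for (size_t i = NODES - 1; i > 0; i--) {
                size_t j = rand() % (i + 1), tmp = perm[i];
                perm[i] = perm[j]; perm[j] = tmp;
            }
            for (size_t i = 0; i < NODES; i++)
                ring[perm[i]] = &ring[perm[(i + 1) % NODES]];

            clock_t t0 = clock();
            void **p = &ring[perm[0]];
            for (size_t i = 0; i < NODES; i++)
                p = *p;                             /* dependent loads */
            double ns = 1e9 * (clock() - t0) / CLOCKS_PER_SEC / NODES;
            printf("~%.1f ns per load (end %p)\n", ns, (void *)p);

            free(ring); free(perm);
            return 0;
        }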

  3. The change probability effect: incidental learning, adaptability, and shared visual working memory resources.

    PubMed

    van Lamsweerde, Amanda E; Beck, Melissa R

    2011-12-01

    Statistical properties in the visual environment can be used to improve performance on visual working memory (VWM) tasks. The current study examined the ability to incidentally learn that a change is more likely to occur to a particular feature dimension (shape, color, or location) and use this information to improve change detection performance for that dimension (the change probability effect). Participants completed a change detection task in which one change type was more probable than others. Change probability effects were found for color and shape changes, but not location changes, and intentional strategies did not improve the effect. Furthermore, the change probability effect developed and adapted to new probability information quickly. Finally, in some conditions, an improvement in change detection performance for a probable change led to an impairment in change detection for improbable changes.

  4. Directions in parallel programming: HPF, shared virtual memory and object parallelism in pC++

    NASA Technical Reports Server (NTRS)

    Bodin, Francois; Priol, Thierry; Mehrotra, Piyush; Gannon, Dennis

    1994-01-01

    Fortran and C++ are the dominant programming languages used in scientific computation. Consequently, extensions to these languages are the most popular for programming massively parallel computers. We discuss two such approaches to parallel Fortran and one approach to C++. The High Performance Fortran Forum has designed HPF with the intent of supporting data parallelism in Fortran 90 applications. HPF works by asking the user to help the compiler distribute and align the data structures with the distributed memory modules in the system. Fortran-S takes a different approach in which the data distribution is managed by the operating system and the user provides annotations to indicate parallel control regions. In the case of C++, we look at pC++, which is based on a concurrent aggregate parallel model.

  5. Memory

    MedlinePlus

    ... it has to decide what is worth remembering. Memory is the process of storing and then remembering this information. There are different types of memory. Short-term memory stores information for a few ...

  6. Prefetching in file systems for MIMD multiprocessors

    NASA Technical Reports Server (NTRS)

    Kotz, David F.; Ellis, Carla Schlatter

    1990-01-01

    The question of whether prefetching blocks of the file into the block cache can effectively reduce overall execution time of a parallel computation, even under favorable assumptions, is considered. Experiments have been conducted with an interleaved file system testbed on the Butterfly Plus multiprocessor. Results of these experiments suggest that (1) the hit ratio, the accepted measure in traditional caching studies, may not be an adequate measure of performance when the workload consists of parallel computations and parallel file access patterns, (2) caching with prefetching can significantly improve the hit ratio and the average time to perform an I/O (input/output) operation, and (3) an improvement in overall execution time has been observed in most cases. In spite of these gains, prefetching sometimes results in increased execution times (a negative result, given the optimistic nature of the study). The authors explore why it is not trivial to translate savings on individual I/O requests into consistently better overall performance and identify the key problems that need to be addressed in order to improve the potential of prefetching techniques in the environment.
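
    The mechanism under study, fetching the next block of the file into the block cache as soon as a sequential pattern is detected, can be sketched as follows; every interface here is an invented stub, not the testbed's API.

        /* Skeletal one-block-lookahead prefetch on sequential access.
           The cache and I/O layers are stand-in stubs. */
        #include <stdbool.h>
        #include <stdio.h>

        enum { BLOCK_SIZE = 4096 };

        typedef struct { int fd; long last_block; } OpenFile;

        /* Stubs standing in for a real block cache and disk I/O. */
        static bool cache_lookup(int fd, long blk, char *buf)
        { (void)fd; (void)blk; (void)buf; return false; }
        static void fetch_block(int fd, long blk, char *buf)
        { (void)fd; (void)blk; (void)buf; }
        static void fetch_block_async(int fd, long blk)
        { (void)fd; printf("prefetch block %ld\n", blk); }

        void read_block(OpenFile *f, long blk, char *buf)
        {
            if (!cache_lookup(f->fd, blk, buf))   /* miss: demand fetch */
                fetch_block(f->fd, blk, buf);

            /* Sequential pattern detected: start fetching the next
               block now, so a later read hits in the cache instead of
               waiting on I/O. */
            if (blk == f->last_block + 1)
                fetch_block_async(f->fd, blk + 1);
            f->last_block = blk;
        }

        int main(void)
        {
            OpenFile f = { 3, -1 };
            char buf[BLOCK_SIZE];
            for (long b = 0; b < 4; b++)          /* sequential reads */
                read_block(&f, b, buf);
            return 0;
        }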

  7. FTMP (Fault Tolerant Multiprocessor) programmer's manual

    NASA Technical Reports Server (NTRS)

    Feather, F. E.; Liceaga, C. A.; Padilla, P. A.

    1986-01-01

    The Fault Tolerant Multiprocessor (FTMP) computer system was constructed using the Rockwell/Collins CAPS-6 processor. It is installed in the Avionics Integration Research Laboratory (AIRLAB) of NASA Langley Research Center. It is hosted by AIRLAB's System 10, a VAX 11/750, for the loading of programs and experimentation. The FTMP support software includes a cross compiler for a high level language called Automated Engineering Design (AED) System, an assembler for the CAPS-6 processor assembly language, and a linker. Access to this support software is through an automated remote access facility on the VAX which relieves the user of the burden of learning how to use the IBM 4381. This manual is a compilation of information about the FTMP support environment. It explains the FTMP software and support environment along with many of the finer points of running programs on FTMP. This will be helpful to the researcher trying to run an experiment on FTMP and even to the person probing FTMP with fault injections. Much of the information in this manual can be found in other sources; we are only attempting to bring together the basic points in a single source. If the reader should need points clarified, there is a list of support documentation in the back of this manual.

  8. Memory.

    ERIC Educational Resources Information Center

    McKean, Kevin

    1983-01-01

    Discusses current research (including that involving amnesiacs and snails) into the nature of the memory process, differentiating between and providing examples of "fact" memory and "skill" memory. Suggests that three brain parts (thalamus, fornix, mammillary body) are involved in the memory process. (JN)

  9. Multidisciplinary Simulation Acceleration using Multiple Shared-Memory Graphical Processing Units

    NASA Astrophysics Data System (ADS)

    Kemal, Jonathan Yashar

    For purposes of optimizing and analyzing turbomachinery and other designs, the unsteady Favre-averaged flow-field differential equations for an ideal compressible gas can be solved in conjunction with the heat conduction equation. We solve all equations using the finite-volume multiple-grid numerical technique, with the dual time-step scheme used for unsteady simulations. Our numerical solver code targets CUDA-capable Graphical Processing Units (GPUs) produced by NVIDIA. Making use of MPI, our solver can run across networked compute nodes, where each MPI process can use either a GPU or a Central Processing Unit (CPU) core for primary solver calculations. We use NVIDIA Tesla C2050/C2070 GPUs based on the Fermi architecture, and compare our resulting performance against Intel Xeon X5690 CPUs. Solver routines converted to CUDA typically run about 10 times faster on a GPU for sufficiently dense computational grids. We used a conjugate cylinder computational grid and ran a turbulent steady flow simulation using 4 increasingly dense computational grids. Our densest computational grid is divided into 13 blocks each containing 1033x1033 grid points, for a total of 13.87 million grid points or 1.07 million grid points per domain block. To obtain overall speedups, we compare the execution time of the solver's iteration loop, including all resource-intensive GPU-related memory copies. Comparing the performance of 8 GPUs to that of 8 CPUs, we obtain an overall speedup of about 6.0 when using our densest computational grid. This amounts to an 8-GPU simulation running about 39.5 times faster than a single-CPU simulation.
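
    The gap between the roughly 10x per-routine GPU speedup and the 6.0 overall speedup is what Amdahl's law predicts when part of the iteration loop, such as the GPU-related memory copies mentioned above, is not accelerated. The back-of-the-envelope C check below, under the simplifying assumption of a uniform 10x speedup on an accelerated fraction f, shows that an overall speedup of 6.0 corresponds to f of roughly 0.93.

        /* Amdahl's-law check (illustrative, not from the dissertation):
           overall = 1 / ((1 - f) + f / k) for accelerated fraction f
           and per-routine speedup k. */
        #include <stdio.h>

        int main(void)
        {
            const double k = 10.0;           /* per-routine GPU speedup */
            for (int i = 90; i <= 96; i++) {
                double f = i / 100.0;        /* accelerated fraction    */
                printf("f = %.2f -> overall speedup %.2f\n",
                       f, 1.0 / ((1.0 - f) + f / k));
            }
            return 0;
        }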

  10. Evaluation of the Cedar memory system: Configuration of 16 by 16

    NASA Technical Reports Server (NTRS)

    Gallivan, K.; Jalby, W.; Wijshoff, H.

    1991-01-01

    Basic results on the performance of the Cedar multiprocessor system are presented. Empirical results for the 16-processor, 16-memory-bank configuration show the behavior of the Cedar system under different modes of operation.

  11. Real-Time Multiprocessor Programming Language (RTMPL) user's manual

    NASA Technical Reports Server (NTRS)

    Arpasi, D. J.

    1985-01-01

    A real-time multiprocessor programming language (RTMPL) has been developed to provide for high-order programming of real-time simulations on systems of distributed computers. RTMPL is a structured, engineering-oriented language. The RTMPL utility supports a variety of multiprocessor configurations and types by generating assembly language programs according to user-specified targeting information. Many programming functions are assumed by the utility (e.g., data transfer and scaling) to reduce the programming chore. This manual describes RTMPL from a user's viewpoint. Source generation, applications, utility operation, and utility output are detailed. An example simulation is generated to illustrate many RTMPL features.

  12. T cell memory to evolutionarily conserved and shared hemagglutinin epitopes of H1N1 viruses: a pilot scale study

    PubMed Central

    2013-01-01

    Background The 2009 pandemic influenza was milder than expected. Based on the apparent lack of pre-existing cross-protective antibodies to the A (H1N1)pdm09 strain, it was hypothesized that pre-existing CD4+ T cellular immunity provided the crucial immunity that led to an attenuation of disease severity. We carried out a pilot-scale study by conducting in silico and in vitro T cellular assays in a healthy population, to evaluate the pre-existing immunity to the A (H1N1)pdm09 strain. Methods Large-scale epitope prediction analysis was done by examining the NCBI-available (H1N1) HA proteins. NetMHCIIpan, an epitope prediction tool, was used to identify the putative and shared CD4+ T cell epitopes between seasonal H1N1 and A (H1N1)pdm09 strains. To identify the immunogenicity of these putative epitopes, human IFN-γ-ELISPOT assays were conducted using the peripheral blood mononuclear cells from fourteen healthy human donors. All donors were screened for the HLA-DRB1 alleles. Results Epitope-specific CD4+ T cellular memory responses (IFN-γ) were generated to highly conserved HA epitopes from a majority of the donors (93%). A higher magnitude of CD4+ T cell responses was observed in the older adults. The study identified two HA2 immunodominant CD4+ T cell epitopes, of which one was found to be novel. Conclusions The current study provides compelling evidence of HA epitope-specific CD4+ T cellular memory towards the A (H1N1)pdm09 strain. These well-characterized epitopes could recruit alternative immunological pathways to overcome the challenge of annual seasonal flu vaccine escape. PMID:23641949

  13. Job-mix modeling and system analysis of an aerospace multiprocessor.

    NASA Technical Reports Server (NTRS)

    Mallach, E. G.

    1972-01-01

    An aerospace guidance computer organization, consisting of multiple processors and memory units attached to a central time-multiplexed data bus, is described. A job mix for this type of computer is obtained by analysis of Apollo mission programs. Multiprocessor performance is then analyzed using: 1) queuing theory, under certain 'limiting case' assumptions; 2) Markov process methods; and 3) system simulation. Results of the analyses indicate: 1) Markov process analysis is a useful and efficient predictor of simulation results; 2) efficient job execution is not seriously impaired even when the system is so overloaded that new jobs are inordinately delayed in starting; 3) job scheduling is significant in determining system performance; and 4) a system having many slow processors may or may not perform better than a system of equal power having few fast processors, but will not perform significantly worse.

  14. Development and evaluation of a Fault-Tolerant Multiprocessor (FTMP) computer. Volume 2: FTMP software

    NASA Technical Reports Server (NTRS)

    Lala, J. H.; Smith, T. B., III

    1983-01-01

    The software developed for the Fault-Tolerant Multiprocessor (FTMP) is described. The FTMP executive is a timer-interrupt driven dispatcher that schedules iterative tasks which run at 3.125, 12.5, and 25 Hz. Major tasks which run under the executive include system configuration control, flight control, and display. The flight control task includes autopilot and autoland functions for a jet transport aircraft. System Displays include status displays of all hardware elements (processors, memories, I/O ports, buses), failure log displays showing transient and hard faults, and an autopilot display. All software is in a higher order language (AED, an ALGOL derivative). The executive is a fully distributed general purpose executive which automatically balances the load among available processor triads. Provisions for graceful performance degradation under processing overload are an integral part of the scheduling algorithms.

  15. Selection in spatial working memory is independent of perceptual selective attention, but they interact in a shared spatial priority map.

    PubMed

    Hedge, Craig; Oberauer, Klaus; Leonards, Ute

    2015-11-01

    We examined the relationship between the attentional selection of perceptual information and of information in working memory (WM) through four experiments, using a spatial WM-updating task. Participants remembered the locations of two objects in a matrix and worked through a sequence of updating operations, each mentally shifting one dot to a new location according to an arrow cue. Repeatedly updating the same object in two successive steps is typically faster than switching to the other object; this object switch cost reflects the shifting of attention in WM. In Experiment 1, the arrows were presented in random peripheral locations, drawing perceptual attention away from the selected object in WM. This manipulation did not eliminate the object switch cost, indicating that the mechanisms of perceptual selection do not underlie selection in WM. Experiments 2a and 2b corroborated the independence of selection observed in Experiment 1, but showed a benefit to reaction times when the placement of the arrow cue was aligned with the locations of relevant objects in WM. Experiment 2c showed that the same benefit also occurs when participants are not able to mark an updating location through eye fixations. Together, these data can be accounted for by a framework in which perceptual selection and selection in WM are separate mechanisms that interact through a shared spatial priority map.

  16. Performance characterization and validation of ASCI applications: A memory centric view

    SciTech Connect

    Lubeck, O.M.; Luo, Y.; Wasserman, H.; Bassetti, F.

    1997-10-01

    Performance and scalability of high performance scientific applications on large scale parallel machines are more dependent on the hierarchical memory subsystems of these machines than the peak instruction rate of the processors employed. The dependence is likely to increase in the future. While single-processor performance may double every eighteen months, memory bandwidth increases by only 15% during the same period. In addition, distributed shared memory (DSM) architectures are now being implemented which extend the concept of single-processor cache hierarchies across an entire physically-distributed multi-processor machine. Machines which will be available to the Department of Energy's Accelerated Strategic Computing Initiative (ASCI) can have as many as 128 processors in a single DSM. Scalability of these machines to large numbers of processors is ultimately tied to issues of memory hierarchy performance, which includes data migration policies and distributed cache coherence protocols. Investigations of the performance improvements of applications over time and across new generations of machines must explicitly account for the effects of memory performance. In this paper, the authors characterize application performance with a memory-centric view. The applications are a representative part of the ASCI workload. Using a simple Mean Value Analysis (MVA) strategy and observed performance data, they infer the contribution of each level in the memory system to the application's overall performance in cycles per instruction (CPI). Their empirical model accounts for the overlap of processor execution with memory accesses.
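
    The memory-centric decomposition used in such an approach can be written, roughly, as CPI = CPI_0 + the sum over memory levels of (accesses per instruction to that level) x (that level's effective latency). The C sketch below evaluates the sum with made-up numbers purely to show the bookkeeping; it is not the paper's model or data.

        /* Sketch of a memory-centric CPI decomposition: each level
           contributes its per-instruction access frequency times its
           effective latency.  All numbers are made up. */
        #include <stdio.h>

        int main(void)
        {
            const char  *level[]   = { "L1", "L2", "local mem", "remote mem" };
            const double refs_pi[] = { 0.30, 0.040, 0.010, 0.002 };
            const double lat[]     = { 1.0, 10.0, 60.0, 200.0 };  /* cycles */
            double cpi = 0.8;                     /* ideal core CPI */

            for (int i = 0; i < 4; i++) {
                double c = refs_pi[i] * lat[i];
                printf("%-10s adds %.3f CPI\n", level[i], c);
                cpi += c;
            }
            printf("total CPI = %.3f\n", cpi);
            return 0;
        }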

  17. Hardware configuration for a real-time multiprocessor simulator

    NASA Technical Reports Server (NTRS)

    Blech, R. A.; Williams, A. D.

    1986-01-01

    The Real-Time Multiprocessor Simulator (RTMPS) is a multiple-microcomputer system used to investigate the application of parallel-processing concepts to real-time simulation. This user's manual describes the set-up and installation considerations for the RTMPS hardware. Any modifications or further improvements to the RTMPS hardware will be documented in an addendum to this manual.

  1. Software for event oriented processing on multiprocessor systems

    SciTech Connect

    Fischler, M.; Areti, H.; Biel, J.; Bracker, S.; Case, G.; Gaines, I.; Husby, D.; Nash, T.

    1984-08-01

    Computing intensive problems that require the processing of numerous essentially independent events are natural customers for large scale multi-microprocessor systems. This paper describes the software required to support users with such problems in a multiprocessor environment. It is based on experience with and development work aimed at processing very large amounts of high energy physics data.

  2. Fault tree models for fault tolerant hypercube multiprocessors

    NASA Technical Reports Server (NTRS)

    Boyd, Mark A.; Tuazon, Jezus O.

    1991-01-01

    Three candidate fault tolerant hypercube architectures are modeled, their reliability analyses are compared, and the resulting implications of these methods of incorporating fault tolerance into hypercube multiprocessors are discussed. In the course of performing the reliability analyses, the use of HARP and fault trees in modeling sequence dependent system behaviors is demonstrated.

  3. Techniques and tools for efficiently modeling multiprocessor systems

    NASA Technical Reports Server (NTRS)

    Carpenter, T.; Yalamanchili, S.

    1990-01-01

    System-level tools and methodologies associated with an integrated approach to the development of multiprocessor systems are examined. Tools for capturing initial program structure, automated program partitioning, automated resource allocation, and high-level modeling of the combined application and resource are discussed. The primary language focus of the current implementation is Ada, although the techniques should be appropriate for other programming paradigms.

  4. Characterizing parallel file-access patterns on a large-scale multiprocessor

    NASA Technical Reports Server (NTRS)

    Purakayastha, Apratim; Ellis, Carla Schlatter; Kotz, David; Nieuwejaar, Nils; Best, Michael

    1994-01-01

    Rapid increases in the computational speeds of multiprocessors have not been matched by corresponding performance enhancements in the I/O subsystem. To satisfy the large and growing I/O requirements of some parallel scientific applications, we need parallel file systems that can provide high-bandwidth and high-volume data transfer between the I/O subsystem and thousands of processors. Design of such high-performance parallel file systems depends on a thorough grasp of the expected workload. So far there have been no comprehensive usage studies of multiprocessor file systems. Our CHARISMA project intends to fill this void. The first results from our study involve an iPSC/860 at NASA Ames. This paper presents results from a different platform, the CM-5 at the National Center for Supercomputing Applications. The CHARISMA studies are unique because we collect information about every individual read and write request and about the entire mix of applications running on the machines. The results of our trace analysis lead to recommendations for parallel file system design. First, the file system should support efficient concurrent access to many files, and I/O requests from many jobs under varying load conditions. Second, it must efficiently manage large files kept open for long periods. Third, it should expect to see small requests, predominantly sequential access patterns, application-wide synchronous access, no concurrent file-sharing between jobs, appreciable byte and block sharing between processes within jobs, and strong interprocess locality. Finally, the trace data suggest that node-level write caches and collective I/O request interfaces may be useful in certain environments.

  5. Transactional Distributed Shared Memory

    DTIC Science & Technology

    1992-07-01

    Defense Advanced Research Projects Agency, Information Science and Technology Office, under the title Research on Parallel Computing issued by DARPA/CMO. ... [The systems discussed], except Bisiani et al., rely on a consistency model known as sequential consistency, after Lamport's definition [Lamport 79]: "A system is sequentially consistent if the result of any execution is the same as if the operations of all the processors were executed in some sequential order, and the operations of each individual processor appear in this sequence in the order specified by its program."

  6. Queueing analysis of a canonical model of real-time multiprocessors

    NASA Technical Reports Server (NTRS)

    Krishna, C. M.; Shin, K. G.

    1983-01-01

    A logical classification of multiprocessor structures from the point of view of control applications is presented, along with a computation of the response-time distribution for a canonical model of a real-time multiprocessor. The multiprocessor is approximated by a blocking model. Two separate models are derived: one from the system's point of view, and the other from the point of view of an incoming task.

  7. FTMP - A highly reliable Fault-Tolerant Multiprocessor for aircraft

    NASA Technical Reports Server (NTRS)

    Hopkins, A. L., Jr.; Smith, T. B., III; Lala, J. H.

    1978-01-01

    The FTMP (Fault-Tolerant Multiprocessor) is a complex multiprocessor computer that employs a form of redundancy related to systems considered by Mathur (1971), in which each major module can substitute for any other module of the same type. Despite the conceptual simplicity of the redundancy form, the implementation has many intricacies owing partly to the low target failure rate, and partly to the difficulty of eliminating single-fault vulnerability. An extensive analysis of the computer through the use of such modeling techniques as Markov processes and combinatorial mathematics shows that for random hard faults the computer can meet its requirements. It is also shown that the maintenance scheduled at intervals of 200 hr or more can be adequate most of the time.

  8. Analysis of a Multiprocessor Guidance Computer. Ph.D. Thesis

    NASA Technical Reports Server (NTRS)

    Mallach, E. G.

    1969-01-01

    The design of the next generation of spaceborne digital computers is described. It analyzes a possible multiprocessor computer configuration. For the analysis, a set of representative space computing tasks was abstracted from the Lunar Module Guidance Computer programs as executed during the lunar landing, from the Apollo program. This computer performs at this time about 24 concurrent functions, with iteration rates from 10 times per second to once every two seconds. These jobs were tabulated in a machine-independent form, and statistics of the overall job set were obtained. It was concluded, based on a comparison of simulation and Markov results, that the Markov process analysis is accurate in predicting overall trends and in configuration comparisons, but does not provide useful detailed information in specific situations. Using both types of analysis, it was determined that the job scheduling function is a critical one for efficiency of the multiprocessor. It is recommended that research into the area of automatic job scheduling be performed.

  9. Communication Complexity of the Gaussian Elimination Algorithm on Multiprocessors.

    DTIC Science & Technology

    1985-01-01

    8217 "ring, Technical Report 349. Computer Science Dpt, Yale University. 1984. [6] C. Kamath and A. Sameh , The preconditioned conjugate gradient algorithm...on a multiprocessor. Technical Report ANL/MCS-TM-28, Argonne National Lab., 1984. [7] D. Lawrie, A.H. Sameh . The Computation and Communication...Solution of Sparse Linear Systems : Models and Architectures, Technical Report 84-35, ICASE. 1984. [10] A.H. Sameh . Numerical Parallel Algorithms - A

  10. Plasma physics modeling and the Cray-2 multiprocessor

    SciTech Connect

    Killeen, J.

    1985-01-01

    The importance of computer modeling in the magnetic fusion energy research program is discussed. The need for the most advanced supercomputers is described. To meet the demand for more powerful scientific computers to solve larger and more complicated problems, the computer industry is developing multiprocessors. The role of the Cray-2 in plasma physics modeling is discussed with some examples. 28 refs., 2 figs., 1 tab.

  11. Self-Tuned Congestion Control for Multiprocessor Networks

    DTIC Science & Technology

    2005-01-01

    The technique applies to multiprocessor networks, including virtual cut-through [15] networks and wormhole networks [6, 5]. However, in this paper we evaluate the technique in the context of wormhole-switched k-ary n-cube networks. Simulation results for a 16-ary 2-cube (256-node network) show that our congestion control ... through the network. Each packet is composed of flits (flow control units) that are transferred between network nodes. Both wormhole routing and cut-through switching are considered.

  12. Modelling parallel programs and multiprocessor architectures with AXE

    NASA Technical Reports Server (NTRS)

    Yan, Jerry C.; Fineman, Charles E.

    1991-01-01

    AXE, An Experimental Environment for Parallel Systems, was designed to model and simulate parallel systems at the process level. It provides an integrated environment for specifying computation models, multiprocessor architectures, data collection, and performance visualization. AXE is being used at NASA-Ames for developing resource management strategies, parallel problem formulation, multiprocessor architectures, and operating system issues related to the High Performance Computing and Communications Program. AXE's simple, structured user interface enables the user to model parallel programs and machines precisely and efficiently. Its quick turn-around time keeps the user interested and productive. AXE models multicomputers. The user may easily modify various architectural parameters including the number of sites, connection topologies, and overhead for operating system activities. Parallel computations in AXE are represented as collections of autonomous computing objects known as players. Their use and behavior are described. Performance data of the multiprocessor model can be observed on a color screen. These include CPU and message routing bottlenecks, and the dynamic status of the software.

  13. Modeling and measurement of fault-tolerant multiprocessors

    NASA Technical Reports Server (NTRS)

    Shin, K. G.; Woodbury, M. H.; Lee, Y. H.

    1985-01-01

    The workload effects on computer performance are addressed first for a highly reliable unibus multiprocessor used in real-time control. As an approach to studying these effects, a modified Stochastic Petri Net (SPN) is used to describe the synchronous operation of the multiprocessor system. From this model the vital components affecting performance can be determined. However, because of the complexity in solving the modified SPN, a simpler model, i.e., a closed priority queuing network, is constructed that represents the same critical aspects. The use of this model for a specific application requires the partitioning of the workload into job classes. It is shown that the steady-state solution of the queuing model directly produces useful results. The use of this model in evaluating an existing system, the Fault Tolerant Multiprocessor (FTMP) at the NASA AIRLAB, is outlined with some experimental results. Also addressed is the technique of measuring fault latency, an important microscopic system parameter. Most related works have assumed no or a negligible fault latency and then performed approximate analyses. To eliminate this deficiency, a new methodology for indirectly measuring fault latency is presented.

  14. SYMNET: an optical interconnection network for scalable high-performance symmetric multiprocessors.

    PubMed

    Louri, Ahmed; Kodi, Avinash Karanth

    2003-06-10

    We address the primary limitation on the bandwidth available to satisfy the demand for address transactions in future cache-coherent symmetric multiprocessors (SMPs). It is widely known that the bus speed and the coherence overhead limit the snoop/address bandwidth needed to broadcast address transactions to all processors. As a solution, we propose a scalable address subnetwork called the symmetric multiprocessor network (SYMNET), in which address requests and snoop responses of SMPs are implemented optically. SYMNET not only has the ability to pipeline address requests, but also multiple address requests from different processors can propagate through the address subnetwork simultaneously. This is in contrast with all-electrical bus-based SMPs, where only a single request is broadcast on the physical address bus at any given point in time. The simultaneous propagation of multiple address requests in SYMNET increases the available address bandwidth and lowers the latency of the network, but the preservation of cache coherence can no longer be maintained with the usual fast snooping protocols. A modified snooping cache-coherence protocol, Coherence in SYMNET (COSYM), is introduced to solve the coherence problem. We evaluated SYMNET with a subset of Splash-2 benchmarks and compared it with the electrical bus-based MOESI (modified, owned, exclusive, shared, invalid) protocol. Our simulation studies have shown a 5-66% improvement in execution time for COSYM as compared with MOESI for various applications. Simulations have also shown that the average latency for a transaction to complete by use of the COSYM protocol was 5-78% better than with the MOESI protocol. SYMNET can scale up to hundreds of processors while still using fast snooping-based cache-coherence protocols, and additional performance gains may be attained with further improvement in optical device technology.
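
    For reference, the baseline MOESI protocol named above tracks each cache line in one of five states: modified, owned, exclusive, shared, or invalid. The C sketch below compresses the snoop-side transitions into a few lines, ignoring data movement, write-backs, and timing; it is a textbook simplification, not the paper's COSYM protocol.

        /* Compressed snoop-side MOESI transitions.  On a remote read,
           a dirty line (M or O) supplies data and stays owned; on a
           remote read-exclusive, every other copy is invalidated. */
        typedef enum { INVALID, SHARED, EXCLUSIVE, OWNED, MODIFIED } LineState;
        typedef enum { BUS_READ, BUS_READ_X } SnoopOp;

        LineState snoop(LineState s, SnoopOp op)
        {
            if (op == BUS_READ_X)         /* another cache will write */
                return INVALID;
            switch (s) {                  /* another cache reads      */
            case MODIFIED:
            case OWNED:     return OWNED; /* supply data, keep owning */
            case EXCLUSIVE: return SHARED;
            default:        return s;     /* SHARED/INVALID unchanged */
            }
        }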

  15. Memory System Technologies for Future High-End Computing Systems

    SciTech Connect

    McKee, S A; de Supinski, B R; Mueller, F; Tyson, G S

    2003-05-16

    Our ability to solve Grand Challenge Problems in computing hinges on the development of reliable and efficient High-End Computing systems. Unfortunately, the increasing gap between memory and processor speeds remains one of the major bottlenecks in modern architectures. Uniprocessor nodes still suffer, but symmetric multiprocessor nodes, where access to physical memory is shared among all processors, are among the hardest hit. In the latter case, the memory system must juggle multiple working sets and maintain memory coherence, on top of simply responding to access requests. To illustrate the severity of the current situation, consider two important examples: even the high-performance parallel supercomputers in use at Department of Energy National labs observe single-processor utilization rates as low as 5%, and transaction processing commercial workloads see utilizations of at most about 33%. A wealth of research demonstrates that traditional memory systems are incapable of bridging the processor/memory performance gap, and the problem continues to grow. The success of future High-End Computing platforms therefore depends on our developing hardware and software technologies to dramatically relieve the memory bottleneck. In order to take better advantage of the tremendous computing power of modern microprocessors and future High-End systems, we consider it crucial to develop the hardware for intelligent, adaptable memory systems; the middleware and OS modifications to manage them; and the compiler technology and performance tools to exploit them. Taken together, these will provide the foundations for meeting the requirements of future generations of performance-critical, parallel systems based on either uniprocessor or SMP nodes (including PIM organizations). We feel that such solutions should not be vendor-specific, but should be sufficiently general and adaptable such that the technologies could be leveraged by any commercial vendor of High-End Computing systems.

  16. Childhood trauma and negative memory bias as shared risk factors for psychopathology and comorbidity in a naturalistic psychiatric patient sample.

    PubMed

    Vrijsen, Janna N; van Amen, Camiel T; Koekkoek, Bauke; van Oostrom, Iris; Schene, Aart H; Tendolkar, Indira

    2017-06-01

    Both childhood trauma and negative memory bias are associated with the onset and severity level of several psychiatric disorders, such as depression and anxiety disorders. Studies on these risk factors, however, generally use homogeneous noncomorbid samples. Hence, studies in naturalistic psychiatric samples are lacking. Moreover, we know little about the quantitative relationship between the frequency of traumatic childhood events, strength of memory bias and number of comorbid psychiatric disorders; the latter being an index of severity. The current study examined the association of childhood trauma and negative memory bias with psychopathology in a large naturalistic psychiatric patient sample. The frequency of traumatic childhood events (emotional neglect and psychological, physical, and sexual abuse) was assessed using a questionnaire in a sample of 252 adult psychiatric patients with no psychotic or bipolar-I disorder and no cognitive disorder as main diagnosis. Patients were diagnosed for DSM-IV Axis-I and Axis-II disorders using a structured clinical interview. This allowed for the assessment of comorbidity between disorders. Negative memory bias for verbal stimuli was measured using a computer task. Linear regression models revealed that the frequency of childhood trauma as well as negative memory bias was positively associated with psychiatric comorbidity, separately and above and beyond each other (all p < .01). The results indicate that childhood trauma and negative memory bias may be of importance for a broader spectrum of psychiatric diagnoses, besides the frequently studied affective disorders. Importantly, frequently experiencing traumatic events during childhood increases the risk of comorbid psychiatric disorders.

  17. Sharing Songs with Children.

    ERIC Educational Resources Information Center

    Wolf, Jan

    2000-01-01

    Notes that songs can bridge generations, draw students and teachers together, and remain forever in a child's memory. Suggests the following approaches for early childhood educators: sing with children, share favorite songs with students, allow students to sing and share their favorite songs, and do not focus on students' musical memory. (LBT)

  18. Development and evaluation of a Fault-Tolerant Multiprocessor (FTMP) computer. Volume 3: FTMP test and evaluation

    NASA Technical Reports Server (NTRS)

    Lala, J. H.; Smith, T. B., III

    1983-01-01

    The experimental test and evaluation of the Fault-Tolerant Multiprocessor (FTMP) is described. Major objectives of this exercise include expanding the validation envelope, building confidence in the system, revealing any weaknesses in the architectural concepts and in their execution in hardware and software, and in general, stressing the hardware and software. To this end, pin-level faults were injected into one LRU of the FTMP and the FTMP response was measured in terms of fault detection, isolation, and recovery times. A total of 21,055 stuck-at-0, stuck-at-1 and invert-signal faults were injected in the CPU, memory, bus interface circuits, Bus Guardian Units, and voters and error latches. Of these, 17,418 were detected. At least 80 percent of undetected faults are estimated to be on unused pins. The multiprocessor identified all detected faults correctly and recovered successfully in each case. Total recovery time for all faults averaged a little over one second. This can be reduced to half a second by including appropriate self-tests.

  19. Memories.

    ERIC Educational Resources Information Center

    Brand, Judith, Ed.

    1998-01-01

    This theme issue of the journal "Exploring" covers the topic of "memories" and describes an exhibition at San Francisco's Exploratorium that ran from May 22, 1998 through January 1999 and that contained over 40 hands-on exhibits, demonstrations, artworks, images, sounds, smells, and tastes that demonstrated and depicted the biological,…

  2. A reconfigurable optoelectronic interconnect technology for multi-processor networks

    SciTech Connect

    Lu, Y.C.; Cheng, J.; Zolper, J.C.; Klem, J.

    1995-05-01

    This paper describes a new optical interconnect architecture and the integrated optoelectronic circuit technology for implementing a parallel, reconfigurable, multiprocessor network. The technology consists of monolithic arrays of optoelectronic switches that integrate vertical-cavity surface-emitting lasers with three-terminal heterojunction phototransistors, effectively combining the functions of an optical transceiver and an optical spatial routing switch. These switches have demonstrated optical switching at 200 Mb/s and electrical-to-optical data conversion at >500 Mb/s, with a small-signal electrical-to-optical modulation bandwidth of approximately 4 GHz.

  3. Shared Attention.

    PubMed

    Shteynberg, Garriy

    2015-09-01

    Shared attention is extremely common. In stadiums, public squares, and private living rooms, people attend to the world with others. Humans do so across all sensory modalities, sharing the sights, sounds, tastes, smells, and textures of everyday life with one another. The potential for attending with others has grown considerably with the emergence of mass media technologies, which allow for the sharing of attention in the absence of physical co-presence. In the last several years, studies have begun to outline the conditions under which attending together is consequential for human memory, motivation, judgment, emotion, and behavior. Here, I advance a psychological theory of shared attention, defining its properties as a mental state and outlining its cognitive, affective, and behavioral consequences. I review empirical findings that are uniquely predicted by shared-attention theory and discuss the possibility of integrating shared-attention, social-facilitation, and social-loafing perspectives. Finally, I reflect on what shared-attention theory implies for living in the digital world.

  4. CUDA optimization strategies for compute- and memory-bound neuroimaging algorithms.

    PubMed

    Lee, Daren; Dinov, Ivo; Dong, Bin; Gutman, Boris; Yanovsky, Igor; Toga, Arthur W

    2012-06-01

    As neuroimaging algorithms and technology continue to grow faster than CPU performance in complexity and image resolution, data-parallel computing methods will be increasingly important. The high performance, data-parallel architecture of modern graphical processing units (GPUs) can reduce computational times by orders of magnitude. However, its massively threaded architecture introduces challenges when GPU resources are exceeded. This paper presents optimization strategies for compute- and memory-bound algorithms for the CUDA architecture. For compute-bound algorithms, register pressure is reduced through variable reuse via shared memory, and data throughput is increased through heavier thread workloads and maximizing the thread configuration for a single thread block per multiprocessor. For memory-bound algorithms, fitting the data into the fast but limited GPU resources is achieved by reorganizing the data into self-contained structures and employing a multi-pass approach. Memory latencies are reduced by selecting memory resources whose cache performance is optimized for the algorithm's access patterns. We demonstrate the strategies on two computationally expensive algorithms and achieve optimized GPU implementations that perform up to 6× faster than unoptimized ones. Compared to CPU implementations, we achieve peak GPU speedups of 129× for the 3D unbiased nonlinear image registration technique and 93× for the non-local means surface denoising algorithm. Copyright © 2010 Elsevier Ireland Ltd. All rights reserved.

  5. CUDA Optimization Strategies for Compute- and Memory-Bound Neuroimaging Algorithms

    PubMed Central

    Lee, Daren; Dinov, Ivo; Dong, Bin; Gutman, Boris; Yanovsky, Igor; Toga, Arthur W.

    2011-01-01

    As neuroimaging algorithms and technology continue to grow faster than CPU performance in complexity and image resolution, data-parallel computing methods will be increasingly important. The high performance, data-parallel architecture of modern graphical processing units (GPUs) can reduce computational times by orders of magnitude. However, its massively threaded architecture introduces challenges when GPU resources are exceeded. This paper presents optimization strategies for compute- and memory-bound algorithms for the CUDA architecture. For compute-bound algorithms, register pressure is reduced through variable reuse via shared memory, and data throughput is increased through heavier thread workloads and maximizing the thread configuration for a single thread block per multiprocessor. For memory-bound algorithms, fitting the data into the fast but limited GPU resources is achieved by reorganizing the data into self-contained structures and employing a multi-pass approach. Memory latencies are reduced by selecting memory resources whose cache performance is optimized for the algorithm's access patterns. We demonstrate the strategies on two computationally expensive algorithms and achieve optimized GPU implementations that perform up to 6× faster than unoptimized ones. Compared to CPU implementations, we achieve peak GPU speedups of 129× for the 3D unbiased nonlinear image registration technique and 93× for the non-local means surface denoising algorithm. PMID:21159404
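
    The multi-pass strategy described in the two records above can be rendered generically in C: stage each tile of a large array into a small fast buffer, then compute from the staged copy. This is an analogy for illustration only (real CUDA kernels stage tiles in per-block __shared__ memory; the function, sizes, and stencil here are invented):

      #include <stdio.h>
      #include <string.h>

      #define TILE 256   /* hypothetical capacity of the fast local buffer */

      /* Stage each tile into a small buffer, then compute from the staged
       * copy: the generic shape of a multi-pass, memory-bound kernel.
       * Halo exchange at tile edges is omitted (edges are clamped). */
      static void smooth_tiled(const float *in, float *out, size_t n)
      {
          float tile[TILE];
          for (size_t base = 0; base < n; base += TILE) {
              size_t len = (n - base < TILE) ? n - base : TILE;
              memcpy(tile, in + base, len * sizeof *tile);   /* pass 1: load    */
              for (size_t i = 0; i < len; ++i) {             /* pass 2: compute */
                  size_t lo = i > 0 ? i - 1 : i;
                  size_t hi = i + 1 < len ? i + 1 : i;
                  out[base + i] = 0.25f * tile[lo] + 0.5f * tile[i]
                                + 0.25f * tile[hi];
              }
          }
      }

      int main(void)
      {
          enum { N = 1000 };
          static float in[N], out[N];
          for (int i = 0; i < N; ++i) in[i] = (float)i;
          smooth_tiled(in, out, N);
          printf("out[500] = %.1f\n", out[500]);   /* 500.0 for a linear ramp */
          return 0;
      }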

  6. Transactional memories: A new abstraction for parallel processing

    SciTech Connect

    Fasel, J.H.; Lubeck, O.M.; Agrawal, D.; Bruno, J.L.; El Abbadi, A.

    1997-12-01

    This is the final report of a three-year, Laboratory Directed Research and Development (LDRD) project at Los Alamos National Laboratory (LANL). Current distributed memory multiprocessor computer systems make the development of parallel programs difficult. From a programmer's perspective, it would be most desirable if the underlying hardware and software could provide the programming abstraction commonly referred to as sequential consistency--a single address space and multiple threads; but enforcement of sequential consistency limits opportunities for architectural and operating system performance optimizations, leading to poor performance. Recently, Herlihy and Moss have introduced a new abstraction called transactional memories for parallel programming. The programming model is shared memory with multiple threads. However, data consistency is obtained through the use of transactions rather than mutual exclusion based on locking. The transaction approach permits the underlying system to exploit the potential parallelism in transaction processing. The authors explore the feasibility of designing parallel programs using the transaction paradigm for data consistency and a barrier type of thread synchronization.
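
    Herlihy and Moss's transactional memory is a hardware proposal, but the programming style it enables (optimistic update with retry instead of locking) can be approximated with C11 atomics. A minimal single-word sketch; a real transaction would cover multiple locations:

      #include <stdatomic.h>
      #include <stdio.h>

      /* A one-word "transaction": read, compute, and attempt to commit
       * atomically; retry on conflict instead of holding a lock. */
      static void deposit(_Atomic long *balance, long amount)
      {
          long old = atomic_load(balance);
          while (!atomic_compare_exchange_weak(balance, &old, old + amount)) {
              /* 'old' was refreshed by the failed CAS; retry the commit. */
          }
      }

      int main(void)
      {
          _Atomic long acct = 100;
          deposit(&acct, 42);
          printf("balance = %ld\n", atomic_load(&acct));
          return 0;
      }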

  7. Scalable Multiprocessor for High-Speed Computing in Space

    NASA Technical Reports Server (NTRS)

    Lux, James; Lang, Minh; Nishimoto, Kouji; Clark, Douglas; Stosic, Dorothy; Bachmann, Alex; Wilkinson, William; Steffke, Richard

    2004-01-01

    A report discusses the continuing development of a scalable multiprocessor computing system for hard real-time applications aboard a spacecraft. Hard real-time applications here means applications, such as real-time radar signal processing, in which data are generated at hundreds of pulses per second, each pulse requiring millions of arithmetic operations. In these applications, the digital processors must be tightly integrated with analog instrumentation (e.g., radar equipment), and data input/output must be synchronized with analog instrumentation, controlled to within fractions of a microsecond. The scalable multiprocessor is a cluster of identical commercial-off-the-shelf generic DSP (digital-signal-processing) computers plus generic interface circuits, including analog-to-digital converters, all controlled by software. The processors are computers interconnected by high-speed serial links. Performance can be increased by adding hardware modules and correspondingly modifying the software. Work is distributed among the processors in a parallel or pipeline fashion by means of a flexible master/slave control and timing scheme. Each processor operates under its own local clock; synchronization is achieved by broadcasting master time signals to all the processors, which compute offsets between the master clock and their local clocks.
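
    The timing scheme described above (broadcast a master time signal and let each processor compute its offset) can be sketched with MPI standing in for the custom serial links. This is an analogy only; the rank layout and offset handling are invented:

      #include <mpi.h>
      #include <stdio.h>

      /* Each processor runs its own clock; rank 0 broadcasts a master
       * timestamp and every node records master-to-local offset. */
      int main(int argc, char **argv)
      {
          MPI_Init(&argc, &argv);
          int rank;
          MPI_Comm_rank(MPI_COMM_WORLD, &rank);

          double master_time = (rank == 0) ? MPI_Wtime() : 0.0;
          MPI_Bcast(&master_time, 1, MPI_DOUBLE, 0, MPI_COMM_WORLD);

          double offset = MPI_Wtime() - master_time;  /* local minus master */
          printf("rank %d: clock offset %.9f s (includes broadcast latency)\n",
                 rank, offset);
          MPI_Finalize();
          return 0;
      }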

  8. Experience with a Genetic Algorithm Implemented on a Multiprocessor Computer

    NASA Technical Reports Server (NTRS)

    Plassman, Gerald E.; Sobieszczanski-Sobieski, Jaroslaw

    2000-01-01

    Numerical experiments were conducted to find out the extent to which a Genetic Algorithm (GA) may benefit from a multiprocessor implementation, considering, on one hand, that analyses of individual designs in a population are independent of each other so that they may be executed concurrently on separate processors, and, on the other hand, that there are some operations in a GA that cannot be so distributed. The algorithm experimented with was based on a Gaussian distribution rather than bit exchange in the GA reproductive mechanism, and the test case was a hub frame structure of up to 1080 design variables. The experimentation, engaging up to 128 processors, confirmed expectations of radical elapsed-time reductions compared with a conventional single-processor implementation. It also demonstrated that the time spent in the non-distributable parts of the algorithm and the attendant cross-processor communication may have a very detrimental effect on the efficient utilization of the multiprocessor machine and on the number of processors that can be used effectively in a concurrent manner. Three techniques were devised and tested to mitigate that effect, resulting in efficiency increasing to exceed 99 percent.
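
    A hedged C/OpenMP sketch of the two properties the experiment exploits: independent per-design analyses (parallelizable) and Gaussian-perturbation reproduction (serial). The stand-in analyze() and all parameters are invented; the real structural analysis is far more elaborate:

      #include <math.h>
      #include <stdio.h>
      #include <stdlib.h>
      #include <omp.h>

      #define POP  64
      #define NVAR 16

      /* Stand-in for the expensive structural analysis of one design. */
      static double analyze(const double *x, int n)
      {
          double s = 0.0;
          for (int i = 0; i < n; ++i) s += (x[i] - 1.0) * (x[i] - 1.0);
          return s;  /* lower is better */
      }

      /* Box-Muller: Gaussian perturbation used in place of bit exchange. */
      static double gauss(double sigma)
      {
          const double TWO_PI = 6.283185307179586;
          double u1 = (rand() + 1.0) / (RAND_MAX + 2.0);
          double u2 = rand() / (RAND_MAX + 1.0);
          return sigma * sqrt(-2.0 * log(u1)) * cos(TWO_PI * u2);
      }

      int main(void)
      {
          static double pop[POP][NVAR], fit[POP];
          for (int i = 0; i < POP; ++i)
              for (int j = 0; j < NVAR; ++j) pop[i][j] = 2.0 * rand() / RAND_MAX;

          /* Analyses are independent -> parallel across processors. */
          #pragma omp parallel for
          for (int i = 0; i < POP; ++i)
              fit[i] = analyze(pop[i], NVAR);

          /* Non-distributable part: select the best design serially. */
          int best = 0;
          for (int i = 1; i < POP; ++i) if (fit[i] < fit[best]) best = i;

          /* Reproduce by Gaussian perturbation of the best design. */
          for (int i = 0; i < POP; ++i)
              for (int j = 0; j < NVAR; ++j)
                  pop[i][j] = pop[best][j] + gauss(0.1);

          printf("best fitness %.4f\n", fit[best]);
          return 0;
      }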

  9. Instrumentation, performance visualization, and debugging tools for multiprocessors

    NASA Technical Reports Server (NTRS)

    Yan, Jerry C.; Fineman, Charles E.; Hontalas, Philip J.

    1991-01-01

    The need for computing power has forced a migration from serial computation on a single processor to parallel processing on multiprocessor architectures. However, without effective means of monitoring (and visualizing) program execution, debugging and tuning parallel programs become intractably difficult as program complexity increases with the number of processors. Research on performance evaluation tools for multiprocessors is being carried out at ARC. Besides investigating new techniques for instrumenting, monitoring, and presenting the state of parallel program execution in a coherent and user-friendly manner, prototypes of software tools are being incorporated into the run-time environments of various hardware testbeds to evaluate their impact on user productivity. Our current tool set, the Ames Instrumentation Systems (AIMS), incorporates features from various software systems developed in academia and industry. The execution of FORTRAN programs on the Intel iPSC/860 can be automatically instrumented and monitored. Performance data collected in this manner can be displayed graphically on workstations supporting X-Windows. We have successfully compared various parallel algorithms for computational fluid dynamics (CFD) applications in collaboration with scientists from the Numerical Aerodynamic Simulation Systems Division. By performing these comparisons, we show that performance monitors and debuggers such as AIMS are practical and can illuminate the complex dynamics that occur within parallel programs.

  11. Multi-processor developments in the United States for future high energy physics experiments and accelerators

    SciTech Connect

    Gaines, I.

    1988-03-01

    The use of multi-processors for analysis and high-level triggering in High Energy Physics experiments, pioneered by the early emulator systems, has reached maturity, in particular with the multiple microprocessor systems in use at Fermilab. It is widely acknowledged that such systems will fulfill the major portion of the computing needs of future large experiments. Recent developments at Fermilab's Advanced Computer Program will make such systems even more powerful, cost-effective, and easier to use than they are at present. The next generation of microprocessors, already available, will provide CPU power of about one VAX 780 equivalent/$300, while supporting most VMS FORTRAN extensions and large (>8MB) amounts of memory. Low-cost, high-density mass storage devices (based on video tape cartridge technology) will allow parallel I/O to remove potential I/O bottlenecks in systems of over 1000 VAX-equivalent processors. New interconnection schemes and system software will allow more flexible topologies and extremely high data bandwidth, especially for on-line systems. This talk summarizes the work at the Advanced Computer Program and the rest of the US in this field. 3 refs., 4 figs.

  12. A Fault-Tolerant Multiprocessor for Real-Time Control Applications

    NASA Astrophysics Data System (ADS)

    Roberts, Thomas E.; Johnson, Barry W.

    1987-10-01

    This paper presents the design, analysis, and experimental evaluation of a fault-tolerant multiprocessor for use in systems requiring real-time, microprocessor-based control. Example applications of the fault-tolerant system are found in robotics, process control, manufacturing, and factory automation. The architecture for the multiprocessor is presented and analyzed for reliability, availability, and safety. A prototype of the fault-tolerant multiprocessor has been constructed, using Intel 8088 processors, and experimentally evaluated in the laboratory. Both hardware and software descriptions of the system are provided, and an example application to the control of an electric wheelchair is presented.

  13. Cache directory look-up re-use as conflict check mechanism for speculative memory requests

    DOEpatents

    Ohmacht, Martin

    2013-09-10

    In a cache memory, energy and other efficiencies can be realized by saving a result of a cache directory lookup for sequential accesses to a same memory address. Where the cache is a point of coherence for speculative execution in a multiprocessor system, with directory lookups serving as the point of conflict detection, such saving becomes particularly advantageous.
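
    The patent describes a hardware mechanism, but its core idea (reuse the saved result of a directory lookup while successive accesses hit the same line) has a simple software analogue in C. The types, fields, and stub lookup below are invented:

      #include <stdbool.h>
      #include <stdint.h>
      #include <stdio.h>

      typedef struct { bool shared; int owners; } DirEntry;

      static int full_lookups;   /* counts the expensive directory reads */

      /* Stand-in for the real (expensive) coherence-directory read. */
      static DirEntry directory_lookup(uint64_t line)
      {
          ++full_lookups;
          DirEntry e = { (line & 1) == 0, 2 };
          return e;
      }

      /* Reuse the saved result while consecutive accesses hit the same
       * line: the software analogue of the patent's lookup re-use. */
      static DirEntry lookup_cached(uint64_t line)
      {
          static uint64_t last = UINT64_MAX;
          static DirEntry saved;
          if (line != last) {
              saved = directory_lookup(line);
              last = line;
          }
          return saved;
      }

      int main(void)
      {
          for (int i = 0; i < 4; ++i) lookup_cached(0x40);  /* sequential hits */
          lookup_cached(0x80);
          printf("full lookups: %d for 5 accesses\n", full_lookups); /* 2 */
          return 0;
      }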

  14. Combined shared and distributed memory ab-initio computations of molecular-hydrogen systems in the correlated state: Process pool solution and two-level parallelism

    NASA Astrophysics Data System (ADS)

    Biborski, Andrzej; Kądzielawa, Andrzej P.; Spałek, Józef

    2015-12-01

    An efficient computational scheme devised for investigations of the ground state properties of electronically correlated systems is presented. As an example, the (H2)n chain is considered, with long-range electron-electron interactions taken into account. The implemented procedure covers: (i) single-particle Wannier wave-function basis construction in the correlated state, (ii) microscopic parameter calculation, and (iii) ground state energy optimization. The optimization loop is based on a highly effective process-pool solution with a dedicated root-workers approach. A hierarchical, two-level parallelism was applied: both shared (via Open Multi-Processing) and distributed (via Message Passing Interface) memory models were utilized. We discuss in detail how this approach yields a substantial increase in calculation speed, reaching a factor of 300 for the fully parallelized solution. The scheme, elaborated in detail, reflects the situation in which the most demanding task is the single-particle basis optimization.
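
    A minimal C/MPI sketch of a root-workers process pool of the kind the authors describe: the root hands independent tasks to idle workers and collects results. Task content, tags, and counts are invented, and the real code also runs OpenMP threads inside each worker:

      #include <mpi.h>
      #include <stdio.h>

      #define NTASKS 100
      enum { TAG_WORK = 1, TAG_STOP = 2 };

      int main(int argc, char **argv)
      {
          MPI_Init(&argc, &argv);
          int rank, size;
          MPI_Comm_rank(MPI_COMM_WORLD, &rank);
          MPI_Comm_size(MPI_COMM_WORLD, &size);

          if (rank == 0) {                      /* root: manages the pool */
              int next = 0, done = 0, result;
              MPI_Status st;
              for (int w = 1; w < size && next < NTASKS; ++w) {
                  MPI_Send(&next, 1, MPI_INT, w, TAG_WORK, MPI_COMM_WORLD);
                  ++next;
              }
              while (done < next) {
                  MPI_Recv(&result, 1, MPI_INT, MPI_ANY_SOURCE, MPI_ANY_TAG,
                           MPI_COMM_WORLD, &st);
                  ++done;
                  if (next < NTASKS) {          /* keep finished worker busy */
                      MPI_Send(&next, 1, MPI_INT, st.MPI_SOURCE, TAG_WORK,
                               MPI_COMM_WORLD);
                      ++next;
                  }
              }
              for (int w = 1; w < size; ++w)    /* shut the pool down */
                  MPI_Send(&done, 1, MPI_INT, w, TAG_STOP, MPI_COMM_WORLD);
          } else {                              /* worker */
              for (;;) {
                  int task;
                  MPI_Status st;
                  MPI_Recv(&task, 1, MPI_INT, 0, MPI_ANY_TAG,
                           MPI_COMM_WORLD, &st);
                  if (st.MPI_TAG == TAG_STOP) break;
                  int result = task * task;     /* stand-in for the energy step */
                  MPI_Send(&result, 1, MPI_INT, 0, TAG_WORK, MPI_COMM_WORLD);
              }
          }
          MPI_Finalize();
          return 0;
      }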

  15. Fast decision tree-based method to index large DNA-protein sequence databases using hybrid distributed-shared memory programming model.

    PubMed

    Jaber, Khalid Mohammad; Abdullah, Rosni; Rashid, Nur'Aini Abdul

    2014-01-01

    In recent times, the size of biological databases has increased significantly, with continuous growth in the number of users and the rate of queries, such that some databases have reached the terabyte size. There is therefore an increasing need to access databases at the fastest rates possible. In this paper, the decision tree indexing model was parallelised (PDTIM), using a hybrid of distributed and shared memory on a resident database, with horizontal and vertical growth through Message Passing Interface (MPI) and POSIX Threads (PThreads), to accelerate the index building time. The PDTIM was implemented using 1, 2, 4 and 5 processors on 1, 2, 3 and 4 threads respectively. The results show that the hybrid technique improved the speedup compared to a sequential version. It can be concluded from the results that the proposed PDTIM is appropriate for large data sets, in terms of index building time.

  16. Operating system for a real-time multiprocessor propulsion system simulator. User's manual

    NASA Technical Reports Server (NTRS)

    Cole, G. L.

    1985-01-01

    The NASA Lewis Research Center is developing and evaluating experimental hardware and software systems to help meet future needs for real-time, high-fidelity simulations of air-breathing propulsion systems. Specifically, the real-time multiprocessor simulator project focuses on the use of multiple microprocessors to achieve the required computing speed and accuracy at relatively low cost. Operating systems for such hardware configurations are generally not available. A real time multiprocessor operating system (RTMPOS) that supports a variety of multiprocessor configurations was developed at Lewis. With some modification, RTMPOS can also support various microprocessors. RTMPOS, by means of menus and prompts, provides the user with a versatile, user-friendly environment for interactively loading, running, and obtaining results from a multiprocessor-based simulator. The menu functions are described and an example simulation session is included to demonstrate the steps required to go from the simulation loading phase to the execution phase.

  17. Parallel algorithm of VLBI software correlator under multiprocessor environment

    NASA Astrophysics Data System (ADS)

    Zheng, Weimin; Zhang, Dong

    2007-11-01

    The correlator is the key signal processing equipment of a Very Long Baseline Interferometry (VLBI) synthetic aperture telescope. It receives the mass of data collected by the VLBI observatories and produces the visibility function of the target, which can be used for spacecraft positioning, baseline length measurement, synthesis imaging, and other scientific applications. VLBI data correlation is both data intensive and computation intensive. This paper presents the algorithms of two parallel software correlators under multiprocessor environments. A near-real-time correlator for spacecraft tracking adopts pipelining and thread-parallel technology, and runs on SMP (Symmetric Multiple Processor) servers. Another high-speed prototype correlator using a mixed Pthreads and MPI (Message Passing Interface) parallel algorithm is realized on a small Beowulf cluster platform. Both correlators have flexible structure, scalability, and the ability to correlate data from 10 stations.
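
    The pipelined, thread-parallel structure of the near-real-time correlator can be illustrated with a generic two-stage Pthreads pipeline connected by a bounded queue. This is a sketch of the pattern only; the stage names and frame payload are invented:

      #include <pthread.h>
      #include <stdio.h>

      #define QLEN 8
      #define NFRAMES 32

      /* Bounded queue connecting the two pipeline stages. */
      static int q[QLEN], q_head, q_tail, q_count;
      static pthread_mutex_t mu = PTHREAD_MUTEX_INITIALIZER;
      static pthread_cond_t not_full  = PTHREAD_COND_INITIALIZER;
      static pthread_cond_t not_empty = PTHREAD_COND_INITIALIZER;

      static void put(int v)
      {
          pthread_mutex_lock(&mu);
          while (q_count == QLEN) pthread_cond_wait(&not_full, &mu);
          q[q_tail] = v; q_tail = (q_tail + 1) % QLEN; ++q_count;
          pthread_cond_signal(&not_empty);
          pthread_mutex_unlock(&mu);
      }

      static int get(void)
      {
          pthread_mutex_lock(&mu);
          while (q_count == 0) pthread_cond_wait(&not_empty, &mu);
          int v = q[q_head]; q_head = (q_head + 1) % QLEN; --q_count;
          pthread_cond_signal(&not_full);
          pthread_mutex_unlock(&mu);
          return v;
      }

      /* Stage 1: "unpack" raw frames (stand-in: emit frame numbers). */
      static void *unpack(void *arg)
      {
          (void)arg;
          for (int f = 0; f < NFRAMES; ++f) put(f);
          put(-1);                      /* end-of-stream marker */
          return NULL;
      }

      /* Stage 2: "correlate" frames as they arrive. */
      static void *correlate(void *arg)
      {
          (void)arg;
          for (int f; (f = get()) != -1; )
              printf("correlated frame %d\n", f);
          return NULL;
      }

      int main(void)
      {
          pthread_t t1, t2;
          pthread_create(&t1, NULL, unpack, NULL);
          pthread_create(&t2, NULL, correlate, NULL);
          pthread_join(t1, NULL);
          pthread_join(t2, NULL);
          return 0;
      }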

  18. Multiprocessor system with daisy-chained processor selection

    SciTech Connect

    Yamanaka, K.

    1988-09-27

    A multiprocessor system is described comprising: a bus; a master operation processing unit connected to the bus for generating on the bus data to be processed and a command for processing the data; and slave operation processing units connected to the bus for receiving the data and the command from the master operation processing unit. The slave operation processing units include a priority discriminator which sequentially selects the slave operation processing units in a preset priority sequence. The priority discriminator also includes means for determining the conditions of each of the slave operation processing units and means for initiating execution of the command in the first slave operation processing unit selected in the preset priority sequence whose determined conditions meet preselected conditions. The initiated slave operation processing unit includes means for processing the data in accordance with the command.
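
    The claimed selection logic reads naturally as a scan over the slaves in fixed priority order, dispatching to the first one whose status meets the preselected conditions. A toy C rendering; the status fields and criteria are invented:

      #include <stdbool.h>
      #include <stdio.h>

      #define NSLAVES 4

      typedef struct { bool idle; bool healthy; } SlaveStatus;

      /* Scan slaves in preset priority order (index 0 = highest) and return
       * the first whose conditions meet the criteria, or -1 if none. */
      static int select_slave(const SlaveStatus s[], int n)
      {
          for (int i = 0; i < n; ++i)
              if (s[i].idle && s[i].healthy)
                  return i;           /* this slave executes the command */
          return -1;
      }

      int main(void)
      {
          SlaveStatus slaves[NSLAVES] = {
              { false, true }, { true, false }, { true, true }, { true, true }
          };
          printf("command dispatched to slave %d\n",
                 select_slave(slaves, NSLAVES));   /* prints 2 */
          return 0;
      }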

  19. System architecture for asynchronous multi-processor robotic control system

    NASA Technical Reports Server (NTRS)

    Steele, Robert D.; Long, Mark; Backes, Paul

    1993-01-01

    The architecture for the Modular Telerobot Task Execution System (MOTES) as implemented in the Supervisory Telerobotics (STELER) Laboratory is described. MOTES is the software component of the remote site of a local-remote telerobotic system which is being developed for NASA for space applications, in particular Space Station Freedom applications. The system is being developed to provide control and supervised autonomous control to support both space based operation and ground-remote control with time delay. The local-remote architecture places task planning responsibilities at the local site and task execution responsibilities at the remote site. This separation allows the remote site to be designed to optimize task execution capability within a limited computational environment such as is expected in flight systems. The local site task planning system could be placed on the ground where few computational limitations are expected. MOTES is written in the Ada programming language for a multiprocessor environment.

  20. A shared, flexible neural map architecture reflects capacity limits in both visual short-term memory and enumeration.

    PubMed

    Knops, André; Piazza, Manuela; Sengupta, Rakesh; Eger, Evelyn; Melcher, David

    2014-07-23

    Human cognition is characterized by severe capacity limits: we can accurately track, enumerate, or hold in mind only a small number of items at a time. It remains debated whether capacity limitations across tasks are determined by a common system. Here we measure brain activation of adult subjects performing either a visual short-term memory (vSTM) task consisting of holding in mind precise information about the orientation and position of a variable number of items, or an enumeration task consisting of assessing the number of items in those sets. We show that task-specific capacity limits (three to four items in enumeration and two to three in vSTM) are neurally reflected in the activity of the posterior parietal cortex (PPC): an identical set of voxels in this region, commonly activated during the two tasks, changed its overall response profile reflecting task-specific capacity limitations. These results, replicated in a second experiment, were further supported by multivariate pattern analysis in which we could decode the number of items presented over a larger range during enumeration than during vSTM. Finally, we simulated our results with a computational model of PPC using a saliency map architecture in which the level of mutual inhibition between nodes gives rise to capacity limitations and reflects the task-dependent precision with which objects need to be encoded (high precision for vSTM, lower precision for enumeration). Together, our work supports the existence of a common, flexible system underlying capacity limits across tasks in PPC that may take the form of a saliency map.
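
    The saliency-map mechanism, nodes driven by input while mutually inhibiting one another, with the inhibition level setting the capacity limit, can be sketched generically in C. The parameters below are illustrative, not the authors' fitted values:

      #include <stdio.h>

      #define NODES 8
      #define STEPS 200

      /* One node per candidate item:
       * dx_i/dt = input_i - decay*x_i - beta * sum_{j != i} x_j.
       * Stronger mutual inhibition (beta) leaves fewer nodes active,
       * i.e., a lower capacity; here roughly the top four inputs survive. */
      int main(void)
      {
          double x[NODES] = {0};
          double input[NODES] = {1.0, 0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3};
          const double dt = 0.05, decay = 1.0, beta = 0.4;

          for (int t = 0; t < STEPS; ++t) {
              double sum = 0.0;
              for (int i = 0; i < NODES; ++i) sum += x[i];
              for (int i = 0; i < NODES; ++i) {
                  double inhib = beta * (sum - x[i]);
                  x[i] += dt * (input[i] - decay * x[i] - inhib);
                  if (x[i] < 0.0) x[i] = 0.0;   /* rates are non-negative */
              }
          }
          for (int i = 0; i < NODES; ++i)
              printf("node %d activity %.3f\n", i, x[i]);
          return 0;
      }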

  1. Memory loss

    MedlinePlus

    Consumer health information page: //medlineplus.gov/ency/article/003257.htm (MedlinePlus Medical Encyclopedia entry on memory loss).

  2. Evict on write, a management strategy for a prefetch unit and/or first level cache in a multiprocessor system with speculative execution

    DOEpatents

    Gara, Alan; Ohmacht, Martin

    2014-09-16

    In a multiprocessor system with at least two levels of cache, a speculative thread may run on a core processor in parallel with other threads. When the thread seeks to do a write to main memory, this access is to be written through the first level cache to the second level cache. After the write-through, the corresponding line is deleted from the first level cache and/or prefetch unit, so that any further accesses to the same location in main memory have to be retrieved from the second level cache. The second level cache keeps track of multiple versions of data, where more than one speculative thread is running in parallel, while the first level cache does not have any of the versions during speculation. A switch allows choosing between modes of operation of a speculation-blind first level cache.

  3. Allo-HLA Cross-Reactivities of CMV-, FLU- and VZV-Specific Memory T Cells Are Shared by Different Individuals.

    PubMed

    van den Heuvel, H; Heutinck, K M; van der Meer-Prins, E M W; Yong, S L; van Miert, P P M C; Anholts, J D H; Franke-van Dijk, M E I; Zhang, X Q; Roelen, D L; Ten Berge, R J M; Claas, F H J

    2017-03-23

    Virus-specific T cells can recognize allogeneic HLA (allo-HLA) through TCR cross-reactivity. The allospecificity often differs per individual ("private cross-reactivity"), but can also be shared by multiple individuals ("public cross-reactivity"). However, only a few examples of the latter have been described. Since these could facilitate alloreactivity prediction in transplantation, we aimed to identify novel public cross-reactivities of human virus-specific CD8+ T cells directed against allo-HLA by assessing their reactivity in mixed-lymphocyte reactions. Further characterization was done by studying TCR usage with primer-based DNA sequencing, cytokine production with enzyme-linked immunosorbent assays (ELISAs), and cytotoxicity with (51)Cr-release assays. We identified three novel public allo-HLA cross-reactivities of human virus-specific CD8(+) T cells. CMV B35/IPS CD8(+) T cells cross-reacted with HLA-B51 and/or HLA-B58/B57 (23% of tetramer-positive individuals), FLU A2/GIL CD8(+) T cells with HLA-B38 (90% of tetramer-positive individuals) and VZV A2/ALW CD8(+) T cells with HLA-B55 (two unrelated individuals). Cross-reactivity was tested against different cell types including endothelial and epithelial cells. All cross-reactive T cells expressed a memory phenotype, emphasizing the importance for transplantation. We conclude that public allo-HLA cross-reactivity of virus-specific memory T cells is not uncommon, which may create novel opportunities for alloreactivity prediction and risk estimation in transplantation. This article is protected by copyright. All rights reserved.

  4. A Tensor Product Formulation of Strassen's Matrix Multiplication Algorithm with Memory Reduction

    DOE PAGES

    Kumar, B.; Huang, C. -H.; Sadayappan, P.; ...

    1995-01-01

    In this article, we present a program generation strategy of Strassen's matrix multiplication algorithm using a programming methodology based on tensor product formulas. In this methodology, block recursive programs such as the fast Fourier Transforms and Strassen's matrix multiplication algorithm are expressed as algebraic formulas involving tensor products and other matrix operations. Such formulas can be systematically translated to high-performance parallel/vector codes for various architectures. In this article, we present a nonrecursive implementation of Strassen's algorithm for shared memory vector processors such as the Cray Y-MP. A previous implementation of Strassen's algorithm synthesized from tensor product formulas required working storage of size O(7^n) for multiplying 2^n × 2^n matrices. We present a modified formulation in which the working storage requirement is reduced to O(4^n). The modified formulation exhibits sufficient parallelism for efficient implementation on a shared memory multiprocessor. Performance results on a Cray Y-MP8/64 are presented.
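
    For reference, the seven block products and four recombinations of Strassen's algorithm for C = AB with 2 × 2 blocking (the standard textbook form that the tensor product formulation re-expresses; one level of recursion replaces 8 block multiplications with these 7, which is where the O(7^n) count comes from):

      M1 = (A11 + A22)(B11 + B22)     M2 = (A21 + A22) B11
      M3 = A11 (B12 - B22)            M4 = A22 (B21 - B11)
      M5 = (A11 + A12) B22            M6 = (A21 - A11)(B11 + B12)
      M7 = (A12 - A22)(B21 + B22)

      C11 = M1 + M4 - M5 + M7         C12 = M3 + M5
      C21 = M2 + M4                   C22 = M1 - M2 + M3 + M6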

  5. Solution of open region electromagnetic scattering problems on hypercube multiprocessors

    SciTech Connect

    Gedney, S.D.

    1991-01-01

    This thesis focuses on the development of parallel algorithms that exploit hypercube multiprocessor computers for solving the scattering of electromagnetic fields by bodies situated in an unbounded space. Initially, algorithms based on the method of moments are investigated for coarse-grained MIMD hypercubes as well as fine-grained MIMD and SIMD hypercubes. It is shown that by exploiting the architecture of each hypercube, supercomputer performance can be obtained using the JPL Mark III hypercube and the Thinking Machines CM-2. Second, the use of the finite-element method for the solution of scattering by bodies of composite materials is presented. For finite bodies situated in an unbounded space, the use of an absorbing boundary condition is investigated. A method known as the mixed-χ formulation is presented, which reduces the mesh density in the regions away from the scatterer, enhancing the use of an absorbing boundary condition. The scattering by troughs or slots is also investigated using a combined FEM/MoM formulation. This method is extended to the problem of the diffraction of electromagnetic waves by thick conducting and/or dielectric gratings. Finally, the adaptation of the FEM method to a coarse-grained hypercube is presented.

  6. Performance and economy of a fault-tolerant multiprocessor

    NASA Technical Reports Server (NTRS)

    Lala, J. H.; Smith, C. J.

    1979-01-01

    The FTMP (Fault-Tolerant Multiprocessor) is one of two central aircraft fault-tolerant architectures now in the prototype phase under NASA sponsorship. The intended application of the computer includes such critical real-time tasks as 'fly-by-wire' active control and completely automatic Category III landings of commercial aircraft. The FTMP architecture is briefly described and it is shown that it is a viable solution to the multi-faceted problems of safety, speed, and cost. Three job dispatch strategies are described, and their results with respect to job-starting delay are presented. The first strategy is a simple First-Come-First-Serve (FCFS) job dispatch executive. The other two schedulers are an adaptive FCFS and an interrupt driven scheduler. Three failure modes are discussed, and the FTMP survival probability in the face of random hard failures is evaluated. It is noted that the hourly cost of operating two FTMPs in a transport aircraft can be as little as one-to-two percent of the total flight-hour cost of the aircraft.

  8. Ordered fast fourier transforms on a massively parallel hypercube multiprocessor

    NASA Technical Reports Server (NTRS)

    Tong, Charles; Swarztrauber, Paul N.

    1989-01-01

    Design alternatives for ordered fast Fourier transform (FFT) algorithms were examined on massively parallel hypercube multiprocessors such as the Connection Machine. Particular emphasis is placed on reducing communication, which is known to dominate the overall computing time. To this end, the order and computational phases of the FFT were combined, and sequence-to-processor maps that reduce communication were used. The class of ordered transforms is expanded to include any FFT in which the order of the transform is the same as that of the input sequence. Two such orderings are examined, namely standard-order and A-order, which can be implemented with equal ease on the Connection Machine, where orderings are determined by geometries and priorities. If the sequence has N = 2^r elements and the hypercube has P = 2^d processors, then a standard-order FFT can be implemented with d + r/2 + 1 parallel transmissions. An A-order sequence can be transformed with 2d - r/2 parallel transmissions, which is r - d + 1 fewer than the standard order. A parallel method for computing the trigonometric coefficients is presented that does not use trigonometric functions or interprocessor communication. A performance of 0.9 GFLOPS was obtained for an A-order transform on the Connection Machine.
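
    As a quick arithmetic check on the transmission counts quoted above (the subtraction is ours): (d + r/2 + 1) - (2d - r/2) = r - d + 1, confirming that the A-order transform saves exactly r - d + 1 parallel transmissions relative to the standard order, a positive saving whenever r ≥ d.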

  9. Distribution and reliability in a multiprocessor operating system

    SciTech Connect

    Sindhu, P.S.

    1984-01-01

    This thesis explores whether distributed hardware can be used to build reliable systems that are practical. It deals mostly with issues in the design and construction of the MEDUSA operating system, built to run on the distributed multiprocessor Cm*. The goal is to exploit the redundancy, physical distribution, and powerful communication facilities within Cm* to produce a robust system: one that can function satisfactorily in spite of hardware failure, bad data, heavy demands on its services, and other disruptive occurrences. The thesis demonstrates that, given appropriate hardware, it is possible to provide robustness without significantly sacrificing performance or greatly increasing complexity. Factors important to this success were 1) the treatment of reliability as a probabilistic rather than an absolute guarantee, which resulted in a structure whose components explicitly acknowledge and cope with failures in one another; 2) the adoption of a systematic approach to exception processing, which made the complexity of recovery manageable; and 3) the exploitation of system knowledge in the design of reliability mechanisms, which made it possible to achieve robustness while retaining good normal-case performance.

  10. MULTIPROCESSOR AND DISTRIBUTED PROCESSING BIBLIOGRAPHIC DATA BASE SOFTWARE SYSTEM

    NASA Technical Reports Server (NTRS)

    Miya, E. N.

    1994-01-01

    Multiprocessors and distributed processing are undergoing increased scientific scrutiny for many reasons. It is more and more difficult to keep track of the existing research in these fields. This package consists of a large machine-readable bibliographic data base which, in addition to the usual keyword searches, can be used for producing citations, indexes, and cross-references. The data base is compiled from smaller existing multiprocessing bibliographies, and tables of contents from journals and significant conferences. There are approximately 4,000 entries covering topics such as parallel and vector processing, networks, supercomputers, fault-tolerant computers, and cellular automata. Each entry is represented by 21 fields including keywords, author, referencing book or journal title, volume and page number, and date and city of publication. The data base contains UNIX 'refer' formatted ASCII data and can be implemented on any computer running under the UNIX operating system. The data base requires approximately one megabyte of secondary storage. The documentation for this program is included with the distribution tape, although it can be purchased for the price below. This bibliography was compiled in 1985 and updated in 1988.

  12. Efficient diagnosis of multiprocessor systems under probabilistic models

    NASA Technical Reports Server (NTRS)

    Blough, Douglas M.; Sullivan, Gregory F.; Masson, Gerald M.

    1989-01-01

    The problem of fault diagnosis in multiprocessor systems is considered under a probabilistic fault model. The focus is on minimizing the number of tests that must be conducted in order to correctly diagnose the state of every processor in the system with high probability. A diagnosis algorithm is presented that can correctly diagnose the state of every processor with probability approaching one for a class of systems using slightly more than a linear number of tests. A nearly matching lower bound on the number of tests required to achieve correct diagnosis in arbitrary systems is also proven. Lower and upper bounds on the number of tests required for regular systems are also presented. A class of regular systems which includes hypercubes is shown to be correctly diagnosable with high probability. In all cases, the number of tests required under this probabilistic model is shown to be significantly less than under a bounded-size fault set model. Because the number of tests that must be conducted is a measure of the diagnosis overhead, these results represent a dramatic improvement in the performance of system-level diagnosis techniques.

  13. Validation of a fault-tolerant multiprocessor: Baseline experiments and workload implementation

    NASA Technical Reports Server (NTRS)

    Feather, Frank; Siewiorek, Daniel; Segall, Zary

    1985-01-01

    In the future, aircraft must employ highly reliable multiprocessors in order to achieve flight safety. Such computers must be experimentally validated before they are deployed. This project outlines a methodology for validating reliable multiprocessors. The methodology begins with baseline experiments, each of which tests a single phenomenon. As experiments progress, tools for performance testing are developed. The methodology is used, in part, on the Fault-Tolerant Multiprocessor (FTMP) at NASA-Langley's AIRLAB facility. Experiments are designed to evaluate the fault-free performance of the system. Presented are the results of interrupt baseline experiments performed on FTMP. Interrupt-causing exception conditions were tested, and several were found to have unimplemented interrupt handling software, while one had an unimplemented interrupt vector. A synthetic workload model for real-time multiprocessors is then developed as an application-level performance analysis tool. Details of the workload implementation and calibration are presented. Both the experimental methodology and the synthetic workload model are general enough to be applicable to reliable multiprocessors besides FTMP.

  14. Modeling techniques in a parallelizing compiler for the B-HIVE multiprocessor system

    NASA Technical Reports Server (NTRS)

    Kim, Sukil; Agrawal, Dharma P.; Mauney, Jon; Leu, Ja-Song

    1989-01-01

    The parallelizing compiler for the B-HIVE loosely-coupled multiprocessor system uses a medium grain model to minimize the communication overhead. A medium grain model is shown to be an optimum way of merging fine grain operations into parallel tasks such that the parallelism obtained at the grain level is retained and communication overhead is decreased. A new communication model is introduced in this paper, allowing additional overlap between computation and communication. Simulation results indicate that the medium grain communication model shows promise for automatic parallelization for a loosely-coupled multiprocessor system.

  15. Chapman-1024 Processor Shared Memory

    NASA Technical Reports Server (NTRS)

    Ciotii, Robert

    2003-01-01

    NASA has developed new technology that improves upon weaknesses in current mainstream supercomputer designs: those of "scalability," "human/machine interface," and "load balancing." The system simplifies running large computer simulations of national and international importance, such as climate prediction and space vehicle design.

  16. Memory management and compiler support for rapid recovery from failures in computer systems

    NASA Technical Reports Server (NTRS)

    Fuchs, W. K.

    1991-01-01

    This paper describes recent developments in the use of memory management and compiler technology to support rapid recovery from failures in computer systems. The techniques described include cache coherence protocols for user transparent checkpointing in multiprocessor systems, compiler-based checkpoint placement, compiler-based code modification for multiple instruction retry, and forward recovery in distributed systems utilizing optimistic execution.

  17. Performance Evaluation of Parallel Algorithms and Architectures in Concurrent Multiprocessor Systems

    DTIC Science & Technology

    1988-09-01

    (Abstract not recoverable: the scanned text survives only as fragments, which discuss the delay encountered in the interconnection network for each shared-memory access and the effect of hot spots, followed by pieces of the reference list, including the Monit performance monitoring tool and the Alliant FX/8 architecture.)

  18. Memory interface simulator: A computer design aid

    NASA Technical Reports Server (NTRS)

    Taylor, D. S.; Williams, T.; Weatherbee, J. E.

    1972-01-01

    Results are presented of a study conducted with a digital simulation model being used in the design of the Automatically Reconfigurable Modular Multiprocessor System (ARMMS), a candidate computer system for future manned and unmanned space missions. The model simulates the activity involved as instructions are fetched from random access memory for execution in one of the system central processing units. A series of model runs measured instruction execution time under various assumptions pertaining to the CPU's and the interface between the CPU's and RAM. Design tradeoffs are presented in the following areas: Bus widths, CPU microprogram read only memory cycle time, multiple instruction fetch, and instruction mix.

  19. Implementation of multigrid methods for solving Navier-Stokes equations on a multiprocessor system

    NASA Technical Reports Server (NTRS)

    Naik, Vijay K.; Taasan, Shlomo

    1987-01-01

    Presented are schemes for implementing multigrid algorithms on message based MIMD multiprocessor systems. To address the various issues involved, a nontrivial problem of solving the 2-D incompressible Navier-Stokes equations is considered as the model problem. Three different multigrid algorithms are considered. Results from implementing these algorithms on an Intel iPSC are presented.
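
    The control structure that such implementations distribute across processors can be seen in a serial 1-D V-cycle skeleton in C (our generic sketch, not the authors' scheme: smoothing, restriction, coarse recursion, and prolongation, with inter-processor message passing omitted):

      #include <stdio.h>
      #include <stdlib.h>

      /* Gauss-Seidel sweeps for the 1-D Poisson problem -u'' = f. */
      static void smooth(double *u, const double *f, int n, double h, int it)
      {
          for (int s = 0; s < it; ++s)
              for (int i = 1; i < n - 1; ++i)
                  u[i] = 0.5 * (u[i-1] + u[i+1] + h * h * f[i]);
      }

      static void residual(const double *u, const double *f, double *r,
                           int n, double h)
      {
          r[0] = r[n-1] = 0.0;
          for (int i = 1; i < n - 1; ++i)
              r[i] = f[i] - (2.0 * u[i] - u[i-1] - u[i+1]) / (h * h);
      }

      /* One V-cycle: smooth, restrict residual, recurse, prolong, smooth. */
      static void vcycle(double *u, const double *f, int n, double h)
      {
          smooth(u, f, n, h, 3);
          if (n <= 3) return;                     /* coarsest level */

          int nc = (n + 1) / 2;                   /* n is 2^k + 1 */
          double *r  = calloc((size_t)n,  sizeof *r);
          double *fc = calloc((size_t)nc, sizeof *fc);
          double *ec = calloc((size_t)nc, sizeof *ec);

          residual(u, f, r, n, h);
          for (int i = 1; i < nc - 1; ++i)        /* full-weighting restriction */
              fc[i] = 0.25 * (r[2*i - 1] + 2.0 * r[2*i] + r[2*i + 1]);

          vcycle(ec, fc, nc, 2.0 * h);            /* recurse on the error eq. */

          for (int i = 1; i < nc - 1; ++i)        /* prolongate and correct */
              u[2*i] += ec[i];
          for (int i = 0; i < nc - 1; ++i)
              u[2*i + 1] += 0.5 * (ec[i] + ec[i + 1]);

          smooth(u, f, n, h, 3);
          free(r); free(fc); free(ec);
      }

      int main(void)
      {
          int n = 129;                            /* 2^7 + 1 grid points */
          double h = 1.0 / (n - 1);
          double *u = calloc((size_t)n, sizeof *u);
          double *f = malloc((size_t)n * sizeof *f);
          for (int i = 0; i < n; ++i) f[i] = 1.0; /* constant load */

          for (int cycle = 0; cycle < 10; ++cycle)
              vcycle(u, f, n, h);

          /* Analytic solution of -u'' = 1 is x(1-x)/2: 0.125 at midpoint. */
          printf("u at midpoint: %.6f (exact 0.125)\n", u[n / 2]);
          free(u); free(f);
          return 0;
      }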

  20. Evaluation of the impact chip multiprocessors have on SNL application performance.

    SciTech Connect

    Doerfler, Douglas W.

    2009-10-01

    This report describes trans-organizational efforts to investigate the impact of chip multiprocessors (CMPs) on the performance of important Sandia application codes. The impact of CMPs on the performance and applicability of Sandia's system software was also investigated. The goal of the investigation was to make algorithmic and architectural recommendations for next generation platform acquisitions.

  1. A distributed multiprocessor system designed for real-time image processing

    NASA Astrophysics Data System (ADS)

    Yin, Zhiyi; Heng, Wei

    2008-11-01

    In real-time image processing, a large amount of data must be processed at very high speed. To address the problems faced in real-time image processing, a distributed multiprocessor system is proposed in this paper. In its design, processing tasks are allocated to separate processes, each bound to a different CPU; several designs are discussed, and making full use of every process is essential to good performance. The main implementation issues concern inter-process communication, synchronization, and stability. System analysis and performance tests both show that the distributed multiprocessor system improves performance in several respects, including delay, throughput, stability, and scalability, and that the system can be extended easily in both software and hardware. In short, the distributed multiprocessor system designed for real-time image processing, based on distributed algorithms, improves overall performance while remaining low in cost and easy to extend.
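
    Binding each process to its own CPU, the allocation the authors describe, looks like this on Linux with the GNU sched_setaffinity interface (an assumption; the paper does not name its platform or API):

      #define _GNU_SOURCE
      #include <sched.h>
      #include <stdio.h>
      #include <unistd.h>

      /* Pin the calling process to one CPU so a processing stage always
       * runs on the same core, as in a process-per-CPU allocation. */
      static int bind_to_cpu(int cpu)
      {
          cpu_set_t set;
          CPU_ZERO(&set);
          CPU_SET(cpu, &set);
          return sched_setaffinity(0 /* this process */, sizeof set, &set);
      }

      int main(void)
      {
          if (bind_to_cpu(1) != 0) {
              perror("sched_setaffinity");
              return 1;
          }
          printf("pid %d now runs only on CPU 1\n", (int)getpid());
          return 0;
      }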

  2. Programmable Optoelectronic Multiprocessors: Design, Performance and CAD Development

    NASA Astrophysics Data System (ADS)

    Kiamilev, Fouad Eskender

    1992-01-01

    This thesis describes the development of Programmable Optoelectronic Multiprocessor (POEM) architectures and systems. POEM systems combine simple electronic processing elements with free-space optical interconnects to implement high-performance, massively-parallel computers. POEM architectures are fundamentally different from architectures used in conventional VLSI systems. Novel system partitioning and processing element design methods have been developed to ensure efficient implementation of POEM architectures with optoelectronic technology. The main contributions of this thesis are: architecture and software design for the POEM prototype built at UCSD; detailed technology design-tradeoff and comparison studies for POEM interconnection networks; and application of the VHSIC Hardware Description Language (VHDL) to the design, simulation, and synthesis of POEM computers. A general-purpose POEM SIMD parallel computer architecture has been designed for symbolic computing applications. A VHDL simulation of this architecture was written to test the POEM hardware running parallel programs prior to prototype fabrication. Detailed performance comparison of this architecture with all-optical computing, based on symbolic substitution, has also been carried out to show that POEMs offer higher computational efficiency. A detailed technological design of a packet-switched POEM multistage interconnection network system has been performed. This design uses optically interconnected stages of K x K electronic switching elements, where K is a variable parameter, called grain-size, that determines the ratio of optics to electronics in the system. A thorough cost and performance comparison between this design and existing VLSI implementations was undertaken to show that the POEM approach offers better scalability and higher performance. The grain-size was optimized, showing that switch sizes of 16 x 16 to 256 x 256 provide maximum performance/cost. The effects of varying

  3. Meeting the Memory Challenges of Brain-Scale Network Simulation

    PubMed Central

    Kunkel, Susanne; Potjans, Tobias C.; Eppler, Jochen M.; Plesser, Hans Ekkehard; Morrison, Abigail; Diesmann, Markus

    2012-01-01

    The development of high-performance simulation software is crucial for studying the brain connectome. Using connectome data to generate neurocomputational models requires software capable of coping with models on a variety of scales: from the microscale, investigating plasticity, and dynamics of circuits in local networks, to the macroscale, investigating the interactions between distinct brain regions. Prior to any serious dynamical investigation, the first task of network simulations is to check the consistency of data integrated in the connectome and constrain ranges for yet unknown parameters. Thanks to distributed computing techniques, it is possible today to routinely simulate local cortical networks of around 10^5 neurons with up to 10^9 synapses on clusters and multi-processor shared-memory machines. However, brain-scale networks are orders of magnitude larger than such local networks, in terms of numbers of neurons and synapses as well as in terms of computational load. Such networks have been investigated in individual studies, but the underlying simulation technologies have neither been described in sufficient detail to be reproducible nor made publicly available. Here, we discover that as the network model sizes approach the regime of meso- and macroscale simulations, memory consumption on individual compute nodes becomes a critical bottleneck. This is especially relevant on modern supercomputers such as the Blue Gene/P architecture where the available working memory per CPU core is rather limited. We develop a simple linear model to analyze the memory consumption of the constituent components of neuronal simulators as a function of network size and the number of cores used. This approach has multiple benefits. The model enables identification of key contributing components to memory saturation and prediction of the effects of potential improvements to code before any implementation takes place. As a consequence, development cycles can be shorter and less
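
    The general shape of such a linear memory model, in our generic rendering rather than the authors' exact formula, is, for N neurons and S synapses simulated on M cores:

      m_core(M, N, S) ≈ m_0 + m_n·N/M + m_s·S/M + m_c·N

    Here m_0 is the fixed per-core footprint of the simulator, m_n and m_s are per-neuron and per-synapse memory costs that shrink as cores are added, and m_c·N stands for any per-core data structure that grows with the total network size; it is a term of this last kind that saturates memory on individual compute nodes at brain scale.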

  4. Meeting the memory challenges of brain-scale network simulation.

    PubMed

    Kunkel, Susanne; Potjans, Tobias C; Eppler, Jochen M; Plesser, Hans Ekkehard; Morrison, Abigail; Diesmann, Markus

    2011-01-01

    The development of high-performance simulation software is crucial for studying the brain connectome. Using connectome data to generate neurocomputational models requires software capable of coping with models on a variety of scales: from the microscale, investigating plasticity, and dynamics of circuits in local networks, to the macroscale, investigating the interactions between distinct brain regions. Prior to any serious dynamical investigation, the first task of network simulations is to check the consistency of data integrated in the connectome and constrain ranges for yet unknown parameters. Thanks to distributed computing techniques, it is possible today to routinely simulate local cortical networks of around 10^5 neurons with up to 10^9 synapses on clusters and multi-processor shared-memory machines. However, brain-scale networks are orders of magnitude larger than such local networks, in terms of numbers of neurons and synapses as well as in terms of computational load. Such networks have been investigated in individual studies, but the underlying simulation technologies have neither been described in sufficient detail to be reproducible nor made publicly available. Here, we discover that as the network model sizes approach the regime of meso- and macroscale simulations, memory consumption on individual compute nodes becomes a critical bottleneck. This is especially relevant on modern supercomputers such as the Blue Gene/P architecture where the available working memory per CPU core is rather limited. We develop a simple linear model to analyze the memory consumption of the constituent components of neuronal simulators as a function of network size and the number of cores used. This approach has multiple benefits. The model enables identification of key contributing components to memory saturation and prediction of the effects of potential improvements to code before any implementation takes place. As a consequence, development cycles can be shorter and

  5. Multiprocessor Real-Time Locking Protocols for Replicated Resources

    DTIC Science & Technology

    2016-07-01

    (Abstract not recoverable: the scanned text survives only as fragments of the protocol's variable declarations, listing shared state such as slot counts and pending-request counters, along with per-request start times and slot indices, for a locking protocol managing k replicated resources.)

  6. Apparatus for multiprocessor-based control of a multiagent robot

    NASA Technical Reports Server (NTRS)

    Peters, II, Richard Alan (Inventor)

    2009-01-01

    An architecture for robot intelligence enables a robot to learn new behaviors and create new behavior sequences autonomously and interact with a dynamically changing environment. Sensory information is mapped onto a Sensory Ego-Sphere (SES) that rapidly identifies important changes in the environment and functions much like short term memory. Behaviors are stored in a DBAM that creates an active map from the robot's current state to a goal state and functions much like long term memory. A dream state converts recent activities stored in the SES and creates or modifies behaviors in the DBAM.

  7. Performance of the butterfly processor-memory interconnection in a vector environment

    NASA Astrophysics Data System (ADS)

    Brooks, E. D., III

    1985-02-01

    A fundamental hurdle impeding the development of large-N common memory multiprocessors is the performance limitation in the switch connecting the processors to the memory modules. Multistage networks currently considered for this connection have a memory latency which grows like α log2 N. For scientific computing, it is natural to look for a multiprocessor architecture that will enable the use of vector operations to mask memory latency. The problem to be overcome here is the chaotic behavior introduced by conflicts occurring in the switch. The performance of the butterfly or indirect binary n-cube network in a vector processing environment is examined. A simple modification of the standard 2x2 switch node used in such networks which adaptively removes chaotic behavior during a vector operation is described.

  8. Origins of Autobiographical Memory.

    ERIC Educational Resources Information Center

    Harley, Keryn; Reese, Elaine

    1999-01-01

    Tested predictions of infantile amnesia theory compared with social-interactionist account of autobiographical memory. Found maternal reminiscing style and self-recognition when child was 19 months old uniquely predicted children's shared memory reports across time, even with children's initial language and nonverbal memory factored out.…

  9. Validation of fault-free behavior of a reliable multiprocessor system - FTMP: A case study. [Fault-Tolerant Multi-Processor avionics

    NASA Technical Reports Server (NTRS)

    Clune, E.; Segall, Z.; Siewiorek, D.

    1984-01-01

    A program of experiments has been conducted at NASA-Langley to test the fault-free performance of a Fault-Tolerant Multiprocessor (FTMP) avionics system for next-generation aircraft. Baseline measurements of an operating FTMP system were obtained with respect to the following parameters: instruction execution time, frame size, and the variation of clock ticks. The mechanisms of frame stretching were also investigated. The experimental results are summarized in a table. Areas of interest for future tests are identified, with emphasis given to the implementation of a synthetic workload generation mechanism on FTMP.

  10. The MIT Alewife Machine: A Large-Scale Distributed-Memory Multiprocessor

    DTIC Science & Technology

    1991-06-01

    Alewife's processor, Sparcle, is designed to tolerate communication latencies by rapidly switching between threads of computation. This paper describes the Alewife architecture, concentrating on its novel hardware features and on the organization of the machine.

  11. Safe and Efficient Support for Embedded Multi-Processors in Ada

    NASA Astrophysics Data System (ADS)

    Ruiz, Jose F.

    2010-08-01

    New software demands increasing processing power, and multi-processor platforms are spreading as the answer to achieve the required performance. Embedded real-time systems are also subject to this trend, but in the case of real-time mission-critical systems, the properties of reliability, predictability and analyzability are also paramount. The Ada 2005 language defined a subset of its tasking model, the Ravenscar profile, that provides the basis for the implementation of deterministic and time analyzable applications on top of a streamlined run-time system. This Ravenscar tasking profile, originally designed for single processors, has proven remarkably useful for modelling verifiable real-time single-processor systems. This paper proposes a simple extension to the Ravenscar profile to support multi-processor systems using a fully partitioned approach. The implementation of this scheme is simple, and it can be used to develop applications amenable to schedulability analysis.

  12. Bus interconnection in multiprocessor environment: The communication principle and the arbitration techniques

    NASA Astrophysics Data System (ADS)

    Joly, R.

    1983-06-01

    The structure and the algorithms of the arbiter function managing the data traffic in a multibus architecture are studied. The arbitration algorithms such as daisy chaining, polling or independent request are reviewed. An arbitration algorithm is implemented to be used in the ARCADE multibus multiprocessor system. The bus system of ARCADE is described and an experimental model is implemented. The resulting system is more complex than originally estimated, leading to another approach which is outlined: a high rate multipurpose interconnection structure.
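
    Of the arbitration schemes the abstract reviews, daisy chaining is the simplest to sketch: the bus grant propagates from the highest-priority device down the chain, and the first requesting device absorbs it. The model below is a generic illustration, not the ARCADE arbiter.

        # Daisy-chain arbitration sketch: the grant propagates from the
        # highest-priority device down the chain; the first requester takes it.
        # Generic illustration, not the ARCADE arbiter.
        def daisy_chain_grant(requests):
            """requests[i] is True if device i asserts bus request.
            Returns the index of the granted device, or None."""
            for device, wants_bus in enumerate(requests):
                if wants_bus:
                    return device        # grant stops propagating here
            return None                  # grant passes through unused

        print(daisy_chain_grant([False, True, True]))  # 1: nearer device wins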

  13. Scheduling Constrained-Deadline Parallel Tasks on Two-type Heterogeneous Multiprocessors

    DTIC Science & Technology

    2015-01-13

    Software systems are expected to do more with less, i.e., to provide more functionality with fewer resources. This work develops methods for scheduling constrained-deadline parallel tasks on a two-type heterogeneous multiprocessor with provably good performance, providing foundations for ensuring, before run-time, that the software system can respond at run-time to certain events.

  14. Method for wiring allocation and switch configuration in a multiprocessor environment

    DOEpatents

    Aridor, Yariv; Domany, Tamar; Frachtenberg, Eitan; Gal, Yoav; Shmueli, Edi; Stockmeyer, legal representative, Robert E.; Stockmeyer, Larry Joseph

    2008-07-15

    A method for wiring allocation and switch configuration in a multiprocessor computer, the method including employing depth-first tree traversal to determine a plurality of paths among a plurality of processing elements allocated to a job along a plurality of switches and wires in a plurality of D-lines, and selecting one of the paths in accordance with at least one selection criterion.
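
    The patented method combines depth-first traversal to enumerate candidate paths with a selection criterion. A generic sketch follows; the graph structure and the fewest-hops criterion are illustrative assumptions, not the patent's actual wiring model.

        # Generic sketch: depth-first enumeration of switch paths between two
        # processing elements, then selection by a criterion (here: fewest hops).
        # Graph and criterion are illustrative, not the patented method.
        def dfs_paths(graph, src, dst, path=None):
            path = (path or []) + [src]
            if src == dst:
                yield path
                return
            for nxt in graph[src]:
                if nxt not in path:            # avoid cycles
                    yield from dfs_paths(graph, nxt, dst, path)

        switches = {"PE0": ["S1", "S2"], "S1": ["S3"], "S2": ["S3"],
                    "S3": ["PE1"], "PE1": []}
        best = min(dfs_paths(switches, "PE0", "PE1"), key=len)
        print(best)   # ['PE0', 'S1', 'S3', 'PE1'] (ties broken arbitrarily)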

  15. A class of multiprocessors for real-time image and multidimensional signal processing

    NASA Astrophysics Data System (ADS)

    Denayer, Tony; Vanzieleghem, Etienne; Jespers, Paul G. A.

    1988-06-01

    A 90,000-transistor, 50-MIPS (million-instruction-per-second) multiprocessor chip designed for image orthogonal transform is discussed. The architectural principle, derived from a tensorial formalism, is usable for the other linear processings of multidimensional signals (e.g., n-dimensional convolution). A regularity factor of more than 99 percent was obtained by taking advantage of systolic principles at both chip and bit levels.

  16. Controlling fine-grain non-numeric parallelism on a combinator-based multiprocessor system

    SciTech Connect

    Chu, Pong Ping.

    1989-01-01

    The author has developed a scheme to extend the SASL programming language and its run-time system for fine grain parallel processing. The proposed scheme provides a mechanism that can override the original lazy semantics by augmenting proper eager information. This information is first annotated in SASL programs and then translated to the combinator control tags by a new set of optimization rules. The effectiveness of this scheme has been evaluated through the simulation of a set of symbolic-oriented programs on an idealized shared-memory system. The results show that a considerable amount of parallelism can be extracted from a wide variety of application programs.

  17. Operating system for a real-time multiprocessor propulsion system simulator

    NASA Technical Reports Server (NTRS)

    Cole, G. L.

    1984-01-01

    The Real Time Multiprocessor Operating System (RTMPOS) was evaluated for its role in the development and evaluation of experimental hardware and software systems for real-time interactive simulation of air-breathing propulsion systems. The RTMPOS provides the user with a versatile, interactive means for loading, running, debugging and obtaining results from a multiprocessor-based simulator. A front end processor (FEP) serves as the simulator controller and interface between the user and the simulator. These functions are facilitated by the RTMPOS, which resides on the FEP. The RTMPOS acts in conjunction with the FEP's manufacturer-supplied disk operating system, which provides typical utilities such as an assembler, linkage editor, text editor and file handling services. Once a simulation is formulated, the RTMPOS provides for engineering-level, run-time operations such as loading, modifying and specifying the computation flow of programs, simulator mode control, data handling and run-time monitoring. Run-time monitoring is a powerful feature of RTMPOS that allows the user to record all actions taken during a simulation session and to receive advisories from the simulator via the FEP. The RTMPOS is programmed mainly in PASCAL along with some assembly language routines. The RTMPOS software is easily modified to be applicable to hardware from different manufacturers.

  18. RTMPL: A structured programming and documentation utility for real-time multiprocessor simulations

    NASA Technical Reports Server (NTRS)

    Arpasi, D. J.

    1984-01-01

    The NASA Lewis Research Center is developing and evaluating experimental hardware and software systems to help meet future needs for real time simulations of air-breathing propulsion systems. The Real Time Multiprocessor Simulator (RTMPS) project is aimed at developing a prototype simulator system that uses multiple microprocessors to achieve the desired computing speed and accuracy at relatively low cost. Software utilities are being developed to provide engineering-level programming and interactive operation of the simulator. Two major software development efforts were undertaken in the RTMPS project. A real time multiprocessor operating system was developed to provide for interactive operation of the simulator. The second effort was aimed at developing a structured, high-level, engineering-oriented programming language and translator that would facilitate the programming of the simulator. The Real Time Multiprocessor Programming Language (RTMPL) allows the user to describe simulation tasks for each processor in a straight-forward, structured manner. The RTMPL utility acts as an assembly language programmer, translating the high-level simulation description into time-efficient assembly language code for the processors. The utility sets up all of the interfaces between the simulator hardware, firmware, and operating system.

  19. Dynamic power management for UML modeled applications on multiprocessor SoC

    NASA Astrophysics Data System (ADS)

    Kukkala, Petri; Arpinen, Tero; Setälä, Mikko; Hännikäinen, Marko; Hämäläinen, Timo D.

    2007-02-01

    The paper presents a novel scheme of dynamic power management for UML modeled applications that are executed on a multiprocessor System-on-Chip (SoC) in a distributed manner. The UML models for both application and architecture are designed according to a well-defined UML profile for embedded system design, called TUT-Profile. Application processes are considered as elementary units of distributed execution, and their mapping on a multiprocessor SoC can be dynamically changed at run-time. Our approach on the dynamic power management balances utilized processor resources against current workload at runtime by (1) observing the processor and workload statistics, (2) re-evaluating the amount of required resources (i.e. the number of active processors), and (3) re-mapping the application processes to the minimum set of active processors. The inactive processors are set to a power-save state by using clock-gating. The approach integrates the well-known power management techniques tightly with the UML based design of embedded systems in a novel way. We evaluated the dynamic power management with a WLAN terminal implemented on a multiprocessor SoC on Altera Stratix II FPGA containing up to five Nios II processors and dedicated hardware accelerators. Measurements proved up to 21% savings in the power consumption of the whole FPGA board.
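
    The paper's three-step policy (observe load, re-evaluate the number of active processors, re-map processes) can be written as a small control loop. The sketch below is a generic illustration: the utilization model, ceiling policy and greedy re-mapping are assumptions, not the TUT-Profile implementation.

        # Sketch of the observe / re-evaluate / re-map loop for dynamic power
        # management. Utilization model and greedy policy are illustrative.
        import math

        def required_processors(total_utilization, per_cpu_capacity=0.8):
            """Step 2: minimum active processors for the observed workload."""
            return max(1, math.ceil(total_utilization / per_cpu_capacity))

        def remap(process_loads, n_active):
            """Step 3: greedy re-mapping of processes onto active processors."""
            cpus = [[] for _ in range(n_active)]
            loads = [0.0] * n_active
            for name, load in sorted(process_loads.items(), key=lambda kv: -kv[1]):
                i = loads.index(min(loads))   # least-loaded active CPU
                cpus[i].append(name)
                loads[i] += load
            return cpus                        # inactive CPUs would be clock-gated

        procs = {"mac": 0.5, "phy": 0.7, "ui": 0.2, "dsp": 0.4}
        n = required_processors(sum(procs.values()))
        print(n, remap(procs, n))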

  20. Fault-Tolerant Multiprocessor and VLSI-Based Systems.

    DTIC Science & Technology

    1987-03-15

    Memory banks are partitioned into groups selected by high-order address bits, and different groups can have a different number of banks: one group may consist of four banks while another has only one. An example of grouping memory banks is given in Figure 2 of the report. Banks within each group are organized for low-order interleaving.

  1. Expert Systems on Multiprocessor Architectures. Volume 3. Technical Reports

    DTIC Science & Technology

    1991-06-01

    Consider a track consisting of scan times 100, 110, 120, ..., 150, and suppose that the rate of data arrival is high, causing message order to be scrambled. Where there are one or more streams of continuous input data, the problem appears as scrambled data arrival: the data may be out of temporal sequence.

  2. A balanced submatrix merging algorithm for multiprocessor architectures

    NASA Technical Reports Server (NTRS)

    Chu, Eleanor; George, Alan

    1992-01-01

    In this article, a parallel algorithm which applies Givens rotations to selectively annihilate k(k + 1)/2 nonzero elements from two k x n (k <= n) upper trapezoidal submatrices is described. The new algorithm is suitable for implementation on either a pair of directly connected local-memory processors or two clusters of multiple tightly-coupled processors. Analyses show that in both cases the proposed algorithms achieve optimal speed-up by balancing the work load distribution and masking interprocessor or intercluster communication by computation if k is much smaller than n. In the context of solving large-scale least squares problems, this submatrix merging step is needed repeatedly during the entire computation and, furthermore, there are usually many pairs of such submatrices to be merged, with each submatrix stored in the memory of a processor or a cluster of processors. The proposed algorithm can be applied to each pair of submatrices concurrently, and thus parallelizes an important step in solving least squares problems.
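
    The elementary operation the algorithm repeats is a Givens rotation that zeroes one element against another. A worked example of that single step (not the merging schedule itself) follows.

        # A Givens rotation zeroing element b against element a: choose c, s
        # with c*a + s*b = r and -s*a + c*b = 0. This is the annihilation step
        # applied across the two trapezoidal submatrices.
        import math

        def givens(a, b):
            r = math.hypot(a, b)
            if r == 0.0:
                return 1.0, 0.0
            return a / r, b / r  # c, s

        a, b = 3.0, 4.0
        c, s = givens(a, b)
        print(c * a + s * b)    # 5.0: the new nonzero entry r
        print(-s * a + c * b)   # ~0.0: annihilated, up to rounding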

  3. Shared Resources

    Treesearch

    David B. Butts

    1987-01-01

    Wildfires do not respect property boundaries. Whole geographic regions are typically impacted by major wildfire outbreaks. Various fire related resources can be shared to solve such crises; whether they are shared, and how they are shared depends to a great extent upon the rapport among the agencies involved. Major progress has been achieved over the past decade...

  4. Origins of autobiographical memory.

    PubMed

    Harley, K; Reese, E

    1999-09-01

    This study tested the predictions of M. L. Howe and M. L. Courage's (1993, 1997) theory of infantile amnesia compared with a social-interactionist account of autobiographical memory development (R. Fivush & E. Reese, 1992; K. Nelson, 1993b). Fifty-eight mother-child dyads were assessed for maternal styles of talking about the past and for children's self-recognition, language production, and nonverbal memory when the children were 19 months old. Children's shared and independent memory reports were then assessed from 19 to 32 months. Maternal reminiscing style and self-recognition uniquely predicted children's shared memory reports across time, even with children's initial language and nonverbal memory factored out. Self-recognition skills also predicted children's later independent memory. These results support a pluralistic account of the origins of autobiographical memory.

  5. A high speed multi-tasking, multi-processor telemetry system

    SciTech Connect

    Wu, Kung Chris

    1996-12-31

    This paper describes a small size, light weight, multitasking, multiprocessor telemetry system capable of collecting 32 channels of differential signals at a sampling rate of 6.25 kHz per channel. The system is designed to collect data from remote wind turbine research sites and transfer the data via wireless communication. A description of operational theory, hardware components, and itemized cost is provided. Synchronization with other data acquisition systems and test data on data transmission rates is also given. 11 refs., 7 figs., 4 tabs.
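
    As a rough consistency check, the aggregate sample rate implied by these figures is 32 x 6.25 kHz = 200,000 samples per second. The 16-bit sample width assumed below is illustrative; the record does not state it.

        # Aggregate data rate for 32 channels sampled at 6.25 kHz each.
        # The 16-bit sample width is an assumed value for illustration.
        channels = 32
        rate_hz = 6250
        bits_per_sample = 16                    # assumption
        samples_per_s = channels * rate_hz      # 200,000 samples/s
        bytes_per_s = samples_per_s * bits_per_sample // 8
        print(samples_per_s, "samples/s =", bytes_per_s / 1e3, "kB/s")  # 400.0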

  6. Energy-efficient fault tolerance in multiprocessor real-time systems

    NASA Astrophysics Data System (ADS)

    Guo, Yifeng

    The recent progress in multiprocessor/multicore systems has important implications for real-time system design and operation. From vehicle navigation to space applications as well as industrial control systems, the trend is to deploy multiple processors in real-time systems: systems with 4 -- 8 processors are common, and it is expected that many-core systems with dozens of processing cores will be available in the near future. For such systems, in addition to the general temporal requirement common to all real-time systems, two additional operational objectives are seen as critical: energy efficiency and fault tolerance. An intriguing dimension of the problem is that energy efficiency and fault tolerance are typically conflicting objectives, due to the fact that tolerating faults (e.g., permanent/transient) often requires extra resources with high energy consumption potential. In this dissertation, various techniques for energy-efficient fault tolerance in multiprocessor real-time systems have been investigated. First, the Reliability-Aware Power Management (RAPM) framework, which can preserve the system reliability with respect to transient faults when Dynamic Voltage Scaling (DVS) is applied for energy savings, is extended to support parallel real-time applications with precedence constraints. Next, the traditional Standby-Sparing (SS) technique for dual processor systems, which takes both transient and permanent faults into consideration while saving energy, is generalized to support multiprocessor systems with an arbitrary number of identical processors. Observing the inefficient usage of slack time in the SS technique, a Preference-Oriented Scheduling Framework is designed to address the problem where tasks are given preferences for being executed as soon as possible (ASAP) or as late as possible (ALAP). A preference-oriented earliest deadline (POED) scheduler is proposed and its application in multiprocessor systems for energy-efficient fault tolerance is investigated.

  7. Multi-objective two-stage multiprocessor flow shop scheduling - a subgroup particle swarm optimisation approach

    NASA Astrophysics Data System (ADS)

    Huang, Rong-Hwa; Yang, Chang-Lin; Hsu, Chun-Ting

    2015-12-01

    Compared with other economically important production systems, the flow shop production system is common in real manufacturing environments. This study focuses on the flow shop with multiprocessor scheduling problem (FSMP) and develops an improved particle swarm optimisation heuristic to solve it. Additionally, this study designs an integer programming model to perform effectiveness and robustness testing on the proposed heuristic. Experimental results demonstrate a 10% to 50% improvement in the effectiveness of the proposed heuristic in small-scale problem tests, and a 10% to 40% improvement in the robustness of the heuristic in large-scale problem tests, indicating extremely satisfactory performance.
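
    For orientation, the core particle swarm update loop is shown below on a toy continuous objective. This is the generic PSO heuristic, not the authors' subgroup variant, which additionally encodes job permutations for the flow shop.

        # Generic PSO core loop on a toy objective; parameters are conventional
        # defaults, not values from the paper.
        import random

        def pso(f, dim=2, n=20, iters=100, w=0.7, c1=1.5, c2=1.5):
            xs = [[random.uniform(-5, 5) for _ in range(dim)] for _ in range(n)]
            vs = [[0.0] * dim for _ in range(n)]
            pbest = [x[:] for x in xs]
            gbest = min(pbest, key=f)[:]
            for _ in range(iters):
                for i in range(n):
                    for d in range(dim):
                        vs[i][d] = (w * vs[i][d]
                                    + c1 * random.random() * (pbest[i][d] - xs[i][d])
                                    + c2 * random.random() * (gbest[d] - xs[i][d]))
                        xs[i][d] += vs[i][d]
                    if f(xs[i]) < f(pbest[i]):
                        pbest[i] = xs[i][:]
                        if f(xs[i]) < f(gbest):
                            gbest = xs[i][:]
            return gbest

        sphere = lambda x: sum(v * v for v in x)
        print(pso(sphere))  # converges near [0, 0]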

  8. A simple executive for a fault-tolerant, real-time multiprocessor.

    NASA Technical Reports Server (NTRS)

    Filene, R. J.; Green, A. I.

    1971-01-01

    Description of a simple executive for operation with a fault-tolerant multiprocessor that is oriented toward application in an environment where the primary function is to provide real-time control. The primary executive function is to accept requests for jobs placed by other jobs or from peripheral equipment and then schedule their initiation in accordance with the request parameters. The executive is also brought into action when a processor fails, so that appropriate disposition may be made of the job that was running on the failed processor. Many architectural features intended to support this executive concept are included.

  9. Programmable controller with a multiprocessor-based high speed interactive language system

    SciTech Connect

    Matsuzaki, K.; Hata, S.; Ohkochi, O.; Okamura, M.; Sugimoto, N.

    1983-01-01

    A multiprocessor-based programmable controller (PC) capable of sequence control and data processing has been developed. This PC consists of a custom processor for a relay ladder program and a 68000 16-bit microprocessor for a basic program. The basic program is executed by an interpreter which is an order of magnitude faster than a conventional interpreter of a personal computer. The relay ladder program and the basic program can activate and communicate with each other. Although the controller features more control functions than conventional PCs, it can be easily operated interactively on site. 2 references.

  10. Fault-tolerant computers. Multiprocessor architecture tunes in to transaction processing

    SciTech Connect

    Cohen, K.I.

    1983-01-27

    The availability of fast, low-cost 16- and 32-bit microprocessors makes it possible at last to build a truly cost-effective generation of fault-tolerant computers. One such system employs a multiprocessor architecture optimized for transaction-oriented applications. Called the synapse expansion architecture, it is tolerant of component failures, can be expanded economically in small increments, and is not tied to any one microprocessor instruction set. Yet thanks to the specially developed operating software, neither operators nor programmers are aware of the architecture's uniqueness. The author looks at the architecture of the synapse expansion general-purpose computer.

  12. ScalaBLAST 2.0: Rapid and robust BLAST calculations on multiprocessor systems

    SciTech Connect

    Oehmen, Christopher S.; Baxter, Douglas J.

    2013-03-15

    BLAST remains one of the most widely used tools in computational biology. The rate at which new sequence data is available continues to grow exponentially, driving the emergence of new fields of biological research. At the same time multicore systems and conventional clusters are more accessible. ScalaBLAST has been designed to run on conventional multiprocessor systems with an eye to extreme parallelism, enabling parallel BLAST calculations using over 16,000 processing cores with a portable, robust, fault-resilient design. ScalaBLAST 2.0 source code can be freely downloaded from http://omics.pnl.gov/software/ScalaBLAST.php.

  13. Development and evaluation of a fault-tolerant multiprocessor (FTMP) computer. Volume 4: FTMP executive summary

    NASA Technical Reports Server (NTRS)

    Smith, T. B., III; Lala, J. H.

    1984-01-01

    The FTMP architecture is a high reliability computer concept modeled after a homogeneous multiprocessor architecture. Elements of the FTMP are operated in tight synchronism with one another and hardware fault-detection and fault-masking is provided which is transparent to the software. Operating system design and user software design is thus greatly simplified. Performance of the FTMP is also comparable to that of a simplex equivalent due to the efficiency of fault handling hardware. The FTMP project constructed an engineering module of the FTMP, programmed the machine and extensively tested the architecture through fault injection and other stress testing. This testing confirmed the soundness of the FTMP concepts.

  14. Sharing code.

    PubMed

    Kubilius, Jonas

    2014-01-01

    Sharing code is becoming increasingly important in the wake of Open Science. In this review I describe and compare two popular code-sharing utilities, GitHub and Open Science Framework (OSF). GitHub is a mature, industry-standard tool but lacks focus towards researchers. In comparison, OSF offers a one-stop solution for researchers but a lot of functionality is still under development. I conclude by listing alternative lesser-known tools for code and materials sharing.

  15. Myrmics Memory Allocator

    SciTech Connect

    Lymperis, S.

    2011-09-23

    MMA is a stand-alone memory management system for MPI clusters. It implements a shared Partitioned Global Address Space, where multiple MPI processes request objects from the allocator and the latter provides them with system-wide unique memory addresses for each object. It provides applications with an intuitive way of managing the memory system in a unified way, thus enabling easier writing of irregular application code.
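
    The allocator's contract (system-wide unique addresses handed out to multiple processes) can be rendered as a toy central allocator. The bump-pointer policy and all names below are illustrative assumptions, not MMA's actual design.

        # Toy globally-unique allocator: a single authority hands out
        # non-overlapping address ranges, so every process sees a system-wide
        # unique address per object. Policy and names are illustrative.
        class GlobalAllocator:
            def __init__(self, base=0x1000_0000):
                self.next_free = base

            def alloc(self, size, align=8):
                addr = (self.next_free + align - 1) & ~(align - 1)
                self.next_free = addr + size
                return addr  # unique across all requesting processes

        gpa = GlobalAllocator()
        a = gpa.alloc(64)
        b = gpa.alloc(128)
        assert b >= a + 64   # ranges never overlap
        print(hex(a), hex(b))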

  17. SPMTM: A Novel ScratchPad Memory Based Hybrid Nested Transactional Memory Framework

    NASA Astrophysics Data System (ADS)

    Feng, Degui; Jiang, Guanjun; Zhang, Tiefei; Hu, Wei; Chen, Tianzhou; Cao, Mingteng

    The chip multiprocessor (CMP) has become the mainstream of processor design with the progress in semiconductor technology. It provides higher concurrency for threads compared with the traditional single-core processor. Lock-based synchronization of multiple threads has proven to be an inefficient approach with high overhead. Previous work shows that transactional memory (TM) is an efficient solution for synchronizing multiple threads. This paper presents SPMTM, a novel on-chip-memory based nested TM framework. The on-chip memory used in this framework is not cache but scratchpad memory (SPM), which is software-controlled SRAM on chip. TM information is stored in the SPM to increase access speed and reduce power consumption. Experimental results show that SPMTM obtains an average 16.3% performance improvement on the benchmarks compared with lock-based synchronization, and the improvement becomes more significant as the number of processor cores increases.

  18. Sharing code

    PubMed Central

    Kubilius, Jonas

    2014-01-01

    Sharing code is becoming increasingly important in the wake of Open Science. In this review I describe and compare two popular code-sharing utilities, GitHub and Open Science Framework (OSF). GitHub is a mature, industry-standard tool but lacks focus towards researchers. In comparison, OSF offers a one-stop solution for researchers but a lot of functionality is still under development. I conclude by listing alternative lesser-known tools for code and materials sharing. PMID:25165519

  19. Animal models of source memory.

    PubMed

    Crystal, Jonathon D

    2016-01-01

    Source memory is the aspect of episodic memory that encodes the origin (i.e., source) of information acquired in the past. Episodic memory (i.e., our memories for unique personal past events) typically involves source memory because those memories focus on the origin of previous events. Source memory is at work when, for example, someone tells a favorite joke to a person while avoiding retelling the joke to the friend who originally shared the joke. Importantly, source memory permits differentiation of one episodic memory from another because source memory includes features that were present when the different memories were formed. This article reviews recent efforts to develop an animal model of source memory using rats. Experiments are reviewed which suggest that source memory is dissociated from other forms of memory. The review highlights strengths and weaknesses of a number of animal models of episodic memory. Animal models of source memory may be used to probe the biological bases of memory. Moreover, these models can be combined with genetic models of Alzheimer's disease to evaluate pharmacotherapies that ultimately have the potential to improve memory.

  20. Insights on consciousness from taste memory research.

    PubMed

    Gallo, Milagros

    2016-01-01

    Taste research in rodents supports the relevance of memory in order to determine the content of consciousness by modifying both taste perception and later action. Associated with this issue is the fact that taste and visual modalities share anatomical circuits traditionally related to conscious memory. This challenges the view of taste memory as a type of non-declarative unconscious memory.

  1. Simulating a small turboshaft engine in real-time multiprocessor simulator (RTMPS) environment

    NASA Technical Reports Server (NTRS)

    Milner, E. J.; Arpasi, D. J.

    1986-01-01

    A Real-Time Multiprocessor Simulator (RTMPS) has been developed at NASA Lewis Research Center. The RTMPS uses parallel microprocessors to achieve computing speeds needed for real-time engine simulation. This report describes the use of the RTMPS system to simulate a small turboshaft engine. The process of programming the engine equations and distributing them over one, two, and four processors is discussed. Steady-state and transient results from the RTMPS simulation are compared with results from a main-frame-based simulation. Processor execution times and the associated execution time savings for the two and four processor cases are presented using actual data obtained from the RTMPS system. Included is a discussion of why the minimum achievable calculation time for the turboshaft engine model was attained using four processors. Finally, future enhancements to the RTMPS system are discussed including the development of a generalized partitioning algorithm to automatically distribute the system equations among the processors in optimum fashion.

  2. Probabilistic evaluation of on-line checks in fault-tolerant multiprocessor systems

    NASA Technical Reports Server (NTRS)

    Nair, V. S. S.; Hoskote, Yatin V.; Abraham, Jacob A.

    1992-01-01

    The analysis of fault-tolerant multiprocessor systems that use concurrent error detection (CED) schemes is much more difficult than the analysis of conventional fault-tolerant architectures. Various analytical techniques have been proposed to evaluate CED schemes deterministically. However, these approaches are based on worst-case assumptions related to the failure of system components. Often, the evaluation results do not reflect the actual fault tolerance capabilities of the system. A probabilistic approach to evaluate the fault detecting and locating capabilities of on-line checks in a system is developed. The various probabilities associated with the checking schemes are identified and used in the framework of the matrix-based model. Based on these probabilistic matrices, estimates for the fault tolerance capabilities of various systems are derived analytically.

  3. High-performance multiprocessor architecture for a 3-D lattice gas model

    NASA Technical Reports Server (NTRS)

    Lee, F.; Flynn, M.; Morf, M.

    1991-01-01

    The lattice gas method has recently emerged as a promising discrete particle simulation method in areas such as fluid dynamics. We present a very high-performance scalable multiprocessor architecture, called ALGE, proposed for the simulation of a realistic 3-D lattice gas model, Henon's 24-bit FCHC isometric model. Each of these VLSI processors is as powerful as a CRAY-2 for this application. ALGE is scalable in the sense that it achieves linear speedup for both fixed and increasing problem sizes with more processors. The core computation of a lattice gas model consists of many repetitions of two alternating phases: particle collision and propagation. Functional decomposition by symmetry group and virtual move are the respective keys to efficient implementation of collision and propagation.

  4. Closed-form solutions of performability. [modeling of a degradable buffer/multiprocessor system

    NASA Technical Reports Server (NTRS)

    Meyer, J. F.

    1981-01-01

    Methods which yield closed form performability solutions for continuous valued variables are developed. The models are similar to those employed in performance modeling (i.e., Markovian queueing models) but are extended so as to account for variations in structure due to faults. In particular, the modeling of a degradable buffer/multiprocessor system is considered whose performance Y is the (normalized) average throughput rate realized during a bounded interval of time. To avoid known difficulties associated with exact transient solutions, an approximate decomposition of the model is employed permitting certain submodels to be solved in equilibrium. These solutions are then incorporated in a model with fewer transient states and by solving the latter, a closed form solution of the system's performability is obtained. In conclusion, some applications of this solution are discussed and illustrated, including an example of design optimization.

  5. Programmable Optoelectronic Multiprocessors And Their Comparison With Symbolic Substitution For Digital Optical Computing

    NASA Astrophysics Data System (ADS)

    Kiamilev, F.; Esener, Sadik C.; Paturi, R.; Fainmar, Y.; Mercier, P.; Guest, C. C.; Lee, Sing H.

    1989-04-01

    This paper introduces programmable arrays of optically inter-connected electronic processors and compares them with conventional symbolic substitution (SS) systems. The comparison is made on the basis of computational efficiency, speed, size, energy utilization, programmability, and fault tolerance. The small grain size and space-invariant connections of SS lead to poor computational efficiency, difficult programming, and difficult incorporation of fault tolerance. Reliance on optical gates as its fundamental building elements is shown to give poor energy utilization. Programmable optoelectronic multiprocessor (POEM) systems, on the other hand, provide the architectural flexibility for good computational efficiency, use an energy-efficient combination of technologies, and support traditional programming methodologies and fault tolerance. Although the inherent clock speed of POEM systems is slower than that of SS systems, for most problems they will provide greater computational throughput. This comparison does not take into account the recent addition of crossover interconnect and space-variant masks to the SS architecture.

  6. Commodity multi-processor systems in the ATLAS level-2 trigger

    SciTech Connect

    Abolins, M.; Blair, R.; Bock, R.; Bogaerts, A.; Dawson, J.; Ermoline, Y.; Hauser, R.; Kugel, A.; Lay, R.; Muller, M.; Noffz, K.-H.; Pope, B.; Schlereth, J.; Werner, P.

    2000-05-23

    Low cost SMP (Symmetric Multi-Processor) systems provide substantial CPU and I/O capacity. These features together with the ease of system integration make them an attractive and cost effective solution for a number of real-time applications in event selection. In ATLAS the authors consider them as intelligent input buffers (active ROB complex), as event flow supervisors or as powerful processing nodes. Measurements of the performance of one off-the-shelf commercial 4-processor PC with two PCI buses, equipped with commercial FPGA based data source cards (microEnable) and running commercial software are presented and mapped on such applications together with a long-term program of work. The SMP systems may be considered as an important building block in future data acquisition systems.

  7. Dynamic modelling and estimation of the error due to asynchronism in a redundant asynchronous multiprocessor system

    NASA Technical Reports Server (NTRS)

    Huynh, Loc C.; Duval, R. W.

    1986-01-01

    The use of a Redundant Asynchronous Multiprocessor System to achieve ultrareliable fault-tolerant control systems shows great promise. Development has been hampered by the inability to determine whether differences in the outputs of redundant CPUs are due to failures or to accrued error built up by slight differences in CPU clock intervals. This study derives an analytical dynamic model of the difference between redundant CPUs due to differences in their clock intervals and uses this model with on-line parameter identification to identify the differences in the clock intervals. The methodology accurately tracks errors due to asynchronism and generates an error signal with the effect of asynchronism removed; this signal may be used to detect and isolate actual system failures.

  8. Analysis of Photonic Networks for a Chip Multiprocessor Using Scientific Applications

    SciTech Connect

    Kamil, Shoaib A; Hendry, Gilbert; Biberman, Aleksandr; Chan, Johnnie; Lee, Benjamin G.; Mohiyuddin, Marghoob; Jain, Ankit; Bergman, Keren; Carloni, Luca; Kubiatowicz, John; Oliker, Leonid; Shalf, John

    2009-01-31

    As multiprocessors scale to unprecedented numbers of cores in order to sustain performance growth, it is vital that these gains are not nullified by high energy consumption from inter-core communication. With recent advances in 3D Integration CMOS technology, the possibility for realizing hybrid photonic-electronic networks-on-chip warrants investigating real application traces on functionally comparable photonic and electronic network designs. We present a comparative analysis using both synthetic benchmarks as well as real applications, run through detailed cycle accurate models implemented under the OMNeT++ discrete event simulation environment. Results show that when utilizing standard process-to-processor mapping methods, this hybrid network can achieve 75X improvement in energy efficiency for synthetic benchmarks and up to 37X improvement for real scientific applications, defined as network performance per energy spent, over an electronic mesh for large messages across a variety of communication patterns.

  9. Dynamic Scheduling Real-Time Task Using Primary-Backup Overloading Strategy for Multiprocessor Systems

    NASA Astrophysics Data System (ADS)

    Sun, Wei; Yu, Chen; Défago, Xavier; Inoguchi, Yasushi

    The scheduling of real-time tasks with fault-tolerant requirements has been an important problem in multiprocessor systems. The primary-backup (PB) approach is often used as a fault-tolerant technique to guarantee the deadlines of tasks despite the presence of faults. In this paper we propose a dynamic PB-based task scheduling approach, wherein an allocation parameter is used to search the available time slots for a newly arriving task, and the previously scheduled tasks can be re-scheduled when there is no available time slot for the newly arriving task. In order to improve the schedulability we also propose an overloading strategy for PB-overloading and Backup-backup (BB) overloading. Our proposed task scheduling algorithm is compared with some existing scheduling algorithms in the literature through simulation studies. The results have shown that the task rejection ratio of our real-time task scheduling algorithm is almost 50% lower than the compared algorithms.

  10. Characterizing parallel file-access patterns on a large-scale multiprocessor

    NASA Technical Reports Server (NTRS)

    Purakayastha, A.; Ellis, Carla; Kotz, David; Nieuwejaar, Nils; Best, Michael L.

    1995-01-01

    High-performance parallel file systems are needed to satisfy tremendous I/O requirements of parallel scientific applications. The design of such high-performance parallel file systems depends on a comprehensive understanding of the expected workload, but so far there have been very few usage studies of multiprocessor file systems. This paper is part of the CHARISMA project, which intends to fill this void by measuring real file-system workloads on various production parallel machines. In particular, we present results from the CM-5 at the National Center for Supercomputing Applications. Our results are unique because we collect information about nearly every individual I/O request from the mix of jobs running on the machine. Analysis of the traces leads to various recommendations for parallel file-system design.

  11. Mapping virtual addresses to different physical addresses for value disambiguation for thread memory access requests

    DOEpatents

    Gala, Alan; Ohmacht, Martin

    2014-09-02

    A multiprocessor system includes nodes. Each node includes a data path that includes a core, a TLB, and a first level cache implementing disambiguation. The system also includes at least one second level cache and a main memory. For thread memory access requests, the core uses an address associated with an instruction format of the core. The first level cache uses an address format related to the size of the main memory plus an offset corresponding to hardware thread meta data. The second level cache uses a physical main memory address plus software thread meta data to store the memory access request. The second level cache accesses the main memory using the physical address with neither the offset nor the thread meta data after resolving speculation. In short, this system maps a virtual address to different physical addresses for value disambiguation for different threads.
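
    A toy rendering of the address-format idea (a physical address extended with per-thread meta data that is dropped once speculation resolves) might look like this; the field widths and names are assumptions, not the patent's layout.

        # Toy address disambiguation: the same physical address maps to distinct
        # cache-visible addresses per thread by placing thread meta data above
        # the main-memory address bits. Field widths are assumptions.
        MAIN_MEMORY_BITS = 34            # assumed: 16 GiB of physical memory

        def disambiguated_address(phys_addr: int, thread_id: int) -> int:
            """Address used by the speculative cache level."""
            return (thread_id << MAIN_MEMORY_BITS) | phys_addr

        def resolved_address(tagged: int) -> int:
            """After speculation resolves, drop the thread meta data."""
            return tagged & ((1 << MAIN_MEMORY_BITS) - 1)

        t = disambiguated_address(0x1234, thread_id=3)
        assert resolved_address(t) == 0x1234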

  12. Asynchronous and corrected-asynchronous numerical solutions of parabolic PDES on MIMD multiprocessors

    NASA Technical Reports Server (NTRS)

    Amitai, Dganit; Averbuch, Amir; Itzikowitz, Samuel; Turkel, Eli

    1991-01-01

    A major problem in achieving significant speed-up on parallel machines is the overhead involved with synchronizing the concurrent processes. Removing the synchronization constraint has the potential of speeding up the computation. The authors present asynchronous (AS) and corrected-asynchronous (CA) finite difference schemes for the multi-dimensional heat equation. Although the discussion concentrates on the Euler scheme for the solution of the heat equation, it has the potential for being extended to other schemes and other parabolic partial differential equations (PDEs). These schemes are analyzed and implemented on the shared memory multi-user Sequent Balance machine. Numerical results for one and two dimensional problems are presented. It is shown experimentally that the synchronization penalty can be about 50 percent of run time: in most cases, the asynchronous scheme runs twice as fast as the parallel synchronous scheme. In general, the efficiency of the parallel schemes increases with processor load, with the time level, and with the problem dimension. The efficiency of the AS may reach 90 percent and over, but it provides accurate results only for steady-state values. The CA, on the other hand, is less efficient, but provides more accurate results for intermediate (non steady-state) values.
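
    The synchronous-versus-asynchronous distinction can be illustrated with a 1-D explicit Euler heat update in which each worker simply uses whatever neighbor values are currently visible in shared memory. This toy is a sketch of the AS idea only, not the authors' schemes.

        # Toy asynchronous explicit Euler update for the 1-D heat equation:
        # each worker repeatedly sweeps its strip using whatever neighbor
        # values are currently in the shared array (no barrier between sweeps).
        import threading

        n, sweeps, r = 64, 200, 0.25          # r = dt/dx^2 (stability: r <= 0.5)
        u = [0.0] * n
        u[0], u[-1] = 1.0, 1.0                # fixed boundary values

        def worker(lo, hi):
            for _ in range(sweeps):
                for i in range(lo, hi):       # reads may mix time levels
                    u[i] = u[i] + r * (u[i - 1] - 2 * u[i] + u[i + 1])

        threads = [threading.Thread(target=worker, args=(1, n // 2)),
                   threading.Thread(target=worker, args=(n // 2, n - 1))]
        for t in threads: t.start()
        for t in threads: t.join()
        print(min(u), max(u))                 # approaches steady state u = 1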

  13. Abnormal fault-recovery characteristics of the fault-tolerant multiprocessor uncovered using a new fault-injection methodology

    NASA Technical Reports Server (NTRS)

    Padilla, Peter A.

    1991-01-01

    An investigation was made in AIRLAB of the fault handling performance of the Fault Tolerant MultiProcessor (FTMP). Fault handling errors detected during fault injection experiments were characterized. In these fault injection experiments, the FTMP disabled a working unit instead of the faulted unit once in every 500 faults, on the average. System design weaknesses allow active faults to exercise a part of the fault management software that handles Byzantine or lying faults. Byzantine faults behave such that the faulted unit points to a working unit as the source of errors. The design's problems involve: (1) the design and interface between the simplex error detection hardware and the error processing software, (2) the functional capabilities of the FTMP system bus, and (3) the communication requirements of a multiprocessor architecture. These weak areas in the FTMP's design increase the probability that, for any hardware fault, a good line replacement unit (LRU) is mistakenly disabled by the fault management software.

  14. Design and construction of the high-speed optoelectronic memory system demonstrator.

    PubMed

    Barbieri, Roberto; Benabes, Philippe; Bierhoff, Thomas; Caswell, Josh J; Gauthier, Alain; Jahns, Jürgen; Jarczynski, Manfred; Lukowicz, Paul; Oksman, Jacques; Russell, Gordon A; Schrage, Jürgen; Snowdon, John F; Stübbe, Oliver; Troster, Gerhard; Wirz, Marco

    2008-07-01

    The high-speed optoelectronic memory system project is concerned with the reduction of latency within multiprocessor computer systems (a key problem) by the use of optoelectronics and associated packaging technologies. System demonstrators have been constructed to enable the evaluation of the technologies in terms of manufacturability. The system combines fiber, free space, and planar integrated optical waveguide technologies to augment the electronic memory and the processor components. Modeling and simulation techniques were developed toward the analysis and design of board-integrated waveguide transmission characteristics and optical interfacing. We describe the fabrication, assembly, and simulation of the major components within the system.

  15. Event parallelism: Distributed memory parallel computing for high energy physics experiments

    SciTech Connect

    Nash, T.

    1989-05-01

    This paper describes the present and expected future development of distributed memory parallel computers for high energy physics experiments. It covers the use of event parallel microprocessor farms, particularly at Fermilab, including both ACP multiprocessors and farms of MicroVAXES. These systems have proven very cost effective in the past. A case is made for moving to the more open environment of UNIX and RISC processors. The 2nd Generation ACP Multiprocessor System, which is based on powerful RISC systems, is described. Given the promise of still more extraordinary increases in processor performance, a new emphasis on point to point, rather than bussed, communication will be required. Developments in this direction are described. 6 figs.

  16. Debugging Fortran on a shared memory machine

    SciTech Connect

    Allen, T.R.; Padua, D.A.

    1987-01-01

    Debugging on a parallel processor is more difficult than debugging on a serial machine because errors in a parallel program may introduce nondeterminism. The approach to parallel debugging presented here attempts to reduce the problem of debugging on a parallel machine to that of debugging on a serial machine by automatically detecting nondeterminism. 20 refs., 6 figs.

  17. Shared Memory Consistency Models: A Tutorial

    DTIC Science & Technology

    1995-09-01

  18. Optical Shared Memory System Demonstration Model

    DTIC Science & Technology

    1990-07-01

  19. Towards Scalable 1024 Processor Shared Memory Systems

    NASA Technical Reports Server (NTRS)

    Ciotti, Robert B.; Thigpen, William W. (Technical Monitor)

    2001-01-01

    Over the past 3 years, NASA Ames has been involved in a cooperative effort with SGI to develop the largest single-system-image systems available. Currently a 1024-processor Origin3000 is under development, with first boot expected later in the summer of 2001. This paper discusses some early results with a 512-processor Origin3000 system and some arcane IRIX system calls that can dramatically improve scaling performance.

  20. 3-dimensional magnetotelluric inversion including topography using deformed hexahedral edge finite elements and direct solvers parallelized on symmetric multiprocessor computers - Part II: direct data-space inverse solution

    NASA Astrophysics Data System (ADS)

    Kordy, M.; Wannamaker, P.; Maris, V.; Cherkaev, E.; Hill, G.

    2016-01-01

    Following the creation described in Part I of a deformable edge finite-element simulator for 3-D magnetotelluric (MT) responses using direct solvers, in Part II we develop an algorithm named HexMT for 3-D regularized inversion of MT data including topography. Direct solvers parallelized on large-RAM, symmetric multiprocessor (SMP) workstations are used also for the Gauss-Newton model update. By exploiting the data-space approach, the computational cost of the model update becomes much less in both time and computer memory than the cost of the forward simulation. In order to regularize using the second norm of the gradient, we factor the matrix related to the regularization term and apply its inverse to the Jacobian, which is done using the MKL PARDISO library. For dense matrix multiplication and factorization related to the model update, we use the PLASMA library which shows very good scalability across processor cores. A synthetic test inversion using a simple hill model shows that including topography can be important; in this case depression of the electric field by the hill can cause false conductors at depth or mask the presence of resistive structure. With a simple model of two buried bricks, a uniform spatial weighting for the norm of model smoothing recovered more accurate locations for the tomographic images compared to weightings which were a function of parameter Jacobians. We implement joint inversion for static distortion matrices tested using the Dublin secret model 2, for which we are able to reduce nRMS to ˜1.1 while avoiding oscillatory convergence. Finally we test the code on field data by inverting full impedance and tipper MT responses collected around Mount St Helens in the Cascade volcanic chain. Among several prominent structures, the north-south trending, eruption-controlling shear zone is clearly imaged in the inversion.
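
    The computational saving of the data-space approach can be made explicit. The equations below give the standard data-space Gauss-Newton (Occam-style) update in common notation, under the assumption that HexMT follows this general form; J is the Jacobian, C_m and C_d the model and data covariance matrices, lambda the regularization parameter, and \hat{d} the transformed residual data vector. The model-space normal equations

        \[
          \left( J^{T} C_d^{-1} J + \lambda\, C_m^{-1} \right) \Delta m
            = J^{T} C_d^{-1} \hat{d}
        \]

    require factoring an M x M system in the number of model parameters M, whereas the equivalent data-space form

        \[
          \Delta m = C_m J^{T} \beta, \qquad
          \left( \lambda\, C_d + J C_m J^{T} \right) \beta = \hat{d}
        \]

    involves only an N x N solve in the number of data N, which is far smaller than M in 3-D MT inversion. This is why the model update costs much less than the forward simulation in both time and memory.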

  1. Mechanical memory

    DOEpatents

    Gilkey, Jeffrey C.; Duesterhaus, Michelle A.; Peter, Frank J.; Renn, Rosemarie A.; Baker, Michael S.

    2006-05-16

    A first-in-first-out (FIFO) microelectromechanical memory apparatus (also termed a mechanical memory) is disclosed. The mechanical memory utilizes a plurality of memory cells, with each memory cell having a beam which can be bowed in either of two directions of curvature to indicate two different logic states for that memory cell. The memory cells can be arranged around a wheel which operates as a clocking actuator to serially shift data from one memory cell to the next. The mechanical memory can be formed using conventional surface micromachining, and can be formed as either a nonvolatile memory or as a volatile memory.

  2. Mechanical memory

    DOEpatents

    Gilkey, Jeffrey C.; Duesterhaus, Michelle A.; Peter, Frank J.; Renn, Rosemarie A.; Baker, Michael S.

    2006-08-15

    A first-in-first-out (FIFO) microelectromechanical memory apparatus (also termed a mechanical memory) is disclosed. The mechanical memory utilizes a plurality of memory cells, with each memory cell having a beam which can be bowed in either of two directions of curvature to indicate two different logic states for that memory cell. The memory cells can be arranged around a wheel which operates as a clocking actuator to serially shift data from one memory cell to the next. The mechanical memory can be formed using conventional surface micromachining, and can be formed as either a nonvolatile memory or as a volatile memory.

  3. Human memory B cells.

    PubMed

    Seifert, M; Küppers, R

    2016-12-01

    A key feature of the adaptive immune system is the generation of memory B and T cells and long-lived plasma cells, providing protective immunity against recurring infectious agents. Memory B cells are generated in germinal center (GC) reactions in the course of T cell-dependent immune responses and are distinguished from naive B cells by an increased lifespan, faster and stronger response to stimulation and expression of somatically mutated and affinity matured immunoglobulin (Ig) genes. Approximately 40% of human B cells in adults are memory B cells, and several subsets were identified. Besides IgG(+) and IgA(+) memory B cells, ∼50% of peripheral blood memory B cells express IgM with or without IgD. Further smaller subpopulations have additionally been described. These various subsets share typical memory B cell features, but likely also fulfill distinct functions. IgM memory B cells appear to have the propensity for refined adaptation upon restimulation in additional GC reactions, whereas reactivated IgG B cells rather differentiate directly into plasma cells. The human memory B-cell pool is characterized by (sometimes amazingly large) clonal expansions, often showing extensive intraclonal IgV gene diversity. Moreover, memory B-cell clones are frequently composed of members of various subsets, showing that from a single GC B-cell clone a variety of memory B cells with distinct functions is generated. Thus, the human memory B-cell compartment is highly diverse and flexible. Several B-cell malignancies display features suggesting a derivation from memory B cells. This includes a subset of chronic lymphocytic leukemia, hairy cell leukemia and marginal zone lymphomas. The exposure of memory B cells to oncogenic events during their generation in the GC, the longevity of these B cells and the ease to activate them may be key determinants for their malignant transformation.

  4. The Performance of Parallel Disk Write Methods for Linux Multiprocessor Nodes

    SciTech Connect

    Benson, G D; Long, K; Pacheco, P

    2003-05-07

    Despite increasing attention paid to parallel I/O and the introduction of MPI-IO, there is limited, practical data to help guide a programmer in the choice of a good parallel write strategy in the absence of a parallel file system. In this study we experimentally evaluate several methods for implementing parallel computations that interleave a significant number of contiguous or strided writes to a local disk on Linux-based multiprocessor nodes. Using synthetic benchmark programs written with MPI and Pthreads, we have acquired detailed performance data for different application characteristics of programs running on dual processor nodes. In general, our results show that programs that perform a significant amount of I/O relative to pure computation benefit greatly from the use of threads, while programs that perform relatively little I/O obtain excellent results using only MPI. For a pure MPI approach, we have found that it is best to use two writing processes with mmap(). For Pthreads it is usually best to use write() for contiguous data and writev() for strided data. Codes that use mmap() tend to benefit from periodic syncs of the data to the disk, while codes that use write() or writev() tend to have better performance with few syncs.
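
    The write()-versus-writev() contrast for strided data can be sketched as follows. os.writev is POSIX-only, and the buffer layout here is an illustrative assumption, not the benchmark's actual access pattern.

        # Writing strided data: one writev() call gathers non-contiguous
        # pieces, versus one write() syscall per piece. POSIX-only.
        import os

        record = bytes(range(64))
        pieces = [record[i:i + 8] for i in range(0, 64, 16)]  # every other 8 B

        fd = os.open("strided.out", os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
        os.writev(fd, pieces)             # single gathered write
        # equivalent, but one syscall per piece:
        # for p in pieces: os.write(fd, p)
        os.close(fd)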

  5. Fault-free behavior of reliable multiprocessor systems: FTMP experiments in AIRLAB

    NASA Technical Reports Server (NTRS)

    Clune, E.; Segall, Z.; Siewiorek, D.

    1985-01-01

    This report describes a set of experiments which were implemented on the Fault tolerant Multi-Processor (FTMP) at NASA/Langley's AIRLAB facility. These experiments are part of an effort to formulate and evaluate validation methodologies for fault-tolerant computers. This report deals with the measurement of single parameters (baselines) of a fault-free system. The initial set of baseline experiments led to the following conclusions: (1) The system clock is constant and independent of workload in the tested cases; (2) the instruction execution times are constant; (3) the R4 frame size is 40mS with some variation; (4) the frame stretching mechanism has some flaws in its implementation that allow the possibility of an infinite stretching of frame duration. Future experiments are planned. Some will broaden the results of these initial experiments. Others will measure the system more dynamically. The implementation of a synthetic workload generation mechanism for FTMP is planned to enhance the experimental environment of the system.

  6. Reusing existing resources for testing a multi-processor system-on-chip

    NASA Astrophysics Data System (ADS)

    Lee, Seung Eun

    2013-03-01

    In this article, we propose a test strategy for a multi-processor system-on-chip and model the test time for distributed Intellectual Property (IP) cores. The proposed test methodology uses the existing on-chip resources, IP cores and network elements in the network-on-chip. The use of embedded IP cores as a built-in self-test (BIST) module completes the test much faster than an external test and provides flexibility in the test program. Moreover, the reuse of the existing network resources as a test medium eliminates additional test access mechanism (TAM) wires in the design and increases test parallelism, reducing the area and test time. Based on the proposed test methodology, we evaluate the test time for distributed IP cores. First, we define the model for a distributed IP core with four parameters in the context of test purposes. Next, the required test time is derived. Finally, we show the characteristics of IP cores for parallel testing, which provides useful information for test scheduling.
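
    The record does not name the four parameters of its core model, so any rendering is speculative. The sketch below is a purely hypothetical four-parameter model (setup, network transfer, BIST execution, response readout), composed serially, intended only to illustrate how such a model supports test-time evaluation and scheduling:

      /* Hypothetical four-parameter test-time model for a distributed IP
       * core; the field names and the serial-composition formula are
       * assumptions, not taken from the article. */
      #include <stdio.h>

      struct ip_core_test {
          double setup;     /* configure the embedded core as a BIST module */
          double transfer;  /* move test data across the network-on-chip    */
          double execute;   /* run the test on the embedded core            */
          double readout;   /* collect responses across the network         */
      };

      static double test_time(const struct ip_core_test *t)
      {
          /* Serial composition; parallel testing would overlap one core's
           * data transfer with another core's execution. */
          return t->setup + t->transfer + t->execute + t->readout;
      }

      int main(void)
      {
          struct ip_core_test core = { 1.0e3, 4.0e3, 1.0e4, 2.0e3 };
          printf("estimated test time: %.0f cycles\n", test_time(&core));
          return 0;
      }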

  7. Use of a genetic algorithm to solve fluid flow problems on an NCUBE/2 multiprocessor computer

    SciTech Connect

    Pryor, R.J.; Cline, D.D.

    1992-04-01

    This paper presents a method to solve the partial differential equations governing two-phase fluid flow by using a genetic algorithm on the NCUBE/2 multiprocessor computer. Genetic algorithms represent a significant departure from traditional approaches to solving fluid flow problems. The inherent parallelism of genetic algorithms offers the prospect of obtaining solutions faster than previously possible. The paper discusses the two-phase flow equations, the genetic representation of the unknowns, the fitness function, the genetic operators, and the implementation of the genetic algorithm on the NCUBE/2 computer. The paper investigates the implementation efficiency using a pipe blowdown test and presents the effects of varying both the genetic parameters and the number of processors. The results show that genetic algorithms provide a major advancement in methods for solving two-phase flow problems. The desired goal of solving these equations for a specific simulation problem in real time or faster requires computers with an order of magnitude more processors than the NCUBE/2's 1024, or correspondingly faster ones.
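
    The ingredients the paper lists (a genetic representation of the unknowns, a fitness function, and genetic operators) fit the standard skeleton sketched below. This is a generic illustration with a toy fitness function, not the paper's method; there, fitness would score how well a candidate satisfies the discretized two-phase flow equations, and the per-individual evaluations are what parallelize across the NCUBE/2's processors:

      #include <stdlib.h>
      #include <string.h>

      #define POP   64
      #define GENES 32
      #define NGEN  200
      #define PMUT  0.05

      typedef struct { double g[GENES]; double fit; } indiv;

      static double rnd(void) { return rand() / (double)RAND_MAX; }

      /* Toy fitness (maximal at g = 0); a real one would measure the
       * residual of the discretized flow equations. */
      static double fitness(const indiv *x)
      {
          double s = 0.0;
          for (int i = 0; i < GENES; i++) s -= x->g[i] * x->g[i];
          return s;
      }

      static const indiv *tournament(const indiv *pop)
      {
          const indiv *a = &pop[rand() % POP], *b = &pop[rand() % POP];
          return a->fit > b->fit ? a : b;
      }

      int main(void)
      {
          indiv pop[POP], next[POP];

          for (int i = 0; i < POP; i++) {
              for (int j = 0; j < GENES; j++) pop[i].g[j] = 2.0 * rnd() - 1.0;
              pop[i].fit = fitness(&pop[i]);
          }
          for (int gen = 0; gen < NGEN; gen++) {
              for (int i = 0; i < POP; i++) {
                  const indiv *p1 = tournament(pop), *p2 = tournament(pop);
                  int cut = rand() % GENES;            /* one-point crossover */
                  for (int j = 0; j < GENES; j++)
                      next[i].g[j] = (j < cut ? p1->g[j] : p2->g[j]);
                  for (int j = 0; j < GENES; j++)      /* mutation */
                      if (rnd() < PMUT) next[i].g[j] += 0.2 * rnd() - 0.1;
                  next[i].fit = fitness(&next[i]);     /* parallel per individual */
              }
              memcpy(pop, next, sizeof pop);
          }
          return 0;
      }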

  9. Demand-driven interpretation of FP programs on a data-flow multiprocessor

    SciTech Connect

    Wei, Y.H.; Gaudiot, J.L.

    1988-08-01

    The functional programming language approach has been proposed as a solution to the programmability of large-scale multiprocessor systems. This paper presents a demand-driven evaluation system for Backus' list-structured FP systems, enabling their execution in a data-driven environment. A formal approach is used for transforming FP programs into lazy programs, which incorporate the notion of demands. The superset language of FP is called DFP (demand-driven FP). The DFP lazy programs are shown to have the property of always evaluating a sufficient and necessary result. A demand reduction scheme is used to remove unnecessary demand propagations in DFP programs to reduce run-time overhead. The DFP programs are translated into data-flow graphs according to the graph schemata developed from the FP-DFP transformation rules. The execution characteristics of the DFP graphs are identified, and architectural support for efficient execution is suggested. Due to the laziness and the least-evaluation property of transformed DFP programs, the system allows programming in FP with infinite data structures and the application of partial-functional-value evaluation. Examples of these applications, including an infinite sequence generation and a fast Fourier transform, are used to demonstrate the transformation process, the principles of run-time interpretation, the effectiveness of the transformation, and the power of the lazy evaluation system.

  10. Common interface real-time multiprocessor operating system for embedded systems. Master's thesis

    SciTech Connect

    Rottman, M.S.

    1991-03-04

    Large real-time applications such as aerospace avionics systems, battle management, and factory automation place many demands and constraints on the computing system not found in other applications. Software development is hindered by software dependence on the computer architecture and the lack of portability between systems. This thesis specifies and designs a real-time multiprocessor operating system (RTMOS) that implements a consistent programming model, enabling the development of real-time parallel software independent of the target architecture. The RTMOS defines the core functionality required to demonstrate the programming model. The RTMOS functional requirements are specified using the Structured Analysis and Design Technique (SADT). A hybrid of the Design Approach for Real-Time Software (DARTS) is used to perform the preliminary and detailed designs. The preliminary design is architecture-independent; the detailed design phase maps the design to a specific parallel system, the Intel iPSC/2 hypercube. The modular RTMOS design partitions operating system operations and data structures from hardware-dependent functions for portability.

  11. Reconfigurable fault-tolerant multiprocessor system for real-time control

    SciTech Connect

    Kao, M.L.

    1986-01-01

    Real-time control applications place stringent constraints on the computers controlling them, since the failure of a computer could result in costly damage and even loss of human lives. Fault-tolerant computers, therefore, have always been in high demand in critical avionic and aerospace applications. However, the use of redundancy techniques to achieve fault tolerance in industrial applications has only recently become feasible, due to the rapid decrease in cost and increase in performance of microprocessors. As more and more robots are built to replace human beings in dangerous and difficult tasks, the need for a reliable computer for robotics control increases. This need, in particular, motivated the research described in this dissertation: the design and implementation of a reconfigurable fault-tolerant multiprocessor system (the FREMP system). The FREMP system consists of four processing units (PUs) and three common parallel buses. Each PU is a combination of an Intel 86/30 single-board computer and a custom fault detection/masking circuit board (FDM board). A combined hardware/software scheme was devised to detect faults and correct errors. This scheme has been shown to be more efficient than software voting while maintaining the flexibility of software approaches. Time-frame scheduling was adopted to schedule tasks for execution.

  12. Optimal Scheme for Search State Space and Scheduling on Multiprocessor Systems

    NASA Astrophysics Data System (ADS)

    Youness, Hassan A.; Sakanushi, Keishi; Takeuchi, Yoshinori; Salem, Ashraf; Wahdan, Abdel-Moneim; Imai, Masaharu

    A scheduling algorithm aims to minimize the overall execution time of a program by properly allocating the tasks to the core processors and arranging their execution order so that the precedence constraints among the tasks are preserved. In this paper, we present a new scheduling algorithm that uses geometric analysis of the Task Precedence Graph (TPG) based on the A* search technique, together with a computationally efficient cost function for guiding the search at reduced complexity and with pruning techniques, to produce an optimal solution for the allocation/scheduling problem of a parallel application onto a parallel, multiprocessor architecture. The main goal of this work is to significantly reduce the search space while achieving an optimal or near-optimal solution. We implemented the algorithm on the general task graph problems treated in most of the related work and obtained optimal schedules while visiting only a small number of states. The proposed algorithm reduced the exhaustive search by at least 50% of the search space. The viability and potential of the proposed algorithm are demonstrated by an illustrative example.

  13. System and method for memory allocation in a multiclass memory system

    DOEpatents

    Loh, Gabriel; Meswani, Mitesh; Ignatowski, Michael; Nutter, Mark

    2016-06-28

    A system for memory allocation in a multiclass memory system includes a processor coupleable to a plurality of memories sharing a unified memory address space, and a library store to store a library of software functions. The processor identifies a type of a data structure in response to a memory allocation function call to the library for allocating memory to the data structure. Using the library, the processor allocates portions of the data structure among multiple memories of the multiclass memory system based on the type of the data structure.
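
    As a rough illustration of the mechanism (type-aware placement of a data structure across classes of memory via a library call), consider the hypothetical sketch below. The function and pool names are invented for illustration; the patent does not publish an API:

      #include <stdlib.h>

      enum mem_class { MEM_FAST_STACKED, MEM_BULK_DRAM, MEM_NVM };
      enum ds_type   { DS_HOT_INDEX, DS_LARGE_ARRAY, DS_CHECKPOINT };

      /* Stand-in for the per-class allocators a runtime library would
       * provide over a unified address space. */
      static void *pool_alloc(enum mem_class c, size_t n)
      {
          (void)c;              /* a real library would pick the pool here */
          return malloc(n);
      }

      /* The library inspects the declared data-structure type and places
       * the allocation in the memory class it is expected to use best. */
      static void *mc_alloc(enum ds_type t, size_t n)
      {
          switch (t) {
          case DS_HOT_INDEX:   return pool_alloc(MEM_FAST_STACKED, n);
          case DS_LARGE_ARRAY: return pool_alloc(MEM_BULK_DRAM, n);
          case DS_CHECKPOINT:  return pool_alloc(MEM_NVM, n);
          }
          return NULL;
      }

      int main(void)
      {
          double *a = mc_alloc(DS_LARGE_ARRAY, 1000 * sizeof(double));
          free(a);   /* single underlying pool in this sketch */
          return 0;
      }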

  14. Sharing values, sharing a vision

    SciTech Connect

    Not Available

    1993-12-31

    Teamwork, partnership and shared values emerged as recurring themes at the Third Technology Transfer/Communications Conference. The program drew about 100 participants, who sat through a packed two days to find ways for their laboratories and facilities to better help American business and the economy. Co-hosts were the Lawrence Livermore National Laboratory and the Lawrence Berkeley Laboratory, where most meetings took place. The conference followed traditions established at the First Technology Transfer/Communications Conference, conceived of and hosted by the Pacific Northwest Laboratory in May 1992 in Richland, Washington, and the second conference, hosted by the National Renewable Energy Laboratory in January 1993 in Golden, Colorado. As at the other conferences, participants at the third session represented the fields of technology transfer, public affairs and communications. They came from Department of Energy headquarters and DOE offices, laboratories and production facilities. Included in this report are the keynote address, a panel discussion, workshops, and presentations on technology transfer.

  15. Microsupercomputers: Design and implementation. Technical progress report, November 1988-March 1989

    SciTech Connect

    Hennessy, J.L.; Horowitz, M.A.

    1989-03-01

    Contents: (1) parallel processor architecture; (2) parallel software; (3) uniprocessor architecture; (4) computer-aided design tools; (5) very large scale integration. Keywords: scalable shared-memory multiprocessors; high-performance cache design.

  16. Can High Bandwidth and Latency Justify Large Cache Blocks in Scalable Multiprocessors?

    DTIC Science & Technology

    1994-01-01

    the remote access latency and bandwidth. In this paper, we examine the relationship between these factors in the context of large-scale, network-based...each block of memory. Each node contains the directory for the memory associated with that node. Throughout this paper we refer to the ensemble of...another parameter in our study). The latency of the memory module is 10 processor cycles. The interconnection network is a bi-directional wormhole

  17. Implementation of spectral-finite difference method for simulation of stratified turbulent flows on distributed memory multiprocessors

    SciTech Connect

    Garg, R.P.; Ferziger, J.H.; Monismith, S.G.

    1995-12-01

    The parallel implementation of a spectral finite difference algorithm for the simulation of stratified turbulent flows on the iPSC/860 Hypercube and Paragon XP/S parallel computers is presented. A single-program multiple-data abstraction is used in conjunction with a static data partitioning scheme. Performance measurements of the overall algorithm are presented for three different uni-partitioning schemes and are discussed in the context of the associated dependency and communication overheads. The timing measurements show that the Paragon is about 60% faster than the iPSC/860, but the iPSC/860 shows better speedup efficiency. Unscaled speedup efficiency of up to 91% was obtained on the iPSC/860.

  18. The FORCE - A highly portable parallel programming language

    NASA Technical Reports Server (NTRS)

    Jordan, Harry F.; Benten, Muhammad S.; Alaghband, Gita; Jakob, Ruediger

    1989-01-01

    This paper explains why the FORCE parallel programming language is easily portable among six different shared-memory multiprocessors, and how a two-level macro preprocessor makes it possible to hide low-level machine dependencies and to build machine-independent high-level constructs on top of them. These FORCE constructs make it possible to write portable parallel programs largely independent of the number of processes and the specific shared-memory multiprocessor executing them.

  20. Efficient mapping algorithms for scheduling robot inverse dynamics computation on a multiprocessor system

    NASA Technical Reports Server (NTRS)

    Lee, C. S. G.; Chen, C. L.

    1989-01-01

    Two efficient mapping algorithms are presented for scheduling the robot inverse dynamics computation, consisting of m computational modules with precedence relationships, on a multiprocessor system of p identical homogeneous processors with processor and communication costs, so as to achieve minimum computation time. An objective function is defined in terms of the sum of the processor finishing time and the interprocessor communication time, and a minimax optimization is performed on it to obtain the best mapping. This mapping problem can be formulated as a combination of the graph partitioning problem and the scheduling problem, both of which are known to be NP-complete. Thus, to speed up the search for a solution, two heuristic algorithms are proposed to obtain fast but suboptimal mapping solutions. The first algorithm utilizes the level and the communication intensity of the task modules to construct an ordered priority list of ready modules, and the module assignment is performed by a weighted bipartite matching algorithm. For a near-optimal mapping solution, the problem can be solved by a heuristic algorithm with simulated annealing. These proposed optimization algorithms can solve various large-scale problems within a reasonable time. Computer simulations were performed to evaluate and verify the performance and validity of the proposed mapping algorithms. Finally, experiments for computing the inverse dynamics of a six-jointed PUMA-like manipulator based on the Newton-Euler dynamic equations were implemented on an NCUBE/ten hypercube computer to verify the proposed mapping algorithms. Computer simulation and experimental results are compared and discussed.
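
    The simulated-annealing variant lends itself to a compact sketch. The following is a generic illustration of annealing over module-to-processor assignments under an objective of the kind described (processor finishing time plus interprocessor communication time); the cost model, move set and cooling schedule are arbitrary assumptions, not the paper's:

      #include <math.h>
      #include <stdlib.h>
      #include <string.h>

      #define M 24   /* task modules */
      #define P 4    /* processors   */

      static double exec_cost[M];   /* per-module processing cost       */
      static double comm[M][M];     /* inter-module traffic (0 if none) */

      static double cost(const int map[M])
      {
          double finish[P] = {0}, c = 0.0, worst = 0.0;
          for (int i = 0; i < M; i++) finish[map[i]] += exec_cost[i];
          for (int i = 0; i < M; i++)            /* cut edges cost traffic */
              for (int j = i + 1; j < M; j++)
                  if (map[i] != map[j]) c += comm[i][j];
          for (int p = 0; p < P; p++) if (finish[p] > worst) worst = finish[p];
          return worst + c;
      }

      int main(void)
      {
          int map[M], trial[M];
          for (int i = 0; i < M; i++) { map[i] = rand() % P; exec_cost[i] = 1.0; }

          double cur = cost(map);
          for (double T = 10.0; T > 1e-3; T *= 0.95) {   /* cooling schedule */
              memcpy(trial, map, sizeof map);
              trial[rand() % M] = rand() % P;            /* move one module  */
              double d = cost(trial) - cur;
              if (d < 0 || exp(-d / T) > rand() / (double)RAND_MAX) {
                  memcpy(map, trial, sizeof map);        /* accept the move  */
                  cur += d;
              }
          }
          return 0;
      }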

  1. Polynomial algorithms for multiprocessor scheduling with a small number of job lengths

    SciTech Connect

    McCormick, S.T.; Smallwood, S.R.; Spieksma, F.C.R.

    1997-06-01

    The following problem was originally motivated by a question arising in scheduling maintenance periods for aircraft. Each maintenance period is a job, and the maintenance facilities are machines. In this context, there are very few different types of maintenances performed, so it is natural to consider the problem with only a small, fixed number C of different types of jobs. Each job type has a processing time, and each machine is available for the same length of time. A machine can handle at most one job at a time, all jobs are released at time zero, there are no due dates or precedence constraints, and preemption is not allowed. The question is whether it is possible to finish all jobs. We call this problem the Multiprocessor Scheduling Problem with C job lengths (MSPC). Scheduling problems such as MSPC where we can partition the jobs into a relatively few types such that all jobs of each type are identical are often called high-multiplicity problems. High-multiplicity problems are interesting because their input is very compact: the input to MSPC consists of only 2C + 2 numbers. For the case C = 2 we present a polynomial-time algorithm. We show that this algorithm produces a schedule that uses at most three different one-machine schedules, the minimum possible number. Further, we extend this algorithm to the case of machine-dependent deadlines and to a multi-parametric case. Finally, we discuss why our approach appears not to extend to the case C > 2.
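
    In symbols (a hedged reconstruction from the description above, writing p_j for the processing time of job type j, n_j for its multiplicity, m for the number of machines and T for the common availability, i.e., the 2C + 2 input numbers), the feasibility question is whether

      \exists\, a^{(1)},\ldots,a^{(m)} \in \mathbb{Z}_{\ge 0}^{C} :\quad
      \sum_{k=1}^{m} a^{(k)}_{j} = n_{j} \;\; (1 \le j \le C),
      \qquad \sum_{j=1}^{C} a^{(k)}_{j}\, p_{j} \le T \;\; (1 \le k \le m),

    where a^{(k)} is the one-machine schedule of machine k (how many jobs of each type it runs). In these terms, the C = 2 result says that a feasible instance always admits a solution in which the a^{(k)} take at most three distinct values.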

  2. MEMORY MODULATION

    PubMed Central

    Roozendaal, Benno; McGaugh, James L.

    2011-01-01

    Our memories are not all created equally strong: Some experiences are well remembered while others are remembered poorly, if at all. Research on memory modulation investigates the neurobiological processes and systems that contribute to such differences in the strength of our memories. Extensive evidence from both animal and human research indicates that emotionally significant experiences activate hormonal and brain systems that regulate the consolidation of newly acquired memories. These effects are integrated through noradrenergic activation of the basolateral amygdala which regulates memory consolidation via interactions with many other brain regions involved in consolidating memories of recent experiences. Modulatory systems not only influence neurobiological processes underlying the consolidation of new information, but also affect other mnemonic processes, including memory extinction, memory recall and working memory. In contrast to their enhancing effects on consolidation, adrenal stress hormones impair memory retrieval and working memory. Such effects, as with memory consolidation, require noradrenergic activation of the basolateral amygdala and interactions with other brain regions. PMID:22122145

  3. Memory Matters

    MedlinePlus

    ... different parts. Some of them are important for memory. The hippocampus (say: hih-puh-KAM-pus) is one of the more important parts of the brain that processes memories. Old information and new information, or memories, are ...

  4. The OpenMP Memory Model

    SciTech Connect

    Hoeflinger, J P; de Supinski, B R

    2005-06-01

    The memory model of OpenMP has been widely misunderstood since the first OpenMP specification was published in 1997 (Fortran 1.0). The proposed OpenMP specification (version 2.5) includes a memory model section to address this issue. This section unifies and clarifies the text about the use of memory in all previous specifications, and relates the model to well-known memory consistency semantics. In this paper, we discuss the memory model and show its implications for future distributed shared memory implementations of OpenMP.
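
    The temporary-view semantics the authors clarify can be illustrated with the canonical flag-based hand-off: without the flushes, each thread may keep shared variables in registers or cache, and the consumer might never observe the producer's update. A minimal C sketch (illustrative, not text from the paper; spin-waiting like this is safe only because of the flushes):

      #include <stdio.h>

      int main(void)
      {
          int data = 0, flag = 0;

          #pragma omp parallel sections shared(data, flag)
          {
              #pragma omp section
              {                                 /* producer */
                  data = 42;
                  #pragma omp flush(data)       /* commit data before the flag */
                  flag = 1;
                  #pragma omp flush(flag)
              }
              #pragma omp section
              {                                 /* consumer */
                  int f;
                  do {
                      #pragma omp flush(flag)   /* re-read flag from memory */
                      f = flag;
                  } while (!f);
                  #pragma omp flush(data)       /* make data's value visible */
                  printf("%d\n", data);         /* prints 42 */
              }
          }
          return 0;
      }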

  5. Bipartite memory network architectures for parallel processing

    SciTech Connect

    Smith, W.; Kale, L.V. . Dept. of Computer Science)

    1990-01-01

    Parallel architectures are broadly classified as either shared memory or distributed memory architectures. In this paper, the authors propose a third family of architectures, called bipartite memory network architectures. In this architecture, processors and memory modules constitute a bipartite graph, where each processor is allowed to access a small subset of the memory modules, and each memory module allows access from a small set of processors. The architecture is particularly suitable for computations requiring dynamic load balancing. The authors explore the properties of this architecture by examining the Perfect Difference set based topology for the graph. Extensions of this topology are also suggested.

  6. Process Management and Exception Handling in Multiprocessor Operating Systems Using Object-Oriented Design Techniques. Revised Sep. 1988

    NASA Technical Reports Server (NTRS)

    Russo, Vincent; Johnston, Gary; Campbell, Roy

    1988-01-01

    The programming of the interrupt handling mechanisms, process switching primitives, scheduling mechanism, and synchronization primitives of an operating system for a multiprocessor requires both efficient code to support the needs of high-performance or real-time applications and careful organization to facilitate maintenance. Although many advantages have been claimed for object-oriented class hierarchical languages and their corresponding design methodologies, the application of these techniques to the design of the primitives within an operating system has not been widely demonstrated. To investigate the role of class hierarchical design in systems programming, the authors have constructed the Choices multiprocessor operating system architecture using the C++ programming language. During the implementation, it was found that many operating system design concerns can be represented advantageously using a class hierarchical approach, including: the separation of mechanism and policy; the organization of an operating system into layers, each of which represents an abstract machine; and the notions of process and exception management. In this paper, we discuss an implementation of the low-level primitives of this system and outline the strategy by which we developed our solution.

  7. Parallel processing of real-time dynamic systems simulation on OSCAR (Optimally SCheduled Advanced multiprocessoR)

    NASA Technical Reports Server (NTRS)

    Kasahara, Hironori; Honda, Hiroki; Narita, Seinosuke

    1989-01-01

    Parallel processing of real-time dynamic systems simulation on a multiprocessor system named OSCAR is presented. In the simulation of dynamic systems, the same calculations are generally repeated at every time step. However, Do-all or Do-across techniques cannot be applied to parallel processing of the simulation, since there exist data dependencies from the end of an iteration to the beginning of the next iteration, and furthermore data input and data output are required every sampling period. Therefore, parallelism inside the calculation required for a single time step, or a large basic block consisting of arithmetic assignment statements, must be used. In the proposed method, near-fine-grain tasks, each of which consists of one or more floating point operations, are generated to extract the parallelism from the calculation and assigned to processors using optimal static scheduling at compile time, in order to reduce the large run-time overhead caused by the use of near-fine-grain tasks. The practicality of the scheme is demonstrated on OSCAR (Optimally SCheduled Advanced multiprocessoR), which has been developed to exploit the advantageous features of static scheduling algorithms to the maximum extent.

  8. Episodic memory in nonhuman animals

    PubMed Central

    Templer, Victoria L.

    2013-01-01

    Episodic memories differ from other types of memory because they represent aspects of the past not present in other memories, such as the time, place, or social context in which the memories were formed. Focus on phenomenal experience in human memory, such as the sense of “having been there”, has resulted in conceptualizations of episodic memory that are difficult or impossible to apply to nonhumans. It is therefore a significant challenge for investigators to agree on objective behavioral criteria that can be applied in nonhumans and still capture features of memory thought to be critical in humans. Some investigators have attempted to use neurobiological parallels to bridge this gap. However, defining memory types on the basis of the brain structures involved rather than on identified cognitive mechanisms risks missing the most crucial functional aspects of episodic memory, which are ultimately behavioral. The most productive way forward is likely a combination of neurobiology and sophisticated cognitive testing that identifies the mental representations present in episodic memory. Investigators that have refined their approach from asking the naïve question “do nonhuman animals have episodic memory” to instead asking “what aspects of episodic memory are shared by humans and nonhumans” are making progress. PMID:24028963

  9. Experiences with Transitioning Science Data Production from a Symmetric Multiprocessor Platform to a Linux Cluster Environment

    NASA Astrophysics Data System (ADS)

    Walter, R. J.; Protack, S. P.; Harris, C. J.; Caruthers, C.; Kusterer, J. M.

    2008-12-01

    NASA's Atmospheric Science Data Center at the NASA Langley Research Center performs all of the science data processing for the Multi-angle Imaging SpectroRadiometer (MISR) instrument. MISR is one of the five remote sensing instruments flying aboard NASA's Terra spacecraft. From the time of the Terra launch in December 1999 until February 2008, all MISR science data processing was performed on a Silicon Graphics, Inc. (SGI) platform. However, dramatic improvements in commodity computing technology, coupled with steadily declining project budgets during that period, eventually made transitioning MISR processing to a commodity computing environment both feasible and necessary. The Atmospheric Science Data Center has successfully ported the MISR science data processing environment from the SGI platform to a Linux cluster environment. There were a multitude of technical challenges associated with this transition. Even though the core architecture of the production system did not change, the manner in which it interacted with the underlying hardware was fundamentally different. In addition, there are more potential throughput bottlenecks in a cluster environment than in a symmetric multiprocessor environment like the SGI platform, and each of these had to be addressed. Once all the technical issues associated with the transition were resolved, the Atmospheric Science Data Center had a MISR science data processing system with significantly higher throughput than the SGI platform at a fraction of the cost. In addition to the commodity hardware, free and open source software such as S4PM, Sun Grid Engine, PostgreSQL and Ganglia play a significant role in the new system. Details of the technical challenges and resolutions, software systems, performance improvements, and cost savings associated with the transition will be discussed. The Atmospheric Science Data Center in Langley's Science Directorate leads NASA's program for the processing, archival and distribution of Earth science data.

  10. Memory Palaces

    ERIC Educational Resources Information Center

    Wood, Marianne

    2007-01-01

    This article presents a lesson called Memory Palaces. A memory palace is a memory tool used to remember information, usually as visual images, in a sequence that is logical to the person remembering it. In his book, "In the Palaces of Memory", George Johnson calls them "...structure(s) for arranging knowledge. Lots of connections to language arts,…

  11. Involuntary memory chains: what do they tell us about autobiographical memory organisation?

    PubMed

    Mace, John H; Clevinger, Amanda M; Bernas, Ronan S

    2013-04-01

    Involuntary memory chains are spontaneous recollections of the past that occur as a sequence of associated memories. This memory phenomenon has provided some insights into the nature of associations in autobiographical memory. For example, it has shown that conceptually associated memories (memories sharing similar content, such as the same people or themes) are more prevalent than general-event associated memories (memories from the same extended event period, such as a trip). This finding has suggested that conceptual associations are a central organisational principle in the autobiographical memory system. This study used involuntary memories chains to gain additional insights into the associative structure of autobiographical memory. Among the main results, we found that general-event associations have higher rates of forgetting than conceptual associations, and in long memory chains (i.e., those with more than two memories) conceptually associated memories were more likely to activate memories in their associative class, whereas general-event associated memories were less likely to activate memories in their associative class. We interpret the results as further evidence that conceptual associations are a major organising principle in the autobiographical memory system, and attempt to explain why general-event associations have shorter lifespans than conceptual associations.

  12. Targeted Memory Reactivation during Sleep Adaptively Promotes the Strengthening or Weakening of Overlapping Memories.

    PubMed

    Oyarzún, Javiera P; Morís, Joaquín; Luque, David; de Diego-Balaguer, Ruth; Fuentemilla, Lluís

    2017-08-09

    System memory consolidation is conceptualized as an active process whereby newly encoded memory representations are strengthened through selective memory reactivation during sleep. However, our learning experience is highly overlapping in content (i.e., shares common elements), and memories of these events are organized in an intricate network of overlapping associated events. It remains to be explored whether and how selective memory reactivation during sleep has an impact on these overlapping memories acquired during awake time. Here, we test in a group of adult women and men the prediction that selective memory reactivation during sleep entails the reactivation of associated events and that this may lead the brain to adaptively regulate whether these associated memories are strengthened or pruned from memory networks on the basis of their relative associative strength with the shared element. Our findings demonstrate the existence of efficient regulatory neural mechanisms governing how complex memory networks are shaped during sleep as a function of their associative memory strength. SIGNIFICANCE STATEMENT: Numerous studies have demonstrated that system memory consolidation is an active, selective, and sleep-dependent process in which only subsets of new memories become stabilized through their reactivation. However, the learning experience is highly overlapping in content and thus events are encoded in an intricate network of related memories. It remains to be explored whether and how memory reactivation has an impact on overlapping memories acquired during awake time. Here, we show that sleep memory reactivation promotes strengthening and weakening of overlapping memories based on their associative memory strength. These results suggest the existence of an efficient regulatory neural mechanism that avoids the formation of cluttered memory representation of multiple events and promotes stabilization of complex memory networks.

  13. State recovery and lockstep execution restart in a system with multiprocessor pairing

    DOEpatents

    Gara, Alan; Gschwind, Michael K; Salapura, Valentina

    2014-01-21

    System, method and computer program product for a multiprocessing system to offer selective pairing of processor cores for increased processing reliability. A selective pairing facility is provided that selectively connects, i.e., pairs, multiple microprocessor or processor cores to provide one highly reliable thread (or thread group). Each pair of cores providing one highly reliable thread connects with system components such as a memory "nest" (or memory hierarchy), an optional system controller, an optional interrupt controller, and optional I/O or peripheral devices. The memory nest is attached to the selective pairing facility via a switch or a bus. Each selectively paired processor core includes a transactional execution facility, wherein the system is configured to enable processor rollback to a previous state and to reinitialize lockstep execution in order to recover from an incorrect execution once it has been detected by the selective pairing facility.

  14. The evolution of episodic memory

    PubMed Central

    Allen, Timothy A.; Fortin, Norbert J.

    2013-01-01

    One prominent view holds that episodic memory emerged recently in humans and lacks a “(neo)Darwinian evolution” [Tulving E (2002) Annu Rev Psychol 53:1–25]. Here, we review evidence supporting the alternative perspective that episodic memory has a long evolutionary history. We show that fundamental features of episodic memory capacity are present in mammals and birds and that the major brain regions responsible for episodic memory in humans have anatomical and functional homologs in other species. We propose that episodic memory capacity depends on a fundamental neural circuit that is similar across mammalian and avian species, suggesting that protoepisodic memory systems exist across amniotes and, possibly, all vertebrates. The implication is that episodic memory in diverse species may primarily be due to a shared underlying neural ancestry, rather than the result of evolutionary convergence. We also discuss potential advantages that episodic memory may offer, as well as species-specific divergences that have developed on top of the fundamental episodic memory architecture. We conclude by identifying possible time points for the emergence of episodic memory in evolution, to help guide further research in this area. PMID:23754432

  15. SHARING EDUCATIONAL SERVICES.

    ERIC Educational Resources Information Center

    Catskill Area Project in Small School Design, Oneonta, NY.

    SHARED SERVICES, A COOPERATIVE SCHOOL RESOURCE PROGRAM, IS DEFINED IN DETAIL. INCLUDED IS A DISCUSSION OF THEIR NEED, ADVANTAGES, GROWTH, DESIGN, AND OPERATION. SPECIFIC PROCEDURES FOR OBTAINING STATE AID IN SHARED SERVICES, EFFECTS OF SHARED SERVICES ON THE SCHOOL, AND HINTS CONCERNING SHARED SERVICES ARE DESCRIBED. CHARACTERISTICS OF THE SMALL…

  16. Modeling and performance evaluation of the DPS25 packet switching multiprocessor system

    NASA Astrophysics Data System (ADS)

    Mitropoulos, Spyridon

    1987-05-01

    A packet-switching communication network is evaluated via queueing network modeling theory. Direct access memory systems, the buses linking the network processors, the switching and signal processors, and virtual reference circuits are considered. All models are developed with the help of the QNAP2 program, and listings of the models are given. The method is illustrated with results from simulations at the various system levels.

  17. Memory Matters

    MedlinePlus

    ... blood vessel (which carries the blood) bursts. Brain Injuries Affect Memory At any age, an injury to ... with somebody's memory. Some people who recover from brain injuries need to learn old things all over again, ...

  18. Collaboratively Sharing Scientific Data

    NASA Astrophysics Data System (ADS)

    Wang, Fusheng; Vergara-Niedermayr, Cristobal

    Scientific research is increasingly reliant on multi-disciplinary, multi-institutional collaboration through the sharing of experimental data. Indeed, data sharing is mandated by government research agencies such as the NIH. The major hurdles to data sharing come from: i) the lack of data sharing infrastructure to make data sharing convenient for users; ii) users' fear of losing control of their data; iii) the difficulty of sharing schemas and incompatible data across sharing partners; and iv) inconsistent data under schema evolution. In this paper, we develop a collaborative data sharing system, SciPort, to support consistency-preserved data sharing among multiple distributed organizations. The system first provides a Central Server based lightweight data integration architecture, so that data and schemas can be conveniently shared across multiple organizations. Through distributed schema management, schema sharing and evolution are made possible, while data consistency is maintained and data compatibility is enforced. With this data sharing system, distributed sites can now consistently share their research data and the associated schemas with much convenience and flexibility. SciPort has been successfully used for data sharing in biomedical research, clinical trials and large-scale research collaboration.

  19. Data Traffic Reduction Schemes for Cholesky Factorization on Asynchronous Multiprocessor Systems

    DTIC Science & Technology

    1989-06-01

    are used to get a h’wer b1 ilnd ,,n the data traffic in cmiputing the Ch,lesky factor. Le una 2 Let I bc the amount of computational work which is to...even if the initial values of matriz A are in the processor local memory before the computation begins. 7 D - - - - - - - - F -~ I I I E SH G 2-I

  20. Memory Dysfunction

    PubMed Central

    Matthews, Brandy R.

    2015-01-01

    Purpose of Review: This article highlights the dissociable human memory systems of episodic, semantic, and procedural memory in the context of neurologic illnesses known to adversely affect specific neuroanatomic structures relevant to each memory system. Recent Findings: Advances in functional neuroimaging and refinement of neuropsychological and bedside assessment tools continue to support a model of multiple memory systems that are distinct yet complementary and to support the potential for one system to be engaged as a compensatory strategy when a counterpart system fails. Summary: Episodic memory, the ability to recall personal episodes, is the subtype of memory most often perceived as dysfunctional by patients and informants. Medial temporal lobe structures, especially the hippocampal formation and associated cortical and subcortical structures, are most often associated with episodic memory loss. Episodic memory dysfunction may present acutely, as in concussion; transiently, as in transient global amnesia (TGA); subacutely, as in thiamine deficiency; or chronically, as in Alzheimer disease. Semantic memory refers to acquired knowledge about the world. Anterior and inferior temporal lobe structures are most often associated with semantic memory loss. The semantic variant of primary progressive aphasia (svPPA) is the paradigmatic disorder resulting in predominant semantic memory dysfunction. Working memory, associated with frontal lobe function, is the active maintenance of information in the mind that can be potentially manipulated to complete goal-directed tasks. Procedural memory, the ability to learn skills that become automatic, involves the basal ganglia, cerebellum, and supplementary motor cortex. Parkinson disease and related disorders result in procedural memory deficits. Most memory concerns warrant bedside cognitive or neuropsychological evaluation and neuroimaging to assess for specific neuropathologies and guide treatment. PMID:26039844

  1. Emerging memories

    NASA Astrophysics Data System (ADS)

    Baldi, Livio; Bez, Roberto; Sandhu, Gurtej

    2014-12-01

    Memory is a key component of any data processing system. Following the classical Turing machine approach, memories hold both the data to be processed and the rules for processing them. In the history of microelectronics, the distinction has rather been between working memory, exemplified by DRAM, and storage memory, exemplified by NAND. These two types of memory devices now represent 90% of the memory market and 25% of the total semiconductor market, and have been the technology drivers of the last decades. Although radically different in characteristics, they are based on the same storage mechanism, charge storage, and this mechanism seems to be nearing its physical limits. The search for new alternative memory approaches, based on more scalable mechanisms, has therefore gained new momentum. The status of the incumbent memory technologies and their scaling limitations is discussed. Emerging memory technologies are analyzed, starting from those already present in niche applications, which are getting new attention thanks to recent technology breakthroughs. Maturity level, physical limitations and potential for scaling are compared to existing memories. Finally, the possible future composition of memory systems is discussed.

  2. Memory protection

    NASA Technical Reports Server (NTRS)

    Denning, Peter J.

    1988-01-01

    Accidental overwriting of files or of memory regions belonging to other programs, browsing of personal files by superusers, Trojan horses, and viruses are examples of breakdowns in workstations and personal computers that would be significantly reduced by memory protection. Memory protection is the capability of an operating system and supporting hardware to delimit segments of memory, to control whether segments can be read from or written into, and to confine accesses of a program to its segments alone. The absence of memory protection in many operating systems today is the result of a bias toward a narrow definition of performance as maximum instruction-execution rate. A broader definition, including the time to get the job done, makes clear that cost of recovery from memory interference errors reduces expected performance. The mechanisms of memory protection are well understood, powerful, efficient, and elegant. They add to performance in the broad sense without reducing instruction execution rate.
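
    On current systems, the segment-delimiting capability described here is exposed to user programs through calls such as mmap() and mprotect(). The POSIX sketch below (illustrative only) makes a page read-only, so a stray store would fault immediately instead of silently overwriting the data:

      #include <stdio.h>
      #include <string.h>
      #include <sys/mman.h>
      #include <unistd.h>

      int main(void)
      {
          size_t page = (size_t)sysconf(_SC_PAGESIZE);
          char *seg = mmap(NULL, page, PROT_READ | PROT_WRITE,
                           MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

          strcpy(seg, "protected data");

          /* Delimit the segment: still readable, no longer writable. */
          mprotect(seg, page, PROT_READ);

          printf("%s\n", seg);   /* reads are allowed */
          /* seg[0] = 'X';          a write here would now raise SIGSEGV */

          munmap(seg, page);
          return 0;
      }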

  3. Declarative memory.

    PubMed

    Riedel, Wim J; Blokland, Arjan

    2015-01-01

    Declarative Memory consists of memory for events (episodic memory) and facts (semantic memory). Methods to test declarative memory are key in investigating effects of potential cognition-enhancing substances--medicinal drugs or nutrients. A number of cognitive performance tests assessing declarative episodic memory tapping verbal learning, logical memory, pattern recognition memory, and paired associates learning are described. These tests have been used as outcome variables in 34 studies in humans that have been described in the literature in the past 10 years. Also, the use of episodic tests in animal research is discussed also in relation to the drug effects in these tasks. The results show that nutritional supplementation of polyunsaturated fatty acids has been investigated most abundantly and, in a number of cases, but not all, show indications of positive effects on declarative memory, more so in elderly than in young subjects. Studies investigating effects of registered anti-Alzheimer drugs, cholinesterase inhibitors in mild cognitive impairment, show positive and negative effects on declarative memory. Studies mainly carried out in healthy volunteers investigating the effects of acute dopamine stimulation indicate enhanced memory consolidation as manifested specifically by better delayed recall, especially at time points long after learning and more so when drug is administered after learning and if word lists are longer. The animal studies reveal a different picture with respect to the effects of different drugs on memory performance. This suggests that at least for episodic memory tasks, the translational value is rather poor. For the human studies, detailed parameters of the compositions of word lists for declarative memory tests are discussed and it is concluded that tailored adaptations of tests to fit the hypothesis under study, rather than "off-the-shelf" use of existing tests, are recommended.

  4. Generalized quantum secret sharing

    SciTech Connect

    Singh, Sudhir Kumar; Srikanth, R.

    2005-01-01

    We explore a generalization of quantum secret sharing (QSS) in which classical shares play a complementary role to quantum shares, exploring further consequences of an idea first studied by Nascimento, Mueller-Quade, and Imai [Phys. Rev. A 64, 042311 (2001)]. We examine three ways, termed inflation, compression, and twin thresholding, by which the proportion of classical shares can be augmented. This has the important application that it reduces quantum (information processing) players by replacing them with their classical counterparts, thereby making quantum secret sharing considerably easier and less expensive to implement in a practical setting. In compression, a QSS scheme is turned into an equivalent scheme with fewer quantum players, compensated for by suitable classical shares. In inflation, a QSS scheme is enlarged by adding only classical shares and players. In a twin-threshold scheme, we invoke two separate thresholds for classical and quantum shares based on the idea of information dilution.

  5. The Structure of Memory: Fixed or Flexible? Structural Learning Series.

    ERIC Educational Resources Information Center

    Scandura, Joseph M.

    Most current information processing theories of cognition and memory share one common feature: the structure (state-space) of memory is fixed and retrieval from memory involves searching through that structure. Learning, where it is treated at all, involves transforming one such structure into another. This form of representation is questioned and…

  6. A Formal Model of Capacity Limits in Working Memory

    ERIC Educational Resources Information Center

    Oberauer, Klaus; Kliegl, Reinhold

    2006-01-01

    A mathematical model of working-memory capacity limits is proposed on the key assumption of mutual interference between items in working memory. Interference is assumed to arise from overwriting of features shared by these items. The model was fit to time-accuracy data of memory-updating tasks from four experiments using nonlinear mixed effect…

  7. Children's Working Memory: Investigating Performance Limitations in Complex Span Tasks

    ERIC Educational Resources Information Center

    Conlin, J.A.; Gathercole, S.E.; Adams, J.W.

    2005-01-01

    Three experiments investigated the roles of resource-sharing and intrinsic memory demands in complex working memory span performance in 7- and 9-year-olds. In Experiment 1, the processing complexity of arithmetic operations was varied under conditions in which processing times were equivalent. Memory span did not differ as a function of processing…

  9. The influence of visual feedback from the recent past on the programming of grip aperture is grasp-specific, shared between hands, and mediated by sensorimotor memory not task set.

    PubMed

    Tang, Rixin; Whitwell, Robert L; Goodale, Melvyn A

    2015-05-01

    Goal-directed movements, such as reaching out to grasp an object, are necessarily constrained by the spatial properties of the target such as its size, shape, and position. For example, during a reach-to-grasp movement, the peak width of the aperture formed by the thumb and fingers in flight (peak grip aperture, PGA) is linearly related to the target's size. Suppressing vision throughout the movement (visual open loop) has a small though significant effect on this relationship. Visual open loop conditions also produce a large increase in the PGA compared to when vision is available throughout the movement (visual closed loop). Curiously, this differential effect of the availability of visual feedback is influenced by the presentation order: the difference in PGA between closed- and open-loop trials is smaller when these trials are intermixed (an effect we have called 'homogenization'). Thus, grasping movements are affected not only by the availability of visual feedback (closed loop or open loop) but also by what happened on the previous trial. It is not clear, however, whether this carry-over effect is mediated through motor (or sensorimotor) memory or through the interference of different task sets for closed-loop and open-loop feedback that determine when the movements are fully specified. We reasoned that sensorimotor memory, but not a task set for closed- and open-loop feedback, would be specific to the type of response. We tested this prediction in a condition in which pointing to targets was alternated with grasping those same targets. Critically, in this condition, when pointing was performed in open loop, grasping was always performed in closed loop (and vice versa). Despite the fact that closed- and open-loop trials were alternating in this condition, we found no evidence for homogenization of the PGA. Homogenization did occur, however, in a follow-up experiment in which grasping movements and visual feedback were alternated between the left and the right hands.

  10. Eliminating Useless Messages in Write-Update Protocols on Scalable Multiprocessors.

    DTIC Science & Technology

    1994-10-01

    between successive write operations to the data [Eggers and Katz, 1988]. The disadvantage of WU is that every write operation to shared data requires...

  11. A comparison of multiprocessor scheduling methods for iterative data flow architectures

    NASA Technical Reports Server (NTRS)

    Storch, Matthew

    1993-01-01

    A comparative study is made between the Algorithm to Architecture Mapping Model (ATAMM) and three other related multiprocessing models from the published literature. The primary focus of all four models is the non-preemptive scheduling of large-grain iterative data flow graphs as required in real-time systems, control applications, signal processing, and pipelined computations. Important characteristics of the models such as injection control, dynamic assignment, multiple node instantiations, static optimum unfolding, range-chart guided scheduling, and mathematical optimization are identified. The models from the literature are compared with the ATAMM for performance, scheduling methods, memory requirements, and complexity of scheduling and design procedures.

  12. Memories Are Made of This

    ERIC Educational Resources Information Center

    Chang, Christine

    2010-01-01

    In this article, the author shares her memories of Sally Smith, the founder of The Lab School of Washington, where she works as the director of the Occupational Therapy. When the author first met Smith, Smith asked her what brought her to The Lab School at that point in her career. She told Smith that her background was rather eclectic, since she…

  14. Runtime and Programming Support for Memory Adaptation in Scientific Applications via Local Disk and Remote Memory

    SciTech Connect

    Mills, Richard T; Yue, Chuan; Andreas, Stathopoulos; Nikolopoulos, Dimitrios S

    2007-01-01

    The ever increasing memory demands of many scientific applications and the complexity of today's shared computational resources still require the occasional use of virtual memory, network memory, or even out-of-core implementations, with well known drawbacks in performance and usability. In Mills et al. (Adapting to memory pressure from within scientific applications on multiprogrammed COWS. In: International Parallel and Distributed Processing Symposium, IPDPS, Santa Fe, NM, 2004), we introduced a basic framework for a runtime, user-level library, MMlib, in which DRAM is treated as a dynamic size cache for large memory objects residing on local disk. Application developers can specify and access these objects through MMlib, enabling their application to execute optimally under variable memory availability, using as much DRAM as fluctuating memory levels will allow. In this paper, we first extend our earlier MMlib prototype from a proof of concept to a usable, robust, and flexible library. We present a general framework that enables fully customizable memory malleability in a wide variety of scientific applications. We provide several necessary enhancements to the environment sensing capabilities of MMlib, and introduce a remote memory capability, based on MPI communication of cached memory blocks between 'compute nodes' and designated memory servers. The increasing speed of interconnection networks makes a remote memory approach attractive, especially at the large granularity present in large scientific applications. We show experimental results from three important scientific applications that require the general MMlib framework. The memory-adaptive versions perform nearly optimally under constant memory pressure and execute harmoniously with other applications competing for memory, without thrashing the memory system. Under constant memory pressure, we observe execution time improvements of factors between three and
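
    The core idea (DRAM acting as a cache for a large object that lives on local disk, shrinkable when memory pressure rises) can be sketched with POSIX primitives. The function names below are hypothetical illustrations, not MMlib's actual interface:

      #include <fcntl.h>
      #include <sys/mman.h>
      #include <unistd.h>

      /* Back a large object with a file on local disk and map it. */
      static void *managed_alloc(const char *path, size_t n)
      {
          int fd = open(path, O_RDWR | O_CREAT, 0644);
          ftruncate(fd, (off_t)n);          /* the disk file backs the object */
          void *p = mmap(NULL, n, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
          close(fd);                        /* the mapping keeps the file open */
          return p;
      }

      /* Under memory pressure: write dirty pages back and release the DRAM;
       * later accesses transparently fault the data back in from disk. */
      static void managed_shrink(void *p, size_t n)
      {
          msync(p, n, MS_SYNC);
          madvise(p, n, MADV_DONTNEED);
      }

      int main(void)
      {
          double *a = managed_alloc("bigobj.dat", 1 << 20);
          a[0] = 3.14;                  /* resident pages act as the cache  */
          managed_shrink(a, 1 << 20);   /* yield DRAM to competing programs */
          return a[0] > 3.0 ? 0 : 1;    /* faults the page back in from disk */
      }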

  15. Parallel variable-band Choleski solvers for computational structural analysis applications on vector multiprocessor supercomputers

    NASA Technical Reports Server (NTRS)

    Poole, E. L.; Overman, A. L.

    1991-01-01

    A Choleski method used to solve linear systems of equations that arise in large scale structural analyses is described. The method uses a novel variable-band storage scheme and is structured to exploit fast local memory caches while minimizing data access delays between main memory and vector registers. Several parallel implementations of this method are described for the CRAY-2 and CRAY Y-MP computers demonstrating the use of microtasking and autotasking directives. A portable parallel language, FORCE, is also used for two different parallel implementations, demonstrating the use of CRAY macrotasking. Results are presented comparing the matrix factorization times for three representative structural analysis problems from runs made in both dedicated and multi-user modes on both the CRAY-2 and CRAY Y-MP computers. CPU and wall clock timings are given for the various parallel methods and are compared to single processor timings of the same algorithm. Computation rates over 1 GIGAFLOP (1 billion floating point operations per second) on a four processor CRAY-2 and over 2 GIGAFLOPS on an eight processor CRAY Y-MP are demonstrated as measured by wall clock time in a dedicated environment. Reduced wall clock times for the parallel methods relative to the single processor implementation of the same Choleski algorithm are also demonstrated for runs made in multi-user mode.
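
    A variable-band (profile) scheme of the kind described stores each row only from its first nonzero column through the diagonal, so short rows cost no space and the factorization's inner loops stream through contiguous memory. A hedged sketch of the indexing, with invented field names:

      #include <stddef.h>

      typedef struct {
          int     n;       /* matrix order                                   */
          int    *first;   /* first[i]: first nonzero column of row i        */
          size_t *start;   /* start[i]: offset of row i in the packed array  */
          double *val;     /* packed row segments, diagonal last in each row */
      } varband;

      /* Entry (i,j) for j <= i; anything left of the band is zero. */
      static double vb_get(const varband *a, int i, int j)
      {
          if (j < a->first[i]) return 0.0;
          return a->val[a->start[i] + (size_t)(j - a->first[i])];
      }

      int main(void)
      {
          /* 3x3 example: rows 0 and 1 start at column 0, row 2 at column 1. */
          int    first[] = { 0, 0, 1 };
          size_t start[] = { 0, 1, 3 };
          double val[]   = { 4.0,  1.0, 5.0,  2.0, 6.0 };
          varband a = { 3, first, start, val };
          return vb_get(&a, 2, 0) == 0.0 ? 0 : 1;   /* left of band: zero */
      }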

  16. Flashbulb Memories

    PubMed Central

    Hirst, William; Phelps, Elizabeth A.

    2015-01-01

    We review and analyze the key theories, debates, findings, and omissions of the existing literature on flashbulb memories (FBMs), including what factors affect their formation, retention, and degree of confidence. We argue that FBMs do not require special memory mechanisms and are best characterized as involving both forgetting and mnemonic distortions, despite a high level of confidence. Factual memories for FBM-inducing events generally follow a similar pattern. Although no necessary and sufficient factors straightforwardly account for FBM retention, media attention particularly shapes memory for the events themselves. FBMs are best characterized in terms of repetitions, even of mnemonic distortions, whereas event memories evidence corrections. The bearing of this literature on social identity and traumatic memories is also discussed. PMID:26997762

  17. DMA shared byte counters in a parallel computer

    DOEpatents

    Chen, Dong; Gara, Alan G.; Heidelberger, Philip; Vranas, Pavlos

    2010-04-06

    A parallel computer system is constructed as a network of interconnected compute nodes. Each of the compute nodes includes at least one processor, a memory and a DMA engine. The DMA engine includes a processor interface for interfacing with the at least one processor, DMA logic, a memory interface for interfacing with the memory, a DMA network interface for interfacing with the network, injection and reception byte counters, injection and reception FIFO metadata, and status registers and control registers. The injection FIFO metadata maintains the memory locations of each injection FIFO, including its current head and tail, and the reception FIFO metadata likewise maintains the memory locations of each reception FIFO, including its current head and tail. The injection byte counters and reception byte counters may be shared between messages.
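
    The shared byte counter can be modeled in software: packets from several messages decrement one counter by their payload sizes, and a zero counter signals that every message sharing it has fully arrived. This C11-atomics model is a sketch of the principle, not the patent's hardware (compile with -pthread):

      #include <pthread.h>
      #include <stdatomic.h>
      #include <stdio.h>

      typedef struct {
          atomic_long bytes_remaining;  /* sum of all message lengths sharing it */
      } reception_counter;

      typedef struct {
          reception_counter *ctr;
          long msg_bytes;               /* this message's total length       */
          long packet_bytes;            /* delivered in fixed-size packets   */
      } message;

      /* Each packet that lands in memory decrements the shared counter by its
       * payload size.  When the counter reaches zero, *all* messages sharing
       * the counter have fully arrived -- no per-message bookkeeping needed. */
      static void *deliver(void *arg) {
          message *m = arg;
          for (long sent = 0; sent < m->msg_bytes; sent += m->packet_bytes) {
              long chunk = m->msg_bytes - sent < m->packet_bytes
                         ? m->msg_bytes - sent : m->packet_bytes;
              if (atomic_fetch_sub(&m->ctr->bytes_remaining, chunk) - chunk == 0)
                  printf("counter hit zero: all shared messages complete\n");
          }
          return NULL;
      }

      int main(void) {
          reception_counter ctr;
          atomic_init(&ctr.bytes_remaining, 3000 + 5000); /* two messages share it */
          message a = { &ctr, 3000, 512 }, b = { &ctr, 5000, 512 };
          pthread_t ta, tb;
          pthread_create(&ta, NULL, deliver, &a);
          pthread_create(&tb, NULL, deliver, &b);
          pthread_join(ta, NULL);
          pthread_join(tb, NULL);
          printf("bytes remaining: %ld\n", atomic_load(&ctr.bytes_remaining));
          return 0;
      }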

  18. Memory in health and in schizophrenia

    PubMed Central

    Gur, Ruben C.; Gur, Raquel E.

    2013-01-01

    Memory is an important capacity needed for survival in a changing environment, and its principles are shared across species. These principles have been studied since the inception of behavioral science, and more recently neuroscience has helped understand brain systems and mechanisms responsible for enabling aspects of memory. Here we outline the history of work on memory and its neural underpinning, and describe the major dimensions of memory processing that have been evaluated by cognitive neuroscience, focusing on episodic memory. We present evidence in healthy populations for sex differences—females outperforming in verbal and face memory, and age effects—slowed memory processes with age. We then describe deficits associated with schizophrenia. Impairment in schizophrenia is more severe in patients with negative symptoms—especially flat affect—who also show deficits in measures of social cognition. This evidence implicates medial temporal and frontal regions in schizophrenia. PMID:24459407

  19. Memory in health and in schizophrenia.

    PubMed

    Gur, Ruben C; Gur, Raquel E

    2013-12-01

    Memory is an important capacity needed for survival in a changing environment, and its principles are shared across species. These principles have been studied since the inception of behavioral science, and more recently neuroscience has helped understand brain systems and mechanisms responsible for enabling aspects of memory. Here we outline the history of work on memory and its neural underpinning, and describe the major dimensions of memory processing that have been evaluated by cognitive neuroscience, focusing on episodic memory. We present evidence in healthy populations for sex differences—females outperforming in verbal and face memory, and age effects—slowed memory processes with age. We then describe deficits associated with schizophrenia. Impairment in schizophrenia is more severe in patients with negative symptoms—especially flat affect—who also show deficits in measures of social cognition. This evidence implicates medial temporal and frontal regions in schizophrenia.

  20. Skilled Memory.

    DTIC Science & Technology

    1980-11-06

    Morse code (Bryan & Harter, 1899). In every case, memory performance of the expert seems to violate the established limits of short-term memory. How is ... of immediate memory. Quarterly Journal of Experimental Psychology, 1958, 10, 12-21. Bryan, W. L., & Harter, N. Psychological Review, 1899, 6, 345-375.

  1. Virtual memory

    NASA Technical Reports Server (NTRS)

    Denning, P. J.

    1986-01-01

    Virtual memory was conceived as a way to automate overlaying of program segments. Modern computers have very large main memories, but need automatic solutions to the relocation and protection problems. Virtual memory serves this need as well and is thus useful in computers of all sizes. The history of the idea is traced, showing how it has become a widespread, little noticed feature of computers today.

  2. HEP - A semaphore-synchronized multiprocessor with central control. [Heterogeneous Element Processor

    NASA Technical Reports Server (NTRS)

    Gilliland, M. C.; Smith, B. J.; Calvert, W.

    1976-01-01

    The paper describes the design concept of the Heterogeneous Element Processor (HEP), a system tailored to the special needs of scientific simulation. In order to achieve high-speed computation required by simulation, HEP features a hierarchy of processes executing in parallel on a number of processors, with synchronization being largely accomplished by hardware. A full-empty-reserve scheme of synchronization is realized by zero-one-valued hardware semaphores. A typical system has, besides the control computer and the scheduler, an algebraic module, a memory module, a first-in first-out (FIFO) module, an integrator module, and an I/O module. The architecture of the scheduler and the algebraic module is examined in detail.
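
    The full-empty-reserve scheme can be emulated in software: each cell carries a state bit, a synchronized write waits for EMPTY and leaves the cell FULL, and a read does the reverse, claiming a transient reserve state by compare-and-swap. A sketch, assuming C11 atomics stand in for HEP's hardware semaphores (compile with -pthread):

      #include <pthread.h>
      #include <stdatomic.h>
      #include <stdio.h>

      typedef struct {
          atomic_int full;     /* 0 = EMPTY, 1 = FULL, -1 = RESERVED */
          int        value;
      } fe_cell;

      static void fe_write(fe_cell *c, int v) {
          int expected = 0;    /* wait until EMPTY, then reserve the cell */
          while (!atomic_compare_exchange_weak(&c->full, &expected, -1))
              expected = 0;
          c->value = v;
          atomic_store(&c->full, 1);               /* mark FULL */
      }

      static int fe_read(fe_cell *c) {
          int expected = 1;    /* wait until FULL, then reserve the cell */
          while (!atomic_compare_exchange_weak(&c->full, &expected, -1))
              expected = 1;
          int v = c->value;
          atomic_store(&c->full, 0);               /* mark EMPTY */
          return v;
      }

      static void *producer(void *arg) {
          fe_cell *c = arg;
          for (int i = 1; i <= 5; i++) fe_write(c, i);  /* blocks until consumed */
          return NULL;
      }

      int main(void) {
          fe_cell c = { .full = 0, .value = 0 };
          pthread_t t;
          pthread_create(&t, NULL, producer, &c);
          for (int i = 0; i < 5; i++) printf("consumed %d\n", fe_read(&c));
          pthread_join(t, NULL);
          return 0;
      }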

  4. A parallel row-based algorithm with error control for standard-cell replacement on a hypercube multiprocessor

    NASA Technical Reports Server (NTRS)

    Sargent, Jeff Scott

    1988-01-01

    A new row-based parallel algorithm for standard-cell placement targeted for execution on a hypercube multiprocessor is presented. Key features of this implementation include a dynamic simulated-annealing schedule, row-partitioning of the VLSI chip image, and two novel approaches to controlling error in parallel cell-placement algorithms: Heuristic Cell-Coloring and Adaptive (Parallel Move) Sequence Control. Heuristic Cell-Coloring identifies sets of noninteracting cells that can be moved repeatedly, and in parallel, with no buildup of error in the placement cost. Adaptive Sequence Control allows multiple parallel cell moves to take place between global cell-position updates. This feedback mechanism is based on an error bound derived analytically from the traditional annealing move-acceptance profile. Placement results are presented for real industry circuits, and the performance of an implementation on the Intel iPSC/2 Hypercube is summarized. The runtime of this algorithm is 5 to 16 times faster than a previous program developed for the Hypercube, while producing equivalent quality placement. An integrated place and route program for the Intel iPSC/2 Hypercube is currently being developed.
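
    Heuristic Cell-Coloring amounts to coloring the cell-interaction graph so that same-colored cells share no nets and can therefore move in parallel without cost error. A greedy-coloring sketch; the adjacency matrix and the greedy policy are illustrative assumptions, not the paper's exact heuristic:

      #include <stdio.h>

      #define NCELLS 6

      int main(void) {
          /* adj[i][j] = 1 when cells i and j interact (share a net) */
          int adj[NCELLS][NCELLS] = {
              {0,1,0,0,1,0}, {1,0,1,0,0,0}, {0,1,0,1,0,0},
              {0,0,1,0,1,1}, {1,0,0,1,0,0}, {0,0,0,1,0,0},
          };
          int color[NCELLS];
          for (int i = 0; i < NCELLS; i++) {
              int used[NCELLS] = {0};
              for (int j = 0; j < i; j++)      /* colors taken by neighbors */
                  if (adj[i][j]) used[color[j]] = 1;
              int c = 0;
              while (used[c]) c++;             /* smallest free color */
              color[i] = c;
          }
          for (int i = 0; i < NCELLS; i++)
              printf("cell %d -> color %d\n", i, color[i]);
          /* each color class is a set of mutually non-interacting cells: a
           * parallel annealer may move a whole class between global updates */
          return 0;
      }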

  5. Nonpreemptive run-time scheduling issues on a multitasked, multiprogrammed multiprocessor with dependencies, bidimensional tasks, folding and dynamic graphs

    SciTech Connect

    Miller, Allan Ray

    1987-05-01

    Increases in high speed hardware have mandated studies in software techniques to exploit the parallel capabilities. This thesis examines the effects a run-time scheduler has on a multiprocessor. The model consists of directed, acyclic graphs, generated from serial FORTRAN benchmark programs by the parallel compiler Parafrase. A multitasked, multiprogrammed environment is created. Dependencies are generated by the compiler. Tasks are bidimensional, i.e., they may specify both time and processor requests. Processor requests may be folded into execution time by the scheduler. The graphs may arrive at arbitrary time intervals. The general case is NP-hard, thus, a variety of heuristics are examined by a simulator. Multiprogramming demonstrates a greater need for a run-time scheduler than does monoprogramming for a variety of reasons, e.g., greater stress on the processors, a larger number of independent control paths, more variety in the task parameters, etc. The dynamic critical path series of algorithms perform well. Dynamic critical volume did not add much. Unfortunately, dynamic critical path maximizes turnaround time as well as throughput. Two schedulers are presented which balance throughput and turnaround time. The first requires classification of jobs by type; the second requires selection of a ratio value which is dependent upon system parameters. 45 refs., 19 figs., 20 tabs.
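
    The dynamic-critical-path idea assigns each task a priority equal to its longest remaining path of execution times to an exit node ("bottom level"). A sketch of that computation on a toy DAG; the dispatch policy in the closing comment is an assumption standing in for the thesis's schedulers:

      #include <stdio.h>

      #define NT 5

      int main(void) {
          int time_[NT]   = { 2, 3, 1, 4, 2 };     /* task execution times   */
          int dep[NT][NT] = {0};                   /* dep[i][j]: i precedes j */
          dep[0][1] = dep[0][2] = dep[1][3] = dep[2][3] = dep[3][4] = 1;

          /* blevel(i) = time(i) + max over successors j of blevel(j);
           * a reverse index sweep suffices here since edges go low -> high */
          int blevel[NT];
          for (int i = NT - 1; i >= 0; i--) {
              int best = 0;
              for (int j = 0; j < NT; j++)
                  if (dep[i][j] && blevel[j] > best) best = blevel[j];
              blevel[i] = time_[i] + best;
          }
          for (int i = 0; i < NT; i++)
              printf("task %d: bottom level %d\n", i, blevel[i]);
          /* a run-time scheduler would keep a ready queue ordered by blevel,
           * dispatching the deepest-critical-path task to the next free CPU */
          return 0;
      }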

  6. Proactive quantum secret sharing

    NASA Astrophysics Data System (ADS)

    Qin, Huawang; Dai, Yuewei

    2015-11-01

    A proactive quantum secret sharing scheme is proposed, in which the participants can update their key shares periodically. In an updating period, one participant randomly generates the EPR pairs, and the other participants update their key shares and perform the corresponding unitary operations on the particles of the EPR pairs. Then, the participant who generated the EPR pairs performs the Bell-state measurement and updates his key share according to the result of the Bell-state measurement. After an updating period, each participant can change his key share, but the secret remains unchanged, and the old key shares will be useless even if they have been stolen by the attacker. The proactive property of our scheme is very useful for resisting a mobile attacker.

  7. Robert Hooke's model of memory.

    PubMed

    Hintzman, Douglas L

    2003-03-01

    In 1682 the scientist and inventor Robert Hooke read a lecture to the Royal Society of London, in which he described a mechanistic model of human memory. Yet few psychologists today seem to have heard of Hooke's memory model. The lecture addressed questions of encoding, memory capacity, repetition, retrieval, and forgetting--some of these in a surprisingly modern way. Hooke's model shares several characteristics with the theory of Richard Semon, which came more than 200 years later, but it is more complete. Among the model's interesting properties are that (1) it allows for attention and other top-down influences on encoding; (2) it uses resonance to implement parallel, cue-dependent retrieval; (3) it explains memory for recency; (4) it offers a single-system account of repetition priming; and (5) the power law of forgetting can be derived from the model's assumptions in a straightforward way.

  8. Episodic Memories

    ERIC Educational Resources Information Center

    Conway, Martin A.

    2009-01-01

    An account of episodic memories is developed that focuses on the types of knowledge they represent, their properties, and the functions they might serve. It is proposed that episodic memories consist of "episodic elements," summary records of experience often in the form of visual images, associated to a "conceptual frame" that provides a…

  9. Collaging Memories

    ERIC Educational Resources Information Center

    Wallach, Michele

    2011-01-01

    Even middle school students can have memories of their childhoods, of an earlier time. The art of Romare Bearden and the writings of Paul Auster can be used to introduce ideas about time and memory to students and inspire works of their own. Bearden is an exceptional role model for young artists, not only because of his astounding art, but also…

  10. Memory Magic.

    ERIC Educational Resources Information Center

    Hartman, Thomas G.; Nowak, Norman

    This paper outlines several "tricks" that aid students in improving their memories. The distinctions between operational and figural thought processes are noted. Operational memory is described as something that allows adults to make generalizations about numbers and the rules by which they may be combined, thus leading to easier memorization.…

  13. Commissioning of SharePlan: The Liverpool Experience

    NASA Astrophysics Data System (ADS)

    Xing, Aitang; Deshpande, Shrikant; Arumugam, Sankar; George, Armia; Holloway, Lois; Goozee, Gary

    2014-03-01

    SharePlan is a treatment planning system developed by Raysearch Laboratories AB to enable creation of a linear accelerator intensity modulated radiotherapy (IMRT) plan as a backup for a Tomotherapy plan. A 6 MV Elekta Synergy linear accelerator photon beam was modelled in SharePlan. The beam model was validated using Matrix Evolution, a 2D ion chamber array, for two head-neck and three prostate plans using 3%/3 mm gamma criteria. For 39 IMRT beams, the minimum and maximum gamma pass rates are 95.4% and 98.7%. SharePlan is able to generate backup IMRT plans which are deliverable on a traditional linear accelerator and accurate in terms of clinical criteria. During use of SharePlan, however, an out-of-memory error frequently occurred and SharePlan had to be closed. This error occurred occasionally at any of these steps: loading the Tomotherapy plan into SharePlan, generating the IMRT plan, selecting the optimal plan, approving the plan, and setting up a QA plan. The out-of-memory error was caused by memory leakage in one or more of the C/C++ functions implemented in the SharePlan fluence engine, dose engine, or optimizer, as acknowledged by the manufacturer. Because of the interruption caused by out-of-memory errors, SharePlan has not been implemented in our clinic although its accuracy has been verified. A new software program is now being provided to our centre to replace SharePlan.

  14. Memory conformity affects inaccurate memories more than accurate memories.

    PubMed

    Wright, Daniel B; Villalba, Daniella K

    2012-01-01

    After controlling for initial confidence, inaccurate memories were shown to be more easily distorted than accurate memories. In two experiments groups of participants viewed 50 stimuli and were then presented with these stimuli plus 50 fillers. During this test phase participants reported their confidence that each stimulus was originally shown. This was followed by computer-generated responses from a bogus participant. After being exposed to this response participants again rated the confidence of their memory. The computer-generated responses systematically distorted participants' responses. Memory distortion depended on initial memory confidence, with uncertain memories being more malleable than confident memories. This effect was moderated by whether the participant's memory was initially accurate or inaccurate. Inaccurate memories were more malleable than accurate memories. The data were consistent with a model describing two types of memory (i.e., recollective and non-recollective memories), which differ in how susceptible these memories are to memory distortion.

  15. Support for non-locking parallel reception of packets belonging to a single memory reception FIFO

    DOEpatents

    Chen, Dong [Yorktown Heights, NY; Heidelberger, Philip [Yorktown Heights, NY; Salapura, Valentina [Yorktown Heights, NY; Senger, Robert M [Yorktown Heights, NY; Steinmacher-Burow, Burkhard [Boeblingen, DE; Sugawara, Yutaka [Yorktown Heights, NY

    2011-01-27

    A method and apparatus for distributed parallel messaging in a parallel computing system. A plurality of DMA engine units are configured in a multiprocessor system to operate in parallel, one DMA engine unit for transferring a current packet received at a network reception queue to a memory location in a memory FIFO (rmFIFO) region of a memory. A control unit implements logic to determine whether any prior received packet destined for that rmFIFO is still in a process of being stored in the associated memory by another DMA engine unit of the plurality, and prevent the one DMA engine unit from indicating completion of storing the current received packet in the reception memory FIFO (rmFIFO) until all prior received packets destined for that rmFIFO are completely stored by the other DMA engine units. Thus, there is provided non-locking support so that multiple packets destined for a single rmFIFO are transferred and stored in parallel to predetermined locations in a memory.
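
    The non-locking discipline can be modeled with two counters: engines claim FIFO slots with an atomic reservation counter and store payloads in parallel, but the consumer-visible tail only advances in slot order once every earlier packet is stored. A C11-atomics sketch of the principle, not the DMA hardware's actual logic (compile with -pthread):

      #include <pthread.h>
      #include <stdatomic.h>
      #include <stdio.h>

      #define SLOTS 64

      static int         fifo[SLOTS];
      static atomic_long reserve_ctr;   /* next slot to hand to an engine     */
      static atomic_long commit_ctr;    /* tail visible to the consuming core */

      static void *dma_engine(void *arg) {
          long id = (long)arg;
          for (int p = 0; p < 8; p++) {
              long slot = atomic_fetch_add(&reserve_ctr, 1);  /* claim a slot */
              fifo[slot % SLOTS] = (int)(id * 100 + p);       /* store packet */
              /* completion may only be indicated after all prior packets are
               * stored: wait for our predecessor to commit, then commit      */
              while (atomic_load(&commit_ctr) != slot)
                  ;
              atomic_store(&commit_ctr, slot + 1);
          }
          return NULL;
      }

      int main(void) {
          pthread_t t[4];
          for (long i = 0; i < 4; i++)
              pthread_create(&t[i], NULL, dma_engine, (void *)i);
          for (int i = 0; i < 4; i++)
              pthread_join(t[i], NULL);
          printf("packets visible to consumer: %ld\n", atomic_load(&commit_ctr));
          return 0;
      }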

  16. A multilevel nonvolatile magnetoelectric memory

    NASA Astrophysics Data System (ADS)

    Shen, Jianxin; Cong, Junzhuang; Shang, Dashan; Chai, Yisheng; Shen, Shipeng; Zhai, Kun; Sun, Young

    2016-09-01

    The coexistence and coupling between magnetization and electric polarization in multiferroic materials provide extra degrees of freedom for creating next-generation memory devices. A variety of concepts of multiferroic or magnetoelectric memories have been proposed and explored in the past decade. Here we propose a new principle to realize a multilevel nonvolatile memory based on the multiple states of the magnetoelectric coefficient (α) of multiferroics. Because the state of α depends on the relative orientation between magnetization and polarization, one can reach different levels of α by controlling the ratio of up and down ferroelectric domains with external electric fields. Our experiments in a device made of the PMN-PT/Terfenol-D multiferroic heterostructure confirm that the state of α can be well controlled between positive and negative by applying selective electric fields. Consequently, two-level, four-level, and eight-level nonvolatile memory devices are demonstrated at room temperature. This kind of multilevel magnetoelectric memory retains all the advantages of ferroelectric random access memory but overcomes the drawback of destructive reading of polarization. In contrast, the reading of α is nondestructive and highly efficient in a parallel way, with an independent reading coil shared by all the memory cells.

  17. A multilevel nonvolatile magnetoelectric memory

    PubMed Central

    Shen, Jianxin; Cong, Junzhuang; Shang, Dashan; Chai, Yisheng; Shen, Shipeng; Zhai, Kun; Sun, Young

    2016-01-01

    The coexistence and coupling between magnetization and electric polarization in multiferroic materials provide extra degrees of freedom for creating next-generation memory devices. A variety of concepts of multiferroic or magnetoelectric memories have been proposed and explored in the past decade. Here we propose a new principle to realize a multilevel nonvolatile memory based on the multiple states of the magnetoelectric coefficient (α) of multiferroics. Because the state of α depends on the relative orientation between magnetization and polarization, one can reach different levels of α by controlling the ratio of up and down ferroelectric domains with external electric fields. Our experiments in a device made of the PMN-PT/Terfenol-D multiferroic heterostructure confirm that the state of α can be well controlled between positive and negative by applying selective electric fields. Consequently, two-level, four-level, and eight-level nonvolatile memory devices are demonstrated at room temperature. This kind of multilevel magnetoelectric memory retains all the advantages of ferroelectric random access memory but overcomes the drawback of destructive reading of polarization. In contrast, the reading of α is nondestructive and highly efficient in a parallel way, with an independent reading coil shared by all the memory cells. PMID:27681812

  18. Armstrong Memorial Service

    NASA Image and Video Library

    2012-09-13

    NASA Deputy Administrator Lori Garver, right, shares a moment with Apollo 17 mission commander Gene Cernan, the last man to walk on the moon, left, as U.S. Sen. Kay Bailey Hutchison, R-Texas, center, looks on prior to a memorial service celebrating the life of Neil Armstrong, Thursday, Sept. 13, 2012, at the Washington National Cathedral. Armstrong, the first man to walk on the moon during the 1969 Apollo 11 mission, died Saturday, Aug. 25. He was 82. Photo Credit: (NASA/Bill Ingalls)

  19. Shared Parenting Dysfunction.

    ERIC Educational Resources Information Center

    Turkat, Ira Daniel

    2002-01-01

    Joint custody of children is the most prevalent court ordered arrangement for families of divorce. A growing body of literature indicates that many parents engage in behaviors that are incompatible with shared parenting. This article provides specific criteria for a definition of the Shared Parenting Dysfunction. Clinical aspects of the phenomenon…

  20. Intelligence Sharing in Bosnia

    DTIC Science & Technology

    2007-11-02

    increases with the demands of near-real-time, accurate intelligence for operational decision-making. Given this environment, intelligence-sharing ... operating system providing actionable near-real-time intelligence to commanders for coalition synchronization and the requirement to protect national ... intelligence-sharing requirements across an ad hoc coalition

  1. Models, Norms and Sharing.

    ERIC Educational Resources Information Center

    Harris, Mary B.

    To investigate the effect of modeling on altruism, 156 third and fifth grade children were exposed to a model who either shared with them, gave to a charity, or refused to share. The test apparatus, identified as a game, consisted of a box with signal lights and a chute through which marbles were dispensed. Subjects and the model played the game…

  2. Work Sharing Case Studies.

    ERIC Educational Resources Information Center

    McCarthy, Maureen E.; And Others

    Designed to provide private sector employers with the practical information necessary to select and then to design and implement work sharing arrangements, this book presents case studies of some 36 work sharing programs. Topics covered in the case studies include the circumstances leading to adoption of the program, details of compensation and…

  3. Balancing Loads Among Parallel Data Processors

    NASA Technical Reports Server (NTRS)

    Baffes, Paul Thomas

    1990-01-01

    Heuristic algorithm minimizes amount of memory used by multiprocessor system. Distributes load of many identical, short computations among multiple parallel digital data processors, each of which has its own (local) memory. Each processor operates on distinct and independent set of data in larger shared memory. As integral part of load-balancing scheme, total amount of space used in shared memory is minimized. Possible applications include artificial neural networks or image processors for which "pipeline" and vector methods of load balancing are inappropriate.
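
    A sketch of the distribution step: with many identical short computations, a greedy longest-processing-time rule that always hands the next job to the least-loaded processor balances the load well. The job costs and the LPT policy are illustrative; the Tech Brief's heuristic additionally minimized the shared-memory footprint:

      #include <stdio.h>

      #define NJOBS  8
      #define NPROCS 3

      int main(void) {
          int cost[NJOBS]  = { 9, 7, 6, 5, 4, 3, 2, 2 };  /* sorted descending */
          int load[NPROCS] = { 0 };
          int owner[NJOBS];
          for (int j = 0; j < NJOBS; j++) {
              int p = 0;                            /* least-loaded processor */
              for (int q = 1; q < NPROCS; q++)
                  if (load[q] < load[p]) p = q;
              owner[j] = p;
              load[p] += cost[j];
          }
          for (int p = 0; p < NPROCS; p++) printf("proc %d load: %d\n", p, load[p]);
          for (int j = 0; j < NJOBS; j++)  printf("job %d -> proc %d\n", j, owner[j]);
          return 0;
      }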

  4. Share with thy neighbors

    NASA Astrophysics Data System (ADS)

    Chandra, Surendar; Yu, Xuwen

    2007-01-01

    Peer to peer (P2P) systems are traditionally designed to scale to a large number of nodes. However, we focus on scenarios where the sharing is effected only among neighbors. Localized sharing is particularly attractive in scenarios where wide area network connectivity is undesirable, expensive or unavailable. On the other hand, local neighbors may not offer the wide variety of objects possible in a much larger system. The goal of this paper is to investigate a P2P system that shares contents with its neighbors. We analyze the sharing behavior of Apple iTunes users in a university setting. iTunes restricts the sharing of audio and video objects to peers within the same LAN sub-network. We show that users are already making a significant amount of content available for local sharing. We show that these systems are not appropriate for applications that require access to a specific object. We argue that mechanisms that allow the user to specify classes of interesting objects are better suited for these systems. Mechanisms such as bloom filters can allow each peer to summarize the contents available in the neighborhood, reducing network search overhead. This research can form the basis for future storage systems that utilize the shared storage available in neighbors and build a probabilistic storage for local consumption.
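
    The Bloom-filter summary mentioned in the abstract can be sketched directly: a peer publishes a small bit array over its object names, and neighbors test membership locally before issuing a network search. The hash choice and sizes below are illustrative assumptions:

      #include <stdint.h>
      #include <stdio.h>

      #define MBITS 1024                           /* filter size in bits */

      static uint32_t fnv1a(const char *s, uint32_t seed) {
          uint32_t h = 2166136261u ^ seed;         /* FNV-1a, seeded */
          while (*s) { h ^= (uint8_t)*s++; h *= 16777619u; }
          return h;
      }

      static void bloom_add(uint8_t *f, const char *key) {
          for (uint32_t k = 0; k < 2; k++) {       /* two hash functions */
              uint32_t b = fnv1a(key, k * 0x9e3779b9u) % MBITS;
              f[b / 8] |= (uint8_t)(1u << (b % 8));
          }
      }

      static int bloom_maybe_has(const uint8_t *f, const char *key) {
          for (uint32_t k = 0; k < 2; k++) {
              uint32_t b = fnv1a(key, k * 0x9e3779b9u) % MBITS;
              if (!(f[b / 8] & (1u << (b % 8)))) return 0; /* definitely absent */
          }
          return 1;                          /* present, or a false positive */
      }

      int main(void) {
          uint8_t filter[MBITS / 8] = { 0 };       /* a neighbor's summary */
          bloom_add(filter, "song-a.m4a");
          bloom_add(filter, "lecture-3.mov");
          printf("query song-a.m4a:  %d\n", bloom_maybe_has(filter, "song-a.m4a"));
          printf("query missing.mp3: %d\n", bloom_maybe_has(filter, "missing.mp3"));
          return 0;
      }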

  5. Collective memory: a perspective from (experimental) clinical psychology.

    PubMed

    Wessel, Ineke; Moulds, Michelle L

    2008-04-01

    This paper considers the concept of collective memory from an experimental clinical psychology perspective. Exploration of the term collective reveals a broad distinction between literatures that view collective memories as a property of groups (collectivistic memory) and those that regard these memories as a property of individuals who are, to a greater or lesser extent, an integral part of their social environment (social memory). First, we argue that the understanding of collectivistic memory phenomena may benefit from drawing parallels with current psychological models such as the self-memory system theory of individualistic autobiographical memory. Second, we suggest that the social memory literature may inform the study of trauma-related disorders. We argue that a factual focus induced by collaborative remembering may be beneficial to natural recovery in the immediate aftermath of trauma, and propose that shared remembering techniques may provide a useful addition to the treatment of post-traumatic stress disorder.

  6. Programming parallel architectures - The BLAZE family of languages

    NASA Technical Reports Server (NTRS)

    Mehrotra, Piyush

    1989-01-01

    This paper gives an overview of the various approaches to programming multiprocessor architectures that are currently being explored. It is argued that two of these approaches, interactive programming environments and functional parallel languages, are particularly attractive, since they remove much of the burden of exploiting parallel architectures from the user. This paper also describes recent work in the design of parallel languages. Research on languages for both shared and nonshared memory multiprocessors is described.

  7. Memory Systems Do Not Divide on Consciousness: Reinterpreting Memory in Terms of Activation and Binding

    PubMed Central

    Reder, Lynne M.; Park, Heekyeong; Kieffaber, Paul D.

    2009-01-01

    There is a popular hypothesis that performance on implicit and explicit memory tasks reflects 2 distinct memory systems. Explicit memory is said to store those experiences that can be consciously recollected, and implicit memory is said to store experiences and affect subsequent behavior but to be unavailable to conscious awareness. Although this division based on awareness is a useful taxonomy for memory tasks, the authors review the evidence that the unconscious character of implicit memory does not necessitate that it be treated as a separate system of human memory. They also argue that some implicit and explicit memory tasks share the same memory representations and that the important distinction is whether the task (implicit or explicit) requires the formation of a new association. The authors review and critique dissociations from the behavioral, amnesia, and neuroimaging literatures that have been advanced in support of separate explicit and implicit memory systems by highlighting contradictory evidence and by illustrating how the data can be accounted for using a simple computational memory model that assumes the same memory representation for those disparate tasks. PMID:19210052

  8. Rearview Memories

    ERIC Educational Resources Information Center

    Gross, Gwen E.

    2008-01-01

    In this article, the author shares her experience when she was still a student until she became a superintendent. In her 17th year in the superintendency, the author finds the joys of her work all around her, grateful to be bestowed with the gift of leadership. She shares with colleagues a few especially meaningful moments from her professional…

  9. Musical and verbal memory in Alzheimer's disease: a study of long-term and short-term memory.

    PubMed

    Ménard, Marie-Claude; Belleville, Sylvie

    2009-10-01

    Musical memory was tested in Alzheimer patients and in healthy older adults using long-term and short-term memory tasks. Long-term memory (LTM) was tested with a recognition procedure using unfamiliar melodies. Short-term memory (STM) was evaluated with same/different judgment tasks on short series of notes. Musical memory was compared to verbal memory using a task that used pseudowords (LTM) or syllables (STM). Results indicated impaired musical memory in AD patients relative to healthy controls. The deficit was found for both long-term and short-term memory. Furthermore, it was of the same magnitude for both musical and verbal domains whether tested with short-term or long-term memory tasks. No correlation was found between musical and verbal LTM. However, there was a significant correlation between verbal and musical STM in AD participants and healthy older adults, which suggests that the two domains may share common mechanisms.

  10. Cueing others' memories.

    PubMed

    Tullis, Jonathan G; Benjamin, Aaron S

    2015-05-01

    Many situations require us to generate external cues to support later retrieval from memory. For instance, we create file names in order to cue our memory to a file's contents, and instructors create lecture slides to remember what points to make during classes. We even generate cues for others when we remind friends of shared experiences or send colleagues a computer file that is named in such a way so as to remind them of its contents. Here we explore how and how well learners tailor retrieval cues for different intended recipients. Across three experiments, subjects generated verbal cues for a list of target words for themselves or for others. Learners generated cues for others by increasing the normative cue-to-target associative strength but also by increasing the number of other words their cues point to, relative to cues that they generated for themselves. This strategy was effective: such cues supported higher levels of recall for others than cues generated for oneself. Generating cues for others also required more time than generating cues for oneself. Learners responded to the differential demands of cue generation for others by effortfully excluding personal, episodic knowledge and including knowledge that they estimate to be broadly shared.

  11. A Sharing Proposition.

    ERIC Educational Resources Information Center

    Sturgeon, Julie

    2002-01-01

    Describes how the University of Vermont and St. Michael's College in Burlington, Vermont cooperated to share a single card access system. Discusses the planning, financial, and marketplace advantages of the cooperation. (EV)

  13. Accelerating Spectrum Sharing Technologies

    SciTech Connect

    Juan D. Deaton; Lynda L. Brighton; Rangam Subramanian; Hussein Moradi; Jose Loera

    2013-09-01

    Spectrum sharing potentially holds the promise of solving the emerging spectrum crisis. However, technology innovators face the conundrum of developing spectrum sharing technologies without the ability to experiment and test with real incumbent systems. Interference with operational incumbents can prevent critical services, and the cost of deploying and operating an incumbent system can be prohibitive. Thus, the lack of incumbent systems and frequency authorization for technology incubation and demonstration has stymied spectrum sharing research. To this end, industry, academia, and regulators all require a test facility for validating hypotheses and demonstrating functionality without affecting operational incumbent systems. This article proposes a four-phase program supported by our spectrum accountability architecture. We propose that our comprehensive experimentation and testing approach for technology incubation and demonstration will accelerate the development of spectrum sharing technologies.

  14. System and method for programmable bank selection for banked memory subsystems

    DOEpatents

    Blumrich, Matthias A.; Chen, Dong; Gara, Alan G.; Giampapa, Mark E.; Hoenicke, Dirk; Ohmacht, Martin; Salapura, Valentina; Sugavanam, Krishnan

    2010-09-07

    A programmable memory system and method for enabling one or more processor devices access to shared memory in a computing environment, the shared memory including one or more memory storage structures having addressable locations for storing data. The system comprises: one or more first logic devices associated with a respective one or more processor devices, each first logic device for receiving physical memory address signals and programmable for generating a respective memory storage structure select signal upon receipt of pre-determined address bit values at selected physical memory address bit locations; and, a second logic device responsive to each of the respective select signal for generating an address signal used for selecting a memory storage structure for processor access. The system thus enables each processor device of a computing environment memory storage access distributed across the one or more memory storage structures.
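
    The two-stage selection can be sketched as a mask/shift on predetermined physical-address bits followed by a small programmable lookup that yields the bank-select signal. Field widths and table contents below are illustrative, not the patent's register layout:

      #include <stdint.h>
      #include <stdio.h>

      typedef struct {
          uint64_t mask;       /* which physical address bits to examine   */
          unsigned shift;      /* where those bits start                   */
          unsigned bank[4];    /* programmable: extracted bits -> bank id  */
      } bank_select;

      static unsigned select_bank(const bank_select *bs, uint64_t paddr) {
          unsigned bits = (unsigned)((paddr & bs->mask) >> bs->shift);
          return bs->bank[bits];   /* second-stage lookup issues the select */
      }

      int main(void) {
          /* examine address bits 7..8; program a non-identity interleaving */
          bank_select bs = { .mask = 0x180, .shift = 7, .bank = { 0, 2, 1, 3 } };
          for (uint64_t a = 0; a < 4 * 128; a += 128)
              printf("paddr 0x%03llx -> bank %u\n",
                     (unsigned long long)a, select_bank(&bs, a));
          return 0;
      }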

  15. Memory Network For Distributed Data Processors

    NASA Technical Reports Server (NTRS)

    Bolen, David; Jensen, Dean; Millard, ED; Robinson, Dave; Scanlon, George

    1992-01-01

    Universal Memory Network (UMN) is modular, digital data-communication system enabling computers with differing bus architectures to share 32-bit-wide data between locations up to 3 km apart with less than one millisecond of latency. Makes it possible to design sophisticated real-time and near-real-time data-processing systems without data-transfer "bottlenecks". This enterprise network permits transmission of volume of data equivalent to an encyclopedia each second. Facilities benefiting from Universal Memory Network include telemetry stations, simulation facilities, power plants, and large laboratories, or any facility sharing very large volumes of data. Main hub of UMN is reflection center that includes smaller hubs called Shared Memory Interfaces.

  16. Information partnerships--shared data, shared scale.

    PubMed

    Konsynski, B R; McFarlan, F W

    1990-01-01

    How can one company gain access to another's resources or customers without merging ownership or management, or plotting a takeover? The answer is found in new information partnerships, enabling diverse companies to develop strategic coalitions through the sharing of data. The key to cooperation is a quantum improvement in the hardware and software supporting relational databases: new computer speeds, cheaper mass-storage devices, the proliferation of fiber-optic networks, and networking architectures. Information partnerships mean that companies can distribute the technological and financial exposure that comes with huge investments. For the customer's part, partnerships inevitably lead to greater simplification on the desktop and more common standards around which vendors have to compete. The most common types of partnership are: joint marketing partnerships, such as American Airlines' award of frequent flyer miles to customers who use Citibank's credit card; intraindustry partnerships, such as the insurance value-added network service (which links insurance and casualty companies to independent agents); customer-supplier partnerships, such as Baxter Healthcare's electronic channel to hospitals for medical and other equipment; and IT vendor-driven partnerships, exemplified by ESAB (a European welding supplies and equipment company), whose expansion strategy was premised on a technology platform offered by an IT vendor. Partnerships that succeed have shared vision at the top, reciprocal skills in information technology, concrete plans for an early success, persistence in the development of usable information for all partners, coordination on business policy, and a new and imaginative business architecture.

  17. An Emulation Tool for Simulating Matrix Operations on an SIMD (Single Instruction Stream Multiple Data Stream) Multiprocessor.

    DTIC Science & Technology

    1987-10-01

    4.3 DATA TYPES
      dataval     = integer;                                { data type of array elements }
      matrixdim   = 31;                                     { max rows/cols }
      matrix      = array[matrixdim,matrixdim] of dataval;  { local A, B, R memories }
      shiftmem    = array[0..SHREGMAX] of dataval;          { shift register memory }
      cpmem       = array[0..CPMEMMAX] of dataval;          { the central memory }
      instruction = record
                      opcode: integer;                      { operation code }
                      op1:    integer;                      { first operand }
                      op2:    integer;                      { second operand }
                      op3:    integer;                      { third operand }
                    end;

  18. Memory consolidation.

    PubMed

    Squire, Larry R; Genzel, Lisa; Wixted, John T; Morris, Richard G

    2015-08-03

    Conscious memory for a new experience is initially dependent on information stored in both the hippocampus and neocortex. Systems consolidation is the process by which the hippocampus guides the reorganization of the information stored in the neocortex such that it eventually becomes independent of the hippocampus. Early evidence for systems consolidation was provided by studies of retrograde amnesia, which found that damage to the hippocampus impaired memories formed in the recent past, but typically spared memories formed in the more remote past. Systems consolidation has been found to occur for both episodic and semantic memories and for both spatial and nonspatial memories, although empirical inconsistencies and theoretical disagreements remain about these issues. Recent work has begun to characterize the neural mechanisms that underlie the dialogue between the hippocampus and neocortex (e.g., "neural replay," which occurs during sharp wave ripple activity). New work has also identified variables, such as the amount of preexisting knowledge, that affect the rate of consolidation. The increasing use of molecular genetic tools (e.g., optogenetics) can be expected to further improve understanding of the neural mechanisms underlying consolidation. Copyright © 2015 Cold Spring Harbor Laboratory Press; all rights reserved.

  19. Memory Consolidation

    PubMed Central

    Squire, Larry R.; Genzel, Lisa; Wixted, John T.; Morris, Richard G.

    2015-01-01

    Conscious memory for a new experience is initially dependent on information stored in both the hippocampus and neocortex. Systems consolidation is the process by which the hippocampus guides the reorganization of the information stored in the neocortex such that it eventually becomes independent of the hippocampus. Early evidence for systems consolidation was provided by studies of retrograde amnesia, which found that damage to the hippocampus impaired memories formed in the recent past, but typically spared memories formed in the more remote past. Systems consolidation has been found to occur for both episodic and semantic memories and for both spatial and nonspatial memories, although empirical inconsistencies and theoretical disagreements remain about these issues. Recent work has begun to characterize the neural mechanisms that underlie the dialogue between the hippocampus and neocortex (e.g., “neural replay,” which occurs during sharp wave ripple activity). New work has also identified variables, such as the amount of preexisting knowledge, that affect the rate of consolidation. The increasing use of molecular genetic tools (e.g., optogenetics) can be expected to further improve understanding of the neural mechanisms underlying consolidation. PMID:26238360

  20. Fear Memory.

    PubMed

    Izquierdo, Ivan; Furini, Cristiane R G; Myskiw, Jociane C

    2016-04-01

    Fear memory is the best-studied form of memory. It was thoroughly investigated in the past 60 years mostly using two classical conditioning procedures (contextual fear conditioning and fear conditioning to a tone) and one instrumental procedure (one-trial inhibitory avoidance). Fear memory is formed in the hippocampus (contextual conditioning and inhibitory avoidance), in the basolateral amygdala (inhibitory avoidance), and in the lateral amygdala (conditioning to a tone). The circuitry involves, in addition, the pre- and infralimbic ventromedial prefrontal cortex, the central amygdala subnuclei, and the dentate gyrus. Fear learning models, notably inhibitory avoidance, have also been very useful for the analysis of the biochemical mechanisms of memory consolidation as a whole. These studies have capitalized on in vitro observations on long-term potentiation and other kinds of plasticity. The effect of a very large number of drugs on fear learning has been intensively studied, often as a prelude to the investigation of effects on anxiety. The extinction of fear learning involves to an extent a reversal of the flow of information in the mentioned structures and is used in the therapy of posttraumatic stress disorder and fear memories in general.