Sample records for shared memory programming

  1. Supporting shared data structures on distributed memory architectures

    NASA Technical Reports Server (NTRS)

    Koelbel, Charles; Mehrotra, Piyush; Vanrosendale, John

    1990-01-01

    Programming nonshared memory systems is more difficult than programming shared memory systems, since there is no support for shared data structures. Current programming languages for distributed memory architectures force the user to decompose all data structures into separate pieces, with each piece owned by one of the processors in the machine, and with all communication explicitly specified by low-level message-passing primitives. A new programming environment is presented for distributed memory architectures, providing a global name space and allowing direct access to remote parts of data values. The analysis and program transformations required to implement this environment are described, and the efficiency of the resulting code on the NCUBE/7 and iPSC/2 hypercubes is reported.
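    A global name space hides the owner/offset bookkeeping from the user, and that bookkeeping can be sketched as follows. This minimal illustration assumes a one-dimensional array with a block distribution. The names BlockDistribution and global_read are hypothetical. Remote accesses are only logged here, whereas the transformations described in the paper would turn them into messages.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BlockDistribution:
    """Hypothetical block distribution of an n-element array over p processors."""

    n: int
    p: int

    @property
    def block(self) -> int:
        # Ceiling division: each processor owns at most `block` elements.
        return -(-self.n // self.p)

    def owner(self, i: int) -> int:
        """Processor that owns global index i."""
        if not 0 <= i < self.n:
            raise IndexError(i)
        return i // self.block

    def local_index(self, i: int) -> int:
        """Offset of global index i inside its owner's local piece."""
        return i % self.block

    def local_range(self, rank: int) -> range:
        """Global indices owned by processor `rank` (possibly empty)."""
        start = min(rank * self.block, self.n)
        return range(start, min(start + self.block, self.n))


def global_read(dist, pieces, me, i, remote_log):
    """Read global element i on behalf of processor `me`.

    pieces[r] plays the role of processor r's private memory. A compiler for a
    global name space would turn the remote branch into message passing; here
    the access is only recorded in remote_log.
    """
    r = dist.owner(i)
    if r != me:
        remote_log.append((me, r, i))
    return pieces[r][dist.local_index(i)]


if __name__ == "__main__":
    dist = BlockDistribution(n=10, p=4)  # blocks of 3: [0-2], [3-5], [6-8], [9]
    data = list(range(100, 110))
    pieces = [[data[i] for i in dist.local_range(r)] for r in range(dist.p)]
    log = []
    print(global_read(dist, pieces, me=0, i=7, remote_log=log), log)  # 107 [(0, 2, 7)]
```

The user writes `global_read(..., i=7, ...)` against the global index only. The owner and local offset are derived from the distribution, which is the information a compiler needs in order to generate communication automatically.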

  2. Automatic Generation of Directive-Based Parallel Programs for Shared Memory Parallel Systems

    NASA Technical Reports Server (NTRS)

    Jin, Hao-Qiang; Yan, Jerry; Frumkin, Michael

    2000-01-01

    The shared-memory programming model is a very effective way to achieve parallelism on shared memory parallel computers. With great progress in hardware and software technologies, the performance of parallel programs using compiler directives has improved substantially. The introduction of OpenMP directives, the industrial standard for shared-memory programming, has minimized portability concerns. Owing to its ease of programming and good performance, the technique has become very popular. In this study, we have extended CAPTools, a computer-aided parallelization toolkit, to automatically generate directive-based OpenMP parallel programs. We outline techniques used in the implementation of the tool and present test results on the NAS Parallel Benchmarks and ARC3D, a CFD application. This work demonstrates the great potential of using computer-aided tools to quickly port parallel programs and also achieve good performance.

  3. Effects of cacheing on multitasking efficiency and programming strategy on an ELXSI 6400

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Montry, G.R.; Benner, R.E.

    1985-12-01

    The impact of a cache/shared memory architecture, and in particular the cache coherency problem, on concurrent algorithm and program development is discussed. In this context, a simple set of programming strategies is proposed that streamline code development and improve code performance when multitasking in a cache/shared memory or distributed memory environment.

  4. Performance Evaluation of Remote Memory Access (RMA) Programming on Shared Memory Parallel Computers

    NASA Technical Reports Server (NTRS)

    Jin, Hao-Qiang; Jost, Gabriele; Biegel, Bryan A. (Technical Monitor)

    2002-01-01

    The purpose of this study is to evaluate the feasibility of remote memory access (RMA) programming on shared memory parallel computers. We discuss different RMA based implementations of selected CFD application benchmark kernels and compare them to corresponding message passing based codes. For the message-passing implementation we use MPI point-to-point and global communication routines. For the RMA based approach we consider two different libraries supporting this programming model. One is a shared memory parallelization library (SMPlib) developed at NASA Ames, the other is the MPI-2 extensions to the MPI Standard. We give timing comparisons for the different implementation strategies and discuss the performance.

  5. High Performance Programming Using Explicit Shared Memory Model on Cray T3D

    NASA Technical Reports Server (NTRS)

    Simon, Horst D.; Saini, Subhash; Grassi, Charles

    1994-01-01

    The Cray T3D system is the first-phase system in Cray Research, Inc.'s (CRI) three-phase massively parallel processing (MPP) program. This system features a heterogeneous architecture that closely couples DEC's Alpha microprocessors and CRI's parallel-vector technology, i.e., the Cray Y-MP and Cray C90. An overview of the Cray T3D hardware and available programming models is presented. Under the Cray Research Adaptive Fortran (CRAFT) model, four programming methods (data parallel, work sharing, message passing using PVM, and the explicit shared memory model) are available to users. However, at this time the data parallel and work sharing programming models are not available to the user community. The differences between standard PVM and CRI's PVM are highlighted with performance measurements such as latencies and communication bandwidths. We have found that neither standard PVM nor CRI's PVM exploits the hardware capabilities of the T3D. The reasons for the poor performance of PVM as a native message-passing library are presented. This is illustrated by the performance of the NAS Parallel Benchmarks (NPB) programmed in the explicit shared memory model on the Cray T3D. In general, the performance of standard PVM is about 4 to 5 times lower than that obtained using the explicit shared memory model. A similar degradation is seen on the CM-5, where applications using the native message-passing library CMMD are also about 4 to 5 times slower than those using data parallel methods. The issues involved in programming in the explicit shared memory model (such as barriers, synchronization, invalidating the data cache, and aligning the data cache) are discussed. Comparative performance of the NPB using the explicit shared memory programming model on the Cray T3D and other highly parallel systems, such as the TMC CM-5, Intel Paragon, Cray C90, and IBM SP1, is presented.

  6. Comparison of two paradigms for distributed shared memory

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Levelt, W.G.; Kaashoek, M.F.; Bal, H.E.

    1990-08-01

    The paper compares two paradigms for Distributed Shared Memory on loosely coupled computing systems: the shared data-object model, as used in Orca, a programming language specially designed for loosely coupled computing systems, and the Shared Virtual Memory model. For both paradigms the authors have implemented two systems, one using only point-to-point messages and the other also using broadcasting. They briefly describe these two paradigms and their implementations, then compare their performance on four applications: the traveling salesman problem, alpha-beta search, matrix multiplication, and the all-pairs shortest paths problem. The measurements show that both paradigms can be used efficiently for programming large-grain parallel applications. Significant speedups were obtained on all applications. The unstructured Shared Virtual Memory paradigm achieves the best absolute performance, although this is largely due to the preliminary nature of the Orca compiler used. The structured shared data-object model achieves the highest speedups and is much easier to program and to debug.

  7. Performance Modeling and Measurement of Parallelized Code for Distributed Shared Memory Multiprocessors

    NASA Technical Reports Server (NTRS)

    Waheed, Abdul; Yan, Jerry

    1998-01-01

    This paper presents a model to evaluate the performance and overhead of parallelizing sequential code using compiler directives for multiprocessing on distributed shared memory (DSM) systems. With the increasing popularity of shared address space architectures, it is essential to understand their performance impact on programs that benefit from shared memory multiprocessing. We present a simple model to characterize the performance of programs that are parallelized using compiler directives for shared memory multiprocessing. We parallelized the sequential implementation of the NAS benchmarks using native Fortran77 compiler directives for an Origin2000, which is a DSM system based on a cache-coherent Non-Uniform Memory Access (ccNUMA) architecture. We report measurement-based performance of these parallelized benchmarks from four perspectives: efficacy of the parallelization process, scalability, parallelization overhead, and comparison with hand-parallelized and hand-optimized versions of the same benchmarks. Our results indicate that sequential programs can conveniently be parallelized for DSM systems using compiler directives, but realizing the performance gains predicted by the performance model depends primarily on minimizing architecture-specific data locality overhead.

  8. Automatic Generation of OpenMP Directives and Its Application to Computational Fluid Dynamics Codes

    NASA Technical Reports Server (NTRS)

    Yan, Jerry; Jin, Haoqiang; Frumkin, Michael; Yan, Jerry (Technical Monitor)

    2000-01-01

    The shared-memory programming model is a very effective way to achieve parallelism on shared memory parallel computers. With great progress in hardware and software technologies, the performance of parallel programs using compiler directives has improved substantially. The introduction of OpenMP directives, the industrial standard for shared-memory programming, has minimized portability concerns. In this study, we have extended CAPTools, a computer-aided parallelization toolkit, to automatically generate OpenMP-based parallel programs with nominal user assistance. We outline techniques used in the implementation of the tool and discuss the application of this tool to the NAS Parallel Benchmarks and several computational fluid dynamics codes. This work demonstrates the great potential of using the tool to quickly port parallel programs and to achieve good performance that exceeds that of some commercial tools.

  9. A simple modern correctness condition for a space-based high-performance multiprocessor

    NASA Technical Reports Server (NTRS)

    Probst, David K.; Li, Hon F.

    1992-01-01

    A number of U.S. national programs, including space-based detection of ballistic missile launches, envisage putting significant computing power into space. Given sufficient progress in low-power VLSI, multichip-module packaging, and liquid-cooling technologies, we will see the design of high-performance multiprocessors for individual satellites. In very high speed implementations, performance depends critically on tolerating large latencies in interprocessor communication. Without latency tolerance, performance is limited by the vastly differing time scales of processor and data-memory modules, including interconnect times. The modern approach to tolerating remote-communication cost in scalable, shared-memory multiprocessors is to use a multithreaded architecture and alter the semantics of shared memory slightly. The price is that the programmer must either reason about program correctness in a relaxed consistency model or agree to program in a constrained style. The literature on multiprocessor correctness conditions has become increasingly complex, and sometimes confusing, which may hinder its practical application. We propose a simple modern correctness condition for a high-performance, shared-memory multiprocessor; the correctness condition is based on a simple interface between the multiprocessor architecture and the parallel programming system.

  10. Combining Distributed and Shared Memory Models: Approach and Evolution of the Global Arrays Toolkit

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Nieplocha, Jarek; Harrison, Robert J.; Kumar, Mukul

    2002-07-29

    Both shared memory and distributed memory models have advantages and shortcomings. The shared memory model is much easier to use, but it ignores data locality and placement. Given the hierarchical nature of the memory subsystems in modern computers, this characteristic might have a negative impact on performance and scalability. Various techniques, such as restructuring code to increase data reuse and introducing blocking in data accesses, can address the problem and yield performance competitive with message passing [Singh], but at the cost of compromising ease of use. Distributed memory models such as message passing or one-sided communication offer performance and scalability, but they compromise ease of use. In this context, the message-passing model is sometimes referred to as "assembly programming for scientific computing". The Global Arrays toolkit [GA1, GA2] attempts to offer the best features of both models. It implements a shared-memory programming model in which data locality is managed explicitly by the programmer. This management is achieved by explicit calls to functions that transfer data between a global address space (a distributed array) and local storage. In this respect, the GA model has similarities to the distributed shared-memory models that provide an explicit acquire/release protocol. However, the GA model acknowledges that remote data is slower to access than local data and allows data locality to be explicitly specified and hence managed. The GA model exposes to the programmer the hierarchical memory of modern high-performance computer systems. By recognizing the communication overhead of remote data transfer, it promotes data reuse and locality of reference. This paper describes the characteristics of the Global Arrays programming model and the capabilities of the toolkit, and discusses its evolution.
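    Explicit transfer between a global address space and local storage can be illustrated with a toy, single-process model. This sketch assumes a one-dimensional array stored in contiguous blocks, one block per process. The class ToyGlobalArray and its get/put methods are hypothetical and are not the Global Arrays toolkit's API. The model only counts how many elements cross a block boundary, to show how an explicit get/put interface makes the cost of non-local data visible to the programmer.

```python
class ToyGlobalArray:
    """Hypothetical 1-D global array split into contiguous blocks, one per process.

    Data moves between the global address space and a caller's local buffer
    only through explicit get/put calls. This is NOT the Global Arrays API.
    """

    def __init__(self, n: int, nprocs: int):
        self.n = n
        self.block = -(-n // nprocs)  # ceiling division
        self.blocks = [[0.0] * len(self.owned_range(r)) for r in range(nprocs)]
        self.remote_elements = 0  # elements moved to/from blocks the caller does not own

    def owned_range(self, rank: int) -> range:
        """Global indices stored in process `rank`'s block."""
        start = min(rank * self.block, self.n)
        return range(start, min(start + self.block, self.n))

    def _check(self, lo: int, hi: int) -> None:
        if not 0 <= lo <= hi <= self.n:
            raise IndexError((lo, hi))

    def get(self, me: int, lo: int, hi: int) -> list:
        """Copy global elements [lo, hi) into a new local buffer for process `me`."""
        self._check(lo, hi)
        out = []
        for i in range(lo, hi):
            owner, off = divmod(i, self.block)
            if owner != me:
                self.remote_elements += 1
            out.append(self.blocks[owner][off])
        return out

    def put(self, me: int, lo: int, values: list) -> None:
        """Copy a local buffer from process `me` into global elements starting at lo."""
        self._check(lo, lo + len(values))
        for k, v in enumerate(values):
            owner, off = divmod(lo + k, self.block)
            if owner != me:
                self.remote_elements += 1
            self.blocks[owner][off] = v


if __name__ == "__main__":
    ga = ToyGlobalArray(n=8, nprocs=2)   # process 0 owns [0, 4), process 1 owns [4, 8)
    ga.put(0, 0, [1.0, 2.0, 3.0, 4.0])   # purely local writes
    ga.put(1, 4, [5.0, 6.0, 7.0, 8.0])
    buf = ga.get(0, 2, 6)                # straddles the block boundary
    print(buf, ga.remote_elements)       # [3.0, 4.0, 5.0, 6.0] 2
```

The design choice is that `get` returns a private copy. After the call, the caller works on local storage at local-memory speed, and it decides when to pay for the next transfer.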

  11. What Multilevel Parallel Programs do when you are not Watching: A Performance Analysis Case Study Comparing MPI/OpenMP, MLP, and Nested OpenMP

    NASA Technical Reports Server (NTRS)

    Jost, Gabriele; Labarta, Jesus; Gimenez, Judit

    2004-01-01

    With the current trend in parallel computer architectures towards clusters of shared memory symmetric multiprocessors, parallel programming techniques have evolved that support parallelism beyond a single level. When comparing the performance of applications based on different programming paradigms, it is important to differentiate between the influence of the programming model itself and other factors, such as implementation-specific behavior of the operating system (OS) or architectural issues. Rewriting a large scientific application to employ a new programming paradigm is usually a time-consuming and error-prone task. Before embarking on such an endeavor, it is important to determine that there is a real gain that would not be possible with the current implementation. A detailed performance analysis is crucial to clarify these issues. The multilevel programming paradigms considered in this study are hybrid MPI/OpenMP, MLP, and nested OpenMP. The hybrid MPI/OpenMP approach uses MPI [7] for coarse-grained parallelization and OpenMP [9] for fine-grained loop-level parallelism. The MPI programming paradigm assumes a private address space for each process. Data is transferred by explicitly exchanging messages via calls to the MPI library. This model was originally designed for distributed memory architectures but is also suitable for shared memory systems. The second paradigm under consideration is MLP, which was developed by Taft. The approach is similar to MPI/OpenMP, using a mix of coarse-grain process-level parallelization and loop-level OpenMP parallelization. As is the case with MPI, a private address space is assumed for each process. The MLP approach was developed for ccNUMA architectures and explicitly takes advantage of the availability of shared memory. A shared memory arena that is accessible by all processes is required. Communication is done by reading from and writing to the shared memory.

  12. The FORCE - A highly portable parallel programming language

    NASA Technical Reports Server (NTRS)

    Jordan, Harry F.; Benten, Muhammad S.; Alaghband, Gita; Jakob, Ruediger

    1989-01-01

    This paper explains why the FORCE parallel programming language is easily portable among six different shared-memory multiprocessors, and how a two-level macro preprocessor makes it possible to hide low-level machine dependencies and to build machine-independent high-level constructs on top of them. These FORCE constructs make it possible to write portable parallel programs largely independent of the number of processes and the specific shared-memory multiprocessor executing them.

  13. The FORCE: A highly portable parallel programming language

    NASA Technical Reports Server (NTRS)

    Jordan, Harry F.; Benten, Muhammad S.; Alaghband, Gita; Jakob, Ruediger

    1989-01-01

    Here, it is explained why the FORCE parallel programming language is easily portable among six different shared-memory multiprocessors, and how a two-level macro preprocessor makes it possible to hide low-level machine dependencies and to build machine-independent high-level constructs on top of them. These FORCE constructs make it possible to write portable parallel programs largely independent of the number of processes and the specific shared memory multiprocessor executing them.

  14. Shared versus distributed memory multiprocessors

    NASA Technical Reports Server (NTRS)

    Jordan, Harry F.

    1991-01-01

    The question of whether multiprocessors should have shared or distributed memory has attracted a great deal of attention. Some researchers argue strongly for building distributed memory machines, while others argue just as strongly for programming shared memory multiprocessors. A great deal of research is underway on both types of parallel systems. This paper places special emphasis on systems with a very large number of processors for computation-intensive tasks and considers research and implementation trends. It appears that the two types of systems will likely converge to a common form for large-scale multiprocessors.

  15. Memory access in shared virtual memory

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Berrendorf, R.

    1992-01-01

    Shared virtual memory (SVM) is a virtual memory layer with a single address space on top of a distributed real memory on parallel computers. We examine the behavior and performance of SVM running a parallel program with medium-grained, loop-level parallelism on top of it. A simulator for the underlying parallel architecture can be used to examine the behavior of SVM more deeply. The influence of several parameters, such as the number of processors, page size, cold or warm start, and restricted page replication, is studied.
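    Even a very small model shows how page size can matter. This sketch makes several assumptions: a simple write-invalidate protocol, 8-byte elements, and a loop whose iterations are block-partitioned across processors. The function simulate_svm and all of its rules are illustrative assumptions, not the simulator used in the study. In this toy model, small pages cost more cold-start faults. Pages large enough to span two processors' partitions cause false sharing and repeated invalidations.

```python
from collections import defaultdict


def simulate_svm(n_elems: int, n_procs: int, page_size: int, elem_size: int = 8) -> int:
    """Count page faults for a block-partitioned loop a[i] = a[i] + 1 on a toy SVM.

    Hypothetical model (not the paper's simulator):
    - cold start: no processor holds any page initially;
    - reading a page not held locally faults and fetches a read-only replica,
      which also revokes any other processor's write access to that page;
    - writing faults unless the writer already holds exclusive (writable)
      access, and then invalidates every other replica;
    - processors execute their iterations in lock-step, round-robin.
    """
    holders = defaultdict(set)  # page -> processors holding a copy
    writer = {}                 # page -> processor holding it writable
    chunk = -(-n_elems // n_procs)
    chunks = [range(p * chunk, min((p + 1) * chunk, n_elems)) for p in range(n_procs)]
    faults = 0
    for step in range(chunk):
        for p, idx in enumerate(chunks):
            if step >= len(idx):
                continue
            page = idx[step] * elem_size // page_size
            if p not in holders[page]:      # read a[i]
                faults += 1
                holders[page].add(p)
                writer.pop(page, None)      # other writers are downgraded to readers
            if writer.get(page) != p:       # write a[i]
                faults += 1
                holders[page] = {p}         # invalidate all other replicas
                writer[page] = p
    return faults


if __name__ == "__main__":
    for page_size in (32, 64, 128):
        print(page_size, simulate_svm(n_elems=16, n_procs=2, page_size=page_size))
    # prints 32 8, then 64 4, then 128 32
```

With 64-byte pages, each processor's eight elements fill exactly one page, so only cold-start faults occur. With 128-byte pages, both partitions share one page, which then bounces between the processors on every iteration.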

  16. Memory access in shared virtual memory

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Berrendorf, R.

    1992-09-01

    Shared virtual memory (SVM) is a virtual memory layer with a single address space on top of a distributed real memory on parallel computers. We examine the behavior and performance of SVM running a parallel program with medium-grained, loop-level parallelism on top of it. A simulator for the underlying parallel architecture can be used to examine the behavior of SVM more deeply. The influence of several parameters, such as the number of processors, page size, cold or warm start, and restricted page replication, is studied.

  17. Scheduling for Locality in Shared-Memory Multiprocessors

    DTIC Science & Technology

    1993-05-01

    Submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy. ... architecture on parallel program performance, explain the implications of this trend on popular parallel programming models, and propose system software to ... decomposition and scheduling algorithms. Subject terms: shared-memory multiprocessors; architecture trends; loop scheduling. Number of pages: 110.

  18. Address tracing for parallel machines

    NASA Technical Reports Server (NTRS)

    Stunkel, Craig B.; Janssens, Bob; Fuchs, W. Kent

    1991-01-01

    Recently implemented parallel system address-tracing methods based on several metrics are surveyed. The issues specific to collection of traces for both shared and distributed memory parallel computers are highlighted. Five general categories of address-trace collection methods are examined: hardware-captured, interrupt-based, simulation-based, altered microcode-based, and instrumented program-based traces. The problems unique to shared memory and distributed memory multiprocessors are examined separately.

  19. Hybrid MPI+OpenMP Programming of an Overset CFD Solver and Performance Investigations

    NASA Technical Reports Server (NTRS)

    Djomehri, M. Jahed; Jin, Haoqiang H.; Biegel, Bryan (Technical Monitor)

    2002-01-01

    This report describes a two-level parallelization of a Computational Fluid Dynamics (CFD) solver with multi-zone overset structured grids. The approach is based on a hybrid MPI+OpenMP programming model suitable for shared memory machines and clusters of shared memory machines. Performance investigations of the hybrid application on an SGI Origin2000 (O2K) machine are reported using medium- and large-scale test problems.

  20. Performance Analysis of Multilevel Parallel Applications on Shared Memory Architectures

    NASA Technical Reports Server (NTRS)

    Biegel, Bryan A. (Technical Monitor); Jost, G.; Jin, H.; Labarta, J.; Gimenez, J.; Caubet, J.

    2003-01-01

    Parallel programming paradigms include process-level parallelism, thread-level parallelism, and multilevel parallelism. This viewgraph presentation describes a detailed performance analysis of these paradigms for Shared Memory Architectures (SMA). This analysis uses the Paraver Performance Analysis System. The presentation includes diagrams of a flow of useful computations.

  1. Programming distributed memory architectures using Kali

    NASA Technical Reports Server (NTRS)

    Mehrotra, Piyush; Vanrosendale, John

    1990-01-01

    Programming nonshared memory systems is more difficult than programming shared memory systems, in part because of the relatively low level of current programming environments for such machines. A new programming environment, Kali, is presented, which provides a global name space and allows direct access to remote data values. In order to retain efficiency, Kali provides a system of annotations, allowing the user to control those aspects of the program critical to performance, such as data distribution and load balancing. The primitives and constructs provided by the language are described, and some of the issues raised in translating a Kali program for execution on distributed memory systems are also discussed.

  2. Memory T and memory B cells share a transcriptional program of self-renewal with long-term hematopoietic stem cells

    PubMed Central

    Luckey, Chance John; Bhattacharya, Deepta; Goldrath, Ananda W.; Weissman, Irving L.; Benoist, Christophe; Mathis, Diane

    2006-01-01

    The only cells of the hematopoietic system that undergo self-renewal for the lifetime of the organism are long-term hematopoietic stem cells and memory T and B cells. To determine whether there is a shared transcriptional program among these self-renewing populations, we first compared the gene-expression profiles of naïve, effector and memory CD8+ T cells with those of long-term hematopoietic stem cells, short-term hematopoietic stem cells, and lineage-committed progenitors. Transcripts augmented in memory CD8+ T cells relative to naïve and effector T cells were selectively enriched in long-term hematopoietic stem cells and were progressively lost in their short-term and lineage-committed counterparts. Furthermore, transcripts selectively decreased in memory CD8+ T cells were selectively down-regulated in long-term hematopoietic stem cells and progressively increased with differentiation. To confirm that this pattern was a general property of immunologic memory, we turned to independently generated gene expression profiles of memory, naïve, germinal center, and plasma B cells. Once again, memory-enriched and -depleted transcripts were also appropriately augmented and diminished in long-term hematopoietic stem cells, and their expression correlated with progressive loss of self-renewal function. Thus, there appears to be a common signature of both up- and down-regulated transcripts shared between memory T cells, memory B cells, and long-term hematopoietic stem cells. This signature was not consistently enriched in neural or embryonic stem cell populations and, therefore, appears to be restricted to the hematopoietic system. These observations provide evidence that the shared phenotype of self-renewal in the hematopoietic system is linked at the molecular level. PMID:16492737

  3. Parallel computing for probabilistic fatigue analysis

    NASA Technical Reports Server (NTRS)

    Sues, Robert H.; Lua, Yuan J.; Smith, Mark D.

    1993-01-01

    This paper presents the results of Phase I research to investigate the most effective parallel processing software strategies and hardware configurations for probabilistic structural analysis. We investigate the efficiency of both shared- and distributed-memory architectures via a probabilistic fatigue life analysis problem. We also present a parallel programming approach, the virtual shared-memory paradigm, that is applicable across both types of hardware. Using this approach, problems can be solved on a variety of parallel configurations, including networks of single or multiprocessor workstations. We conclude that it is possible to effectively parallelize probabilistic fatigue analysis codes. However, special strategies will be needed to achieve large-scale parallelism, to keep a large number of processors busy, and to treat problems with the large memory requirements encountered in practice. We also conclude that distributed-memory architecture is preferable to shared-memory for achieving large-scale parallelism; however, in the future, the currently emerging hybrid-memory architectures will likely be optimal.

  4. The Automatic Parallelisation of Scientific Application Codes Using a Computer Aided Parallelisation Toolkit

    NASA Technical Reports Server (NTRS)

    Ierotheou, C.; Johnson, S.; Leggett, P.; Cross, M.; Evans, E.; Jin, Hao-Qiang; Frumkin, M.; Yan, J.; Biegel, Bryan (Technical Monitor)

    2001-01-01

    The shared-memory programming model is a very effective way to achieve parallelism on shared memory parallel computers. Historically, two factors have hindered the take-up of this programming model: the lack of a programming standard for using directives, and the rather limited performance caused by poor scalability. Significant progress has been made in hardware and software technologies; as a result, the performance of parallel programs with compiler directives has also improved. The introduction of an industrial standard for shared-memory programming with directives, OpenMP, has also addressed the issue of portability. In this study, we have extended the computer aided parallelization toolkit (developed at the University of Greenwich) to automatically generate OpenMP-based parallel programs with nominal user assistance. We outline how loop types are categorized. We also describe how efficient OpenMP directives can be defined and placed using the in-depth interprocedural analysis that the toolkit carries out. We also discuss the application of the toolkit to the NAS Parallel Benchmarks and a number of real-world application codes. This work demonstrates not only the great potential of using the toolkit to quickly parallelize serial programs but also the good performance achievable on up to 300 processors for hybrid message passing and directive-based parallelizations.

  5. Shared Memory Parallelization of an Implicit ADI-type CFD Code

    NASA Technical Reports Server (NTRS)

    Hauser, Th.; Huang, P. G.

    1999-01-01

    A parallelization study designed for ADI-type algorithms is presented using the OpenMP specification for shared-memory multiprocessor programming. Details of optimizations specifically addressed to cache-based computer architectures are described, and performance measurements for the single-processor and multiprocessor implementations are summarized. The paper demonstrates that optimization of memory access on a cache-based computer architecture controls the performance of the computational algorithm. A hybrid MPI/OpenMP approach is proposed for clusters of shared memory machines to further enhance the parallel performance. The method is applied to develop a new LES/DNS code, named LESTool. A preliminary DNS calculation of a fully developed channel flow at a friction Reynolds number of Re_τ = 180 has shown good agreement with existing data.

  6. Avoiding and tolerating latency in large-scale next-generation shared-memory multiprocessors

    NASA Technical Reports Server (NTRS)

    Probst, David K.

    1993-01-01

    A scalable solution to the memory-latency problem is necessary. Large-scale shared-memory multiprocessors have inherently large latencies for synchronization and memory operations, and without such a solution these latencies degrade performance. We distinguish latency avoidance and latency tolerance. Latency is avoided when data is brought to nearby locales for future reference. Latency is tolerated when references are overlapped with other computation. Latency-avoiding locales include: processor registers, data caches used temporally, and nearby memory modules. Tolerating communication latency requires parallelism, allowing the overlap of communication and computation. Latency-tolerating techniques include: vector pipelining, data caches used spatially, prefetching in various forms, and multithreading in various forms. Relaxing the consistency model permits increased use of avoidance and tolerance techniques. Each model is a mapping from the program text to sets of partial orders on program operations; it is a convention about which temporal precedences among program operations are necessary. Information about temporal locality and parallelism constrains the use of avoidance and tolerance techniques. Suitable architectural primitives and compiler technology are required to exploit the increased freedom to reorder and overlap operations in relaxed models.
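    Latency tolerance through overlap can be sketched in a few lines. While block k is being processed, the fetch of block k+1 is already in flight. In this minimal sketch the "remote" fetch is simulated with a sleep, and the names fetch_block and REMOTE_LATENCY_S are hypothetical. A single background thread stands in for the prefetching or multithreading hardware the abstract refers to.

```python
import time
from concurrent.futures import ThreadPoolExecutor

REMOTE_LATENCY_S = 0.01  # hypothetical latency of one remote fetch


def fetch_block(data, k, block):
    """Stand-in for a remote read: wait out the latency, then return block k."""
    time.sleep(REMOTE_LATENCY_S)
    return data[k * block:(k + 1) * block]


def compute(values):
    """Local work performed on one fetched block."""
    return sum(v * v for v in values)


def sum_squares_blocking(data, block):
    """No latency tolerance: every fetch stalls the computation."""
    nblocks = -(-len(data) // block)
    return sum(compute(fetch_block(data, k, block)) for k in range(nblocks))


def sum_squares_prefetch(data, block):
    """Latency tolerance: fetch block k+1 in the background while computing block k."""
    nblocks = -(-len(data) // block)
    total = 0
    with ThreadPoolExecutor(max_workers=1) as pool:
        pending = pool.submit(fetch_block, data, 0, block) if nblocks else None
        for k in range(nblocks):
            current = pending.result()  # wait only for what is still outstanding
            if k + 1 < nblocks:
                pending = pool.submit(fetch_block, data, k + 1, block)
            total += compute(current)   # overlaps with the in-flight fetch
    return total


if __name__ == "__main__":
    data = list(range(20_000))
    for fn in (sum_squares_blocking, sum_squares_prefetch):
        t0 = time.perf_counter()
        result = fn(data, 1_000)
        print(f"{fn.__name__}: {result} in {time.perf_counter() - t0:.3f} s")
```

Both functions return the same result. The prefetching version can only hide latency to the extent that there is independent computation to overlap it with, which matches the abstract's point that tolerance requires parallelism.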

  7. Explicit time integration of finite element models on a vectorized, concurrent computer with shared memory

    NASA Technical Reports Server (NTRS)

    Gilbertsen, Noreen D.; Belytschko, Ted

    1990-01-01

    The implementation of a nonlinear explicit program on a vectorized, concurrent computer with shared memory is described and studied. The conflict between vectorization and concurrency is described and some guidelines are given for optimal block sizes. Several example problems are summarized to illustrate the types of speed-ups which can be achieved by reprogramming as compared to compiler optimization.

  8. Multiprocessor architecture: Synthesis and evaluation

    NASA Technical Reports Server (NTRS)

    Standley, Hilda M.

    1990-01-01

    Multiprocessor computer architecture evaluation for structural computations is the focus of the research effort described. Results obtained are expected to lead to more efficient use of existing architectures and to suggest designs for new, application-specific architectures. The brief descriptions given outline a number of related efforts directed toward this purpose. The difficulty in analyzing an existing architecture or in designing a new computer architecture lies in the fact that the performance of a particular architecture, within the context of a given application, is determined by a number of factors. These include, but are not limited to, the efficiency of the computation algorithm, the programming language and support environment, the quality of the program written in the programming language, the multiplicity of the processing elements, the characteristics of the individual processing elements, the interconnection network connecting processors and non-local memories, and the shared memory organization covering the spectrum from no shared memory (all local memory) to one global access memory. These performance determiners may be loosely classified as being software or hardware related. This distinction is not clear or even appropriate in many cases. The effect of the choice of algorithm is ignored by assuming that the algorithm is specified as given. Effort directed toward the removal of the effect of the programming language and program resulted in the design of a high-level parallel programming language. Two characteristics of the fundamental structure of the architecture (memory organization and interconnection network) are examined.

  9. Parallelization of NAS Benchmarks for Shared Memory Multiprocessors

    NASA Technical Reports Server (NTRS)

    Waheed, Abdul; Yan, Jerry C.; Saini, Subhash (Technical Monitor)

    1998-01-01

    This paper presents our experiences of parallelizing the sequential implementation of the NAS benchmarks using compiler directives on the SGI Origin2000 distributed shared memory (DSM) system. Porting existing applications to new high performance parallel and distributed computing platforms is a challenging task. Ideally, a user develops a sequential version of the application, leaving the task of porting to new generations of high performance computing systems to parallelization tools and compilers. Due to the simplicity of programming shared-memory multiprocessors, compiler developers have provided various facilities to allow users to exploit parallelism. Native compilers on the SGI Origin2000 support multiprocessing directives that allow users to exploit loop-level parallelism in their programs. Additionally, supporting tools can accomplish this process automatically and present the results of parallelization to the users. We experimented with these compiler directives and supporting tools by parallelizing the sequential implementation of the NAS benchmarks. Results reported in this paper indicate that, with minimal effort, the performance gain is comparable to that of the hand-parallelized, carefully optimized, message-passing implementations of the same benchmarks.

  10. Conditional load and store in a shared memory

    DOEpatents

    Blumrich, Matthias A; Ohmacht, Martin

    2015-02-03

    A method, system and computer program product for implementing load-reserve and store-conditional instructions in a multi-processor computing system. The computing system includes a multitude of processor units and a shared memory cache, and each of the processor units has access to the memory cache. In one embodiment, the method comprises providing the memory cache with a series of reservation registers, and storing in these registers addresses reserved in the memory cache for the processor units as a result of issuing load-reserve requests. In this embodiment, when one of the processor units makes a request to store data in the memory cache using a store-conditional request, the reservation registers are checked to determine if an address in the memory cache is reserved for that processor unit. If an address in the memory cache is reserved for that processor, the data are stored at this address.
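
    The reservation-register scheme can be modeled in a few lines. The sketch below is a minimal Python simulation, not the patented hardware: `ReservationCache`, `load_reserve`, `store_conditional`, and `atomic_increment` are hypothetical names, a lock stands in for the atomicity the cache controller would provide, and clearing every reservation on an address after a successful store is an assumption borrowed from typical load-reserve/store-conditional designs.

```python
import threading


class ReservationCache:
    """Toy shared cache with one reservation register per processor unit.

    Assumption (not stated in the abstract): a successful store-conditional
    clears every unit's reservation on that address, as in common LL/SC designs.
    """

    def __init__(self, num_units):
        self.memory = {}                         # address -> stored value
        self.reservations = [None] * num_units   # one reservation register per unit
        self._lock = threading.Lock()            # each request is atomic in this model

    def load_reserve(self, unit, addr):
        """Read addr and record it in the unit's reservation register."""
        with self._lock:
            self.reservations[unit] = addr
            return self.memory.get(addr, 0)

    def store_conditional(self, unit, addr, value):
        """Store only if addr is still reserved for this unit; report success."""
        with self._lock:
            if self.reservations[unit] != addr:
                return False                     # no valid reservation: store dropped
            self.memory[addr] = value
            for u, reserved in enumerate(self.reservations):
                if reserved == addr:
                    self.reservations[u] = None  # pending SCs by other units will fail
            return True


def atomic_increment(cache, unit, addr):
    """Retry loop built on LL/SC: an atomic read-modify-write without a held lock."""
    while True:
        old = cache.load_reserve(unit, addr)
        if cache.store_conditional(unit, addr, old + 1):
            return old + 1
```

    The retry loop in `atomic_increment` is the usual way LL/SC is used: a unit whose reservation was invalidated by another unit's store simply reloads and tries again.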

  11. Support for Debugging Automatically Parallelized Programs

    NASA Technical Reports Server (NTRS)

    Hood, Robert; Jost, Gabriele; Biegel, Bryan (Technical Monitor)

    2001-01-01

    This viewgraph presentation provides information on the technical aspects of debugging computer code that has been automatically converted for use in a parallel computing system. Shared memory parallelization and distributed memory parallelization entail separate and distinct challenges for a debugging program. A prototype system has been developed which integrates various tools for the debugging of automatically parallelized programs including the CAPTools Database which provides variable definition information across subroutines as well as array distribution information.

  12. Enhancing Application Performance Using Mini-Apps: Comparison of Hybrid Parallel Programming Paradigms

    NASA Technical Reports Server (NTRS)

    Lawson, Gary; Poteat, Michael; Sosonkina, Masha; Baurle, Robert; Hammond, Dana

    2016-01-01

    In this work, several mini-apps have been created to enhance the performance of a real-world application, namely the VULCAN code for complex flow analysis developed at the NASA Langley Research Center. These mini-apps explore hybrid parallel programming paradigms with Message Passing Interface (MPI) for distributed memory access and either Shared MPI (SMPI) or OpenMP for shared memory access. Performance testing shows that MPI+SMPI yields the best execution performance, while requiring the largest number of code changes. A maximum speedup of 23X was measured for MPI+SMPI, compared with only 10X for MPI+OpenMP.

  13. Implementation and performance of parallel Prolog interpreter

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Wei, S.; Kale, L.V.; Balkrishna, R.

    1988-01-01

    In this paper, the authors discuss the implementation of a parallel Prolog interpreter on different parallel machines. The implementation is based on the REDUCE-OR process model, which exploits both AND and OR parallelism in logic programs. It is machine independent, as it runs on top of the chare-kernel, a machine-independent parallel programming system. The authors also give the performance of the interpreter running a diverse set of benchmark programs on parallel machines, including shared memory systems (an Alliant FX/8, a Sequent, and a MultiMax) and a non-shared memory system (an Intel iPSC/32 hypercube), in addition to its performance on a multiprocessor simulation system.

  14. Global Arrays

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Krishnamoorthy, Sriram; Daily, Jeffrey A.; Vishnu, Abhinav

    2015-11-01

    Global Arrays (GA) is a distributed-memory programming model that allows for shared-memory-style programming combined with one-sided communication, to create a set of tools that combine high performance with ease-of-use. GA exposes a relatively straightforward programming abstraction, while supporting fully-distributed data structures, locality of reference, and high-performance communication. GA was originally formulated in the early 1990’s to provide a communication layer for the Northwest Chemistry (NWChem) suite of chemistry modeling codes that was being developed concurrently.

  15. A Tensor Product Formulation of Strassen's Matrix Multiplication Algorithm with Memory Reduction

    DOE PAGES

    Kumar, B.; Huang, C. -H.; Sadayappan, P.; ...

    1995-01-01

    In this article, we present a program generation strategy of Strassen's matrix multiplication algorithm using a programming methodology based on tensor product formulas. In this methodology, block recursive programs such as the fast Fourier Transforms and Strassen's matrix multiplication algorithm are expressed as algebraic formulas involving tensor products and other matrix operations. Such formulas can be systematically translated to high-performance parallel/vector codes for various architectures. In this article, we present a nonrecursive implementation of Strassen's algorithm for shared memory vector processors such as the Cray Y-MP. A previous implementation of Strassen's algorithm synthesized from tensor product formulas required working storage of size $O(7^n)$ for multiplying $2^n \times 2^n$ matrices. We present a modified formulation in which the working storage requirement is reduced to $O(4^n)$. The modified formulation exhibits sufficient parallelism for efficient implementation on a shared memory multiprocessor. Performance results on a Cray Y-MP8/64 are presented.
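
    For reference, the seven-product recursion that these formulas encode can be sketched directly. This is the textbook recursive form of Strassen's algorithm in plain Python, not the nonrecursive tensor-product implementation described above; it allocates temporaries freely and does not reproduce the reduced $O(4^n)$ storage scheme. Matrices are assumed to be square lists of lists whose size is a power of two.

```python
def add(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def sub(A, B):
    return [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def split(M):
    """Return the four quadrants (11, 12, 21, 22) of a square matrix."""
    h = len(M) // 2
    return ([r[:h] for r in M[:h]], [r[h:] for r in M[:h]],
            [r[:h] for r in M[h:]], [r[h:] for r in M[h:]])


def strassen(A, B):
    """Multiply two 2^n x 2^n matrices with seven recursive products per level."""
    if len(A) == 1:
        return [[A[0][0] * B[0][0]]]
    a11, a12, a21, a22 = split(A)
    b11, b12, b21, b22 = split(B)
    m1 = strassen(add(a11, a22), add(b11, b22))
    m2 = strassen(add(a21, a22), b11)
    m3 = strassen(a11, sub(b12, b22))
    m4 = strassen(a22, sub(b21, b11))
    m5 = strassen(add(a11, a12), b22)
    m6 = strassen(sub(a21, a11), add(b11, b12))
    m7 = strassen(sub(a12, a22), add(b21, b22))
    c11 = add(sub(add(m1, m4), m5), m7)
    c12 = add(m3, m5)
    c21 = add(m2, m4)
    c22 = add(sub(add(m1, m3), m2), m6)
    # Reassemble the quadrants into one matrix.
    top = [r1 + r2 for r1, r2 in zip(c11, c12)]
    bottom = [r1 + r2 for r1, r2 in zip(c21, c22)]
    return top + bottom
```

    Each level replaces eight half-size products with seven, so a $2^n \times 2^n$ multiplication performs $7^n$ scalar multiplications.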

  16. Automated quantitative muscle biopsy analysis system

    NASA Technical Reports Server (NTRS)

    Castleman, Kenneth R. (Inventor)

    1980-01-01

    An automated system to aid the diagnosis of neuromuscular diseases by producing fiber size histograms utilizing histochemically stained muscle biopsy tissue. Televised images of the microscopic fibers are processed electronically by a multi-microprocessor computer, which isolates, measures, and classifies the fibers and displays the fiber size distribution. The architecture of the multi-microprocessor computer, which is iterated to any required degree of complexity, features a series of individual microprocessors $P_n$, each receiving data from a shared memory $M_{n-1}$ and outputting processed data to a separate shared memory $M_{n+1}$ under control of a program stored in a dedicated memory $M_n$.
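
    The dataflow described here, where each processor reads from one shared memory and writes to the next, is a linear pipeline. A minimal Python sketch, assuming thread-safe FIFO queues as stand-ins for the shared memories and arbitrary functions as stand-ins for the per-stage programs; `stage` and `run_pipeline` are hypothetical names, and the actual fiber isolation and measurement steps are not modeled.

```python
import queue
import threading

_DONE = object()  # sentinel marking the end of the input stream


def stage(func, inbox, outbox):
    """One processor: take items from the upstream buffer, write results downstream."""
    while True:
        item = inbox.get()
        if item is _DONE:
            outbox.put(_DONE)   # pass the end marker on to the next stage
            return
        outbox.put(func(item))


def run_pipeline(funcs, items):
    """Chain len(funcs) concurrent stages through len(funcs) + 1 shared buffers."""
    buffers = [queue.Queue() for _ in range(len(funcs) + 1)]
    threads = [
        threading.Thread(target=stage, args=(f, buffers[i], buffers[i + 1]))
        for i, f in enumerate(funcs)
    ]
    for t in threads:
        t.start()
    for item in items:
        buffers[0].put(item)
    buffers[0].put(_DONE)
    results = []
    while (item := buffers[-1].get()) is not _DONE:
        results.append(item)
    for t in threads:
        t.join()
    return results
```

    For example, `run_pipeline([lambda x: x * 2, lambda x: x + 1], [1, 2, 3])` returns `[3, 5, 7]`. Because the stages run concurrently, one item can be in the second stage while the next is still in the first.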

  17. ORCA Project: Research on high-performance parallel computer programming environments. Final report, 1 Apr-31 Mar 90

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Snyder, L.; Notkin, D.; Adams, L.

    1990-03-31

    This task relates to research on programming massively parallel computers. Previous work on the Ensemble concept of programming was extended, and nonshared memory models of parallel computation were investigated. The earlier Ensemble work defined a set of programming abstractions and was used to organize the programming task into three distinct levels: composition of machine instructions, composition of processes, and composition of phases. It was applied to shared memory models of computation. During the present research period, these concepts were extended to nonshared memory models; in addition, one Ph.D. thesis was completed, and one book chapter and six conference proceedings were published.

  18. Vienna FORTRAN: A FORTRAN language extension for distributed memory multiprocessors

    NASA Technical Reports Server (NTRS)

    Chapman, Barbara; Mehrotra, Piyush; Zima, Hans

    1991-01-01

    Exploiting the performance potential of distributed memory machines requires a careful distribution of data across the processors. Vienna FORTRAN is a language extension of FORTRAN which provides the user with a wide range of facilities for such mapping of data structures. However, programs in Vienna FORTRAN are written using global data references. Thus, the user has the advantage of a shared memory programming paradigm while explicitly controlling the placement of data. The basic features of Vienna FORTRAN are presented along with a set of examples illustrating the use of these features.
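
    Two regular distributions commonly offered by such languages are block and cyclic. A minimal Python sketch of how a global index maps to an owning processor and a local index under each, assuming 0-based indices and ceiling-sized blocks (so the last processor may hold fewer elements, or none); the function names are illustrative and are not Vienna FORTRAN syntax.

```python
def block_map(i, n, p):
    """(owner, local index) of global index i when n elements are split into p blocks."""
    b = -(-n // p)          # block size = ceil(n / p)
    return i // b, i % b


def cyclic_map(i, p):
    """(owner, local index) of global index i when elements are dealt out round-robin."""
    return i % p, i // p
```

    With n = 10 and p = 4, the block mapping assigns global index 7 to processor 2 at local offset 1, while the cyclic mapping assigns it to processor 3 at local offset 1.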

  19. Parallel Programming Paradigms

    DTIC Science & Technology

    1987-07-01

    Distribution of this report is unlimited. ... 8416878 and by the Office of Naval Research Contracts No. N00014-86-K-0264 and No. N00014-85-K-0328. ... processors to fetch from the same memory cell (list head) and thus seems to favor a shared memory implementation [37]. In this dissertation, we ...

  20. Cooperative Data Sharing: Simple Support for Clusters of SMP Nodes

    NASA Technical Reports Server (NTRS)

    DiNucci, David C.; Bailey, David H. (Technical Monitor)

    1997-01-01

    Libraries like PVM and MPI send typed messages to allow for heterogeneous cluster computing. Lower-level libraries, such as GAM, provide more efficient access to communication by removing the need to copy messages between the interface and user space in some cases. Still lower-level interfaces, such as UNET, get right down to the hardware level to provide maximum performance. However, these are all still interfaces for passing messages from one process to another, and have limited utility in a shared-memory environment, due primarily to the fact that message passing is just another term for copying. This drawback is made more pertinent by today's hybrid architectures (e.g., clusters of SMPs), where it is difficult to know beforehand whether two communicating processes will share memory. As a result, even portable language tools (like HPF compilers) must either map all interprocess communication into message passing, with the accompanying performance degradation in shared memory environments, or check each communication at run-time and implement the shared-memory case separately for efficiency. Cooperative Data Sharing (CDS) is a single user-level API which abstracts all communication between processes into the sharing and access coordination of memory regions, in a model which might be described as "distributed shared messages" or "large-grain distributed shared memory". As a result, the user programs to a simple latency-tolerant abstract communication specification which can be mapped efficiently to either a shared-memory or message-passing based run-time system, depending upon the available architecture. Unlike some distributed shared memory interfaces, the user still has complete control over the assignment of data to processors, the forwarding of data to its next likely destination, and the queuing of data until it is needed, so even the relatively high latency present in clusters can be accommodated.
CDS does not require special use of an MMU, which can add overhead to some DSM systems, and does not require an SPMD programming model. Unlike some message-passing interfaces, CDS allows the user to implement efficient demand-driven applications where processes must "fight" over data, and does not perform copying if processes share memory and do not attempt concurrent writes. CDS also supports heterogeneous computing, dynamic process creation, handlers, and a very simple thread-arbitration mechanism. Additional support for array subsections is currently being considered. The CDS1 API, which forms the kernel of CDS, is built primarily upon only two communication primitives, one process initiation primitive, and some data translation (and marshalling) routines, memory allocation routines, and priority control routines. The entire current collection of 28 routines provides enough functionality to implement most (or all) of MPI 1 and 2, which has a much larger interface consisting of hundreds of routines. Still, the API is small enough to consider integrating into standard OS interfaces for handling inter-process communication in a network-independent way. This approach would also help to solve many of the problems plaguing other higher-level standards such as MPI and PVM, which must, in some cases, "play OS" to adequately address progress and process control issues. The CDS2 API, a higher level of interface roughly equivalent in functionality to MPI and to be built entirely upon CDS1, is still being designed. It is intended to add support for the equivalent of communicators, reduction and other collective operations, process topologies, additional support for process creation, and some automatic memory management. CDS2 will not exactly match MPI, because the copy-free semantics of communication from CDS1 will be supported. CDS2 application programs will also be free to use CDS1 directly, with care.
CDS1 has been implemented on networks of workstations running unmodified Unix-based operating systems, using UDP/IP and vendor-supplied high-performance locks. Although its inter-node performance is currently unimpressive due to rudimentary implementation techniques, it already outperforms highly optimized MPI implementations on intra-node communication due to its support for non-copy communication. The similarity of the CDS1 architecture to that of other projects such as UNET and TRAP suggests that the inter-node performance can be increased significantly to surpass MPI or PVM, and it may be possible to migrate some of its functionality to communication controllers.

  1. Programming in Vienna Fortran

    NASA Technical Reports Server (NTRS)

    Chapman, Barbara; Mehrotra, Piyush; Zima, Hans

    1992-01-01

    Exploiting the full performance potential of distributed memory machines requires a careful distribution of data across the processors. Vienna Fortran is a language extension of Fortran which provides the user with a wide range of facilities for such mapping of data structures. In contrast to current programming practice, programs in Vienna Fortran are written using global data references. Thus, the user has the advantages of a shared memory programming paradigm while explicitly controlling the data distribution. In this paper, we present the language features of Vienna Fortran for FORTRAN 77, together with examples illustrating the use of these features.

  2. Real-time implementations of image segmentation algorithms on shared memory multicore architecture: a survey (Conference Presentation)

    NASA Astrophysics Data System (ADS)

    Akil, Mohamed

    2017-05-01

    Real-time processing is becoming increasingly important in many image processing applications. Image segmentation is one of the most fundamental tasks in image analysis, and many different approaches to it have been proposed. The watershed transform is a well-known image segmentation tool and a very data-intensive task. To accelerate watershed algorithms and obtain real-time processing, parallel architectures and programming models for multicore computing have been developed. This paper surveys approaches for the parallel implementation of sequential watershed algorithms on multicore general-purpose CPUs, that is, homogeneous multicore processors with shared memory. An efficient parallel implementation requires exploring different strategies (parallelization, distribution, distributed scheduling) combined with different acceleration and optimization techniques to enhance parallelism. In this paper, we compare various parallelizations of sequential watershed algorithms on shared memory multicore architectures. We analyze the performance measurements of each parallel implementation and the impact of the different sources of overhead on the performance of the parallel implementations. In this comparison, we also discuss the advantages and disadvantages of the parallel programming models. Specifically, we compare OpenMP (an application programming interface for multiprocessing) with Pthreads (POSIX Threads) to illustrate the impact of each parallel programming model on the performance of the parallel implementations.

  3. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Jones, J.P.; Bangs, A.L.; Butler, P.L.

    Hetero Helix is a programming environment which simulates shared memory on a heterogeneous network of distributed-memory computers. The machines in the network may vary with respect to their native operating systems and internal representation of numbers. Hetero Helix presents a simple programming model to developers, and also considers the needs of designers, system integrators, and maintainers. The key software technology underlying Hetero Helix is the use of a "compiler" which analyzes the data structures in shared memory and automatically generates code which translates data representations from the format native to each machine into a common format, and vice versa. The design of Hetero Helix was motivated in particular by the requirements of robotics applications. Hetero Helix has been used successfully in an integration effort involving 27 CPUs in a heterogeneous network and a body of software totaling roughly 100,00 lines of code. 25 refs., 6 figs.
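
    The translation step can be illustrated with Python's `struct` module. In this sketch a field list plays the role of the analyzed shared-memory data structure, and the generated translators convert between native byte order and a common format. The record layout, the helper name `make_translators`, and the choice of big-endian byte order as the common format are all assumptions for illustration; real heterogeneous machines may also differ in floating-point format, which byte reordering alone cannot handle.

```python
import struct

# Hypothetical shared-memory record layout: (field name, struct type code).
FIELDS = [("id", "i"), ("x", "d"), ("y", "d")]


def make_translators(fields):
    """Build native<->common translators from a field description.

    '=' means native byte order with standard sizes; '>' (big-endian,
    standard sizes) is assumed here as the common network format.
    """
    codes = "".join(code for _, code in fields)
    native = struct.Struct("=" + codes)
    common = struct.Struct(">" + codes)

    def to_common(native_bytes):
        return common.pack(*native.unpack(native_bytes))

    def to_native(common_bytes):
        return native.pack(*common.unpack(common_bytes))

    return native, common, to_common, to_native
```

    Generating the translators from the field list, rather than writing them by hand, mirrors the abstract's point that the conversion code is derived automatically from the data structure definitions.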

  4. Efficient iteration in data-parallel programs with irregular and dynamically distributed data structures

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Littlefield, R.J.

    1990-02-01

    To implement an efficient data-parallel program on a non-shared memory MIMD multicomputer, data and computations must be properly partitioned to achieve good load balance and locality of reference. Programs with irregular data reference patterns often require irregular partitions. Although good partitions may be easy to determine, they can be difficult or impossible to implement in programming languages that provide only regular data distributions, such as blocked or cyclic arrays. We are developing Onyx, a programming system that provides a shared memory model of distributed data structures and extends the concept of data distribution to include irregular and dynamic distributions. This provides a powerful means to specify irregular partitions. Perhaps surprisingly, programs using it can also execute efficiently. In this paper, we describe and evaluate the Onyx implementation of a model problem that repeatedly executes an irregular but fixed data reference pattern. On an NCUBE hypercube, the speed of the Onyx implementation is comparable to that of carefully handwritten message-passing code.
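
    One common way to execute a fixed irregular reference pattern efficiently is to separate an inspector, which works out once which remote elements each processor needs, from an executor, which reuses that schedule on every iteration. The sketch below applies this inspector/executor idea to an explicit owner map; it illustrates the general technique under stated assumptions and is not a description of Onyx's implementation. The `owner` list, the neighbor lists in `refs`, the function names, and the averaging update are hypothetical, and reading the global `values` list stands in for message passing.

```python
from collections import defaultdict


def build_schedule(owner, refs, rank):
    """Inspector: remote indices `rank` must fetch, grouped by owning rank."""
    needed = defaultdict(set)
    for i, o in enumerate(owner):
        if o != rank:
            continue                      # only inspect elements this rank owns
        for j in refs[i]:
            if owner[j] != rank:
                needed[owner[j]].add(j)   # j lives elsewhere: fetch it each sweep
    return {src: sorted(idx) for src, idx in needed.items()}


def sweep(values, owner, refs, rank, schedule):
    """Executor: one iteration that reuses the precomputed schedule.

    Reading `values` directly stands in for receiving the scheduled messages.
    """
    ghosts = {j: values[j] for idx in schedule.values() for j in idx}
    local = {i: values[i] for i, o in enumerate(owner) if o == rank}
    visible = {**local, **ghosts}
    return {i: sum(visible[j] for j in refs[i]) / len(refs[i]) for i in local}
```

    Because the reference pattern is fixed, `build_schedule` runs once and its cost is amortized over every subsequent call to `sweep`.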

  5. SMT-Aware Instantaneous Footprint Optimization

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Roy, Probir; Liu, Xu; Song, Shuaiwen

    Modern architectures employ simultaneous multithreading (SMT) to increase thread-level parallelism. SMT threads share many functional units and the whole memory hierarchy of a physical core. Without a careful code design, SMT threads can easily contend with each other for these shared resources, causing severe performance degradation. Minimizing SMT thread contention for HPC applications running on dedicated platforms is very challenging, because they usually spawn threads within Single Program Multiple Data (SPMD) models. To address this important issue, we introduce a simple scheme for SMT-aware code optimization, which aims to reduce the memory contention across SMT threads.

  6. Performance Analysis of Multilevel Parallel Applications on Shared Memory Architectures

    NASA Technical Reports Server (NTRS)

    Jost, Gabriele; Jin, Haoqiang; Labarta, Jesus; Gimenez, Judit; Caubet, Jordi; Biegel, Bryan A. (Technical Monitor)

    2002-01-01

    In this paper we describe how to apply powerful performance analysis techniques to understand the behavior of multilevel parallel applications. We use the Paraver/OMPItrace performance analysis system for our study. This system consists of two major components: The OMPItrace dynamic instrumentation mechanism, which allows the tracing of processes and threads and the Paraver graphical user interface for inspection and analyses of the generated traces. We describe how to use the system to conduct a detailed comparative study of a benchmark code implemented in five different programming paradigms applicable for shared memory

  7. UPC++ Programmer’s Guide (v1.0 2017.9)

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Bachan, J.; Baden, S.; Bonachea, D.

    UPC++ is a C++11 library that provides Asynchronous Partitioned Global Address Space (APGAS) programming. It is designed for writing parallel programs that run efficiently and scale well on distributed-memory parallel computers. The APGAS model is single program, multiple-data (SPMD), with each separate thread of execution (referred to as a rank, a term borrowed from MPI) having access to local memory as it would in C++. However, APGAS also provides access to a global address space, which is allocated in shared segments that are distributed over the ranks. UPC++ provides numerous methods for accessing and using global memory. In UPC++, all operations that access remote memory are explicit, which encourages programmers to be aware of the cost of communication and data movement. Moreover, all remote-memory access operations are by default asynchronous, to enable programmers to write code that scales well even on hundreds of thousands of cores.

  8. UPC++ Programmer’s Guide, v1.0-2018.3.0

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Bachan, J.; Baden, S.; Bonachea, Dan

    UPC++ is a C++11 library that provides Partitioned Global Address Space (PGAS) programming. It is designed for writing parallel programs that run efficiently and scale well on distributed-memory parallel computers. The PGAS model is single program, multiple-data (SPMD), with each separate thread of execution (referred to as a rank, a term borrowed from MPI) having access to local memory as it would in C++. However, PGAS also provides access to a global address space, which is allocated in shared segments that are distributed over the ranks. UPC++ provides numerous methods for accessing and using global memory. In UPC++, all operations that access remote memory are explicit, which encourages programmers to be aware of the cost of communication and data movement. Moreover, all remote-memory access operations are by default asynchronous, to enable programmers to write code that scales well even on hundreds of thousands of cores.
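
    The core PGAS ideas in these abstracts (per-rank shared segments, global pointers that name a rank and an offset, and explicit, asynchronous remote access) can be modeled in toy form. This Python sketch is not the UPC++ API: `GlobalPtr` and `ToyPGAS` are hypothetical, `rput` and `rget` are stand-ins named after the UPC++ operations, the methods return `concurrent.futures.Future` objects to mimic asynchronous completion, and all "ranks" live in one process.

```python
from concurrent.futures import Future, ThreadPoolExecutor
from dataclasses import dataclass


@dataclass(frozen=True)
class GlobalPtr:
    """Hypothetical global pointer: names a rank and an offset in its segment."""
    rank: int
    offset: int


class ToyPGAS:
    """All ranks' shared segments in one process; remote access is explicit."""

    def __init__(self, nranks, seg_size):
        # One shared segment per rank; together they form the global address space.
        self.segments = [[0] * seg_size for _ in range(nranks)]
        self._pool = ThreadPoolExecutor(max_workers=2)

    def local(self, rank):
        """A rank's own segment, accessed with ordinary local indexing."""
        return self.segments[rank]

    def rput(self, value, gptr: GlobalPtr) -> Future:
        """Asynchronous remote write; completion is signaled by the future."""
        def put():
            self.segments[gptr.rank][gptr.offset] = value
        return self._pool.submit(put)

    def rget(self, gptr: GlobalPtr) -> Future:
        """Asynchronous remote read; the future's result is the value."""
        return self._pool.submit(lambda: self.segments[gptr.rank][gptr.offset])

    def close(self):
        self._pool.shutdown(wait=True)
```

    Because both operations are asynchronous, the caller must wait on the put's future before issuing a dependent get; without that wait, the two pool tasks may run in either order.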

  9. Dynamic programming on a shared-memory multiprocessor

    NASA Technical Reports Server (NTRS)

    Edmonds, Phil; Chu, Eleanor; George, Alan

    1993-01-01

    Three new algorithms for solving dynamic programming problems on a shared-memory parallel computer are described. All three algorithms attempt to balance the workload while keeping synchronization cost low. In particular, for a multiprocessor having $p$ processors, an analysis of the best algorithm shows that the arithmetic cost is $O(n^3/6p)$ and that the synchronization cost is $O(|\log_C n|)$ if $p \ll n$, where $C = (2p-1)/(2p+1)$ and $n$ is the size of the problem. The low synchronization cost is important for machines where synchronization is expensive. Analysis and experiments show that the best algorithm is effective in balancing the workload and producing high efficiency.
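
    The abstract does not name the dynamic programming problem, so the sketch below uses matrix-chain ordering as a representative $O(n^3)$ recurrence whose inner-loop work also totals roughly $n^3/6$ operations. It shows only the basic dependency structure, not any of the three algorithms described: every cell on one diagonal of the table depends only on shorter chains, so a diagonal can be computed concurrently, with a barrier before the next one. Python threads illustrate the structure rather than deliver speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def matrix_chain_cost(dims, workers=4):
    """Minimum scalar multiplications to compute A_0 ... A_{n-1}, A_i of size dims[i] x dims[i+1]."""
    n = len(dims) - 1
    cost = [[0] * n for _ in range(n)]

    def cell(i, j):
        # Best split point k: (A_i..A_k)(A_{k+1}..A_j).
        return min(cost[i][k] + cost[k + 1][j] + dims[i] * dims[k + 1] * dims[j + 1]
                   for k in range(i, j))

    with ThreadPoolExecutor(max_workers=workers) as pool:
        for length in range(1, n):
            cells = [(i, i + length) for i in range(n - length)]
            # Cells on one diagonal are independent; map() returning acts as the barrier.
            values = list(pool.map(lambda ij: cell(*ij), cells))
            for (i, j), v in zip(cells, values):
                cost[i][j] = v
    return cost[0][n - 1]
```

    This naive scheme synchronizes once per diagonal, $n - 1$ times in all, which is the kind of synchronization cost the algorithms described above are designed to keep low.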

  10. Implementations of BLAST for parallel computers.

    PubMed

    Jülich, A

    1995-02-01

    The BLAST sequence comparison programs have been ported to a variety of parallel computers: the shared memory machine Cray Y-MP 8/864 and the distributed memory architectures Intel iPSC/860 and nCUBE. Additionally, the programs were ported to run on workstation clusters. We explain the parallelization techniques and consider the pros and cons of these methods. The BLAST programs are very well suited for parallelization for a moderate number of processors. We illustrate our results using the program blastp as an example. As input data for blastp, a 799-residue protein query sequence and the protein database PIR were used.

  11. Using Coarrays to Parallelize Legacy Fortran Applications: Strategy and Case Study

    DOE PAGES

    Radhakrishnan, Hari; Rouson, Damian W. I.; Morris, Karla; ...

    2015-01-01

    This paper summarizes a strategy for parallelizing a legacy Fortran 77 program using the object-oriented (OO) and coarray features that entered Fortran in the 2003 and 2008 standards, respectively. OO programming (OOP) facilitates the construction of an extensible suite of model-verification and performance tests that drive the development. Coarray parallel programming facilitates a rapid evolution from a serial application to a parallel application capable of running on multicore processors and many-core accelerators in shared and distributed memory. We delineate 17 code modernization steps used to refactor and parallelize the program and study the resulting performance. Our initial studies were done using the Intel Fortran compiler on a 32-core shared memory server. Scaling behavior was very poor, and profile analysis using TAU showed that the bottleneck in the performance was due to our implementation of a collective, sequential summation procedure. We were able to improve the scalability and achieve nearly linear speedup by replacing the sequential summation with a parallel, binary tree algorithm. We also tested the Cray compiler, which provides its own collective summation procedure. Intel provides no collective reductions. With Cray, the program shows linear speedup even in distributed-memory execution. We anticipate similar results with other compilers once they support the new collective procedures proposed for Fortran 2015.
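
    The fix described here, replacing a sequential summation with a binary tree, can be sketched in Python over a list of per-image partial sums, indexed from 0. In each round, an image whose position is a multiple of twice the current stride adds in the partial sum held by the image one stride away, so p images finish in ceil(log2 p) rounds instead of p - 1 sequential steps. This is a serial simulation of the communication pattern, not coarray code; in Fortran 2018 the standard collective `co_sum` provides this reduction directly.

```python
def tree_sum(partials):
    """Binary-tree reduction of per-image partial sums; returns (total, rounds)."""
    vals = list(partials)
    n = len(vals)
    stride, rounds = 1, 0
    while stride < n:
        # All pairs in one round are independent and could run concurrently.
        for i in range(0, n - stride, 2 * stride):
            vals[i] += vals[i + stride]
        stride *= 2
        rounds += 1
    return vals[0], rounds
```

    For 32 images holding the values 1 through 32, `tree_sum` returns `(528, 5)`: the same total as a sequential loop, reached in five rounds.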

  12. A portable approach for PIC on emerging architectures

    NASA Astrophysics Data System (ADS)

    Decyk, Viktor

    2016-03-01

    A portable approach for designing Particle-in-Cell (PIC) algorithms on emerging exascale computers is based on the recognition that three distinct programming paradigms are needed. They are: low-level vector (SIMD) processing, middle-level shared memory parallel programming, and high-level distributed memory programming. In addition, there is a memory hierarchy associated with each level. Such algorithms can be initially developed using vectorizing compilers, OpenMP, and MPI. This is the approach recommended by Intel for the Phi processor. These algorithms can then be translated and possibly specialized to other programming models and languages, as needed. For example, the vector processing and shared memory programming might be done with CUDA instead of vectorizing compilers and OpenMP, but generally the algorithm itself is not greatly changed. The UCLA PICKSC web site at http://www.idre.ucla.edu/ contains example open source skeleton codes (mini-apps) illustrating each of these three programming models, individually and in combination. Fortran 2003 now supports abstract data types, and design patterns can be used to support a variety of implementations within the same code base. Fortran 2003 also supports interoperability with C so that implementations in C languages are also easy to use. Finally, main codes can be translated into dynamic environments such as Python, while still taking advantage of high-performing compiled languages. Parallel languages are still evolving, with interesting developments in Co-Array Fortran, UPC, and OpenACC, among others, and these can also be supported within the same software architecture. Work supported by NSF and DOE Grants.

  13. Merlin - Massively parallel heterogeneous computing

    NASA Technical Reports Server (NTRS)

    Wittie, Larry; Maples, Creve

    1989-01-01

    Hardware and software for Merlin, a new kind of massively parallel computing system, are described. Eight computers are linked as a 300-MIPS prototype to develop system software for a larger Merlin network with 16 to 64 nodes, totaling 600 to 3000 MIPS. These working prototypes help refine a mapped reflective memory technique that offers a new, very general way of linking many types of computer to form supercomputers. Processors share data selectively and rapidly on a word-by-word basis. Fast firmware virtual circuits are reconfigured to match topological needs of individual application programs. Merlin's low-latency memory-sharing interfaces solve many problems in the design of high-performance computing systems. The Merlin prototypes are intended to run parallel programs for scientific applications and to determine hardware and software needs for a future Teraflops Merlin network.

  14. Crystallographic and general use programs for the XDS Sigma 5 computer

    NASA Technical Reports Server (NTRS)

    Snyder, R. L.

    1973-01-01

    Programs in basic FORTRAN 4 are described, which fall into three categories: (1) interactive programs to be executed under time sharing (BTM); (2) non-interactive programs which are executed in batch processing mode (BPM); and (3) large non-interactive programs which require more memory than is available in the normal BPM/BTM operating system and must be run overnight on a special system called XRAY, which releases about 45,000 words of memory to the user. Programs in categories (1) and (2) are stored as FORTRAN source files in the account FSNYDER. Programs in category (3) are stored in the XRAY system as load modules. The type of file in account FSNYDER is identified by the first two letters in the name.

  15. Tuning collective communication for Partitioned Global Address Space programming models

    DOE PAGES

    Nishtala, Rajesh; Zheng, Yili; Hargrove, Paul H.; ...

    2011-06-12

    Partitioned Global Address Space (PGAS) languages offer programmers the convenience of a shared memory programming style combined with locality control necessary to run on large-scale distributed memory systems. Even within a PGAS language, programmers often need to perform global communication operations such as broadcasts or reductions, which are best performed as collective operations in which a group of threads work together to perform the operation. In this study we consider the problem of implementing collective communication within PGAS languages and explore some of the design trade-offs in both the interface and implementation. In particular, PGAS collectives have semantic issues that are different from those in send–receive style message passing programs, and different implementation approaches that take advantage of the one-sided communication style in these languages. We present an implementation framework for PGAS collectives as part of the GASNet communication layer, which supports shared memory, distributed memory and hybrids. The framework supports a broad set of algorithms for each collective, over which the implementation may be automatically tuned. In conclusion, we demonstrate the benefit of optimized GASNet collectives using application benchmarks written in UPC, and demonstrate that the GASNet collectives can deliver scalable performance on a variety of state-of-the-art parallel machines including a Cray XT4, an IBM BlueGene/P, and a Sun Constellation system with InfiniBand interconnect.
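
    Broadcast is a typical collective for which several algorithms exist. One standard choice is a binomial tree, in which the set of ranks holding the data doubles every round. The Python sketch below only computes such a schedule; it is a generic illustration, and the function name is hypothetical, not part of GASNet's interface or its tuned algorithm selection. Ranks are renumbered relative to the root so that any rank can serve as the source.

```python
def binomial_broadcast_schedule(p, root=0):
    """Rounds of (sender, receiver) pairs for a binomial-tree broadcast among p ranks."""
    rounds = []
    have = {0}                    # relative ranks, (rank - root) mod p, holding the data
    step = 1
    while step < p:
        # Every holder sends to the rank `step` positions ahead, if it exists.
        sends = [(src, src + step) for src in sorted(have) if src + step < p]
        have.update(dst for _, dst in sends)
        # Translate relative ranks back to actual ranks.
        rounds.append([((s + root) % p, (d + root) % p) for s, d in sends])
        step *= 2
    return rounds
```

    For p = 8 and root 0 this yields three rounds: (0→1); (0→2), (1→3); (0→4), (1→5), (2→6), (3→7).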

  16. A communication-avoiding, hybrid-parallel, rank-revealing orthogonalization method.

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Hoemmen, Mark

    2010-11-01

    Orthogonalization consumes much of the run time of many iterative methods for solving sparse linear systems and eigenvalue problems. Commonly used algorithms, such as variants of Gram-Schmidt or Householder QR, have performance dominated by communication. Here, 'communication' includes both data movement between the CPU and memory, and messages between processors in parallel. Our Tall Skinny QR (TSQR) family of algorithms requires asymptotically fewer messages between processors and less data movement between CPU and memory than typical orthogonalization methods, yet achieves the same accuracy as Householder QR factorization. Furthermore, in block orthogonalizations, TSQR is faster and more accurate than existing approaches for orthogonalizing the vectors within each block ('normalization'). TSQR's rank-revealing capability also makes it useful for detecting deflation in block iterative methods, for which existing approaches sacrifice performance, accuracy, or both. We have implemented a version of TSQR that exploits both distributed-memory and shared-memory parallelism, and supports real and complex arithmetic. Our implementation is optimized for the case of orthogonalizing a small number (5-20) of very long vectors. The shared-memory parallel component uses Intel's Threading Building Blocks, though its modular design supports other shared-memory programming models as well, including computation on the GPU. Our implementation achieves speedups of 2 times or more over competing orthogonalizations. It is available now in the development branch of the Trilinos software package, and will be included in the 10.8 release.
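
    The core TSQR idea (factor independent row blocks, then factor the small stack of their R factors) can be sketched in pure Python as below. This is a one-level, sequential sketch under stated assumptions, not the Trilinos implementation: the helper names are hypothetical, each row block is assumed to have at least as many rows as columns and full column rank, and the local factorizations use modified Gram-Schmidt only for brevity, whereas a production TSQR would typically use Householder QR for them.

```python
import math
from typing import List

Matrix = List[List[float]]


def mgs_r(a: Matrix) -> Matrix:
    """Return the n x n R factor of a thin QR of the m x n matrix `a`
    (m >= n, full column rank) via modified Gram-Schmidt."""
    m, n = len(a), len(a[0])
    cols = [[a[i][j] for i in range(m)] for j in range(n)]  # work column by column
    r = [[0.0] * n for _ in range(n)]
    for j in range(n):
        norm = math.sqrt(sum(x * x for x in cols[j]))
        r[j][j] = norm  # positive diagonal by construction
        q = [x / norm for x in cols[j]]
        for k in range(j + 1, n):
            # project the new direction q out of every remaining column
            r[j][k] = sum(qi * ck for qi, ck in zip(q, cols[k]))
            cols[k] = [ck - r[j][k] * qi for ck, qi in zip(cols[k], q)]
    return r


def tsqr_r(a: Matrix, num_blocks: int) -> Matrix:
    """One-level TSQR: factor each row block independently (the step that could
    run on different processors or cores without communication), stack the
    resulting small R factors, and factor the stack once more."""
    m = len(a)
    size = math.ceil(m / num_blocks)
    stacked: Matrix = []
    for start in range(0, m, size):
        stacked.extend(mgs_r(a[start:start + size]))  # each block adds n rows
    return mgs_r(stacked)
```

    Because the R factor with a positive diagonal is unique for a full-rank matrix, tsqr_r(a, p) agrees with mgs_r(a) up to rounding, while only the small n x n factors, not the tall blocks, would need to be exchanged between processors.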

  17. Multiple-User, Multitasking, Virtual-Memory Computer System

    NASA Technical Reports Server (NTRS)

    Generazio, Edward R.; Roth, Don J.; Stang, David B.

    1993-01-01

    Computer system designed and programmed to serve multiple users in a research laboratory. Provides for computer control and monitoring of laboratory instruments, acquisition and analysis of data from those instruments, and interaction with users via remote terminals. System provides fast access to shared central processing units and associated large (from megabytes to gigabytes) memories. Underlying concept of system also applicable to monitoring and control of industrial processes.

  18. Performing an allreduce operation using shared memory

    DOEpatents

    Archer, Charles J [Rochester, MN]; Dozsa, Gabor [Ardsley, NY]; Ratterman, Joseph D [Rochester, MN]; Smith, Brian E [Rochester, MN]

    2012-04-17

    Methods, apparatus, and products are disclosed for performing an allreduce operation using shared memory that include: receiving, by at least one of a plurality of processing cores on a compute node, an instruction to perform an allreduce operation; establishing, by the core that received the instruction, a job status object for specifying a plurality of shared memory allreduce work units, the plurality of shared memory allreduce work units together performing the allreduce operation on the compute node; determining, by an available core on the compute node, a next shared memory allreduce work unit in the job status object; and performing, by that available core on the compute node, that next shared memory allreduce work unit.
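
    The claimed method can be illustrated with a small Python sketch in which threads stand in for the processing cores of a compute node. All names (JobStatus, claim_next, shared_memory_allreduce), the chunk-sized work units and the choice of an elementwise sum are illustrative assumptions, not taken from the patent. The sketch shows only the pattern: one core sets up a shared job status object, and whichever core is available claims and performs the next work unit.

```python
import threading
from typing import List


class JobStatus:
    """Hypothetical shared 'job status object': a fixed number of work units and
    a counter that cores advance under a lock to claim the next unit."""

    def __init__(self, num_units: int) -> None:
        self.num_units = num_units
        self._next = 0
        self._lock = threading.Lock()

    def claim_next(self) -> int:
        """Return the index of the next unclaimed work unit, or -1 if none remain."""
        with self._lock:
            if self._next >= self.num_units:
                return -1
            unit = self._next
            self._next += 1
            return unit


def shared_memory_allreduce(contributions: List[List[float]], num_cores: int,
                            chunk: int) -> List[float]:
    """Sum the per-core contribution vectors elementwise. Each work unit reduces
    one chunk of the vector; any idle core picks up the next unit."""
    length = len(contributions[0])
    result = [0.0] * length  # shared result buffer visible to every core
    num_units = (length + chunk - 1) // chunk
    status = JobStatus(num_units)  # established by the core that got the request

    def core_loop() -> None:
        while (unit := status.claim_next()) != -1:
            lo, hi = unit * chunk, min((unit + 1) * chunk, length)
            for i in range(lo, hi):  # units cover disjoint slices: no extra locking
                result[i] = sum(c[i] for c in contributions)

    cores = [threading.Thread(target=core_loop) for _ in range(num_cores)]
    for t in cores:
        t.start()
    for t in cores:
        t.join()
    return result  # every core can now read the reduced values
```

    Claiming units dynamically, rather than assigning fixed slices up front, lets a core that finishes early keep working, which is the load-balancing property the job status object provides in this sketch.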

  19. Performing an allreduce operation using shared memory

    DOEpatents

    Archer, Charles J; Dozsa, Gabor; Ratterman, Joseph D; Smith, Brian E

    2014-06-10

    Methods, apparatus, and products are disclosed for performing an allreduce operation using shared memory that include: receiving, by at least one of a plurality of processing cores on a compute node, an instruction to perform an allreduce operation; establishing, by the core that received the instruction, a job status object for specifying a plurality of shared memory allreduce work units, the plurality of shared memory allreduce work units together performing the allreduce operation on the compute node; determining, by an available core on the compute node, a next shared memory allreduce work unit in the job status object; and performing, by that available core on the compute node, that next shared memory allreduce work unit.

  20. Simulation Analysis of Data Sharing in Shared Memory Multiprocessors

    DTIC Science & Technology

    1989-02-24

    Number of pages: 178. ...work. Andrea Casotto (CELL), Steve McGrogan (SPICE), Srinivas Devadas (TOPOP1) and Hi-Keung Tony Ma (VERIFY) donated the parallel programs and a con... Figure list excerpt: ...Effect of Block Size on Bus Utilization; 5-14 Ratio of Sharing Bus Cycles to Total Bus Cycles; 5-15 Classification of Bus Cycles for...

  1. Scalable Triadic Analysis of Large-Scale Graphs: Multi-Core vs. Multi-Processor vs. Multi-Threaded Shared Memory Architectures

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Chin, George; Marquez, Andres; Choudhury, Sutanay

    2012-09-01

    Triadic analysis encompasses a useful set of graph mining methods centered on the concept of a triad, which is a subgraph of three nodes and the configuration of directed edges across the nodes. Such methods are often applied in the social sciences as well as in many other diverse fields. Triadic methods commonly operate on a triad census that counts the number of triads of every possible edge configuration in a graph. Like other graph algorithms, triadic census algorithms do not scale well when graphs reach tens of millions to billions of nodes. To enable the triadic analysis of large-scale graphs, we developed and optimized a triad census algorithm to execute efficiently on shared memory architectures. We retrace the development and evolution of a parallel triad census algorithm. Over the course of several versions, we continually adapted the code’s data structures and program logic to expose more opportunities to exploit parallelism on shared memory that would translate into improved computational performance. We recall the critical steps and modifications that occurred during code development and optimization. Furthermore, we compare the performance of the triad census algorithm versions on three specific systems: Cray XMT, HP Superdome, and an AMD multi-core NUMA machine. These three systems have shared memory architectures but with markedly different hardware capabilities to manage parallelism.
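
    For orientation, the brute-force sketch below computes a simplified, undirected triad census in Python: it classifies every 3-node subset by how many of its three possible edges are present. This is a deliberate simplification (a directed census distinguishes many more edge configurations), and the function names are hypothetical. It is shown because the per-vertex decomposition, in which vertex a owns the triads whose smallest member is a, is one natural way to split the work among threads.

```python
from collections import Counter
from itertools import combinations
from typing import Iterable, List, Set, Tuple


def build_adjacency(n: int, edges: Iterable[Tuple[int, int]]) -> List[Set[int]]:
    """Undirected adjacency sets for vertices 0..n-1."""
    adj: List[Set[int]] = [set() for _ in range(n)]
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def census_rooted_at(a: int, adj: List[Set[int]]) -> Counter:
    """Count triads {a, b, c} with a < b < c by their number of edges (0-3).
    Calls for different `a` touch disjoint triads, so they could run in parallel."""
    counts: Counter = Counter()
    for b, c in combinations(range(a + 1, len(adj)), 2):
        counts[(b in adj[a]) + (c in adj[a]) + (c in adj[b])] += 1
    return counts


def triad_census(n: int, edges: Iterable[Tuple[int, int]]) -> Counter:
    """O(n^3) census; the per-vertex partial counts are merged at the end."""
    adj = build_adjacency(n, edges)
    census: Counter = Counter({k: 0 for k in range(4)})
    for a in range(n):
        census.update(census_rooted_at(a, adj))
    return census
```

    The cubic enumeration is exactly what stops scaling at the graph sizes mentioned above; practical census codes avoid visiting empty triads explicitly, but the merge-of-partial-counts structure carries over.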

  2. Scaling Irregular Applications through Data Aggregation and Software Multithreading

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Morari, Alessandro; Tumeo, Antonino; Chavarría-Miranda, Daniel

    Bioinformatics, data analytics, semantic databases, and knowledge discovery are emerging high performance application areas that exploit dynamic, linked data structures such as graphs, unbalanced trees or unstructured grids. These data structures are usually very large, requiring significantly more memory than is available on single shared memory systems. Additionally, these data structures are difficult to partition on distributed memory systems. They also exhibit poor spatial and temporal locality, thus generating unpredictable memory and network accesses. The Partitioned Global Address Space (PGAS) programming model seems suitable for these applications because it allows using a shared memory abstraction across distributed-memory clusters. However, current PGAS languages and libraries are built to target regular remote data accesses and block transfers. Furthermore, they usually rely on the Single Program Multiple Data (SPMD) parallel control model, which is not well suited to the fine-grained, dynamic and unbalanced parallelism of irregular applications. In this paper we present GMT (Global Memory and Threading library), a custom runtime library that enables efficient execution of irregular applications on commodity clusters. GMT integrates a PGAS data substrate with simple fork/join parallelism and provides automatic load balancing on a per-node basis. It implements multi-level aggregation and lightweight multithreading to maximize memory and network bandwidth with fine-grained data accesses and to tolerate long data access latencies. A key innovation in the GMT runtime is its thread specialization (workers, helpers and communication threads), which together realize the overall functionality. We compare our approach with other PGAS models, such as UPC running on GASNet, and with hand-optimized MPI code on a set of typical large-scale irregular applications, demonstrating speedups of an order of magnitude.
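
    The benefit of aggregation can be illustrated with a minimal Python sketch in which fine-grained remote requests are buffered per destination node and shipped as one larger message when a buffer fills or is explicitly flushed. The Aggregator class, its capacity parameter and the send callback are hypothetical stand-ins, not GMT's actual API, and the sketch ignores the multi-level aggregation and thread specialization described above.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

Request = Tuple[str, int]  # hypothetical fine-grained access: (operation, address)


class Aggregator:
    """Buffer fine-grained remote requests per destination node and ship each
    buffer as a single message when it fills or when flush() is called."""

    def __init__(self, capacity: int,
                 send: Callable[[int, List[Request]], None]) -> None:
        self.capacity = capacity
        self.send = send  # transport callback taking (destination, batch)
        self.buffers: Dict[int, List[Request]] = defaultdict(list)

    def put(self, dest: int, request: Request) -> None:
        buf = self.buffers[dest]
        buf.append(request)
        if len(buf) >= self.capacity:
            self._flush_one(dest)

    def _flush_one(self, dest: int) -> None:
        batch = self.buffers.pop(dest, [])
        if batch:
            self.send(dest, batch)

    def flush(self) -> None:
        """Ship every partially filled buffer."""
        for dest in list(self.buffers):
            self._flush_one(dest)
```

    With a capacity of 4, thirteen single-request operations addressed to two nodes leave as four messages instead of thirteen. A practical runtime would likely also need a time-based flush so that a partly filled buffer cannot stall progress.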

  3. The force on the flex: Global parallelism and portability

    NASA Technical Reports Server (NTRS)

    Jordan, H. F.

    1986-01-01

    A parallel programming methodology, called the force, supports the construction of programs to be executed in parallel by an unspecified, but potentially large, number of processes. The methodology was originally developed on a pipelined, shared memory multiprocessor, the Denelcor HEP, and embodies the primitive operations of the force in a set of macros that expand into multiprocessor Fortran code. A small set of primitives is sufficient to write large parallel programs, and the system has been used to produce 10,000-line programs in computational fluid dynamics. The level of complexity of the force primitives is intermediate: high enough to mask detailed architectural differences between multiprocessors but low enough to give the user control over performance. The system is being ported to a medium-scale multiprocessor, the Flex/32, which is a 20-processor system with a mixture of shared and local memory. Differences in memory organization and in the type of processor synchronization supported by the hardware of the two machines lead to some differences in efficient implementations of the force primitives, but the user interface remains the same. An initial implementation was done by retargeting the macros to Flexible Computer Corporation's ConCurrent C language. Subsequently, the macros were changed to produce directly the system calls that form the basis for ConCurrent C. The implementation of the Fortran-based system is in step with Flexible Computer Corporation's implementation of a Fortran system in the parallel environment.

  4. Testing New Programming Paradigms with NAS Parallel Benchmarks

    NASA Technical Reports Server (NTRS)

    Jin, H.; Frumkin, M.; Schultz, M.; Yan, J.

    2000-01-01

    Over the past decade, high performance computing has evolved rapidly, not only in hardware architectures but also in the increasing complexity of real applications. Technologies have been developed with the aim of scaling up to thousands of processors on both distributed and shared memory systems. Developing parallel programs for these computers is always a challenging task. Today, writing parallel programs with message passing (e.g., MPI) is the most popular way of achieving scalability and high performance. However, writing message passing programs is difficult and error-prone. In recent years, new efforts have been made to define new parallel programming paradigms. The best examples are HPF (based on data parallelism) and OpenMP (based on shared memory parallelism). Both provide simple and clear extensions to sequential programs, thus greatly simplifying the tedious tasks encountered in writing message passing programs. HPF is independent of the memory hierarchy; however, due to the immaturity of compiler technology, its performance is still questionable. Although the use of parallel compiler directives is not new, OpenMP offers a portable solution in the shared-memory domain. Another important development is the tremendous progress of the internet and its associated technology. Although still in its infancy, Java promises portability in a heterogeneous environment and offers the possibility to "compile once and run anywhere." To test these new technologies, we implemented new parallel versions of the NAS Parallel Benchmarks (NPBs) with HPF and OpenMP directives, and extended the work with Java and Java threads. The purpose of this study is to examine the effectiveness of alternative programming paradigms. The NPBs consist of five kernels and three simulated applications that mimic the computation and data movement of large-scale computational fluid dynamics (CFD) applications. We started with the serial version included in NPB2.3.
    Optimization of memory and cache usage was applied to several benchmarks, notably BT and SP, resulting in better sequential performance. To overcome the lack of an HPF performance model and to guide the development of the HPF codes, we employed an empirical performance model for several primitives found in the benchmarks. We encountered a few limitations of HPF, such as the lack of support for the "REDISTRIBUTION" directive and no easy way to handle irregular computation. The parallelization with OpenMP directives was done at the outermost loop level to achieve the largest granularity. The performance of six HPF and OpenMP benchmarks is compared with that of their MPI counterparts for the Class A problem size in the figure on the next page. These results were obtained on an SGI Origin2000 (195 MHz) with the MIPSpro-f77 compiler 7.2.1 for the OpenMP and MPI codes, and with the PGI pghpf-2.4.3 compiler with MPI interface for the HPF programs.

  5. Programming model for distributed intelligent systems

    NASA Technical Reports Server (NTRS)

    Sztipanovits, J.; Biegl, C.; Karsai, G.; Bogunovic, N.; Purves, B.; Williams, R.; Christiansen, T.

    1988-01-01

    A programming model and architecture which was developed for the design and implementation of complex, heterogeneous measurement and control systems is described. The Multigraph Architecture integrates artificial intelligence techniques with conventional software technologies, offers a unified framework for distributed and shared memory based parallel computational models and supports multiple programming paradigms. The system can be implemented on different hardware architectures and can be adapted to widely different applications.

  6. Computational performance of a smoothed particle hydrodynamics simulation for shared-memory parallel computing

    NASA Astrophysics Data System (ADS)

    Nishiura, Daisuke; Furuichi, Mikito; Sakaguchi, Hide

    2015-09-01

    The computational performance of a smoothed particle hydrodynamics (SPH) simulation is investigated for three types of current shared-memory parallel computer devices: many integrated core (MIC) processors, graphics processing units (GPUs), and multi-core CPUs. We are especially interested in efficient shared-memory allocation methods for each chipset, because the efficient data access patterns differ between compute unified device architecture (CUDA) programming for GPUs and OpenMP programming for MIC processors and multi-core CPUs. We first introduce several parallel implementation techniques for the SPH code, and then examine these on our target computer architectures to determine the most effective algorithms for each processor unit. In addition, we evaluate the effective computing performance and power efficiency of the SPH simulation on each architecture, as these are critical metrics for overall performance in a multi-device environment. In our benchmark test, the GPU is found to produce the best arithmetic performance as a standalone device unit, and gives the most efficient power consumption. The multi-core CPU obtains the most effective computing performance. The computational speed of the MIC processor on Xeon Phi approached that of two Xeon CPUs. This indicates that using MICs is an attractive choice for existing SPH codes on multi-core CPUs parallelized by OpenMP, as it gains computational acceleration without the need for significant changes to the source code.

  7. Virtual memory support for distributed computing environments using a shared data object model

    NASA Astrophysics Data System (ADS)

    Huang, F.; Bacon, J.; Mapp, G.

    1995-12-01

    Conventional storage management systems provide one interface for accessing memory segments and another for accessing secondary storage objects. This hinders application programming and affects overall system performance due to mandatory data copying and user/kernel boundary crossings, which in the microkernel case may involve context switches. Memory-mapping techniques may be used to provide programmers with a unified view of the storage system. This paper extends such techniques to support a shared data object model for distributed computing environments in which good support for coherence and synchronization is essential. The approach is based on a microkernel, typed memory objects, and integrated coherence control. A microkernel architecture is used to support multiple coherence protocols and the addition of new protocols. Memory objects are typed and applications can choose the most suitable protocols for different types of object to avoid protocol mismatch. Low-level coherence control is integrated with high-level concurrency control so that the number of messages required to maintain memory coherence is reduced and system-wide synchronization is realized without severely impacting the system performance. These features together contribute a novel approach to the support for flexible coherence under application control.
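
    At user level, the same unifying idea is available through ordinary memory-mapping interfaces. The Python sketch below uses the standard-library mmap module to change a file's contents with a plain slice assignment rather than seek/write calls. It illustrates only the memory-mapped view of storage, not the paper's microkernel, typed memory objects or coherence protocols, and the helper names update_via_mapping and demo are hypothetical.

```python
import mmap
import os
import tempfile


def update_via_mapping(path: str, offset: int, data: bytes) -> None:
    """Modify a file's contents with ordinary slice assignment on a memory
    mapping instead of explicit seek()/write() calls."""
    with open(path, "r+b") as f:
        # length 0 maps the whole (non-empty) file
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_WRITE) as mm:
            mm[offset:offset + len(data)] = data  # a plain memory store
            mm.flush()  # push dirty pages back to the file


def demo() -> bytes:
    """Write a small file, patch it through the mapping, read it back normally."""
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(b"hello, storage!")
        update_via_mapping(path, 7, b"MAPPING")
        with open(path, "rb") as f:
            return f.read()
    finally:
        os.remove(path)


print(demo())  # b'hello, MAPPING!'
```

    The read at the end goes through the ordinary file interface and still sees the bytes stored through the mapping, which is the "one view of memory and storage" property the abstract builds on.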

  8. MPF: A portable message passing facility for shared memory multiprocessors

    NASA Technical Reports Server (NTRS)

    Malony, Allen D.; Reed, Daniel A.; Mcguire, Patrick J.

    1987-01-01

    The design, implementation, and performance evaluation of a message passing facility (MPF) for shared memory multiprocessors are presented. The MPF is based on a message passing model conceptually similar to conversations. Participants (parallel processors) can enter or leave a conversation at any time. The message passing primitives for this model are implemented as a portable library of C function calls. The MPF is currently operational on a Sequent Balance 21000, and several parallel applications were developed and tested. Several simple benchmark programs are presented to establish interprocess communication performance for common patterns of interprocess communication. Finally, performance figures are presented for two parallel applications, linear systems solution, and iterative solution of partial differential equations.
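
    A conversation-style facility of this kind can be sketched in Python with one thread-safe inbox per participant. The Conversation class and its join, leave, send and receive methods are hypothetical and do not reproduce MPF's actual C primitives. The sketch only shows how participants can enter or leave at any time while messages pass between threads through shared in-memory queues rather than over a network.

```python
import queue
import threading
from typing import Dict, Optional


class Conversation:
    """Hypothetical conversation object: participants may join or leave at any
    time, and each message is delivered to every other current participant."""

    def __init__(self) -> None:
        self._lock = threading.Lock()  # protects the membership table
        self._inboxes: Dict[str, "queue.Queue[object]"] = {}

    def join(self, name: str) -> None:
        with self._lock:
            self._inboxes[name] = queue.Queue()

    def leave(self, name: str) -> None:
        with self._lock:
            self._inboxes.pop(name, None)

    def send(self, sender: str, message: object) -> int:
        """Deliver to all other current members; return how many received it."""
        with self._lock:
            targets = [q for n, q in self._inboxes.items() if n != sender]
        for q in targets:
            q.put((sender, message))
        return len(targets)

    def receive(self, name: str, timeout: Optional[float] = None) -> object:
        """Block until a message for `name` arrives (or raise queue.Empty)."""
        with self._lock:
            inbox = self._inboxes[name]
        return inbox.get(timeout=timeout)
```

    Taking a snapshot of the membership under the lock and then enqueuing outside it keeps joins and leaves from blocking behind slow receivers.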

  9. Efficient Numeric and Geometric Computations using Heterogeneous Shared Memory Architectures

    DTIC Science & Technology

    2017-10-04

    Report: Efficient Numeric and Geometric Computations using Heterogeneous Shared Memory Architectures (...Chapel Hill). Excerpt: ...algorithms for scientific and geometric computing by exploiting the power and performance efficiency of heterogeneous shared memory architectures. These...

  10. The effect of the order in which episodic autobiographical memories versus autobiographical knowledge are shared on feelings of closeness.

    PubMed

    Brandon, Nicole R; Beike, Denise R; Cole, Holly E

    2017-07-01

    Autobiographical memories (AMs) can be used to create and maintain closeness with others [Alea, N., & Bluck, S. (2003). Why are you telling me that? A conceptual model of the social function of autobiographical memory. Memory, 11(2), 165-178]. However, the differential effects of memory specificity are not well established. Two studies with 148 participants tested whether the order in which autobiographical knowledge (AK) and specific episodic AM (EAM) are shared affects feelings of closeness. Participants read two memories hypothetically shared by each of four strangers. The strangers first shared either AK or an EAM, and then shared either AK or an EAM. Participants were randomly assigned to read either positive or negative AMs from the strangers. Findings suggest that people feel closer to those who share positive AMs in the same way they construct memories: starting with general and moving to specific.

  11. Self-defining memories, scripts, and the life story: narrative identity in personality and psychotherapy.

    PubMed

    Singer, Jefferson A; Blagov, Pavel; Berry, Meredith; Oost, Kathryn M

    2013-12-01

    An integrative model of narrative identity builds on a dual memory system that draws on episodic memory and a long-term self to generate autobiographical memories. Autobiographical memories related to critical goals in a lifetime period lead to life-story memories, which in turn become self-defining memories when linked to an individual's enduring concerns. Self-defining memories that share repetitive emotion-outcome sequences yield narrative scripts, abstracted templates that filter cognitive-affective processing. The life story is the individual's overarching narrative that provides unity and purpose over the life course. Healthy narrative identity combines memory specificity with adaptive meaning-making to achieve insight and well-being, as demonstrated through a literature review of personality and clinical research, as well as new findings from our own research program. A clinical case study drawing on this narrative identity model is also presented with implications for treatment and research. © 2012 Wiley Periodicals, Inc.

  12. Hybrid Memory Management for Parallel Execution of Prolog on Shared Memory Multiprocessors

    DTIC Science & Technology

    1990-06-01

    ...organizing data to increase locality. The stack structure exhibits greater locality than the heap structure. Tradeoff decisions can also be made on... Performing organization: University of California at Berkeley, Department of Electrical Engineering and Computer Sciences, Berkeley, CA 94720.

  13. Toward Enhancing OpenMP's Work-Sharing Directives

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Chapman, B M; Huang, L; Jin, H

    2006-05-17

    OpenMP provides a portable programming interface for shared memory parallel computers (SMPs). Although this interface has proven successful for small SMPs, it requires greater flexibility in light of the steadily growing size of individual SMPs and the recent advent of multithreaded chips. In this paper, we describe two application development experiences that exposed these expressivity problems in the current OpenMP specification. We then propose mechanisms to overcome these limitations, including thread subteams and thread topologies. Thus, we identify language features that improve OpenMP application performance on emerging and large-scale platforms while preserving ease of programming.

  14. Effects of Ordering Strategies and Programming Paradigms on Sparse Matrix Computations

    NASA Technical Reports Server (NTRS)

    Oliker, Leonid; Li, Xiaoye; Husbands, Parry; Biswas, Rupak; Biegel, Bryan (Technical Monitor)

    2002-01-01

    The Conjugate Gradient (CG) algorithm is perhaps the best-known iterative technique to solve sparse linear systems that are symmetric and positive definite. For systems that are ill-conditioned, it is often necessary to use a preconditioning technique. In this paper, we investigate the effects of various ordering and partitioning strategies on the performance of parallel CG and ILU(0) preconditioned CG (PCG) using different programming paradigms and architectures. Results show that for this class of applications: ordering significantly improves overall performance on both distributed and distributed shared-memory systems, that cache reuse may be more important than reducing communication, that it is possible to achieve message-passing performance using shared-memory constructs through careful data ordering and distribution, and that a hybrid MPI+OpenMP paradigm increases programming complexity with little performance gain. An implementation of CG on the Cray MTA does not require special ordering or partitioning to obtain high efficiency and scalability, giving it a distinct advantage for adaptive applications; however, it shows limited scalability for PCG due to a lack of thread-level parallelism.
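
    For reference, the unpreconditioned CG iteration itself is short; the Python sketch below stores A as a dense list of lists purely for brevity. It is an illustrative assumption rather than the paper's code: a sparse implementation would store A in a compressed format, and in that setting the matrix-vector product is typically the step most directly affected by how rows are ordered and distributed. The ILU(0) preconditioner is omitted.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(a: Matrix, x: Vector) -> Vector:
    """Dense matrix-vector product (stand-in for a sparse kernel)."""
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in a]


def dot(x: Vector, y: Vector) -> float:
    return sum(xi * yi for xi, yi in zip(x, y))


def conjugate_gradient(a: Matrix, b: Vector, tol: float = 1e-10,
                       max_iter: int = 1000) -> Vector:
    """Unpreconditioned CG for a symmetric positive definite matrix `a`."""
    x = [0.0] * len(b)
    r = list(b)  # residual b - A x for the initial guess x = 0
    p = list(r)  # first search direction
    rs_old = dot(r, r)
    for _ in range(max_iter):
        if math.sqrt(rs_old) < tol:
            break
        ap = matvec(a, p)
        alpha = rs_old / dot(p, ap)  # step length along p
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        rs_new = dot(r, r)
        # new direction: A-conjugate to the previous ones
        p = [ri + (rs_new / rs_old) * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x
```

    For example, conjugate_gradient([[4.0, 1.0], [1.0, 3.0]], [1.0, 2.0]) converges to approximately [1/11, 7/11] in two iterations, as CG terminates in at most n steps in exact arithmetic.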

  15. Shared Semantics and the Use of Organizational Memories for E-Mail Communications.

    ERIC Educational Resources Information Center

    Schwartz, David G.

    1998-01-01

    Examines the use of shared semantics information to link concepts in an organizational memory to e-mail communications. Presents a framework for determining shared semantics based on organizational and personal user profiles. Illustrates how shared semantics are used by the HyperMail system to help link organizational memories (OM) content to…

  16. A class hierarchical, object-oriented approach to virtual memory management

    NASA Technical Reports Server (NTRS)

    Russo, Vincent F.; Campbell, Roy H.; Johnston, Gary M.

    1989-01-01

    The Choices family of operating systems exploits class hierarchies and object-oriented programming to facilitate the construction of customized operating systems for shared memory and networked multiprocessors. The software is being used in the Tapestry laboratory to study the performance of algorithms, mechanisms, and policies for parallel systems. Described here are the architectural design and class hierarchy of the Choices virtual memory management system. The software and hardware mechanisms and policies of a virtual memory system implement a memory hierarchy that exploits the trade-off between response times and storage capacities. In Choices, the notion of a memory hierarchy is captured by abstract classes. Concrete subclasses of those abstractions implement a virtual address space, segmentation, paging, physical memory management, secondary storage, and remote (that is, networked) storage. Captured in the notion of a memory hierarchy are classes that represent memory objects. These classes provide a storage mechanism that contains encapsulated data and have methods to read or write the memory object. Each of these classes provides specializations to represent the memory hierarchy.

  17. C-MOS array design techniques: SUMC multiprocessor system study

    NASA Technical Reports Server (NTRS)

    Clapp, W. A.; Helbig, W. A.; Merriam, A. S.

    1972-01-01

    The current capabilities of LSI techniques for speed and reliability, plus the possibilities of assembling large configurations of LSI logic and storage elements, have demanded the study of multiprocessors and multiprocessing techniques, problems, and potentialities. Evaluated are three previous systems studies for a space ultrareliable modular computer multiprocessing system, and a new multiprocessing system is proposed that is flexibly configured with up to four central processors, four I/O processors, and 16 main memory units, plus auxiliary memory and peripheral devices. This multiprocessor system features a multilevel interrupt, qualified S/360 compatibility for ground-based generation of programs, virtual memory management of a storage hierarchy through I/O processors, and multiport access to multiple and shared memory units.

  18. Strategies for Energy Efficient Resource Management of Hybrid Programming Models

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Li, Dong; Supinski, Bronis de; Schulz, Martin

    2013-01-01

    Many scientific applications are programmed using hybrid programming models that use both message-passing and shared-memory, due to the increasing prevalence of large-scale systems with multicore, multisocket nodes. Previous work has shown that energy efficiency can be improved using software-controlled execution schemes that consider both the programming model and the power-aware execution capabilities of the system. However, such approaches have focused on identifying optimal resource utilization for one programming model, either shared-memory or message-passing, in isolation. The potential solution space, and thus the challenge, increases substantially when optimizing hybrid models, since the possible resource configurations increase exponentially. Nonetheless, with the accelerating adoption of hybrid programming models, we increasingly need improved energy efficiency in hybrid parallel applications on large-scale systems. In this work, we present new software-controlled execution schemes that consider the effects of dynamic concurrency throttling (DCT) and dynamic voltage and frequency scaling (DVFS) in the context of hybrid programming models. Specifically, we present predictive models and novel algorithms based on statistical analysis that anticipate application power and time requirements under different concurrency and frequency configurations. We apply our models and methods to the NPB MZ benchmarks and selected applications from the ASC Sequoia codes. Overall, we achieve substantial energy savings (8.74% on average and up to 13.8%) with some performance gain (up to 7.5%) or negligible performance loss.

  19. Enhancing Application Performance Using Mini-Apps: Comparison of Hybrid Parallel Programming Paradigms

    NASA Technical Reports Server (NTRS)

    Lawson, Gary; Sosonkina, Masha; Baurle, Robert; Hammond, Dana

    2017-01-01

    In many fields, real-world applications for High Performance Computing have already been developed. For these applications to stay up-to-date, new parallel strategies must be explored to yield the best performance; however, restructuring or modifying a real-world application may be daunting depending on the size of the code. In this case, a mini-app may be employed to quickly explore such options without modifying the entire code. In this work, several mini-apps have been created to enhance a real-world application performance, namely the VULCAN code for complex flow analysis developed at the NASA Langley Research Center. These mini-apps explore hybrid parallel programming paradigms with Message Passing Interface (MPI) for distributed memory access and either Shared MPI (SMPI) or OpenMP for shared memory accesses. Performance testing shows that MPI+SMPI yields the best execution performance, while requiring the largest number of code changes. A maximum speedup of 23 was measured for MPI+SMPI, but only 11 was measured for MPI+OpenMP.

  20. Partitioning problems in parallel, pipelined and distributed computing

    NASA Technical Reports Server (NTRS)

    Bokhari, S.

    1985-01-01

    The problem of optimally assigning the modules of a parallel program over the processors of a multiple computer system is addressed. A Sum-Bottleneck path algorithm is developed that permits the efficient solution of many variants of this problem under some constraints on the structure of the partitions. In particular, the following problems are solved optimally for a single-host, multiple satellite system: partitioning multiple chain structured parallel programs, multiple arbitrarily structured serial programs and single tree structured parallel programs. In addition, the problems of partitioning chain structured parallel programs across chain connected systems and across shared memory (or shared bus) systems are also solved under certain constraints. All solutions for parallel programs are equally applicable to pipelined programs. These results extend prior research in this area by explicitly taking concurrency into account and permit the efficient utilization of multiple computer architectures for a wide range of problems of practical interest.
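
    A simpler relative of these problems, splitting a chain of module weights into contiguous pieces on a chain of processors so that the most heavily loaded processor is as light as possible, can be solved with a short dynamic program. The Python sketch below is that textbook dynamic program, not the Sum-Bottleneck path algorithm itself: it ignores communication costs and the host-satellite setting, and the function name and cut-list return format are hypothetical.

```python
from itertools import accumulate
from typing import List, Tuple


def partition_chain(weights: List[float], k: int) -> Tuple[float, List[int]]:
    """Split a chain of module weights into at most k contiguous pieces (one per
    processor in a processor chain), minimizing the largest piece sum.
    Returns (bottleneck, cut positions strictly inside the chain)."""
    m = len(weights)
    prefix = [0.0] + list(accumulate(weights))  # prefix[i] = sum of first i weights
    inf = float("inf")
    # best[j][i]: minimal bottleneck placing the first i modules on j processors
    best = [[inf] * (m + 1) for _ in range(k + 1)]
    cut = [[0] * (m + 1) for _ in range(k + 1)]
    best[0][0] = 0.0
    for j in range(1, k + 1):
        for i in range(m + 1):
            for s in range(i + 1):  # processor j takes modules s..i-1 (maybe none)
                cost = max(best[j - 1][s], prefix[i] - prefix[s])
                if cost < best[j][i]:
                    best[j][i], cut[j][i] = cost, s
    # walk the cut table backwards to recover where each piece starts
    starts, i = [], m
    for j in range(k, 0, -1):
        i = cut[j][i]
        starts.append(i)
    return best[k][m], sorted({c for c in starts if 0 < c < m})
```

    The triple loop costs O(k m^2) time. For weights 1 through 9 on three processors, the best achievable bottleneck is 17, for instance with pieces 1-5, 6-7 and 8-9.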

  1. An OpenACC-Based Unified Programming Model for Multi-accelerator Systems

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Kim, Jungwon; Lee, Seyong; Vetter, Jeffrey S

    2015-01-01

    This paper proposes a novel SPMD programming model based on OpenACC. Our model integrates the different granularities of parallelism from vector-level parallelism to node-level parallelism into a single, unified model based on OpenACC. It allows programmers to write programs for multiple accelerators using a uniform programming model whether they are in shared or distributed memory systems. We implement a prototype of our model and evaluate its performance with a GPU-based supercomputer using three benchmark applications.

  2. What we remember affects how we see: spatial working memory steers saccade programming.

    PubMed

    Wong, Jason H; Peterson, Matthew S

    2013-02-01

    Relationships between visual attention, saccade programming, and visual working memory have been hypothesized for over a decade. Awh, Jonides, and Reuter-Lorenz (Journal of Experimental Psychology: Human Perception and Performance 24(3):780-90, 1998) and Awh et al. (Psychological Science 10(5):433-437, 1999) proposed that rehearsing a location in memory also leads to enhanced attentional processing at that location. In regard to eye movements, Belopolsky and Theeuwes (Attention, Perception & Psychophysics 71(3):620-631, 2009) found that holding a location in working memory affects saccade programming, albeit negatively. In three experiments, we attempted to replicate the findings of Belopolsky and Theeuwes (Attention, Perception & Psychophysics 71(3):620-631, 2009) and determine whether the spatial memory effect can occur in other saccade-cuing paradigms, including endogenous central arrow cues and exogenous irrelevant singletons. In the first experiment, our results were the opposite of those in Belopolsky and Theeuwes (Attention, Perception & Psychophysics 71(3):620-631, 2009), in that we found facilitation (shorter saccade latencies) instead of inhibition when the saccade target matched the region in spatial working memory. In Experiment 2, we sought to determine whether the spatial working memory effect would generalize to other endogenous cuing tasks, such as a central arrow that pointed to one of six possible peripheral locations. As in Experiment 1, we found that saccade programming was facilitated when the cued location coincided with the saccade target. In Experiment 3, we explored how spatial memory interacts with other types of cues, such as a peripheral color singleton target or irrelevant onset. In both cases, the eyes were more likely to go to either singleton when it coincided with the location held in spatial working memory. On the basis of these results, we conclude that spatial working memory and saccade programming are likely to share common overlapping circuitry.

  3. The OpenMP Implementation of NAS Parallel Benchmarks and its Performance

    NASA Technical Reports Server (NTRS)

    Jin, Hao-Qiang; Frumkin, Michael; Yan, Jerry

    1999-01-01

    As the new ccNUMA architecture became popular in recent years, parallel programming with compiler directives on these machines has evolved to accommodate new needs. In this study, we examine the effectiveness of OpenMP directives for parallelizing the NAS Parallel Benchmarks. Implementation details will be discussed and performance will be compared with the MPI implementation. We have demonstrated that OpenMP can achieve very good results for parallelization on a shared memory system, but effective use of memory and cache is very important.

  4. Hypercluster Parallel Processor

    NASA Technical Reports Server (NTRS)

    Blech, Richard A.; Cole, Gary L.; Milner, Edward J.; Quealy, Angela

    1992-01-01

    Hypercluster computer system includes multiple digital processors, operation of which coordinated through specialized software. Configurable according to various parallel-computing architectures of shared-memory or distributed-memory class, including scalar computer, vector computer, reduced-instruction-set computer, and complex-instruction-set computer. Designed as flexible, relatively inexpensive system that provides single programming and operating environment within which one can investigate effects of various parallel-computing architectures and combinations on performance in solution of complicated problems like those of three-dimensional flows in turbomachines. Hypercluster software and architectural concepts are in public domain.

  5. MaMR: High-performance MapReduce programming model for material cloud applications

    NASA Astrophysics Data System (ADS)

    Jing, Weipeng; Tong, Danyu; Wang, Yangang; Wang, Jingyuan; Liu, Yaqiu; Zhao, Peng

    2017-02-01

    With the increasing data size in materials science, existing programming models no longer satisfy the application requirements. MapReduce is a programming model that enables the easy development of scalable parallel applications to process big data on cloud computing systems. However, this model does not directly support the processing of multiple related data, and the processing performance does not reflect the advantages of cloud computing. To enhance the capability of workflow applications in material data processing, we defined a programming model for material cloud applications, called MaMR, that supports multiple different Map and Reduce functions running concurrently based on a hybrid shared-memory BSP model. An optimized data sharing strategy to supply the shared data to the different Map and Reduce stages was also designed. We added a new merge phase to MapReduce that can efficiently merge data from the map and reduce modules. Experiments showed that the model and framework present effective performance improvements compared to previous work.

  6. Tolerant (parallel) Programming

    NASA Technical Reports Server (NTRS)

    DiNucci, David C.; Bailey, David H. (Technical Monitor)

    1997-01-01

    In order to be truly portable, a program must be tolerant of a wide range of development and execution environments, and a parallel program is just one which must be tolerant of a very wide range. This paper first defines the term "tolerant programming", then describes many layers of tools to accomplish it. The primary focus is on F-Nets, a formal model for expressing computation as a folded partial-ordering of operations, thereby providing an architecture-independent expression of tolerant parallel algorithms. For implementing F-Nets, Cooperative Data Sharing (CDS) is a subroutine package for implementing communication efficiently in a large number of environments (e.g. shared memory and message passing). Software Cabling (SC), a very-high-level graphical programming language for building large F-Nets, possesses many of the features normally expected from today's computer languages (e.g. data abstraction, array operations). Finally, L2³ is a CASE tool which facilitates the construction, compilation, execution, and debugging of SC programs.

  7. Technical support for digital systems technology development. Task order 1: ISP contention analysis and control

    NASA Technical Reports Server (NTRS)

    Stehle, Roy H.; Ogier, Richard G.

    1993-01-01

    Alternatives for realizing a packet-based network switch for use on a frequency division multiple access/time division multiplexed (FDMA/TDM) geostationary communication satellite were investigated. Each of the eight downlink beams supports eight directed dwells. The design needed to accommodate multicast packets with very low probability of loss due to contention. Three switch architectures were designed and analyzed. An output-queued, shared bus system yielded a functionally simple system, utilizing a first-in, first-out (FIFO) memory per downlink dwell, but at the expense of a large total memory requirement. A shared memory architecture offered the most efficiency in memory requirements, requiring about half the memory of the shared bus design. The processing requirement for the shared-memory system adds system complexity that may offset the benefits of the smaller memory. An alternative design using a shared memory buffer per downlink beam decreases circuit complexity through a distributed design, and requires at most 1000 packets of memory more than the completely shared memory design. Modifications to the basic packet switch designs were proposed to accommodate circuit-switched traffic, which must be served on a periodic basis with minimal delay. Methods for dynamically controlling the downlink dwell lengths were developed and analyzed. These methods adapt quickly to changing traffic demands, and do not add significant complexity or cost to the satellite and ground station designs. Methods for reducing the memory requirement by not requiring the satellite to store full packets were also proposed and analyzed. In addition, optimal packet and dwell lengths were computed as functions of memory size for the three switch architectures.

  8. A message passing kernel for the hypercluster parallel processing test bed

    NASA Technical Reports Server (NTRS)

    Blech, Richard A.; Quealy, Angela; Cole, Gary L.

    1989-01-01

    A Message-Passing Kernel (MPK) for the Hypercluster parallel-processing test bed is described. The Hypercluster is being developed at the NASA Lewis Research Center to support investigations of parallel algorithms and architectures for computational fluid and structural mechanics applications. The Hypercluster resembles the hypercube architecture except that each node consists of multiple processors communicating through shared memory. The MPK efficiently routes information through the Hypercluster, using a message-passing protocol when necessary and faster shared-memory communication whenever possible. The MPK also interfaces all of the processors with the Hypercluster operating system (HYCLOPS), which runs on a Front-End Processor (FEP). This approach distributes many of the I/O tasks to the Hypercluster processors and eliminates the need for a separate I/O support program on the FEP.
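    The routing rule described above (shared memory within a node, message passing between nodes) can be sketched as follows. This is a hypothetical model, not the MPK's actual algorithm: it assumes addresses are (node, processor) pairs, that nodes are numbered as hypercube vertices, and that inter-node messages follow dimension-order (e-cube) routing, a common hypercube scheme chosen here only for illustration.

```python
def route(src, dst):
    """Choose a transport for a message between two (node, processor) addresses.

    Hypothetical model: processors on the same node use shared memory; otherwise
    the message crosses the hypercube, flipping differing address bits lowest-first.
    Returns (transport, list_of_nodes_visited).
    """
    src_node, _ = src
    dst_node, _ = dst
    if src_node == dst_node:
        return "shared-memory", [src_node]
    path, current = [src_node], src_node
    diff, bit = src_node ^ dst_node, 0
    while diff:
        if diff & 1:
            current ^= 1 << bit  # hop to the neighbour across this dimension
            path.append(current)
        diff >>= 1
        bit += 1
    return "message-passing", path
```

    Under these assumptions a message from node 0 to node 5 in a 3-dimensional hypercube travels 0, 1, 5, while two processors on node 0 never leave shared memory.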

  9. Proceedings of the second SISAL users' conference

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Feo, J T; Frerking, C; Miller, P J

    1992-12-01

    This report contains papers on the following topics: A sisal code for computing the Fourier transform on S_N; five ways to fill your knapsack; simulating material dislocation motion in sisal; candis as an interface for sisal; parallelisation and performance of the burg algorithm on a shared-memory multiprocessor; use of genetic algorithm in sisal to solve the file design problem; implementing FFTs in sisal; programming and evaluating the performance of signal processing applications in the sisal programming environment; sisal and Von Neumann-based languages: translation and intercommunication; an IF2 code generator for ADAM architecture; program partitioning for NUMA multiprocessor computer systems; mapping functional parallelism on distributed memory machines; implicit array copying: prevention is better than cure; mathematical syntax for sisal; an approach for optimizing recursive functions; implementing arrays in sisal 2.0; Fol: an object oriented extension to the sisal language; twine: a portable, extensible sisal execution kernel; and investigating the memory performance of the optimizing sisal compiler.

  10. Directions in parallel programming: HPF, shared virtual memory and object parallelism in pC++

    NASA Technical Reports Server (NTRS)

    Bodin, Francois; Priol, Thierry; Mehrotra, Piyush; Gannon, Dennis

    1994-01-01

    Fortran and C++ are the dominant programming languages used in scientific computation. Consequently, extensions to these languages are the most popular for programming massively parallel computers. We discuss two such approaches to parallel Fortran and one approach to C++. The High Performance Fortran Forum has designed HPF with the intent of supporting data parallelism on Fortran 90 applications. HPF works by asking the user to help the compiler distribute and align the data structures with the distributed memory modules in the system. Fortran-S takes a different approach in which the data distribution is managed by the operating system and the user provides annotations to indicate parallel control regions. In the case of C++, we look at pC++ which is based on a concurrent aggregate parallel model.
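    The data distributions that HPF asks the user to specify can be illustrated with the owner computation behind the BLOCK and CYCLIC formats (written in HPF as, e.g., `!HPF$ DISTRIBUTE A(BLOCK)`). The sketch below assumes 0-based indices, a one-dimensional array of n elements and p processors, and a block size of ceiling(n/p); the function names are hypothetical and the code only models the index mapping, not an HPF compiler.

```python
import math


def block_owner(i, n, p):
    """(owner, local index) of global element i under a BLOCK distribution."""
    b = math.ceil(n / p)  # each processor holds one contiguous block of size b
    return i // b, i % b


def cyclic_owner(i, p):
    """(owner, local index) of global element i under a CYCLIC distribution."""
    return i % p, i // p  # elements are dealt out round-robin
```

    With n = 10 and p = 4, BLOCK gives processors 0 to 2 three elements each and processor 3 the last one, while CYCLIC deals elements out round-robin, which balances load better when work per element varies along the array.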

  11. Holding on

    ERIC Educational Resources Information Center

    Thaxton, Terry Ann

    2011-01-01

    In this article, the author takes a multidimensional and personal look at creative writing work in an assisted living facility. The people she works with at the facility have memory loss. She shares her experience working with these people and describes a storytelling workshop that was modeled after Timeslips, a program started by Anne Basting at…

  12. Message Passing and Shared Address Space Parallelism on an SMP Cluster

    NASA Technical Reports Server (NTRS)

    Shan, Hongzhang; Singh, Jaswinder P.; Oliker, Leonid; Biswas, Rupak; Biegel, Bryan (Technical Monitor)

    2002-01-01

    Currently, message passing (MP) and shared address space (SAS) are the two leading parallel programming paradigms. MP has been standardized with MPI, and is the more common and mature approach; however, code development can be extremely difficult, especially for irregularly structured computations. SAS offers substantial ease of programming, but may suffer from performance limitations due to poor spatial locality and high protocol overhead. In this paper, we compare the performance of and the programming effort required for six applications under both programming models on a 32-processor PC-SMP cluster, a platform that is becoming increasingly attractive for high-end scientific computing. Our application suite consists of codes that typically do not exhibit scalable performance under shared-memory programming due to their high communication-to-computation ratios and/or complex communication patterns. Results indicate that SAS can achieve about half the parallel efficiency of MPI for most of our applications, while being competitive for the others. A hybrid MPI+SAS strategy shows only a small performance advantage over pure MPI in some cases. Finally, improved implementations of two MPI collective operations on PC-SMP clusters are presented.

  13. High-performance computing — an overview

    NASA Astrophysics Data System (ADS)

    Marksteiner, Peter

    1996-08-01

    An overview of high-performance computing (HPC) is given. Different types of computer architectures used in HPC are discussed: vector supercomputers, high-performance RISC processors, various parallel computers like symmetric multiprocessors, workstation clusters, massively parallel processors. Software tools and programming techniques used in HPC are reviewed: vectorizing compilers, optimization and vector tuning, optimization for RISC processors; parallel programming techniques like shared-memory parallelism, message passing and data parallelism; and numerical libraries.

  14. Force user's manual, revised

    NASA Technical Reports Server (NTRS)

    Jordan, Harry F.; Benten, Muhammad S.; Arenstorf, Norbert S.; Ramanan, Aruna V.

    1987-01-01

    A methodology for writing parallel programs for shared memory multiprocessors has been formalized as an extension to the Fortran language and implemented as a macro preprocessor. The extended language is known as the Force, and this manual describes how to write Force programs and execute them on the Flexible Computer Corporation Flex/32, the Encore Multimax and the Sequent Balance computers. The parallel extension macros are described in detail, but knowledge of Fortran is assumed.
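    A typical construct in shared-memory macro languages of this kind is the self-scheduled parallel loop, in which every process repeatedly claims the next unclaimed iteration from a shared counter inside a critical section. The sketch below models that protocol with Python threads; it is an illustration under that assumption, not the Force's macro expansion, and because of CPython's global interpreter lock it demonstrates the scheduling logic rather than real speedup.

```python
import threading


def selfsched_loop(n_iterations, n_workers, body):
    """Run body(worker_id, i) for i in range(n_iterations), self-scheduled."""
    next_index = 0
    lock = threading.Lock()

    def worker(worker_id):
        nonlocal next_index
        while True:
            with lock:  # critical section: claim the next iteration
                i = next_index
                next_index += 1
            if i >= n_iterations:
                return  # loop exhausted; the join below acts as a barrier
            body(worker_id, i)

    threads = [threading.Thread(target=worker, args=(w,)) for w in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
```

    Self-scheduling adapts to iterations of uneven cost at the price of one lock acquisition per iteration; a prescheduled loop would instead assign iterations to processes statically.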

  15. Accelerate quasi Monte Carlo method for solving systems of linear algebraic equations through shared memory

    NASA Astrophysics Data System (ADS)

    Lai, Siyan; Xu, Ying; Shao, Bo; Guo, Menghan; Lin, Xiaola

    2017-04-01

    In this paper we study the Monte Carlo method for solving systems of linear algebraic equations (SLAE) based on shared memory. Former research demonstrated that GPUs can effectively speed up these computations. Our purpose is to optimize the Monte Carlo simulation specifically for the GPU memory architecture. Random numbers are organized to be stored in shared memory, which aims to accelerate the parallel algorithm. Bank conflicts can be avoided by our Collaborative Thread Arrays (CTA) scheme. Experimental results show that the shared-memory-based strategy can speed up the computations by more than 3X in the best case.
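    The underlying estimator can be sketched on a CPU. The sketch below assumes the system has been written as x = Hx + f with a Neumann series that converges, and estimates one component x_i with absorbing random walks: each walk stops with probability p_stop at every step, otherwise jumps to a uniformly chosen state, and its weight is corrected by H[j][k] divided by the transition probability so the estimate is unbiased. It uses pseudorandom rather than quasi-random numbers and none of the paper's GPU shared-memory layout; the function name and parameters are hypothetical.

```python
import random


def mc_solve_component(H, f, i, n_walks=20000, p_stop=0.5, seed=0):
    """Monte Carlo estimate of x[i] for x = H x + f via absorbing random walks."""
    rng = random.Random(seed)
    n = len(f)
    p_move = (1.0 - p_stop) / n  # probability of stepping to one particular state
    total = 0.0
    for _ in range(n_walks):
        state, weight = i, 1.0
        score = f[state]  # m = 0 term of the Neumann series
        while rng.random() >= p_stop:  # survive with probability 1 - p_stop
            nxt = rng.randrange(n)  # uniform choice of next state
            weight *= H[state][nxt] / p_move  # importance-sampling correction
            state = nxt
            score += weight * f[state]  # contributes to (H^m f)[i]
        total += score
    return total / n_walks
```

    For H = [[0.1, 0.2], [0.2, 0.1]] and f = [1, 1], both components of the exact solution equal 1/0.7 ≈ 1.4286, and the estimate approaches that value as n_walks grows. Each walk is independent, which is what makes the method attractive for massively parallel hardware.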

  16. Practical Formal Verification of MPI and Thread Programs

    NASA Astrophysics Data System (ADS)

    Gopalakrishnan, Ganesh; Kirby, Robert M.

    Large-scale simulation codes in science and engineering are written using the Message Passing Interface (MPI). Shared memory threads are widely used directly, or to implement higher level programming abstractions. Traditional debugging methods for MPI or thread programs are incapable of providing useful formal guarantees about coverage. They get bogged down in the sheer number of interleavings (schedules), often missing shallow bugs. In this tutorial we will introduce two practical formal verification tools: ISP (for MPI C programs) and Inspect (for Pthread C programs). Unlike other formal verification tools, ISP and Inspect run directly on user source codes (much like a debugger). They pursue only the relevant set of process interleavings, using our own customized Dynamic Partial Order Reduction algorithms. For a given test harness, DPOR allows these tools to guarantee the absence of deadlocks, instrumented MPI object leaks and communication races (using ISP), and shared memory races (using Inspect). ISP and Inspect have been used to verify large pieces of code: in excess of 10,000 lines of MPI/C for ISP in under 5 seconds, and about 5,000 lines of Pthread/C code in a few hours (and much faster with the use of a cluster or by exploiting special cases such as symmetry) for Inspect. We will also demonstrate the Microsoft Visual Studio and Eclipse Parallel Tools Platform integrations of ISP (these will be available on the LiveCD).
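    The interleaving explosion that DPOR is designed to tame can be seen in the smallest shared-memory race. The sketch below assumes two threads that each perform a non-atomic increment (read the shared x, then write x + 1) and runs every schedule that preserves each thread's own order. This is brute-force enumeration, not DPOR: DPOR would explore only representatives of schedules that differ in the order of dependent accesses, whereas the number of schedules here grows as the binomial coefficient C(a + b, a).

```python
from itertools import combinations


def interleavings(n_a, n_b):
    """Yield every schedule of thread ids that keeps each thread's steps in order."""
    total = n_a + n_b
    for a_slots in combinations(range(total), n_a):
        slots = set(a_slots)
        yield tuple("A" if k in slots else "B" for k in range(total))


def run(schedule):
    """Execute one schedule of two non-atomic increments; return final x."""
    shared = {"x": 0}
    local = {"A": None, "B": None}
    pc = {"A": 0, "B": 0}  # per-thread program counter
    for tid in schedule:
        if pc[tid] == 0:
            local[tid] = shared["x"]  # step 0: read shared x into a register
        else:
            shared["x"] = local[tid] + 1  # step 1: write back the increment
        pc[tid] += 1
    return shared["x"]


schedules = list(interleavings(2, 2))
outcomes = {run(s) for s in schedules}
```

    Of the six schedules, those in which both reads happen before either write lose an update and leave x = 1, which is exactly the kind of shallow bug that random testing can miss.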

  17. CaLRS: A Critical-Aware Shared LLC Request Scheduling Algorithm on GPGPU

    PubMed Central

    Ma, Jianliang; Meng, Jinglei; Chen, Tianzhou; Wu, Minghui

    2015-01-01

    Ultra-high thread-level parallelism in modern GPUs usually generates numerous memory requests simultaneously, so there are always plenty of memory requests waiting at each bank of the shared LLC (L2 in this paper) and at global memory. For global memory, various schedulers have already been developed to adjust the request sequence, but little work has focused on the service sequence at the shared LLC. We measured that a large number of GPU applications queue at LLC banks for service, which provides an opportunity to optimize the service order at the LLC. By adjusting the GPU memory request service order, we can improve the schedulability of the SMs. We therefore propose a critical-aware shared LLC request scheduling algorithm (CaLRS) in this paper. How the priority of a memory request is represented is critical for CaLRS. We use the number of memory requests that originate from the same warp but have not been serviced when they arrive at the shared LLC bank to represent the criticality of each warp. Experiments show that the proposed scheme can boost SM schedulability effectively by promoting the scheduling priority of memory requests with high criticality, and thereby indirectly improves GPU performance. PMID:25729772

  18. Time Constraints and Resource Sharing in Adults' Working Memory Spans

    ERIC Educational Resources Information Center

    Barrouillet, Pierre; Bernardin, Sophie; Camos, Valerie

    2004-01-01

    This article presents a new model that accounts for working memory spans in adults, the time-based resource-sharing model. The model assumes that both components (i.e., processing and maintenance) of the main working memory tasks require attention and that memory traces decay as soon as attention is switched away. Because memory retrievals are…

  19. Flexible language constructs for large parallel programs

    NASA Technical Reports Server (NTRS)

    Rosing, Matthew; Schnabel, Robert

    1993-01-01

    The goal of the research described is to develop flexible language constructs for writing large data parallel numerical programs for distributed memory (MIMD) multiprocessors. Previously, several models have been developed to support synchronization and communication. Models for global synchronization include SIMD (Single Instruction Multiple Data), SPMD (Single Program Multiple Data), and sequential programs annotated with data distribution statements. The two primary models for communication include implicit communication based on shared memory and explicit communication based on messages. None of these models by itself seems sufficient to permit the natural and efficient expression of the variety of algorithms that occur in large scientific computations. An overview of a new language that combines many of these programming models in a clean manner is given. This is done in a modular fashion such that different models can be combined to support large programs. Within a module, the selection of a model depends on the algorithm and its efficiency requirements. An overview of the language and discussion of some of the critical implementation details is given.

  20. Externalising the autobiographical self: sharing personal memories online facilitated memory retention.

    PubMed

    Wang, Qi; Lee, Dasom; Hou, Yubo

    2017-07-01

    Internet technology provides a new means of recalling and sharing personal memories in the digital age. What is the mnemonic consequence of posting personal memories online? Theories of transactive memory and autobiographical memory would make contrasting predictions. In the present study, college students completed a daily diary for a week, listing at the end of each day all the events that happened to them on that day. They also reported whether they posted any of the events online. Participants received a surprise memory test after the completion of the diary recording and then another test a week later. At both tests, events posted online were significantly more likely than those not posted online to be recalled. It appears that sharing memories online may provide unique opportunities for rehearsal and meaning-making that facilitate memory retention.

  1. Parallel Computing for Probabilistic Response Analysis of High Temperature Composites

    NASA Technical Reports Server (NTRS)

    Sues, R. H.; Lua, Y. J.; Smith, M. D.

    1994-01-01

    The objective of this Phase I research was to establish the required software and hardware strategies to achieve large scale parallelism in solving PCM problems. To meet this objective, several investigations were conducted. First, we identified the multiple levels of parallelism in PCM and the computational strategies to exploit these parallelisms. Next, several software and hardware efficiency investigations were conducted. These involved the use of three different parallel programming paradigms and solution of two example problems on both a shared-memory multiprocessor and a distributed-memory network of workstations.

  2. A simple GPU-accelerated two-dimensional MUSCL-Hancock solver for ideal magnetohydrodynamics

    NASA Astrophysics Data System (ADS)

    Bard, Christopher M.; Dorelli, John C.

    2014-02-01

    We describe our experience using NVIDIA's CUDA (Compute Unified Device Architecture) C programming environment to implement a two-dimensional second-order MUSCL-Hancock ideal magnetohydrodynamics (MHD) solver on a GTX 480 Graphics Processing Unit (GPU). Taking a simple approach in which the MHD variables are stored exclusively in the global memory of the GTX 480 and accessed in a cache-friendly manner (without further optimizing memory access by, for example, staging data in the GPU's faster shared memory), we achieved a maximum speed-up of ≈126 for a 1024 × 1024 grid relative to the sequential C code running on a single Intel Nehalem (2.8 GHz) core. This speedup is consistent with simple estimates based on the known floating point performance, memory throughput and parallel processing capacity of the GTX 480.
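    The MUSCL-Hancock steps (slope-limited reconstruction, a half-step evolution of the face values, an interface Riemann problem, and a conservative update) can be seen in their simplest setting. The sketch below is not the authors' 2D MHD GPU code: it assumes one-dimensional linear advection u_t + a u_x = 0 with a > 0, periodic boundaries, a minmod limiter, and a Courant number c = a·dt/dx in (0, 1]. With a > 0 the upwind state at each interface is the left cell's right face, so only right-face values are needed.

```python
def minmod(a, b):
    """Minmod limiter: the smaller-magnitude slope if the signs agree, else 0."""
    if a * b <= 0.0:
        return 0.0
    return a if abs(a) < abs(b) else b


def muscl_hancock_step(u, c):
    """Advance cell averages u by one step of u_t + a*u_x = 0 (a > 0, periodic)."""
    n = len(u)
    # 1. Reconstruction: limited slope in each cell (u[i - 1] wraps for i = 0).
    slopes = [minmod(u[i] - u[i - 1], u[(i + 1) % n] - u[i]) for i in range(n)]
    # 2. Hancock half-step on the right-face value:
    #    u_R + 0.5*(dt/dx)*(f(u_L) - f(u_R)), where f(u_L) - f(u_R) = -a*slope.
    right = [u[i] + 0.5 * slopes[i] - 0.5 * c * slopes[i] for i in range(n)]
    # 3. Riemann problem at face i+1/2: for a > 0 the upwind (left) state wins,
    #    so the face flux divided by a is right[i].
    # 4. Conservative update; right[i - 1] wraps around for i = 0.
    return [u[i] - c * (right[i] - right[i - 1]) for i in range(n)]
```

    Because every cell is updated independently from its neighbours' data, each loop maps naturally to one GPU thread per cell. At c = 1 the scheme reduces to an exact one-cell shift, and for c in (0, 1] the minmod limiter keeps the solution within its initial bounds.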

  3. Early Experiences Writing Performance Portable OpenMP 4 Codes

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Joubert, Wayne; Hernandez, Oscar R

    In this paper, we evaluate the recently available directives in OpenMP 4 to parallelize a computational kernel using both the traditional shared memory approach and the newer accelerator targeting capabilities. In addition, we explore various transformations that attempt to increase application performance portability, and examine the expressiveness and performance implications of using these approaches. For example, we want to understand if the target map directives in OpenMP 4 improve data locality when mapped to a shared memory system, as opposed to the traditional first touch policy approach in traditional OpenMP. To that end, we use recent Cray and Intel compilers to measure the performance variations of a simple application kernel when executed on the OLCF's Titan supercomputer with NVIDIA GPUs and the Beacon system with Intel Xeon Phi accelerators attached. To better understand these trade-offs, we compare our results from traditional OpenMP shared memory implementations to the newer accelerator programming model when it is used to target both the CPU and an attached heterogeneous device. We believe the results and lessons learned as presented in this paper will be useful to the larger user community by providing guidelines that can assist programmers in the development of performance portable code.

  4. Message Passing vs. Shared Address Space on a Cluster of SMPs

    NASA Technical Reports Server (NTRS)

    Shan, Hongzhang; Singh, Jaswinder Pal; Oliker, Leonid; Biswas, Rupak

    2000-01-01

    The convergence of scalable computer architectures using clusters of PCs (or PC-SMPs) with commodity networking has produced an attractive platform for high-end scientific computing. Currently, message-passing and shared address space (SAS) are the two leading programming paradigms for these systems. Message-passing has been standardized with MPI, and is the most common and mature programming approach. However, message-passing code development can be extremely difficult, especially for irregularly structured computations. SAS offers substantial ease of programming, but may suffer from performance limitations due to poor spatial locality and high protocol overhead. In this paper, we compare the performance of, and the programming effort required for, six applications under both programming models on a 32-CPU PC-SMP cluster. Our application suite consists of codes that typically do not exhibit high efficiency under shared memory programming due to their high communication-to-computation ratios and complex communication patterns. Results indicate that SAS can achieve about half the parallel efficiency of MPI for most of our applications; however, on certain classes of problems SAS performance is competitive with MPI. We also present new algorithms for improving the PC cluster performance of MPI collective operations.

  5. In Remembrance: September 11, 2001

    ERIC Educational Resources Information Center

    Haeseler, Martha P.

    2002-01-01

    In this article, the author shares her experience of being part of the creation of a memorial mosaic dedicated to those who had died on September 11, 2001. Working with veterans at a long-term outpatient program within a Veterans Administration (VA) Mental Hygiene Clinic, she found that the physical process of constructing something from…

  6. Execution time support for scientific programs on distributed memory machines

    NASA Technical Reports Server (NTRS)

    Berryman, Harry; Saltz, Joel; Scroggs, Jeffrey

    1990-01-01

    Optimizations are considered that are required for efficient execution of code segments that consist of loops over distributed data structures. The PARTI (Parallel Automated Runtime Toolkit at ICASE) execution time primitives are designed to carry out these optimizations and can be used to implement a wide range of scientific algorithms on distributed memory machines. These primitives allow the user to control array mappings in a way that gives an appearance of shared memory. Computations can be based on a global index set. Primitives are used to carry out gather and scatter operations on distributed arrays. Communications patterns are derived at runtime, and the appropriate send and receive messages are automatically generated.
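    The idea of computing on a global index set while data lives in per-processor pieces can be sketched in a single process. The sketch assumes a translation table that maps each global index to an (owner, local offset) pair, a preprocessing pass that derives the communication pattern by grouping requested indices by owner, and a gather pass that performs one simulated message per owner. All names are hypothetical; this is not PARTI's API.

```python
from collections import defaultdict


def build_translation_table(local_sizes):
    """Global index -> (owner, local offset), for pieces laid out in owner order."""
    table = []
    for owner, size in enumerate(local_sizes):
        table.extend((owner, offset) for offset in range(size))
    return table


def build_schedule(table, global_indices):
    """Derive the communication pattern: requested positions grouped by owner."""
    schedule = defaultdict(list)
    for pos, g in enumerate(global_indices):
        owner, offset = table[g]
        schedule[owner].append((pos, offset))
    return dict(schedule)


def gather(local_parts, schedule, n_requested):
    """Fetch the requested values with one simulated message per owner."""
    result = [None] * n_requested
    for owner, requests in schedule.items():
        message = [local_parts[owner][offset] for _, offset in requests]  # owner's send
        for (pos, _), value in zip(requests, message):  # receiver unpacks
            result[pos] = value
    return result
```

    Separating schedule construction from the gather lets a loop that repeats the same irregular access pattern reuse one schedule across many iterations; a scatter would run the same schedule in the opposite direction.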

  7. MLP: A Parallel Programming Alternative to MPI for New Shared Memory Parallel Systems

    NASA Technical Reports Server (NTRS)

    Taft, James R.

    1999-01-01

    Recent developments at the NASA AMES Research Center's NAS Division have demonstrated that the new generation of NUMA based Symmetric Multi-Processing systems (SMPs), such as the Silicon Graphics Origin 2000, can successfully execute legacy vector oriented CFD production codes at sustained rates far exceeding processing rates possible on dedicated 16 CPU Cray C90 systems. This high level of performance is achieved via shared memory based Multi-Level Parallelism (MLP). This programming approach, developed at NAS and outlined below, is distinct from the message passing paradigm of MPI. It offers parallelism at both the fine and coarse grained level, with communication latencies that are approximately 50-100 times lower than typical MPI implementations on the same platform. Such latency reductions offer the promise of performance scaling to very large CPU counts. The method draws on, but is also distinct from, the newly defined OpenMP specification, which uses compiler directives to support a limited subset of multi-level parallel operations. The NAS MLP method is general, and applicable to a large class of NASA CFD codes.

  8. Epigenetic Networks Regulate the Transcriptional Program in Memory and Terminally Differentiated CD8+ T Cells.

    PubMed

    Rodriguez, Ramon M; Suarez-Alvarez, Beatriz; Lavín, José L; Mosén-Ansorena, David; Baragaño Raneros, Aroa; Márquez-Kisinousky, Leonardo; Aransay, Ana M; Lopez-Larrea, Carlos

    2017-01-15

    Epigenetic mechanisms play a critical role during differentiation of T cells by contributing to the formation of stable and heritable transcriptional patterns. To better understand the mechanisms of memory maintenance in CD8+ T cells, we performed genome-wide analysis of DNA methylation, histone marking (acetylated lysine 9 in histone H3 and trimethylated lysine 9 in histone), and gene-expression profiles in naive, effector memory (EM), and terminally differentiated EM (TEMRA) cells. Our results indicate that DNA demethylation and histone acetylation are coordinated to generate the transcriptional program associated with memory cells. Conversely, EM and TEMRA cells share a very similar epigenetic landscape. Nonetheless, the TEMRA transcriptional program predicts an innate immunity phenotype associated with genes never reported in these cells, including several mediators of NK cell activation (VAV3 and LYN) and a large array of NK receptors (e.g., KIR2DL3, KIR2DL4, KIR2DL1, KIR3DL1, KIR2DS5). In addition, we identified up to 161 genes that encode transcriptional regulators, some of unknown function in CD8+ T cells, and that were differentially expressed in the course of differentiation. Overall, these results provide new insights into the regulatory networks involved in memory CD8+ T cell maintenance and T cell terminal differentiation. Copyright © 2017 by The American Association of Immunologists, Inc.

  9. A mechanism for efficient debugging of parallel programs

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Miller, B.P.; Choi, J.D.

    1988-01-01

    This paper addresses the design and implementation of an integrated debugging system for parallel programs running on shared-memory multiprocessors (SMMP). The authors describe the use of flowback analysis to provide information on causal relationships between events in a program's execution without re-executing the program for debugging. They introduce a mechanism called incremental tracing that, by using semantic analyses of the debugged program, makes flowback analysis practical with only a small amount of trace generated during execution. They extend flowback analysis to parallel programs and describe a method to detect race conditions in the interactions of the co-operating processes.

  10. Software-Controlled Caches in the VMP Multiprocessor

    DTIC Science & Technology

    1986-03-01

    ...programming system level that ... is tuned for the VMP design. In this vein, we are interested in exploring how far the software support can go to ... handled in software, analogously to the handling ... management of the shared program state is familiar and ... of virtual memory page faults. Hardware support for ... ensure good behavior, as opposed to how ... Each cache miss results in bus traffic. Table 2 provides the bus cost for the "average" cache miss. Fig

  11. High Performance Programming Using Explicit Shared Memory Model on the Cray T3D

    NASA Technical Reports Server (NTRS)

    Saini, Subhash; Simon, Horst D.; Lasinski, T. A. (Technical Monitor)

    1994-01-01

    The Cray T3D is the first-phase system in Cray Research Inc.'s (CRI) three-phase massively parallel processing program. In this report we describe the architecture of the T3D, as well as the CRAFT (Cray Research Adaptive Fortran) programming model, and contrast it with PVM, which is also supported on the T3D. We present performance data based on the NAS Parallel Benchmarks to illustrate both architectural and software features of the T3D.

  12. Visual and Spatial Working Memory Are Not that Dissociated after All: A Time-Based Resource-Sharing Account

    ERIC Educational Resources Information Center

    Vergauwe, Evie; Barrouillet, Pierre; Camos, Valerie

    2009-01-01

    Examinations of interference between visual and spatial materials in working memory have suggested domain- and process-based fractionations of visuo-spatial working memory. The present study examined the role of central time-based resource sharing in visuo-spatial working memory and assessed its role in obtained interference patterns. Visual and…

  13. An implementation of the SNR high speed network communication protocol (Receiver part)

    NASA Astrophysics Data System (ADS)

    Wan, Wen-Jyh

    1995-03-01

    The aim of this thesis is to implement the receiver part of the SNR high-speed network transport protocol. The approach was to use the Systems of Communicating Machines (SCM) as the formal definition of the protocol. Programs were developed on top of the Unix system using the C programming language. The Unix system features adopted for this implementation were multitasking, signals, shared memory, semaphores, sockets, timers, and process control. The problems encountered and solved were signal loss, shared memory conflicts, process synchronization, scheduling, data alignment, and errors in the SCM specification itself. The result was a correctly functioning program that implemented the SNR protocol. The system was tested using different connection modes, lost packets, duplicate packets, and large data transfers. The contributions of this thesis are: (1) implementation of the receiver part of the SNR high-speed transport protocol; (2) testing and integration with the transmitter part of the SNR transport protocol on an FDDI data-link-layer network; (3) demonstration of the functions of the SNR transport protocol, such as connection management, sequenced delivery, flow control, and error recovery using selective-repeat retransmission; and (4) modifications to the SNR transport protocol specification, such as corrections to incorrect predicate conditions, definitions of additional packet-type formats, and solutions to signal-loss and process-contention problems.

  14. Programming parallel architectures: The BLAZE family of languages

    NASA Technical Reports Server (NTRS)

    Mehrotra, Piyush

    1988-01-01

    Programming multiprocessor architectures is a critical research issue. An overview is given of the various approaches to programming these architectures that are currently being explored. It is argued that two of these approaches, interactive programming environments and functional parallel languages, are particularly attractive since they remove much of the burden of exploiting parallel architectures from the user. Also described is recent work by the author in the design of parallel languages. Research on languages for both shared and nonshared memory multiprocessors is described, as well as the relations of this work to other current language research projects.

  15. Debugging Fortran on a shared memory machine

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Allen, T.R.; Padua, D.A.

    1987-01-01

    Debugging on a parallel processor is more difficult than debugging on a serial machine because errors in a parallel program may introduce nondeterminism. The approach to parallel debugging presented here attempts to reduce the problem of debugging on a parallel machine to that of debugging on a serial machine by automatically detecting nondeterminism. 20 refs., 6 figs.

  16. A Simple GPU-Accelerated Two-Dimensional MUSCL-Hancock Solver for Ideal Magnetohydrodynamics

    NASA Technical Reports Server (NTRS)

    Bard, Christopher; Dorelli, John C.

    2013-01-01

    We describe our experience using NVIDIA's CUDA (Compute Unified Device Architecture) C programming environment to implement a two-dimensional second-order MUSCL-Hancock ideal magnetohydrodynamics (MHD) solver on a GTX 480 Graphics Processing Unit (GPU). Taking a simple approach in which the MHD variables are stored exclusively in the global memory of the GTX 480 and accessed in a cache-friendly manner (without further optimizing memory access by, for example, staging data in the GPU's faster shared memory), we achieved a maximum speed-up of approximately 126 for a 1024 × 1024 grid relative to the sequential C code running on a single Intel Nehalem (2.8 GHz) core. This speedup is consistent with simple estimates based on the known floating point performance, memory throughput and parallel processing capacity of the GTX 480.

  17. A pervasive parallel framework for visualization: final report for FWP 10-014707

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Moreland, Kenneth D.

    2014-01-01

    We are on the threshold of a transformative change in the basic architecture of high-performance computing. The use of accelerator processors, characterized by large core counts, shared but asymmetrical memory, and heavy thread loading, is quickly becoming the norm in high-performance computing. These accelerators pose significant challenges for updating our existing base of software. An intrinsic problem with this transition is a fundamental programming shift from message-passing processes to much finer-grained thread scheduling with memory sharing. Another problem is the lack of stability in accelerator implementations; processor and compiler technology is currently changing rapidly. This report documents the results of our three-year ASCR project to address these challenges. Our project includes the development of the Dax toolkit, which contains the beginnings of new algorithms for a new generation of computers and the underlying infrastructure to rapidly prototype and build further algorithms as necessary.

  18. Rapid recovery from transient faults in the fault-tolerant processor with fault-tolerant shared memory

    NASA Technical Reports Server (NTRS)

    Harper, Richard E.; Butler, Bryan P.

    1990-01-01

    The Draper fault-tolerant processor with fault-tolerant shared memory (FTP/FTSM), which is designed to allow application tasks to continue execution during the memory alignment process, is described. Processor performance is not affected by memory alignment. In addition, the FTP/FTSM incorporates a hardware scrubber device to perform the memory alignment quickly during unused memory access cycles. The FTP/FTSM architecture is described, followed by an estimate of the time required for channel reintegration.

  19. Relations of maternal style and child self-concept to autobiographical memories in Chinese, Chinese immigrant, and European American 3-year-olds.

    PubMed

    Wang, Qi

    2006-01-01

    The relations of maternal reminiscing style and child self-concept to children's shared and independent autobiographical memories were examined in a sample of 189 three-year-olds and their mothers from Chinese families in China, first-generation Chinese immigrant families in the United States, and European American families. Mothers shared memories with their children and completed questionnaires; children recounted autobiographical events and described themselves with a researcher. Independent of culture, gender, child age, and language skills, maternal elaborations and evaluations were associated with children's shared memory reports, and maternal evaluations and child agentic self-focus were associated with children's independent memory reports. Maternal style and child self-concept further mediated cultural influences on children's memory. The findings provide insight into the social-cultural construction of autobiographical memory.

  20. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Sayan Ghosh, Jeff Hammond

    OpenSHMEM is a community effort to unify and standardize the SHMEM programming model. MPI (Message Passing Interface) is a well-known community standard for parallel programming using distributed memory. The most recent release of MPI, version 3.0, was designed in part to support programming models like SHMEM. OSHMPI is an implementation of the OpenSHMEM standard using MPI-3 for the Linux operating system. It is the first implementation of SHMEM over MPI one-sided communication and has the potential to be widely adopted due to the portability and wide availability of Linux and MPI-3. OSHMPI has been tested on a variety of systems and implementations of MPI-3, including InfiniBand clusters using MVAPICH2 and SGI shared-memory supercomputers using MPICH. Current support is limited to Linux but may be extended to Apple OS X if there is sufficient interest. The code is open source and available at https://github.com/jeffhammond/oshmpi

  1. An enhanced Ada run-time system for real-time embedded processors

    NASA Technical Reports Server (NTRS)

    Sims, J. T.

    1991-01-01

    An enhanced Ada run-time system has been developed to support real-time embedded processor applications. The primary focus of this development effort has been on the tasking system and the memory management facilities of the run-time system. The tasking system has been extended to support efficient and precise periodic task execution as required for control applications. Event-driven task execution providing a means of task-asynchronous control and communication among Ada tasks is supported in this system. Inter-task control is even provided among tasks distributed on separate physical processors. The memory management system has been enhanced to provide object allocation and protected access support for memory shared between disjoint processors, each of which is executing a distinct Ada program.

  2. SAHAYOG: A Testbed for Load Sharing under Failure

    DTIC Science & Technology

    1987-07-01

    ...messages, shared memory and semaphores. To communicate using messages, processes create message queues using system-provided primitives. The message ... The size of the memory that is to be shared is decided by the process when it makes a request for memory allocation. The semaphore option of IPC can be ... used to prevent two or more concurrent processes from executing their critical sections at the same time. Semaphores must be used when the processes
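    The snippet describes the usual Unix IPC pattern: a process requests a shared segment of a size it chooses, and a semaphore keeps concurrent critical sections from overlapping. A minimal Python sketch of that pattern follows, assuming Python 3.8+ for `multiprocessing.shared_memory`. To run in a single interpreter it substitutes threads for separate processes and `threading.Semaphore` for a System V semaphore; the counter layout and the `worker`/`run` names are hypothetical, not taken from SAHAYOG.

```python
import struct
import threading
from multiprocessing import shared_memory

COUNTER_FMT = "q"  # hypothetical layout: one signed 64-bit counter at offset 0


def worker(segment_name: str, sem: threading.Semaphore, iterations: int) -> None:
    """Attach to an existing segment by name and increment the counter."""
    shm = shared_memory.SharedMemory(name=segment_name)
    try:
        for _ in range(iterations):
            with sem:  # acquire on entry, release on exit: one holder at a time
                (value,) = struct.unpack_from(COUNTER_FMT, shm.buf, 0)
                struct.pack_into(COUNTER_FMT, shm.buf, 0, value + 1)
    finally:
        shm.close()  # detach; only the creator removes the segment


def run(n_workers: int = 4, iterations: int = 1000) -> int:
    # The creating side decides how large the shared segment is.
    size = struct.calcsize(COUNTER_FMT)
    shm = shared_memory.SharedMemory(create=True, size=size)
    try:
        struct.pack_into(COUNTER_FMT, shm.buf, 0, 0)
        sem = threading.Semaphore(1)  # binary semaphore guarding the critical section
        threads = [
            threading.Thread(target=worker, args=(shm.name, sem, iterations))
            for _ in range(n_workers)
        ]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
        (total,) = struct.unpack_from(COUNTER_FMT, shm.buf, 0)
        return total
    finally:
        shm.close()
        shm.unlink()  # remove the segment once every user has detached


if __name__ == "__main__":
    print(run())  # 4 workers x 1000 guarded increments
```

    Each worker attaches by name, as an unrelated process would. Without the semaphore, two workers could read the same old value and one increment could be lost; between real processes, `multiprocessing.Semaphore` or a System V or POSIX semaphore would take the place of the thread semaphore.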

  3. Implementation and evaluation of shared-memory communication and synchronization operations in MPICH2 using the Nemesis communication subsystem.

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Buntinas, D.; Mercier, G.; Gropp, W.

    2007-09-01

    This paper presents the implementation of MPICH2 over the Nemesis communication subsystem and the evaluation of its shared-memory performance. We describe design issues as well as some of the optimization techniques we employed. We conducted a performance evaluation over shared memory using microbenchmarks. The evaluation shows that MPICH2 Nemesis has very low communication overhead, making it suitable for smaller-grained applications.

  4. Flexible Language Constructs for Large Parallel Programs

    DOE PAGES

    Rosing, Matt; Schnabel, Robert

    1994-01-01

    The goal of the research described in this article is to develop flexible language constructs for writing large data-parallel numerical programs for distributed-memory (multiple instruction multiple data [MIMD]) multiprocessors. Previously, several models have been developed to support synchronization and communication. Models for global synchronization include single instruction multiple data (SIMD), single program multiple data (SPMD), and sequential programs annotated with data distribution statements. The two primary models for communication are implicit communication based on shared memory and explicit communication based on messages. None of these models by itself seems sufficient to permit the natural and efficient expression of the variety of algorithms that occur in large scientific computations. In this article, we give an overview of a new language that combines many of these programming models in a clean manner. This is done in a modular fashion so that different models can be combined to support large programs. Within a module, the selection of a model depends on the algorithm and its efficiency requirements. We describe the language and discuss some of the critical implementation details.

  5. Working memory resources are shared across sensory modalities.

    PubMed

    Salmela, V R; Moisala, M; Alho, K

    2014-10-01

    A common assumption in the working memory literature is that the visual and auditory modalities have separate and independent memory stores. Recent evidence on visual working memory has suggested that resources are shared between representations, and that the precision of representations sets the limit for memory performance. We tested whether memory resources are also shared across sensory modalities. Memory precision for two visual (spatial frequency and orientation) and two auditory (pitch and tone duration) features was measured separately for each feature and for all possible feature combinations. Thus, only the memory load was varied, from one to four features, while keeping the stimuli similar. In Experiment 1, two gratings and two tones, both containing two varying features, were presented simultaneously. In Experiment 2, two gratings and two tones, each containing only one varying feature, were presented sequentially. The memory precision (delayed discrimination threshold) for a single feature was close to the perceptual threshold. However, as the number of features to be remembered was increased, the discrimination thresholds increased more than twofold. Importantly, the decrease in memory precision did not depend on the modality of the other feature(s), or on whether the features were in the same or in separate objects. Hence, simultaneously storing one visual and one auditory feature had the same effect on memory precision as simultaneously storing two visual or two auditory features. The results show that working memory is limited by the precision of the stored representations, and that working memory can be described as a resource pool that is shared across modalities.

  6. Distributed simulation using a real-time shared memory network

    NASA Technical Reports Server (NTRS)

    Simon, Donald L.; Mattern, Duane L.; Wong, Edmond; Musgrave, Jeffrey L.

    1993-01-01

    The Advanced Control Technology Branch of the NASA Lewis Research Center performs research in the area of advanced digital controls for aeronautic and space propulsion systems. This work requires the real-time implementation of both control software and complex dynamical models of the propulsion system. We are implementing these systems in a distributed, multi-vendor computer environment. Therefore, a need exists for real-time communication and synchronization between the distributed multi-vendor computers. A shared memory network is a potential solution which offers several advantages over other real-time communication approaches. A candidate shared memory network was tested for basic performance. The shared memory network was then used to implement a distributed simulation of a ramjet engine. The accuracy and execution time of the distributed simulation were measured and compared to those of the non-partitioned simulation. The ease of partitioning the simulation, the minimal development time required for communication between the processors, and the resulting execution time all indicate that the shared memory network is a real-time communication technique worthy of serious consideration.

  7. Parallel processing on the Livermore VAX 11/780-4 parallel processor system with compatibility to Cray Research, Inc. (CRI) multitasking. Version 1

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Werner, N.E.; Van Matre, S.W.

    1985-05-01

    This manual describes the CRI Subroutine Library and Utility Package. The CRI library provides Cray multitasking functionality on the four-processor shared memory VAX 11/780-4. Additional functionality has been added for more flexibility. A discussion of the library, utilities, error messages, and example programs is provided.

  8. "Everybody Had a Piece ...": Collaborative Practice and Shared Decision Making at the Open Book

    ERIC Educational Resources Information Center

    Gordon, John; Ramdeholl, Dianne

    2010-01-01

    The Open Book, an adult literacy program in Brooklyn from 1985 to 2002, remains, for many of the students and staff involved, a defining experience in their lives, a time that allowed them to see different possibilities for themselves and society. In an attempt to preserve the field's collective historical memory, the authors in this chapter…

  9. Shared Memory Parallelism for 3D Cartesian Discrete Ordinates Solver

    NASA Astrophysics Data System (ADS)

    Moustafa, Salli; Dutka-Malen, Ivan; Plagne, Laurent; Ponçot, Angélique; Ramet, Pierre

    2014-06-01

    This paper describes the design and the performance of DOMINO, a 3D Cartesian SN solver that implements two nested levels of parallelism (multicore+SIMD) on shared memory computation nodes. DOMINO is written in C++, a multi-paradigm programming language that enables the use of powerful and generic parallel programming tools such as Intel TBB and Eigen. These two libraries allow us to combine multi-thread parallelism with vector operations in an efficient and yet portable way. As a result, DOMINO can exploit the full power of modern multi-core processors and is able to tackle very large simulations, that usually require large HPC clusters, using a single computing node. For example, DOMINO solves a 3D full-core PWR eigenvalue problem involving 26 energy groups, 288 angular directions (S16), 46 × 10^6 spatial cells and 1 × 10^12 DoFs within 11 hours on a single 32-core SMP node. This represents a sustained performance of 235 GFlops and 40.74% of the SMP node peak performance for the DOMINO sweep implementation. The very high Flops/Watt ratio of DOMINO makes it a very interesting building block for a future many-nodes nuclear simulation tool.
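    The two performance figures quoted above can be cross-checked against each other: if 235 GFlops is 40.74% of the node's peak, the implied peak is 235 / 0.4074, roughly 577 GFlops, or about 18 GFlops per core on the 32-core node. A short Python check of that arithmetic follows; the per-core figure assumes the peak is split evenly across cores, which the abstract does not state.

```python
sustained_gflops = 235.0   # sustained rate reported for the DOMINO sweep
fraction_of_peak = 0.4074  # 40.74 % of the SMP node peak
cores = 32                 # cores in the SMP node

# Implied node peak, and an even per-core share of it (an assumption).
peak_gflops = sustained_gflops / fraction_of_peak
per_core_gflops = peak_gflops / cores

print(f"implied node peak: {peak_gflops:.1f} GFlops")
print(f"per core:          {per_core_gflops:.2f} GFlops")
```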

  10. Multiprocessor shared-memory information exchange

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Santoline, L.L.; Bowers, M.D.; Crew, A.W.

    1989-02-01

    In distributed microprocessor-based instrumentation and control systems, the inter- and intra-subsystem communication requirements ultimately form the basis for the overall system architecture. This paper describes a software protocol that addresses the intra-subsystem communications problem. Specifically, the protocol allows multiple processors to exchange information via a shared-memory interface. The authors' primary goal is to provide a reliable means for information to be exchanged between central application processor boards (masters) and dedicated function processor boards (slaves) in a single computer chassis. The resulting Multiprocessor Shared-Memory Information Exchange (MSMIE) protocol, a standard master-slave shared-memory interface suitable for use in nuclear safety systems, is designed to pass unidirectional buffers of information between the processors while providing a minimum, deterministic cycle time for this data exchange.

  11. Performance analysis and kernel size study of the Lynx real-time operating system

    NASA Technical Reports Server (NTRS)

    Liu, Yuan-Kwei; Gibson, James S.; Fernquist, Alan R.

    1993-01-01

    This paper analyzes the Lynx real-time operating system (LynxOS), which has been selected as the operating system for the Space Station Freedom Data Management System (DMS). The features of LynxOS are compared to those of other Unix-based operating systems (OSs). The tools for measuring the performance of LynxOS, which include a high-speed digital timer/counter board, a device driver program, and an application program, are analyzed. The timings for interrupt response, process creation and deletion, threads, semaphores, shared memory, and signals are measured. The memory size of the DMS Embedded Data Processor (EDP) is limited. Moreover, virtual memory is not suitable for real-time applications because page-swap timing may not be deterministic. Therefore, the DMS software, including LynxOS, has to fit in the main memory of an EDP. To reduce the LynxOS kernel size, the following steps are taken: analyzing the factors that influence the kernel size; identifying the modules of LynxOS that may not be needed in an EDP; adjusting the system parameters of LynxOS; reconfiguring the device drivers used in LynxOS; and analyzing the symbol table. The reductions in kernel disk size, kernel memory size, and total kernel size achieved by each of these steps are listed and analyzed.

  12. Working Memory Span Development: A Time-Based Resource-Sharing Model Account

    ERIC Educational Resources Information Center

    Barrouillet, Pierre; Gavens, Nathalie; Vergauwe, Evie; Gaillard, Vinciane; Camos, Valerie

    2009-01-01

    The time-based resource-sharing model (P. Barrouillet, S. Bernardin, & V. Camos, 2004) assumes that during complex working memory span tasks, attention is frequently and surreptitiously switched from processing to reactivate decaying memory traces before their complete loss. Three experiments involving children from 5 to 14 years of age…

  13. Direct access inter-process shared memory

    DOEpatents

    Brightwell, Ronald B; Pedretti, Kevin; Hudson, Trammell B

    2013-10-22

    A technique for directly sharing physical memory between processes executing on processor cores is described. The technique includes loading a plurality of processes into the physical memory for execution on a corresponding plurality of processor cores sharing the physical memory. An address space is mapped to each of the processes by populating a first entry in a top level virtual address table for each of the processes. The address space of each of the processes is cross-mapped into each of the processes by populating one or more subsequent entries of the top level virtual address table with the first entry in the top level virtual address table from other processes.
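    The cross-mapping step can be pictured with a toy model in Python. Everything here is a hypothetical simplification, not the patented implementation: each process's address space is a dictionary, its top-level table is a short list, entry 0 maps the process's own space, and process j's entry 0 is copied into slot 1 + j of every other process's table. Because the slots hold references to the same object, a write made through one process's table is visible through another's.

```python
from dataclasses import dataclass, field


@dataclass
class Process:
    pid: int
    # Toy stand-in for the memory reached through this process's own
    # top-level entry: virtual offset -> stored value.
    memory: dict = field(default_factory=dict)
    # Toy top-level virtual address table; each slot refers to an address
    # space (a dict) or is None when unused.
    top_level: list = field(default_factory=list)


def load_processes(n: int, entries: int = 8) -> list:
    """Create n processes and cross-map their address spaces (toy model)."""
    if n + 1 > entries:
        raise ValueError("not enough top-level slots for cross-mapping")
    procs = []
    for pid in range(n):
        proc = Process(pid)
        proc.top_level = [None] * entries
        proc.top_level[0] = proc.memory  # first entry: the process's own space
        procs.append(proc)
    # Cross-map: copy each other process's first entry into a later slot.
    for proc in procs:
        for other in procs:
            if other is not proc:
                proc.top_level[1 + other.pid] = other.top_level[0]
    return procs


def write(proc: Process, slot: int, offset: int, value) -> None:
    proc.top_level[slot][offset] = value


def read(proc: Process, slot: int, offset: int):
    return proc.top_level[slot][offset]


procs = load_processes(3)
write(procs[0], 1 + 1, 0x10, 42)  # process 0 writes into process 1's space
print(read(procs[1], 0, 0x10))    # process 1 sees 42 through its own entry
```

    In this model no data is copied between processes; sharing comes entirely from several table slots referring to one address space, which is the effect the abstract attributes to populating top-level entries with other processes' first entries.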

  14. Memory Network For Distributed Data Processors

    NASA Technical Reports Server (NTRS)

    Bolen, David; Jensen, Dean; Millard, ED; Robinson, Dave; Scanlon, George

    1992-01-01

    Universal Memory Network (UMN) is modular, digital data-communication system enabling computers with differing bus architectures to share 32-bit-wide data between locations up to 3 km apart with less than one millisecond of latency. Makes it possible to design sophisticated real-time and near-real-time data-processing systems without data-transfer "bottlenecks". This enterprise network permits transmission of volume of data equivalent to an encyclopedia each second. Facilities benefiting from Universal Memory Network include telemetry stations, simulation facilities, power-plants, and large laboratories or any facility sharing very large volumes of data. Main hub of UMN is reflection center including smaller hubs called Shared Memory Interfaces.

  15. Grouping and binding in visual short-term memory.

    PubMed

    Quinlan, Philip T; Cohen, Dale J

    2012-09-01

    Findings of 2 experiments are reported that challenge the current understanding of visual short-term memory (VSTM). In both experiments, a single study display, containing 6 colored shapes, was presented briefly and then probed with a single colored shape. At stake is how VSTM retains a record of different objects that share common features: In the 1st experiment, 2 study items sometimes shared a common feature (either a shape or a color). The data revealed a color sharing effect, in which memory was much better for items that shared a common color than for items that did not. The 2nd experiment showed that the size of the color sharing effect depended on whether a single pair of items shared a common color or whether 2 pairs of items were so defined; memory for all items improved when 2 color groups were presented. In explaining performance, an account is advanced in which items compete for a fixed number of slots, but then memory recall for any given stored item is prone to error. A critical assumption is that items that share a common color are stored together in a slot as a chunk. The evidence provides further support for the idea that principles of perceptual organization may determine the manner in which items are stored in VSTM. PsycINFO Database Record (c) 2012 APA, all rights reserved.

  16. Static analysis of the hull plate using the finite element method

    NASA Astrophysics Data System (ADS)

    Ion, A.

    2015-11-01

    This paper presents the static analysis for two levels of a container ship's construction: the first level is the girder/hull plate, and the second level is the entire strength hull of the vessel. This article describes the static analysis of a hull plate. We use the software package ANSYS Mechanical 14.5. The program is run on a computer with four Intel Xeon X5260 CPU processors at 3.33 GHz and 32 GB of installed memory. In terms of software, the shared-memory parallel version of ANSYS refers to running ANSYS across multiple cores on an SMP system. The distributed-memory parallel version of ANSYS (Distributed ANSYS) refers to running ANSYS across multiple processors on SMP systems or DMP systems.

  17. Sptrace

    NASA Technical Reports Server (NTRS)

    Burleigh, Scott C.

    2011-01-01

    Sptrace is a general-purpose space utilization tracing system that is conceptually similar to the commercial Purify product used to detect leaks and other memory usage errors. It is designed to monitor space utilization in any sort of heap, i.e., a region of data storage on some device (nominally memory; possibly shared and possibly persistent) with a flat address space. This software can trace usage of shared and/or non-volatile storage in addition to private RAM (random access memory). Sptrace is implemented as a set of C function calls that are invoked from within the software that is being examined. The function calls fall into two broad classes: (1) functions that are embedded within the heap management software [e.g., JPL's SDR (Simple Data Recorder) and PSM (Personal Space Management) systems] to enable heap usage analysis by populating a virtual time-sequenced log of usage activity, and (2) reporting functions that are embedded within the application program whose behavior is suspect. For ease of use, these functions may be wrapped privately inside public functions offered by the heap management software. Sptrace can be used for VxWorks or RTEMS real-time systems as easily as for Linux or OS X systems.
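    The two classes of calls described above can be sketched in Python. This is a hypothetical illustration of the idea, not Sptrace's C API: the `SpaceTracer` class and its method names are invented, the `on_alloc`/`on_free` hooks stand in for the functions embedded in the heap manager, and `outstanding()` and `bad_frees()` stand in for reporting calls made from the suspect application by replaying the time-sequenced log to find leaks and double frees.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class SpaceTracer:
    # Time-sequenced log entries: (tick, event, address, size, site).
    log: list = field(default_factory=list)
    ticks: count = field(default_factory=count)

    # Class (1): hooks the heap manager calls on every allocation and release.
    def on_alloc(self, address: int, size: int, site: str) -> None:
        self.log.append((next(self.ticks), "alloc", address, size, site))

    def on_free(self, address: int, site: str) -> None:
        self.log.append((next(self.ticks), "free", address, 0, site))

    # Class (2): reports the application calls to inspect its own behavior.
    def outstanding(self) -> dict:
        """Replay the log; return blocks still allocated as {address: (size, site)}."""
        live = {}
        for _tick, event, address, size, site in self.log:
            if event == "alloc":
                live[address] = (size, site)
            else:
                live.pop(address, None)
        return live

    def bad_frees(self) -> list:
        """Return (tick, address, site) for frees of addresses that were not live."""
        live, bad = set(), []
        for tick, event, address, _size, site in self.log:
            if event == "alloc":
                live.add(address)
            elif address in live:
                live.discard(address)
            else:
                bad.append((tick, address, site))
        return bad


tracer = SpaceTracer()
tracer.on_alloc(0x100, 64, "a.c:10")
tracer.on_alloc(0x200, 32, "b.c:5")
tracer.on_free(0x100, "a.c:20")
tracer.on_free(0x100, "a.c:21")  # second release of the same block
print(tracer.outstanding())      # the 0x200 block is never freed
print(tracer.bad_frees())        # the repeated free is flagged
```

    Keeping the log append-only and deferring analysis to the reporting calls mirrors the split described in the abstract: the hooks inside the heap manager stay cheap, and the cost of analysis is paid only when the application asks for a report.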

  18. Proceedings: Sisal `93

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Feo, J.T.

    1993-10-01

    This report contains papers on: Programmability and performance issues; The case of an iterative partial differential equation solver; Implementing the kernel of the Australian Region Weather Prediction Model in Sisal; Even and quarter-even prime length symmetric FFTs and their Sisal Implementations; Top-down thread generation for Sisal; Overlapping communications and computations on NUMA architectures; Compiling technique based on dataflow analysis for functional programming language Valid; Copy elimination for true multidimensional arrays in Sisal 2.0; Increasing parallelism for an optimization that reduces copying in IF2 graphs; Caching in on Sisal; Cache performance of Sisal Vs. FORTRAN; FFT algorithms on a shared-memory multiprocessor; A parallel implementation of nonnumeric search problems in Sisal; Computer vision algorithms in Sisal; Compilation of Sisal for a high-performance data driven vector processor; Sisal on distributed memory machines; A virtual shared addressing system for distributed memory Sisal; Developing a high-performance FFT algorithm in Sisal for a vector supercomputer; Implementation issues for IF2 on a static data-flow architecture; and Systematic control of parallelism in array-based data-flow computation. Selected papers have been indexed separately for inclusion in the Energy Science and Technology Database.

  19. Shared memories reveal shared structure in neural activity across individuals

    PubMed Central

    Chen, J.; Leong, Y.C.; Honey, C.J.; Yong, C.H.; Norman, K.A.; Hasson, U.

    2016-01-01

    Our lives revolve around sharing experiences and memories with others. When different people recount the same events, how similar are their underlying neural representations? Participants viewed a fifty-minute movie, then verbally described the events during functional MRI, producing unguided detailed descriptions lasting up to forty minutes. As each person spoke, event-specific spatial patterns were reinstated in default-network, medial-temporal, and high-level visual areas. Individual event patterns were both highly discriminable from one another and similar between people, suggesting consistent spatial organization. In many high-order areas, patterns were more similar between people recalling the same event than between recall and perception, indicating systematic reshaping of percept into memory. These results reveal the existence of a common spatial organization for memories in high-level cortical areas, where encoded information is largely abstracted beyond sensory constraints; and that neural patterns during perception are altered systematically across people into shared memory representations for real-life events. PMID:27918531

  20. Mnemonic transmission, social contagion, and emergence of collective memory: Influence of emotional valence, group structure, and information distribution.

    PubMed

    Choi, Hae-Yoon; Kensinger, Elizabeth A; Rajaram, Suparna

    2017-09-01

    Social transmission of memory and its consequence on collective memory have generated enduring interdisciplinary interest because of their widespread significance in interpersonal, sociocultural, and political arenas. We tested the influence of 3 key factors (emotional salience of information, group structure, and information distribution) on mnemonic transmission, social contagion, and collective memory. Participants individually studied emotionally salient (negative or positive) and nonemotional (neutral) picture-word pairs that were completely shared, partially shared, or unshared within participant triads, and then completed 3 consecutive recalls in 1 of 3 conditions: individual-individual-individual (control), collaborative-collaborative (identical group; insular structure)-individual, and collaborative-collaborative (reconfigured group; diverse structure)-individual. Collaboration enhanced negative memories especially in insular group structure and especially for shared information, and promoted collective forgetting of positive memories. Diverse group structure reduced this negativity effect. Unequally distributed information led to social contagion that created false memories; diverse structure propagated a greater variety of false memories whereas insular structure promoted confidence in false recognition and false collective memory. A simultaneous assessment of network structure, information distribution, and emotional valence breaks new ground to specify how network structure shapes the spread of negative memories and false memories, and the emergence of collective memory. (PsycINFO Database Record (c) 2017 APA, all rights reserved).

  1. Implementation of a parallel unstructured Euler solver on shared and distributed memory architectures

    NASA Technical Reports Server (NTRS)

    Mavriplis, D. J.; Das, Raja; Saltz, Joel; Vermeland, R. E.

    1992-01-01

    An efficient three-dimensional unstructured Euler solver is parallelized on a Cray Y-MP C90 shared memory computer and on an Intel Touchstone Delta distributed memory computer. This paper relates the experiences gained and describes the software tools and hardware used in this study. Performance comparisons between the two architectures are made.

  2. Development of a Dynamic Time Sharing Scheduled Environment Final Report CRADA No. TC-824-94E

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Jette, M.; Caliga, D.

    Massively parallel computers, such as the Cray T3D, have historically supported resource sharing solely through space sharing, in which multiple problems are solved by executing them on distinct processors. This project developed a dynamic time- and space-sharing scheduler to achieve greater interactivity and throughput than could be achieved with space sharing alone. CRI and LLNL worked together on the design, testing, and review aspects of this project. There were separate software deliverables. CRI implemented a general-purpose scheduling system as per the design specifications. LLNL ported the local gang scheduler software to the LLNL Cray T3D. In this approach, processors are allocated simultaneously to all components of a parallel program (in a “gang”). Program execution is preempted as needed to provide for interactivity. Programs are also relocated to different processors as needed to efficiently pack the computer’s torus of processors. In phase one, CRI developed an interface specification, after discussions with LLNL, for system-level software supporting a time- and space-sharing environment on the LLNL T3D. The two parties also discussed interface specifications for external control tools (such as scheduling policy tools and system administration tools) and applications programs. CRI assumed responsibility for the writing and implementation of all the necessary system software in this phase. In phase two, CRI implemented job-rolling on the Cray T3D, a mechanism for preempting a program, saving its state to disk, and later restoring its state to memory for continued execution. LLNL ported its gang scheduler to the LLNL T3D utilizing the CRI interface implemented in phases one and two. During phase three, the functionality and effectiveness of the LLNL gang scheduler were assessed to provide input to CRI time- and space-sharing efforts. CRI will utilize this information in the development of general schedulers suitable for other sites and future architectures.
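    Gang scheduling is often explained as a matrix whose columns are processors and whose rows are time slots: all components of one program sit in the same row, so they run together, and a context switch moves the whole machine to the next row. The sketch below is a minimal illustration of that textbook formulation, not the LLNL or CRI code; `Job`, `build_gang_matrix`, and `gang_timeline` are hypothetical names, and it ignores the T3D torus topology and job relocation.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    width: int  # processors the parallel program needs; all of them run together


def build_gang_matrix(jobs, num_procs):
    """First-fit packing of jobs into time slots (rows of a slot-by-processor matrix).

    Every component of a job lands in the same row, so the whole gang is
    scheduled at once. Torus topology and job relocation are ignored here.
    """
    rows = []  # each row has num_procs entries: a job name or None (idle)
    for job in jobs:
        if job.width > num_procs:
            raise ValueError(f"{job.name} needs more than {num_procs} processors")
        for row in rows:
            free = [p for p, slot in enumerate(row) if slot is None]
            if len(free) >= job.width:
                for p in free[: job.width]:
                    row[p] = job.name
                break
        else:  # no existing time slot had room: open a new one
            rows.append([job.name] * job.width + [None] * (num_procs - job.width))
    return rows


def gang_timeline(rows, num_quanta):
    """Round-robin over time slots; each quantum preempts the previous gang."""
    return [
        sorted({name for name in rows[q % len(rows)] if name is not None})
        for q in range(num_quanta)
    ]
```

    With four processors, jobs of widths 4, 2, 3, and 2 pack into three time slots, and the two 2-wide jobs share one slot.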

  3. Sleep Benefits Memory for Semantic Category Structure While Preserving Exemplar-Specific Information.

    PubMed

    Schapiro, Anna C; McDevitt, Elizabeth A; Chen, Lang; Norman, Kenneth A; Mednick, Sara C; Rogers, Timothy T

    2017-11-01

    Semantic memory encompasses knowledge about both the properties that typify concepts (e.g. robins, like all birds, have wings) as well as the properties that individuate conceptually related items (e.g. robins, in particular, have red breasts). We investigate the impact of sleep on new semantic learning using a property inference task in which both kinds of information are initially acquired equally well. Participants learned about three categories of novel objects possessing some properties that were shared among category exemplars and others that were unique to an exemplar, with exposure frequency varying across categories. In Experiment 1, memory for shared properties improved and memory for unique properties was preserved across a night of sleep, while memory for both feature types declined over a day awake. In Experiment 2, memory for shared properties improved across a nap, but only for the lower-frequency category, suggesting a prioritization of weakly learned information early in a sleep period. The increase was significantly correlated with amount of REM, but was also observed in participants who did not enter REM, suggesting involvement of both REM and NREM sleep. The results provide the first evidence that sleep improves memory for the shared structure of object categories, while simultaneously preserving object-unique information.

  4. A Tutorial on Parallel and Concurrent Programming in Haskell

    NASA Astrophysics Data System (ADS)

    Peyton Jones, Simon; Singh, Satnam

    This practical tutorial introduces the features available in Haskell for writing parallel and concurrent programs. We first describe how to write semi-explicit parallel programs by using annotations to express opportunities for parallelism and to help control the granularity of parallelism for effective execution on modern operating systems and processors. We then describe the mechanisms provided by Haskell for writing explicitly parallel programs with a focus on the use of software transactional memory to help share information between threads. Finally, we show how nested data parallelism can be used to write deterministic parallel programs, allowing programmers to use rich data types in data-parallel programs that are automatically transformed into flat data-parallel versions for efficient execution on multi-core processors.
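    The flattening step mentioned above turns an irregular nested array into a segment descriptor plus one flat data array, so that work can be divided evenly regardless of how unbalanced the inner arrays are. The sketch below only illustrates that representation idea; it is written in Python rather than Haskell, the names `flatten` and `segmented_sum` are hypothetical, and it is not the transformation the tutorial's compiler performs.

```python
def flatten(nested):
    """Nested array -> (segment descriptor of lengths, flat data array)."""
    lengths = [len(seg) for seg in nested]
    data = [x for seg in nested for x in seg]
    return lengths, data


def segmented_sum(lengths, data):
    """Per-segment sums computed from the flat representation.

    Segment ids come from the descriptor, so the work is one flat loop over
    `data` whose shape does not depend on how irregular the nesting is;
    a parallel runtime would replace this loop with a segmented reduction.
    """
    seg_ids = [s for s, n in enumerate(lengths) for _ in range(n)]
    sums = [0] * len(lengths)
    for s, x in zip(seg_ids, data):
        sums[s] += x
    return sums
```

    For `[[1, 2, 3], [], [4, 5]]` the descriptor is `[3, 0, 2]`, and the empty inner array simply yields a zero sum.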

  5. Parallel Computation of the Regional Ocean Modeling System (ROMS)

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Wang, P; Song, Y T; Chao, Y

    2005-04-05

    The Regional Ocean Modeling System (ROMS) is a regional ocean general circulation modeling system solving the free surface, hydrostatic, primitive equations over varying topography. It is free software distributed world-wide for studying both complex coastal ocean problems and the basin-to-global scale ocean circulation. The original ROMS code could only be run on shared-memory systems. With the increasing need to simulate larger model domains with finer resolutions and on a variety of computer platforms, there is a need in the ocean-modeling community to have a ROMS code that can be run on any parallel computer ranging from 10 to hundreds of processors. Recently, we have explored parallelization for ROMS using the MPI programming model. In this paper, an efficient parallelization strategy for such a large-scale scientific software package, based on an existing shared-memory computing model, is presented. In addition, scientific applications and data-performance issues on a couple of SGI systems, including Columbia, at the time the world's third-fastest supercomputer, are discussed.
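    Moving a shared-memory grid code to MPI typically starts by giving each process one rectangular tile of the horizontal grid. The sketch below shows one common way to compute tile bounds on a `px` by `py` process grid, assuming row-major rank numbering and spreading any remainder over the first tiles; `block_range` and `tile_for_rank` are hypothetical helpers, and ROMS's actual tiling, ghost points, and halo exchange are not reproduced here.

```python
def block_range(n, parts, k):
    """Half-open index range [lo, hi) of block k when n points are split into `parts` blocks.

    The first n % parts blocks receive one extra point each.
    """
    base, extra = divmod(n, parts)
    lo = k * base + min(k, extra)
    hi = lo + base + (1 if k < extra else 0)
    return lo, hi


def tile_for_rank(rank, nx, ny, px, py):
    """(i-range, j-range) owned by `rank` on a px-by-py process grid (row-major ranks)."""
    if not 0 <= rank < px * py:
        raise ValueError("rank outside the process grid")
    ti, tj = rank % px, rank // px
    return block_range(nx, px, ti), block_range(ny, py, tj)
```

    Each process would then also allocate a few ghost rows and columns around its tile and refresh them from neighbouring ranks every time step.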

  6. Structurally Integrated Versus Structurally Segregated Memory Representations: Implications for the Design of Instructional Materials.

    ERIC Educational Resources Information Center

    Hayes-Roth, Barbara

    Two kinds of memory organization are distinguished: segregated versus integrated. In segregated memory organizations, related learned propositions have separate memory representations. In integrated memory organizations, memory representations of related propositions share common subrepresentations. Segregated memory organizations facilitate…

  7. Getting connected: Both associative and semantic links structure semantic memory for newly learned persons.

    PubMed

    Wiese, Holger; Schweinberger, Stefan R

    2015-01-01

    The present study examined whether semantic memory for newly learned people is structured by visual co-occurrence, shared semantics, or both. Participants were trained with pairs of simultaneously presented (i.e., co-occurring) preexperimentally unfamiliar faces, which either did or did not share additionally provided semantic information (occupation, place of living, etc.). Semantic information could also be shared between faces that did not co-occur. A subsequent priming experiment revealed faster responses for both co-occurrence/no shared semantics and no co-occurrence/shared semantics conditions, than for an unrelated condition. Strikingly, priming was strongest in the co-occurrence/shared semantics condition, suggesting additive effects of these factors. Additional analysis of event-related brain potentials yielded priming in the N400 component only for combined effects of visual co-occurrence and shared semantics, with more positive amplitudes in this than in the unrelated condition. Overall, these findings suggest that both semantic relatedness and visual co-occurrence are important when novel information is integrated into person-related semantic memory.

  8. Categorical and associative relations increase false memory relative to purely associative relations.

    PubMed

    Coane, Jennifer H; McBride, Dawn M; Termonen, Miia-Liisa; Cutting, J Cooper

    2016-01-01

    The goal of the present study was to examine the contributions of associative strength and similarity in terms of shared features to the production of false memories in the Deese/Roediger-McDermott list-learning paradigm. Whereas the activation/monitoring account suggests that false memories are driven by automatic associative activation from list items to nonpresented lures, combined with errors in source monitoring, other accounts (e.g., fuzzy trace theory, global-matching models) emphasize the importance of semantic-level similarity, and thus predict that shared features between list and lure items will increase false memory. Participants studied lists of nine items related to a nonpresented lure. Half of the lists consisted of items that were associated but did not share features with the lure, and the other half included items that were equally associated but also shared features with the lure (in many cases, these were taxonomically related items). The two types of lists were carefully matched in terms of a variety of lexical and semantic factors, and the same lures were used across list types. In two experiments, false recognition of the critical lures was greater following the study of lists that shared features with the critical lure, suggesting that similarity at a categorical or taxonomic level contributes to false memory above and beyond associative strength. We refer to this phenomenon as a "feature boost" that reflects additive effects of shared meaning and association strength and is generally consistent with accounts of false memory that have emphasized thematic or feature-level similarity among studied and nonstudied representations.

  9. System and method for programmable bank selection for banked memory subsystems

    DOEpatents

    Blumrich, Matthias A.; Chen, Dong; Gara, Alan G.; Giampapa, Mark E.; Hoenicke, Dirk; Ohmacht, Martin; Salapura, Valentina; Sugavanam, Krishnan

    2010-09-07

    A programmable memory system and method for enabling one or more processor devices to access shared memory in a computing environment, the shared memory including one or more memory storage structures having addressable locations for storing data. The system comprises: one or more first logic devices associated with a respective one or more processor devices, each first logic device for receiving physical memory address signals and programmable for generating a respective memory storage structure select signal upon receipt of pre-determined address bit values at selected physical memory address bit locations; and a second logic device responsive to each of the respective select signals for generating an address signal used for selecting a memory storage structure for processor access. The system thus gives each processor device of a computing environment memory storage access that is distributed across the one or more memory storage structures.
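    The core idea of programmable bank selection is that the bank number is built from a configurable set of physical address bits, so changing the configuration changes how consecutive addresses are interleaved across banks. The sketch below models that selection logic in software; `make_bank_selector` and the example bit positions are hypothetical and do not describe the patented hardware.

```python
def make_bank_selector(bit_positions):
    """Return a function mapping a physical address to a bank index.

    `bit_positions` lists which address bits (bit 0 = least significant)
    form the bank number, lowest-order bank bit first. "Reprogramming"
    the selector just means building a new one with a different list.
    """
    def select(addr):
        bank = 0
        for out_bit, addr_bit in enumerate(bit_positions):
            bank |= ((addr >> addr_bit) & 1) << out_bit
        return bank

    return select
```

    With bits 6 and 7 selected, consecutive 64-byte blocks rotate through four banks; selecting bits 12 and 13 instead keeps each 4 KiB region within a single bank.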

  10. Work stealing for GPU-accelerated parallel programs in a global address space framework: WORK STEALING ON GPU-ACCELERATED SYSTEMS

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Arafat, Humayun; Dinan, James; Krishnamoorthy, Sriram

    Task parallelism is an attractive approach to automatically load balance the computation in a parallel system and adapt to dynamism exhibited by parallel systems. Exploiting task parallelism through work stealing has been extensively studied in shared and distributed-memory contexts. In this paper, we study the design of a system that uses work stealing for dynamic load balancing of task-parallel programs executed on hybrid distributed-memory CPU-graphics processing unit (GPU) systems in a global-address space framework. We take into account the unique nature of the accelerator model employed by GPUs, the significant performance difference between GPU and CPU execution as a function of problem size, and the distinct CPU and GPU memory domains. We consider various alternatives in designing a distributed work stealing algorithm for CPU-GPU systems, while taking into account the impact of task distribution and data movement overheads. These strategies are evaluated using microbenchmarks that capture various execution configurations as well as the state-of-the-art CCSD(T) application module from the computational chemistry domain.
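    In classic shared-memory work stealing, each worker keeps a deque of tasks: the owner pushes and pops at one end (LIFO, good locality), while idle workers steal from the other end (FIFO, usually older and larger tasks). The sketch below shows that basic scheme with Python threads and a lock per deque; it is an assumption-laden illustration, not the paper's distributed CPU-GPU algorithm, and `Worker` and `run` are hypothetical names. Production runtimes typically use lock-free deques instead of locks.

```python
import random
import threading
from collections import deque


class Worker:
    def __init__(self, wid):
        self.wid = wid
        self.tasks = deque()
        self.lock = threading.Lock()  # simple lock-based deque for clarity

    def push(self, task):
        with self.lock:
            self.tasks.append(task)  # owner works at the bottom (right end)

    def pop(self):
        with self.lock:
            return self.tasks.pop() if self.tasks else None

    def steal(self):
        with self.lock:
            # thieves take from the top (left end), away from the owner
            return self.tasks.popleft() if self.tasks else None


def run(workers, execute):
    """Run until every deque is empty; tasks here never spawn new tasks."""
    def loop(me):
        while True:
            task = me.pop()
            if task is None:
                victims = [w for w in workers if w is not me]
                random.shuffle(victims)  # random victim selection
                task = next((t for t in (v.steal() for v in victims) if t is not None), None)
                if task is None:
                    return  # nothing left anywhere
            execute(me.wid, task)

    threads = [threading.Thread(target=loop, args=(w,)) for w in workers]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
```

    Seeding all tasks on one worker and running four workers still executes every task exactly once; which worker ran which task depends on thread timing.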

  11. Work stealing for GPU-accelerated parallel programs in a global address space framework

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Arafat, Humayun; Dinan, James; Krishnamoorthy, Sriram

    Task parallelism is an attractive approach to automatically load balance the computation in a parallel system and adapt to dynamism exhibited by parallel systems. Exploiting task parallelism through work stealing has been extensively studied in shared and distributed-memory contexts. In this paper, we study the design of a system that uses work stealing for dynamic load balancing of task-parallel programs executed on hybrid distributed-memory CPU-graphics processing unit (GPU) systems in a global-address space framework. We take into account the unique nature of the accelerator model employed by GPUs, the significant performance difference between GPU and CPU execution as a function of problem size, and the distinct CPU and GPU memory domains. We consider various alternatives in designing a distributed work stealing algorithm for CPU-GPU systems, while taking into account the impact of task distribution and data movement overheads. These strategies are evaluated using microbenchmarks that capture various execution configurations as well as the state-of-the-art CCSD(T) application module from the computational chemistry domain.

  12. Early programming and late-acting checkpoints governing the development of CD4 T cell memory.

    PubMed

    Dhume, Kunal; McKinstry, K Kai

    2018-04-27

    CD4 T cells contribute to protection against pathogens through numerous mechanisms. Incorporating the goal of memory CD4 T cell generation into vaccine strategies thus offers a powerful approach to improve their efficacy, especially in situations where humoral responses alone cannot confer long-term immunity. These threats include viruses such as influenza that mutate coat proteins to avoid neutralizing antibodies, but that are targeted by T cells that recognize more conserved protein epitopes shared by different strains. A major barrier in the design of such vaccines is that the mechanisms controlling the efficiency with which memory cells form remain incompletely understood. Here, we discuss recent insights into fate decisions controlling memory generation. We focus on the importance of three general cues: interleukin-2, antigen, and costimulatory interactions. It is increasingly clear that these signals have a powerful influence on the capacity of CD4 T cells to form memory during two distinct phases of the immune response. First, through 'programming' that occurs during initial priming, and second, through 'checkpoints' that operate later during the effector stage. These findings indicate that novel vaccine strategies must seek to optimize cognate interactions, during which interleukin-2-, antigen, and costimulation-dependent signals are tightly linked, well beyond initial antigen encounter to induce robust memory CD4 T cells. This article is protected by copyright. All rights reserved.

  13. SKIRT: Hybrid parallelization of radiative transfer simulations

    NASA Astrophysics Data System (ADS)

    Verstocken, S.; Van De Putte, D.; Camps, P.; Baes, M.

    2017-07-01

    We describe the design, implementation and performance of the new hybrid parallelization scheme in our Monte Carlo radiative transfer code SKIRT, which has been used extensively for modelling the continuum radiation of dusty astrophysical systems including late-type galaxies and dusty tori. The hybrid scheme combines distributed memory parallelization, using the standard Message Passing Interface (MPI) to communicate between processes, and shared memory parallelization, providing multiple execution threads within each process to avoid duplication of data structures. The synchronization between multiple threads is accomplished through atomic operations without high-level locking (also called lock-free programming). This improves the scaling behaviour of the code and substantially simplifies the implementation of the hybrid scheme. The result is an extremely flexible solution that adjusts to the number of available nodes, processors and memory, and consequently performs well on a wide variety of computing architectures.
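    In a hybrid scheme like the one described, the work (here, photon packets) is split first across processes and then across threads inside each process; the threads of one process share a single copy of the large data structures and only divide the index range. The sketch below shows that two-level static split; `split` and `hybrid_assignment` are hypothetical names, the static assignment is an assumption rather than SKIRT's actual scheduling, and the lock-free atomic accumulation the abstract mentions is not modeled.

```python
def split(n, parts):
    """Split range(n) into `parts` contiguous half-open chunks of near-equal size."""
    base, extra = divmod(n, parts)
    bounds, lo = [], 0
    for k in range(parts):
        hi = lo + base + (1 if k < extra else 0)
        bounds.append((lo, hi))
        lo = hi
    return bounds


def hybrid_assignment(num_packets, num_processes, threads_per_process):
    """Map (rank, thread) -> packet index range: processes first, then threads within each."""
    plan = {}
    for rank, (plo, phi) in enumerate(split(num_packets, num_processes)):
        # threads subdivide only their process's share; data structures are not duplicated
        for tid, (tlo, thi) in enumerate(split(phi - plo, threads_per_process)):
            plan[(rank, tid)] = (plo + tlo, plo + thi)
    return plan
```

    For 10 packets on 2 processes with 3 threads each, process 0 covers packets 0 to 4 and process 1 covers 5 to 9, with each process's threads taking consecutive sub-ranges.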

  14. Shared processing in multiple object tracking and visual working memory in the absence of response order and task order confounds

    PubMed Central

    Howe, Piers D. L.

    2017-01-01

    To understand how the visual system represents multiple moving objects and how those representations contribute to tracking, it is essential that we understand how the processes of attention and working memory interact. In the work described here we present an investigation of that interaction via a series of tracking and working memory dual-task experiments. Previously, it has been argued that tracking is resistant to disruption by a concurrent working memory task and that any apparent disruption is in fact due to observers making a response to the working memory task, rather than due to competition for shared resources. Contrary to this, in our experiments we find that when task order and response order confounds are avoided, all participants show a similar decrease in both tracking and working memory performance. However, if task and response order confounds are not adequately controlled for we find substantial individual differences, which could explain the previous conflicting reports on this topic. Our results provide clear evidence that tracking and working memory tasks share processing resources. PMID:28410383

  15. Shared processing in multiple object tracking and visual working memory in the absence of response order and task order confounds.

    PubMed

    Lapierre, Mark D; Cropper, Simon J; Howe, Piers D L

    2017-01-01

    To understand how the visual system represents multiple moving objects and how those representations contribute to tracking, it is essential that we understand how the processes of attention and working memory interact. In the work described here we present an investigation of that interaction via a series of tracking and working memory dual-task experiments. Previously, it has been argued that tracking is resistant to disruption by a concurrent working memory task and that any apparent disruption is in fact due to observers making a response to the working memory task, rather than due to competition for shared resources. Contrary to this, in our experiments we find that when task order and response order confounds are avoided, all participants show a similar decrease in both tracking and working memory performance. However, if task and response order confounds are not adequately controlled for we find substantial individual differences, which could explain the previous conflicting reports on this topic. Our results provide clear evidence that tracking and working memory tasks share processing resources.

  16. Visual and spatial working memory are not that dissociated after all: a time-based resource-sharing account.

    PubMed

    Vergauwe, Evie; Barrouillet, Pierre; Camos, Valérie

    2009-07-01

    Examinations of interference between visual and spatial materials in working memory have suggested domain- and process-based fractionations of visuo-spatial working memory. The present study examined the role of central time-based resource sharing in visuo-spatial working memory and assessed its role in obtained interference patterns. Visual and spatial storage were combined with both visual and spatial on-line processing components in computer-paced working memory span tasks (Experiment 1) and in a selective interference paradigm (Experiment 2). The cognitive load of the processing components was manipulated to investigate its impact on concurrent maintenance for both within-domain and between-domain combinations of processing and storage components. In contrast to both domain- and process-based fractionations of visuo-spatial working memory, the results revealed that recall performance was determined by the cognitive load induced by the processing of items, rather than by the domain to which those items pertained. These findings are interpreted as evidence for a time-based resource-sharing mechanism in visuo-spatial working memory.

  17. Why are you telling me that? A conceptual model of the social function of autobiographical memory.

    PubMed

    Alea, Nicole; Bluck, Susan

    2003-03-01

    In an effort to stimulate and guide empirical work within a functional framework, this paper provides a conceptual model of the social functions of autobiographical memory (AM) across the lifespan. The model delineates the processes and variables involved when AMs are shared to serve social functions. Components of the model include: lifespan contextual influences, the qualitative characteristics of memory (emotionality and level of detail recalled), the speaker's characteristics (age, gender, and personality), the familiarity and similarity of the listener to the speaker, the level of responsiveness during the memory-sharing process, and the nature of the social relationship in which the memory sharing occurs (valence and length of the relationship). These components are shown to influence the type of social function served and/or the extent to which social functions are served. Directions for future empirical work to substantiate the model and hypotheses derived from the model are provided.

  18. Support of Multidimensional Parallelism in the OpenMP Programming Model

    NASA Technical Reports Server (NTRS)

    Jin, Hao-Qiang; Jost, Gabriele

    2003-01-01

    OpenMP is the current standard for shared-memory programming. While providing ease of parallel programming, the OpenMP programming model also has limitations which often affect the scalability of applications. Examples of these limitations are work distribution and point-to-point synchronization among threads. We propose extensions to the OpenMP programming model which allow the user to easily distribute the work in multiple dimensions and synchronize the workflow among the threads. The proposed extensions include four new constructs and the associated runtime library. They do not require changes to the source code and can be implemented based on the existing OpenMP standard. We illustrate the concept in a prototype translator and test with benchmark codes and a cloud modeling code.
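    Point-to-point synchronization matters for loop nests with a wavefront dependence such as `a[i][j] = a[i-1][j] + a[i][j-1]`: if each thread owns a block of columns, a thread only has to wait for its left neighbour to finish the same row, not for a global barrier. The sketch below shows that pipelining with Python threads and one event per (thread, row); it illustrates the synchronization pattern only, not the proposed OpenMP constructs, and `wavefront_pipelined` is a hypothetical name.

```python
import threading


def wavefront_sequential(n, m):
    """Reference version: first row and column are 1, other cells sum up and left neighbours."""
    a = [[1] * m for _ in range(n)]
    for i in range(1, n):
        for j in range(1, m):
            a[i][j] = a[i - 1][j] + a[i][j - 1]
    return a


def wavefront_pipelined(n, m, num_threads):
    a = [[1] * m for _ in range(n)]
    # contiguous column block owned by each thread
    base, extra = divmod(m, num_threads)
    cols, lo = [], 0
    for t in range(num_threads):
        hi = lo + base + (1 if t < extra else 0)
        cols.append(range(lo, hi))
        lo = hi
    # done[t][i] is set once thread t has finished row i of its block
    done = [[threading.Event() for _ in range(n)] for _ in range(num_threads)]

    def worker(t):
        for i in range(n):
            if t > 0:
                done[t - 1][i].wait()  # point-to-point: wait only on the left neighbour
            if i > 0:
                for j in cols[t]:
                    if j > 0:
                        a[i][j] = a[i - 1][j] + a[i][j - 1]
            done[t][i].set()  # release the right neighbour for row i

    threads = [threading.Thread(target=worker, args=(t,)) for t in range(num_threads)]
    for th in threads:
        th.start()
    for th in threads:
        th.join()
    return a
```

    Because every cell equals a binomial coefficient, `wavefront_sequential(6, 7)[5][6]` is C(11, 5) = 462, and the pipelined version must reproduce the same array.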

  19. Destination memory impairment in older people.

    PubMed

    Gopie, Nigel; Craik, Fergus I M; Hasher, Lynn

    2010-12-01

    Older adults are assumed to have poor destination memory (knowing to whom they tell particular information), and anecdotes about them repeating stories to the same people are cited as informal evidence for this claim. Experiment 1 assessed young and older adults' destination memory by having participants tell facts (e.g., "A dime has 118 ridges around its edge") to pictures of famous people (e.g., Oprah Winfrey). Surprise recognition memory tests, which also assessed confidence, revealed that older adults, compared to young adults, were disproportionately impaired on destination memory relative to spared memory for the individual components (i.e., facts, faces) of the episode. Older adults also were more confident that they had not told a fact to a particular person when they actually had (i.e., a miss); this presumably causes them to repeat information more often than young adults. When the direction of information transfer was reversed in Experiment 2, such that the famous people shared information with the participants (i.e., a source memory experiment), age-related memory differences disappeared. In contrast to the destination memory experiment, older adults in the source memory experiment were more confident than young adults that someone had shared a fact with them when a different person actually had shared the fact (i.e., a false alarm). Overall, accuracy and confidence jointly influence age-related changes to destination memory, a fundamental component of successful communication. (c) 2010 APA, all rights reserved.

  20. Architecture, Design and Implementation of RC64, a Many-Core High-Performance DSP for Space Applications

    NASA Astrophysics Data System (ADS)

    Ginosar, Ran; Aviely, Peleg; Liran, Tuvia; Alon, Dov; Dobkin, Reuven; Goldberg, Michael

    2013-08-01

    RC64, a novel 64-core many-core signal processing chip, targets DSP performance of 12.8 GIPS, 100 GOPS, and 12.8 single-precision GFLOPS while dissipating only 3 watts. RC64 employs advanced DSP cores, a multi-bank shared memory, and a hardware scheduler, supports DDR2 memory, and communicates over five proprietary 6.4 Gbps channels. The programming model employs sequential fine-grain tasks and a separate task map to define task dependencies. RC64 is implemented as a 200 MHz ASIC in Tower 130 nm CMOS technology, assembled in a hermetically sealed ceramic QFP package, and qualified to the highest space standards.
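    A task map that lists each task's predecessors is enough for a scheduler to dispatch work without the tasks themselves containing synchronization code: a task becomes ready when all its predecessors finish, and ready tasks are handed to free cores. The sketch below models that idea in software with dependency counters, assuming every task takes one time step; `schedule_task_map` and the task names are hypothetical and say nothing about how the RC64 hardware scheduler is actually built.

```python
from collections import deque


def schedule_task_map(task_map, num_cores):
    """task_map: {task: [tasks it depends on]}. Returns the batch of tasks run at each step."""
    remaining = {t: len(deps) for t, deps in task_map.items()}  # unfinished predecessors
    successors = {t: [] for t in task_map}
    for t, deps in task_map.items():
        for d in deps:
            successors[d].append(t)
    ready = deque(t for t, n in remaining.items() if n == 0)
    steps = []
    while ready:
        # dispatch up to num_cores ready tasks in this step
        batch = [ready.popleft() for _ in range(min(num_cores, len(ready)))]
        steps.append(batch)
        for t in batch:
            for s in successors[t]:
                remaining[s] -= 1
                if remaining[s] == 0:
                    ready.append(s)
    if sum(len(b) for b in steps) != len(task_map):
        raise ValueError("task map contains a cycle")
    return steps
```

    With two cores, a load task feeding three independent FFT tasks that all feed a merge task takes four steps, because the third FFT must wait for a free core.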

  1. NAS Parallel Benchmark. Results 11-96: Performance Comparison of HPF and MPI Based NAS Parallel Benchmarks. 1.0

    NASA Technical Reports Server (NTRS)

    Saini, Subash; Bailey, David; Chancellor, Marisa K. (Technical Monitor)

    1997-01-01

    High Performance Fortran (HPF), the high-level language for parallel Fortran programming, is based on Fortran 90. HPF was defined by an informal standards committee known as the High Performance Fortran Forum (HPFF) in 1993, and modeled on TMC's CM Fortran language. Several HPF features have since been incorporated into the draft ANSI/ISO Fortran 95, the next formal revision of the Fortran standard. HPF allows users to write a single parallel program that can execute on a serial machine, a shared-memory parallel machine, or a distributed-memory parallel machine. HPF eliminates the complex, error-prone task of explicitly specifying how, where, and when to pass messages between processors on distributed-memory machines, or when to synchronize processors on shared-memory machines. HPF is designed in a way that allows the programmer to code an application at a high level, and then selectively optimize portions of the code by dropping into message-passing or calling tuned library routines as 'extrinsics'. Compilers supporting High Performance Fortran features first appeared in late 1994 and early 1995 from Applied Parallel Research (APR), Digital Equipment Corporation, and The Portland Group (PGI). IBM introduced an HPF compiler for the IBM RS/6000 SP/2 in April of 1996. Over the past two years, these implementations have shown steady improvement in terms of both features and performance. The performance of various hardware/programming-model (HPF and MPI (message passing interface)) combinations will be compared, based on the latest NAS (NASA Advanced Supercomputing) Parallel Benchmark (NPB) results, thus providing a cross-machine and cross-model comparison. Specifically, HPF-based NPB results will be compared with MPI-based NPB results to provide perspective on performance currently obtainable using HPF versus MPI or versus hand-tuned implementations such as those supplied by the hardware vendors.
In addition, we present NPB (Version 1.0) performance results for the following systems: DEC Alpha Server 8400 5/440, Fujitsu VPP Series (VX, VPP300, and VPP700), HP/Convex Exemplar SPP2000, IBM RS/6000 SP P2SC node (120 MHz), NEC SX-4/32, SGI/CRAY T3E, and SGI Origin2000.
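    What lets an HPF compiler generate the messages itself is that every array element has a computable owner. HPF's `DISTRIBUTE` directive offers, among others, `BLOCK` (contiguous chunks of ceiling(n/p) elements) and `CYCLIC` (round-robin) mappings; the sketch below models those two owner functions with 0-based indices. It is a simplified illustration, not any vendor's compiler; `block_owner`, `cyclic_owner`, and `local_indices` are hypothetical names.

```python
def block_owner(i, n, p):
    """Owner of element i (0-based) when n elements are BLOCK-distributed over p processors."""
    size = -(-n // p)  # ceiling(n / p) elements per processor
    return i // size


def cyclic_owner(i, p):
    """Owner of element i under a CYCLIC (round-robin) distribution over p processors."""
    return i % p


def local_indices(n, p, owner):
    """Global indices each processor owns; an owner-computes rule assigns it these updates."""
    return [[i for i in range(n) if owner(i) == q] for q in range(p)]
```

    For 10 elements on 4 processors, `BLOCK` gives owners 0, 0, 0, 1, 1, 1, 2, 2, 2, 3, while `CYCLIC` gives processor 1 the elements 1, 5, and 9; a reference to an element owned elsewhere is where the compiler would insert communication.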

  2. CD4 memory T cells develop and acquire functional competence by sequential cognate interactions and stepwise gene regulation

    PubMed Central

    Kaji, Tomohiro; Hijikata, Atsushi; Ishige, Akiko; Kitami, Toshimori; Watanabe, Takashi; Ohara, Osamu; Yanaka, Noriyuki; Okada, Mariko; Shimoda, Michiko; Taniguchi, Masaru

    2016-01-01

    Memory CD4+ T cells promote protective humoral immunity; however, how memory T cells acquire this activity remains unclear. This study demonstrates that CD4+ T cells develop into antigen-specific memory T cells that can promote the terminal differentiation of memory B cells far more effectively than their naive T-cell counterparts. Memory T cell development requires the transcription factor B-cell lymphoma 6 (Bcl6), which is known to direct T-follicular helper (Tfh) cell differentiation. However, unlike Tfh cells, memory T cell development did not require germinal center B cells. Curiously, memory T cells that develop in the absence of cognate B cells cannot promote memory B-cell recall responses and this defect was accompanied by down-regulation of genes associated with homeostasis and activation and up-regulation of genes inhibitory for T-cell responses. Although memory T cells display phenotypic and genetic signatures distinct from Tfh cells, both had in common the expression of a group of genes associated with metabolic pathways. This gene expression profile was not shared to any great extent with naive T cells and was not influenced by the absence of cognate B cells during memory T cell development. These results suggest that memory T cell development is programmed by stepwise expression of gatekeeper genes through serial interactions with different types of antigen-presenting cells, first licensing the memory lineage pathway and subsequently facilitating the functional development of memory T cells. Finally, we identified Gdpd3 as a candidate genetic marker for memory T cells. PMID:26714588

  3. Interference due to shared features between action plans is influenced by working memory span.

    PubMed

    Fournier, Lisa R; Behmer, Lawrence P; Stubblefield, Alexandra M

    2014-12-01

    In this study, we examined the interactions between the action plans that we hold in memory and the actions that we carry out, asking whether the interference due to shared features between action plans is due to selection demands imposed on working memory. Individuals with low and high working memory spans learned arbitrary motor actions in response to two different visual events (A and B), presented in a serial order. They planned a response to the first event (A) and, while maintaining this action plan in memory, then executed a speeded response to the second event (B). Afterward, they executed the action plan for the first event (A) maintained in memory. Speeded responses to the second event (B) were delayed when it shared an action feature (feature overlap) with the first event (A), relative to when it did not (no feature overlap). The size of the feature-overlap delay was greater for low-span than for high-span participants. This indicates that interference due to overlapping action plans is greater when fewer working memory resources are available, suggesting that this interference is due to selection demands imposed on working memory. Thus, working memory plays an important role in managing current and upcoming action plans, at least for newly learned tasks. Also, managing multiple action plans is more compromised in individuals with low than with high working memory spans.

  4. Destination Memory Impairment in Older People

    PubMed Central

    Gopie, Nigel; Craik, Fergus I. M.; Hasher, Lynn

    2012-01-01

    Older adults are assumed to have poor destination memory— knowing to whom they tell particular information—and anecdotes about them repeating stories to the same people are cited as informal evidence for this claim. Experiment 1 assessed young and older adults’ destination memory by having participants tell facts (e.g., “A dime has 118 ridges around its edge”) to pictures of famous people (e.g., Oprah Winfrey). Surprise recognition memory tests, which also assessed confidence, revealed that older adults, compared to young adults, were disproportionately impaired on destination memory relative to spared memory for the individual components (i.e., facts, faces) of the episode. Older adults also were more confident that they had not told a fact to a particular person when they actually had (i.e., a miss); this presumably causes them to repeat information more often than young adults. When the direction of information transfer was reversed in Experiment 2, such that the famous people shared information with the participants (i.e., a source memory experiment), age-related memory differences disappeared. In contrast to the destination memory experiment, older adults in the source memory experiment were more confident than young adults that someone had shared a fact with them when a different person actually had shared the fact (i.e., a false alarm). Overall, accuracy and confidence jointly influence age-related changes to destination memory, a fundamental component of successful communication. PMID:20718537

  5. PANDA: A distributed multiprocessor operating system

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Chubb, P.

    1989-01-01

    PANDA is a design for a distributed multiprocessor and an operating system. PANDA is designed to allow easy expansion of both hardware and software. As such, the PANDA kernel provides only message passing and memory and process management. The other features needed for the system (device drivers, secondary storage management, etc.) are provided as replaceable user tasks. The thesis presents PANDA's design and implementation, both hardware and software. PANDA uses multiple 68010 processors sharing memory on a VME bus, each such node potentially connected to others via a high-speed network. The machine is completely homogeneous: there are no differences between processors that are detectable by programs running on the machine. A single two-processor node has been constructed. Each processor contains memory management circuits designed to allow processors to share page tables safely. PANDA presents a programmers' model similar to the hardware model: a job is divided into multiple tasks, each having its own address space. Within each task, multiple processes share code and data. Tasks can send messages to each other, and set up virtual circuits between themselves. Peripheral devices such as disc drives are represented within PANDA by tasks. PANDA divides secondary storage into volumes, each volume being accessed by a volume access task, or VAT. All knowledge about the way that data is stored on a disc is kept in its volume's VAT. The design is such that PANDA should provide a useful testbed for file systems and device drivers, as these can be installed without recompiling PANDA itself, and without rebooting the machine.
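
    PANDA's split between tasks that keep private state and communicate only by messages can be illustrated with a small Python sketch. It is not PANDA code: the VolumeAccessTask class, the message tuple format and the in-memory "disc" dictionary are hypothetical, and a thread plus queue.Queue stand in for a separate address space and kernel message passing.

```python
import queue
import threading


class VolumeAccessTask(threading.Thread):
    """Hypothetical VAT: the only component that knows how blocks are stored."""

    def __init__(self):
        super().__init__(daemon=True)
        self.inbox = queue.Queue()  # the task's message port
        self._blocks = {}           # private "disc" layout; clients never touch it

    def run(self):
        # Serve one message at a time, replying on the queue the sender supplied.
        while True:
            op, key, value, reply = self.inbox.get()
            if op == "stop":
                reply.put("stopped")
                return
            if op == "write":
                self._blocks[key] = value
                reply.put("ok")
            elif op == "read":
                reply.put(self._blocks.get(key))
            else:
                reply.put(f"error: unknown op {op!r}")


def request(vat, op, key=None, value=None):
    """Client side: send one message and block until the VAT replies."""
    reply = queue.Queue(maxsize=1)
    vat.inbox.put((op, key, value, reply))
    return reply.get()


vat = VolumeAccessTask()
vat.start()
status = request(vat, "write", "block-7", b"hello")
data = request(vat, "read", "block-7")
stopped = request(vat, "stop")
vat.join()
print(status, data, stopped)
```

    Because the block layout lives only inside the VAT in this model, changing the storage scheme means replacing that one task, which loosely mirrors the replaceable-user-task design described above.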

  6. NPS Collaborative Technology Testbed for ONR CKM Program

    DTIC Science & Technology

    2005-01-11

    or have access to the MIT E-Wall hosted by the TOC. The combination of E-Wall and agents lend themselves to the dynamic gathering and display of...display, intuitive icons or menus that is easy to activate and customize, and automatically seeks and connects to other like services/networks/agents...integration creates network-centric memory mechanism for developing shared understanding of SA events Data Base Integration of Sensor-DM Agents and

  7. Verified Separate Compilation for C

    DTIC Science & Technology

    2015-06-01

    simulations, says that the visible set is closed under reachability. These two conditions, plus (6.2) and monotonicity of the REACH relation, imply...erase to a CompCert memory m. By erasure, we mean the removal of the “juice” that is unnecessary for execution (as in Curry-style type erasure of...simply typed lambda calculus). The “juice” has several components: permission shares controlling access to objects in the program logic; predicates in the

  8. Parallel programming with Easy Java Simulations

    NASA Astrophysics Data System (ADS)

    Esquembre, F.; Christian, W.; Belloni, M.

    2018-01-01

    Nearly all of today's processors are multicore, and ideally programming and algorithm development utilizing the entire processor should be introduced early in the computational physics curriculum. Parallel programming is often not introduced because it requires a new programming environment and uses constructs that are unfamiliar to many teachers. We describe how we decrease the barrier to parallel programming by using a Java-based programming environment to treat problems in the usual undergraduate curriculum. We use the Easy Java Simulations programming and authoring tool to create the program's graphical user interface together with objects based on those developed by Kaminsky [Building Parallel Programs (Course Technology, Boston, 2010)] to handle common parallel programming tasks. Shared-memory parallel implementations of physics problems, such as time evolution of the Schrödinger equation, are available as source code and as ready-to-run programs from the AAPT-ComPADRE digital library.

  9. Low latency memory access and synchronization

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Blumrich, Matthias A.; Chen, Dong; Coteus, Paul W.

    A low latency memory system access is provided in association with a weakly-ordered multiprocessor system. Each processor in the multiprocessor shares resources, and each shared resource has an associated lock within a locking device that provides support for synchronization between the multiple processors in the multiprocessor and the orderly sharing of the resources. A processor only has permission to access a resource when it owns the lock associated with that resource, and an attempt by a processor to own a lock requires only a single load operation, rather than a traditional atomic load followed by store, such that the processor only performs a read operation and the hardware locking device performs a subsequent write operation rather than the processor. A simple prefetching scheme for non-contiguous data structures is also disclosed. A memory line is redefined so that in addition to the normal physical memory data, every line includes a pointer that is large enough to point to any other line in the memory, wherein the pointers are used to determine which memory line to prefetch rather than some other predictive algorithm. This enables hardware to effectively prefetch memory access patterns that are non-contiguous but repetitive.
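
    The single-load lock acquisition can be modelled in software as a sketch. The LockDevice class and its load and release methods are hypothetical names, and a Python threading.Lock stands in for the hardware atomicity that, per the abstract, lets the locking device itself perform the write after the processor issues only a read.

```python
import threading
import time


class LockDevice:
    """Hypothetical model of a locking device holding one lock per shared resource."""

    def __init__(self, n_locks):
        self._owner = [None] * n_locks
        self._atomic = threading.Lock()  # stands in for the device's hardware atomicity

    def load(self, lock_id, cpu):
        """The processor's single read. Returns 1 if the lock was free, in which
        case the device itself records the new owner; returns 0 if it is held."""
        with self._atomic:
            if self._owner[lock_id] is None:
                self._owner[lock_id] = cpu  # the write is done by the device
                return 1
            return 0

    def release(self, lock_id, cpu):
        with self._atomic:
            if self._owner[lock_id] != cpu:
                raise RuntimeError("lock released by a processor that does not own it")
            self._owner[lock_id] = None


device = LockDevice(n_locks=1)
counter = 0


def worker(cpu, iterations):
    global counter
    for _ in range(iterations):
        while device.load(0, cpu) == 0:  # retry with plain loads until granted
            time.sleep(0)                # yield so the current owner can progress
        counter += 1                     # critical section on the shared resource
        device.release(0, cpu)


threads = [threading.Thread(target=worker, args=(cpu, 1000)) for cpu in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)  # 4000
```

    From the processor's side, acquiring the lock is one read whose returned value says whether ownership was granted; the read-modify-write happens inside the device model.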

  10. Low latency memory access and synchronization

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Blumrich, Matthias A.; Chen, Dong; Coteus, Paul W.

    A low latency memory system access is provided in association with a weakly-ordered multiprocessor system. Each processor in the multiprocessor shares resources, and each shared resource has an associated lock within a locking device that provides support for synchronization between the multiple processors in the multiprocessor and the orderly sharing of the resources. A processor only has permission to access a resource when it owns the lock associated with that resource, and an attempt by a processor to own a lock requires only a single load operation, rather than a traditional atomic load followed by store, such that the processor only performs a read operation and the hardware locking device performs a subsequent write operation rather than the processor. A simple prefetching scheme for non-contiguous data structures is also disclosed. A memory line is redefined so that in addition to the normal physical memory data, every line includes a pointer that is large enough to point to any other line in the memory, wherein the pointers are used to determine which memory line to prefetch rather than some other predictive algorithm. This enables hardware to effectively prefetch memory access patterns that are non-contiguous but repetitive.
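
    The pointer-augmented memory line can be sketched as a toy model. The MemoryLine layout, the Memory class and, in particular, the rule that a line's pointer is set to whichever line followed it last time are illustrative assumptions, not the patented hardware; the point is only that the prefetch target comes from a stored pointer rather than from address arithmetic.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MemoryLine:
    data: int
    next_ptr: Optional[int] = None  # extra field: address of the line to prefetch


class Memory:
    """Toy memory whose lines carry a pointer used to choose prefetch targets."""

    def __init__(self, n_lines):
        self.lines = [MemoryLine(data=addr * 10) for addr in range(n_lines)]
        self.prefetched = set()  # lines brought in early (no eviction, for brevity)
        self.hits = 0            # accesses satisfied by an earlier prefetch
        self._last = None

    def access(self, addr):
        if addr in self.prefetched:
            self.hits += 1
        # Assumption: each line's pointer records the line that followed it last time.
        if self._last is not None:
            self.lines[self._last].next_ptr = addr
        self._last = addr
        # Prefetch whatever line this one points to, contiguous or not.
        target = self.lines[addr].next_ptr
        if target is not None:
            self.prefetched.add(target)
        return self.lines[addr].data


mem = Memory(n_lines=16)
pattern = [5, 2, 9, 0]  # non-contiguous but repetitive
for _ in range(2):
    for addr in pattern:
        mem.access(addr)
print(mem.hits)  # 3
```

    The first pass only trains the pointers; on the second pass lines 2, 9 and 0 are found already prefetched, while line 5 is not yet, because line 0's pointer to it is written only at the start of that pass.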

  11. OSCAR API for Real-Time Low-Power Multicores and Its Performance on Multicores and SMP Servers

    NASA Astrophysics Data System (ADS)

    Kimura, Keiji; Mase, Masayoshi; Mikami, Hiroki; Miyamoto, Takamichi; Shirako, Jun; Kasahara, Hironori

    OSCAR (Optimally Scheduled Advanced Multiprocessor) API has been designed for real-time embedded low-power multicores to generate parallel programs for various multicores from different vendors by using the OSCAR parallelizing compiler. The OSCAR API has been developed by Waseda University in collaboration with Fujitsu Laboratory, Hitachi, NEC, Panasonic, Renesas Technology, and Toshiba in a METI/NEDO project entitled "Multicore Technology for Realtime Consumer Electronics." By using the OSCAR API as an interface between the OSCAR compiler and backend compilers, the OSCAR compiler enables hierarchical multigrain parallel processing with memory optimization under capacity restriction for cache memory, local memory, distributed shared memory, and on-chip/off-chip shared memory; data transfer using a DMA controller; and power reduction control using DVFS (Dynamic Voltage and Frequency Scaling), clock gating, and power gating for various embedded multicores. In addition, a parallelized program automatically generated by the OSCAR compiler with the OSCAR API can be compiled by ordinary OpenMP compilers, since the OSCAR API is designed as a subset of OpenMP. This paper describes the OSCAR API and its compatibility with the OSCAR compiler by showing code examples. Performance evaluations of the OSCAR compiler and the OSCAR API are carried out using an IBM Power5+ workstation, an IBM Power6 high-end SMP server, and a newly developed consumer electronics multicore chip RP2 by Renesas, Hitachi and Waseda. The scalability evaluation shows that, on average, the OSCAR compiler with the OSCAR API achieves a 5.8 times speedup over sequential execution on the Power5+ workstation with eight cores and a 2.9 times speedup on RP2 with four cores. In addition, the OSCAR compiler can accelerate an IBM XL Fortran compiler up to 3.3 times on the Power6 SMP server. Due to low-power optimization on RP2, the OSCAR compiler with the OSCAR API achieves a maximum power reduction of 84% in the real-time execution mode.
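
    The reported speedups can be put on a per-core footing with a short calculation. Parallel efficiency E = S/p is standard. The Karp-Flatt metric e = (1/S - 1/p) / (1 - 1/p), an estimate of the effective serial fraction, is an added analysis that the abstract does not perform, and two averaged figures make it only indicative.

```python
def efficiency(speedup, cores):
    """Parallel efficiency: fraction of ideal linear speedup achieved."""
    return speedup / cores


def karp_flatt(speedup, cores):
    """Karp-Flatt experimentally determined serial fraction."""
    return (1 / speedup - 1 / cores) / (1 - 1 / cores)


# Average speedups quoted in the abstract above.
results = {"Power5+ (8 cores)": (5.8, 8), "RP2 (4 cores)": (2.9, 4)}
for name, (s, p) in results.items():
    print(f"{name}: E = {efficiency(s, p):.3f}, e = {karp_flatt(s, p):.3f}")
```

    Both configurations come out at about 72.5% efficiency. The larger Karp-Flatt value for RP2 (roughly 0.13 versus 0.05) would point to proportionally more non-parallelizable overhead on the four-core chip, although this calculation cannot distinguish serial code from communication or memory effects.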

  12. Location-Unbound Color-Shape Binding Representations in Visual Working Memory.

    PubMed

    Saiki, Jun

    2016-02-01

    The mechanism by which nonspatial features, such as color and shape, are bound in visual working memory, and the role of those features' location in their binding, remains unknown. In the current study, I modified a redundancy-gain paradigm to investigate these issues. A set of features was presented in a two-object memory display, followed by a single object probe. Participants judged whether the probe contained any features of the memory display, regardless of its location. Response time distributions revealed feature coactivation only when both features of a single object in the memory display appeared together in the probe, regardless of the response time benefit from the probe and memory objects sharing the same location. This finding suggests that a shared location is necessary in the formation of bound representations but unnecessary in their maintenance. Electroencephalography data showed that amplitude modulations reflecting location-unbound feature coactivation were different from those reflecting the location-sharing benefit, consistent with the behavioral finding that feature-location binding is unnecessary in the maintenance of color-shape binding. © The Author(s) 2015.

  13. Shared reality in intergroup communication: Increasing the epistemic authority of an out-group audience.

    PubMed

    Echterhoff, Gerald; Kopietz, René; Higgins, E Tory

    2017-06-01

    Communicators typically tune messages to their audience's attitude. Such audience tuning biases communicators' memory for the topic toward the audience's attitude to the extent that they create a shared reality with the audience. To investigate shared reality in intergroup communication, we first established that a reduced memory bias after tuning messages to an out-group (vs. in-group) audience is a subtle index of communicators' denial of shared reality to that out-group audience (Experiments 1a and 1b). We then examined whether the audience-tuning memory bias might emerge when the out-group audience's epistemic authority is enhanced, either by increasing epistemic expertise concerning the communication topic or by creating epistemic consensus among members of a multiperson out-group audience. In Experiment 2, when Germans communicated to a Turkish audience with an attitude about a Turkish (vs. German) target, the audience-tuning memory bias appeared. In Experiment 3, when the audience of German communicators consisted of 3 Turks who all held the same attitude toward the target, the memory bias again appeared. The association between message valence and memory valence was consistently higher when the audience's epistemic authority was high (vs. low). An integrative analysis across all studies also suggested that the memory bias increases with increasing strength of epistemic inputs (epistemic expertise, epistemic consensus, and audience-tuned message production). The findings suggest novel ways of overcoming intergroup biases in intergroup relations. (PsycINFO Database Record (c) 2017 APA, all rights reserved).

  14. The Developmental Influence of Primary Memory Capacity on Working Memory and Academic Achievement

    PubMed Central

    2015-01-01

    In this study, we investigate the development of primary memory capacity among children. Children between the ages of 5 and 8 completed 3 novel tasks (split span, interleaved lists, and a modified free-recall task) that measured primary memory by estimating the number of items in the focus of attention that could be spontaneously recalled in serial order. These tasks were calibrated against traditional measures of simple and complex span. Clear age-related changes in these primary memory estimates were observed. There were marked individual differences in primary memory capacity, but each novel measure was predictive of simple span performance. Among older children, each measure shared variance with reading and mathematics performance, whereas for younger children, the interleaved lists task was the strongest single predictor of academic ability. We argue that these novel tasks have considerable potential for the measurement of primary memory capacity and provide new, complementary ways of measuring the transient memory processes that predict academic performance. The interleaved lists task also shared features with interference control tasks, and our findings suggest that young children have a particular difficulty in resisting distraction and that variance in the ability to resist distraction is also shared with measures of educational attainment. PMID:26075630

  15. The developmental influence of primary memory capacity on working memory and academic achievement.

    PubMed

    Hall, Debbora; Jarrold, Christopher; Towse, John N; Zarandi, Amy L

    2015-08-01

    In this study, we investigate the development of primary memory capacity among children. Children between the ages of 5 and 8 completed 3 novel tasks (split span, interleaved lists, and a modified free-recall task) that measured primary memory by estimating the number of items in the focus of attention that could be spontaneously recalled in serial order. These tasks were calibrated against traditional measures of simple and complex span. Clear age-related changes in these primary memory estimates were observed. There were marked individual differences in primary memory capacity, but each novel measure was predictive of simple span performance. Among older children, each measure shared variance with reading and mathematics performance, whereas for younger children, the interleaved lists task was the strongest single predictor of academic ability. We argue that these novel tasks have considerable potential for the measurement of primary memory capacity and provide new, complementary ways of measuring the transient memory processes that predict academic performance. The interleaved lists task also shared features with interference control tasks, and our findings suggest that young children have a particular difficulty in resisting distraction and that variance in the ability to resist distraction is also shared with measures of educational attainment. (c) 2015 APA, all rights reserved.

  16. Measuring Transactive Memory Systems Using Network Analysis

    ERIC Educational Resources Information Center

    King, Kylie Goodell

    2017-01-01

    Transactive memory systems (TMSs) describe the structures and processes that teams use to share information, work together, and accomplish shared goals. First introduced over three decades ago, TMSs have been measured in a variety of ways. This dissertation proposes the use of network analysis in measuring TMS. This is accomplished by describing…

  17. Operator Influence of Unexploded Ordnance Sensor Technologies

    DTIC Science & Technology

    2007-03-01

    chart display ActiveX control Mscomct2.dll – date/time display ActiveX control Pnpscr.dll – Systran SCRAMNet replicated shared memory device...response value database rgm_p2.dll – Phase 2 shared memory API and implementation Commercial components StripM.ocx – strip chart display ActiveX

  18. Runtime support for parallelizing data mining algorithms

    NASA Astrophysics Data System (ADS)

    Jin, Ruoming; Agrawal, Gagan

    2002-03-01

    With recent technological advances, shared memory parallel machines have become more scalable, and offer large main memories and high bus bandwidths. They are emerging as good platforms for data warehousing and data mining. In this paper, we focus on shared memory parallelization of data mining algorithms. We have developed a series of techniques for parallelization of data mining algorithms, including full replication, full locking, fixed locking, optimized full locking, and cache-sensitive locking. Unlike previous work on shared memory parallelization of specific data mining algorithms, all of our techniques apply to a large number of common data mining algorithms. In addition, we propose a reduction-object based interface for specifying a data mining algorithm. We show how our runtime system can apply any of the techniques we have developed starting from a common specification of the algorithm.
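
    Two of the named strategies, full replication and full locking, can be sketched around a reduction-object interface. The ReductionObject and LockedReductionObject classes, the item-counting workload and the thread setup are assumptions for illustration, not the authors' runtime; under CPython's GIL the sketch shows structure rather than real speedup.

```python
import threading
from collections import Counter


class ReductionObject:
    """Hypothetical reduction object: named counters updated via accumulate()."""

    def __init__(self):
        self.counts = Counter()

    def accumulate(self, key, amount=1):
        self.counts[key] += amount

    def merge(self, other):
        self.counts.update(other.counts)  # Counter.update adds counts


class LockedReductionObject(ReductionObject):
    """Full locking: one lock guards each element of a single shared object."""

    def __init__(self, keys):
        super().__init__()
        self.counts = Counter({k: 0 for k in keys})  # pre-create every element
        self._locks = {k: threading.Lock() for k in keys}

    def accumulate(self, key, amount=1):
        with self._locks[key]:
            self.counts[key] += amount


def count_items(transactions, red):
    """Algorithm-specific part: count item occurrences (e.g. support counting)."""
    for txn in transactions:
        for item in txn:
            red.accumulate(item)


def run_threads(chunks, targets):
    threads = [threading.Thread(target=count_items, args=(c, r))
               for c, r in zip(chunks, targets)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


def full_replication(chunks):
    parts = [ReductionObject() for _ in chunks]  # private copy per thread
    run_threads(chunks, parts)
    result = ReductionObject()
    for part in parts:
        result.merge(part)                       # combine after all threads finish
    return result


def full_locking(chunks, keys):
    shared = LockedReductionObject(keys)
    run_threads(chunks, [shared] * len(chunks))  # every thread updates one object
    return shared


data = [["a", "b"], ["b", "c"], ["a", "c"], ["a", "b", "c"], ["a"]]
chunks = [data[:2], data[2:]]
replicated = full_replication(chunks)
locked = full_locking(chunks, keys={"a", "b", "c"})
print(dict(replicated.count), dict(locked.counts)) if False else print(dict(replicated.counts), dict(locked.counts))
```

    Both strategies give the same counts; in this model replication trades memory (one copy per thread) for lock-free updates, while full locking keeps one copy but takes a lock on every update. Fixed locking, optimized full locking and cache-sensitive locking are not shown.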

  19. Concurrent working memory load can facilitate selective attention: evidence for specialized load.

    PubMed

    Park, Soojin; Kim, Min-Shik; Chun, Marvin M

    2007-10-01

    Load theory predicts that concurrent working memory load impairs selective attention and increases distractor interference (N. Lavie, A. Hirst, J. W. de Fockert, & E. Viding). Here, the authors present new evidence that the type of concurrent working memory load determines whether load impairs selective attention or not. Working memory load was paired with a same/different matching task that required focusing on targets while ignoring distractors. When working memory items shared the same limited-capacity processing mechanisms with targets in the matching task, distractor interference increased. However, when working memory items shared processing with distractors in the matching task, distractor interference decreased, facilitating target selection. A specialized load account is proposed to describe the dissociable effects of working memory load on selective processing depending on whether the load overlaps with targets or with distractors. (c) 2007 APA

  20. Internet Technology in Magnetic Resonance: A Common Gateway Interface Program for the World-Wide Web NMR Spectrometer

    NASA Astrophysics Data System (ADS)

    Buszko, Marian L.; Buszko, Dominik; Wang, Daniel C.

    1998-04-01

    A custom-written Common Gateway Interface (CGI) program for remote control of an NMR spectrometer using a World Wide Web browser is described. The program, running on a UNIX workstation, uses multiple processes to handle concurrent tasks of interacting with the user and with the spectrometer. The program's parent process communicates with the browser and sends out commands to the spectrometer; the child process is mainly responsible for data acquisition. Communication between the processes is via the shared memory mechanism. The WWW pages that have been developed for the system make use of the frames feature of web browsers. The CGI program provides an intuitive user interface to the NMR spectrometer, making, in effect, a complex system an easy-to-use Web appliance.
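
    The parent/child exchange through shared memory can be sketched with Python's multiprocessing.shared_memory module (Python 3.8+). The segment layout, the READY flag and the function names are hypothetical. For brevity both handles live in one process: the "child" function attaches to the segment by name exactly as a real child process would. A real two-process version would also need proper synchronization, such as a semaphore, which this sketch omits.

```python
import struct
from multiprocessing import shared_memory

# Hypothetical segment layout: [status: int32][n_points: int32][samples: float64 * MAX_POINTS]
HEADER = struct.Struct("ii")
MAX_POINTS = 1024
SEGMENT_SIZE = HEADER.size + 8 * MAX_POINTS
READY = 1


def child_store_samples(segment_name, samples):
    """Child side: attach to the segment by name and publish acquired data."""
    shm = shared_memory.SharedMemory(name=segment_name)
    try:
        n = min(len(samples), MAX_POINTS)
        struct.pack_into(f"{n}d", shm.buf, HEADER.size, *samples[:n])
        HEADER.pack_into(shm.buf, 0, READY, n)  # set the flag only after the data
    finally:
        shm.close()


def parent_read_samples(shm):
    """Parent side: return the samples if the child has flagged them ready."""
    status, n = HEADER.unpack_from(shm.buf, 0)
    if status != READY:
        return None
    return list(struct.unpack_from(f"{n}d", shm.buf, HEADER.size))


shm = shared_memory.SharedMemory(create=True, size=SEGMENT_SIZE)
try:
    HEADER.pack_into(shm.buf, 0, 0, 0)              # status 0: nothing acquired yet
    before = parent_read_samples(shm)               # None
    child_store_samples(shm.name, [0.5, 1.0, 1.5])  # would run in the child process
    after = parent_read_samples(shm)                # [0.5, 1.0, 1.5]
finally:
    shm.close()
    shm.unlink()
print(before, after)
```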

  1. Sharing programming resources between Bio* projects through remote procedure call and native call stack strategies.

    PubMed

    Prins, Pjotr; Goto, Naohisa; Yates, Andrew; Gautier, Laurent; Willis, Scooter; Fields, Christopher; Katayama, Toshiaki

    2012-01-01

    Open-source software (OSS) encourages computer programmers to reuse software components written by others. In evolutionary bioinformatics, OSS comes in a broad range of programming languages, including C/C++, Perl, Python, Ruby, Java, and R. To avoid writing the same functionality multiple times for different languages, it is possible to share components by bridging computer languages and Bio* projects, such as BioPerl, Biopython, BioRuby, BioJava, and R/Bioconductor. In this chapter, we compare the two principal approaches for sharing software between different programming languages: either by remote procedure call (RPC) or by sharing a local call stack. RPC provides a language-independent protocol over a network interface; examples are RSOAP and Rserve. The local call stack provides a between-language mapping not over the network interface, but directly in computer memory; examples are R bindings, RPy, and languages sharing the Java Virtual Machine stack. This functionality provides strategies for sharing of software between Bio* projects, which can be exploited more often. Here, we present cross-language examples for sequence translation, and measure throughput of the different options. We compare calling into R through native R, RSOAP, Rserve, and RPy interfaces, with the performance of native BioPerl, Biopython, BioJava, and BioRuby implementations, and with call stack bindings to BioJava and the European Molecular Biology Open Software Suite. In general, call stack approaches outperform native Bio* implementations and these, in turn, outperform RPC-based approaches. To test and compare strategies, we provide a downloadable BioNode image with all examples, tools, and libraries included. The BioNode image can be run on VirtualBox-supported operating systems, including Windows, OSX, and Linux.
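
    The two sharing strategies compared in the chapter can be contrasted in a few lines of Python. This is a sketch, not one of the chapter's benchmarks: the deliberately tiny codon table, the translate function and the use of the standard library's XML-RPC modules (in place of RSOAP or Rserve) are assumptions chosen so the example is self-contained.

```python
import threading
import time
from xmlrpc.client import ServerProxy
from xmlrpc.server import SimpleXMLRPCServer

# Small subset of the standard genetic code; unknown codons map to "X".
CODONS = {"ATG": "M", "TGG": "W", "TTT": "F", "TTC": "F",
          "TAA": "*", "TAG": "*", "TGA": "*"}


def translate(seq):
    """Translate a DNA sequence codon by codon."""
    return "".join(CODONS.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3))


def time_calls(fn, arg, n=100):
    """Return the last result and the mean wall-clock time per call."""
    start = time.perf_counter()
    for _ in range(n):
        out = fn(arg)
    return out, (time.perf_counter() - start) / n


# "Call stack" style: same process, direct function call.
local_out, local_t = time_calls(translate, "ATGTTTTGGTAA")

# RPC style: the same function behind an XML-RPC endpoint on localhost.
server = SimpleXMLRPCServer(("127.0.0.1", 0), logRequests=False)
server.register_function(translate, "translate")
threading.Thread(target=server.serve_forever, daemon=True).start()
port = server.server_address[1]
with ServerProxy(f"http://127.0.0.1:{port}/") as proxy:
    rpc_out, rpc_t = time_calls(proxy.translate, "ATGTTTTGGTAA")
server.shutdown()
server.server_close()

print(local_out, rpc_out)  # MFW* MFW*
print(f"local {local_t * 1e6:.1f} us/call, RPC {rpc_t * 1e6:.1f} us/call")
```

    Timings vary by machine. The RPC path pays for serialization and a loopback network round trip on every call, which is consistent with the abstract's finding that call-stack approaches outperform RPC-based ones.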

  2. Transactive memory systems scale for couples: development and validation

    PubMed Central

    Hewitt, Lauren Y.; Roberts, Lynne D.

    2015-01-01

    People in romantic relationships can develop shared memory systems by pooling their cognitive resources, allowing each person access to more information but with less cognitive effort. Research examining such memory systems in romantic couples largely focuses on remembering word lists or performing lab-based tasks, but these types of activities do not capture the processes underlying couples’ transactive memory systems, and may not be representative of the ways in which romantic couples use their shared memory systems in everyday life. We adapted an existing measure of transactive memory systems for use with romantic couples (TMSS-C), and conducted an initial validation study. In total, 397 participants who each identified as being a member of a romantic relationship of at least 3 months duration completed the study. The data provided a good fit to the anticipated three-factor structure of the components of couples’ transactive memory systems (specialization, credibility and coordination), and there was reasonable evidence of both convergent and divergent validity, as well as strong evidence of test–retest reliability across a 2-week period. The TMSS-C provides a valuable tool that can quickly and easily capture the underlying components of romantic couples’ transactive memory systems. It has potential to help us better understand this intriguing feature of romantic relationships, and how shared memory systems might be associated with other important features of romantic relationships. PMID:25999873

  3. DMA shared byte counters in a parallel computer

    DOEpatents

    Chen, Dong; Gara, Alan G.; Heidelberger, Philip; Vranas, Pavlos

    2010-04-06

    A parallel computer system is constructed as a network of interconnected compute nodes. Each of the compute nodes includes at least one processor, a memory and a DMA engine. The DMA engine includes a processor interface for interfacing with the at least one processor, DMA logic, a memory interface for interfacing with the memory, a DMA network interface for interfacing with the network, injection and reception byte counters, injection and reception FIFO metadata, and status registers and control registers. The injection FIFOs maintain memory locations of the injection FIFO metadata memory locations including its current head and tail, and the reception FIFOs maintain the reception FIFO metadata memory locations including its current head and tail. The injection byte counters and reception byte counters may be shared between messages.
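
    How a byte counter shared between messages might be used can be sketched in software. The ByteCounter class and its methods are hypothetical. The convention they model, where software preloads the expected byte count, the DMA engine decrements the counter as data arrive, and zero signals completion, is an assumption for illustration rather than a detail stated in the abstract.

```python
import threading


class ByteCounter:
    """Hypothetical DMA byte counter that several messages may share."""

    def __init__(self):
        self._remaining = 0
        self._lock = threading.Lock()
        self._done = threading.Condition(self._lock)

    def add_message(self, nbytes):
        """Software preloads the expected length of each message."""
        with self._lock:
            self._remaining += nbytes

    def on_packet(self, nbytes):
        """DMA-engine side: decrement as payload bytes land in memory."""
        with self._lock:
            self._remaining -= nbytes
            if self._remaining == 0:
                self._done.notify_all()

    def wait_all(self, timeout=None):
        """True once every message charged to this counter has arrived."""
        with self._lock:
            return self._done.wait_for(lambda: self._remaining == 0, timeout)

    @property
    def remaining(self):
        with self._lock:
            return self._remaining


counter = ByteCounter()
messages = {"msg-A": [256, 256], "msg-B": [128]}  # packet sizes per message
for packets in messages.values():
    counter.add_message(sum(packets))             # all preloads happen before delivery


def deliver(packets):
    for size in packets:
        counter.on_packet(size)


senders = [threading.Thread(target=deliver, args=(p,)) for p in messages.values()]
for t in senders:
    t.start()
all_arrived = counter.wait_all(timeout=5.0)
for t in senders:
    t.join()
print(all_arrived)  # True
```

    In this model, sharing one counter saves counter resources, but it only signals when all messages charged to it have arrived, not which one finished first.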

  4. Division of attention as a function of the number of steps, visual shifts, and memory load

    NASA Technical Reports Server (NTRS)

    Chechile, R. A.; Butler, K.; Gutowski, W.; Palmer, E. A.

    1986-01-01

    The effects on divided attention of visual shifts and long-term memory retrieval during a monitoring task are considered. A concurrent vigilance task was standardized under all experimental conditions. The results show that subjects can perform nearly perfectly on all of the time-shared tasks if long-term memory retrieval is not required for monitoring. With the requirement of memory retrieval, however, there was a large decrease in accuracy for all of the time-shared activities. It was concluded that the attentional demand of long-term memory retrieval is appreciable (even for a well-learned motor sequence), and thus memory retrieval results in a sizable reduction in the capability of subjects to divide their attention. A selected bibliography on the divided attention literature is provided.

  5. [Series: Medical Applications of the PHITS Code (2): Acceleration by Parallel Computing].

    PubMed

    Furuta, Takuya; Sato, Tatsuhiko

    2015-01-01

    Time-consuming Monte Carlo dose calculation has become feasible owing to the development of computer technology. However, this recent development is due to the emergence of multi-core high performance computers. Therefore, parallel computing has become key to achieving good performance of software programs. The Monte Carlo simulation code PHITS contains two parallel computing functions: distributed-memory parallelization using the Message Passing Interface (MPI) protocol and shared-memory parallelization using Open Multi-Processing (OpenMP) directives. Users can choose between the two functions according to their needs. This paper explains the two functions along with their advantages and disadvantages. Some test applications are also provided to show their performance on a typical multi-core high performance workstation.

  6. RC64, a Rad-Hard Many-Core High-Performance DSP for Space Applications

    NASA Astrophysics Data System (ADS)

    Ginosar, Ran; Aviely, Peleg; Gellis, Hagay; Liran, Tuvia; Israeli, Tsvika; Nesher, Roy; Lange, Fredy; Dobkin, Reuven; Meirov, Henri; Reznik, Dror

    2015-09-01

    RC64, a novel rad-hard 64-core signal processing chip, targets DSP performance of 75 GMACs (16-bit), 150 GOPS and 38 single precision GFLOPS while dissipating less than 10 Watts. RC64 integrates advanced DSP cores with a multi-bank shared memory and a hardware scheduler, also supporting DDR2/3 memory and twelve 3.125 Gbps full duplex high speed serial links using SpaceFibre and other protocols. The programming model employs sequential fine-grain tasks and a separate task map to define task dependencies. RC64 is implemented as a 300 MHz integrated circuit in 65nm CMOS technology, assembled in a hermetically sealed ceramic CCGA624 package and qualified to the highest space standards.
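
    The task-map programming model can be approximated in software with Python's graphlib module (Python 3.9+). The task names, the example map and the wave-by-wave dispatch loop are illustrative assumptions; the abstract says only that a hardware scheduler consumes a separate map of task dependencies, not how it dispatches.

```python
from graphlib import TopologicalSorter

# Hypothetical task map: each task lists the tasks that must finish before it.
task_map = {
    "load": [],
    "fft_a": ["load"],
    "fft_b": ["load"],
    "multiply": ["fft_a", "fft_b"],
    "ifft": ["multiply"],
    "store": ["ifft"],
}


def schedule(task_map, n_cores):
    """Dispatch ready tasks in waves of at most n_cores; each task runs to completion."""
    sorter = TopologicalSorter(task_map)
    sorter.prepare()
    pending, waves = [], []
    while sorter.is_active():
        pending.extend(sorted(sorter.get_ready()))  # newly unblocked tasks
        wave, pending = pending[:n_cores], pending[n_cores:]
        waves.append(wave)
        sorter.done(*wave)                          # completion unblocks successors
    return waves


for cores in (1, 2):
    print(cores, schedule(task_map, cores))
```

    With two cores the independent fft_a and fft_b tasks share a wave, giving five waves instead of six; keeping dependencies in a separate map lets the scheduler find that parallelism without the tasks themselves synchronizing.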

  7. RC64, a Rad-Hard Many-Core High-Performance DSP for Space Applications

    NASA Astrophysics Data System (ADS)

    Ginosar, Ran; Aviely, Peleg; Liran, Tuvia; Alon, Dov; Mandler, Alberto; Lange, Fredy; Dobkin, Reuven; Goldberg, Miki

    2014-08-01

    RC64, a novel rad-hard 64-core signal processing chip, targets DSP performance of 75 GMACs (16-bit), 150 GOPS and 20 single precision GFLOPS while dissipating less than 10 Watts. RC64 integrates advanced DSP cores with a multi-bank shared memory and a hardware scheduler, also supporting DDR2/3 memory and twelve 2.5 Gbps full duplex high speed serial links using SpaceFibre and other protocols. The programming model employs sequential fine-grain tasks and a separate task map to define task dependencies. RC64 is implemented as a 300 MHz integrated circuit in 65nm CMOS technology, assembled in a hermetically sealed ceramic CCGA624 package and qualified to the highest space standards.

  8. Welcoming Nora: a family event.

    PubMed

    Walsh, Allison J; Walsh, Paul R; Walsh, Jane M; Walsh, Gavin T

    2011-01-01

    In this column, Allison and Paul Walsh share the story of the birth of Nora, their third baby and their second child to be born at home. Allison and Paul share their individual memories of labor and birth. But their story is only part of the story of Nora's birth. Nora's birth was a family event, with Allison and Paul's other children very much part of the experience. Jane and Gavin share their own memories of their baby sister's birth.

  9. Experiences using OpenMP based on Computer Directed Software DSM on a PC Cluster

    NASA Technical Reports Server (NTRS)

    Hess, Matthias; Jost, Gabriele; Mueller, Matthias; Ruehle, Roland

    2003-01-01

    In this work we report on our experiences running OpenMP programs on a commodity cluster of PCs running a software distributed shared memory (DSM) system. We describe our test environment and report on the performance of a subset of the NAS Parallel Benchmarks that have been automatically parallelized for OpenMP. We compare the performance of the OpenMP implementations with that of their message passing counterparts and discuss performance differences.

  10. Colouring in the Blanks: Memory Drawings of the 1990 Kuwait Invasion

    ERIC Educational Resources Information Center

    Pepin-Wakefield, Yvonne

    2009-01-01

    This study used drawing tasks to examine the similarities and differences between females and males who shared a collective traumatic event in early childhood. Could these childhood memories be recorded, measured, and compared for gender differences in drawings by young adults who had shared a similar experience as children? Exploration of this…

  11. Functions of Memory Sharing and Mother-Child Reminiscing Behaviors: Individual and Cultural Variations

    ERIC Educational Resources Information Center

    Kulkofsky, Sarah; Wang, Qi; Koh, Jessie Bee Kim

    2009-01-01

    This study examined maternal beliefs about the functions of memory sharing and the relations between these beliefs and mother-child reminiscing behaviors in a cross-cultural context. Sixty-three European American and 47 Chinese mothers completed an open-ended questionnaire concerning their beliefs about the functions of parent-child memory…

  12. Stillbirth and stigma: the spoiling and repair of multiple social identities.

    PubMed

    Brierley-Jones, Lyn; Crawley, Rosalind; Lomax, Samantha; Ayers, Susan

    This study investigated mothers' experiences surrounding stillbirth in the United Kingdom, their memory making and sharing opportunities, and the effect these opportunities had on them. Qualitative data were generated from free text responses to open-ended questions. Thematic content analysis revealed that "stigma" was experienced by most women and Goffman's (1963) work on stigma was subsequently used as an analytical framework. Results suggest that stillbirth can spoil the identities of "patient," "mother," and "full citizen." Stigma was reported as arising from interactions with professionals, family, friends, work colleagues, and even casual acquaintances. Stillbirth produces common learning experiences often requiring "identity work" (Murphy, 2012). Memory making and sharing may be important in this work and further research is needed. Stigma can reduce the memory sharing opportunities for women after stillbirth and this may explain some of the differential mental health effects of memory making after stillbirth that are documented in the literature.

  13. Parallelization of KENO-Va Monte Carlo code

    NASA Astrophysics Data System (ADS)

    Ramón, Javier; Peña, Jorge

    1995-07-01

    KENO-Va is a code integrated within the SCALE system developed by Oak Ridge that solves the transport equation through the Monte Carlo method. It is being used at the Consejo de Seguridad Nuclear (CSN) to perform criticality calculations for fuel storage pools and shipping casks. Two parallel versions of the code have been generated: one for shared memory machines and the other for distributed memory systems using the message-passing interface PVM. In both versions the neutrons of each generation are tracked in parallel. In order to preserve the reproducibility of the results in both versions, advanced seeds for random numbers were used. The CONVEX C3440 with four processors and shared memory at CSN was used to implement the shared memory version. An FDDI network of six HP 9000/735 workstations was employed to implement the message-passing version using proprietary PVM. The speedup obtained was 3.6 in both cases.
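The abstract attributes reproducibility to "advanced seeds" for the random numbers but does not give the scheme. One standard way to get that property, assumed here, is skip-ahead. Each neutron history gets a fixed block of `STRIDE` numbers in a single linear congruential stream, and the generator jumps straight to the start of block k. The result then cannot depend on how histories are split among processors. The constants are those of Knuth's MMIX generator. `STRIDE`, `history` and its absorption probability are hypothetical stand-ins, not KENO-Va code.

```python
M = 2**64
A = 6364136223846793005   # multiplier of Knuth's MMIX LCG
C = 1442695040888963407   # increment of Knuth's MMIX LCG
STRIDE = 1000  # hypothetical: most random numbers a single history may consume


def lcg_skip(seed: int, k: int) -> int:
    """Return the LCG state k steps after `seed` in O(log k) operations."""
    acc_mult, acc_plus = 1, 0
    cur_mult, cur_plus = A, C
    while k > 0:
        if k & 1:
            acc_mult = (acc_mult * cur_mult) % M
            acc_plus = (acc_plus * cur_mult + cur_plus) % M
        # Square the current step: apply x -> cur_mult*x + cur_plus twice.
        cur_plus = ((cur_mult + 1) * cur_plus) % M
        cur_mult = (cur_mult * cur_mult) % M
        k >>= 1
    return (acc_mult * seed + acc_plus) % M


def history(seed: int) -> int:
    """Toy 'neutron history': count steps until absorption (hypothetical physics)."""
    state = seed
    for step in range(1, STRIDE + 1):
        state = (A * state + C) % M
        if state / M < 0.3:  # uniform draw in [0, 1); absorbed with probability 0.3
            return step
    return STRIDE


def tally(seed0: int, n_histories: int, n_workers: int) -> int:
    """Total path length, with histories dealt round-robin to n_workers."""
    total = 0
    for worker in range(n_workers):
        for k in range(worker, n_histories, n_workers):
            # History k always starts k * STRIDE steps into the one global stream.
            total += history(lcg_skip(seed0, k * STRIDE))
    return total
```

For a fixed seed and history count, `tally` returns the same total with 1, 3 or 4 workers. This is the kind of agreement between serial and parallel runs that the abstract describes. The round-robin split inside `tally` only imitates distributing the work. It runs sequentially.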

  14. Brain Information Sharing During Visual Short-Term Memory Binding Yields a Memory Biomarker for Familial Alzheimer's Disease.

    PubMed

    Parra, Mario A; Mikulan, Ezequiel; Trujillo, Natalia; Sala, Sergio Della; Lopera, Francisco; Manes, Facundo; Starr, John; Ibanez, Agustin

    2017-01-01

    Alzheimer's disease (AD) has been described as a disconnection syndrome which disrupts both brain information sharing and memory binding functions. The extent to which these two phenotypic expressions share pathophysiological mechanisms remains unknown. The study aimed to unveil the electrophysiological correlates of integrative memory impairments in AD towards new memory biomarkers for its prodromal stages. Patients with 100% risk of familial AD (FAD) and healthy controls underwent assessment with the Visual Short-Term Memory binding test (VSTMBT) while we recorded their EEG. We applied a novel brain connectivity method (Weighted Symbolic Mutual Information) to EEG data. Patients showed significant deficits during the VSTMBT. A reduction of brain connectivity was observed during resting as well as during correct VSTM binding, particularly over frontal and posterior regions. An increase of connectivity was found during VSTM binding performance over central regions. While decreased connectivity was found in cases in more advanced stages of FAD, increased brain connectivity appeared in cases in earlier stages. Such altered patterns of task-related connectivity were found in 89% of the assessed patients. VSTM binding deficits in the prodromal stages of FAD are associated with altered patterns of brain connectivity, thus confirming the link between integrative memory deficits and impaired brain information sharing in prodromal FAD. While significant loss of brain connectivity seems to be a feature of the advanced stages of FAD, increased brain connectivity characterizes its earlier stages. These findings are discussed in the light of recent proposals about the earliest pathophysiological mechanisms of AD and their clinical expression. Copyright © Bentham Science Publishers.

  15. Decomposing the relationship between cognitive functioning and self-referent memory beliefs in older adulthood: What’s memory got to do with it?

    PubMed Central

    Payne, Brennan R.; Gross, Alden L.; Hill, Patrick L.; Parisi, Jeanine M.; Rebok, George W.; Stine-Morrow, Elizabeth A. L.

    2018-01-01

    With advancing age, episodic memory performance shows marked declines along with concurrent reports of lower subjective memory beliefs. Given that normative age-related declines in episodic memory co-occur with declines in other cognitive domains, we examined the relationship between memory beliefs and multiple domains of cognitive functioning. Confirmatory bi-factor structural equation models were used to parse the shared and independent variance among factors representing episodic memory, psychomotor speed, and executive reasoning in one large cohort study (Senior Odyssey, N = 462), and replicated using another large cohort of healthy older adults (ACTIVE, N = 2,802). Accounting for a general fluid cognitive functioning factor (comprised of the shared variance among measures of episodic memory, speed, and reasoning) attenuated the relationship between objective memory performance and subjective memory beliefs in both samples. Moreover, the general cognitive functioning factor was the strongest predictor of memory beliefs in both samples. These findings are consistent with the notion that dispositional memory beliefs may reflect perceptions of cognition more broadly. This may be one reason why memory beliefs have broad predictive validity for interventions that target fluid cognitive ability. PMID:27685541

  16. Decomposing the relationship between cognitive functioning and self-referent memory beliefs in older adulthood: what's memory got to do with it?

    PubMed

    Payne, Brennan R; Gross, Alden L; Hill, Patrick L; Parisi, Jeanine M; Rebok, George W; Stine-Morrow, Elizabeth A L

    2017-07-01

    With advancing age, episodic memory performance shows marked declines along with concurrent reports of lower subjective memory beliefs. Given that normative age-related declines in episodic memory co-occur with declines in other cognitive domains, we examined the relationship between memory beliefs and multiple domains of cognitive functioning. Confirmatory bi-factor structural equation models were used to parse the shared and independent variance among factors representing episodic memory, psychomotor speed, and executive reasoning in one large cohort study (Senior Odyssey, N = 462), and replicated using another large cohort of healthy older adults (ACTIVE, N = 2802). Accounting for a general fluid cognitive functioning factor (comprised of the shared variance among measures of episodic memory, speed, and reasoning) attenuated the relationship between objective memory performance and subjective memory beliefs in both samples. Moreover, the general cognitive functioning factor was the strongest predictor of memory beliefs in both samples. These findings are consistent with the notion that dispositional memory beliefs may reflect perceptions of cognition more broadly. This may be one reason why memory beliefs have broad predictive validity for interventions that target fluid cognitive ability.

  17. Method for prefetching non-contiguous data structures

    DOEpatents

    Blumrich, Matthias A [Ridgefield, CT; Chen, Dong [Croton On Hudson, NY; Coteus, Paul W [Yorktown Heights, NY; Gara, Alan G [Mount Kisco, NY; Giampapa, Mark E [Irvington, NY; Heidelberger, Philip [Cortlandt Manor, NY; Hoenicke, Dirk [Ossining, NY; Ohmacht, Martin [Brewster, NY; Steinmacher-Burow, Burkhard D [Mount Kisco, NY; Takken, Todd E [Mount Kisco, NY; Vranas, Pavlos M [Bedford Hills, NY

    2009-05-05

    Low-latency memory system access is provided in association with a weakly-ordered multiprocessor system. Each processor in the multiprocessor shares resources, and each shared resource has an associated lock within a locking device that provides support for synchronization between the multiple processors in the multiprocessor and the orderly sharing of the resources. A processor only has permission to access a resource when it owns the lock associated with that resource. An attempt by a processor to own a lock requires only a single load operation, rather than a traditional atomic load followed by a store: the processor performs only a read operation, and the hardware locking device, rather than the processor, performs the subsequent write operation. A simple prefetching scheme for non-contiguous data structures is also disclosed. A memory line is redefined so that, in addition to the normal physical memory data, every line includes a pointer that is large enough to point to any other line in the memory. These pointers, rather than some other predictive algorithm, are used to determine which memory line to prefetch. This enables hardware to effectively prefetch memory access patterns that are non-contiguous but repetitive.
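The abstract says each memory line carries a pointer naming the line to prefetch, but not how the pointer gets filled in. Purely for illustration, the sketch below assumes that hardware writes the most recently observed successor into the previous line's pointer field. It models the cache as an unbounded set that is flushed between traversals. `MemoryLine`, `PointerPrefetcher` and `traverse` are hypothetical names. This is not the patented hardware.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set, Tuple


@dataclass
class MemoryLine:
    data: int
    next_ptr: Optional[int] = None  # extra per-line pointer field (assumed layout)


class PointerPrefetcher:
    def __init__(self, memory: Dict[int, MemoryLine]):
        self.memory = memory
        self.cache: Set[int] = set()
        self.prev: Optional[int] = None
        self.hits = 0
        self.misses = 0

    def flush(self) -> None:
        """Empty the cache, e.g. between repetitions of a traversal."""
        self.cache.clear()
        self.prev = None

    def access(self, addr: int) -> int:
        if addr in self.cache:
            self.hits += 1
        else:
            self.misses += 1
            self.cache.add(addr)
        # Assumption: hardware records the observed successor in the previous line's pointer.
        if self.prev is not None:
            self.memory[self.prev].next_ptr = addr
        self.prev = addr
        # Prefetch the line named by this line's pointer instead of guessing a stride.
        target = self.memory[addr].next_ptr
        if target is not None:
            self.cache.add(target)
        return self.memory[addr].data


def traverse(p: PointerPrefetcher, pattern: List[int]) -> Tuple[int, int]:
    """Run one traversal from a cold cache and return (hits, misses) for it."""
    p.flush()
    h0, m0 = p.hits, p.misses
    for addr in pattern:
        p.access(addr)
    return p.hits - h0, p.misses - m0
```

Take 16 lines and the irregular pattern `[0, 7, 3, 12, 5]`. The first traversal gives `(0, 5)` (hits, misses) while the pointers are being recorded. A repeat gives `(4, 1)`: only the first line misses, because each later line was fetched through its predecessor's pointer. A stride-based prefetcher would find no constant stride in this pattern.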

  18. Spaceborne Processor Array

    NASA Technical Reports Server (NTRS)

    Chow, Edward T.; Schatzel, Donald V.; Whitaker, William D.; Sterling, Thomas

    2008-01-01

    A Spaceborne Processor Array in Multifunctional Structure (SPAMS) can lower the total mass of the electronic and structural overhead of spacecraft, resulting in reduced launch costs, while increasing the science return through dynamic onboard computing. SPAMS integrates the multifunctional structure (MFS) and the Gilgamesh Memory, Intelligence, and Network Device (MIND) multi-core in-memory computer architecture into a single-system super-architecture. This transforms every inch of a spacecraft into a sharable, interconnected, smart computing element to increase computing performance while simultaneously reducing mass. The MIND in-memory architecture provides a foundation for high-performance, low-power, and fault-tolerant computing. The MIND chip has an internal structure that includes memory, processing, and communication functionality. The Gilgamesh is a scalable system comprising multiple MIND chips interconnected to operate as a single, tightly coupled, parallel computer. The array of MIND components shares a global, virtual name space for program variables and tasks that are allocated at run time to the distributed physical memory and processing resources. Individual processor-memory nodes can be activated or powered down at run time to provide active power management and to configure around faults. A SPAMS system comprises a distributed Gilgamesh array built into MFS, interfaces into instrument and communication subsystems, a mass storage interface, and a radiation-hardened flight computer.

  19. Improving the performance of heterogeneous multi-core processors by modifying the cache coherence protocol

    NASA Astrophysics Data System (ADS)

    Fang, Juan; Hao, Xiaoting; Fan, Qingwen; Chang, Zeqing; Song, Shuying

    2017-05-01

    In heterogeneous multi-core architectures, CPU and GPU processors are integrated on the same chip, which poses a new challenge to last-level cache management. In this architecture, the CPU application and the GPU application execute concurrently, both accessing the last-level cache (LLC). CPUs and GPUs have different memory access characteristics, so they differ in their sensitivity to LLC capacity. For many CPU applications, a reduced share of the LLC could lead to significant performance degradation. In contrast, GPU applications can tolerate increased memory access latency when there is sufficient thread-level parallelism. Taking this latency tolerance of GPU programs into account, this paper presents a method that lets GPU applications access memory directly, leaving more LLC space for CPU applications. This improves the performance of CPU applications without affecting the performance of GPU applications. When the CPU application is cache sensitive and the GPU application is insensitive to the cache, the overall performance of the system is improved significantly.
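The effect the paper exploits can be reproduced with a toy model. The Python sketch below assumes a single LRU last-level cache, a CPU thread cycling over a small working set, and a GPU thread streaming through addresses it never reuses. Setting `gpu_bypass=True` stands in for letting GPU requests go straight to memory. Capacity, working-set size and access mix are hypothetical. The paper's actual mechanism works through the coherence protocol, which is not modelled here.

```python
from collections import OrderedDict


class LRUCache:
    """Fully associative LRU cache holding line addresses only."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.lines: "OrderedDict[object, None]" = OrderedDict()

    def access(self, addr) -> bool:
        """Return True on a hit; on a miss, insert the line and evict the LRU one."""
        if addr in self.lines:
            self.lines.move_to_end(addr)
            return True
        self.lines[addr] = None
        if len(self.lines) > self.capacity:
            self.lines.popitem(last=False)
        return False


def cpu_hits(steps: int, gpu_bypass: bool, capacity: int = 8,
             cpu_working_set: int = 6, gpu_accesses_per_step: int = 2) -> int:
    """Count CPU hits in a shared LLC while a GPU streams through fresh addresses."""
    llc = LRUCache(capacity)
    hits = 0
    gpu_addr = 0
    for i in range(steps):
        if llc.access(("cpu", i % cpu_working_set)):
            hits += 1
        for _ in range(gpu_accesses_per_step):
            if not gpu_bypass:  # without bypass, streaming GPU lines are cached too
                llc.access(("gpu", gpu_addr))
            gpu_addr += 1
    return hits
```

Over 60 steps without bypass, the CPU gets 0 hits: 17 other lines are touched between reuses of any CPU line, more than the cache can hold. With bypass it gets 54 hits, and only the six compulsory misses remain.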

  20. Integrating Cache Performance Modeling and Tuning Support in Parallelization Tools

    NASA Technical Reports Server (NTRS)

    Waheed, Abdul; Yan, Jerry; Saini, Subhash (Technical Monitor)

    1998-01-01

    With the resurgence of distributed shared memory (DSM) systems based on cache-coherent Non Uniform Memory Access (ccNUMA) architectures and the increasing disparity between memory and processor speeds, data locality overheads are becoming the greatest bottlenecks to realizing the potential high performance of these systems. While parallelization tools and compilers help users port their sequential applications to a DSM system, considerable time and effort are needed to tune the memory performance of these applications to achieve reasonable speedup. In this paper, we show that integrating cache performance modeling and tuning support within a parallelization environment can alleviate this problem. The Cache Performance Modeling and Prediction Tool (CPMP) employs trace-driven simulation techniques without the overhead of generating and managing detailed address traces. CPMP predicts the cache performance impact of source code level "what-if" modifications in a program to assist a user in the tuning process. CPMP is built on top of a customized version of the Computer Aided Parallelization Tools (CAPTools) environment. Finally, we demonstrate how CPMP can be applied to tune a real Computational Fluid Dynamics (CFD) application.
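CPMP itself avoids generating full address traces. The sketch below does the opposite and generates every address, because the aim is only to show what a source-level "what-if" question looks like to a cache model. It assumes a row-major 64 x 64 array of 8-byte elements, 64-byte lines (8 elements), and a direct-mapped cache of 64 lines. All of these parameters are hypothetical. The sketch compares the original loop order with an interchanged one.

```python
def count_misses(n: int, order: str, elems_per_line: int = 8, num_sets: int = 64) -> int:
    """Simulate a direct-mapped cache over an n x n row-major array traversal.

    order="row": inner loop walks along a row (unit stride).
    order="col": inner loop walks down a column (stride of n elements).
    """
    if order not in ("row", "col"):
        raise ValueError("order must be 'row' or 'col'")
    cache = [None] * num_sets  # line number currently held by each set
    misses = 0
    for outer in range(n):
        for inner in range(n):
            row, col = (outer, inner) if order == "row" else (inner, outer)
            line = (row * n + col) // elems_per_line
            s = line % num_sets
            if cache[s] != line:
                cache[s] = line  # miss: fetch the line, replacing whatever was there
                misses += 1
    return misses
```

`count_misses(64, "row")` gives 512 misses, one per line. `count_misses(64, "col")` gives 4096, so every access misses. Rows 8 apart map to the same set and evict each other before the neighbouring column can reuse the line. Predicting this kind of difference before editing the code is the role the abstract assigns to CPMP.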

  1. A shared resource between declarative memory and motor memory.

    PubMed

    Keisler, Aysha; Shadmehr, Reza

    2010-11-03

    The neural systems that support motor adaptation in humans are thought to be distinct from those that support the declarative system. Yet, during motor adaptation changes in motor commands are supported by a fast adaptive process that has important properties (rapid learning, fast decay) that are usually associated with the declarative system. The fast process can be contrasted to a slow adaptive process that also supports motor memory, but learns gradually and shows resistance to forgetting. Here we show that after people stop performing a motor task, the fast motor memory can be disrupted by a task that engages declarative memory, but the slow motor memory is immune from this interference. Furthermore, we find that the fast/declarative component plays a major role in the consolidation of the slow motor memory. Because of the competitive nature of declarative and nondeclarative memory during consolidation, impairment of the fast/declarative component leads to improvements in the slow/nondeclarative component. Therefore, the fast process that supports formation of motor memory is not only neurally distinct from the slow process, but it shares critical resources with the declarative memory system.

  2. A shared resource between declarative memory and motor memory

    PubMed Central

    Keisler, Aysha; Shadmehr, Reza

    2010-01-01

    The neural systems that support motor adaptation in humans are thought to be distinct from those that support the declarative system. Yet, during motor adaptation changes in motor commands are supported by a fast adaptive process that has important properties (rapid learning, fast decay) that are usually associated with the declarative system. The fast process can be contrasted to a slow adaptive process that also supports motor memory, but learns gradually and shows resistance to forgetting. Here we show that after people stop performing a motor task, the fast motor memory can be disrupted by a task that engages declarative memory, but the slow motor memory is immune from this interference. Furthermore, we find that the fast/declarative component plays a major role in the consolidation of the slow motor memory. Because of the competitive nature of declarative and non-declarative memory during consolidation, impairment of the fast/declarative component leads to improvements in the slow/non-declarative component. Therefore, the fast process that supports formation of motor memory is not only neurally distinct from the slow process, but it shares critical resources with the declarative memory system. PMID:21048140

  3. Optimizing ROOT’s Performance Using C++ Modules

    NASA Astrophysics Data System (ADS)

    Vassilev, Vassil

    2017-10-01

    ROOT comes with a C++-compliant interpreter, cling. Cling needs to understand the content of the libraries in order to interact with them. Exposing the full shared library descriptors to the interpreter at runtime translates into an increased memory footprint. ROOT’s exploratory programming concepts allow implicit and explicit runtime shared library loading, which requires the interpreter to load the library descriptor. Re-parsing of descriptors’ content has a noticeable effect on runtime performance. The present state-of-the-art lazy parsing technique brings runtime performance to reasonable levels but proves to be fragile and can introduce correctness issues. An elegant solution is to load information from the descriptor lazily and in a non-recursive way. The LLVM community is advancing its C++ Modules technology, which provides an I/O-efficient, on-disk representation capable of reducing build times and peak memory usage. The feature is standardized as a C++ technical specification. C++ Modules are a flexible concept, which can be employed to match CMS and other experiments’ requirement for ROOT: to optimize both runtime memory usage and performance. Cling technically “inherits” the feature; however, tweaking it to ROOT scale and beyond is a complex endeavor. The paper discusses the status of C++ Modules in the context of ROOT, supported by a few preliminary performance results. It shows a step-by-step migration plan and describes potential challenges which could appear.
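The remedy the abstract proposes is to load descriptor information lazily and non-recursively. A toy descriptor can illustrate it. In the Python sketch below, a descriptor is a dictionary from declaration names to a placeholder body plus the names it depends on. A lookup materializes only the requested entry and records its dependencies by name instead of loading them. The entry format is hypothetical and unrelated to the real Clang module file format or to cling's API. The ROOT class names serve only as labels.

```python
from typing import Dict, List, Tuple

# Hypothetical descriptor contents: name -> (declaration text, names it refers to).
Entries = Dict[str, Tuple[str, List[str]]]


class LazyDescriptor:
    def __init__(self, entries: Entries):
        self._entries = entries
        self.loaded: Dict[str, dict] = {}
        self.deserializations = 0  # how many entries were actually materialized

    def lookup(self, name: str) -> dict:
        """Materialize `name` on first use; dependencies stay unloaded (non-recursive)."""
        if name not in self.loaded:
            text, deps = self._entries[name]
            self.deserializations += 1
            self.loaded[name] = {"decl": text, "deps": list(deps)}
        return self.loaded[name]


def load_eagerly(entries: Entries) -> Dict[str, dict]:
    """The alternative being avoided: materialize every entry up front."""
    return {name: {"decl": text, "deps": list(deps)} for name, (text, deps) in entries.items()}
```

Consider a four-entry descriptor where `TH1F` depends on `TH1`, which depends on `TNamed`, which depends on `TObject`. Looking up `TH1F` materializes one entry instead of four. `TH1` is materialized only if something asks for it.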

  4. VIRTUAL FRAME BUFFER INTERFACE

    NASA Technical Reports Server (NTRS)

    Wolfe, T. L.

    1994-01-01

    Large image processing systems use multiple frame buffers with differing architectures and vendor supplied user interfaces. This variety of architectures and interfaces creates software development, maintenance, and portability problems for application programs. The Virtual Frame Buffer Interface program makes all frame buffers appear as a generic frame buffer with a specified set of characteristics, allowing programmers to write code which will run unmodified on all supported hardware. The Virtual Frame Buffer Interface converts generic commands to actual device commands. The virtual frame buffer consists of a definition of capabilities and FORTRAN subroutines that are called by application programs. The virtual frame buffer routines may be treated as subroutines, logical functions, or integer functions by the application program. Routines are included that allocate and manage hardware resources such as frame buffers, monitors, video switches, trackballs, tablets and joysticks; access image memory planes; and perform alphanumeric font or text generation. The subroutines for the various "real" frame buffers are in separate VAX/VMS shared libraries allowing modification, correction or enhancement of the virtual interface without affecting application programs. The Virtual Frame Buffer Interface program was developed in FORTRAN 77 for a DEC VAX 11/780 or a DEC VAX 11/750 under VMS 4.X. It supports ADAGE IK3000, DEANZA IP8500, Low Resolution RAMTEK 9460, and High Resolution RAMTEK 9460 Frame Buffers. It has a central memory requirement of approximately 150K. This program was developed in 1985.

  5. Discrete-Slots Models of Visual Working-Memory Response Times

    PubMed Central

    Donkin, Christopher; Nosofsky, Robert M.; Gold, Jason M.; Shiffrin, Richard M.

    2014-01-01

    Much recent research has aimed to establish whether visual working memory (WM) is better characterized by a limited number of discrete all-or-none slots or by a continuous sharing of memory resources. To date, however, researchers have not considered the response-time (RT) predictions of discrete-slots versus shared-resources models. To complement the past research in this field, we formalize a family of mixed-state, discrete-slots models for explaining choice and RTs in tasks of visual WM change detection. In the tasks under investigation, a small set of visual items is presented, followed by a test item in 1 of the studied positions for which a change judgment must be made. According to the models, if the studied item in that position is retained in 1 of the discrete slots, then a memory-based evidence-accumulation process determines the choice and the RT; if the studied item in that position is missing, then a guessing-based accumulation process operates. Observed RT distributions are therefore theorized to arise as probabilistic mixtures of the memory-based and guessing distributions. We formalize an analogous set of continuous shared-resources models. The model classes are tested on individual subjects with both qualitative contrasts and quantitative fits to RT-distribution data. The discrete-slots models provide much better qualitative and quantitative accounts of the RT and choice data than do the shared-resources models, although there is some evidence for “slots plus resources” when memory set size is very small. PMID:24015956

  6. Multiprogramming performance degradation - Case study on a shared memory multiprocessor

    NASA Technical Reports Server (NTRS)

    Dimpsey, R. T.; Iyer, R. K.

    1989-01-01

    The performance degradation due to multiprogramming overhead is quantified for a parallel-processing machine. Measurements of real workloads were taken, and it was found that there is a moderate correlation between the completion time of a program and the amount of system overhead measured during program execution. Experiments in controlled environments were then conducted to calculate a lower bound on the performance degradation of parallel jobs caused by multiprogramming overhead. The results show that the multiprogramming overhead of parallel jobs consumes at least 4 percent of the processor time. When two or more serial jobs are introduced into the system, this amount increases to 5.3 percent.

  7. Parallel computation with the force

    NASA Technical Reports Server (NTRS)

    Jordan, H. F.

    1985-01-01

    A methodology, called the force, supports the construction of programs to be executed in parallel by a force of processes. The number of processes in the force is unspecified, but potentially very large. The force idea is embodied in a set of macros which produce multiprocessor FORTRAN code and has been studied on two shared memory multiprocessors of fairly different character. The method has simplified the writing of highly parallel programs within a limited class of parallel algorithms and is being extended to cover a broader class. The individual parallel constructs which comprise the force methodology are discussed. Of central concern are their semantics, implementation on different architectures and performance implications.
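The abstract describes every process in the force executing the same program, with parallel constructs deciding who does what. For illustration, the Python sketch below imitates three such constructs with threads. A prescheduled loop deals iteration i to process i mod nproc. A critical section guards a shared sum. A barrier follows, after which every process can read the completed result. The function and variable names are hypothetical. The Force itself is a set of FORTRAN macros, and CPython threads show the structure rather than a real speedup.

```python
import threading
from typing import Dict


def run_force(nproc: int, n: int) -> Dict[int, int]:
    """Sum i*i for i in range(n) with a force of nproc processes (threads here)."""
    shared_sum = [0]
    lock = threading.Lock()
    barrier = threading.Barrier(nproc)
    seen: Dict[int, int] = {}

    def body(me: int) -> None:
        # Prescheduled loop: process `me` takes iterations me, me + nproc, ...
        private = 0
        for i in range(me, n, nproc):
            private += i * i
        # Critical section: one process at a time updates the shared sum.
        with lock:
            shared_sum[0] += private
        # Barrier: no process reads the sum until every process has added to it.
        barrier.wait()
        seen[me] = shared_sum[0]

    force = [threading.Thread(target=body, args=(me,)) for me in range(nproc)]
    for t in force:
        t.start()
    for t in force:
        t.join()
    return seen
```

The body never mentions how many processes exist. As a result, `run_force(1, 100)` and `run_force(4, 100)` both report the same total, 328350, in every process.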

  8. Shared Representations in Language Processing and Verbal Short-Term Memory: The Case of Grammatical Gender

    ERIC Educational Resources Information Center

    Schweppe, Judith; Rummer, Ralf

    2007-01-01

    The general idea of language-based accounts of short-term memory is that retention of linguistic materials is based on representations within the language processing system. In the present sentence recall study, we address the question whether the assumption of shared representations holds for morphosyntactic information (here: grammatical gender…

  9. The Precategorical Nature of Visual Short-Term Memory

    ERIC Educational Resources Information Center

    Quinlan, Philip T.; Cohen, Dale J.

    2016-01-01

    We conducted a series of recognition experiments that assessed whether visual short-term memory (VSTM) is sensitive to shared category membership of to-be-remembered (tbr) images of common objects. In Experiment 1 some of the tbr items shared the same basic level category (e.g., hand axe): Such items were no better retained than others. In the…

  10. Fault tolerant onboard packet switch architecture for communication satellites: Shared memory per beam approach

    NASA Technical Reports Server (NTRS)

    Shalkhauser, Mary JO; Quintana, Jorge A.; Soni, Nitin J.

    1994-01-01

    The NASA Lewis Research Center is developing a multichannel communication signal processing satellite (MCSPS) system which will provide low data rate, direct to user, commercial communications services. The focus of current space segment developments is a flexible, high-throughput, fault tolerant onboard information switching processor. This information switching processor (ISP) is a destination-directed packet switch which performs both space and time switching to route user information among numerous user ground terminals. Through both industry study contracts and in-house investigations, several packet switching architectures were examined. A contention-free approach, the shared memory per beam architecture, was selected for implementation. The shared memory per beam architecture, fault tolerance insertion, implementation, and demonstration plans are described.

  11. The performance of disk arrays in shared-memory database machines

    NASA Technical Reports Server (NTRS)

    Katz, Randy H.; Hong, Wei

    1993-01-01

    In this paper, we examine how disk arrays and shared memory multiprocessors lead to an effective method for constructing database machines for general-purpose complex query processing. We show that disk arrays can lead to cost-effective storage systems if they are configured from suitably small form-factor disk drives. We introduce the storage system metric data temperature as a way to evaluate how well a disk configuration can sustain its workload, and we show that disk arrays can sustain the same data temperature as a more expensive mirrored-disk configuration. We use the metric to evaluate the performance of disk arrays in XPRS, an operational shared-memory multiprocessor database system being developed at the University of California, Berkeley.

  12. Internet Technology in Magnetic Resonance: A Common Gateway Interface Program for the World-Wide Web NMR Spectrometer

    PubMed

    Buszko; Buszko; Wang

    1998-04-01

    A custom-written Common Gateway Interface (CGI) program for remote control of an NMR spectrometer using a World Wide Web browser is described. The program, running on a UNIX workstation, uses multiple processes to handle concurrent tasks of interacting with the user and with the spectrometer. The program's parent process communicates with the browser and sends out commands to the spectrometer; the child process is mainly responsible for data acquisition. Communication between the processes is via the shared memory mechanism. The WWW pages that have been developed for the system make use of the frames feature of web browsers. The CGI program provides an intuitive user interface to the NMR spectrometer, making, in effect, a complex system an easy-to-use Web appliance. Copyright 1998 Academic Press.
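The abstract says only that the parent and child processes communicate through "the shared memory mechanism" on a UNIX workstation. The following is a minimal POSIX-only sketch of that pattern in Python. The parent creates an anonymous shared mapping with `mmap` and forks a child that stands in for data acquisition. After the child exits, the parent reads the samples back. The memory layout and the fake sample values are hypothetical. Unlike the described CGI program, this parent simply waits instead of serving the browser in the meantime.

```python
import mmap
import os
import struct
from typing import List


def acquire_in_child(n_points: int) -> List[float]:
    """Fork a stand-in 'acquisition' child and collect its samples via shared memory.

    POSIX only (os.fork). Hypothetical layout of the shared region:
    an 8-byte count of valid points, followed by n_points doubles.
    """
    shm = mmap.mmap(-1, 8 * (n_points + 1))  # anonymous MAP_SHARED mapping, inherited by the child
    pid = os.fork()
    if pid == 0:  # child process: pretend to read the spectrometer
        for i in range(n_points):
            struct.pack_into("d", shm, 8 * (i + 1), float(i * i))  # fake sample values
        struct.pack_into("q", shm, 0, n_points)  # publish how many points are valid
        os._exit(0)  # exit immediately so the child runs none of the parent's remaining code
    os.waitpid(pid, 0)  # parent: wait until acquisition is finished
    count = struct.unpack_from("q", shm, 0)[0]
    samples = [struct.unpack_from("d", shm, 8 * (i + 1))[0] for i in range(count)]
    shm.close()
    return samples
```

`acquire_in_child(5)` returns `[0.0, 1.0, 4.0, 9.0, 16.0]`, values written by the child and read by the parent through the shared mapping.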

  13. Optical memories in digital computing

    NASA Technical Reports Server (NTRS)

    Alford, C. O.; Gaylord, T. K.

    1979-01-01

    High capacity optical memories with relatively high data-transfer rate and multiport simultaneous access capability may serve as the basis for new computer architectures. Several computer structures that might profitably use such memories are: a) a simultaneous record-access system, b) a simultaneously-shared memory computer system, and c) a parallel digital processing structure.

  14. Experiences Using OpenMP Based on Compiler Directed Software DSM on a PC Cluster

    NASA Technical Reports Server (NTRS)

    Hess, Matthias; Jost, Gabriele; Mueller, Matthias; Ruehle, Roland; Biegel, Bryan (Technical Monitor)

    2002-01-01

    In this work we report on our experiences running OpenMP programs on a commodity cluster of PCs (personal computers) running a software distributed shared memory (DSM) system. We describe our test environment and report on the performance of a subset of the NAS (NASA Advanced Supercomputing) Parallel Benchmarks that have been automatically parallelized for OpenMP. We compare the performance of the OpenMP implementations with that of their message passing counterparts and discuss performance differences.

  15. Reader set encoding for directory of shared cache memory in multiprocessor system

    DOEpatents

    Ahn, Daniel; Ceze, Luis H.; Gara, Alan; Ohmacht, Martin; Zhuang, Xiaotong

    2014-06-10

    In a parallel processing system with speculative execution, conflict checking occurs in a directory lookup of a cache memory that is shared by all processors. In each case, the same physical memory address will map to the same set of that cache, no matter which processor originated that access. The directory includes a dynamic reader set encoding, indicating which speculative threads have read a particular line. This reader set encoding is used in conflict checking. A bitset encoding is used to specify particular threads that have read the line.
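The bitset case mentioned at the end of the abstract can be sketched as one integer per cache line, in which bit t records that speculative thread t has read the line. A write by another thread is then checked against those bits. The Python below assumes a 128-byte line and a fixed bitset for every line. The patent instead describes a dynamic encoding, so this sketch illustrates only the simplest case. Class and method names are hypothetical.

```python
from typing import Dict, List

LINE_BYTES = 128  # hypothetical cache line size


class Directory:
    def __init__(self, num_sets: int):
        self.num_sets = num_sets
        self.reader_sets: Dict[int, int] = {}  # line number -> bitset of reader thread ids

    def set_index(self, addr: int) -> int:
        # Depends only on the physical address, not on which processor issued the access.
        return (addr // LINE_BYTES) % self.num_sets

    def read(self, thread_id: int, addr: int) -> None:
        """Record a speculative read by setting the thread's bit for this line."""
        line = addr // LINE_BYTES
        self.reader_sets[line] = self.reader_sets.get(line, 0) | (1 << thread_id)

    def write_conflicts(self, thread_id: int, addr: int) -> List[int]:
        """Threads other than the writer that speculatively read this line."""
        others = self.reader_sets.get(addr // LINE_BYTES, 0) & ~(1 << thread_id)
        return [t for t in range(others.bit_length()) if (others >> t) & 1]
```

Suppose threads 1 and 3 read addresses `0x1000` and `0x1010`, which fall in the same line. A write to `0x1040` by thread 3 then reports a conflict with thread 1 only. `set_index` illustrates the point made in the abstract: the set depends only on the physical address, not on the processor that issued the access.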

  16. Insights on consciousness from taste memory research.

    PubMed

    Gallo, Milagros

    2016-01-01

    Taste research in rodents supports the relevance of memory in order to determine the content of consciousness by modifying both taste perception and later action. Associated with this issue is the fact that taste and visual modalities share anatomical circuits traditionally related to conscious memory. This challenges the view of taste memory as a type of non-declarative unconscious memory.

  17. Investigating Ground Swarm Robotics Using Agent Based Simulation

    DTIC Science & Technology

    2006-12-01

    Incorporation of virtual pheromones as a shared memory map is modeled as an additional capability that is found to enhance the robustness and reliability of the swarm...

  18. Efficient Parallelization of a Dynamic Unstructured Application on the Tera MTA

    NASA Technical Reports Server (NTRS)

    Oliker, Leonid; Biswas, Rupak

    1999-01-01

    The success of parallel computing in solving real-life computationally-intensive problems relies on their efficient mapping and execution on large-scale multiprocessor architectures. Many important applications are both unstructured and dynamic in nature, making their efficient parallel implementation a daunting task. This paper presents the parallelization of a dynamic unstructured mesh adaptation algorithm using three popular programming paradigms on three leading supercomputers. We examine an MPI message-passing implementation on the Cray T3E and the SGI Origin2000, a shared-memory implementation using cache coherent nonuniform memory access (CC-NUMA) of the Origin2000, and a multi-threaded version on the newly-released Tera Multi-threaded Architecture (MTA). We compare several critical factors of this parallel code development, including runtime, scalability, programmability, and memory overhead. Our overall results demonstrate that multi-threaded systems offer tremendous potential for quickly and efficiently solving some of the most challenging real-life problems on parallel computers.

  19. Centrally managed unified shared virtual address space

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Wilkes, John

    Systems, apparatuses, and methods for managing a unified shared virtual address space. A host may execute system software and manage a plurality of nodes coupled to the host. The host may send work tasks to the nodes, and for each node, the host may externally manage the node's view of the system's virtual address space. Each node may have a central processing unit (CPU) style memory management unit (MMU) with an internal translation lookaside buffer (TLB). In one embodiment, the host may be coupled to a given node via an input/output memory management unit (IOMMU) interface, where the IOMMU frontend interface shares the TLB with the given node's MMU. In another embodiment, the host may control the given node's view of virtual address space via memory-mapped control registers.

  20. Attention and Visuospatial Working Memory Share the Same Processing Resources

    PubMed Central

    Feng, Jing; Pratt, Jay; Spence, Ian

    2012-01-01

    Attention and visuospatial working memory (VWM) share very similar characteristics; both have the same upper bound of about four items in capacity and they recruit overlapping brain regions. We examined whether both attention and VWM share the same processing resources using a novel dual-task costs approach based on a load-varying dual-task technique. With sufficiently large loads on attention and VWM, considerable interference between the two processes was observed. A further load increase on either process produced reciprocal increases in interference on both processes, indicating that attention and VWM share common resources. More critically, comparison among four experiments on the reciprocal interference effects, as measured by the dual-task costs, demonstrates no significant contribution from additional processing other than the shared processes. These results support the notion that attention and VWM share the same processing resources. PMID:22529826

  1. Contention Modeling for Multithreaded Distributed Shared Memory Machines: The Cray XMT

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Secchi, Simone; Tumeo, Antonino; Villa, Oreste

    Distributed Shared Memory (DSM) machines are a wide class of multi-processor computing systems where a large virtually-shared address space is mapped on a network of physically distributed memories. High memory latency and network contention are two of the main factors that limit performance scaling of such architectures. Modern high-performance computing DSM systems have evolved toward exploitation of massive hardware multi-threading and fine-grained memory hashing to tolerate irregular latencies, avoid network hot-spots and enable high scaling. In order to model the performance of such large-scale machines, parallel simulation has been proved to be a promising approach to achieve good accuracy in reasonable times. One of the most critical factors in solving the simulation speed-accuracy trade-off is network modeling. The Cray XMT is a massively multi-threaded supercomputing architecture that belongs to the DSM class, since it implements a globally-shared address space abstraction on top of a physically distributed memory substrate. In this paper, we discuss the development of a contention-aware network model intended to be integrated in a full-system XMT simulator. We start by measuring the effects of network contention in a 128-processor XMT machine and then investigate the trade-off that exists between simulation accuracy and speed, by comparing three network models which operate at different levels of accuracy. The comparison and model validation are performed by executing a string-matching algorithm on the full-system simulator and on the XMT, using three datasets that generate noticeably different contention patterns.

  2. System and method for memory allocation in a multiclass memory system

    DOEpatents

    Loh, Gabriel; Meswani, Mitesh; Ignatowski, Michael; Nutter, Mark

    2016-06-28

    A system for memory allocation in a multiclass memory system includes a processor coupleable to a plurality of memories sharing a unified memory address space, and a library store to store a library of software functions. The processor identifies a type of a data structure in response to a memory allocation function call to the library for allocating memory to the data structure. Using the library, the processor allocates portions of the data structure among multiple memories of the multiclass memory system based on the type of the data structure.

  3. A Formal Model of Capacity Limits in Working Memory

    ERIC Educational Resources Information Center

    Oberauer, Klaus; Kliegl, Reinhold

    2006-01-01

    A mathematical model of working-memory capacity limits is proposed on the key assumption of mutual interference between items in working memory. Interference is assumed to arise from overwriting of features shared by these items. The model was fit to time-accuracy data of memory-updating tasks from four experiments using nonlinear mixed effect…

  4. Ordering of guarded and unguarded stores for no-sync I/O

    DOEpatents

    Gara, Alan; Ohmacht, Martin

    2013-06-25

    A parallel computing system processes at least one store instruction. A first processor core issues a store instruction. A first queue, associated with the first processor core, stores the store instruction. A second queue, associated with a first local cache memory device of the first processor core, stores the store instruction. The first processor core updates first data in the first local cache memory device according to the store instruction. A third queue, associated with at least one shared cache memory device, stores the store instruction. The first processor core invalidates second data, associated with the store instruction, in the at least one shared cache memory. The first processor core invalidates third data, associated with the store instruction, in other local cache memory devices of other processor cores. The first processor core flushes only the first queue.

  5. LU Factorization with Partial Pivoting for a Multi-CPU, Multi-GPU Shared Memory System

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Kurzak, Jakub; Luszczek, Piotr; Faverge, Mathieu

    2012-03-01

    LU factorization with partial pivoting is a canonical numerical procedure and the main component of the High Performance LINPACK benchmark. This article presents an implementation of the algorithm for a hybrid, shared memory, system with standard CPU cores and GPU accelerators. Performance in excess of one TeraFLOPS is achieved using four AMD Magny Cours CPUs and four NVIDIA Fermi GPUs.
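    For readers unfamiliar with the kernel, a minimal serial sketch of right-looking LU factorization with partial pivoting (PA = LU) follows. It is illustrative only: the article's contribution is a hybrid multi-CPU, multi-GPU implementation, which this unblocked, pure-Python version does not attempt to reproduce. The function names are hypothetical.

```python
def lu_partial_pivot(a):
    """Factor square matrix `a` (list of lists) in place of a copy so that P*A = L*U.
    Returns (perm, lu): perm[i] is the original row now at position i; lu holds
    L's multipliers below the diagonal (unit diagonal implied) and U on/above it."""
    n = len(a)
    lu = [list(map(float, row)) for row in a]
    perm = list(range(n))
    for k in range(n):
        # Partial pivoting: choose the row with the largest |entry| in column k.
        p = max(range(k, n), key=lambda i: abs(lu[i][k]))
        if lu[p][k] == 0.0:
            raise ValueError("matrix is singular")
        if p != k:
            lu[k], lu[p] = lu[p], lu[k]
            perm[k], perm[p] = perm[p], perm[k]
        for i in range(k + 1, n):
            lu[i][k] /= lu[k][k]  # multiplier l_ik
            for j in range(k + 1, n):
                lu[i][j] -= lu[i][k] * lu[k][j]  # rank-1 update of the trailing matrix
    return perm, lu


def lu_solve(perm, lu, b):
    """Solve A x = b using the factors from lu_partial_pivot."""
    n = len(lu)
    # Forward substitution with unit-lower-triangular L on the permuted right-hand side.
    y = [float(b[perm[i]]) for i in range(n)]
    for i in range(n):
        for j in range(i):
            y[i] -= lu[i][j] * y[j]
    # Back substitution with upper-triangular U.
    x = [0.0] * n
    for i in reversed(range(n)):
        s = y[i] - sum(lu[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / lu[i][i]
    return x
```

    Pivoting matters even for tiny inputs: factoring [[0, 1], [1, 0]] without a row swap would divide by zero, whereas this version swaps the rows first.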

  6. Neural Mechanisms of Interference Control Underlie the Relationship between Fluid Intelligence and Working Memory Span

    ERIC Educational Resources Information Center

    Burgess, Gregory C.; Gray, Jeremy R.; Conway, Andrew R. A.; Braver, Todd S.

    2011-01-01

    Fluid intelligence (gF) and working memory (WM) span predict success in demanding cognitive situations. Recent studies show that much of the variance in gF and WM span is shared, suggesting common neural mechanisms. This study provides a direct investigation of the degree to which shared variance in gF and WM span can be explained by neural…

  7. Sawja: Static Analysis Workshop for Java

    NASA Astrophysics Data System (ADS)

    Hubert, Laurent; Barré, Nicolas; Besson, Frédéric; Demange, Delphine; Jensen, Thomas; Monfort, Vincent; Pichardie, David; Turpin, Tiphaine

    Static analysis is a powerful technique for automatic verification of programs but raises major engineering challenges when developing a full-fledged analyzer for a realistic language such as Java. Efficiency and precision of such a tool rely partly on low level components which only depend on the syntactic structure of the language and therefore should not be redesigned for each implementation of a new static analysis. This paper describes the Sawja library: a static analysis workshop fully compliant with Java 6 which provides OCaml modules for efficiently manipulating Java bytecode programs. We present the main features of the library, including i) efficient functional data-structures for representing a program with implicit sharing and lazy parsing, ii) an intermediate stack-less representation, and iii) fast computation and manipulation of complete programs. We provide experimental evaluations of the different features with respect to time, memory and precision.

  8. Satellite Image Mosaic Engine

    NASA Technical Reports Server (NTRS)

    Plesea, Lucian

    2006-01-01

    A computer program automatically builds large, full-resolution mosaics of multispectral images of Earth landmasses from images acquired by Landsat 7, complete with matching of colors and blending between adjacent scenes. While the code has been used extensively for Landsat, it could also be used for other data sources. A single mosaic of as many as 8,000 scenes, represented by more than 5 terabytes of data and the largest set produced in this work, demonstrated what the code could do to provide global coverage. The program first statistically analyzes input images to determine areas of coverage and data-value distributions. It then transforms the input images from their original universal transverse Mercator coordinates to other geographical coordinates, with scaling. It applies a first-order polynomial brightness correction to each band in each scene. It uses a data-mask image for selecting data and blending of input scenes. Under control by a user, the program can be made to operate on small parts of the output image space, with check-point and restart capabilities. The program runs on SGI IRIX computers. It is capable of parallel processing using shared-memory code, large memories, and tens of central processing units. It can retrieve input data and store output data at locations remote from the processors on which it is executed.
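    A first-order polynomial correction maps each pixel value v of a band to gain * v + offset. The abstract does not say how the coefficients are chosen; the sketch below assumes, hypothetically, that they are fitted by ordinary least squares so that a scene's pixels in an overlap region match those of a reference scene. Function names are illustrative.

```python
def fit_linear_correction(src, ref):
    """Least-squares fit of ref ~= gain * src + offset over paired overlap pixels of one band."""
    n = len(src)
    if n != len(ref) or n < 2:
        raise ValueError("need at least two paired samples")
    mean_s = sum(src) / n
    mean_r = sum(ref) / n
    var_s = sum((s - mean_s) ** 2 for s in src)
    if var_s == 0:
        raise ValueError("source samples are constant")
    cov = sum((s - mean_s) * (r - mean_r) for s, r in zip(src, ref))
    gain = cov / var_s
    return gain, mean_r - gain * mean_s


def apply_correction(band, gain, offset):
    """Apply the first-order brightness correction to every pixel of a band."""
    return [gain * v + offset for v in band]
```

    For example, overlap samples src = [10, 20, 30] and ref = [25, 45, 65] give gain 2 and offset 5.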

  9. A Direct Experience in a New Accountable Care Organization: Results, Challenges, and the Role of the Neurosurgeon.

    PubMed

    Kim, Dong H; Lloyd, Christopher; Fernandez, Douglas K; Spielman, Amanda; Bradshaw, David

    2017-04-01

    The passage of the Affordable Care Act saw the creation of Accountable Care Organizations (ACOs), a new approach to healthcare delivery moving from fee-for-service toward population health. This paper presents a case study of the Memorial Hermann ACO (MHACO), launched in response to the Medicare Shared Savings Program, with goals to align physician and hospital incentives, practice evidence-based medicine, develop care coordination, and increase efficiency. Building blocks included an affiliated primary care network, a clinical integration program (involving shared electronic medical record platforms and quality data reporting), and significant investments in information technology. Presented is the approach taken to form MHACO; the management structure, technology developed, and a 2-year experience. Incorporated in July 2012, the MHACO involved 22 000 Medicare patients. In 2015, Centers for Medicare and Medicaid Services released data showing a composite quality score between 80 and 85 (from a maximum 100) and nearly $53 million in total savings (or 11% of expected expenditure), making MHACO one of the most successful nationally.1 In fewer than 5 years, almost 500 ACOs have developed, and by some estimates, a quarter of Medicare patients are currently enrolled in an ACO. Although ACOs to date have focused on primary care, the future will increasingly involve specialists. At Memorial Hermann, neurosurgeons took an early role in forming collaborative partnerships with the hospital, and started programs that served as precursors to the ACO model. This paper ends with an overview of ACO development, likely changes going forward, and a discussion of the role of specialists in general, and of neurosurgeons in particular. Copyright © 2016 by the Congress of Neurological Surgeons.

  10. Checkpointing Shared Memory Programs at the Application-level

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Bronevetsky, G; Schulz, M; Szwed, P

    2004-09-08

    Trends in high-performance computing are making it necessary for long-running applications to tolerate hardware faults. The most commonly used approach is checkpoint and restart (CPR): the state of the computation is saved periodically on disk, and when a failure occurs, the computation is restarted from the last saved state. At present, it is the responsibility of the programmer to instrument applications for CPR. Our group is investigating the use of compiler technology to instrument codes to make them self-checkpointing and self-restarting, thereby providing an automatic solution to the problem of making long-running scientific applications resilient to hardware faults. Our previous work focused on message-passing programs. In this paper, we describe such a system for shared-memory programs running on symmetric multiprocessors. The system has two components: (i) a pre-compiler for source-to-source modification of applications, and (ii) a runtime system that implements a protocol for coordinating CPR among the threads of the parallel application. For the sake of concreteness, we focus on a non-trivial subset of OpenMP that includes barriers and locks. One of the advantages of this approach is that the ability to tolerate faults becomes embedded within the application itself, so applications become self-checkpointing and self-restarting on any platform. We demonstrate this by showing that our transformed benchmarks can checkpoint and restart on three different platforms (Windows/x86, Linux/x86, and Tru64/Alpha). Our experiments show that the overhead introduced by this approach is usually quite small; they also suggest ways in which the current implementation can be tuned to reduce overheads further.
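    The coordination problem can be illustrated in miniature. The following is a hypothetical sketch, not the authors' pre-compiler or protocol: Python threads each update their own slot of shared state, and at every `checkpoint_every`-th step all threads meet at a `threading.Barrier` whose `action` callback saves a snapshot. Because the action runs in one thread after all threads arrive and before any is released, the snapshot is consistent. A real CPR system would write the snapshot to disk and would also have to handle locks and other synchronization.

```python
import copy
import threading


class CheckpointedRun:
    """Hypothetical sketch of barrier-coordinated checkpoint/restart for shared-memory threads."""

    def __init__(self, num_threads, checkpoint_every):
        self.num_threads = num_threads
        self.every = checkpoint_every
        self.counters = [0] * num_threads          # shared state: one slot per thread
        self.saved = copy.deepcopy(self.counters)  # last checkpoint (in memory here, not on disk)
        # The barrier action runs in exactly one thread after all threads arrive and
        # before any is released, so the snapshot never sees a half-finished step.
        self.barrier = threading.Barrier(num_threads, action=self._take_checkpoint)

    def _take_checkpoint(self):
        self.saved = copy.deepcopy(self.counters)

    def _worker(self, tid, stop):
        for step in range(self.counters[tid], stop):
            self.counters[tid] += 1          # this thread's "work" for the step
            if (step + 1) % self.every == 0:
                self.barrier.wait()          # coordinated checkpoint point

    def run(self, stop):
        threads = [threading.Thread(target=self._worker, args=(t, stop))
                   for t in range(self.num_threads)]
        for th in threads:
            th.start()
        for th in threads:
            th.join()

    def restart(self):
        """Simulate recovery after a failure: discard progress since the last checkpoint."""
        self.counters = copy.deepcopy(self.saved)
```

    For example, with `CheckpointedRun(4, 3)`, `run(7)` leaves the last checkpoint at step 6; after `restart()`, `run(10)` resumes from step 6 rather than from zero.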

  11. Audience-tuning effects on memory: the role of shared reality.

    PubMed

    Echterhoff, Gerald; Higgins, E Tory; Groll, Stephan

    2005-09-01

    After tuning to an audience, communicators' own memories for the topic often reflect the biased view expressed in their messages. Three studies examined explanations for this bias. Memories for a target person were biased when feedback signaled the audience's successful identification of the target but not after failed identification (Experiment 1). Whereas communicators tuning to an in-group audience exhibited the bias, communicators tuning to an out-group audience did not (Experiment 2). These differences did not depend on communicators' mood but were mediated by communicators' trust in their audience's judgment about other people (Experiments 2 and 3). Message and memory were more closely associated for high than for low trusters. Apparently, audience-tuning effects depend on the communicators' experience of a shared reality.

  12. Rapid solution of large-scale systems of equations

    NASA Technical Reports Server (NTRS)

    Storaasli, Olaf O.

    1994-01-01

    The analysis and design of complex aerospace structures requires the rapid solution of large systems of linear and nonlinear equations, eigenvalue extraction for buckling, vibration and flutter modes, structural optimization and design sensitivity calculation. Computers with multiple processors and vector capabilities can offer substantial computational advantages over traditional scalar computers for these analyses. These computers fall into two categories: shared memory computers and distributed memory computers. This presentation covers general-purpose, highly efficient algorithms for generation/assembly of element matrices, solution of systems of linear and nonlinear equations, eigenvalue and design sensitivity analysis and optimization. All algorithms are coded in FORTRAN for shared memory computers and many are adapted to distributed memory computers. The capability and numerical performance of these algorithms will be addressed.

  13. Performance evaluation of throughput computing workloads using multi-core processors and graphics processors

    NASA Astrophysics Data System (ADS)

    Dave, Gaurav P.; Sureshkumar, N.; Blessy Trencia Lincy, S. S.

    2017-11-01

    The current trend in processor manufacturing focuses on multi-core architectures rather than on increasing clock speed for performance improvement. Graphics processors have become commodity hardware that provides fast co-processing in computer systems. Developments in IoT, social networking web applications, and big data have created huge demand for data processing, and such throughput-intensive applications inherently contain data-level parallelism, which is better suited to SIMD-based GPU architectures. This paper reviews the architectural aspects of multi-/many-core processors and graphics processors. Different case studies are used to compare the performance of throughput computing applications using shared memory programming in OpenMP and CUDA API based programming.

  14. Examining age-related shared variance between face cognition, vision, and self-reported physical health: a test of the common cause hypothesis for social cognition

    PubMed Central

    Olderbak, Sally; Hildebrandt, Andrea; Wilhelm, Oliver

    2015-01-01

    The shared decline in cognitive abilities, sensory functions (e.g., vision and hearing), and physical health with increasing age is well documented, with some research attributing this shared age-related decline to a single common cause (e.g., aging brain). We evaluate the extent to which the common cause hypothesis predicts associations between vision and physical health with social cognition abilities, specifically face perception and face memory. Based on a sample of 443 adults (17–88 years old), we test a series of structural equation models, including Multiple Indicator Multiple Cause (MIMIC) models, and estimate the extent to which vision and self-reported physical health are related to face perception and face memory through a common factor, before and after controlling for their fluid cognitive component and the linear effects of age. Results suggest significant shared variance amongst these constructs, with a common factor explaining some, but not all, of the shared age-related variance. Also, we found that the relations of face perception, but not face memory, with vision and physical health could be completely explained by fluid cognition. Overall, results suggest that a single common cause explains most, but not all, age-related shared variance, with domain-specific aging mechanisms evident. PMID:26321998

  15. Solutions and debugging for data consistency in multiprocessors with noncoherent caches

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Bernstein, D.; Mendelson, B.; Breternitz, M. Jr.

    1995-02-01

    We analyze two important problems that arise in shared-memory multiprocessor systems. The stale data problem involves ensuring that data items in local memory of individual processors are current, independent of writes done by other processors. False sharing occurs when two processors have copies of the same shared data block but update different portions of the block. The false sharing problem involves guaranteeing that subsequent writes are properly combined. In modern architectures these problems are usually solved in hardware, by exploiting mechanisms for hardware controlled cache consistency. This leads to more expensive and nonscalable designs. Therefore, we are concentrating on software methods for ensuring cache consistency that would allow for affordable and scalable multiprocessing systems. Unfortunately, providing software control is nontrivial, both for the compiler writer and for the application programmer. For this reason we are developing a debugging environment that will facilitate the development of compiler-based techniques and will help the programmer to tune his or her application using explicit cache management mechanisms. We extend the notion of a race condition for IBM Shared Memory System POWER/4, taking into consideration its noncoherent caches, and propose techniques for detection of false sharing problems. Identification of the stale data problem is discussed as well, and solutions are suggested.
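    The distinction between false sharing and true sharing can be made concrete with a trace-based check. This is a hypothetical sketch, not the detection technique proposed for POWER/4: it takes a list of store events `(processor_id, byte_address, size)`, maps each byte to a cache block, and reports blocks written by two or more processors whose byte ranges never overlap. It ignores reads and the ordering of events, both of which a real detector would need.

```python
from collections import defaultdict


def find_false_sharing(writes, block_size):
    """Return {block_index: {processor_id: set of byte offsets}} for cache blocks that
    several processors wrote at pairwise-disjoint offsets (false sharing). Blocks where
    two processors wrote the same byte (true sharing) are excluded."""
    touched = defaultdict(lambda: defaultdict(set))
    for proc, addr, size in writes:
        for byte in range(addr, addr + size):
            block, offset = divmod(byte, block_size)
            touched[block][proc].add(offset)
    result = {}
    for block, per_proc in touched.items():
        if len(per_proc) < 2:
            continue  # only one writer: no sharing at all
        offsets = list(per_proc.values())
        disjoint = all(offsets[i].isdisjoint(offsets[j])
                       for i in range(len(offsets)) for j in range(i + 1, len(offsets)))
        if disjoint:
            result[block] = {p: set(offs) for p, offs in per_proc.items()}
    return result
```

    With 64-byte blocks, processor 0 writing bytes 0 to 7 and processor 1 writing bytes 8 to 15 is reported as false sharing of block 0, while two processors both writing bytes 64 to 67 is true sharing and is not reported.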

  16. Static Memory Deduplication for Performance Optimization in Cloud Computing.

    PubMed

    Jia, Gangyong; Han, Guangjie; Wang, Hao; Yang, Xuan

    2017-04-27

    In a cloud computing environment, the number of virtual machines (VMs) on a single physical server and the number of applications running on each VM are continuously growing. This has led to an enormous increase in the demand for memory capacity and a subsequent increase in energy consumption in the cloud. Lack of enough memory has become a major bottleneck for scalability and performance of virtualization interfaces in cloud computing. To address this problem, memory deduplication techniques which reduce memory demand through page sharing are being adopted. However, such techniques suffer from overheads in terms of number of online comparisons required for the memory deduplication. In this paper, we propose a static memory deduplication (SMD) technique which can reduce memory capacity requirement and provide performance optimization in cloud computing. The main innovation of SMD is that the process of page detection is performed offline, thus potentially reducing the performance cost, especially in terms of response time. In SMD, page comparisons are restricted to the code segment, which has the highest shared content. Our experimental results show that SMD efficiently reduces memory capacity requirement and improves performance. We demonstrate that, compared to other approaches, the cost in terms of the response time is negligible.

  17. Static Memory Deduplication for Performance Optimization in Cloud Computing

    PubMed Central

    Jia, Gangyong; Han, Guangjie; Wang, Hao; Yang, Xuan

    2017-01-01

    In a cloud computing environment, the number of virtual machines (VMs) on a single physical server and the number of applications running on each VM are continuously growing. This has led to an enormous increase in the demand for memory capacity and a subsequent increase in energy consumption in the cloud. Lack of enough memory has become a major bottleneck for scalability and performance of virtualization interfaces in cloud computing. To address this problem, memory deduplication techniques which reduce memory demand through page sharing are being adopted. However, such techniques suffer from overheads in terms of number of online comparisons required for the memory deduplication. In this paper, we propose a static memory deduplication (SMD) technique which can reduce memory capacity requirement and provide performance optimization in cloud computing. The main innovation of SMD is that the process of page detection is performed offline, thus potentially reducing the performance cost, especially in terms of response time. In SMD, page comparisons are restricted to the code segment, which has the highest shared content. Our experimental results show that SMD efficiently reduces memory capacity requirement and improves performance. We demonstrate that, compared to other approaches, the cost in terms of the response time is negligible. PMID:28448434
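    An offline page-detection pass restricted to code segments can be sketched as follows. The abstract does not say how pages are compared; this hypothetical sketch assumes 4096-byte pages, hashes each page with SHA-256 to find candidates, and then confirms matches byte-for-byte before reporting them as shareable. Names are illustrative, and the step of actually remapping VMs onto a shared physical page is not shown.

```python
import hashlib
from collections import defaultdict

PAGE_SIZE = 4096  # assumed page size


def find_shareable_pages(images):
    """images: {vm_name: bytes of that VM's code segment}.
    Offline pass: hash every page, then confirm candidates byte-for-byte.
    Returns groups of (vm_name, page_index) with identical contents that
    could be backed by a single physical page."""
    buckets = defaultdict(list)
    for vm, seg in images.items():
        for start in range(0, len(seg), PAGE_SIZE):
            page = seg[start:start + PAGE_SIZE]
            buckets[hashlib.sha256(page).digest()].append((vm, start // PAGE_SIZE, page))
    groups = []
    for entries in buckets.values():
        if len(entries) < 2:
            continue
        # Equal hashes are only candidates; compare the bytes before sharing.
        ref = entries[0][2]
        same = [(vm, idx) for vm, idx, page in entries if page == ref]
        if len(same) > 1:
            groups.append(same)
    return groups
```

    Because the pass runs offline, its hashing and comparison cost is kept out of the VMs' response time, which is the property the abstract emphasizes.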

  18. Advanced Development of Certified OS Kernels

    DTIC Science & Technology

    2015-06-01

    It provides an infrastructure to map a physical page into multiple processes’ page maps in different address spaces. Their ownership mechanism ensures...of their shared memory infrastructure. Trap module The trap module specifies the behaviors of exception handlers and mCertiKOS system calls. In...layers), 1 pm for the shared memory infrastructure (3 layers), 3.5 pm for the thread management (10 layers), 1 pm for the process management (4 layers

  19. 6 DOF Nonlinear AUV Simulation Toolbox

    DTIC Science & Technology

    1997-01-01

    is to supply a flexible 3D simulation platform for motion visualization, in-lab debugging and testing of mission-specific strategies as well as those...Explorer are modularly designed [Smith] in order to cut time and cost for vehicle reconfiguration. A flexible 3D simulation platform is desired to... 3D models. Currently implemented modules include a nonlinear dynamic model for the OEX, shared memory and semaphore manager tools, shared memory monitor

  20. A cache-aided multiprocessor rollback recovery scheme

    NASA Technical Reports Server (NTRS)

    Wu, Kun-Lung; Fuchs, W. Kent

    1989-01-01

    This paper demonstrates how previous uniprocessor cache-aided recovery schemes can be applied to multiprocessor architectures, for recovering from transient processor failures, utilizing private caches and a global shared memory. As with cache-aided uniprocessor recovery, the multiprocessor cache-aided recovery scheme of this paper can be easily integrated into standard bus-based snoopy cache coherence protocols. A consistent shared memory state is maintained without the necessity of global check-pointing.
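    The core idea of cache-aided recovery can be shown for a single cache. This is a hypothetical, simplified sketch: memory always holds the last checkpointed state, writes made since then live only in dirty cache lines, a rollback simply discards dirty lines, and (as an assumption of this sketch) evicting a dirty line forces a checkpoint so uncommitted data never reaches memory. The multiprocessor coordination and the integration with snoopy coherence protocols described in the paper are not modeled.

```python
class CacheAidedRecovery:
    """Hypothetical single-cache sketch: memory holds only checkpointed state."""

    def __init__(self, memory, capacity):
        self.memory = memory      # dict address -> value (the "shared memory")
        self.capacity = capacity  # number of cache lines
        self.cache = {}           # address -> (value, dirty)
        self.checkpoints = 0

    def _make_room(self, addr):
        if addr in self.cache or len(self.cache) < self.capacity:
            return
        victim = next(iter(self.cache))  # FIFO replacement by insertion order
        if self.cache[victim][1]:
            # Writing back a dirty line would expose uncommitted state,
            # so this sketch commits everything with a checkpoint first.
            self.checkpoint()
        del self.cache[victim]

    def read(self, addr):
        self._make_room(addr)
        if addr not in self.cache:
            self.cache[addr] = (self.memory.get(addr, 0), False)
        return self.cache[addr][0]

    def write(self, addr, value):
        self._make_room(addr)
        self.cache[addr] = (value, True)

    def checkpoint(self):
        """Commit: write all dirty lines to memory and mark them clean."""
        for addr, (value, dirty) in self.cache.items():
            if dirty:
                self.memory[addr] = value
        self.cache = {a: (v, False) for a, (v, _) in self.cache.items()}
        self.checkpoints += 1

    def rollback(self):
        """Transient fault: drop dirty lines; memory already holds the checkpointed state."""
        self.cache = {a: (v, d) for a, (v, d) in self.cache.items() if not d}
```

    For example, a write that has not yet been checkpointed is lost on `rollback()`, and a later read returns the value from the last checkpoint.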

  1. Targeted Memory Reactivation during Sleep Adaptively Promotes the Strengthening or Weakening of Overlapping Memories.

    PubMed

    Oyarzún, Javiera P; Morís, Joaquín; Luque, David; de Diego-Balaguer, Ruth; Fuentemilla, Lluís

    2017-08-09

    System memory consolidation is conceptualized as an active process whereby newly encoded memory representations are strengthened through selective memory reactivation during sleep. However, our learning experience is highly overlapping in content (i.e., shares common elements), and memories of these events are organized in an intricate network of overlapping associated events. It remains to be explored whether and how selective memory reactivation during sleep has an impact on these overlapping memories acquired during awake time. Here, we test in a group of adult women and men the prediction that selective memory reactivation during sleep entails the reactivation of associated events and that this may lead the brain to adaptively regulate whether these associated memories are strengthened or pruned from memory networks on the basis of their relative associative strength with the shared element. Our findings demonstrate the existence of efficient regulatory neural mechanisms governing how complex memory networks are shaped during sleep as a function of their associative memory strength. SIGNIFICANCE STATEMENT Numerous studies have demonstrated that system memory consolidation is an active, selective, and sleep-dependent process in which only subsets of new memories become stabilized through their reactivation. However, the learning experience is highly overlapping in content and thus events are encoded in an intricate network of related memories. It remains to be explored whether and how memory reactivation has an impact on overlapping memories acquired during awake time. Here, we show that sleep memory reactivation promotes strengthening and weakening of overlapping memories based on their associative memory strength. These results suggest the existence of an efficient regulatory neural mechanism that avoids the formation of cluttered memory representation of multiple events and promotes stabilization of complex memory networks. 
Copyright © 2017 the authors.

  2. [Artificial intelligence meeting neuropsychology. Semantic memory in normal and pathological aging].

    PubMed

    Aimé, Xavier; Charlet, Jean; Maillet, Didier; Belin, Catherine

    2015-03-01

    Artificial intelligence (AI) is the subject of much research, but also of many fantasies. It aims to reproduce human intelligence in its capacity for learning, knowledge storage, and computation. In 2014, the Defense Advanced Research Projects Agency (DARPA) started the Restoring Active Memory (RAM) program, which attempts to develop implantable technology to bridge gaps in the injured brain and restore normal memory function to people with memory loss caused by injury or disease. In another field of AI, computational ontologies (formal, shared conceptualizations) model knowledge in order to represent a structured and unambiguous meaning of the concepts of a target domain. The aim of these structures is to ensure a consensual understanding of their meaning and a univariant use (the same concept is used by all to categorize the same individuals). The first knowledge representations in AI were largely based on models of semantic memory. Semantic memory, a component of long-term memory, is the memory of words, ideas, and concepts. It is the only declarative memory system that is so remarkably resistant to the effects of age. Non-specific cognitive changes may nevertheless lower the performance of elderly people on various tasks; these reflect difficulties in accessing semantic representations rather than damage to the semantic store itself. Some dementias, such as semantic dementia and Alzheimer's disease, are linked to an alteration of semantic memory. In this paper, we propose a formal and relatively fine-grained model based on computational ontologies, in the service of neuropsychology: 1) for the practitioner, through decision support systems; 2) for the patient, as an external cognitive prosthesis; and 3) for the researcher, to study semantic memory.

  3. A High Performance VLSI Computer Architecture For Computer Graphics

    NASA Astrophysics Data System (ADS)

    Chin, Chi-Yuan; Lin, Wen-Tai

    1988-10-01

    A VLSI computer architecture, consisting of multiple processors, is presented in this paper to satisfy modern computer graphics demands, e.g. high resolution, realistic animation, and real-time display. All processors share a global memory, which is partitioned into multiple banks. Through a crossbar network, data from one memory bank can be broadcast to many processors. Processors are physically interconnected through a hyper-crossbar network (a crossbar-like network). By programming the network, the topology of communication links among processors can be reconfigured to satisfy the specific dataflows of different applications. Each processor consists of a controller, arithmetic operators, local memory, a local crossbar network, and I/O ports to communicate with other processors, memory banks, and a system controller. Operations in each processor are characterized into two modes, i.e. object domain and space domain, to fully utilize the data-independence characteristics of graphics processing. Special graphics features such as 3D-to-2D conversion, shadow generation, texturing, and reflection can be easily handled. With current high density interconnection (MI) technology, it is feasible to implement a 64-processor system that achieves 2.5 billion operations per second, a performance needed in most advanced graphics applications.

  4. Autobiographical memory functions of nostalgia in comparison to rumination and counterfactual thinking: similarity and uniqueness.

    PubMed

    Cheung, Wing-Yee; Wildschut, Tim; Sedikides, Constantine

    2018-02-01

    We compared and contrasted nostalgia with rumination and counterfactual thinking in terms of their autobiographical memory functions. Specifically, we assessed individual differences in nostalgia, rumination, and counterfactual thinking, which we then linked to self-reported functions or uses of autobiographical memory (Self-Regard, Boredom Reduction, Death Preparation, Intimacy Maintenance, Conversation, Teach/Inform, and Bitterness Revival). We tested which memory functions are shared and which are uniquely linked to nostalgia. The commonality among nostalgia, rumination, and counterfactual thinking resides in their shared positive associations with all memory functions: individuals who evinced a stronger propensity towards past-oriented thought (as manifested in nostalgia, rumination, and counterfactual thinking) reported greater overall recruitment of memories in the service of present functioning. The uniqueness of nostalgia resides in its comparatively strong positive associations with Intimacy Maintenance, Teach/Inform, and Self-Regard and weak association with Bitterness Revival. In all, nostalgia possesses a more positive functional signature than do rumination and counterfactual thinking.

  5. Mnemonic convergence in social networks: The emergent properties of cognition at a collective level.

    PubMed

    Coman, Alin; Momennejad, Ida; Drach, Rae D; Geana, Andra

    2016-07-19

    The development of shared memories, beliefs, and norms is a fundamental characteristic of human communities. These emergent outcomes are thought to occur owing to a dynamic system of information sharing and memory updating, which fundamentally depends on communication. Here we report results on the formation of collective memories in laboratory-created communities. We manipulated conversational network structure in a series of real-time, computer-mediated interactions in fourteen 10-member communities. The results show that mnemonic convergence, measured as the degree of overlap among community members' memories, is influenced by both individual-level information-processing phenomena and by the conversational social network structure created during conversational recall. By studying laboratory-created social networks, we show how large-scale social phenomena (i.e., collective memory) can emerge out of microlevel local dynamics (i.e., mnemonic reinforcement and suppression effects). The social-interactionist approach proposed herein points to optimal strategies for spreading information in social networks and provides a framework for measuring and forging collective memories in communities of individuals.

  6. Using memories to understand others: the role of episodic memory in theory of mind impairment in Alzheimer disease.

    PubMed

    Moreau, Noémie; Viallet, François; Champagne-Lavau, Maud

    2013-09-01

    Theory of mind (TOM) refers to the ability to infer one's own and others' mental states. Growing evidence has highlighted impairment on the most complex TOM tasks in Alzheimer disease (AD). However, how the TOM deficit is related to other cognitive dysfunctions, and more specifically to episodic memory impairment (the prominent feature of this disease), is still under debate. Recent neuroanatomical findings have shown that remembering past events and inferring others' states of mind share the same cerebral network, suggesting that the two abilities share a common process. This paper reviews emerging evidence of TOM impairment in AD patients and discusses the evidence for a relationship between TOM and episodic memory. We discuss whether AD patients' TOM deficit may be related to their difficulties in recollecting memories of past social interactions. Copyright © 2013 Elsevier B.V. All rights reserved.

  7. Mental time travel and the shaping of the human mind

    PubMed Central

    Suddendorf, Thomas; Addis, Donna Rose; Corballis, Michael C.

    2009-01-01

    Episodic memory, enabling conscious recollection of past episodes, can be distinguished from semantic memory, which stores enduring facts about the world. Episodic memory shares a core neural network with the simulation of future episodes, enabling mental time travel into both the past and the future. The notion that there might be something distinctly human about mental time travel has provoked ingenious attempts to demonstrate episodic memory or future simulation in non-human animals, but we argue that they have not yet established a capacity comparable to the human faculty. The evolution of the capacity to simulate possible future events, based on episodic memory, enhanced fitness by enabling action in preparation for different possible scenarios that increased present or future survival and reproduction chances. Human language may have evolved in the first instance for the sharing of past and planned future events, and, indeed, fictional ones, further enhancing fitness in social settings. PMID:19528013

  8. Shared prefetching to reduce execution skew in multi-threaded systems

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Eichenberger, Alexandre E; Gunnels, John A

    Mechanisms are provided for optimizing code to perform prefetching of data into a shared memory of a computing device that is shared by a plurality of threads that execute on the computing device. A memory stream of a portion of code that is shared by the plurality of threads is identified. A set of prefetch instructions is distributed across the plurality of threads. Prefetch instructions are inserted into the instruction sequences of the plurality of threads such that each instruction sequence has a separate sub-portion of the set of prefetch instructions, thereby generating optimized code. Executable code is generated based on the optimized code and stored in a storage device. The executable code, when executed, performs the prefetches associated with the distributed set of prefetch instructions in a shared manner across the plurality of threads.
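    The abstract does not say how the prefetch instructions are divided among the threads. As a minimal illustration, assuming a 64-byte cache line and a simple round-robin assignment (both assumptions, not taken from the patent), the following Python sketch splits the distinct cache lines of a shared memory stream so that each line is prefetched by exactly one thread. The function name `distribute_prefetches` is hypothetical.

```python
from typing import List, Sequence


def distribute_prefetches(stream: Sequence[int], num_threads: int,
                          line_size: int = 64) -> List[List[int]]:
    """Assign each distinct cache line of a shared memory stream to exactly one
    thread, round-robin, so the threads split the prefetch work between them."""
    if num_threads < 1:
        raise ValueError("num_threads must be >= 1")
    # Collapse byte addresses to cache-line base addresses, keeping first-seen order.
    lines: List[int] = []
    seen = set()
    for addr in stream:
        base = addr - addr % line_size
        if base not in seen:
            seen.add(base)
            lines.append(base)
    # Thread t gets lines t, t + n, t + 2n, ... of the stream.
    per_thread: List[List[int]] = [[] for _ in range(num_threads)]
    for i, base in enumerate(lines):
        per_thread[i % num_threads].append(base)
    return per_thread


print(distribute_prefetches([0, 8, 64, 130, 192, 256, 300], num_threads=2))
```

    With two threads, the stream `[0, 8, 64, 130, 192, 256, 300]` touches lines 0, 64, 128, 192, and 256, which the sketch assigns as `[[0, 128, 256], [64, 192]]`. Round-robin keeps the per-thread prefetch counts balanced; a production compiler pass would likely also weigh prefetch distance and timing.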

  9. Cricket: A Mapped, Persistent Object Store

    NASA Technical Reports Server (NTRS)

    Shekita, Eugene; Zwilling, Michael

    1996-01-01

    This paper describes Cricket, a new database storage system that is intended to be used as a platform for design environments and persistent programming languages. Cricket uses the memory management primitives of the Mach operating system to provide the abstraction of a shared, transactional single-level store that can be directly accessed by user applications. In this paper, we present the design and motivation for Cricket. We also present some initial performance results which show that, for its intended applications, Cricket can provide better performance than a general-purpose database storage system.

  10. A shared neural ensemble links distinct contextual memories encoded close in time

    NASA Astrophysics Data System (ADS)

    Cai, Denise J.; Aharoni, Daniel; Shuman, Tristan; Shobe, Justin; Biane, Jeremy; Song, Weilin; Wei, Brandon; Veshkini, Michael; La-Vu, Mimi; Lou, Jerry; Flores, Sergio E.; Kim, Isaac; Sano, Yoshitake; Zhou, Miou; Baumgaertel, Karsten; Lavi, Ayal; Kamata, Masakazu; Tuszynski, Mark; Mayford, Mark; Golshani, Peyman; Silva, Alcino J.

    2016-06-01

    Recent studies suggest that a shared neural ensemble may link distinct memories encoded close in time. According to the memory allocation hypothesis, learning triggers a temporary increase in neuronal excitability that biases the representation of a subsequent memory to the neuronal ensemble encoding the first memory, such that recall of one memory increases the likelihood of recalling the other memory. Here we show in mice that the overlap between the hippocampal CA1 ensembles activated by two distinct contexts acquired within a day is higher than when they are separated by a week. Several findings indicate that this overlap of neuronal ensembles links two contextual memories. First, fear paired with one context is transferred to a neutral context when the two contexts are acquired within a day but not across a week. Second, the first memory strengthens the second memory within a day but not across a week. Older mice, known to have lower CA1 excitability, do not show the overlap between ensembles, the transfer of fear between contexts, or the strengthening of the second memory. Finally, in aged mice, increasing cellular excitability and activating a common ensemble of CA1 neurons during two distinct context exposures rescued the deficit in linking memories. Taken together, these findings demonstrate that contextual memories encoded close in time are linked by directing storage into overlapping ensembles. Alteration of these processes by ageing could affect the temporal structure of memories, thus impairing efficient recall of related information.

  11. Factor structure of overall autobiographical memory usage: the directive, self and social functions revisited.

    PubMed

    Rasmussen, Anne S; Habermas, Tilmann

    2011-08-01

    According to theory, autobiographical memory serves three broad functions of overall usage: directive, self, and social. However, there is evidence to suggest that the tripartite model may be better conceptualised in terms of a four-factor model with two social functions. In the present study we examined the two models in Danish and German samples, using the Thinking About Life Experiences Questionnaire (TALE; Bluck, Alea, Habermas, & Rubin, 2005), which measures the overall usage of the three functions generalised across concrete memories. Confirmatory factor analysis supported the four-factor model and rejected the theoretical three-factor model in both samples. The results are discussed in relation to cultural differences in overall autobiographical memory usage as well as sharing versus non-sharing aspects of social remembering.

  12. Parallel performance investigations of an unstructured mesh Navier-Stokes solver

    NASA Technical Reports Server (NTRS)

    Mavriplis, Dimitri J.

    2000-01-01

    A Reynolds-averaged Navier-Stokes solver based on unstructured mesh techniques for analysis of high-lift configurations is described. The method makes use of an agglomeration multigrid solver for convergence acceleration. Implicit line-smoothing is employed to relieve the stiffness associated with highly stretched meshes. A GMRES technique is also implemented to speed convergence at the expense of additional memory usage. The solver is cache efficient and fully vectorizable, and is parallelized using a two-level hybrid MPI-OpenMP implementation suitable for shared and/or distributed memory architectures, as well as clusters of shared memory machines. Convergence and scalability results are illustrated for various high-lift cases.

  13. Experimental evaluation of multiprocessor cache-based error recovery

    NASA Technical Reports Server (NTRS)

    Janssens, Bob; Fuchs, W. K.

    1991-01-01

    Several variations of cache-based checkpointing for rollback error recovery in shared-memory multiprocessors have recently been developed. By modifying the cache replacement policy, these techniques use the inherent redundancy in the memory hierarchy to periodically checkpoint the computation state. Three schemes, differing in the manner in which they avoid rollback propagation, are evaluated. Using simulation with address traces from parallel applications running on an Encore Multimax shared-memory multiprocessor, the performance effect of integrating the recovery schemes into the cache coherence protocol is evaluated. The results indicate that the cache-based schemes can provide checkpointing capability with low performance overhead, but with uncontrollably high variability in the checkpoint interval.

  14. Performing a local reduction operation on a parallel computer

    DOEpatents

    Blocksome, Michael A; Faraj, Daniel A

    2013-06-04

    A parallel computer including compute nodes, each including two reduction processing cores, a network write processing core, and a network read processing core, each processing core assigned an input buffer. Copying, in interleaved chunks by the reduction processing cores, contents of the reduction processing cores' input buffers to an interleaved buffer in shared memory; copying, by one of the reduction processing cores, contents of the network write processing core's input buffer to shared memory; copying, by another of the reduction processing cores, contents of the network read processing core's input buffer to shared memory; and locally reducing in parallel by the reduction processing cores: the contents of the reduction processing core's input buffer; every other interleaved chunk of the interleaved buffer; the copied contents of the network write processing core's input buffer; and the copied contents of the network read processing core's input buffer.

  15. Performing a local reduction operation on a parallel computer

    DOEpatents

    Blocksome, Michael A.; Faraj, Daniel A.

    2012-12-11

    A parallel computer including compute nodes, each including two reduction processing cores, a network write processing core, and a network read processing core, each processing core assigned an input buffer. Copying, in interleaved chunks by the reduction processing cores, contents of the reduction processing cores' input buffers to an interleaved buffer in shared memory; copying, by one of the reduction processing cores, contents of the network write processing core's input buffer to shared memory; copying, by another of the reduction processing cores, contents of the network read processing core's input buffer to shared memory; and locally reducing in parallel by the reduction processing cores: the contents of the reduction processing core's input buffer; every other interleaved chunk of the interleaved buffer; the copied contents of the network write processing core's input buffer; and the copied contents of the network read processing core's input buffer.
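    The work split in this abstract, where each reduction core handles every other interleaved chunk, can be sketched in Python as follows. This is a simplified model under stated assumptions: the four input buffers (two reduction cores, network write core, network read core) are plain lists, the reduction operator is addition, and the patent's separate copy steps into shared memory are not modeled. The function `interleaved_local_reduce` and its parameters are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence


def interleaved_local_reduce(buffers: Sequence[Sequence[int]], chunk: int,
                             workers: int = 2) -> List[int]:
    """Element-wise sum of equal-length buffers.

    Chunk c (elements c*chunk .. c*chunk+chunk-1) is reduced by worker
    c % workers, so with two workers each one takes every other chunk.
    """
    if chunk < 1 or workers < 1:
        raise ValueError("chunk and workers must be >= 1")
    n = len(buffers[0])
    if any(len(b) != n for b in buffers):
        raise ValueError("all buffers must have the same length")
    result = [0] * n  # stands in for the shared-memory output buffer

    def reduce_worker(w: int) -> None:
        # Worker w handles chunks w, w + workers, w + 2 * workers, ...
        for start in range(w * chunk, n, workers * chunk):
            for i in range(start, min(start + chunk, n)):
                result[i] = sum(b[i] for b in buffers)

    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(reduce_worker, range(workers)))  # re-raises worker errors
    return result


# Four input buffers: two reduction cores, network-write core, network-read core.
bufs = [[1, 2, 3, 4, 5], [10, 20, 30, 40, 50], [100] * 5, [1000] * 5]
print(interleaved_local_reduce(bufs, chunk=2))
```

    Because each worker writes only to the indices of its own chunks, the workers never write the same element and need no locking. In CPython the threads illustrate the division of work rather than deliver real parallel speedup.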

  16. Memory loss

    MedlinePlus

    ... //medlineplus.gov/ency/article/003257.htm Memory loss ... Bethesda, MD 20894 U.S. Department of Health and Human Services National Institutes of Health ...

  17. RACER: Effective Race Detection Using AspectJ

    NASA Technical Reports Server (NTRS)

    Bodden, Eric; Havelund, Klaus

    2008-01-01

    Programming errors occur frequently in large software systems, and even more so if these systems are concurrent. In the past, researchers have developed specialized programs to aid programmers in detecting concurrent programming errors such as deadlocks, livelocks, starvation, and data races. In this work we propose a language extension to the aspect-oriented programming language AspectJ, in the form of three new built-in pointcuts, lock(), unlock(), and maybeShared(), which allow programmers to monitor program events where locks are granted or handed back, and where values are accessed that may be shared among multiple Java threads. We decide thread-locality using a static thread-local objects analysis developed by others. Using the three new primitive pointcuts, researchers can directly implement efficient monitoring algorithms to detect concurrent programming errors online. As an example, we present a new algorithm, which we call RACER, an adaptation of the well-known ERASER algorithm to the memory model of Java. We implemented the new pointcuts as an extension to the AspectBench Compiler, implemented the RACER algorithm using this language extension, and then applied the algorithm to the NASA K9 Rover Executive. Our experiments showed our implementation to be very effective: in the Rover Executive, RACER finds 70 data races, only one of which was previously known. We further applied the algorithm to two other multi-threaded programs written by computer science researchers, in which we found races as well.
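    RACER adapts the ERASER lockset algorithm to the Java memory model; those Java-specific adaptations and the AspectJ pointcuts are not reproduced here. As background, the following Python sketch shows the classic Eraser-style lockset refinement, assuming that lock, unlock, and shared-variable access events arrive as method calls (loosely analogous to the lock(), unlock(), and maybeShared() pointcuts). The class `LocksetChecker` and its methods are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Optional, Set


@dataclass
class VarState:
    state: str = "virgin"  # virgin -> exclusive -> shared / shared_modified
    owner: Optional[int] = None  # thread that owns the variable while exclusive
    lockset: Optional[FrozenSet[str]] = None  # None means "all locks" (unrefined)


class LocksetChecker:
    """Eraser-style lockset race detector (simplified sketch)."""

    def __init__(self) -> None:
        self.held: Dict[int, Set[str]] = {}  # thread -> locks currently held
        self.vars: Dict[str, VarState] = {}
        self.races: Set[str] = set()

    def lock(self, thread: int, lock: str) -> None:
        self.held.setdefault(thread, set()).add(lock)

    def unlock(self, thread: int, lock: str) -> None:
        self.held.setdefault(thread, set()).discard(lock)

    def access(self, thread: int, var: str, write: bool) -> None:
        v = self.vars.setdefault(var, VarState())
        held = frozenset(self.held.get(thread, set()))
        if v.state == "virgin":
            # First access: the variable is treated as thread-local.
            v.state, v.owner = "exclusive", thread
            return
        if v.state == "exclusive":
            if thread == v.owner:
                return  # still thread-local; no refinement
            v.state = "shared_modified" if write else "shared"
        elif v.state == "shared" and write:
            v.state = "shared_modified"
        # Refine the candidate lockset: C(v) := C(v) & locks_held(thread).
        v.lockset = held if v.lockset is None else v.lockset & held
        # Report only once the variable is written while shared.
        if v.state == "shared_modified" and not v.lockset:
            self.races.add(var)


checker = LocksetChecker()
checker.access(1, "x", write=True)  # initialization by thread 1
for t in (1, 2):
    checker.lock(t, "m")
    checker.access(t, "x", write=True)  # x always written under lock m
    checker.unlock(t, "m")
checker.access(1, "y", write=True)
checker.access(2, "y", write=True)  # no common lock protects y
print(checker.races)
```

    In the usage example, `x` is always written while holding `m`, so its candidate lockset stays non-empty; `y` is written by two threads with no common lock and is reported. The exclusive and shared states follow the original Eraser design, which suppresses warnings for thread-local initialization and read-only sharing.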

  18. We Remember, We Forget: Collaborative Remembering in Older Couples

    ERIC Educational Resources Information Center

    Harris, Celia B.; Keil, Paul G.; Sutton, John; Barnier, Amanda J.; McIlwain, Doris J. F.

    2011-01-01

    Transactive memory theory describes the processes by which benefits for memory can occur when remembering is shared in dyads or groups. In contrast, cognitive psychology experiments demonstrate that social influences on memory disrupt and inhibit individual recall. However, most research in cognitive psychology has focused on groups of strangers…

  19. 76 FR 12821 - 150th Anniversary of the Inauguration of Abraham Lincoln

    Federal Register 2010, 2011, 2012, 2013, 2014

    2011-03-09

    ... together by shared memories and common hopes. As we observe the 150th anniversary of his Inauguration, we... his memory enabled America to move beyond a young collection of States to become a free and unified... memory and uphold the principles he so nobly advanced.

  20. Expert Systems on Multiprocessor Architectures. Volume 2. Technical Reports

    DTIC Science & Technology

    1991-06-01

    Report RC 12936 (#58037). IBM T. J. Watson Research Center. July 1987. Alan Jay Smith. Cache memories. Computing Surveys, 14(3): 473-530 ... basic-shared is an instrument for a shared memory design. The component panels are processor-qload-scrolling-bar-panel, memory-qload-scrolling-bar-panel

  1. Blanket Gate Would Address Blocks Of Memory

    NASA Technical Reports Server (NTRS)

    Lambe, John; Moopenn, Alexander; Thakoor, Anilkumar P.

    1988-01-01

    Circuit-chip area used more efficiently. Proposed gate structure selectively allows and restricts access to blocks of memory in electronic neural-type network. By breaking memory into independent blocks, gate greatly simplifies problem of reading from and writing to memory. Since blocks not used simultaneously, share operational amplifiers that prompt and read information stored in memory cells. Fewer operational amplifiers needed, and chip area occupied reduced correspondingly. Cost per bit drops as result.

  2. The potential of multi-port optical memories in digital computing

    NASA Technical Reports Server (NTRS)

    Alford, C. O.; Gaylord, T. K.

    1975-01-01

    A high-capacity memory with a relatively high data transfer rate and multi-port simultaneous access capability may serve as the basis for new computer architectures. The implementation of a multi-port optical memory is discussed. Several computer structures are presented that might profitably use such a memory. These structures include (1) a simultaneous record access system, (2) a simultaneously shared memory computer system, and (3) a parallel digital processing structure.

  3. Facilitating change in health-related behaviors and intentions: a randomized controlled trial of a multidimensional memory program for older adults.

    PubMed

    Wiegand, Melanie A; Troyer, Angela K; Gojmerac, Christina; Murphy, Kelly J

    2013-01-01

    Many older adults are concerned about memory changes with age and consequently seek ways to optimize their memory function. Memory programs are known to be variably effective in improving memory knowledge, other aspects of metamemory, and/or objective memory, but little is known about their impact on implementing and sustaining lifestyle and healthcare-seeking intentions and behaviors. We evaluated a multidimensional, evidence-based intervention, the Memory and Aging Program, that provides education about memory and memory change, training in the use of practical memory strategies, and support for implementation of healthy lifestyle behavior changes. In a randomized controlled trial, 42 healthy older adults participated in a program (n = 21) or a waitlist control (n = 21) group. Relative to the control group, participants in the program implemented more healthy lifestyle behaviors by the end of the program and maintained these changes 1 month later. Similarly, program participants reported a decreased intention to seek unnecessary medical attention for their memory immediately after the program and 1 month later. Findings support the use of multidimensional memory programs to promote healthy lifestyles and influence healthcare-seeking behaviors. Discussion focuses on implications of these changes for maximizing cognitive health and minimizing impact on healthcare resources.

  4. Methodology for fast detection of false sharing in threaded scientific codes

    DOEpatents

    Chung, I-Hsin; Cong, Guojing; Murata, Hiroki; Negishi, Yasushi; Wen, Hui-Fang

    2014-11-25

    A profiling tool identifies a code region with a false sharing potential. A static analysis tool classifies variables and arrays in the identified code region. A mapping detection library correlates memory access instructions in the identified code region with variables and arrays in the identified code region while a processor is running the identified code region. The mapping detection library identifies one or more instructions at risk, in the identified code region, which are subject to an analysis by a false sharing detection library. A false sharing detection library performs a run-time analysis of the one or more instructions at risk while the processor is re-running the identified code region. The false sharing detection library determines, based on the performed run-time analysis, whether two different portions of the cache memory line are accessed by the generated binary code.
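    The final check described above, deciding whether different threads access different portions of the same cache line, can be illustrated on a recorded access trace. The Python sketch below assumes a hypothetical trace format of `(thread_id, address, size, is_write)` tuples and a 64-byte cache line; it does not reproduce the patent's profiling, static-analysis, or mapping-detection stages, and `find_false_sharing` is a hypothetical name.

```python
from collections import defaultdict
from typing import DefaultDict, Iterable, List, Set, Tuple

# (thread_id, start_address, size_in_bytes, is_write): hypothetical trace format
Access = Tuple[int, int, int, bool]


def find_false_sharing(accesses: Iterable[Access], line_size: int = 64) -> List[int]:
    """Return base addresses of cache lines where two threads touch disjoint
    bytes of the same line and at least one of the two writes."""
    # line base address -> thread id -> byte addresses touched on that line
    touched: DefaultDict[int, DefaultDict[int, Set[int]]] = defaultdict(
        lambda: defaultdict(set))
    writers: DefaultDict[int, Set[int]] = defaultdict(set)  # line -> writing threads
    for tid, addr, size, is_write in accesses:
        for byte in range(addr, addr + size):
            base = byte - byte % line_size
            touched[base][tid].add(byte)
            if is_write:
                writers[base].add(tid)

    suspects = []
    for base, per_thread in touched.items():
        w = writers.get(base, set())
        tids = sorted(per_thread)
        pairs = ((a, b) for i, a in enumerate(tids) for b in tids[i + 1:])
        if any(per_thread[a].isdisjoint(per_thread[b]) and (a in w or b in w)
               for a, b in pairs):
            suspects.append(base)
    return sorted(suspects)


trace = [
    (0, 0, 8, True), (1, 8, 8, True),        # adjacent fields on line 0
    (0, 64, 4, True), (1, 64, 4, True),      # same bytes on line 64 (true sharing)
    (0, 128, 8, False), (1, 136, 8, False),  # read-only on line 128
]
print(find_false_sharing(trace))
```

    Here threads 0 and 1 write adjacent 8-byte fields on line 0 (reported as false sharing), write the same bytes on line 64 (true sharing, not reported), and only read line 128 (not reported, since read-only sharing does not force coherence invalidations).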

  5. Performance and scalability of Fourier domain optical coherence tomography acceleration using graphics processing units.

    PubMed

    Li, Jian; Bloch, Pavel; Xu, Jing; Sarunic, Marinko V; Shannon, Lesley

    2011-05-01

    Fourier domain optical coherence tomography (FD-OCT) provides faster line rates, better resolution, and higher sensitivity for noninvasive, in vivo biomedical imaging compared to traditional time domain OCT (TD-OCT). However, because the signal processing for FD-OCT is computationally intensive, real-time FD-OCT applications demand powerful computing platforms to deliver acceptable performance. Graphics processing units (GPUs) have been used as coprocessors to accelerate FD-OCT by leveraging their relatively simple programming model to exploit thread-level parallelism. Unfortunately, GPUs do not "share" memory with their host processors, requiring additional data transfers between the GPU and CPU. In this paper, we implement a complete FD-OCT accelerator on a consumer grade GPU/CPU platform. Our data acquisition system uses spectrometer-based detection and a dual-arm interferometer topology with numerical dispersion compensation for retinal imaging. We demonstrate that the maximum line rate is dictated by the memory transfer time and not the processing time due to the GPU platform's memory model. Finally, we discuss how the performance trends of GPU-based accelerators compare to the expected future requirements of FD-OCT data rates.

  6. Cross-scale efficient tensor contractions for coupled cluster computations through multiple programming model backends

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Ibrahim, Khaled Z.; Epifanovsky, Evgeny; Williams, Samuel

    Coupled-cluster methods provide highly accurate models of molecular structure through explicit numerical calculation of tensors representing the correlation between electrons. These calculations are dominated by a sequence of tensor contractions, motivating the development of numerical libraries for such operations. While based on matrix–matrix multiplication, these libraries are specialized to exploit symmetries in the molecular structure and in electronic interactions, and thus reduce the size of the tensor representation and the complexity of contractions. The resulting algorithms are irregular and their parallelization has been previously achieved via the use of dynamic scheduling or specialized data decompositions. We introduce our efforts to extend the Libtensor framework to work in the distributed memory environment in a scalable and energy-efficient manner. We achieve up to 240× speedup compared with the optimized shared memory implementation of Libtensor. We attain scalability to hundreds of thousands of compute cores on three distributed-memory architectures (Cray XC30 and XC40, and IBM Blue Gene/Q), and on a heterogeneous GPU-CPU system (Cray XK7). As the bottlenecks shift from compute-bound DGEMMs to communication-bound collectives as the size of the molecular system scales, we adopt two radically different parallelization approaches for handling load imbalance: tasking and bulk synchronous models. Nevertheless, we preserve a unified interface to both programming models to maintain the productivity of computational quantum chemists.

  7. Cross-scale efficient tensor contractions for coupled cluster computations through multiple programming model backends

    DOE PAGES

    Ibrahim, Khaled Z.; Epifanovsky, Evgeny; Williams, Samuel; ...

    2017-03-08

    Coupled-cluster methods provide highly accurate models of molecular structure through explicit numerical calculation of tensors representing the correlation between electrons. These calculations are dominated by a sequence of tensor contractions, motivating the development of numerical libraries for such operations. While based on matrix–matrix multiplication, these libraries are specialized to exploit symmetries in the molecular structure and in electronic interactions, and thus reduce the size of the tensor representation and the complexity of contractions. The resulting algorithms are irregular and their parallelization has been previously achieved via the use of dynamic scheduling or specialized data decompositions. We introduce our efforts to extend the Libtensor framework to work in the distributed memory environment in a scalable and energy-efficient manner. We achieve up to 240× speedup compared with the optimized shared memory implementation of Libtensor. We attain scalability to hundreds of thousands of compute cores on three distributed-memory architectures (Cray XC30 and XC40, and IBM Blue Gene/Q), and on a heterogeneous GPU-CPU system (Cray XK7). As the bottlenecks shift from compute-bound DGEMMs to communication-bound collectives as the size of the molecular system scales, we adopt two radically different parallelization approaches for handling load imbalance: tasking and bulk synchronous models. Nevertheless, we preserve a unified interface to both programming models to maintain the productivity of computational quantum chemists.

  8. An Adaptive Insertion and Promotion Policy for Partitioned Shared Caches

    NASA Astrophysics Data System (ADS)

    Mahrom, Norfadila; Liebelt, Michael; Raof, Rafikha Aliana A.; Daud, Shuhaizar; Hafizah Ghazali, Nur

    2018-03-01

    Cache replacement policies in chip multiprocessors (CMP) have been investigated extensively and proven able to enhance shared cache management. However, competition among multiple processors executing different threads that require simultaneous access to a shared memory may cause cache contention and memory coherence problems on the chip. These issues also arise from drawbacks of the Least Recently Used (LRU) policy commonly employed in multiprocessor systems, under which cache lines reside in the cache longer than required. In image-processing analysis of, for example, extrapulmonary tuberculosis (TB), an accurate diagnosis of tissue specimens is required. Therefore, a fast and reliable shared memory management system is needed to execute algorithms that process vast numbers of specimen images. In this paper, the effects of the cache replacement policy in a partitioned shared cache are investigated. The goal is to quantify whether better performance can be achieved by using less complex replacement strategies. This paper proposes a Middle Insertion 2 Positions Promotion (MI2PP) policy to eliminate cache misses that could adversely affect the access patterns and the throughput of the processors in the system. The policy employs a static predefined insertion point, near-distance promotion, and the concept of ownership in the eviction policy to effectively mitigate cache thrashing and to avoid resource stealing among the processors.
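    Based only on the policy name and the abstract, one plausible reading of MI2PP for a single cache set is: insert a missing line at the middle of the recency stack instead of the most-recently-used (MRU) end, promote a line two positions toward MRU on each hit, and evict from the least-recently-used (LRU) end. The Python sketch below implements that reading; the exact insertion point and promotion details are assumptions, the ownership-based eviction across partitions is omitted, and `MI2PPSet` is a hypothetical name.

```python
from typing import List


class MI2PPSet:
    """One cache set under a middle-insertion, promote-by-two policy (sketch).

    Index 0 of `stack` is the MRU end; the last index is the LRU end.
    """

    def __init__(self, ways: int) -> None:
        if ways < 1:
            raise ValueError("ways must be >= 1")
        self.ways = ways
        self.stack: List[int] = []  # cache tags ordered MRU -> LRU

    def access(self, tag: int) -> bool:
        """Access `tag`; return True on a hit, False on a miss."""
        if tag in self.stack:
            pos = self.stack.index(tag)
            self.stack.pop(pos)
            self.stack.insert(max(pos - 2, 0), tag)  # promote two positions
            return True
        if len(self.stack) == self.ways:
            self.stack.pop()  # evict the line at the LRU end
        # Insert at the middle of the set rather than at the MRU end.
        self.stack.insert(min(self.ways // 2, len(self.stack)), tag)
        return False


s = MI2PPSet(ways=4)
hits = [s.access(t) for t in [1, 2, 3, 4, 5, 4]]
print(hits, s.stack)
```

    Under this reading, a newly inserted line must earn its way toward MRU through repeated hits, so lines touched only once tend to be evicted sooner than under plain LRU.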

  9. Hierarchical Traces for Reduced NSM Memory Requirements

    NASA Astrophysics Data System (ADS)

    Dahl, Torbjørn S.

    This paper presents work on using hierarchical long-term memory to reduce the memory requirements of nearest sequence memory (NSM) learning, a previously published, instance-based reinforcement learning algorithm. A hierarchical memory representation reduces the memory requirements by allowing traces to share common sub-sequences. We present moderated mechanisms for estimating discounted future rewards and for dealing with hidden state using hierarchical memory. We also present an experimental analysis of how the sub-sequence length affects the memory compression achieved and show that the reduced memory requirements do not affect the speed of learning. Finally, we analyse and discuss the persistence of the sub-sequences independent of specific trace instances.
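    To make the idea of traces sharing common sub-sequences concrete, the following Python sketch stores each trace as a list of references to fixed-length chunks, so identical chunks are stored only once. This is a flat, single-level simplification, not the paper's hierarchical representation; encoding a trace as a string of step symbols and the names `compress_traces` and `decode` are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Chunk = Tuple[str, ...]


def compress_traces(traces: Sequence[str],
                    k: int) -> Tuple[Dict[Chunk, int], List[List[int]], int]:
    """Split each trace into length-k chunks and store each distinct chunk once.

    Returns (chunk table, per-trace chunk ids, number of steps stored).
    """
    if k < 1:
        raise ValueError("k must be >= 1")
    table: Dict[Chunk, int] = {}
    encoded: List[List[int]] = []
    for trace in traces:
        ids = []
        for i in range(0, len(trace), k):
            chunk = tuple(trace[i:i + k])
            ids.append(table.setdefault(chunk, len(table)))  # reuse a shared chunk
        encoded.append(ids)
    stored = sum(len(c) for c in table)
    return table, encoded, stored


def decode(table: Dict[Chunk, int], ids: List[int]) -> List[str]:
    """Rebuild a trace from its chunk ids."""
    by_id = {i: chunk for chunk, i in table.items()}
    return [step for i in ids for step in by_id[i]]


traces = ["abcabc", "abcxyz", "xyzabc"]  # each letter is one hypothetical step
for k in (2, 3):
    _, _, stored = compress_traces(traces, k)
    print(k, stored)
```

    For the three example traces (18 steps in total), chunks of length 3 store only 6 steps, while chunks of length 2 store 14, a small illustration of how the sub-sequence length affects the compression achieved.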

  10. The Contribution of Working Memory to Fluid Reasoning: Capacity, Control, or Both?

    ERIC Educational Resources Information Center

    Chuderski, Adam; Necka, Edward

    2012-01-01

    Fluid reasoning shares a large part of its variance with working memory capacity (WMC). The literature on working memory (WM) suggests that the capacity of the focus of attention responsible for simultaneous maintenance and integration of information within WM, as well as the effectiveness of executive control exerted over WM, determines…

  11. Feature-Based Memory-Driven Attentional Capture: Visual Working Memory Content Affects Visual Attention

    ERIC Educational Resources Information Center

    Olivers, Christian N. L.; Meijer, Frank; Theeuwes, Jan

    2006-01-01

    In 7 experiments, the authors explored whether visual attention (the ability to select relevant visual information) and visual working memory (the ability to retain relevant visual information) share the same content representations. The presence of singleton distractors interfered more strongly with a visual search task when it was accompanied by…

  12. Time-Related Decay or Interference-Based Forgetting in Working Memory?

    ERIC Educational Resources Information Center

    Portrat, Sophie; Barrouillet, Pierre; Camos, Valerie

    2008-01-01

    The time-based resource-sharing model of working memory assumes that memory traces suffer from a time-related decay when attention is occupied by concurrent activities. Using complex continuous span tasks in which temporal parameters are carefully controlled, P. Barrouillet, S. Bernardin, S. Portrat, E. Vergauwe, & V. Camos (2007) recently…

  13. Developmental Change in Working Memory Strategies: From Passive Maintenance to Active Refreshing

    ERIC Educational Resources Information Center

    Camos, Valerie; Barrouillet, Pierre

    2011-01-01

    Change in strategies is often mentioned as a source of memory development. However, though performance in working memory tasks steadily improves during childhood, theories differ in linking this development to strategy changes. Whereas some theories, such as the time-based resource-sharing model, invoke the age-related increase in use and…

  14. Cache write generate for parallel image processing on shared memory architectures.

    PubMed

    Wittenbrink, C M; Somani, A K; Chen, C H

    1996-01-01

    We investigate cache write generate, our cache mode invention. We demonstrate that for parallel image processing applications, the new mode improves main memory bandwidth, CPU efficiency, cache hits, and cache latency. We use register level simulations validated by the UW-Proteus system. Many memory, cache, and processor configurations are evaluated.

  15. Mnemonic convergence in social networks: The emergent properties of cognition at a collective level

    PubMed Central

    Coman, Alin; Momennejad, Ida; Drach, Rae D.; Geana, Andra

    2016-01-01

    The development of shared memories, beliefs, and norms is a fundamental characteristic of human communities. These emergent outcomes are thought to occur owing to a dynamic system of information sharing and memory updating, which fundamentally depends on communication. Here we report results on the formation of collective memories in laboratory-created communities. We manipulated conversational network structure in a series of real-time, computer-mediated interactions in fourteen 10-member communities. The results show that mnemonic convergence, measured as the degree of overlap among community members’ memories, is influenced by both individual-level information-processing phenomena and by the conversational social network structure created during conversational recall. By studying laboratory-created social networks, we show how large-scale social phenomena (i.e., collective memory) can emerge out of microlevel local dynamics (i.e., mnemonic reinforcement and suppression effects). The social-interactionist approach proposed herein points to optimal strategies for spreading information in social networks and provides a framework for measuring and forging collective memories in communities of individuals. PMID:27357678

  16. A sample implementation for parallelizing Divide-and-Conquer algorithms on the GPU.

    PubMed

    Mei, Gang; Zhang, Jiayin; Xu, Nengxiong; Zhao, Kunyang

    2018-01-01

    The Divide-and-Conquer (D&C) strategy is one of the most frequently used programming patterns for designing efficient algorithms in computer science, and it has been parallelized on both shared memory and distributed memory systems. Tzeng and Owens specifically developed a generic paradigm for parallelizing D&C algorithms on modern Graphics Processing Units (GPUs). In this paper, following the generic paradigm proposed by Tzeng and Owens, we provide a new, publicly available GPU implementation of the well-known D&C algorithm QuickHull, as a sample and guide for parallelizing D&C algorithms on the GPU. The experimental results demonstrate the practicality of our sample GPU implementation. Our research objective is to present a sample GPU implementation of a classical D&C algorithm that helps interested readers develop their own efficient GPU implementations with less effort.
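    The recursive structure that a GPU D&C paradigm parallelizes can be seen in a plain sequential QuickHull. The following is a minimal CPU sketch in Python, not the authors' GPU implementation: the function names are hypothetical, points are assumed to be 2-D tuples, and collinear points on hull edges are dropped. The two recursive calls in `_hull_side` operate on independent subproblems, which is the independence a parallel D&C scheme exploits.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def cross(o: Point, a: Point, b: Point) -> float:
    """Twice the signed area of triangle (o, a, b); positive when b lies left of o->a."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def _hull_side(points: List[Point], a: Point, b: Point) -> List[Point]:
    """Hull vertices strictly left of the directed line a->b, ordered from a to b (exclusive)."""
    left = [p for p in points if cross(a, b, p) > 0]
    if not left:
        return []
    # Divide: the point farthest from line a-b is always a hull vertex.
    far = max(left, key=lambda p: cross(a, b, p))
    # Conquer: these two subproblems are independent (candidates for parallel execution).
    return _hull_side(left, a, far) + [far] + _hull_side(left, far, b)


def quickhull(points: List[Point]) -> List[Point]:
    """Convex hull in clockwise order, starting from the leftmost point."""
    pts = sorted(set(points))
    if len(pts) < 3:
        return pts
    a, b = pts[0], pts[-1]  # leftmost and rightmost points are hull vertices
    upper = _hull_side(pts, a, b)  # points above the line a->b
    lower = _hull_side(pts, b, a)  # points below it
    return [a] + upper + [b] + lower
```

    For example, `quickhull([(0, 0), (1, 0), (1, 1), (0, 1), (0.5, 0.5)])` returns `[(0, 0), (0, 1), (1, 1), (1, 0)]`: the interior point is discarded and the hull comes back in clockwise order.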

  17. Vascular system modeling in parallel environment - distributed and shared memory approaches

    PubMed Central

    Jurczuk, Krzysztof; Kretowski, Marek; Bezy-Wendling, Johanne

    2011-01-01

    The paper presents two approaches in parallel modeling of vascular system development in internal organs. In the first approach, new parts of tissue are distributed among processors and each processor is responsible for perfusing its assigned parts of tissue to all vascular trees. Communication between processors is accomplished by passing messages and therefore this algorithm is perfectly suited for distributed memory architectures. The second approach is designed for shared memory machines. It parallelizes the perfusion process during which individual processing units perform calculations concerning different vascular trees. The experimental results, performed on a computing cluster and multi-core machines, show that both algorithms provide a significant speedup. PMID:21550891

  18. Importance of balanced architectures in the design of high-performance imaging systems

    NASA Astrophysics Data System (ADS)

    Sgro, Joseph A.; Stanton, Paul C.

    1999-03-01

    Imaging systems employed in demanding military and industrial applications, such as automatic target recognition and computer vision, typically require real-time high-performance computing resources. While high-performance computing systems have traditionally relied on proprietary architectures and custom components, recent advances in high-performance general-purpose microprocessor technology have produced an abundance of low-cost components suitable for use in high-performance computing systems. A common pitfall in the design of high-performance imaging systems, particularly systems employing scalable multiprocessor architectures, is the failure to balance computational and memory bandwidth. The performance of standard cluster designs, for example, in which several processors share a common memory bus, is typically constrained by memory bandwidth. The characteristic symptom of this problem is the failure of system performance to scale as more processors are added. The problem is exacerbated if I/O and memory functions share the same bus. The recent introduction of microprocessors with large internal caches and high-performance external memory interfaces makes it practical to design high-performance imaging systems with balanced computational and memory bandwidth. Real-world examples of such designs will be presented, along with a discussion of adapting algorithm design to best utilize available memory bandwidth.

  19. OPSO - The OpenGL based Field Acquisition and Telescope Guiding System

    NASA Astrophysics Data System (ADS)

    Škoda, P.; Fuchs, J.; Honsa, J.

    2006-07-01

    We present OPSO, a modular pointing and auto-guiding system for the coudé spectrograph of the Ondřejov observatory 2m telescope. The current field- and slit-viewing CCD cameras with image intensifiers provide only standard TV video output. To allow the acquisition and guiding of very faint targets, we have designed an image-enhancing system that works in real time on TV frames grabbed by a BT878-based video capture card. Its basic capabilities include sliding averaging of hundreds of frames with bad-pixel masking and removal of outliers, display of the median of a set of frames, quick zooming, contrast and brightness adjustment, plotting of horizontal and vertical cross cuts of the seeing disk within a given intensity range, and many more. From the programmer's point of view, the system consists of three tasks running in parallel on a Linux PC. One C task controls video capture through the Video for Linux (v4l2) interface and feeds the frames into a large block of shared memory, where the core image processing is done by another C program calling the OpenGL library. The GUI, however, is built dynamically in Python from an XML description of widgets prepared in Glade. All tasks exchange information through IPC calls using the shared memory segments.
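    The capture-then-process pattern described above, where one task writes frames into a shared segment and another reads them, can be sketched with Python's `multiprocessing.shared_memory` module. This is a swapped-in mechanism for illustration, not OPSO's C code: the frame size, the header layout, and the helper names are hypothetical, and both handles live in one process here, whereas a separate consumer process would attach the same way by segment name. A real producer/consumer pair would also need synchronization (for example a semaphore), because nothing in this sketch stops a reader from seeing a half-written frame.

```python
import struct
from multiprocessing import shared_memory

WIDTH, HEIGHT = 64, 48           # hypothetical frame size (not OPSO's camera format)
HEADER = struct.Struct("<I")     # hypothetical header: 4-byte little-endian frame counter
SEG_SIZE = HEADER.size + WIDTH * HEIGHT


def write_frame(shm: shared_memory.SharedMemory, counter: int, pixels: bytes) -> None:
    """Capture side: copy the pixel bytes first, then publish the frame counter."""
    shm.buf[HEADER.size:SEG_SIZE] = pixels
    HEADER.pack_into(shm.buf, 0, counter)


def read_frame(shm: shared_memory.SharedMemory) -> tuple:
    """Processing side: read the counter and copy the pixels out of the segment."""
    (counter,) = HEADER.unpack_from(shm.buf, 0)
    return counter, bytes(shm.buf[HEADER.size:SEG_SIZE])


def demo() -> tuple:
    producer = shared_memory.SharedMemory(create=True, size=SEG_SIZE)
    try:
        # Another process would attach exactly like this, using the segment name.
        consumer = shared_memory.SharedMemory(name=producer.name)
        try:
            frame = bytes(range(256)) * (WIDTH * HEIGHT // 256)
            write_frame(producer, 1, frame)
            counter, pixels = read_frame(consumer)
            return counter, pixels == frame
        finally:
            consumer.close()
    finally:
        producer.close()
        producer.unlink()  # only the creator removes the segment
```

    Calling `demo()` returns `(1, True)`: the consumer handle sees the frame the producer wrote, without any copy through a pipe or socket.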

  20. Parallelization and automatic data distribution for nuclear reactor simulations

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Liebrock, L.M.

    1997-07-01

    Detailed attempts at realistic nuclear reactor simulations currently take many times real time to execute on high performance workstations. Even the fastest sequential machine cannot run these simulations fast enough to ensure that the best corrective measure is used during a nuclear accident to prevent a minor malfunction from becoming a major catastrophe. Since sequential computers have nearly reached the speed-of-light barrier, these simulations will have to be run in parallel to make significant improvements in speed. In physical reactor plants, parallelism abounds. Fluids flow, controls change, and reactions occur in parallel, with only adjacent components directly affecting each other. These do not occur in the sequentialized manner, with global instantaneous effects, that is often used in simulators. Development of parallel algorithms that more closely approximate the real-world operation of a reactor may, in addition to speeding up the simulations, actually improve the accuracy and reliability of the predictions generated. Three types of parallel architecture (shared memory machines, distributed memory multicomputers, and distributed networks) are briefly reviewed as targets for parallelization of nuclear reactor simulation. Various parallelization models (loop-based model, shared memory model, functional model, data parallel model, and a combined functional and data parallel model) are discussed along with their advantages and disadvantages for nuclear reactor simulation. A variety of tools are introduced for each of the models. Emphasis is placed on the data parallel model as the primary focus for two-phase flow simulation. Tools to support data parallel programming for multiple-component applications and special parallelization considerations are also discussed.

  1. Associative-memory representations emerge as shared spatial patterns of theta activity spanning the primate temporal cortex

    PubMed Central

    Nakahara, Kiyoshi; Adachi, Ken; Kawasaki, Keisuke; Matsuo, Takeshi; Sawahata, Hirohito; Majima, Kei; Takeda, Masaki; Sugiyama, Sayaka; Nakata, Ryota; Iijima, Atsuhiko; Tanigawa, Hisashi; Suzuki, Takafumi; Kamitani, Yukiyasu; Hasegawa, Isao

    2016-01-01

    Highly localized neuronal spikes in primate temporal cortex can encode associative memory; however, whether memory formation involves area-wide reorganization of ensemble activity, which often accompanies rhythmicity, or just local microcircuit-level plasticity, remains elusive. Using high-density electrocorticography, we capture local-field potentials spanning the monkey temporal lobes, and show that the visual pair-association (PA) memory is encoded in spatial patterns of theta activity in areas TE, 36, and, partially, in the parahippocampal cortex, but not in the entorhinal cortex. The theta patterns elicited by learned paired associates are distinct between pairs, but similar within pairs. This pattern similarity, emerging through novel PA learning, allows a machine-learning decoder trained on theta patterns elicited by a particular visual item to correctly predict the identity of those elicited by its paired associate. Our results suggest that the formation and sharing of widespread cortical theta patterns via learning-induced reorganization are involved in the mechanisms of associative memory representation. PMID:27282247

  2. The costs of changing an intended action: movement planning, but not execution, interferes with verbal working memory.

    PubMed

    Spiegel, M A; Koester, D; Weigelt, M; Schack, T

    2012-02-16

    How much cognitive effort does it take to change a movement plan? In previous studies, it has been shown that humans plan and represent actions in advance, but it remains unclear whether or not action planning and verbal working memory share cognitive resources. Using a novel experimental paradigm, we combined in two experiments a grasp-to-place task with a verbal working memory task. Participants planned a placing movement toward one of two target positions and subsequently encoded and maintained visually presented letters. Both experiments revealed that re-planning the intended action reduced letter recall performance; execution time, however, was not influenced by action modifications. The results of Experiment 2 suggest that the action's interference with verbal working memory arose during the planning rather than the execution phase of the movement. Together, our results strongly suggest that movement planning and verbal working memory share common cognitive resources. Copyright © 2011 Elsevier Ireland Ltd. All rights reserved.

  3. Synapsin Determines Memory Strength after Punishment- and Relief-Learning

    PubMed Central

    Niewalda, Thomas; Michels, Birgit; Jungnickel, Roswitha; Diegelmann, Sören; Kleber, Jörg; Kähne, Thilo

    2015-01-01

    Adverse life events can induce two kinds of memory with opposite valence, dependent on timing: “negative” memories for stimuli preceding them and “positive” memories for stimuli experienced at the moment of “relief.” Such punishment memory and relief memory are found in insects, rats, and man. For example, fruit flies (Drosophila melanogaster) avoid an odor after odor-shock training (“forward conditioning” of the odor), whereas after shock-odor training (“backward conditioning” of the odor) they approach it. Do these timing-dependent associative processes share molecular determinants? We focus on the role of Synapsin, a conserved presynaptic phosphoprotein regulating the balance between the reserve pool and the readily releasable pool of synaptic vesicles. We find that a lack of Synapsin leaves task-relevant sensory and motor faculties unaffected. In contrast, both punishment memory and relief memory scores are reduced. These defects reflect a true lessening of associative memory strength, as distortions in nonassociative processing (e.g., susceptibility to handling, adaptation, habituation, sensitization), discrimination ability, and changes in the time course of coincidence detection can be ruled out as alternative explanations. Reductions in punishment- and relief-memory strength are also observed upon an RNAi-mediated knock-down of Synapsin, and are rescued both by acutely restoring Synapsin and by locally restoring it in the mushroom bodies of mutant flies. Thus, both punishment memory and relief memory require the Synapsin protein and in this sense share genetic and molecular determinants. We note that corresponding molecular commonalities between punishment memory and relief memory in humans would constrain pharmacological attempts to selectively interfere with excessive associative punishment memories, e.g., after traumatic experiences. PMID:25972175

  4. Synapsin determines memory strength after punishment- and relief-learning.

    PubMed

    Niewalda, Thomas; Michels, Birgit; Jungnickel, Roswitha; Diegelmann, Sören; Kleber, Jörg; Kähne, Thilo; Gerber, Bertram

    2015-05-13

    Adverse life events can induce two kinds of memory with opposite valence, dependent on timing: "negative" memories for stimuli preceding them and "positive" memories for stimuli experienced at the moment of "relief." Such punishment memory and relief memory are found in insects, rats, and man. For example, fruit flies (Drosophila melanogaster) avoid an odor after odor-shock training ("forward conditioning" of the odor), whereas after shock-odor training ("backward conditioning" of the odor) they approach it. Do these timing-dependent associative processes share molecular determinants? We focus on the role of Synapsin, a conserved presynaptic phosphoprotein regulating the balance between the reserve pool and the readily releasable pool of synaptic vesicles. We find that a lack of Synapsin leaves task-relevant sensory and motor faculties unaffected. In contrast, both punishment memory and relief memory scores are reduced. These defects reflect a true lessening of associative memory strength, as distortions in nonassociative processing (e.g., susceptibility to handling, adaptation, habituation, sensitization), discrimination ability, and changes in the time course of coincidence detection can be ruled out as alternative explanations. Reductions in punishment- and relief-memory strength are also observed upon an RNAi-mediated knock-down of Synapsin, and are rescued both by acutely restoring Synapsin and by locally restoring it in the mushroom bodies of mutant flies. Thus, both punishment memory and relief memory require the Synapsin protein and in this sense share genetic and molecular determinants. We note that corresponding molecular commonalities between punishment memory and relief memory in humans would constrain pharmacological attempts to selectively interfere with excessive associative punishment memories, e.g., after traumatic experiences. Copyright © 2015 Niewalda et al.

  5. Teuchos C++ memory management classes, idioms, and related topics, the complete reference : a comprehensive strategy for safe and efficient memory management in C++ for high performance computing.

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Bartlett, Roscoe Ainsworth

    2010-05-01

    The ubiquitous use of raw pointers in higher-level code is the primary cause of all memory usage problems and memory leaks in C++ programs. This paper describes what might be considered a radical approach to the problem, which is to encapsulate the use of all raw pointers and all raw calls to new and delete in higher-level C++ code. Instead, a set of cooperating template classes developed in the Trilinos package Teuchos are used to encapsulate every use of raw C++ pointers in every use case where it appears in high-level code. Included in the set of memory management classes is the typical reference-counted smart pointer class similar to boost::shared_ptr (and therefore C++0x std::shared_ptr). However, what is missing in boost and the new standard library are non-reference-counted classes for the remaining use cases where raw C++ pointers would need to be used. These classes have a debug build mode where nearly all programmer errors are caught and gracefully reported at runtime. The default optimized build mode strips all runtime checks and allows the code to perform as efficiently as raw C++ pointers with reasonable usage. Also included is a novel approach for dealing with the circular references problem that imparts little extra overhead and is almost completely invisible to most of the code (unlike the boost and therefore C++0x approach). Rather than being a radical approach, encapsulating all raw C++ pointers is simply the logical progression of a trend in the C++ development and standards community that started with std::auto_ptr and is continued (but not finished) with std::shared_ptr in C++0x. Using the Teuchos reference-counted memory management classes allows one to remove unnecessary constraints in the use of objects by removing arbitrary lifetime ordering constraints, which are a type of unnecessary coupling [23].
    The code one writes with these classes will be more likely to be correct on first writing, will be less likely to contain silent (but deadly) memory usage errors, and will be much more robust to later refactoring and maintenance. The level of debug-mode runtime checking provided by the Teuchos memory management classes is stronger in many respects than what is provided by memory checking tools like Valgrind and Purify, while being much less expensive. However, tools like Valgrind and Purify perform a number of types of checks (like usage of uninitialized memory) that make these tools very valuable and therefore complement the Teuchos memory management debug-mode runtime checking. The Teuchos memory management classes and idioms largely address the technical issues in resolving the fragile built-in C++ memory management model (with the exception of circular references, which have no easy solution but can be managed as discussed). All that remains is to teach these classes and idioms and expand their usage in C++ codes. The long-term viability of C++ as a usable and productive language depends on it. Otherwise, if C++ is no safer than C, then is the greater complexity of C++ worth what one gets as extra features? Given that C is smaller and easier to learn than C++, and since most programmers don't know object orientation (or templates, or X, Y, and Z features of C++) all that well anyway, what are most programmers really getting out of C++ that would outweigh its extra complexity over C? C++ zealots will argue this point, but the reality is that C++ popularity has peaked and is declining, while the popularity of C has remained fairly stable over the last decade (note 22). Idioms like those advocated in this paper can help to avert this trend, but it will require wide community buy-in and a change in the way C++ is taught in order to have the greatest impact. To make these programs more secure, compiler vendors or static analysis tools (e.g., Klocwork, note 23) could implement a preprocessor-like language similar to OpenMP (note 24) that would allow the programmer to declare (in comments) that certain blocks of code should be 'pointer-free' or allow smaller blocks to be 'pointers allowed'. This would significantly improve the robustness of code that uses the memory management classes described here.

  6. Audience tuning effects in the context of situated and embodied processes.

    PubMed

    Semin, Gün R

    2018-03-05

    This review provides an overview of the research on communication and the 'Saying is Believing' paradigm in the context of different perspectives on communication. The process of 'audience tuning' is shaped by a variety of situated factors in contexts that affect the communicators' confidence in their message. The overwhelming common denominator is that the combination of features that create ambiguity yields the optimal condition for the formation of shared realities. I conclude with an argument that the implied invariance of memory processes in shared reality work needs to be more attentive to the regulatory function of memories driving the expression of shared realities. Copyright © 2018 Elsevier Ltd. All rights reserved.

  7. A 300MHz Embedded Flash Memory with Pipeline Architecture and Offset-Free Sense Amplifiers for Dual-Core Automotive Microcontrollers

    NASA Astrophysics Data System (ADS)

    Kajiyama, Shinya; Fujito, Masamichi; Kasai, Hideo; Mizuno, Makoto; Yamaguchi, Takanori; Shinagawa, Yutaka

    A novel 300MHz embedded flash memory for dual-core microcontrollers with a shared ROM architecture is proposed. One of its features is a three-stage pipeline read operation, which enables a reduced access pitch and therefore reduces the performance penalty due to conflicting shared ROM accesses. Another feature is a highly sensitive sense amplifier that achieves efficient pipeline operation, with two-cycle latency and one-cycle pitch, as a result of a shortened sense time of 0.63ns. The combination of the pipeline architecture and the proposed sense amplifiers significantly reduces access-conflict penalties with shared ROM and enhances the performance of 32-bit RISC dual-core microcontrollers by 30%.

  8. A general model for memory interference in a multiprocessor system with memory hierarchy

    NASA Technical Reports Server (NTRS)

    Taha, Badie A.; Standley, Hilda M.

    1989-01-01

    The problem of memory interference in a multiprocessor system with a hierarchy of shared buses and memories is addressed. The behavior of the processors is represented by a sequence of memory requests with each followed by a determined amount of processing time. A statistical queuing network model for determining the extent of memory interference in multiprocessor systems with clusters of memory hierarchies is presented. The performance of the system is measured by the expected number of busy memory clusters. The results of the analytic model are compared with simulation results, and the correlation between them is found to be very high.
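    For intuition about the quantity being modeled, the classic single-level case has a closed form: if each of p processors issues one request per cycle to one of m memory modules chosen uniformly and independently, a given module is idle with probability (1 - 1/m)^p, so the expected number of busy modules is m(1 - (1 - 1/m)^p). The Python sketch below checks that formula against a simple Monte Carlo simulation, mirroring the analytic-versus-simulation comparison described above. It is a much simpler model than the paper's hierarchical queuing network (no shared buses, no clusters, and blocked requests are not queued), and the function names are hypothetical.

```python
import random


def expected_busy_analytic(m: int, p: int) -> float:
    """Expected number of busy modules: m modules, p processors, one uniform request each."""
    return m * (1 - (1 - 1 / m) ** p)


def expected_busy_simulated(m: int, p: int, trials: int = 100_000, seed: int = 0) -> float:
    """Monte Carlo estimate of the same quantity under the same simplified model."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        # Each processor picks a module; the number of distinct picks is the busy count.
        total += len({rng.randrange(m) for _ in range(p)})
    return total / trials
```

    With two processors and two modules, the analytic value is 2(1 - 1/4) = 1.5: half the time both processors collide on one module, so on average only 1.5 of the 2 modules do useful work.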

  9. Domain-general involvement of the posterior frontolateral cortex in time-based resource-sharing in working memory: An fMRI study.

    PubMed

    Vergauwe, Evie; Hartstra, Egbert; Barrouillet, Pierre; Brass, Marcel

    2015-07-15

    Working memory is often defined in cognitive psychology as a system devoted to the simultaneous processing and maintenance of information. In line with the time-based resource-sharing model of working memory (TBRS; Barrouillet and Camos, 2015; Barrouillet et al., 2004), there is accumulating evidence that, when memory items have to be maintained while performing a concurrent activity, memory performance depends on the cognitive load of this activity, independently of the domain involved. The present study used fMRI to identify regions in the brain that are sensitive to variations in cognitive load in a domain-general way. More precisely, we aimed at identifying brain areas that activate during maintenance of memory items as a direct function of the cognitive load induced by both verbal and spatial concurrent tasks. Results show that the right IFJ and bilateral SPL/IPS are the only areas showing an increased involvement as cognitive load increases and do so in a domain general manner. When correlating the fMRI signal with the approximated cognitive load as defined by the TBRS model, it was shown that the main focus of the cognitive load-related activation is located in the right IFJ. The present findings indicate that the IFJ makes domain-general contributions to time-based resource-sharing in working memory and allowed us to generate the novel hypothesis by which the IFJ might be the neural basis for the process of rapid switching. We argue that the IFJ might be a crucial part of a central attentional bottleneck in the brain because of its inability to upload more than one task rule at once. Copyright © 2015 Elsevier Inc. All rights reserved.

  10. Information and processes underlying semantic and episodic memory across tasks, items, and individuals.

    PubMed

    Cox, Gregory E; Hemmer, Pernille; Aue, William R; Criss, Amy H

    2018-04-01

    The development of memory theory has been constrained by a focus on isolated tasks rather than the processes and information that are common to situations in which memory is engaged. We present results from a study in which 453 participants took part in five different memory tasks: single-item recognition, associative recognition, cued recall, free recall, and lexical decision. Using hierarchical Bayesian techniques, we jointly analyzed the correlations between tasks within individuals (reflecting the degree to which tasks rely on shared cognitive processes) and within items (reflecting the degree to which tasks rely on the same information conveyed by the item). Among other things, we find that (a) the processes involved in lexical access and episodic memory are largely separate and rely on different kinds of information, (b) access to lexical memory is driven primarily by perceptual aspects of a word, (c) all episodic memory tasks rely to an extent on a set of shared processes which make use of semantic features to encode both single words and associations between words, and (d) recall involves additional processes likely related to contextual cuing and response production. These results provide a large-scale picture of memory across different tasks which can serve to drive the development of comprehensive theories of memory. (PsycINFO Database Record (c) 2018 APA, all rights reserved).

  11. Review: Leon N. Cooper's Science and Human Experience: Values, Culture, and the Mind.

    PubMed

    Lynch, Gary S

    2015-01-01

    Why are we reviewing a book written by someone who shared in the 1972 Nobel Prize in Physics for work on superconductivity? Because shortly after winning the prize, Leon N. Cooper transitioned into brain research, specifically the biological basis of memory. He became director of the Brown University Institute for Brain and Neural Systems, whose interdisciplinary program allowed him to integrate research on the brain, physics, and even philosophy. His new book tackles a diverse spectrum of topics and questions, including these: Does science have limits? Where does order come from? Can we understand consciousness?

  12. Review: Leon N. Cooper’s Science and Human Experience: Values, Culture, and the Mind

    PubMed Central

    Lynch, Gary S.

    2015-01-01

    Why are we reviewing a book written by someone who shared in the 1972 Nobel Prize in Physics for work on superconductivity? Because shortly after winning the prize, Leon N. Cooper transitioned into brain research—specifically, the biological basis of memory. He became director of the Brown University Institute for Brain and Neural Systems, whose interdisciplinary program allowed him to integrate research on the brain, physics, and even philosophy. His new book tackles a diverse spectrum of topics and questions, including these: Does science have limits? Where does order come from? Can we understand consciousness? PMID:27358665

  13. Parallel Gaussian elimination of a block tridiagonal matrix using multiple microcomputers

    NASA Technical Reports Server (NTRS)

    Blech, Richard A.

    1989-01-01

    The solution of a block tridiagonal matrix using parallel processing is demonstrated. The multiprocessor system on which results were obtained and the software environment used to program that system are described. Theoretical partitioning and resource allocation for the Gaussian elimination method used to solve the matrix are discussed. The results obtained from running 1-, 2-, and 3-processor versions of the block tridiagonal solver are presented. The PASCAL source code for these solvers is given in the appendix, and may be transportable to other shared memory parallel processors provided that the synchronization routines are reproduced on the target system.
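    In its sequential form, the elimination being partitioned is block Gaussian elimination specialized to the tridiagonal structure, often called the block Thomas algorithm. The sketch below, in Python with NumPy, shows only that sequential baseline; it is not the report's PASCAL code or its multiprocessor partitioning, it assumes the diagonal blocks remain nonsingular without pivoting (for example, under block diagonal dominance), and the function name and block-list layout are hypothetical.

```python
import numpy as np


def block_thomas(A, B, C, d):
    """Solve a block tridiagonal system by block Gaussian elimination (no pivoting).

    A[i]: sub-diagonal block in block row i (A[0] unused)
    B[i]: diagonal block in block row i
    C[i]: super-diagonal block in block row i (C[-1] unused)
    d[i]: right-hand-side vector for block row i
    Returns the solution as a list of vector blocks.
    """
    n = len(B)
    Cp = [None] * n  # modified super-diagonal blocks
    dp = [None] * n  # modified right-hand sides
    # Forward elimination: remove each sub-diagonal block using the previous block row.
    if n > 1:
        Cp[0] = np.linalg.solve(B[0], C[0])
    dp[0] = np.linalg.solve(B[0], d[0])
    for i in range(1, n):
        pivot = B[i] - A[i] @ Cp[i - 1]
        if i < n - 1:
            Cp[i] = np.linalg.solve(pivot, C[i])
        dp[i] = np.linalg.solve(pivot, d[i] - A[i] @ dp[i - 1])
    # Back substitution from the last block row upward.
    x = [None] * n
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):
        x[i] = dp[i] - Cp[i] @ x[i + 1]
    return x
```

    The forward sweep is inherently sequential from one block row to the next, which is why a parallel version has to repartition the work rather than simply split the loop.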

  14. Test program for 4-K memory card, JOLT microprocessor

    NASA Technical Reports Server (NTRS)

    Lilley, R. W.

    1976-01-01

    A memory test program is described for use with the JOLT microcomputer 4,096-word memory board used in development of an Omega navigation receiver. The program allows a quick test of the memory board by cycling the memory through all possible bit combinations in all words.

  15. What Drives Memory-Driven Attentional Capture? The Effects of Memory Type, Display Type, and Search Type

    ERIC Educational Resources Information Center

    Olivers, Christian N. L.

    2009-01-01

    An important question is whether visual attention (the ability to select relevant visual information) and visual working memory (the ability to retain relevant visual information) share the same content representations. Some past research has indicated that they do: Singleton distractors interfered more strongly with a visual search task when they…

  16. Discrete Resource Allocation in Visual Working Memory

    ERIC Educational Resources Information Center

    Barton, Brian; Ester, Edward F.; Awh, Edward

    2009-01-01

    Are resources in visual working memory allocated in a continuous or a discrete fashion? On one hand, flexible resource models suggest that capacity is determined by a central resource pool that can be flexibly divided such that items of greater complexity receive a larger share of resources. On the other hand, if capacity in working memory is…

  17. Natural Conversations as a Source of False Memories in Children: Implications for the Testimony of Young Witnesses

    PubMed Central

    Principe, Gabrielle F.; Schindewolf, Erica

    2012-01-01

    Research on factors that can affect the accuracy of children’s autobiographical remembering has important implications for understanding the abilities of young witnesses to provide legal testimony. In this article, we review our own recent research on one factor that has much potential to induce errors in children’s event recall, namely natural memory-sharing conversations with peers and parents. Our studies provide compelling evidence that not only can the content of conversations about the past intrude into later memory, but such exchanges can prompt the generation of entirely false narratives that are more detailed than true accounts of experienced events. Further, our work shows that deeper and more creative participation in memory-sharing dialogues can boost the damaging effects of conversationally conveyed misinformation. Implications of this collection of findings for children’s testimony are discussed. PMID:23129880

  18. On nonlinear finite element analysis in single-, multi- and parallel-processors

    NASA Technical Reports Server (NTRS)

    Utku, S.; Melosh, R.; Islam, M.; Salama, M.

    1982-01-01

    Numerical solution of nonlinear equilibrium problems of structures by means of Newton-Raphson type iterations is reviewed. Each step of the iteration is shown to correspond to the solution of a linear problem; therefore, the feasibility of the finite element method for nonlinear analysis is established. Organization and flow of data for various types of digital computers, such as single-processor/single-level memory, single-processor/two-level memory, vector-processor/two-level memory, and parallel processors, with and without substructuring (i.e., partitioning), are given. The effect of the relative costs of computation, memory, and data transfer on substructuring is shown. The idea of assigning comparable-size substructures to parallel processors is exploited. Under Cholesky-type factorization schemes, the efficiency of parallel processing is shown to decrease due to occasional shared data, just as it does due to shared facilities.
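
    The observation above, that each Newton-Raphson step amounts to solving one linear problem K_T(u) du = f_ext - f_int(u), can be sketched in a few lines. This is a minimal pure-Python illustration, not the paper's formulation: the function names are hypothetical, and the dense Gaussian elimination with partial pivoting stands in for the Cholesky-type factorizations and substructuring the paper discusses.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting (dense, small n)."""
    n = len(b)
    # Build the augmented matrix [a | b] so row swaps move b along with a.
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for k in range(n):
        # Partial pivoting: bring the largest remaining entry in column k to row k.
        p = max(range(k, n), key=lambda i: abs(m[i][k]))
        m[k], m[p] = m[p], m[k]
        for i in range(k + 1, n):
            factor = m[i][k] / m[k][k]
            for j in range(k, n + 1):
                m[i][j] -= factor * m[k][j]
    # Back substitution on the resulting upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def newton_raphson(internal_force, tangent, f_ext, u0, tol=1e-10, max_iter=50):
    """Solve internal_force(u) = f_ext; every iteration is exactly one linear solve."""
    u = list(u0)
    for _ in range(max_iter):
        # Out-of-balance (residual) force at the current displacement estimate.
        r = [fe - fi for fe, fi in zip(f_ext, internal_force(u))]
        if max(abs(v) for v in r) < tol:
            return u
        # The linear problem of this step: tangent(u) * du = r.
        du = solve_linear(tangent(u), r)
        u = [ui + dui for ui, dui in zip(u, du)]
    raise RuntimeError("Newton-Raphson iteration did not converge")
```

    As a usage example with a hypothetical two-degree-of-freedom hardening-spring model, f_int(u) = (2u0 - u1 + u0^3, -u0 + u1 + u1^3) with tangent [[2 + 3u0^2, -1], [-1, 1 + 3u1^2]] and f_ext = (1, 1), calling newton_raphson from u = (0, 0) drives the residual below the tolerance. In a parallel setting, the cost of each iteration is dominated by the linear solve, which is where the data-sharing effects described in the abstract appear.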

  19. Fencing direct memory access data transfers in a parallel active messaging interface of a parallel computer

    DOEpatents

    Blocksome, Michael A.; Mamidala, Amith R.

    2013-09-03

    Fencing direct memory access (`DMA`) data transfers in a parallel active messaging interface (`PAMI`) of a parallel computer, the PAMI including data communications endpoints, each endpoint including specifications of a client, a context, and a task, the endpoints coupled for data communications through the PAMI and through DMA controllers operatively coupled to segments of shared random access memory through which the DMA controllers deliver data communications deterministically, including initiating execution through the PAMI of an ordered sequence of active DMA instructions for DMA data transfers between two endpoints, effecting deterministic DMA data transfers through a DMA controller and a segment of shared memory; and executing through the PAMI, with no FENCE accounting for DMA data transfers, an active FENCE instruction, the FENCE instruction completing execution only after completion of all DMA instructions initiated prior to execution of the FENCE instruction for DMA data transfers between the two endpoints.

  20. Fencing direct memory access data transfers in a parallel active messaging interface of a parallel computer

    DOEpatents

    Blocksome, Michael A; Mamidala, Amith R

    2014-02-11

    Fencing direct memory access (`DMA`) data transfers in a parallel active messaging interface (`PAMI`) of a parallel computer, the PAMI including data communications endpoints, each endpoint including specifications of a client, a context, and a task, the endpoints coupled for data communications through the PAMI and through DMA controllers operatively coupled to segments of shared random access memory through which the DMA controllers deliver data communications deterministically, including initiating execution through the PAMI of an ordered sequence of active DMA instructions for DMA data transfers between two endpoints, effecting deterministic DMA data transfers through a DMA controller and a segment of shared memory; and executing through the PAMI, with no FENCE accounting for DMA data transfers, an active FENCE instruction, the FENCE instruction completing execution only after completion of all DMA instructions initiated prior to execution of the FENCE instruction for DMA data transfers between the two endpoints.
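
    The fencing rule in these two abstracts can be illustrated with a small in-process model. The sketch assumes, as the abstracts state, that DMA transfers between two endpoints are delivered deterministically (in order); under that assumption a FENCE can simply be queued behind the earlier transfers and needs no per-transfer completion accounting. The class and method names (OrderedDmaChannel, dma_put, fence) are hypothetical and are not part of the PAMI API, and a Python worker thread stands in for the DMA controller; this is one reading of the abstract, not the patented implementation.

```python
import queue
import threading


class OrderedDmaChannel:
    """Hypothetical in-order channel between two endpoints (not a PAMI API).

    A single worker thread executes instructions strictly in submission
    order, standing in for a DMA controller that delivers deterministically
    through a shared-memory segment.
    """

    def __init__(self):
        self._queue = queue.Queue()
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def _run(self):
        while True:
            action, done = self._queue.get()
            if action is None:  # shutdown sentinel
                done.set()
                return
            action()
            done.set()

    def dma_put(self, dst, offset, data):
        """Queue a copy of `data` into the shared buffer `dst`; returns an Event."""
        done = threading.Event()

        def copy():
            dst[offset:offset + len(data)] = data

        self._queue.put((copy, done))
        return done

    def fence(self):
        """Queue a FENCE. Because the channel is FIFO, it completes only after
        every transfer submitted before it, with no per-transfer counters."""
        done = threading.Event()
        self._queue.put((lambda: None, done))
        return done

    def close(self):
        done = threading.Event()
        self._queue.put((None, done))
        done.wait()
```

    For example, after two dma_put calls into an 8-byte bytearray, waiting on the Event returned by fence() guarantees that both copies have landed. The design choice being illustrated is that ordering is what makes the fence cheap; if delivery were not deterministic, the fence would have to track outstanding transfers explicitly.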

  1. Implementation of Parallel Dynamic Simulation on Shared-Memory vs. Distributed-Memory Environments

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Jin, Shuangshuang; Chen, Yousu; Wu, Di

    2015-12-09

    Power system dynamic simulation computes the system response to a sequence of large disturbances, such as sudden changes in generation or load, or a network short circuit followed by protective branch switching operation. It consists of a large set of differential and algebraic equations, which is computationally intensive and challenging to solve using a single-processor based dynamic simulation solution. High-performance computing (HPC) based parallel computing is a very promising technology to speed up the computation and facilitate the simulation process. This paper presents two different parallel implementations of power grid dynamic simulation using Open Multi-processing (OpenMP) on a shared-memory platform and Message Passing Interface (MPI) on distributed-memory clusters, respectively. The differences between the parallel simulation algorithms and architectures of the two HPC technologies are illustrated, and their performances for running parallel dynamic simulation are compared and demonstrated.

  2. Parallel program debugging with flowback analysis

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Choi, Jongdeok.

    1989-01-01

    This thesis describes the design and implementation of an integrated debugging system for parallel programs running on shared memory multiprocessors. The goal of the debugging system is to present to the programmer a graphical view of the dynamic program dependences while keeping the execution-time overhead low. The author first describes the use of flowback analysis to provide information on causal relationships between events in a program's execution without re-executing the program for debugging. Execution-time overhead is kept low by recording only a small amount of trace during a program's execution. He uses semantic analysis and a technique called incremental tracing to keep the time and space overhead low. As part of the semantic analysis, he uses a static program dependence graph structure that reduces the amount of work done at compile time and takes advantage of the dynamic information produced during execution time. The cornerstone of the incremental tracing concept is to generate a coarse trace during execution and fill incrementally, during the interactive portion of the debugging session, the gap between the information gathered in the coarse trace and the information needed to do the flowback analysis using the coarse trace. Then, he describes how to extend the flowback analysis to parallel programs. The flowback analysis can span process boundaries; i.e., the most recent modification to a shared variable might be traced to a different process than the one that contains the current reference. The static and dynamic program dependence graphs of the individual processes are tied together with synchronization and data dependence information to form complete graphs that represent the entire program.

  3. Protection of Mission-Critical Applications from Untrusted Execution Environment: Resource Efficient Replication and Migration of Virtual Machines

    DTIC Science & Technology

    2015-09-28

    the performance of log-and-replay can degrade significantly for VMs configured with multiple virtual CPUs, since the shared memory communication... whether based on checkpoint replication or log-and-replay, existing HA approaches use in-memory backups. The backup VM sits in the memory of a... efficiently. Subject terms: high-availability virtual machines, live migration, memory and traffic overheads, application suspension, Java

  4. Parallel processing for scientific computations

    NASA Technical Reports Server (NTRS)

    Alkhatib, Hasan S.

    1995-01-01

    The scope of this project dealt with the investigation of the requirements to support distributed computing of scientific computations over a cluster of cooperative workstations. Various experiments on computations for the solution of simultaneous linear equations were performed in the early phase of the project to gain experience in the general nature and requirements of scientific applications. A specification of a distributed integrated computing environment, DICE, based on a distributed shared memory communication paradigm has been developed and evaluated. The distributed shared memory model facilitates porting existing parallel algorithms that have been designed for shared memory multiprocessor systems to the new environment. The potential of this new environment is to provide supercomputing capability through the utilization of the aggregate power of workstations cooperating in a cluster interconnected via a local area network. Workstations, generally, do not have the computing power to tackle complex scientific applications, making them primarily useful for visualization, data reduction, and filtering as far as complex scientific applications are concerned. There is a tremendous amount of computing power that is left unused in a network of workstations. Very often a workstation is simply sitting idle on a desk. A set of tools can be developed to take advantage of this potential computing power to create a platform suitable for large scientific computations. The integration of several workstations into a logical cluster of distributed, cooperative, computing stations presents an alternative to shared memory multiprocessor systems. In this project we designed and evaluated such a system.

  5. Wnt signaling inhibits CTL memory programming

    PubMed Central

    Xiao, Zhengguo; Sun, Zhifeng; Smyth, Kendra; Li, Lei

    2013-01-01

    Induction of functional CTLs is one of the major goals for vaccine development and cancer therapy. Inflammatory cytokines are critical for memory CTL generation. Wnt signaling is important for CTL priming and memory formation, but its role in cytokine-driven memory CTL programming is unclear. We found that Wnt signaling inhibited IL-12-driven CTL activation and memory programming. This impaired memory CTL programming was attributed to up-regulation of eomes and down-regulation of T-bet. Wnt signaling suppressed the mTOR pathway during CTL activation, which differs from its effects on other cell types. Interestingly, the impaired memory CTL programming by Wnt was partially rescued by the mTOR inhibitor rapamycin. In conclusion, we found that crosstalk between Wnt and IL-12 signaling inhibits T-bet and mTOR pathways and impairs memory programming, which can be recovered in part by rapamycin. In addition, direct inhibition of Wnt signaling during CTL activation does not affect CTL memory programming. Therefore, Wnt signaling may serve as a new tool for CTL manipulation in autoimmune diseases and immune therapy for certain cancers. PMID:23911398

  6. Implementing Shared Memory Parallelism in MCBEND

    NASA Astrophysics Data System (ADS)

    Bird, Adam; Long, David; Dobson, Geoff

    2017-09-01

    MCBEND is a general purpose radiation transport Monte Carlo code from AMEC Foster Wheeler's ANSWERS® Software Service. MCBEND is well established in the UK shielding community for radiation shielding and dosimetry assessments. The existing MCBEND parallel capability effectively involves running the same calculation on many processors. This works very well except when the memory requirements of a model restrict the number of instances of a calculation that will fit on a machine. To utilise parallel hardware more effectively, OpenMP has been used to implement shared memory parallelism in MCBEND. This paper describes the reasoning behind the choice of OpenMP, notes some of the challenges of multi-threading an established code such as MCBEND, and assesses the performance of the parallel method implemented in MCBEND.

  7. Characterizing and Mitigating Work Time Inflation in Task Parallel Programs

    DOE PAGES

    Olivier, Stephen L.; de Supinski, Bronis R.; Schulz, Martin; ...

    2013-01-01

    Task parallelism raises the level of abstraction in shared memory parallel programming to simplify the development of complex applications. However, task parallel applications can exhibit poor performance due to thread idleness, scheduling overheads, and work time inflation – additional time spent by threads in a multithreaded computation beyond the time required to perform the same work in a sequential computation. We identify the contributions of each factor to lost efficiency in various task parallel OpenMP applications and diagnose the causes of work time inflation in those applications. Increased data access latency can cause significant work time inflation in NUMA systems. Our locality framework for task parallel OpenMP programs mitigates this cause of work time inflation. Our extensions to the Qthreads library demonstrate that locality-aware scheduling can improve performance up to 3X compared to the Intel OpenMP task scheduler.

  8. Automation of Data Traffic Control on DSM Architecture

    NASA Technical Reports Server (NTRS)

    Frumkin, Michael; Jin, Hao-Qiang; Yan, Jerry

    2001-01-01

    The design of distributed shared memory (DSM) computers liberates users from the duty of distributing data across processors and allows for the incremental development of parallel programs using, for example, OpenMP or Java threads. DSM architecture greatly simplifies the development of parallel programs having good performance on a few processors. However, achieving good program scalability on DSM computers requires that the user understand data flow in the application and use various techniques to avoid data traffic congestion. In this paper we discuss a number of such techniques, including data blocking, data placement, data transposition and page size control, and evaluate their efficiency on the NAS (NASA Advanced Supercomputing) Parallel Benchmarks. We also present a tool which automates the detection of constructs causing data congestion in Fortran array-oriented codes and advises the user on code transformations for improving data traffic in the application.
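
    Data blocking, one of the techniques listed above, restructures loop nests so that each pass works on a small tile of an array instead of sweeping whole rows or columns. A minimal pure-Python sketch of a blocked (tiled) matrix transpose follows; the function name and the default tile size are illustrative assumptions, and in Python it shows only the access pattern, not the cache or page-locality gains the paper measures on Fortran array codes.

```python
def blocked_transpose(a, block=2):
    """Transpose a rows x cols list-of-lists matrix tile by tile.

    Each (ib, jb) tile is read and written completely before moving on, so
    both the source rows and destination rows touched stay within a small
    working set; that locality is the point of data blocking.
    """
    rows, cols = len(a), len(a[0])
    t = [[0] * rows for _ in range(cols)]
    for ib in range(0, rows, block):
        for jb in range(0, cols, block):
            # Clip the tile at the matrix edges when sizes are not multiples of block.
            for i in range(ib, min(ib + block, rows)):
                for j in range(jb, min(jb + block, cols)):
                    t[j][i] = a[i][j]
    return t
```

    The result is the same as an unblocked transpose for any tile size; only the order in which elements are visited changes. On a DSM machine the tile size would typically be tuned to cache lines or pages, which the sketch does not model.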

  9. Performance and Application of Parallel OVERFLOW Codes on Distributed and Shared Memory Platforms

    NASA Technical Reports Server (NTRS)

    Djomehri, M. Jahed; Rizk, Yehia M.

    1999-01-01

    The presentation discusses recent studies on the performance of the two parallel versions of the aerodynamics CFD code, OVERFLOW_MPI and _MLP. Developed at NASA Ames, the serial version, OVERFLOW, is a multidimensional Navier-Stokes flow solver based on overset (Chimera) grid technology. The code has recently been parallelized in two ways. One is based on the explicit message-passing interface (MPI) across processors and uses the _MPI communication package. This approach is primarily suited for distributed memory systems and workstation clusters. The second, termed the multi-level parallel (MLP) method, is simple and uses shared memory for all communications. The _MLP code is suitable for distributed-shared memory systems. For both methods, the message passing takes place across the processors or processes at the advancement of each time step. This procedure is, in effect, the Chimera boundary conditions update, which is done in an explicit "Jacobi" style. In contrast, the update in the serial code is done more in the "Gauss-Seidel" fashion. Programming the _MPI code is more complicated than programming the _MLP code; the former requires modification of the outer and some inner shells of the serial code, whereas the latter focuses only on the outer shell of the code. The _MPI version offers a great deal of flexibility in distributing grid zones across a specified number of processors in order to achieve load balancing. The approach is capable of partitioning zones across multiple processors or sending each zone and/or cluster of several zones to a single processor. The message passing across the processors consists of Chimera boundary and/or an overlap of "halo" boundary points for each partitioned zone. The MLP version is a new coarse-grain parallel concept at the zonal and intra-zonal levels. A grouping strategy is used to distribute zones into several groups forming sub-processes which will run in parallel.
    The total volume of grid points in each group is approximately balanced. A proper number of threads is initially allocated to each group, and in subsequent iterations during run time, the number of threads is adjusted to achieve load balancing across the processes. Each process exploits the multitasking directives already established in OVERFLOW.
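
    The contrast between the explicit "Jacobi" style update in the parallel versions and the "Gauss-Seidel" style update in the serial code can be illustrated on a generic linear system. In the sketch below (plain Python, hypothetical function names, and a small diagonally dominant matrix rather than OVERFLOW's Chimera boundary data), a Jacobi sweep computes every new value from the previous iterate only, so all updates are independent and could run concurrently, while a Gauss-Seidel sweep uses each new value as soon as it is available, which makes the sweep inherently sequential.

```python
def jacobi_sweep(a, b, x):
    """One Jacobi sweep: every new value uses only the previous iterate x."""
    n = len(b)
    return [
        (b[i] - sum(a[i][j] * x[j] for j in range(n) if j != i)) / a[i][i]
        for i in range(n)
    ]


def gauss_seidel_sweep(a, b, x):
    """One Gauss-Seidel sweep: each updated value is used immediately by later rows."""
    x = list(x)
    n = len(b)
    for i in range(n):
        s = sum(a[i][j] * x[j] for j in range(n) if j != i)
        x[i] = (b[i] - s) / a[i][i]
    return x


def iterate(sweep, a, b, tol=1e-10, max_sweeps=1000):
    """Repeat `sweep` from x = 0 until successive iterates differ by less than tol.

    Returns (solution, number_of_sweeps)."""
    x = [0.0] * len(b)
    for k in range(1, max_sweeps + 1):
        new = sweep(a, b, x)
        if max(abs(p - q) for p, q in zip(new, x)) < tol:
            return new, k
        x = new
    raise RuntimeError("iteration did not converge")
```

    For the system with rows (4, -1, 0), (-1, 4, -1), (0, -1, 4) and right-hand side (2, 4, 10), both sweeps converge to (1, 2, 3), and Gauss-Seidel needs fewer sweeps. That is the usual trade-off: the Jacobi form gives up some convergence speed in exchange for updates that can be exchanged all at once per step.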

  10. Implementing High-Performance Geometric Multigrid Solver with Naturally Grained Messages

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Shan, Hongzhang; Williams, Samuel; Zheng, Yili

    2015-10-26

    Structured-grid linear solvers often require manual packing and unpacking of communication data to achieve high performance. Orchestrating this process efficiently is challenging, labor-intensive, and potentially error-prone. In this paper, we explore an alternative approach that communicates the data with naturally grained message sizes without manual packing and unpacking. This approach is the distributed analogue of shared-memory programming, taking advantage of the global address space in PGAS languages to provide substantial programming ease. However, its performance may suffer from the large number of small messages. We investigate the runtime support required in the UPC++ library for this naturally grained version to close the performance gap between the two approaches and attain comparable performance at scale, using the High-Performance Geometric Multigrid (HPGMG-FV) benchmark as a driver.

  11. Effects of Aging on True and False Memory Formation: An fMRI Study

    ERIC Educational Resources Information Center

    Dennis, Nancy A.; Kim, Hongkeun; Cabeza, Roberto

    2007-01-01

    Compared with young adults, older adults are more likely to forget events that occurred in the past as well as to remember events that never happened. Previous studies examining false memories and aging have shown that these memories are more likely to occur when new items share perceptual or semantic similarities with those presented during encoding. It is…

  12. Ad Hoc Categories and False Memories: Memory Illusions for Categories Created On-The-Spot

    ERIC Educational Resources Information Center

    Soro, Jerônimo C.; Ferreira, Mário B.; Semin, Gün R.; Mata, André; Carneiro, Paula

    2017-01-01

    Three experiments were designed to test whether experimentally created ad hoc associative networks evoke false memories. We used the DRM (Deese, Roediger, McDermott) paradigm with lists of ad hoc categories composed of exemplars aggregated toward specific goals (e.g., going for a picnic) that do not share any consistent set of features. Experiment…

  13. Can your software engineer program your PLC?

    NASA Astrophysics Data System (ADS)

    Borrowman, Alastair J.; Taylor, Philip

    2016-07-01

    The use of Programmable Logic Controllers (PLCs) in the control of large physics experiments is ubiquitous [1, 2, 3]. The programming of these controllers is normally the domain of engineers with a background in electronics; this paper introduces PLC program development from the software engineer's perspective. PLC programs provide the link between control software running on PC architecture systems and physical hardware controlled and monitored by digital and analog signals. The higher-level software running on the PC is typically responsible for accepting operator input and from this deciding when and how hardware connected to the PLC is controlled. The PLC accepts demands from the PC, considers the current state of its connected hardware and, if correct to do so (based upon interlocks or other constraints), adjusts its hardware output signals appropriately for the PC's demands. A published ICD (Interface Control Document) defines the PLC memory locations available to be written and read by the PC to control and monitor the hardware. Historically, the method of programming PLCs has been ladder diagrams that closely resemble circuit diagrams; however, PLC manufacturers nowadays also provide, and promote, the use of higher-level programming languages [4]. Based on techniques used in the development of high-level PC software to control PLCs for multiple telescopes, this paper examines the development of PLC programs to operate the hardware of a medical cyclotron beamline controlled from a PC using the Experimental Physics and Industrial Control System (EPICS), which is also widely used in telescope control [5, 6, 7]. The PLC used is the new-generation Siemens S7-1200, programmed using Siemens' Pascal-based Structured Control Language (SCL), which is their implementation of Structured Text (ST).
    The approach described is that from a software engineer's perspective, utilising the Siemens Totally Integrated Automation (TIA) Portal integrated development environment (IDE) to create modular PLC programs based upon reusable functions capable of being unit tested without the PLC connected to hardware. Emphasis has been placed on designing an interface between EPICS and SCL that enforces correct operation of hardware through stringent separation of PC-accessible PLC memory and hardware I/O addresses used only by the PLC. The paper also introduces the method used to automate the creation, from the same source document, of the PLC memory structure (tag) definitions (defining memory used to access hardware I/O and that accessed by the PC) and of the PC program data structures (EPICS database records) used to access the permitted PLC addresses. From direct experience, this paper demonstrates the advantages of PLC program development being shared between electronic and software engineers, enabling use of the most appropriate processes from the perspective of both the hardware and the higher-level software used to control it.

  14. Transactive memory in organizational groups: the effects of content, consensus, specialization, and accuracy on group performance.

    PubMed

    Austin, John R

    2003-10-01

    Previous research on transactive memory has found a positive relationship between transactive memory system development and group performance in single project laboratory and ad hoc groups. Closely related research on shared mental models and expertise recognition supports these findings. In this study, the author examined the relationship between transactive memory systems and performance in mature, continuing groups. A group's transactive memory system, measured as a combination of knowledge stock, knowledge specialization, transactive memory consensus, and transactive memory accuracy, is positively related to group goal performance, external group evaluations, and internal group evaluations. The positive relationship with group performance was found to hold for both task and external relationship transactive memory systems.

  15. Social Transmission of False Memory in Small Groups and Large Networks.

    PubMed

    Maswood, Raeya; Rajaram, Suparna

    2018-05-21

    Sharing information and memories is a key feature of social interactions, making social contexts important for developing and transmitting accurate memories and also false memories. False memory transmission can have wide-ranging effects, including shaping personal memories of individuals as well as collective memories of a network of people. This paper reviews a collection of key findings and explanations in cognitive research on the transmission of false memories in small groups. It also reviews the emerging experimental work on larger networks and collective false memories. Given the reconstructive nature of memory, the abundance of misinformation in everyday life, and the variety of social structures in which people interact, an understanding of transmission of false memories has both scientific and societal implications. © 2018 Cognitive Science Society, Inc.

  16. MULTI: a shared memory approach to cooperative molecular modeling.

    PubMed

    Darden, T; Johnson, P; Smith, H

    1991-03-01

    A general purpose molecular modeling system, MULTI, based on the UNIX shared memory and semaphore facilities for interprocess communication is described. In addition to the normal querying or monitoring of geometric data, MULTI also provides processes for manipulating conformations, and for displaying peptide or nucleic acid ribbons, Connolly surfaces, close nonbonded contacts, crystal-symmetry related images, least-squares superpositions, and so forth. This paper outlines the basic techniques used in MULTI to ensure cooperation among these specialized processes, and then describes how they can work together to provide a flexible modeling environment.

  17. A Massively Parallel Code for Polarization Calculations

    NASA Astrophysics Data System (ADS)

    Akiyama, Shizuka; Höflich, Peter

    2001-03-01

    We present an implementation of our Monte-Carlo radiation transport method for rapidly expanding, NLTE atmospheres for massively parallel computers which utilizes both the distributed and shared memory models. This allows us to take full advantage of the fast communication and low latency inherent to nodes with multiple CPUs, and to stretch the limits of scalability with the number of nodes compared to a version which is based on the shared memory model. Test calculations on a local 20-node Beowulf cluster with dual CPUs showed an improved scalability by about 40%.

  18. Cache-based error recovery for shared memory multiprocessor systems

    NASA Technical Reports Server (NTRS)

    Wu, Kun-Lung; Fuchs, W. Kent; Patel, Janak H.

    1989-01-01

    A multiprocessor cache-based checkpointing and recovery scheme for recovering from transient processor errors in a shared-memory multiprocessor with private caches is presented. New implementation techniques that use checkpoint identifiers and recovery stacks to reduce performance degradation in processor utilization during normal execution are examined. This cache-based checkpointing technique prevents rollback propagation, provides for rapid recovery, and can be integrated into standard cache coherence protocols. An analytical model is used to estimate the relative performance of the scheme during normal execution. Extensions that take error latency into account are presented.

  19. Power/Performance Trade-offs of Small Batched LU Based Solvers on GPUs

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Villa, Oreste; Fatica, Massimiliano; Gawande, Nitin A.

    In this paper we propose and analyze a set of batched linear solvers for small matrices on Graphics Processing Units (GPUs), evaluating the various alternatives depending on the size of the systems to solve. We discuss three different solutions that operate with different levels of parallelization and GPU features. The first, exploiting the CUBLAS library, manages matrices of size up to 32x32 and employs warp-level parallelism (one matrix, one warp) and shared memory. The second works at thread-block-level parallelism (one matrix, one thread block), still exploiting shared memory but managing matrices up to 76x76. The third is thread-level parallel (one matrix, one thread) and can reach sizes up to 128x128, but it does not exploit shared memory and relies only on the high memory bandwidth of the GPU. The first and second solutions support only partial pivoting; the third one easily supports both partial and full pivoting, making it attractive for problems that require greater numerical stability. We analyze the trade-offs in terms of performance and power consumption as a function of the size of the linear systems that are simultaneously solved. We execute the three implementations on a Tesla M2090 (Fermi) and on a Tesla K20 (Kepler).
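
    The batched structure described above, many small independent systems each factored with partial pivoting, can be sketched in plain Python. This is an illustrative CPU sketch under stated assumptions, not the paper's GPU kernels: the function names are hypothetical, full pivoting is not shown, and the loop over the batch merely marks the independence that allows a one-matrix-per-warp, per-block, or per-thread mapping on a GPU.

```python
def lu_partial_pivot(a):
    """LU factorization with partial pivoting on a copy of the square matrix `a`.

    Returns (lu, perm): lu holds U on and above the diagonal and the unit-lower
    L multipliers below it; perm[i] is the original row now in position i,
    so that A[perm[i]] is row i of P*A = L*U.
    """
    n = len(a)
    lu = [list(row) for row in a]
    perm = list(range(n))
    for k in range(n):
        # Partial pivoting: choose the largest magnitude entry in column k.
        p = max(range(k, n), key=lambda i: abs(lu[i][k]))
        if lu[p][k] == 0.0:
            raise ValueError("matrix is singular")
        lu[k], lu[p] = lu[p], lu[k]
        perm[k], perm[p] = perm[p], perm[k]
        for i in range(k + 1, n):
            lu[i][k] /= lu[k][k]  # store the L multiplier in place
            for j in range(k + 1, n):
                lu[i][j] -= lu[i][k] * lu[k][j]
    return lu, perm


def lu_solve(lu, perm, b):
    """Solve A x = b given the factors from lu_partial_pivot."""
    n = len(b)
    y = [0.0] * n
    for i in range(n):  # forward substitution with unit-lower L on permuted b
        y[i] = b[perm[i]] - sum(lu[i][j] * y[j] for j in range(i))
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution with U
        x[i] = (y[i] - sum(lu[i][j] * x[j] for j in range(i + 1, n))) / lu[i][i]
    return x


def batched_solve(matrices, rhs):
    """Solve many small independent systems; no system depends on another."""
    return [lu_solve(*lu_partial_pivot(a), b) for a, b in zip(matrices, rhs)]
```

    For instance, the matrix with rows (0, 2) and (1, 1) cannot be factored without a row interchange, and the pivoting step handles it. A GPU version would replace the Python list comprehension with one kernel launch over the whole batch, with the mapping level chosen by matrix size as in the abstract.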

  20. A neuropsychological comparison of obsessive-compulsive disorder and trichotillomania.

    PubMed

    Chamberlain, Samuel R; Fineberg, Naomi A; Blackwell, Andrew D; Clark, Luke; Robbins, Trevor W; Sahakian, Barbara J

    2007-03-02

    Obsessive-compulsive disorder (OCD) and trichotillomania (compulsive hair-pulling) share overlapping co-morbidity, familial transmission, and phenomenology. However, the extent to which these disorders share a common cognitive phenotype has yet to be elucidated using patients without confounding co-morbidities. To compare neurocognitive functioning in co-morbidity-free patients with OCD and trichotillomania, focusing on domains of learning and memory, executive function, affective processing, reflection-impulsivity and decision-making. Twenty patients with OCD, 20 patients with trichotillomania, and 20 matched controls undertook neuropsychological assessment after meeting stringent inclusion criteria. Groups were matched for age, education, verbal IQ, and gender. The OCD and trichotillomania groups were impaired on spatial working memory. Only OCD patients showed additional impairments on executive planning and visual pattern recognition memory, and missed more responses to sad target words than other groups on an affective go/no-go task. Furthermore, OCD patients failed to modulate their behaviour between conditions on the reflection-impulsivity test, suggestive of cognitive inflexibility. Both clinical groups showed intact decision-making and probabilistic reversal learning. OCD and trichotillomania shared overlapping spatial working memory problems, but neuropsychological dysfunction in OCD spanned additional domains that were intact in trichotillomania. Findings are discussed in relation to likely fronto-striatal neural substrates and future research directions.

  1. NavP: Structured and Multithreaded Distributed Parallel Programming

    NASA Technical Reports Server (NTRS)

    Pan, Lei

    2007-01-01

    We present Navigational Programming (NavP) -- a distributed parallel programming methodology based on the principles of migrating computations and multithreading. The four major steps of NavP are: (1) Distribute the data using the data communication pattern in a given algorithm; (2) Insert navigational commands for the computation to migrate and follow large-sized distributed data; (3) Cut the sequential migrating thread and construct a mobile pipeline; and (4) Loop back for refinement. NavP is significantly different from the currently prevailing Message Passing (MP) approach. The advantages of NavP include: (1) NavP is structured distributed programming and does not change the code structure of an original algorithm. This is in sharp contrast to MP, as MP implementations in general do not resemble the original sequential code; (2) NavP implementations are always competitive with the best MPI implementations in terms of performance. As of today, approaches such as DSM or HPF have failed to deliver satisfactory performance, even though they are relatively easy to use compared to MP; (3) NavP provides incremental parallelization, which is beyond the reach of MP; and (4) NavP is a unifying approach that allows us to exploit both fine- (multithreading on shared memory) and coarse- (pipelined tasks on distributed memory) grained parallelism. This is in contrast to the currently popular hybrid use of MP+OpenMP, which is known to be complex to use. We present experimental results that demonstrate the effectiveness of NavP.

  2. Comparing the OpenMP, MPI, and Hybrid Programming Paradigm on an SMP Cluster

    NASA Technical Reports Server (NTRS)

    Jost, Gabriele; Jin, Haoqiang; anMey, Dieter; Hatay, Ferhat F.

    2003-01-01

    With the advent of parallel hardware and software technologies, users are faced with the challenge of choosing a programming paradigm best suited for the underlying computer architecture. With the current trend in parallel computer architectures towards clusters of shared memory symmetric multi-processors (SMP), parallel programming techniques have evolved to support parallelism beyond a single level. Which programming paradigm is best will depend on the nature of the given problem, the hardware architecture, and the available software. In this study we will compare different programming paradigms for the parallelization of a selected benchmark application on a cluster of SMP nodes. We compare the timings of different implementations of the same CFD benchmark application employing the same numerical algorithm on a cluster of Sun Fire SMP nodes. The rest of the paper is structured as follows: In section 2 we briefly discuss the programming models under consideration. We describe our compute platform in section 3. The different implementations of our benchmark code are described in section 4 and the performance results are presented in section 5. We conclude our study in section 6.

  3. Using a source-to-source transformation to introduce multi-threading into the AliRoot framework for a parallel event reconstruction

    NASA Astrophysics Data System (ADS)

    Lohn, Stefan B.; Dong, Xin; Carminati, Federico

    2012-12-01

    Chip multiprocessors will support massive parallelism through many additional physical and logical cores. Performance improvements can no longer be obtained by increasing clock frequency, because the technical limits have almost been reached; instead, parallel execution must be used to gain performance. Resources such as main memory, the cache hierarchy, the bandwidth of the memory bus, and the links between cores and sockets will not improve as quickly. Hence, parallelism can only result in performance gains if memory usage is optimized and communication between threads is minimized. Besides, concurrent programming has become a domain for experts: implementing multi-threading is error-prone and labor-intensive, and a full reimplementation of the whole AliRoot source code is unaffordable. This paper describes the effort to evaluate the adaptation of AliRoot to the needs of multi-threading and to provide parallel processing capability by using a semi-automatic source-to-source transformation, which addresses the problems described above and provides a straightforward way of parallelization with almost no interference between threads. This keeps the approach simple and reduces the manual changes required in the code. In a first step, unconditional thread safety will be introduced to bring the original sequential, thread-unaware source code into a position to use multi-threading. Afterwards, further investigation is needed to identify candidate classes that are useful to share among threads. Then, in a second step, the transformation has to change the code to share these classes, and finally verify that no invalid interference between threads remains.

  4. A parallel approximate string matching under Levenshtein distance on graphics processing units using warp-shuffle operations

    PubMed Central

    Ho, ThienLuan; Oh, Seung-Rohk

    2017-01-01

    Approximate string matching with k-differences has a number of practical applications, ranging from pattern recognition to computational biology. This paper proposes an efficient memory-access algorithm for parallel approximate string matching with k-differences on Graphics Processing Units (GPUs). In the proposed algorithm, all threads in the same GPU warp share data using warp-shuffle operations instead of accessing shared memory. Moreover, we implement the proposed algorithm by exploiting the memory structure of GPUs to optimize its performance. Experimental results for real DNA packages revealed that the proposed algorithm and its implementation achieved speedups of up to 122.64 and 1.53 times compared to the sequential algorithm on a CPU and the previous parallel approximate string matching algorithm on GPUs, respectively. PMID:29016700
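The k-differences problem itself can be illustrated with the classic sequential dynamic program (Sellers-style) that parallel GPU methods start from. The sketch below is a minimal CPU version, not the authors' warp-shuffle algorithm: it keeps one column of the edit-distance table and reports every text position where the pattern ends with at most k insertions, deletions, or substitutions. In a GPU variant, neighboring table values held in registers of adjacent threads could be exchanged with warp-shuffle intrinsics (for example CUDA's `__shfl_sync`) instead of shared memory; that part is not shown here, and the function name is an illustrative choice.

```python
def k_differences(pattern: str, text: str, k: int) -> list[int]:
    """Return 1-based end positions in `text` where `pattern` matches with at most
    k edits (insertions, deletions, substitutions) under Levenshtein distance."""
    m = len(pattern)
    # prev[i] = best edit distance between pattern[:i] and a substring ending at the
    # previous text position. Column j = 0 (empty text) costs i edits.
    prev = list(range(m + 1))
    hits = []
    for j, tc in enumerate(text, start=1):
        cur = [0] * (m + 1)  # cur[0] = 0: a match may start at any text position
        for i in range(1, m + 1):
            cost = 0 if pattern[i - 1] == tc else 1
            cur[i] = min(prev[i - 1] + cost,  # match or substitution
                         prev[i] + 1,         # extra character in the text
                         cur[i - 1] + 1)      # pattern character missing from the text
        if cur[m] <= k:
            hits.append(j)
        prev = cur
    return hits
```

For instance, `k_differences("abc", "xxabcxxabdxx", 1)` reports the exact occurrence ending at position 5 together with nearby one-edit matches such as `ab` and `abd`.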

  5. Relative time sharing: new findings and an extension of the resource allocation model of temporal processing.

    PubMed

    Buhusi, Catalin V; Meck, Warren H

    2009-07-12

    Individuals time as if using a stopwatch that can be stopped or reset on command. Here, we review behavioural and neurobiological data supporting the time-sharing hypothesis that perceived time depends on the attentional and memory resources allocated to the timing process. Neuroimaging studies in humans suggest that timekeeping tasks engage brain circuits typically involved in attention and working memory. Behavioural, pharmacological, lesion and electrophysiological studies in lower animals support this time-sharing hypothesis. When subjects attend to a second task, or when intruder events are presented, estimated durations are shorter, presumably due to resources being taken away from timing. Here, we extend the time-sharing hypothesis by proposing that resource reallocation is proportional to the perceived contrast, both in temporal and non-temporal features, between intruders and the timed events. New findings support this extension by showing that the effect of an intruder event is dependent on the relative duration of the intruder to the intertrial interval. The conclusion is that the brain circuits engaged by timekeeping comprise not only those primarily involved in time accumulation, but also those involved in the maintenance of attentional and memory resources for timing, and in the monitoring and reallocation of those resources among tasks.

  6. Multibit Polycrystalline Silicon-Oxide-Silicon Nitride-Oxide-Silicon Memory Cells with High Density Designed Utilizing a Separated Control Gate

    NASA Astrophysics Data System (ADS)

    Rok Kim, Kyeong; You, Joo Hyung; Dal Kwack, Kae; Kim, Tae Whan

    2010-10-01

    Unique multibit NAND polycrystalline silicon-oxide-silicon nitride-oxide-silicon (SONOS) memory cells utilizing a separated control gate (SCG) were designed to increase memory density. The proposed NAND SONOS memory device based on an SCG structure was operated as two bits, increasing the storage density of the nonvolatile memory (NVM) devices in comparison with conventional single-bit memories. The electrical properties of the SONOS memory cells with an SCG were investigated to clarify the charging effects in the SONOS memory cells. When the program voltage was supplied to each gate of the NAND SONOS flash memory cells, electrons were trapped in the nitride region of the oxide-nitride-oxide layer under the gate to which the program voltage was supplied. The electrons accumulated without affecting the other gate during the programming operation, indicating the absence of cross-talk between the two trap charge regions. The interference effect is expected to be suppressed because the program voltage is lower than that of conventional NAND flash memory. The simulation results indicate that the proposed NAND SONOS memory cells with an SCG can be used to increase memory density.

  7. DISP: Optimizations towards Scalable MPI Startup

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Fu, Huansong; Pophale, Swaroop S; Gorentla Venkata, Manjunath

    2016-01-01

    Despite the popularity of MPI for high performance computing, the startup of MPI programs faces a scalability challenge, as both execution time and memory consumption increase drastically at scale. We have examined this problem using the collective modules Cheetah and Tuned in Open MPI as representative implementations. Previous improvements for collectives have focused on algorithmic advances and hardware offload. In this paper, we examine the startup cost of the collective module within a communicator and explore various techniques to improve its efficiency and scalability. Accordingly, we have developed a new scalable startup scheme with three internal techniques, namely Delayed Initialization, Module Sharing and Prediction-based Topology Setup (DISP). Our DISP scheme greatly benefits the collective initialization of the Cheetah module. At the same time, it helps boost the performance of non-collective initialization in the Tuned module. We evaluate the performance of our implementation on the Titan supercomputer at ORNL with up to 4096 processes. The results show that our delayed initialization can speed up the startup of Tuned and Cheetah by an average of 32.0% and 29.2%, respectively; our module sharing can reduce the memory consumption of Tuned and Cheetah by up to 24.1% and 83.5%, respectively; and our prediction-based topology setup can speed up the startup of Cheetah by up to 80%.
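The general idea behind delayed initialization can be sketched as lazy setup: a communicator does not build its collective module when it is created, only when a collective is first called, so communicators that never run a collective never pay the setup cost. This is a minimal illustration under that assumption, not Open MPI's code; `CollectiveModule`, `Communicator`, and `allreduce_sum` are hypothetical names, and the "expensive" setup is a placeholder.

```python
class CollectiveModule:
    """Hypothetical stand-in for an expensive per-communicator collective module."""
    setups = 0  # counts how many times the costly setup actually ran

    def __init__(self, size: int):
        CollectiveModule.setups += 1
        # Placeholder for expensive work such as building topology trees or buffers.
        self.size = size

    def allreduce_sum(self, values):
        return sum(values)


class Communicator:
    """Defers creating its collective module until the first collective call."""

    def __init__(self, size: int):
        self.size = size
        self._module = None  # nothing is allocated at communicator creation time

    @property
    def module(self) -> CollectiveModule:
        if self._module is None:  # delayed initialization on first use
            self._module = CollectiveModule(self.size)
        return self._module

    def allreduce_sum(self, values):
        return self.module.allreduce_sum(values)
```

Creating many `Communicator` objects leaves `CollectiveModule.setups` at zero; only the first collective on a given communicator triggers its setup, and later calls reuse it.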

  8. An experimental distributed microprocessor implementation with a shared memory communications and control medium

    NASA Technical Reports Server (NTRS)

    Mejzak, R. S.

    1980-01-01

    The distributed processing concept is defined in terms of control primitives, variables, and structures and their use in performing a decomposed discrete Fourier transform (DFT) application function. The design assumes interprocessor communications to be anonymous. In this scheme, all processors can access an entire common database by employing control primitives. Access to selected areas within the common database is random, enforced by a hardware lock, and determined by task and subtask pointers. This enables the number of processors to be varied in the configuration without any modifications to the control structure. Decompositional elements of the DFT application function in terms of tasks and subtasks are also described. The experimental hardware configuration consists of IMSAI 8080 chassis, which are independent 8-bit microcomputer units. These chassis are linked together to form a multiple processing system by means of a shared memory facility. This facility consists of hardware which provides a bus structure to enable up to six microcomputers to be interconnected. It provides polling and arbitration logic so that only one processor has access to shared memory at any one time.
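The scheme of processors claiming work through a lock-protected task pointer into a common database can be sketched with threads. In this minimal sketch, which assumes one task per DFT output bin (the original task and subtask decomposition is not given), a `threading.Lock` stands in for the hardware lock, and the worker count can change without touching the control structure. Because of CPython's global interpreter lock, it demonstrates the decomposition rather than a speedup; `parallel_dft` is an illustrative name.

```python
import cmath
import threading


def parallel_dft(x, n_workers=3):
    """DFT decomposed into one task per output bin. Workers claim tasks through a
    shared task pointer guarded by a lock (a software stand-in for a hardware lock)."""
    n = len(x)
    out = [0j] * n    # shared result area (the "common database")
    next_task = [0]   # shared task pointer
    lock = threading.Lock()

    def worker():
        while True:
            with lock:  # only one worker advances the pointer at a time
                k = next_task[0]
                next_task[0] += 1
            if k >= n:
                return
            # Task k: X[k] = sum_t x[t] * exp(-2*pi*i*k*t/n)
            out[k] = sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for th in threads:
        th.start()
    for th in threads:
        th.join()
    return out
```

Each bin is written by exactly one worker, so the result does not depend on how many workers run or how the tasks interleave.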

  9. Running ATLAS workloads within massively parallel distributed applications using Athena Multi-Process framework (AthenaMP)

    NASA Astrophysics Data System (ADS)

    Calafiura, Paolo; Leggett, Charles; Seuster, Rolf; Tsulaia, Vakhtang; Van Gemmeren, Peter

    2015-12-01

    AthenaMP is a multi-process version of the ATLAS reconstruction, simulation and data analysis framework Athena. By leveraging Linux fork and copy-on-write mechanisms, it allows for sharing of memory pages between event processors running on the same compute node with little to no change in the application code. Originally targeted to optimize the memory footprint of reconstruction jobs, AthenaMP has demonstrated that it can reduce the memory usage of certain configurations of ATLAS production jobs by a factor of 2. AthenaMP has also evolved to become the parallel event-processing core of the recently developed ATLAS infrastructure for fine-grained event processing (Event Service) which allows the running of AthenaMP inside massively parallel distributed applications on hundreds of compute nodes simultaneously. We present the architecture of AthenaMP, various strategies implemented by AthenaMP for scheduling workload to worker processes (for example: Shared Event Queue and Shared Distributor of Event Tokens) and the usage of AthenaMP in the diversity of ATLAS event processing workloads on various computing resources: Grid, opportunistic resources and HPC.
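The fork-and-copy-on-write pattern can be sketched with Python's `multiprocessing` using the `fork` start method (POSIX only): data loaded in the parent before the workers start is inherited by every child without being reloaded, and workers pull event numbers from a shared queue, loosely in the spirit of a shared event queue. This is a minimal illustration, not AthenaMP's implementation; `CONDITIONS`, `worker`, and `run` are hypothetical names. In CPython, reference-count updates write to object headers, so pages holding touched objects do get copied, and sharing is less complete than for a C++ framework.

```python
import multiprocessing as mp

# Hypothetical read-only "conditions" data, loaded once in the parent before forking.
CONDITIONS = list(range(100_000))


def worker(events, results):
    """Pull event numbers from the shared queue until a None sentinel arrives."""
    while True:
        event = events.get()
        if event is None:
            break
        # Reads CONDITIONS inherited from the parent via fork: no reload, no pickling.
        results.put((event, CONDITIONS[event] * 2))


def run(n_events, n_workers=4):
    ctx = mp.get_context("fork")  # children inherit the parent's pages copy-on-write
    events, results = ctx.Queue(), ctx.Queue()
    procs = [ctx.Process(target=worker, args=(events, results)) for _ in range(n_workers)]
    for p in procs:
        p.start()
    for e in range(n_events):
        events.put(e)
    for _ in procs:
        events.put(None)  # one sentinel per worker
    out = dict(results.get() for _ in range(n_events))  # drain before joining
    for p in procs:
        p.join()
    return out
```

Results are drained before `join()` so that no child blocks on a full result queue while the parent waits for it to exit.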

  10. Human Memory Organization for Computer Programs.

    ERIC Educational Resources Information Center

    Norcio, A. F.; Kerst, Stephen M.

    1983-01-01

    Results of a study investigating human memory organization in the processing of computer programming languages indicate that algorithmic logic segments form a cognitive organizational structure in memory for programs. Statement indentation and internal program documentation did not enhance the organizational process of recall of statements in five Fortran…

  11. Optimized Infrastructure for the Earth System Prediction Capability

    DTIC Science & Technology

    2013-09-30

    for referencing memory between its native coupling datatype (MCT Attribute Vectors) and ESMF Arrays. This will reduce the copies required and will...introduced ability within CESM to share memory between ESMF and MCT datatypes makes using both tools together much easier. Using both is appealing

  12. Infectious Cognition: Risk Perception Affects Socially Shared Retrieval-Induced Forgetting of Medical Information.

    PubMed

    Coman, Alin; Berry, Jessica N

    2015-12-01

    When speakers selectively retrieve previously learned information, listeners often concurrently, and covertly, retrieve their memories of that information. This concurrent retrieval typically enhances memory for mentioned information (the rehearsal effect) and impairs memory for unmentioned but related information (socially shared retrieval-induced forgetting, SSRIF), relative to memory for unmentioned and unrelated information. Building on research showing that anxiety leads to increased attention to threat-relevant information, we explored whether concurrent retrieval is facilitated in high-anxiety real-world contexts. Participants first learned category-exemplar facts about meningococcal disease. Following a manipulation of perceived risk of infection (low vs. high risk), they listened to a mock radio show in which some of the facts were selectively practiced. Final recall tests showed that the rehearsal effect was equivalent between the two risk conditions, but SSRIF was significantly larger in the high-risk than in the low-risk condition. Thus, the tendency to exaggerate consequences of news events was found to have deleterious consequences. © The Author(s) 2015.

  13. Fencing direct memory access data transfers in a parallel active messaging interface of a parallel computer

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Blocksome, Michael A.; Mamidala, Amith R.

    2013-09-03

    Fencing direct memory access (`DMA`) data transfers in a parallel active messaging interface (`PAMI`) of a parallel computer, the PAMI including data communications endpoints, each endpoint including specifications of a client, a context, and a task, the endpoints coupled for data communications through the PAMI and through DMA controllers operatively coupled to segments of shared random access memory through which the DMA controllers deliver data communications deterministically, including initiating execution through the PAMI of an ordered sequence of active DMA instructions for DMA data transfers between two endpoints, effecting deterministic DMA data transfers through a DMA controller and a segment of shared memory; and executing through the PAMI, with no FENCE accounting for DMA data transfers, an active FENCE instruction, the FENCE instruction completing execution only after completion of all DMA instructions initiated prior to execution of the FENCE instruction for DMA data transfers between the two endpoints.
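Why deterministic, in-order delivery allows a fence without per-transfer accounting can be shown with a toy model. If transfers complete strictly in issue order, a fence needs no counters: it can be queued behind the earlier transfers, and when it is reached, everything before it has finished. The sketch below assumes a single FIFO serviced by one thread; `OrderedDmaEngine`, `put`, and `fence` are hypothetical names, not the PAMI API.

```python
import queue
import threading


class OrderedDmaEngine:
    """Toy in-order transfer engine. Entries complete strictly in FIFO order, so a
    fence is just another entry; when it is reached, all earlier transfers are done."""

    def __init__(self):
        self._fifo = queue.Queue()
        self.completed = []  # ids of completed transfers, in completion order
        threading.Thread(target=self._run, daemon=True).start()

    def _run(self):
        while True:
            kind, payload = self._fifo.get()
            if kind == "transfer":
                src, dst, tid = payload
                dst[:] = src              # the simulated DMA copy into the target buffer
                self.completed.append(tid)
            else:                         # a fence: every earlier entry has completed
                payload.set()

    def put(self, src, dst, tid):
        """Issue an asynchronous transfer of src into dst."""
        self._fifo.put(("transfer", (src, dst, tid)))

    def fence(self) -> threading.Event:
        """Return an event that is set once all previously issued transfers finish."""
        done = threading.Event()
        self._fifo.put(("fence", done))
        return done
```

A caller issues several `put` calls, then waits on the event returned by `fence()`; after the wait, every earlier transfer is visible in its destination buffer.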

  14. The Work's Not Over- Roll Up Your Sleeves and Make a Difference!

    NASA Astrophysics Data System (ADS)

    Sarquis, Mickey

    1997-01-01

    As my 17-year tenure as the first editor of the Secondary School Chemistry Section draws to a close, John Moore has invited me to share some reflections on my experiences. It's hard for me to believe that this many years have passed; in some ways, it seems like only yesterday that I took on this position. Looking back over my term as Section editor recalls wonderful memories, but it also stimulates me to seek out and take on new challenges as I move into a new phase of involvement in chemical education. In response to John's kind invitation, I'd like to share some of these memories and ideas with you who share my vision of quality chemical education, particularly at the secondary level.

  15. Fast quantum Monte Carlo on a GPU

    NASA Astrophysics Data System (ADS)

    Lutsyshyn, Y.

    2015-02-01

    We present a scheme for parallelizing the quantum Monte Carlo method on graphics processing units, focusing on variational Monte Carlo simulation of bosonic systems. We use asynchronous execution schemes with shared memory persistence and obtain excellent utilization of the accelerator. The CUDA code is provided along with a package that simulates liquid helium-4. The program was benchmarked on several models of Nvidia GPU, including the Fermi GTX560 and M2090 and the Kepler-architecture K20. Special optimization was developed for the Kepler cards, including placement of data structures in the register space of the Kepler GPUs. Kepler-specific optimization is discussed.
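The core of variational Monte Carlo can be seen in a toy problem far simpler than liquid helium-4: a 1D harmonic oscillator, H = -1/2 d²/dx² + 1/2 x², with trial wavefunction ψ(x) = exp(-αx²). Metropolis sampling of |ψ|² and averaging the local energy E_L = α + x²(1/2 - 2α²) gives an upper bound on the ground-state energy (0.5), which is reached exactly at α = 0.5. This is a minimal CPU sketch chosen for illustration, not the paper's GPU code. Independent walkers like this one are what a GPU implementation would typically run in parallel.

```python
import math
import random


def local_energy(x: float, alpha: float) -> float:
    # E_L = (H psi) / psi for H = -1/2 d^2/dx^2 + 1/2 x^2 and psi = exp(-alpha x^2)
    return alpha + x * x * (0.5 - 2.0 * alpha * alpha)


def vmc_energy(alpha: float, n_steps: int = 20000, step: float = 1.0, seed: int = 0) -> float:
    """Metropolis estimate of <E_L> for one walker sampling |psi|^2."""
    rng = random.Random(seed)
    x, total = 0.0, 0.0
    for _ in range(n_steps):
        x_new = x + rng.uniform(-step, step)
        # Accept with probability |psi(x_new)|^2 / |psi(x)|^2 (capped at 1 implicitly)
        if rng.random() < math.exp(-2.0 * alpha * (x_new * x_new - x * x)):
            x = x_new
        total += local_energy(x, alpha)
    return total / n_steps
```

At α = 0.5 the local energy is constant, so the estimate has zero variance; for other α the estimate lies above 0.5, as the variational principle requires.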

  16. An efficient 3-dim FFT for plane wave electronic structure calculations on massively parallel machines composed of multiprocessor nodes

    NASA Astrophysics Data System (ADS)

    Goedecker, Stefan; Boulet, Mireille; Deutsch, Thierry

    2003-08-01

    Three-dimensional Fast Fourier Transforms (FFTs) are the main computational task in plane wave electronic structure calculations. Obtaining high performance on large numbers of processors is nontrivial on the latest generation of parallel computers, which consist of nodes made up of shared-memory multiprocessors. A non-dogmatic method for obtaining high performance for such 3-dim FFTs in a combined MPI/OpenMP programming paradigm will be presented. Exploiting the peculiarities of plane wave electronic structure calculations, speedups of up to 160 and speeds of up to 130 Gflops were obtained on 256 processors.
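Parallel 3D FFTs are usually split into independent lower-dimensional transforms separated by a data exchange. In a generic slab decomposition (an assumption here, since the abstract does not detail the authors' scheme), each process owns a slab of x-planes and transforms them in y and z without communication. The remaining x-direction transform needs data from every slab, which on a distributed machine is the all-to-all transpose. Within a node, OpenMP threads could split the per-plane work. This sketch uses naive DFTs in place of FFTs to stay short and simulates the processes with a loop; all function names are illustrative.

```python
import cmath


def dft1d(v):
    """Naive 1D DFT (an FFT would be used in practice)."""
    n = len(v)
    return [sum(v[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n)) for k in range(n)]


def dft2d(plane):
    """2D DFT of plane[y][z]: transform along z, then along y."""
    rows = [dft1d(r) for r in plane]
    cols = [dft1d([rows[i][j] for i in range(len(rows))]) for j in range(len(rows[0]))]
    return [[cols[j][i] for j in range(len(cols))] for i in range(len(rows))]


def dft3d_slabs(a, n_procs=2):
    """3D DFT of a[x][y][z]; len(a) must be divisible by n_procs."""
    nx, ny, nz = len(a), len(a[0]), len(a[0][0])
    slab = nx // n_procs
    # Phase 1: each simulated process transforms only its own planes (no communication).
    planes = []
    for p in range(n_procs):
        for x in range(p * slab, (p + 1) * slab):
            planes.append(dft2d(a[x]))
    # Phase 2: the x-direction transform needs one value from every slab per pencil;
    # on a distributed machine this is where the all-to-all transpose happens.
    out = [[[0j] * nz for _ in range(ny)] for _ in range(nx)]
    for y in range(ny):
        for z in range(nz):
            pencil = dft1d([planes[x][y][z] for x in range(nx)])
            for x in range(nx):
                out[x][y][z] = pencil[x]
    return out
```

Because the multidimensional DFT is separable, the result matches the direct triple-sum transform regardless of how the planes are divided among processes.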

  17. Effects of a Memory Training Program in Older People with Severe Memory Loss

    ERIC Educational Resources Information Center

    Mateos, Pedro M.; Valentin, Alberto; González-Tablas, Maria del Mar; Espadas, Verónica; Vera, Juan L.; Jorge, Inmaculada García

    2016-01-01

    Strategy-based memory training programs are widely used to enhance the cognitive abilities of the elderly. Participants in these training programs are usually people whose mental abilities remain intact. Occasionally, people with cognitive impairment also participate. The aim of this study was to test if memory training designed specifically for…

  18. Arousal-biased competition in perception and memory

    PubMed Central

    Mather, Mara; Sutherland, Matthew R.

    2010-01-01

    Our everyday surroundings besiege us with information. The battle is for a share of our limited attention and memory, with the brain selecting the winners and discarding the losers. Previous research shows that both bottom-up and top-down factors bias competition in favor of high priority stimuli. We propose that arousal during an event increases this bias both in perception and in long-term memory of the event. Arousal-biased competition theory provides specific predictions about when arousal will enhance and when it will impair memory for events, accounting for some puzzling contradictions in the emotional memory literature. PMID:21660127

  19. Glucocorticoids in the prefrontal cortex enhance memory consolidation and impair working memory by a common neural mechanism

    PubMed Central

    Barsegyan, Areg; Mackenzie, Scott M.; Kurose, Brian D.; McGaugh, James L.; Roozendaal, Benno

    2010-01-01

    It is well established that acute administration of adrenocortical hormones enhances the consolidation of memories of emotional experiences and, concurrently, impairs working memory. These different glucocorticoid effects on these two memory functions have generally been considered to be independently regulated processes. Here we report that a glucocorticoid receptor agonist administered into the medial prefrontal cortex (mPFC) of male Sprague-Dawley rats both enhances memory consolidation and impairs working memory. Both memory effects are mediated by activation of a membrane-bound steroid receptor and depend on noradrenergic activity within the mPFC to increase levels of cAMP-dependent protein kinase. These findings provide direct evidence that glucocorticoid effects on both memory consolidation and working memory share a common neural influence within the mPFC. PMID:20810923

  20. Benefits and Costs of Context Reinstatement in Episodic Memory: An ERP Study.

    PubMed

    Bramão, Inês; Johansson, Mikael

    2017-01-01

    This study investigated context-dependent episodic memory retrieval. An influential idea in the memory literature is that performance benefits when the retrieval context overlaps with the original encoding context. However, such memory facilitation may not be driven by the encoding-retrieval overlap per se but by the presence of diagnostic features in the reinstated context that discriminate the target episode from competing episodes. To test this prediction, the encoding-retrieval overlap and the diagnostic value of the context were manipulated in a novel associative recognition memory task. Participants were asked to memorize word pairs presented together with diagnostic (unique) and nondiagnostic (shared) background scenes. At test, participants recognized the word pairs in the presence and absence of the previously encoded contexts. Behavioral data show facilitated memory performance in the presence of the original context but, importantly, only when the context was diagnostic of the target episode. The electrophysiological data reveal an early anterior ERP encoding-retrieval overlap effect that tracks the cost associated with having nondiagnostic contexts present at retrieval, that is, contexts shared by multiple previous episodes, and a later posterior encoding-retrieval overlap effect that reflects facilitated access to the target episode during retrieval in diagnostic contexts. Taken together, our results underscore the importance of the diagnostic value of the context and suggest that context-dependent episodic memory effects are multiply determined.

  1. Hypercluster - Parallel processing for computational mechanics

    NASA Technical Reports Server (NTRS)

    Blech, Richard A.

    1988-01-01

    An account is given of the development status, performance capabilities and implications for further development of NASA-Lewis' testbed 'hypercluster' parallel computer network, in which multiple processors communicate through a shared memory. Processors have local as well as shared memory; the hypercluster is expanded in the same manner as the hypercube, with processor clusters replacing the normal single processor node. The NASA-Lewis machine has three nodes with a vector personality and one node with a scalar personality. Each of the vector nodes uses four board-level vector processors, while the scalar node uses four general-purpose microcomputer boards.

  2. Design and implementation of a medium speed communications interface and protocol for a low cost, refreshed display computer

    NASA Technical Reports Server (NTRS)

    Phyne, J. R.; Nelson, M. D.

    1975-01-01

    The design and implementation of hardware and software systems involved in using a 40,000 bit/second communication line as the connecting link between an IMLAC PDS 1-D display computer and a Univac 1108 computer system were described. The IMLAC consists of two independent processors sharing a common memory. The display processor generates the deflection and beam control currents as it interprets a program contained in the memory; the minicomputer has a general instruction set and is responsible for starting and stopping the display processor and for communicating with the outside world through the keyboard, teletype, light pen, and communication line. The processing time associated with each data byte was minimized by designing the input and output processes as finite state machines which automatically sequence from each state to the next. Several tests of the communication link and the IMLAC software were made using a special low capacity computer grade cable between the IMLAC and the Univac.
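Handling each byte as a finite-state machine step can be illustrated with a receiver that does a small, fixed amount of work per incoming byte and moves to its next state automatically. The abstract does not give the IMLAC/Univac line protocol, so the frame format here (start byte `STX`, length byte, payload, 8-bit additive checksum) is entirely hypothetical, as are the class and state names.

```python
STX = 0x02  # hypothetical start-of-frame marker


class FrameReceiver:
    """Byte-at-a-time receiver written as a finite-state machine.
    Hypothetical frame format: STX, length, payload bytes, 8-bit additive checksum."""

    def __init__(self):
        self.state = "WAIT_STX"
        self.frames = []  # completed, checksum-verified payloads
        self.remaining = 0
        self.payload = []

    def feed(self, byte: int) -> None:
        if self.state == "WAIT_STX":
            if byte == STX:              # ignore line noise until a frame starts
                self.state = "LENGTH"
        elif self.state == "LENGTH":
            self.remaining, self.payload = byte, []
            self.state = "PAYLOAD" if byte else "CHECKSUM"
        elif self.state == "PAYLOAD":
            self.payload.append(byte)
            self.remaining -= 1
            if self.remaining == 0:
                self.state = "CHECKSUM"
        elif self.state == "CHECKSUM":
            if byte == sum(self.payload) % 256:
                self.frames.append(bytes(self.payload))
            self.state = "WAIT_STX"      # good or bad, resynchronize on the next STX
```

Each call to `feed` does constant work, which is the property that keeps per-byte processing time low; a corrupted frame is simply dropped and the machine waits for the next start byte.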

  3. Contributions of Medial Temporal Lobe and Striatal Memory Systems to Learning and Retrieving Overlapping Spatial Memories

    PubMed Central

    Brown, Thackery I.; Stern, Chantal E.

    2014-01-01

    Many life experiences share information with other memories. In order to make decisions based on overlapping memories, we need to distinguish between experiences to determine the appropriate behavior for the current situation. Previous work suggests that the medial temporal lobe (MTL) and medial caudate interact to support the retrieval of overlapping navigational memories in different contexts. The present study used functional magnetic resonance imaging (fMRI) in humans to test the prediction that the MTL and medial caudate play complementary roles in learning novel mazes that cross paths with, and must be distinguished from, previously learned routes. During fMRI scanning, participants navigated virtual routes that were well learned from prior training while also learning new mazes. Critically, some routes learned during scanning shared hallways with those learned during pre-scan training. Overlap between mazes required participants to use contextual cues to select between alternative behaviors. Results demonstrated parahippocampal cortex activity specific for novel spatial cues that distinguish between overlapping routes. The hippocampus and medial caudate were active for learning overlapping spatial memories, and increased their activity for previously learned routes when they became context dependent. Our findings provide novel evidence that the MTL and medial caudate play complementary roles in the learning, updating, and execution of context-dependent navigational behaviors. PMID:23448868

  4. Symbiosis of executive and selective attention in working memory

    PubMed Central

    Vandierendonck, André

    2014-01-01

    The notion of working memory (WM) was introduced to account for the usage of short-term memory resources by other cognitive tasks such as reasoning, mental arithmetic, language comprehension, and many others. This collaboration between memory and other cognitive tasks can only be achieved by a dedicated WM system that controls task coordination. To that end, WM models include executive control. Nevertheless, other attention control systems may be involved in coordination of memory and cognitive tasks calling on memory resources. The present paper briefly reviews the evidence concerning the role of selective attention in WM activities. A model is proposed in which selective attention control is directly linked to the executive control part of the WM system. The model assumes that apart from storage of declarative information, the system also includes an executive WM module that represents the current task set. Control processes are automatically triggered when particular conditions in these modules are met. As each task set represents the parameter settings and the actions needed to achieve the task goal, it will depend on the specific settings and actions whether selective attention control will have to be shared among the active tasks. Only when such sharing is required will task performance be affected by the capacity limits of the control system involved. PMID:25152723

  5. Symbiosis of executive and selective attention in working memory.

    PubMed

    Vandierendonck, André

    2014-01-01

    The notion of working memory (WM) was introduced to account for the usage of short-term memory resources by other cognitive tasks such as reasoning, mental arithmetic, language comprehension, and many others. This collaboration between memory and other cognitive tasks can only be achieved by a dedicated WM system that controls task coordination. To that end, WM models include executive control. Nevertheless, other attention control systems may be involved in coordination of memory and cognitive tasks calling on memory resources. The present paper briefly reviews the evidence concerning the role of selective attention in WM activities. A model is proposed in which selective attention control is directly linked to the executive control part of the WM system. The model assumes that apart from storage of declarative information, the system also includes an executive WM module that represents the current task set. Control processes are automatically triggered when particular conditions in these modules are met. As each task set represents the parameter settings and the actions needed to achieve the task goal, it will depend on the specific settings and actions whether selective attention control will have to be shared among the active tasks. Only when such sharing is required will task performance be affected by the capacity limits of the control system involved.

  6. Parallel Conjugate Gradient: Effects of Ordering Strategies, Programming Paradigms, and Architectural Platforms

    NASA Technical Reports Server (NTRS)

    Oliker, Leonid; Heber, Gerd; Biswas, Rupak

    2000-01-01

    The Conjugate Gradient (CG) algorithm is perhaps the best-known iterative technique to solve sparse linear systems that are symmetric and positive definite. A sparse matrix-vector multiply (SPMV) usually accounts for most of the floating-point operations within a CG iteration. In this paper, we investigate the effects of various ordering and partitioning strategies on the performance of parallel CG and SPMV using different programming paradigms and architectures. Results show that for this class of applications, ordering significantly improves overall performance, that cache reuse may be more important than reducing communication, and that it is possible to achieve message passing performance using shared memory constructs through careful data ordering and distribution. However, a multi-threaded implementation of CG on the Tera MTA does not require special ordering or partitioning to obtain high efficiency and scalability.
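Where the sparse matrix-vector multiply (SPMV) sits inside a CG iteration can be seen in a textbook CG over a matrix stored in compressed sparse row (CSR) form. This is a minimal sequential sketch, not the paper's code. In a shared-memory or message-passing version, the row loop of `csr_matvec` is what gets partitioned, and the matrix ordering determines how scattered the `x[indices[k]]` accesses are, which is where cache reuse comes in.

```python
import math


def csr_matvec(indptr, indices, data, x):
    """y = A x for A stored in compressed sparse row (CSR) form."""
    y = [0.0] * (len(indptr) - 1)
    for row in range(len(y)):  # rows are independent: the natural unit of parallel work
        acc = 0.0
        for k in range(indptr[row], indptr[row + 1]):
            acc += data[k] * x[indices[k]]
        y[row] = acc
    return y


def conjugate_gradient(indptr, indices, data, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for symmetric positive definite A, starting from x = 0."""
    x = [0.0] * len(b)
    r = list(b)  # residual b - A x with x = 0
    p = list(r)
    rs_old = sum(v * v for v in r)
    for _ in range(max_iter):
        if math.sqrt(rs_old) < tol:
            break
        ap = csr_matvec(indptr, indices, data, p)  # the SPMV that dominates each iteration
        alpha = rs_old / sum(pi * api for pi, api in zip(p, ap))
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        rs_new = sum(v * v for v in r)
        p = [ri + (rs_new / rs_old) * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x
```

Apart from the SPMV, each iteration needs only dot products and vector updates, which is why the SPMV's data layout and ordering dominate parallel performance.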

  7. The prevalence and quality of silent, socially silent, and disclosed autobiographical memories across adulthood.

    PubMed

    Alea, Nicole

    2010-02-01

    Two separate studies examined the prevalence and quality of silent (infrequently recalled), socially silent (i.e., recalled but not shared), and disclosed autobiographical memories. In Study 1 young and older men and women remembered positive events. Positive memories were more likely to be disclosed than to be kept socially silent or completely silent. However, socially silent and disclosed memories did not differ in memory quality: the memories were equally vivid, significant, and emotional. Silent memories were less qualitatively rich. This pattern of results was generally replicated in Study 2 with a lifespan sample for both positive and negative memories, and with additional qualitative variables. The exception was that negative memories were kept silent more often. Age differences were minimal. Women disclosed their autobiographical memories more, but men told a greater variety of people. Results are discussed in terms of the functions that memory telling and silences might serve for individuals.

  8. GPU COMPUTING FOR PARTICLE TRACKING

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Nishimura, Hiroshi; Song, Kai; Muriki, Krishna

    2011-03-25

    This is a feasibility study of using a modern Graphics Processing Unit (GPU) to parallelize the accelerator particle tracking code. To demonstrate the massive parallelization features provided by GPU computing, a simplified TracyGPU program is developed for dynamic aperture calculation. Performance, issues, and challenges from introducing GPU are also discussed. General Purpose Computation on Graphics Processing Units (GPGPU) brings massive parallel computing capabilities to numerical calculation. However, the unique architecture of GPU requires a comprehensive understanding of the hardware and programming model to optimize existing applications well. In the field of accelerator physics, the dynamic aperture calculation of a storage ring, which is often the most time-consuming part of accelerator modeling and simulation, can benefit from GPU because it is embarrassingly parallel, which fits well with the GPU programming model. In this paper, we use the Tesla C2050 GPU, which consists of 14 multiprocessors (MP) with 32 cores on each MP, a total of 448 cores, to host thousands of threads dynamically. A thread is a logical execution unit of the program on the GPU. In the GPU programming model, threads are grouped into a collection of blocks. Within each block, multiple threads share the same code and up to 48 KB of shared memory. Multiple thread blocks form a grid, which is executed as a GPU kernel. A simplified code that is a subset of Tracy++ [2] is developed to demonstrate the possibility of using GPU to speed up the dynamic aperture calculation by having each thread track a particle.
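The "one thread tracks one particle" mapping can be emulated on a CPU. The sketch below loops over a grid of blocks and threads, computes the global index as `block_idx * block_dim + thread_idx`, and guards against the partial last block, as a CUDA kernel would. The lattice is replaced by an assumed toy one-turn map (a thin sextupole kick followed by a linear rotation, i.e. a Hénon map), not Tracy++. The tune, loss limit, and function names are illustrative choices, and shared memory is not modeled.

```python
import math


def track_particle(x, p, turns, tune, limit=10.0):
    """Track one particle through a toy one-turn map (thin sextupole kick, then a
    linear rotation). Returns the number of turns survived."""
    c, s = math.cos(2 * math.pi * tune), math.sin(2 * math.pi * tune)
    for turn in range(turns):
        p = p + x * x                         # thin sextupole kick
        x, p = c * x + s * p, -s * x + c * p  # linear one-turn rotation
        if abs(x) > limit or abs(p) > limit:
            return turn                       # particle lost on this turn
    return turns


def launch(initial_x, turns=1000, tune=0.21, block_dim=4):
    """Emulate a CUDA-style grid in which each (block, thread) pair tracks one particle."""
    n = len(initial_x)
    grid_dim = (n + block_dim - 1) // block_dim
    survived = [0] * n
    for block_idx in range(grid_dim):
        for thread_idx in range(block_dim):
            i = block_idx * block_dim + thread_idx  # global thread index
            if i < n:                               # guard for the partial last block
                survived[i] = track_particle(initial_x[i], 0.0, turns, tune)
    return survived
```

A dynamic-aperture scan then amounts to launching a range of initial amplitudes and recording which ones survive the full number of turns; small amplitudes stay bounded, while large ones are lost within a few turns.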

  9. Personal semantics: at the crossroads of semantic and episodic memory.

    PubMed

    Renoult, Louis; Davidson, Patrick S R; Palombo, Daniela J; Moscovitch, Morris; Levine, Brian

    2012-11-01

    Declarative memory is usually described as consisting of two systems: semantic and episodic memory. Between these two poles, however, may lie a third entity: personal semantics (PS). PS concerns knowledge of one's past. Although typically assumed to be an aspect of semantic memory, it is essentially absent from existing models of knowledge. Furthermore, like episodic memory (EM), PS is idiosyncratically personal (i.e., not culturally-shared). We show that, depending on how it is operationalized, the neural correlates of PS can look more similar to semantic memory, more similar to EM, or dissimilar to both. We consider three different perspectives to better integrate PS into existing models of declarative memory and suggest experimental strategies for disentangling PS from semantic and episodic memory. Copyright © 2012 Elsevier Ltd. All rights reserved.

  10. Etiological Distinction of Working Memory Components in Relation to Mathematics

    PubMed Central

    Lukowski, Sarah L.; Soden, Brooke; Hart, Sara A.; Thompson, Lee A.; Kovas, Yulia; Petrill, Stephen A.

    2014-01-01

    Working memory has been consistently associated with mathematics achievement, although the etiology of these relations remains poorly understood. The present study examined the genetic and environmental underpinnings of math story problem solving, timed calculation, and untimed calculation alongside working memory components in 12-year-old monozygotic (n = 105) and same-sex dizygotic (n = 143) twin pairs. Results indicated significant phenotypic correlation between each working memory component and all mathematics outcomes (r = 0.18 – 0.33). Additive genetic influences shared between the visuo-spatial sketchpad and mathematics achievement were significant, accounting for roughly 89% of the observed correlation. In addition, genetic covariance was found between the phonological loop and math story problem solving. In contrast, despite there being a significant observed relationship between phonological loop and timed and untimed calculation, there was no significant genetic or environmental covariance between the phonological loop and timed or untimed calculation skills. Further analyses indicated that genetic overlap between the visuo-spatial sketchpad and math story problem solving and math fluency was distinct from general genetic factors, whereas g, phonological loop, and mathematics shared generalist genes. Thus, although each working memory component was related to mathematics, the etiology of their relationships may be distinct. PMID:25477699

  11. Initial Feasibility and Validity of a Prospective Memory Training Program in a Substance Use Treatment Population

    PubMed Central

    Sweeney, Mary M.; Rass, Olga; Johnson, Patrick S.; Strain, Eric C.; Berry, Meredith S.; Vo, Hoa T.; Fishman, Marc J.; Munro, Cynthia A.; Rebok, George W.; Mintzer, Miriam Z.; Johnson, Matthew W.

    2016-01-01

    Individuals with substance use disorders have shown deficits in the ability to implement future intentions, called prospective memory. Deficits in prospective memory and working memory, a critical underlying component of prospective memory, likely contribute to substance use treatment failures. Thus, improvement of prospective memory and working memory in substance use patients is an innovative target for intervention. We sought to develop a feasible and valid prospective memory training program that incorporates working memory training and may serve as a useful adjunct to substance use disorder treatment. We administered a single session of the novel prospective memory and working memory training program to participants (n = 22; 13 male; 9 female) enrolled in outpatient substance use disorder treatment and correlated performance to existing measures of prospective memory and working memory. Generally accurate prospective memory performance in a single session suggests feasibility in a substance use treatment population. However, training difficulty should be increased to avoid ceiling effects across repeated sessions. Consistent with existing literature, we observed superior performance on event-based relative to time-based prospective memory tasks. Performance on the prospective memory and working memory training components correlated with validated assessments of prospective memory and working memory, respectively. Correlations between novel memory training program performance and established measures suggest that our training engages appropriate cognitive processes. Further, differential event- and time-based prospective memory task performance suggests internal validity of our training. These data support development of this intervention as an adjunctive therapy for substance use disorders. PMID:27690506

  12. Initial feasibility and validity of a prospective memory training program in a substance use treatment population.

    PubMed

    Sweeney, Mary M; Rass, Olga; Johnson, Patrick S; Strain, Eric C; Berry, Meredith S; Vo, Hoa T; Fishman, Marc J; Munro, Cynthia A; Rebok, George W; Mintzer, Miriam Z; Johnson, Matthew W

    2016-10-01

    Individuals with substance use disorders have shown deficits in the ability to implement future intentions, called prospective memory. Deficits in prospective memory and working memory, a critical underlying component of prospective memory, likely contribute to substance use treatment failures. Thus, improvement of prospective memory and working memory in substance use patients is an innovative target for intervention. We sought to develop a feasible and valid prospective memory training program that incorporates working memory training and may serve as a useful adjunct to substance use disorder treatment. We administered a single session of the novel prospective memory and working memory training program to participants (n = 22; 13 men, 9 women) enrolled in outpatient substance use disorder treatment and correlated performance to existing measures of prospective memory and working memory. Generally accurate prospective memory performance in a single session suggests feasibility in a substance use treatment population. However, training difficulty should be increased to avoid ceiling effects across repeated sessions. Consistent with existing literature, we observed superior performance on event-based relative to time-based prospective memory tasks. Performance on the prospective memory and working memory training components correlated with validated assessments of prospective memory and working memory, respectively. Correlations between novel memory training program performance and established measures suggest that our training engages appropriate cognitive processes. Further, differential event- and time-based prospective memory task performance suggests internal validity of our training. These data support the development of this intervention as an adjunctive therapy for substance use disorders. (PsycINFO Database Record (c) 2016 APA, all rights reserved).

  13. Using school grounds for nature studies: An exploratory study of elementary teachers' experiences

    NASA Astrophysics Data System (ADS)

    Willis, Tamra Lee

    2001-06-01

    The purpose of this study was to gain understanding of the experiences of elementary teachers who used school grounds to do nature studies. Following an inductive, naturalistic approach, the goal was to explore the phenomenon using words of teachers as guides to understanding. Interviews were conducted with a purposeful sampling of ten quality public school teachers in grades K--5 who were well-known for their schoolyard nature programs. Interview questions were focused by a theoretical framework of environmental cognition. Data were gathered about how teachers came to use the outdoors to teach and how they experienced teaching nature studies on the school grounds. A conceptual model of Quality Teachers of Schoolyard Nature Studies was delineated. The model consisted of three components: teacher past and present experiences with nature, teacher beliefs relevant to using the school grounds for nature studies, and teacher action efficacy pertaining to schoolyard nature programs. The model suggested a relationship between teachers' personal experiences with nature and their beliefs about sharing nature with children. In addition, the model connected teachers' beliefs about schoolyard nature to their action efficacy, i.e., action behavior reflected through motivation and commitment. The participants shared many common experiences and beliefs. Most had extensive childhood experiences in nature and memories of adults who shared nature with them. They did not consider themselves nature experts, but felt they knew the basics of natural science from their own experiences outdoors and from working with children. The teachers' beliefs about schoolyard nature studies developed from several dimensions of their lives: experiences with nature, experiences teaching, and experiences with students. 
They were motivated to share nature with students on the school grounds by their beliefs that students would come to appreciate and understand nature, just as they had during their own experiences. In addition, they believed that schoolyard nature programs benefitted student learning and enjoyment of learning. The action efficacy of the teachers was influenced by their beliefs about schoolyard nature programs and beliefs in their own competence to overcome challenges and achieve goals. Implications for educational practice and further research were cited.

  14. Test Sequence Priming in Recognition Memory

    ERIC Educational Resources Information Center

    Johns, Elizabeth E.; Mewhort, D. J. K.

    2009-01-01

    The authors examined priming within the test sequence in 3 recognition memory experiments. A probe primed its successor whenever both probes shared a feature with the same studied item ("interjacent priming"), indicating that the study item like the probe is central to the decision. Interjacent priming occurred even when the 2 probes did…

  15. Two Maintenance Mechanisms of Verbal Information in Working Memory

    ERIC Educational Resources Information Center

    Camos, V.; Lagner, P.; Barrouillet, P.

    2009-01-01

    The present study evaluated the interplay between two mechanisms of maintenance of verbal information in working memory, namely articulatory rehearsal as described in Baddeley's model, and attentional refreshing as postulated in Barrouillet and Camos's Time-Based Resource-Sharing (TBRS) model. In four experiments using the complex span paradigm, we…

  16. Down Memory Lane: Recollections of Lamaze International's First 50 Years

    PubMed Central

    Zwelling, Elaine

    2010-01-01

    The 42-year involvement of one member of Lamaze International is chronicled through a decade-by-decade review of personal memories. The history of Lamaze International is shared through the recollections of her roles as a childbirth educator, faculty member, and member of the board of directors. PMID:21629385

  17. Paul Ricoeur, Memory, and the Historical Gaze: Implications for Education Histories

    ERIC Educational Resources Information Center

    Colby, Sherri Rae

    2012-01-01

    In this article, the author shares the potential applications of Paul Ricoeur's philosophies of history, memory, and narrative to the interpretation of educational histories, and those histories' life spans: moving cyclically from early conception, to evidentiary construction, to published dissemination; and ultimately to death or immortality. Her…

  18. The DASH Project: An Overview

    DTIC Science & Technology

    1988-02-29

    by memory copying will degrade system performance on shared-memory multiprocessors. Virtual memory (VM) remapping, as opposed to memory copying...Bershad, G.D. Giuseppe Facchetti, Kevin Fall, G. Scott Graham, Ellen Nelson, P. Venkat Rangan, Bruno Sartirana, Shin-Yuan Tzou, Raj Vaswani, and Robert...Remote Execution in NEST", IEEE Trans. on Software Eng. 13, 8 (August 1987), 905-912. 3. G. T. Almes, A. P. Black, E. Lazowska and J. Noe, "The Eden

  19. Research about Memory Detection Based on the Embedded Platform

    NASA Astrophysics Data System (ADS)

    Sun, Hao; Chu, Jian

    The resources available for memory detection on embedded systems are very limited. Taking a Linux-based embedded ARM system as the platform, this article proposes two efficient memory detection techniques suited to the characteristics of embedded software. In particular, for programs that need specific libraries, the article proposes portable memory detection methods that help program designers reduce human errors, improve programming quality, and therefore make better use of scarce embedded memory resources.

  20. Concurrent Image Processing Executive (CIPE). Volume 1: Design overview

    NASA Technical Reports Server (NTRS)

    Lee, Meemong; Groom, Steven L.; Mazer, Alan S.; Williams, Winifred I.

    1990-01-01

    The design and implementation of a Concurrent Image Processing Executive (CIPE), which is intended to become the support system software for a prototype high performance science analysis workstation, are described. The target machine for this software is a JPL/Caltech Mark 3fp Hypercube hosted by either a MASSCOMP 5600 or a Sun-3 or Sun-4 workstation; however, the design will accommodate other concurrent machines of similar architecture, i.e., local memory, multiple-instruction-multiple-data (MIMD) machines. The CIPE system provides both a multimode user interface and an applications programmer interface, and has been designed around four loosely coupled modules: user interface, host-resident executive, hypercube-resident executive, and application functions. The loose coupling between modules allows modification of a particular module without significantly affecting the other modules in the system. In order to enhance hypercube memory utilization and to allow expansion of image processing capabilities, a specialized program management method, incremental loading, was devised. To minimize data transfer between host and hypercube, a data management method which distributes, redistributes, and tracks data set information was implemented. The data management also allows data sharing among application programs. The CIPE software architecture provides a flexible environment for scientific analysis of complex remote sensing image data, such as planetary data and imaging spectrometry, utilizing state-of-the-art concurrent computation capabilities.
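    On a local-memory hypercube, distributing an image means deciding which node owns which rows and which nodes can exchange data directly. The sketch below is a hypothetical illustration, not CIPE's data manager: it assumes a contiguous row-block decomposition and uses the standard hypercube property that directly linked nodes differ in exactly one bit of their node number. The 8-node cube and 512-row image are made-up sizes.

```python
def hypercube_neighbors(node, dim):
    """Nodes directly linked to `node` in a dim-dimensional hypercube differ in one bit."""
    return [node ^ (1 << k) for k in range(dim)]


def block_rows(n_rows, n_nodes):
    """Split image rows into contiguous blocks, one per node; sizes differ by at most one row."""
    base, extra = divmod(n_rows, n_nodes)
    ranges, start = [], 0
    for node in range(n_nodes):
        size = base + (1 if node < extra else 0)
        ranges.append((start, start + size))  # half-open row range [start, start + size)
        start += size
    return ranges


dim = 3  # hypothetical 8-node cube
for node, (lo, hi) in enumerate(block_rows(512, 2 ** dim)):
    print(node, (lo, hi), hypercube_neighbors(node, dim))
```

    Redistribution for a different operation would amount to computing a new set of ranges and moving rows along neighbor links; tracking which node holds which range is the bookkeeping the abstract attributes to the data management layer.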

  1. Whitmore, Henschke, and Hilaris: The reorientation of prostate brachytherapy (1970-1987).

    PubMed

    Aronowitz, Jesse N

    2012-01-01

    Urologists had performed prostate brachytherapy for decades before New York's Memorial Hospital retropubic program. This paper explores the contribution of Willet Whitmore, Ulrich Henschke, Basil Hilaris, and Memorial's physicists to the evolution of the procedure. Literature review and interviews with program participants. More than 1000 retropubic implants were performed at Memorial between 1970 and 1987. Unlike previous efforts, Memorial's program benefited from the participation of three disciplines in its conception and execution. Memorial's retropubic program was a collaboration of urologists, radiation therapists, and physicists. Their approach focused greater attention on dosimetry and radiation safety, and served as a template for subsequent prostate brachytherapy programs. Copyright © 2012 American Brachytherapy Society. Published by Elsevier Inc. All rights reserved.

  2. Quasi-Optimal Elimination Trees for 2D Grids with Singularities

    DOE PAGES

    Paszyńska, A.; Paszyński, M.; Jopek, K.; ...

    2015-01-01

    We construct quasi-optimal elimination trees for 2D finite element meshes with singularities. These trees minimize the complexity of the solution of the discrete system. The computational cost estimates of the elimination process model the execution of the multifrontal algorithms in serial and in parallel shared-memory executions. Since the meshes considered are a subspace of all possible mesh partitions, we call these minimizers quasi-optimal. We minimize the cost functionals using dynamic programming. Finding these minimizers is more computationally expensive than solving the original algebraic system. Nevertheless, from the insights provided by the analysis of the dynamic programming minima, we propose a heuristic construction of the elimination trees that has cost $O(N_e \log N_e)$, where $N_e$ is the number of elements in the mesh. We show that this heuristic ordering has similar computational cost to the quasi-optimal elimination trees found with dynamic programming and outperforms state-of-the-art alternatives in our numerical experiments.
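    The idea of minimizing an elimination-tree cost functional by dynamic programming, and comparing it with a cheap heuristic, can be illustrated on a much simpler toy than the paper's 2D meshes. The sketch below assumes a 1D chain of elements with interface degree-of-freedom counts b[0..n] and a hypothetical cost model in which merging the subtrees for elements [i, k) and [k, j) costs (b[i] + b[k] + b[j])^3, a stand-in for dense elimination of a frontal matrix. It is not the authors' cost functional; it only shows the interval dynamic program and a bisection heuristic side by side.

```python
from functools import lru_cache


def frontal_cost(b, i, k, j):
    # Hypothetical cost model: dense elimination of a frontal matrix whose size is
    # the number of DOFs on the three interfaces touched by the merge.
    m = b[i] + b[k] + b[j]
    return m ** 3


def optimal_tree_cost(b):
    """Interval DP over a chain of len(b) - 1 elements; O(n^3) work in the chain length."""
    n = len(b) - 1

    @lru_cache(maxsize=None)
    def cost(i, j):
        if j - i == 1:
            return 0  # a single element is a leaf of the elimination tree
        return min(cost(i, k) + cost(k, j) + frontal_cost(b, i, k, j)
                   for k in range(i + 1, j))

    return cost(0, n)


def bisection_tree_cost(b, i=0, j=None):
    """Heuristic tree: always split the element range in half."""
    if j is None:
        j = len(b) - 1
    if j - i == 1:
        return 0
    k = (i + j) // 2
    return (bisection_tree_cost(b, i, k) + bisection_tree_cost(b, k, j)
            + frontal_cost(b, i, k, j))


# Interfaces near the right end carry more DOFs, mimicking refinement toward a singularity.
b = [1, 1, 1, 1, 1, 1, 1, 4, 9]
print(optimal_tree_cost(b), bisection_tree_cost(b))
```

    The bisection tree is one of the candidates the dynamic program considers, so the DP cost can never exceed it; the DP's cubic search cost mirrors the abstract's remark that finding the minimizers is more expensive than the solve itself.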

  3. Quasi-Optimal Elimination Trees for 2D Grids with Singularities

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Paszyńska, A.; Paszyński, M.; Jopek, K.

    We construct quasi-optimal elimination trees for 2D finite element meshes with singularities. These trees minimize the complexity of the solution of the discrete system. The computational cost estimates of the elimination process model the execution of the multifrontal algorithms in serial and in parallel shared-memory executions. Since the meshes considered are a subspace of all possible mesh partitions, we call these minimizers quasi-optimal. We minimize the cost functionals using dynamic programming. Finding these minimizers is more computationally expensive than solving the original algebraic system. Nevertheless, from the insights provided by the analysis of the dynamic programming minima, we propose a heuristic construction of the elimination trees that has cost $O(N_e \log N_e)$, where $N_e$ is the number of elements in the mesh. We show that this heuristic ordering has similar computational cost to the quasi-optimal elimination trees found with dynamic programming and outperforms state-of-the-art alternatives in our numerical experiments.

  4. Flash Memory Reliability: Read, Program, and Erase Latency Versus Endurance Cycling

    NASA Technical Reports Server (NTRS)

    Heidecker, Jason

    2010-01-01

    This report documents the efforts and results of the fiscal year (FY) 2010 NASA Electronic Parts and Packaging Program (NEPP) task for nonvolatile memory (NVM) reliability. This year's focus was to measure latency (read, program, and erase) of NAND Flash memories and determine how these parameters drift with erase/program/read endurance cycling.

  5. Initial Performance Results on IBM POWER6

    NASA Technical Reports Server (NTRS)

    Saini, Subbash; Talcott, Dale; Jespersen, Dennis; Djomehri, Jahed; Jin, Haoqiang; Mehrotra, Piyush

    2008-01-01

    The POWER5+ processor has a faster memory bus than the previous-generation POWER5 processor (533 MHz vs. 400 MHz), but the measured per-core memory bandwidth of the POWER5 is higher than that of the POWER5+ (5.7 GB/s vs. 4.3 GB/s). The reason is that in the POWER5+, the two cores on the chip share the L2 cache, L3 cache, and memory bus. The memory controller is also on the chip and is shared by the two cores. This serializes the path to memory. For consistently good performance on a wide range of applications, the performance of the processor, the memory subsystem, and the interconnects (both latency and bandwidth) should be balanced. Recognizing this, IBM designed the POWER6 processor to avoid the bottlenecks due to the L2 cache, memory controller, and buffer chips of the POWER5+. Unlike the POWER5+, each core in the POWER6 has its own L2 cache (4 MB, double that of the POWER5+), memory controller, and buffer chips. Each core in the POWER6 runs at 4.7 GHz instead of 1.9 GHz in the POWER5+. In this paper, we evaluate the performance of a dual-core POWER6-based IBM p6-570 system and compare it with that of a dual-core POWER5+-based IBM p575+ system. In this evaluation, we have used the High-Performance Computing Challenge (HPCC) benchmarks, NAS Parallel Benchmarks (NPB), and four real-world applications: three from computational fluid dynamics and one from climate modeling.
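    Per-core memory bandwidth figures like those quoted above are typically obtained with a STREAM-style triad (STREAM is one of the HPCC components). The sketch below is a rough NumPy approximation, not the HPCC STREAM code: array size and repeat count are arbitrary assumptions, and because NumPy performs the triad in two passes, the bytes actually moved exceed the 24 bytes per element that STREAM counts, so the reported figure understates the hardware rate.

```python
import time

import numpy as np


def triad(a, b, c, scalar):
    """STREAM-style triad a = b + scalar * c, computed in place without temporaries."""
    np.multiply(c, scalar, out=a)
    np.add(a, b, out=a)
    return a


def measure_triad_gbps(n=10_000_000, scalar=3.0, repeats=5):
    rng = np.random.default_rng(0)
    b = rng.random(n)
    c = rng.random(n)
    a = np.empty_like(b)
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        triad(a, b, c, scalar)
        best = min(best, time.perf_counter() - start)
    # STREAM convention: read b, read c, write a, 8-byte doubles -> 24 bytes per element.
    return 3 * 8 * n / best / 1e9


print(f"triad bandwidth ~ {measure_triad_gbps():.1f} GB/s (single process)")
```

    Running one copy alone and then one copy per core at the same time is a simple way to see the effect the abstract describes: when cores share a memory controller and bus, the per-copy figure drops as more cores compete for the same path to memory.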

  6. Relation of Physical Activity to Memory Functioning in Older Adults: The Memory Workout Program.

    ERIC Educational Resources Information Center

    Rebok, George W.; Plude, Dana J.

    2001-01-01

    The Memory Workout, a CD-ROM program designed to help older adults increase changes in physical and cognitive activity influencing memory, was tested with 24 subjects. Results revealed a significant relationship between exercise time, exercise efficacy, and cognitive function, as well as interest in improving memory and physical activity.…

  7. Parallel discrete event simulation using shared memory

    NASA Technical Reports Server (NTRS)

    Reed, Daniel A.; Malony, Allen D.; Mccredie, Bradley D.

    1988-01-01

    With traditional event-list techniques, evaluating a detailed discrete-event simulation model can often require hours or even days of computation time. By eliminating the event list and maintaining only sufficient synchronization to ensure causality, parallel simulation can potentially provide speedups that are linear in the number of processors. A set of shared-memory experiments, using the Chandy-Misra distributed-simulation algorithm to simulate networks of queues, is presented. Parameters of the study include queueing network topology and routing probabilities, number of processors, and assignment of network nodes to processors. These experiments show that Chandy-Misra distributed simulation is a questionable alternative to sequential simulation of most queueing network models.
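    The "sufficient synchronization to ensure causality" in Chandy-Misra simulation rests on one rule: a logical process may consume an event only if its timestamp does not exceed the clock of every input channel, since each channel delivers messages in nondecreasing timestamp order. The sketch below shows that rule for a single logical process with several input channels, plus a null message (a timestamp with no payload) that advances a channel clock. The class and function names are hypothetical, and this is not the code used in the reported experiments.

```python
from collections import deque


class Channel:
    """Input channel that delivers messages in nondecreasing timestamp order."""

    def __init__(self):
        self.queue = deque()
        self.clock = 0.0  # timestamp of the last message (real or null) received

    def send(self, timestamp, payload=None):
        # payload=None marks a null message: it only promises "nothing earlier than this".
        assert timestamp >= self.clock, "channels must deliver in timestamp order"
        self.clock = timestamp
        if payload is not None:
            self.queue.append((timestamp, payload))


def process_safe_events(channels):
    """Consume, in timestamp order, every queued event at or below the minimum channel clock."""
    horizon = min(ch.clock for ch in channels)
    processed = []
    while True:
        ready = [ch for ch in channels if ch.queue and ch.queue[0][0] <= horizon]
        if not ready:
            return processed
        ch = min(ready, key=lambda c: c.queue[0][0])
        processed.append(ch.queue.popleft())


a, b = Channel(), Channel()
a.send(1.0, "a1")
a.send(4.0, "a2")
b.send(2.0, "b1")
print(process_safe_events([a, b]))  # [(1.0, 'a1'), (2.0, 'b1')]
b.send(5.0)  # null message from b's sender (its clock plus lookahead)
print(process_safe_events([a, b]))  # [(4.0, 'a2')]
```

    Without the null message, event a2 would wait indefinitely for channel b; the extra null-message traffic and blocking this requires is one source of the overhead that can make the method lose to a sequential event list.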

  8. 78 FR 5781 - Cost-Sharing Rates for Pharmacy Benefits Program of the TRICARE Program

    Federal Register 2010, 2011, 2012, 2013, 2014

    2013-01-28

    ... DEPARTMENT OF DEFENSE Office of the Secretary Cost-Sharing Rates for Pharmacy Benefits Program of... to cost-sharing rates to the TRICARE Pharmacy Benefits Program. SUMMARY: This notice is to advise interested parties of cost-sharing rate change for the Pharmacy Benefits Program. DATES: The cost-sharing...

  9. Feature-based memory-driven attentional capture: visual working memory content affects visual attention.

    PubMed

    Olivers, Christian N L; Meijer, Frank; Theeuwes, Jan

    2006-10-01

    In 7 experiments, the authors explored whether visual attention (the ability to select relevant visual information) and visual working memory (the ability to retain relevant visual information) share the same content representations. The presence of singleton distractors interfered more strongly with a visual search task when it was accompanied by an additional memory task. Singleton distractors interfered even more when they were identical or related to the object held in memory, but only when it was difficult to verbalize the memory content. Furthermore, this content-specific interaction occurred for features that were relevant to the memory task but not for irrelevant features of the same object or for once-remembered objects that could be forgotten. Finally, memory-related distractors attracted more eye movements but did not result in longer fixations. The results demonstrate memory-driven attentional capture on the basis of content-specific representations. Copyright 2006 APA.

  10. Working Memory in Children: A Time-Constrained Functioning Similar to Adults

    ERIC Educational Resources Information Center

    Portrat, Sophie; Camos, Valerie; Barrouillet, Pierre

    2009-01-01

    Within the time-based resource-sharing (TBRS) model, we tested a new conception of the relationships between processing and storage in which the core mechanisms of working memory (WM) are time constrained. However, our previous studies were restricted to adults. The current study aimed at demonstrating that these mechanisms are present and…

  11. Close Associations and Memory in Brainwriting Groups

    ERIC Educational Resources Information Center

    Coskun, Hamit

    2011-01-01

    The present experiment examined whether or not the type of associations (close (e.g. apple-pear) and distant (e.g. apple-fish) word associations) and memory instruction (paying attention to the ideas of others) had effects on the idea generation performances in the brainwriting paradigm in which all participants shared their ideas by using paper…

  12. Common data buffer

    NASA Technical Reports Server (NTRS)

    Byrne, F.

    1981-01-01

    Time-shared interface speeds data processing in distributed computer network. Two-level high-speed scanning approach routes information to buffer, portion of which is reserved for series of "first-in, first-out" memory stacks. Buffer address structure and memory are protected from noise or failed components by error correcting code. System is applicable to any computer or processing language.
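    The brief does not say which error-correcting code protects the buffer, so the following is only an assumed example of the general technique: a Hamming(7,4) code, which stores four data bits with three parity bits and corrects any single flipped bit. The decoder's syndrome is the 1-based position of the flipped bit, or 0 when the word is clean.

```python
def hamming74_encode(nibble):
    """Encode 4 data bits as [p1, p2, d1, p4, d2, d3, d4] (codeword positions 1..7)."""
    d1, d2, d3, d4 = [(nibble >> i) & 1 for i in range(4)]  # d1 is the least significant bit
    p1 = d1 ^ d2 ^ d4  # covers positions 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # covers positions 3, 6, 7
    p4 = d2 ^ d3 ^ d4  # covers positions 5, 6, 7
    return [p1, p2, d1, p4, d2, d3, d4]


def hamming74_decode(code):
    """Correct up to one flipped bit; return (nibble, syndrome)."""
    c = list(code)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s4 = c[3] ^ c[4] ^ c[5] ^ c[6]
    syndrome = s1 + 2 * s2 + 4 * s4  # position of the bad bit, 0 if none
    if syndrome:
        c[syndrome - 1] ^= 1
    nibble = c[2] | (c[4] << 1) | (c[5] << 2) | (c[6] << 3)
    return nibble, syndrome


word = hamming74_encode(0b1011)
word[5] ^= 1  # simulate noise flipping one stored bit
print(hamming74_decode(word))  # (11, 6)
```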

  13. Accumulating Evidence about What Prospective Memory Costs Actually Reveal

    ERIC Educational Resources Information Center

    Strickland, Luke; Heathcote, Andrew; Remington, Roger W.; Loft, Shayne

    2017-01-01

    Event-based prospective memory (PM) tasks require participants to substitute an atypical PM response for an ongoing task response when presented with PM targets. Responses to ongoing tasks are often slower with the addition of PM demands ("PM costs"). Prominent PM theories attribute costs to capacity-sharing between the ongoing and PM…

  14. How communication goals determine when audience tuning biases memory.

    PubMed

    Echterhoff, Gerald; Higgins, E Tory; Kopietz, René; Groll, Stephan

    2008-02-01

    After tuning their message to suit their audience's attitude, communicators' own memories for the original information (e.g., a target person's behaviors) often reflect the biased view expressed in their message--producing an audience-congruent memory bias. Exploring the motivational circumstances of message production, the authors investigated whether this bias depends on the goals driving audience tuning. In 4 experiments, the memory bias was found to a greater extent when audience tuning served the creation of a shared reality than when it served alternative, nonshared reality goals (being polite toward a stigmatized-group audience; obtaining incentives; being entertaining; complying with a blatant demand). In addition, the authors found that these effects were mediated by the epistemic trust in the audience-congruent view but not by the rehearsal or accurate retrieval of the original input information, the ability to discriminate between the original and the message information, or a contrast away from extremely tuned messages. The central role of epistemic trust, a measure of the communicators' experience of shared reality, was supported in meta-analyses across the experiments. PsycINFO Database Record (c) 2008 APA, all rights reserved.

  15. System for simultaneously loading program to master computer memory devices and corresponding slave computer memory devices

    NASA Technical Reports Server (NTRS)

    Hall, William A. (Inventor)

    1993-01-01

    A bus programmable slave module card for use in a computer control system is disclosed which comprises a master computer and one or more slave computer modules interfacing by means of a bus. Each slave module includes its own microprocessor, memory, and control program for acting as a single loop controller. The slave card includes a plurality of memory means (S1, S2...) corresponding to a like plurality of memory devices (C1, C2...) in the master computer, for each slave memory means its own communication lines connectable through the bus with memory communication lines of an associated memory device in the master computer, and a one-way electronic door which is switchable to either a closed condition or a one-way open condition. With the door closed, communication lines between master computer memory (C1, C2...) and slave memory (S1, S2...) are blocked. In the one-way open condition, the memory communication lines of each slave memory means (S1, S2...) connect with the memory communication lines of its associated memory device (C1, C2...) in the master computer, and the memory devices (C1, C2...) of the master computer and slave card are electrically parallel such that information seen by the master's memory is also seen by the slave's memory. The slave card is also connectable to a switch for electronically removing the slave microprocessor from the system. With the master computer and the slave card in programming mode relationship, and the slave microprocessor electronically removed from the system, loading a program in the memory devices (C1, C2...) of the master accomplishes a parallel loading into the memory devices (S1, S2...) of the slave.

  16. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Ibrahim, Khaled Z.; Epifanovsky, Evgeny; Williams, Samuel W.

    Coupled-cluster methods provide highly accurate models of molecular structure by explicit numerical calculation of tensors representing the correlation between electrons. These calculations are dominated by a sequence of tensor contractions, motivating the development of numerical libraries for such operations. While based on matrix-matrix multiplication, these libraries are specialized to exploit symmetries in the molecular structure and in electronic interactions, and thus reduce the size of the tensor representation and the complexity of contractions. The resulting algorithms are irregular and their parallelization has been previously achieved via the use of dynamic scheduling or specialized data decompositions. We introduce our efforts to extend the Libtensor framework to work in the distributed memory environment in a scalable and energy efficient manner. We achieve up to 240× speedup compared with the best optimized shared memory implementation. We attain scalability to hundreds of thousands of compute cores on three distributed-memory architectures (Cray XC30 and XC40, BlueGene/Q), and on a heterogeneous GPU-CPU system (Cray XK7). As the bottlenecks shift from being compute-bound DGEMMs to communication-bound collectives as the size of the molecular system scales, we adopt two radically different parallelization approaches for handling load imbalance. Nevertheless, we preserve a unified interface to both programming models to maintain the productivity of computational quantum chemists.
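    The statement that tensor contractions are "based on matrix-matrix multiplication" can be made concrete with a small example. The sketch below assumes a coupled-cluster-like contraction C[i,j,a,b] = sum over c,d of V[a,b,c,d] * T[i,j,c,d] and shows that grouping index pairs turns it into a single matrix product, the operation a DGEMM call performs. It uses plain NumPy, not the Libtensor API, and ignores the symmetry blocking that Libtensor exploits.

```python
import numpy as np


def contract_via_gemm(V, T):
    """C[i,j,a,b] = sum_{c,d} V[a,b,c,d] * T[i,j,c,d], computed as one matrix product."""
    ni, nj, nc, nd = T.shape
    na, nb = V.shape[:2]
    T2 = T.reshape(ni * nj, nc * nd)  # rows: (i, j) pairs, columns: (c, d) pairs
    V2 = V.reshape(na * nb, nc * nd)  # rows: (a, b) pairs, columns: (c, d) pairs
    C2 = T2 @ V2.T                    # the GEMM: (ij x cd) times (cd x ab)
    return C2.reshape(ni, nj, na, nb)


rng = np.random.default_rng(1)
V = rng.random((3, 4, 5, 2))
T = rng.random((6, 7, 5, 2))
print(np.allclose(contract_via_gemm(V, T), np.einsum("abcd,ijcd->ijab", V, T)))  # True
```

    In a distributed setting the same product is split into tiles owned by different nodes, which is where the communication-bound collectives mentioned above come from.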

  17. Approaches to the mechanisms of song memorization and singing provide evidence for a procedural memory.

    PubMed

    Hultsch, Henrike; Todt, Dietmar

    2004-06-01

    There is growing evidence that, during song learning, birds do not only acquire 'what to sing' (the inventory of behavior), but also 'how to sing' (the singing program), including order-features of song sequencing. Common Nightingales Luscinia megarhynchos acquire such serial information by segmenting long strings of heard songs into smaller subsets or packages, by a process reminiscent of the chunking of information as a coding mechanism in short term memory. Here we report three tutoring experiments on nightingales that examined whether such 'chunking' was susceptible to experimental cueing. The experiments tested whether (1) 'temporal phrasing' (silent intersong intervals spaced out at particular positions of a tutored string), or (2) 'stimulus novelty' (groups of novel song-types added to a basic string), or (3) 'pattern similarity' in the phonetic structure of songs (here: sharing of song initials) would induce package boundaries (or chunking) at the manipulated sequential positions. The results revealed cueing effects in experiments (1) and (2) but not in experiment (3). The finding that birds used temporal variables as cues for chunking does not require the assumption that package formation is a cognitive strategy. Rather, it points towards a mechanism of procedural memory operating in the song acquisition of birds.

  18. Towards robust algorithms for current deposition and dynamic load-balancing in a GPU particle in cell code

    NASA Astrophysics Data System (ADS)

    Rossi, Francesco; Londrillo, Pasquale; Sgattoni, Andrea; Sinigardi, Stefano; Turchetti, Giorgio

    2012-12-01

    We present 'jasmine', an implementation of a fully relativistic, 3D, electromagnetic Particle-In-Cell (PIC) code, capable of running simulations in various laser plasma acceleration regimes on HPC clusters of Graphics Processing Units (GPUs). Standard energy/charge-preserving FDTD-based algorithms have been implemented using double precision and quadratic (or arbitrary-sized) shape functions for the particle weighting. When porting a PIC scheme to the GPU architecture (or, in general, a shared-memory environment), the particle-to-grid operations (e.g. the evaluation of the current density) require special care to avoid memory inconsistencies and conflicts. Here we present a robust implementation of this operation that is efficient for any number of particles per cell and any particle shape function order. Our algorithm exploits the exposed GPU memory hierarchy and avoids the use of atomic operations, which can hurt performance, especially when many particles lie in the same cell. We show the code's multi-GPU scalability results and present a dynamic load-balancing algorithm. The code is written using a Python-based C++ meta-programming technique, which yields a high level of modularity and allows for easy performance tuning and simple extension of the core algorithms to various simulation schemes.
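
    The abstract does not spell out jasmine's GPU algorithm, so the sketch below swaps in a different, commonly used way of avoiding atomic updates during particle-to-grid deposition: privatization followed by a reduction. It assumes a periodic 1D grid with unit cell size and linear (cloud-in-cell) weighting. The function names are hypothetical, and the worker loop runs serially here, standing in for GPU threads or thread blocks.

```python
import math
import random


def deposit_direct(positions, weights, ncells):
    """Serial reference: linear (cloud-in-cell) weighting on a periodic 1D grid.

    Cell size is 1 and positions are assumed to lie in [0, ncells).
    """
    rho = [0.0] * ncells
    for x, q in zip(positions, weights):
        i = int(x)                      # grid node to the left of x
        f = x - i                       # fractional distance from that node
        rho[i % ncells] += q * (1.0 - f)
        rho[(i + 1) % ncells] += q * f
    return rho


def deposit_privatized(positions, weights, ncells, nworkers=4):
    """Race-free deposition without atomics.

    Each worker handles a contiguous chunk of particles and writes only into
    its own private grid, so concurrent workers never update the same element.
    A separate reduction pass then sums the private grids.
    """
    chunk = -(-len(positions) // nworkers)          # ceiling division
    private = [
        deposit_direct(positions[w * chunk:(w + 1) * chunk],
                       weights[w * chunk:(w + 1) * chunk], ncells)
        for w in range(nworkers)                    # independent per worker
    ]
    return [sum(cell) for cell in zip(*private)]   # reduction over workers


rng = random.Random(42)
pos = [rng.uniform(0.0, 16.0) for _ in range(1000)]
q = [rng.uniform(0.5, 1.5) for _ in range(1000)]
rho_ref = deposit_direct(pos, q, 16)
rho_par = deposit_privatized(pos, q, 16, nworkers=8)
# Same grid up to floating-point summation order.
assert all(math.isclose(a, b, abs_tol=1e-9) for a, b in zip(rho_ref, rho_par))
```

    Privatization trades memory (one grid copy per worker) for freedom from write conflicts; on a GPU the private copies would typically live in fast on-chip memory per block, which is one way a memory hierarchy can be exploited, though not necessarily the one jasmine uses.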

  19. Shared filtering processes link attentional and visual short-term memory capacity limits.

    PubMed

    Bettencourt, Katherine C; Michalka, Samantha W; Somers, David C

    2011-09-30

    Both visual attention and visual short-term memory (VSTM) have been shown to have capacity limits of 4 ± 1 objects, driving the hypothesis that they share a visual processing buffer. However, these capacity limitations also show strong individual differences, making the degree to which these capacities are related unclear. Moreover, other research has suggested a distinction between attention and VSTM buffers. To explore the degree to which capacity limitations reflect the use of a shared visual processing buffer, we compared individual subjects' capacities on attentional and VSTM tasks completed in the same testing session. We used a multiple object tracking (MOT) task and a VSTM change detection task, with varying levels of distractors, to measure capacity. Significant correlations in capacity were not observed between the MOT and VSTM tasks when distractor filtering demands differed between the tasks. Instead, significant correlations were seen when the tasks shared spatial filtering demands. Moreover, these filtering demands impacted capacity similarly in both attention and VSTM tasks. These observations fail to support the view that visual attention and VSTM capacity limits result from a shared buffer but instead highlight the role of the resource demands of underlying processes in limiting capacity.

  20. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Boman, Erik G.

    This LDRD project was a campus exec fellowship to fund (in part) Donald Nguyen’s PhD research at UT-Austin. His work has focused on parallel programming models and on scheduling irregular algorithms on shared-memory systems using the Galois framework. Galois provides a simple but powerful way for users and applications to automatically obtain good parallel performance using certain supported data containers. The naïve user can write serial code, while advanced users can optimize performance through advanced features, such as specifying the scheduling policy. Galois was used to parallelize two sparse matrix reordering schemes: RCM (Reverse Cuthill-McKee) and Sloan. Such reordering is important in high-performance computing to obtain better data locality and thus reduce run times.
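
    Reverse Cuthill-McKee, one of the two reordering schemes mentioned, can be sketched serially as below, assuming an undirected graph given as a dict that maps integer node labels to neighbour sets. This is a plain textbook version, not the Galois implementation. Breaking ties by (degree, label) is an arbitrary choice made here for determinism, and production codes usually start each component from a pseudo-peripheral node rather than simply a minimum-degree one.

```python
from collections import deque


def reverse_cuthill_mckee(adj):
    """Return an ordering of the nodes of `adj` that tends to reduce bandwidth."""
    degree = {v: len(nbrs) for v, nbrs in adj.items()}
    visited, order = set(), []
    # Handle every connected component, starting from a low-degree node.
    for start in sorted(adj, key=lambda v: (degree[v], v)):
        if start in visited:
            continue
        visited.add(start)
        queue = deque([start])
        while queue:                                  # breadth-first traversal
            v = queue.popleft()
            order.append(v)
            # Enqueue unvisited neighbours in order of increasing degree.
            for w in sorted(adj[v] - visited, key=lambda u: (degree[u], u)):
                visited.add(w)
                queue.append(w)
    return order[::-1]                                # the "reverse" in RCM


def bandwidth(adj, order):
    """Largest |position(u) - position(v)| over all edges (u, v)."""
    pos = {v: i for i, v in enumerate(order)}
    return max((abs(pos[u] - pos[v]) for u in adj for v in adj[u]), default=0)


# A path 0-5-1-4-2-3 with scrambled labels: the identity ordering is poor.
path = {0: {5}, 5: {0, 1}, 1: {5, 4}, 4: {1, 2}, 2: {4, 3}, 3: {2}}
order = reverse_cuthill_mckee(path)
```

    Lower bandwidth means nonzeros cluster near the diagonal of the sparse matrix, which is the data-locality benefit the abstract refers to.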

  1. Hierarchical resilience with lightweight threads.

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Wheeler, Kyle Bruce

    2011-10-01

    This paper proposes a methodology for providing robustness and resilience for a highly threaded distributed- and shared-memory environment based on well-defined inputs and outputs to lightweight tasks. These inputs and outputs form a failure 'barrier', allowing tasks to be restarted or duplicated as necessary. These barriers must be expanded based on task behavior, such as communication between tasks, but do not prohibit any given behavior. One trend in high-performance computing codes appears to be a move toward self-contained functions that mimic functional programming. Software designers are moving toward a model of software design in which their core functions are specified in side-effect-free or low-side-effect ways, wherein the inputs and outputs of the functions are well-defined. This makes it possible to copy the inputs to wherever they need to be (whether that is the other side of the PCI bus or the other side of the network), do work on that input using local memory, and then copy the outputs back (as needed). This design pattern is popular among new distributed threading environment designs. Such designs include the Barcelona STARS system, distributed OpenMP systems, the Habanero-C and Habanero-Java systems from Vivek Sarkar at Rice University, the HPX/ParalleX model from LSU, as well as our own Scalable Parallel Runtime effort (SPR) and the Trilinos stateless kernels. This design pattern is also shared by CUDA and several OpenMP extensions for GPU-type accelerators (e.g. the PGI OpenMP extensions).
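
    The restart pattern described here (copy a task's inputs, compute on the local copy, hand outputs back only on success) can be sketched in a few lines. This is a minimal illustration assuming a Python function as the task and an exception as the failure signal; `run_resilient`, `TaskFailed` and `make_flaky` are hypothetical names, not part of SPR, HPX or any other runtime named above.

```python
import copy


class TaskFailed(Exception):
    """Signals that one attempt of a task was lost (e.g. a failed node)."""


def run_resilient(task, inputs, max_attempts=3):
    """Run `task` on a private copy of `inputs`, restarting on failure.

    Because the task only touches its own copy, the saved inputs act as the
    failure barrier: a restart simply begins again from them.
    Returns (output, number of attempts used).
    """
    last_error = None
    for attempt in range(1, max_attempts + 1):
        local = copy.deepcopy(inputs)        # copy inputs to where the work runs
        try:
            return task(local), attempt      # output is handed back only on success
        except TaskFailed as err:
            last_error = err                 # originals untouched, so retry is safe
    raise RuntimeError(f"task failed after {max_attempts} attempts") from last_error


def make_flaky(failures):
    """Hypothetical demo task that fails its first `failures` attempts."""
    remaining = [failures]

    def scale_and_sum(values):
        values[0] *= 10                      # mutates only the private copy
        if remaining[0] > 0:
            remaining[0] -= 1
            raise TaskFailed("simulated fault")
        return sum(values)

    return scale_and_sum


data = [1, 2, 3]
result, attempts = run_resilient(make_flaky(2), data)
```

    Even though each failed attempt modified its working copy before failing, `data` is unchanged, so every restart sees the same inputs; the same property would allow a task to be duplicated on another resource.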

  2. New York State Educational Programs That Work. Sharing Successful Programs, 1990 Edition.

    ERIC Educational Resources Information Center

    New York State Education Dept., Albany.

    Sharing Successful Programs (SSP) is a national dissemination process for validating, sharing, and implementing successful educational programs. It offers effective strategies for educational improvement by sharing validated programs and provides a cost-effective way for school districts to duplicate validated programs in accordance with their…

  3. Genetic and environmental contributions to the associations between intraindividual variability in reaction time and cognitive function.

    PubMed

    Finkel, Deborah; Pedersen, Nancy L

    2014-01-01

    Intraindividual variability (IIV) in reaction time has been related to cognitive decline, but questions remain about the nature of this relationship. Mean and range in movement and decision time for simple reaction time were available from 241 individuals aged 51-86 years at the fifth testing wave of the Swedish Adoption/Twin Study of Aging. Cognitive performance on four factors was also available: verbal, spatial, memory, and speed. Analyses indicated that range in reaction time could be used as an indicator of IIV. Heritability estimates were 35% for mean reaction time and 20% for range in reaction time. Multivariate analysis indicated that the genetic variance on the memory, speed, and spatial factors is shared with genetic variance for mean or range in reaction time. IIV shares significant genetic variance with fluid ability in late adulthood, over and above the genetic variance shared with mean reaction time.

  4. A Study of Shared-Memory Mutual Exclusion Protocols Using CADP

    NASA Astrophysics Data System (ADS)

    Mateescu, Radu; Serwe, Wendelin

    Mutual exclusion protocols are an essential building block of concurrent systems: indeed, such a protocol is required whenever a shared resource has to be protected against concurrent non-atomic accesses. Hence, many variants of mutual exclusion protocols exist in the shared-memory setting, such as Peterson's or Dekker's well-known protocols. Although the functional correctness of these protocols has been studied extensively, relatively little attention has been paid to their non-functional aspects, such as their performance in the long run. In this paper, we report on experiments with the performance evaluation of mutual exclusion protocols using Interactive Markov Chains. Steady-state analysis provides an additional criterion for comparing protocols, which complements the verification of their functional properties. We also carefully re-examined the functional properties, whose accurate formulation as temporal logic formulas in the action-based setting turns out to be quite involved.
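
    As a small illustration of checking the functional property at stake, mutual exclusion, the sketch below exhaustively explores every interleaving of Peterson's two-process protocol and of a naive test-then-set protocol. It is a toy explicit-state search in Python, not CADP, and it checks only this safety property, not the steady-state performance measures studied in the paper. Each numbered step is assumed to be atomic, and Peterson's wait condition is evaluated as a single step for simplicity; all names are hypothetical.

```python
from collections import deque

INIT = ((0, 0), (False, False), 0)   # (program counters, flags, turn)


def peterson_step(state, i):
    """Successor of `state` when process i executes its next atomic step."""
    pc, flag, turn = list(state[0]), list(state[1]), state[2]
    j = 1 - i
    if pc[i] == 0:                       # flag[i] := true
        flag[i] = True
        pc[i] = 1
    elif pc[i] == 1:                     # turn := j (defer to the other process)
        turn = j
        pc[i] = 2
    elif pc[i] == 2:                     # wait while flag[j] and turn == j
        if not (flag[j] and turn == j):
            pc[i] = 3
    elif pc[i] == 3:                     # critical section
        pc[i] = 4
    else:                                # flag[i] := false, back to start
        flag[i] = False
        pc[i] = 0
    return (tuple(pc), tuple(flag), turn)


def naive_step(state, i):
    """Broken protocol: test the other flag, then set our own in a separate step."""
    pc, flag, turn = list(state[0]), list(state[1]), state[2]
    j = 1 - i
    if pc[i] == 0:                       # wait until flag[j] is false
        if not flag[j]:
            pc[i] = 1
    elif pc[i] == 1:                     # flag[i] := true (too late)
        flag[i] = True
        pc[i] = 2
    elif pc[i] == 2:                     # critical section
        pc[i] = 3
    else:                                # flag[i] := false, back to start
        flag[i] = False
        pc[i] = 0
    return (tuple(pc), tuple(flag), turn)


def explore(step, critical_pc, init=INIT):
    """Breadth-first search over all interleavings of the two processes.

    Returns (number of states seen, first state with both processes in the
    critical section, or None if mutual exclusion holds everywhere).
    """
    seen = {init}
    queue = deque([init])
    while queue:
        state = queue.popleft()
        if state[0] == (critical_pc, critical_pc):
            return len(seen), state
        for i in (0, 1):
            succ = step(state, i)
            if succ not in seen:
                seen.add(succ)
                queue.append(succ)
    return len(seen), None


peterson_states, peterson_bad = explore(peterson_step, critical_pc=3)
naive_states, naive_bad = explore(naive_step, critical_pc=2)
```

    The naive protocol fails because both processes can observe the other's flag as false before either sets its own; Peterson's `turn` variable breaks that tie.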

  5. Handling debugger breakpoints in a shared instruction system

    DOEpatents

    Gooding, Thomas Michael; Shok, Richard Michael

    2014-01-21

    A debugger debugs processes that execute shared instructions so that a breakpoint set for one process will not cause a breakpoint to occur in the other processes. A breakpoint is set by recording the original instruction at the desired location and writing a trap instruction to the shared instructions at that location. When a process encounters the breakpoint, the process passes control to the debugger for breakpoint processing if the breakpoint was set at that location for that process. If the trap was not set at that location for that process, the cacheline containing the trap is copied to a small scratchpad memory, and the virtual memory mappings are changed to translate the virtual address of the cacheline to the scratchpad. The original instruction is then written to replace the trap instruction in the scratchpad, so that the process can execute the instructions in the scratchpad, thereby avoiding the trap instruction.
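
    The breakpoint scheme in this abstract can be simulated in a few lines, which makes the per-process behaviour easier to follow. This sketch assumes a flat list of instructions as the shared text, a fixed cacheline size, and a dict of per-process scratchpad copies standing in for the remapped virtual addresses. The class, method names and constants are hypothetical and do not come from the patent.

```python
LINE_SIZE = 4      # instructions per cacheline (illustrative value)
TRAP = "TRAP"      # placeholder for the trap instruction


class SharedTextDebugger:
    """Toy model of breakpoints in instruction memory shared by several processes."""

    def __init__(self, text):
        self.text = list(text)      # shared instructions, one entry per address
        self.breakpoints = {}       # addr -> (owning pid, original instruction)
        self.scratch = {}           # (pid, line) -> that process's private line copy

    def set_breakpoint(self, pid, addr):
        if addr in self.breakpoints:
            raise ValueError("breakpoint already set at this address")
        # Record the original instruction, then write a trap into the shared text.
        self.breakpoints[addr] = (pid, self.text[addr])
        self.text[addr] = TRAP
        # If the owner already reads this line from a scratchpad, trap there too.
        private = self.scratch.get((pid, addr // LINE_SIZE))
        if private is not None:
            private[addr % LINE_SIZE] = TRAP

    def fetch(self, pid, addr):
        """Return the instruction process `pid` executes at `addr`, or "BREAK"."""
        line = addr // LINE_SIZE
        private = self.scratch.get((pid, line))
        insn = self.text[addr] if private is None else private[addr % LINE_SIZE]
        if insn != TRAP:
            return insn
        owner, original = self.breakpoints[addr]
        if owner == pid:
            return "BREAK"                                # hand control to the debugger
        if private is None:
            start = line * LINE_SIZE
            private = self.text[start:start + LINE_SIZE]  # copy line to scratchpad
            self.scratch[(pid, line)] = private           # remap the line for this pid
        private[addr % LINE_SIZE] = original              # replace the trap in the copy
        return original


dbg = SharedTextDebugger(["i%d" % a for a in range(8)])
dbg.set_breakpoint(pid=1, addr=5)
```

    Only the owning process stops; any other process that reaches the trap gets a private, trap-free copy of that one cacheline, while the shared text keeps the trap for the owner.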

  6. An Army dentist in the combat zone during WWII.

    PubMed

    Orden, C Q

    2001-11-01

    It is 60 years since the bombing of Pearl Harbor and the outbreak of World War II for the United States. Some of the men and women who served in the armed forces at the time are willing to share some of their reminiscences with those of us who could not serve for one reason or another or who may not even have been born at the time. One of the dentists who is willing to share some of his memories is Dr. Charles Q. Orden of New York. Unless these people share their memories, much history will be lost forever in the coming years, and we will all be poorer for it. We sincerely thank Dr. Orden for his offer of information and for allowing us to reproduce Fig. 1, in which he is seen as the dentist using a field dental chair and a foot-powered drill, with a black dental corpsman as his assistant.

  7. Memory Systems Do Not Divide on Consciousness: Reinterpreting Memory in Terms of Activation and Binding

    PubMed Central

    Reder, Lynne M.; Park, Heekyeong; Kieffaber, Paul D.

    2009-01-01

    There is a popular hypothesis that performance on implicit and explicit memory tasks reflects 2 distinct memory systems. Explicit memory is said to store those experiences that can be consciously recollected, and implicit memory is said to store experiences and affect subsequent behavior but to be unavailable to conscious awareness. Although this division based on awareness is a useful taxonomy for memory tasks, the authors review the evidence that the unconscious character of implicit memory does not necessitate that it be treated as a separate system of human memory. They also argue that some implicit and explicit memory tasks share the same memory representations and that the important distinction is whether the task (implicit or explicit) requires the formation of a new association. The authors review and critique dissociations from the behavioral, amnesia, and neuroimaging literatures that have been advanced in support of separate explicit and implicit memory systems by highlighting contradictory evidence and by illustrating how the data can be accounted for using a simple computational memory model that assumes the same memory representation for those disparate tasks. PMID:19210052

  8. Memories as Useful Outcomes of Residential Outdoor Environmental Education

    ERIC Educational Resources Information Center

    Liddicoat, Kendra R.; Krasny, Marianne E.

    2014-01-01

    Residential outdoor environmental education (ROEE) programs for youth have been shown to yield lasting autobiographical episodic memories. This article explores how past program participants have used such memories, and draws on the memory psychology literature to offer a new perspective on the long-term impacts of environmental education.…

  9. Distinct and shared cognitive functions mediate event- and time-based prospective memory impairment in normal ageing

    PubMed Central

    Gonneaud, Julie; Kalpouzos, Grégoria; Bon, Laetitia; Viader, Fausto; Eustache, Francis; Desgranges, Béatrice

    2011-01-01

    Prospective memory (PM) is the ability to remember to perform an action at a specific point in the future. Regarded as multidimensional, PM involves several cognitive functions that are known to be impaired in normal aging. In the present study, we set out to investigate the cognitive correlates of PM impairment in normal aging. Manipulating cognitive load, we assessed event- and time-based PM, as well as several cognitive functions, including executive functions, working memory and retrospective episodic memory, in healthy subjects spanning the entire adult lifespan. We found that normal aging was characterized by PM decline in all conditions and that event-based PM was more sensitive to the effects of aging than time-based PM. Whatever the conditions, PM was linked to inhibition and processing speed. However, while event-based PM was mainly mediated by binding and retrospective memory processes, time-based PM was mainly related to inhibition. The only distinction between high- and low-load PM cognitive correlates lies in an additional, but marginal, correlation between updating and the high-load PM condition. The association of distinct cognitive functions, as well as shared mechanisms, with event- and time-based PM confirms that each type of PM relies on a different set of processes. PMID:21678154

  10. Neuropsychological characteristics of child and adolescent offspring of patients with schizophrenia or bipolar disorder.

    PubMed

    de la Serna, Elena; Sugranyes, Gisela; Sanchez-Gistau, Vanessa; Rodriguez-Toscano, Elisa; Baeza, Immaculada; Vila, Montserrat; Romero, Soledad; Sanchez-Gutierrez, Teresa; Penzol, Mª José; Moreno, Dolores; Castro-Fornieles, Josefina

    2017-05-01

    Schizophrenia (SZ) and bipolar disorder (BD) are considered neurobiological disorders which share some clinical, cognitive and neuroimaging characteristics. Studying child and adolescent offspring of patients diagnosed with bipolar disorder (BDoff) or schizophrenia (SZoff) is regarded as a reliable method for investigating early alterations and vulnerability factors for these disorders. This study compares the neuropsychological characteristics of SZoff, BDoff and a community control offspring group (CC) with the aim of examining shared and differential cognitive characteristics among groups. 41 SZoff, 90 BDoff and 107 CC were recruited. They were all assessed with a complete neuropsychological battery which included intelligence quotient, working memory (WM), processing speed, verbal memory and learning, visual memory, executive functions and sustained attention. SZoff and BDoff showed worse performance in some cognitive areas compared with CC. Some of these difficulties (visual memory) were common to both offspring groups, whereas others, such as verbal learning and WM in SZoff or the processing speed index (PSI) in BDoff, were group-specific. The cognitive difficulties in visual memory shown by both the SZoff and BDoff groups might point to a common endophenotype in the two disorders. Difficulties in other cognitive functions would be specific depending on the family diagnosis. Copyright © 2016 Elsevier B.V. All rights reserved.

  11. Working memory costs of task switching.

    PubMed

    Liefooghe, Baptist; Barrouillet, Pierre; Vandierendonck, André; Camos, Valérie

    2008-05-01

    Although many accounts of task switching emphasize the importance of working memory as a substantial source of the switch cost, there is a lack of evidence demonstrating that task switching actually places additional demands on working memory. The present study addressed this issue by implementing task switching in continuous complex span tasks with strictly controlled time parameters. A series of 4 experiments demonstrate that recall performance decreased as a function of the number of task switches and that the concurrent load of item maintenance had no influence on task switching. These results indicate that task switching induces a cost on working memory functioning. Implications for theories of task switching, working memory, and resource sharing are addressed.

  12. Method of up-front load balancing for local memory parallel processors

    NASA Technical Reports Server (NTRS)

    Baffes, Paul Thomas (Inventor)

    1990-01-01

    In a parallel processing computer system with multiple processing units and shared memory, a method is disclosed for uniformly balancing the aggregate computational load in, and utilizing minimal memory by, a network having identical computations to be executed at each connection therein. Read-only and read-write memory are subdivided into a plurality of process sets, which function like artificial processing units. Said plurality of process sets is iteratively merged and reduced to the number of processing units without exceeding the balance load. Said merger is based upon the value of a partition threshold, which is a measure of the memory utilization. The turnaround time and memory savings of the instant method are functions of the number of processing units available and the number of partitions into which the memory is subdivided. Typical results of the preferred embodiment yielded memory savings of sixty to seventy-five percent.

  13. Hippocampal and ventral medial prefrontal activation during retrieval-mediated learning supports novel inference.

    PubMed

    Zeithamova, Dagmar; Dominick, April L; Preston, Alison R

    2012-07-12

    Memory enables flexible use of past experience to inform new behaviors. Although leading theories hypothesize that this fundamental flexibility results from the formation of integrated memory networks relating multiple experiences, the neural mechanisms that support memory integration are not well understood. Here, we demonstrate that retrieval-mediated learning, whereby prior event details are reinstated during encoding of related experiences, supports participants' ability to infer relationships between distinct events that share content. Furthermore, we show that activation changes in a functionally coupled hippocampal and ventral medial prefrontal cortical circuit track the formation of integrated memories and successful inferential memory performance. These findings characterize the respective roles of these regions in retrieval-mediated learning processes that support relational memory network formation and inferential memory in the human brain. More broadly, these data reveal fundamental mechanisms through which memory representations are constructed into prospectively useful formats. Copyright © 2012 Elsevier Inc. All rights reserved.

  14. Hippocampal and ventral medial prefrontal activation during retrieval-mediated learning supports novel inference

    PubMed Central

    Zeithamova, Dagmar; Dominick, April L.; Preston, Alison R.

    2012-01-01

    Memory enables flexible use of past experience to inform new behaviors. Though leading theories hypothesize that this fundamental flexibility results from the formation of integrated memory networks relating multiple experiences, the neural mechanisms that support memory integration are not well understood. Here, we demonstrate that retrieval-mediated learning, whereby prior event details are reinstated during encoding of related experiences, supports participants’ ability to infer relationships between distinct events that share content. Furthermore, we show that activation changes in a functionally coupled hippocampal and ventral medial prefrontal cortical circuit track the formation of integrated memories and successful inferential memory performance. These findings characterize the respective roles of these regions in retrieval-mediated learning processes that support relational memory network formation and inferential memory in the human brain. More broadly, these data reveal fundamental mechanisms through which memory representations are constructed into prospectively useful formats. PMID:22794270

  15. Jim Thomas: A Collection of Memories

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Wong, Pak C.

    Jim Thomas, a guest editor and a long-time associate editor of Information Visualization (IVS), died in Richland, WA, on August 6, 2010 due to complications from a brain tumor. His friends and colleagues from around the world have since expressed their sadness and paid tribute to a visionary scientist in multiple public forums. For those who didn't get the chance to know Jim, I share a collection of my own memories of Jim Thomas and memories from some of his colleagues.

  16. Effects of motor congruence on visual working memory.

    PubMed

    Quak, Michel; Pecher, Diane; Zeelenberg, Rene

    2014-10-01

    Grounded-cognition theories suggest that memory shares processing resources with perception and action. The motor system could be used to help memorize visual objects. In two experiments, we tested the hypothesis that people use motor affordances to maintain object representations in working memory. Participants performed a working memory task on photographs of manipulable and nonmanipulable objects. The manipulable objects were objects that required either a precision grip (i.e., small items) or a power grip (i.e., large items) to use. A concurrent motor task that could be congruent or incongruent with the manipulable objects caused no difference in working memory performance relative to nonmanipulable objects. Moreover, the precision- or power-grip motor task did not affect memory performance on small and large items differently. These findings suggest that the motor system plays no part in visual working memory.

  17. Many Roads Lead to Recognition: Electrophysiological Correlates of Familiarity Derived from Short-Term Masked Repetition Priming

    ERIC Educational Resources Information Center

    Lucas, Heather D.; Taylor, Jason R.; Henson, Richard N.; Paller, Ken A.

    2012-01-01

    The neural mechanisms that underlie familiarity memory have been extensively investigated, but a consensus understanding remains elusive. Behavioral evidence suggests that familiarity sometimes shares sources with instances of implicit memory known as priming, in that the same increases in processing fluency that give rise to priming can engender…

  18. Forward Association, Backward Association, and the False-Memory Illusion

    ERIC Educational Resources Information Center

    Brainerd, C. J.; Wright, Ron

    2005-01-01

    In the Deese-Roediger-McDermott false-memory illusion, forward associative strength (FAS) is unrelated to the strength of the illusion; this is puzzling, because high-FAS lists ought to share more semantic features with critical unpresented words than should low-FAS lists. The authors show that this null result is probably a truncated range…

  19. The Impact of Storage on Processing: How Is Information Maintained in Working Memory?

    ERIC Educational Resources Information Center

    Vergauwe, Evie; Camos, Valérie; Barrouillet, Pierre

    2014-01-01

    Working memory is typically defined as a system devoted to the simultaneous maintenance and processing of information. However, the interplay between these 2 functions is still a matter of debate in the literature, with views ranging from complete independence to complete dependence. The time-based resource-sharing model assumes that a central…

  20. Selecting Learning Tasks: Effects of Adaptation and Shared Control on Learning Efficiency and Task Involvement

    ERIC Educational Resources Information Center

    Corbalan, Gemma; Kester, Liesbeth; van Merrienboer, Jeroen J. G.

    2008-01-01

    Complex skill acquisition by performing authentic learning tasks is constrained by limited working memory capacity [Baddeley, A. D. (1992). Working memory. Science, 255, 556-559]. To prevent cognitive overload, task difficulty and support of each newly selected learning task can be adapted to the learner's competence level and perceived task…

  1. Android Protection Mechanism: A Signed Code Security Mechanism for Smartphone Applications

    DTIC Science & Technology

    2011-03-01

    status registers, exceptions, endian support, unaligned access support, synchronization primitives, the Jazelle Extension, and saturated integer...supports comprehensive non-blocking shared-memory synchronization primitives that scale for multiple-processor system designs. This is an improvement... synchronization. Memory semaphores can be loaded and altered without interruption because the load and store operations are atomic. Processor

  2. Senior Citizens' Personal Stories...Literacy through Narrative...Sharing the Richness of the Past.

    ERIC Educational Resources Information Center

    Lineberry, Colleen

    Using simple writing strategies, senior citizens at an elder camp workshop collected memories in journals. In some cases, readings were used to trigger memories. The exercise enabled students to make connections between their own life experiences and the life experiences of others. Workshops encouraging participants to tell their own stories for…

  3. Processes and Content of Narrative Identity Development in Adolescence: Gender and Well-Being

    ERIC Educational Resources Information Center

    McLean, Kate C.; Breen, Andrea V.

    2009-01-01

    The present study examined narrative identity in adolescence (14-18 years) in terms of narrative content and processes of identity development. Age- and gender-related differences in narrative patterns in turning point memories and gender differences in the content and functions for sharing those memories were examined, as was the relationship…

  4. The Performance of a Lifetime

    ERIC Educational Resources Information Center

    Burdette, Kimberly

    2007-01-01

    In this article, the author recalls and shares the first half of her college journey. Her memories do not play back to her in bursts of sounds or colors; friends or lovers; feelings, touches, tastes, or ideas. They play, rather, as silent images of herself that flicker disjointedly across her mind, the lens of her memory having recorded her…

  5. Schools of the Past: A Treasury of Photographs. Fastback 80.

    ERIC Educational Resources Information Center

    Davis, O. L., Jr.

    The experience of schooling in America is recalled through a memory-sharing essay and an album of photographs. The intent of the article is to prompt readers to remember their personal schooling experiences and relate them to the larger framework of national memories. The essay, focusing on schools at the turn of the 20th century, discusses…

  6. Shared Etiology of Phonological Memory and Vocabulary Deficits in School-Age Children

    ERIC Educational Resources Information Center

    Peterson, Robin L.; Pennington, Bruce F.; Samuelsson, Stefan; Byrne, Brian; Olson, Richard K.

    2013-01-01

    Purpose: The goal of this study was to investigate the etiologic basis for the association between deficits in phonological memory (PM) and vocabulary in school-age children. Method: Children with deficits in PM or vocabulary were identified within the International Longitudinal Twin Study (ILTS; Samuelsson et al., 2005). The ILTS includes 1,045…

  7. The CA3 Network as a Memory Store for Spatial Representations

    ERIC Educational Resources Information Center

    Papp, Gergely; Witter, Menno P.; Treves, Alessandro

    2007-01-01

    Comparative neuroanatomy suggests that the CA3 region of the mammalian hippocampus is directly homologous with the medio-dorsal pallium in birds and reptiles, with which it largely shares the basic organization of primitive cortex. Autoassociative memory models, which are generically applicable to cortical networks, then help assess how well CA3…

  8. Memories Are Made of This

    ERIC Educational Resources Information Center

    Chang, Christine

    2010-01-01

    In this article, the author shares her memories of Sally Smith, the founder of The Lab School of Washington, where she works as the director of the Occupational Therapy. When the author first met Smith, Smith asked her what brought her to The Lab School at that point in her career. She told Smith that her background was rather eclectic, since she…

  9. They're Why We're Here

    ERIC Educational Resources Information Center

    Razook, Nim

    2009-01-01

    The author began teaching at the University of Oklahoma in the late 1970s. In this article, the author shares two memories of those times on campus. The first was looking out his office window and seeing Iranian students marching on campus, shouting, "The Shah is a Fascist Pig." The second memory provoked this paper. It made the author…

  10. Rearview Memories

    ERIC Educational Resources Information Center

    Gross, Gwen E.

    2008-01-01

    In this article, the author shares her experience when she was still a student until she became a superintendent. In her 17th year in the superintendency, the author finds the joys of her work all around her, grateful to be bestowed with the gift of leadership. She shares with colleagues a few especially meaningful moments from her professional…

  11. Programming and memory dynamics of innate leukocytes during tissue homeostasis and inflammation.

    PubMed

    Lee, Christina; Geng, Shuo; Zhang, Yao; Rahtes, Allison; Li, Liwu

    2017-09-01

    The field of innate immunity is witnessing a paradigm shift regarding "memory" and "programming" dynamics. Past studies of innate leukocytes characterized them as first responders to danger signals with no memory. However, recent findings suggest that innate leukocytes, such as monocytes and neutrophils, are capable of "memorizing" not only the chemical nature but also the history and dosages of external stimulants. As a consequence, innate leukocytes can be dynamically programmed or reprogrammed into complex inflammatory memory states. Key examples of innate leukocyte memory dynamics include the development of primed and tolerant monocytes when "programmed" with a variety of inflammatory stimulants at varying signal strengths. The development of innate leukocyte memory may have far-reaching translational implications, as programmed innate leukocytes may affect the pathogenesis of both acute and chronic inflammatory diseases. This review intends to critically discuss some of the recent studies that address this emerging concept and its implication in the pathogenesis of inflammatory diseases. © Society for Leukocyte Biology.

  12. Honoring our donors: a survey of memorial ceremonies in United States anatomy programs.

    PubMed

    Jones, Trahern W; Lachman, Nirusha; Pawlina, Wojciech

    2014-01-01

    Many anatomy programs that incorporate dissection of donated human bodies hold memorial ceremonies of gratitude towards body donors. The content of these ceremonies may include learners' reflections on mortality, respect, altruism, and personal growth told through various humanities modalities. The task of planning is usually student- and faculty-led with participation from other health care students. Objective information on current memorial ceremonies for body donors in anatomy programs in the United States appears to be lacking. The number of programs in the United States that currently plan these memorial ceremonies and information on trends in programs undertaking such ceremonies remain unknown. Gross anatomy program directors throughout the United States were contacted and asked to respond to a voluntary questionnaire on memorial ceremonies held at their institution. The results (response rate 68.2%) indicated that a majority of human anatomy programs (95.5%) hold memorial ceremonies. These ceremonies are, for the most part, student-driven and nondenominational or secular in nature. Participants heavily rely upon speech, music, poetry, and written essays, with a small inclusion of other humanities modalities, such as dance or visual art, to explore a variety of themes during these ceremonies. © 2013 American Association of Anatomists.

  13. A numerical differentiation library exploiting parallel architectures

    NASA Astrophysics Data System (ADS)

    Voglis, C.; Hadjidoukas, P. E.; Lagaris, I. E.; Papageorgiou, D. G.

    2009-08-01

    We present a software library for numerically estimating first and second order partial derivatives of a function by finite differencing. Various truncation schemes are offered, resulting in corresponding formulas that are accurate to order O(h), O(h^2), and O(h^4), h being the differencing step. The derivatives are calculated via forward, backward and central differences. Care has been taken that only feasible points are used in the case where bound constraints are imposed on the variables. The Hessian may be approximated either from function or from gradient values. There are three versions of the software: a sequential version, an OpenMP version for shared memory architectures and an MPI version for distributed systems (clusters). The parallel versions exploit the multiprocessing capability offered by computer clusters as well as modern multi-core systems, and, due to the independent character of the derivative computation, the speedup scales almost linearly with the number of available processors/cores.

    Program summary
    Program title: NDL (Numerical Differentiation Library)
    Catalogue identifier: AEDG_v1_0
    Program summary URL: http://cpc.cs.qub.ac.uk/summaries/AEDG_v1_0.html
    Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland
    Licensing provisions: Standard CPC licence, http://cpc.cs.qub.ac.uk/licence/licence.html
    No. of lines in distributed program, including test data, etc.: 73 030
    No. of bytes in distributed program, including test data, etc.: 630 876
    Distribution format: tar.gz
    Programming language: ANSI FORTRAN-77, ANSI C, MPI, OPENMP
    Computer: Distributed systems (clusters), shared memory systems
    Operating system: Linux, Solaris
    Has the code been vectorised or parallelized?: Yes
    RAM: The library uses O(N) internal storage, N being the dimension of the problem
    Classification: 4.9, 4.14, 6.5
    Nature of problem: The numerical estimation of derivatives at several accuracy levels is a common requirement in many computational tasks, such as optimization, solution of nonlinear systems, etc. The parallel implementation that exploits systems with multiple CPUs is very important for large scale and computationally expensive problems.
    Solution method: Finite differencing is used with a carefully chosen step that minimizes the sum of the truncation and round-off errors. The parallel versions employ both OpenMP and MPI libraries.
    Restrictions: The library uses only double precision arithmetic.
    Unusual features: The software takes into account bound constraints, in the sense that only feasible points are used to evaluate the derivatives, and, given the level of the desired accuracy, the proper formula is automatically employed.
    Running time: Running time depends on the function's complexity. The test run took 15 ms for the serial distribution, 0.6 s for the OpenMP and 4.2 s for the MPI parallel distribution on 2 processors.
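
    The three accuracy levels quoted above can be illustrated with the textbook first-derivative formulas: forward difference (O(h)), central difference (O(h^2)) and the five-point central stencil (O(h^4)). The minimal Python sketch below is not NDL's FORTRAN-77/C code; the function names, the default step h=1e-5 and the thread-pool layout are assumptions made for illustration. It shows why the parallel versions scale well: each partial derivative is an independent task. CPython's global interpreter lock limits real speedup for pure-Python objective functions, whereas OpenMP or MPI would spread the same independent tasks across cores or nodes.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def forward_diff(f, x, h):
    """First derivative by forward difference; truncation error O(h)."""
    return (f(x + h) - f(x)) / h


def central_diff(f, x, h):
    """First derivative by central difference; truncation error O(h^2)."""
    return (f(x + h) - f(x - h)) / (2 * h)


def five_point_diff(f, x, h):
    """First derivative by the five-point central stencil; truncation error O(h^4)."""
    return (-f(x + 2 * h) + 8 * f(x + h) - 8 * f(x - h) + f(x - 2 * h)) / (12 * h)


def gradient(func, x, h=1e-5, rule=central_diff, workers=4):
    """Estimate the gradient of func: R^n -> R at the point x (a list of floats).

    Partial derivatives are mutually independent, so the n components are
    submitted to a thread pool as separate tasks.
    """
    def partial(i):
        def along_axis(t):
            y = list(x)  # private copy, so tasks never share a mutable point
            y[i] = t
            return func(y)
        return rule(along_axis, x[i], h)

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(partial, range(len(x))))


if __name__ == "__main__":
    def rosenbrock(v):
        return (1 - v[0]) ** 2 + 100 * (v[1] - v[0] ** 2) ** 2

    # The analytic gradient at (1, 2) is (-400, 200).
    print(gradient(rosenbrock, [1.0, 2.0]))
    print(five_point_diff(math.sin, 0.0, 1e-2))  # close to cos(0) = 1
```

    The printed gradient should be close to (-400, 200); swapping `rule=five_point_diff` trades two extra function evaluations per component for a higher-order truncation error.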

  14. The iconic memory skills of brain injury survivors and non-brain injured controls after visual scanning training.

    PubMed

    McClure, J T; Browning, R T; Vantrease, C M; Bittle, S T

    1994-01-01

    Previous research suggests that traumatic brain injury (TBI) results in impairment of iconic memory abilities. This has serious implications for brain injury rehabilitation. Most cognitive rehabilitation programs do not include iconic memory training; instead, they commonly focus on attention and concentration skills, memory skills, and visual scanning skills. This study compared the iconic memory skills of brain-injury survivors and control subjects who all reached criterion levels of visual scanning skills. This required prior training for the brain-injury survivors with popular visual scanning programs until they could scan visually with response time and accuracy within normal limits. Control subjects required only minimal training to reach the normal-limits criteria. This comparison allows visual scanning skills to be dissociated from iconic memory skills. The results are discussed in terms of their implications for cognitive rehabilitation and the relationship between visual scanning training and iconic memory skills. We would like to acknowledge the contribution of Jeffrey D. Vantrease, who wrote the software program for the Iconic Memory procedure and measurement.

  15. [Cortical potentials evoked in response to a signal to make a memory-guided saccade].

    PubMed

    Slavutskaia, M V; Moiseeva, V V; Shul'govskiĭ, V V

    2010-01-01

    The difference in parameters of visually guided and memory-guided saccades was shown. Increase in the memory-guided saccade latency as compared to that of the visually guided saccades may indicate the deceleration of saccadic programming on the basis of information extraction from the memory. The comparison of parameters and topography of evoked components N1 and P1 of the evoked potential on the signal to make a memory- or visually guided saccade suggests that the early stage of the saccade programming associated with the space information processing is performed predominantly with top-down attention mechanism before the memory-guided saccade and bottom-up mechanism before the visually guided saccade. The findings show that the increase in the latency of the memory-guided saccades is connected with decision making at the central stage of the saccade programming. We proposed that wave N2, which develops in the middle of the latent period of the memory-guided saccades, is correlated with this process. Topography and spatial dynamics of components N1, P1 and N2 testify that the memory-guided saccade programming is controlled by the frontal mediothalamic system of selective attention and left-hemispheric brain mechanisms of motor attention.

  16. Ageing-related stereotypes in memory: When the beliefs come true.

    PubMed

    Bouazzaoui, Badiâa; Follenfant, Alice; Ric, François; Fay, Séverine; Croizet, Jean-Claude; Atzeni, Thierry; Taconnat, Laurence

    2016-01-01

    Age-related stereotypes concern culturally shared beliefs about the inevitable decline of memory with age. In this study, stereotype priming and stereotype threat manipulations were used to explore the impact of age-related stereotypes on metamemory beliefs and episodic memory performance. Ninety-two older participants who reported the same perceived memory functioning were divided into two groups: a threatened group and a non-threatened (control) group. First, the threatened group was primed with an ageing stereotype questionnaire. Then, both groups were administered memory complaints and memory self-efficacy questionnaires to measure metamemory beliefs. Finally, both groups were administered the Logical Memory task to measure episodic memory; for the threatened group, the instructions were manipulated to enhance the stereotype threat. Results indicated that the threatened individuals reported more memory complaints and lower memory efficacy, and scored lower than the control group on the Logical Memory task. A multiple mediation analysis revealed that the stereotype threat effect on episodic memory performance was mediated by both memory complaints and memory self-efficacy. This study revealed that stereotype threat impacts belief in one's own memory functioning, which in turn impairs episodic memory performance.

  17. CLOCS (Computer with Low Context-Switching Time) Architecture Reference Documents

    DTIC Science & Technology

    1988-05-06

    Peculiarities: The only state inside the central processing unit (CPU) is a program status word. All data operations are memory to memory. One result of this... to the challenge "if I were to design RISC, this is how I would do it." The architecture was designed by Mark Davis and Bill Gallmeister. 1.2 ... are memory to memory. Any special devices added should be memory mapped. The program counter is even memory mapped. 1.3.1 Working storage: There is no
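
    The fragment above says that all data operations are memory to memory and that even the program counter is memory mapped. The toy interpreter below makes that concrete. Its instruction set, tuple encoding and address layout are entirely hypothetical and are not taken from the CLOCS reference documents; the point is only that the CPU keeps no registers, so every operand is a memory address and the program counter is an ordinary memory word (address 0 in this sketch).

```python
PC = 0  # hypothetical layout: the program counter lives at memory address 0


def run_program(mem, max_steps=10_000):
    """Run a toy memory-to-memory program held in the list `mem`.

    Hypothetical encoding: mem[pc] is a tuple (op, a, b, c) whose operands
    are memory addresses.
      ("add", a, b, c)   mem[c] = mem[a] + mem[b]
      ("sub", a, b, c)   mem[c] = mem[a] - mem[b]
      ("jnz", a, t, _)   if mem[a] != 0, jump to address t
      ("halt", _, _, _)  stop and return the memory image
    There are no CPU registers: even the program counter is mem[PC].
    """
    for _ in range(max_steps):
        op, a, b, c = mem[mem[PC]]
        mem[PC] += 1  # advance the memory-mapped PC before executing
        if op == "add":
            mem[c] = mem[a] + mem[b]
        elif op == "sub":
            mem[c] = mem[a] - mem[b]
        elif op == "jnz":
            if mem[a] != 0:
                mem[PC] = b
        elif op == "halt":
            return mem
        else:
            raise ValueError(f"unknown opcode {op!r}")
    raise RuntimeError("step limit exceeded")


# Multiply 7 by 6 through repeated addition.
# Addresses: 0 = PC, 1-4 = code, 10 = accumulator, 11 = x, 12 = counter, 13 = constant 1.
memory = [0] * 14
memory[PC] = 1
memory[1] = ("add", 10, 11, 10)  # acc = acc + x
memory[2] = ("sub", 12, 13, 12)  # counter = counter - 1
memory[3] = ("jnz", 12, 1, 0)    # loop while counter != 0
memory[4] = ("halt", 0, 0, 0)
memory[11], memory[12], memory[13] = 7, 6, 1
print(run_program(memory)[10])  # 42
```

    Because the program counter is ordinary memory in this model, an `add` whose destination is address 0 would act as a computed jump without any dedicated branch instruction.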

  18. Chemically programmed ink-jet printed resistive WORM memory array and readout circuit

    NASA Astrophysics Data System (ADS)

    Andersson, H.; Manuilskiy, A.; Sidén, J.; Gao, J.; Hummelgård, M.; Kunninmel, G. V.; Nilsson, H.-E.

    2014-09-01

    In this paper, an ink-jet printed write once read many (WORM) resistive memory fabricated on a paper substrate is presented. The memory elements are programmed to different resistance states by printing triethylene glycol monoethyl ether on the substrate before the actual memory element is printed using silver nanoparticle ink. The resistance can thus be set to a broad range of values without changing the geometry of the elements. A memory card consisting of 16 elements is manufactured, and each element is programmed to one of four defined logic levels, providing a total of 4 294 967 296 (4^16) unique possible combinations. Using a readout circuit originally developed for resistive sensors to avoid crosstalk between elements, a memory card reader is manufactured that is able to read the values of the memory card and transfer the data to a PC. Such printed memory cards can be used in various applications.
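
    The capacity figure follows from 16 independent elements with four levels each: 4^16 = 2^32 = 4 294 967 296, so one card holds exactly 32 bits. The sketch below packs a 32-bit integer into 16 base-4 logic levels and back. The `classify` thresholds that map a measured resistance to a level are hypothetical placeholders; the paper's actual resistance ranges are not given in this abstract.

```python
N_ELEMENTS = 16  # printed resistive elements per card
N_LEVELS = 4     # programmable logic levels per element


def encode(value):
    """Split an integer in [0, 4**16) into 16 base-4 levels, first element most significant."""
    if not 0 <= value < N_LEVELS ** N_ELEMENTS:
        raise ValueError("value does not fit on a 16-element, 4-level card")
    levels = []
    for _ in range(N_ELEMENTS):
        value, digit = divmod(value, N_LEVELS)
        levels.append(digit)
    return levels[::-1]


def decode(levels):
    """Inverse of encode: read the 16 levels back as one base-4 number."""
    value = 0
    for digit in levels:
        value = value * N_LEVELS + digit
    return value


def classify(resistance_ohm, thresholds=(100.0, 1_000.0, 10_000.0)):
    """Map a measured resistance to a level 0-3 by counting thresholds reached.

    The threshold values are hypothetical placeholders, not from the paper.
    """
    return sum(resistance_ohm >= t for t in thresholds)


print(N_LEVELS ** N_ELEMENTS)  # 4294967296
readings = [50.0, 500.0, 5_000.0, 50_000.0] * 4  # one simulated reading per element
print(decode([classify(r) for r in readings]))
```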

  19. Efficient ICCG on a shared memory multiprocessor

    NASA Technical Reports Server (NTRS)

    Hammond, Steven W.; Schreiber, Robert

    1989-01-01

    Different approaches are discussed for exploiting parallelism in the ICCG (Incomplete Cholesky Conjugate Gradient) method for solving large sparse symmetric positive definite systems of equations on a shared memory parallel computer. Techniques for efficiently solving triangular systems and computing sparse matrix-vector products are explored. Three methods for scheduling the tasks in solving triangular systems are implemented on the Sequent Balance 21000. Sample problems that are representative of a large class of problems solved using iterative methods are used. We show that a static analysis to determine data dependences in the triangular solve can greatly improve its parallel efficiency. We also show that ignoring symmetry and storing the whole matrix can reduce solution time substantially.
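
    The abstract does not spell out its three scheduling methods. One standard way to turn a static dependence analysis into parallel work in a sparse triangular solve is level scheduling, sketched below in Python as an assumption rather than as the paper's method: each row gets a level one greater than the deepest row it depends on, and all rows in a level can be solved concurrently. The list-of-pairs storage is also an assumption, chosen for readability over a real compressed sparse row layout.

```python
from concurrent.futures import ThreadPoolExecutor


def levels(lower):
    """Group rows of a sparse lower-triangular matrix into dependency levels.

    lower[i] lists (j, a_ij) pairs for the strictly-lower entries of row i.
    Row i depends on every row j it references, so its level is one more
    than the deepest such j; rows that share a level are independent.
    """
    lev = []
    for row in lower:
        lev.append(1 + max((lev[j] for j, _ in row), default=-1))
    buckets = [[] for _ in range(max(lev, default=-1) + 1)]
    for i, level in enumerate(lev):
        buckets[level].append(i)
    return buckets


def solve_lower(lower, diag, b, workers=4):
    """Forward substitution L x = b, one level at a time."""
    x = [0.0] * len(b)

    def solve_row(i):
        # Every x[j] read here belongs to an earlier, already finished level.
        x[i] = (b[i] - sum(a * x[j] for j, a in lower[i])) / diag[i]

    with ThreadPoolExecutor(max_workers=workers) as pool:
        for rows in levels(lower):
            list(pool.map(solve_row, rows))  # wait for the whole level
    return x


# 4x4 example: rows 0 and 1 have no off-diagonal entries, so they form level 0.
L_OFFDIAG = [[], [], [(0, 1.0)], [(1, 2.0), (2, 1.0)]]
L_DIAG = [2.0, 1.0, 1.0, 4.0]
print(levels(L_OFFDIAG))                                      # [[0, 1], [2], [3]]
print(solve_lower(L_OFFDIAG, L_DIAG, [2.0, 2.0, 4.0, 23.0]))  # [1.0, 2.0, 3.0, 4.0]
```

    The number of levels bounds the number of synchronization points, which is why a static analysis done once, before the conjugate gradient iterations, can pay off across every triangular solve.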

  20. Which neuropsychological functions predict various processing speed components in children with and without attention-deficit/hyperactivity disorder?

    PubMed

    Vadnais, Sarah A; Kibby, Michelle Y; Jagger-Rickels, Audreyana C

    2018-01-01

    We identified statistical predictors of four processing speed (PS) components in a sample of 151 children with and without attention-deficit/hyperactivity disorder (ADHD). Performance on perceptual speed was predicted by visual attention/short-term memory, whereas incidental learning/psychomotor speed was predicted by verbal working memory. Rapid naming was predictive of each PS component assessed, and inhibition predicted all but one task, suggesting a shared need to identify/retrieve stimuli rapidly and inhibit incorrect responding across PS components. Hence, we found both shared and unique predictors of perceptual, cognitive, and output speed, suggesting more specific terminology should be used in future research on PS in ADHD.

  1. Parallel k-means++ for Multiple Shared-Memory Architectures

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Mackey, Patrick S.; Lewis, Robert R.

    2016-09-22

    In recent years k-means++ has become a popular initialization technique for improved k-means clustering. To date, most of the work done to improve its performance has involved parallelizing algorithms that are only approximations of k-means++. In this paper we present a parallelization of the exact k-means++ algorithm, with a proof of its correctness. We develop implementations for three distinct shared-memory architectures: multicore CPU, high performance GPU, and the massively multithreaded Cray XMT platform. We demonstrate the scalability of the algorithm on each platform. In addition we present a visual approach for showing which platform performed k-means++ the fastest for varying data sizes.
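
    In exact k-means++, each new center is drawn with probability proportional to D(x)^2, the squared distance from x to its nearest already-chosen center. The sketch below is a minimal Python illustration, not the authors' CPU, GPU or Cray XMT code: it parallelizes only the D(x)^2 update across chunks of points and keeps the weighted draw sequential. Because the update yields the same values as a serial loop, the seeds match a serial run that uses the same random stream, which is the property an exact (rather than approximate) parallelization has to preserve. The chunk size and worker count are arbitrary assumptions.

```python
import random
from bisect import bisect_right
from concurrent.futures import ThreadPoolExecutor
from itertools import accumulate


def sq_dist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeanspp_seeds(points, k, rng=None, workers=4, chunk=1024):
    """Exact k-means++ seeding with a parallel D(x)^2 update.

    d2[i] holds the squared distance from points[i] to its nearest chosen
    center. After each new center, chunks of d2 are refreshed concurrently;
    the weighted draw of the next center stays sequential.
    """
    rng = rng or random.Random()
    n = len(points)
    centers = [points[rng.randrange(n)]]
    d2 = [float("inf")] * n

    def refresh(start):
        c = centers[-1]
        for i in range(start, min(start + chunk, n)):
            d = sq_dist(points[i], c)
            if d < d2[i]:
                d2[i] = d

    with ThreadPoolExecutor(max_workers=workers) as pool:
        while len(centers) < k:
            # Chunks cover disjoint index ranges, so no locking is needed.
            list(pool.map(refresh, range(0, n, chunk)))
            cumulative = list(accumulate(d2))
            total = cumulative[-1]
            if total == 0:  # every point already coincides with a center
                break
            r = rng.random() * total
            idx = min(bisect_right(cumulative, r), n - 1)
            centers.append(points[idx])
    return centers


pts = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0), (20.0, 0.0)]
print(kmeanspp_seeds(pts, 3, random.Random(42)))
```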

  2. A multiarchitecture parallel-processing development environment

    NASA Technical Reports Server (NTRS)

    Townsend, Scott; Blech, Richard; Cole, Gary

    1993-01-01

    A description is given of the hardware and software of a multiprocessor test bed, the second-generation Hypercluster system. The Hypercluster architecture consists of a standard hypercube distributed-memory topology, with multiprocessor shared-memory nodes. By using standard, off-the-shelf hardware, the system can be upgraded to use rapidly improving computer technology. The Hypercluster's multiarchitecture nature makes it suitable for researching parallel algorithms in computational field simulation applications (e.g., computational fluid dynamics). The dedicated test-bed environment of the Hypercluster and its custom-built software allow experiments with various parallel-processing concepts such as message passing algorithms, debugging tools, and computational 'steering'. Such research would be difficult, if not impossible, to achieve on shared, commercial systems.

  3. Shared virtual memory and generalized speedup

    NASA Technical Reports Server (NTRS)

    Sun, Xian-He; Zhu, Jianping

    1994-01-01

    Generalized speedup is defined as parallel speed over sequential speed. The generalized speedup and its relation with other existing performance metrics, such as traditional speedup, efficiency, scalability, etc., are carefully studied. In terms of the introduced asymptotic speed, it was shown that the difference between the generalized speedup and the traditional speedup lies in the definition of the efficiency of uniprocessor processing, which is a very important issue in shared virtual memory machines. A scientific application was implemented on a KSR-1 parallel computer. Experimental and theoretical results show that the generalized speedup is distinct from the traditional speedup and provides a more reasonable measurement. In the study of different speedups, various causes of superlinear speedup are also presented.
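
    A small numeric sketch contrasts the two metrics, taking speed as work per unit time. The figures are invented for illustration and are not KSR-1 measurements: suppose a problem of 8e9 operations runs in 10 s on 8 processors, but the same problem on one processor pages heavily in shared virtual memory and takes 120 s, while a smaller 1e9-operation problem that fits in local memory runs in 10 s. The traditional speedup is then superlinear (12 on 8 processors), whereas the generalized speedup, measured against the speed of a uniprocessor run unaffected by paging, is 8. Using a single small run as "sequential speed" is a simplifying assumption; the paper defines it through asymptotic speed.

```python
def speed(work, seconds):
    """Speed: work (e.g. floating-point operations) per second."""
    return work / seconds


def traditional_speedup(t_sequential, t_parallel):
    """Sequential time over parallel time for the same problem."""
    return t_sequential / t_parallel


def generalized_speedup(work_par, t_par, work_seq, t_seq):
    """Parallel speed over sequential speed; the two runs may differ in size."""
    return speed(work_par, t_par) / speed(work_seq, t_seq)


# Invented figures for illustration only (8 processors):
W_LARGE, T_PAR = 8e9, 10.0          # large problem, parallel run
T_SEQ_PAGING = 120.0                # same large problem on 1 processor, paging
W_SMALL, T_SEQ_SMALL = 1e9, 10.0    # problem that fits in one node's memory

print(traditional_speedup(T_SEQ_PAGING, T_PAR))                   # 12.0 (superlinear)
print(generalized_speedup(W_LARGE, T_PAR, W_SMALL, T_SEQ_SMALL))  # 8.0
```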

  4. Error recovery in shared memory multiprocessors using private caches

    NASA Technical Reports Server (NTRS)

    Wu, Kun-Lung; Fuchs, W. Kent; Patel, Janak H.

    1990-01-01

    The problem of recovering from processor transient faults in shared memory multiprocessor systems is examined. A user-transparent checkpointing and recovery scheme using private caches is presented. Processes can recover from errors due to faulty processors by restarting from the checkpointed computation state. Implementation techniques using checkpoint identifiers and recovery stacks are examined as a means of reducing performance degradation in processor utilization during normal execution. This cache-based checkpointing technique prevents rollback propagation, provides rapid recovery, and can be integrated into standard cache coherence protocols. An analytical model is used to estimate the relative performance of the scheme during normal execution. Extensions to take error latency into account are presented.
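
    Checkpoint identifiers and a recovery stack can be illustrated with a generic undo log. The Python class below is a simplification under stated assumptions, not the paper's protocol integrated with cache coherence: each line remembers the identifier of the checkpoint interval in which it was last written, and only the first write to a line within an interval saves the old value. This keeps the normal-execution overhead proportional to the number of distinct lines touched rather than to the number of writes.

```python
class CheckpointedMemory:
    """Undo-log checkpointing with checkpoint identifiers (generic sketch).

    line_ckpt[a] records the checkpoint interval in which line a was last
    written. The first write to a line in the current interval pushes the old
    value onto the recovery stack; later writes in the same interval do not.
    """

    def __init__(self, size):
        self.data = [0] * size
        self.line_ckpt = [-1] * size  # -1: never written since the initial state
        self.current = 0              # identifier of the current interval
        self.recovery_stack = []      # (address, old value, old identifier)

    def write(self, addr, value):
        if self.line_ckpt[addr] != self.current:
            self.recovery_stack.append((addr, self.data[addr], self.line_ckpt[addr]))
            self.line_ckpt[addr] = self.current
        self.data[addr] = value

    def checkpoint(self):
        """Commit the current state and open a new interval."""
        self.recovery_stack.clear()
        self.current += 1

    def rollback(self):
        """Restore the state saved at the most recent checkpoint."""
        while self.recovery_stack:
            addr, old_value, old_id = self.recovery_stack.pop()
            self.data[addr] = old_value
            self.line_ckpt[addr] = old_id


mem = CheckpointedMemory(4)
mem.write(0, 5)
mem.checkpoint()                 # state [5, 0, 0, 0] is now the recovery point
mem.write(0, 7)
mem.write(1, 9)
mem.write(0, 8)                  # same line, same interval: nothing new is logged
print(len(mem.recovery_stack))   # 2
mem.rollback()                   # simulated transient fault
print(mem.data)                  # [5, 0, 0, 0]
```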

  5. Reducing Interprocessor Dependence in Recoverable Distributed Shared Memory

    NASA Technical Reports Server (NTRS)

    Janssens, Bob; Fuchs, W. Kent

    1994-01-01

    Checkpointing techniques in parallel systems use dependency tracking and/or message logging to ensure that a system rolls back to a consistent state. Traditional dependency tracking in distributed shared memory (DSM) systems is expensive because of high communication frequency. In this paper we show that, if designed correctly, a DSM system only needs to consider dependencies due to the transfer of blocks of data, resulting in reduced dependency tracking overhead and reduced potential for rollback propagation. We develop an ownership timestamp scheme to tolerate the loss of block state information and develop a passive server model of execution where interactions between processors are considered atomic. With our scheme, dependencies are significantly reduced compared to the traditional message-passing model.

  6. Towards memory-aware services and browsing through lifelogging sensing.

    PubMed

    Arcega, Lorena; Font, Jaime; Cetina, Carlos

    2013-11-05

    Every day we receive a great deal of information through our senses that is lost forever, because it lacked the strength or the repetition needed to generate a lasting memory. Combining the emerging Internet of Things and lifelogging sensors, we believe it is possible to build up a Digital Memory (Dig-Mem) in order to complement the fallible memory of people. This work shows how to realize the Dig-Mem in terms of interactions, affinities, activities, goals and protocols. We also complement this Dig-Mem with memory-aware services and a Dig-Mem browser. Furthermore, we propose an RFID Tag-Sharing technique to speed up the adoption of Dig-Mem. Experimentation reveals an improvement in users' understanding of Dig-Mem as time passes, compared to natural memories, where the level of detail decreases over time.

  7. Optics Program Modified for Multithreaded Parallel Computing

    NASA Technical Reports Server (NTRS)

    Lou, John; Bedding, Dave; Basinger, Scott

    2006-01-01

    A powerful high-performance computer program for simulating and analyzing adaptive and controlled optical systems has been developed by modifying the serial version of the Modeling and Analysis for Controlled Optical Systems (MACOS) program to impart capabilities for multithreaded parallel processing on computing systems ranging from supercomputers down to Symmetric Multiprocessing (SMP) personal computers. The modifications included the incorporation of OpenMP, a portable and widely supported application programming interface that can be used to explicitly add multithreaded parallelism to an application program under a shared-memory programming model. OpenMP was applied to parallelize ray-tracing calculations, one of the major computing components in MACOS. Multithreading based on pthreads (POSIX threads, where "POSIX" denotes the Portable Operating System Interface standard for UNIX-like systems) is also used in the diffraction propagation of light in MACOS. In tests of the parallelized version of MACOS, the speedup in ray-tracing calculations was found to be linear, or proportional to the number of processors, while the speedup in diffraction calculations ranged from 50 to 60 percent, depending on the type and number of processors. The parallelized version of MACOS is portable, and, to the user, its interface is basically the same as that of the original serial version of MACOS.
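
    Ray tracing parallelizes well because each ray is traced independently, which is consistent with the linear speedup reported above. The Python sketch below uses a hypothetical paraxial model (a thin lens followed by free propagation, with made-up focal length and distance), not MACOS's optics; the thread pool stands in for an OpenMP parallel loop over the ray index. In CPython the global interpreter lock prevents real speedup for this pure-Python loop, so the sketch shows the loop structure rather than performance.

```python
from concurrent.futures import ThreadPoolExecutor


def trace_ray(ray, focal_length=100.0, distance=100.0):
    """Trace one paraxial ray (height y, angle u) through a thin lens, then
    propagate it a fixed distance. Optical parameters are hypothetical."""
    y, u = ray
    u = u - y / focal_length  # refraction at the thin lens
    y = y + distance * u      # free-space propagation
    return (y, u)


def trace_bundle(rays, workers=4):
    """Each ray is independent, so the loop needs no synchronization,
    analogous to an OpenMP parallel loop over the ray index."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(trace_ray, rays))


# Rays parallel to the axis converge at the focal plane (y close to 0).
bundle = [(h / 10.0, 0.0) for h in range(-5, 6)]
for y, u in trace_bundle(bundle):
    print(f"y = {y:+.3e}, u = {u:+.3f}")
```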

  8. MOIL-opt: Energy-Conserving Molecular Dynamics on a GPU/CPU system

    PubMed Central

    Ruymgaart, A. Peter; Cardenas, Alfredo E.; Elber, Ron

    2011-01-01

    We report an optimized version of the molecular dynamics program MOIL that runs on a shared memory system with OpenMP and exploits the power of a Graphics Processing Unit (GPU). The model is a heterogeneous computing system on a single node, with several cores sharing the same memory and a GPU. This is a typical laboratory tool, which provides excellent performance at minimal cost. Besides performance, emphasis is placed on the accuracy and stability of the algorithm, probed by energy conservation for explicit-solvent atomically detailed models. Especially for long simulations, energy conservation is critical due to the phenomenon known as “energy drift”, in which energy errors accumulate linearly as a function of simulation time. To achieve long-time dynamics with acceptable accuracy, the drift must be particularly small. We identify several means of controlling long-time numerical accuracy while maintaining excellent speedup. To maintain a high level of energy conservation, SHAKE and the Ewald reciprocal summation are run in double precision. Double precision summation of real-space non-bonded interactions improves energy conservation. In our best option, the energy drift using a 1 fs time step while constraining the distances of all bonds is undetectable in a 10 ns simulation of solvated DHFR (dihydrofolate reductase). Faster options, shaking only bonds with hydrogen atoms, are also very well behaved and have drifts of less than 1 kcal/mol per nanosecond for the same system. CPU/GPU implementations require changes in programming models. We consider the use of a list of neighbors and quadratic versus linear interpolation in lookup tables of different sizes. Quadratic interpolation with a smaller number of grid points is faster than linear lookup tables (with finer representation) without loss of accuracy. Atomic neighbor lists were found most efficient. Typical speedups are about a factor of 10 compared to a single-core single-precision code. PMID:22328867
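
    The lookup-table trade-off can be checked with a few lines of Python. Linear interpolation has error of order h^2 and three-point quadratic interpolation of order h^3, so a quadratic table can use a coarser grid for comparable accuracy. The sketch tabulates exp(-r), a stand-in chosen for illustration rather than MOIL's actual non-bonded kernel, and the grid sizes (401 versus 101 points) are arbitrary.

```python
import math


class LookupTable:
    """Values of f tabulated on n uniformly spaced points over [lo, hi]."""

    def __init__(self, f, lo, hi, n):
        self.lo = lo
        self.h = (hi - lo) / (n - 1)
        self.values = [f(lo + i * self.h) for i in range(n)]

    def _locate(self, x, width):
        # Left node index, clamped so that `width` consecutive nodes exist,
        # plus the offset t of x from that node in units of h.
        i = int((x - self.lo) / self.h)
        i = max(0, min(i, len(self.values) - width))
        return i, (x - self.lo) / self.h - i

    def linear(self, x):
        i, t = self._locate(x, 2)
        v0, v1 = self.values[i], self.values[i + 1]
        return v0 + t * (v1 - v0)

    def quadratic(self, x):
        # Three-point Lagrange interpolation through nodes i, i+1, i+2.
        i, t = self._locate(x, 3)
        v0, v1, v2 = self.values[i:i + 3]
        return 0.5 * (t - 1) * (t - 2) * v0 - t * (t - 2) * v1 + 0.5 * t * (t - 1) * v2


def max_error(approx, f, lo, hi, samples=1000):
    """Largest absolute error of approx against f on evenly spaced samples."""
    xs = [lo + (hi - lo) * k / (samples - 1) for k in range(samples)]
    return max(abs(approx(x) - f(x)) for x in xs)


def kernel(r):
    return math.exp(-r)  # illustrative stand-in for a tabulated kernel


fine = LookupTable(kernel, 0.0, 4.0, 401)    # spacing 0.01
coarse = LookupTable(kernel, 0.0, 4.0, 101)  # spacing 0.04
print("linear, 401 points:   ", max_error(fine.linear, kernel, 0.0, 4.0))
print("quadratic, 101 points:", max_error(coarse.quadratic, kernel, 0.0, 4.0))
```

    With these grids the coarse quadratic table should show a smaller maximum error than the fine linear one while storing a quarter of the values; the GPU-side cost of memory traffic, which motivates smaller tables, is not modeled here.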

  9. Cognitive stimulation in healthy older adults: a cognitive stimulation program using leisure activities compared to a conventional cognitive stimulation program.

    PubMed

    Grimaud, Élisabeth; Taconnat, Laurence; Clarys, David

    2017-06-01

    The aim of this study was to compare two methods of cognitive stimulation. The first method used a conventional approach; the second used leisure activities. Their benefits were assessed on cognitive functions (speed of processing, working memory capacity and executive functions) and psychoaffective measures (memory span and self-esteem). Sixty-seven participants over 60 years old took part in the experiment. They were divided into three groups: one group followed a program of conventional cognitive stimulation, one group a program of cognitive stimulation using leisure activities, and one group served as a control. The different measures were evaluated before and after the training program. Results show that the cognitive stimulation program using leisure activities is as effective on memory span, updating and memory self-perception as the program using conventional cognitive stimulation, and more effective on self-esteem than the conventional program. There is no difference between the two stimulated groups and the control group on speed of processing. Neither of the two cognitive stimulation programs provides a benefit for shifting or inhibition. These results indicate that it seems possible to enhance working memory and to observe far-transfer benefits on self-perception (self-esteem and memory self-perception) when using leisure activities as a tool for cognitive stimulation.

  10. Multi-processor including data flow accelerator module

    DOEpatents

    Davidson, George S.; Pierce, Paul E.

    1990-01-01

    An accelerator module for a data flow computer includes an intelligent memory. The module is added to a multiprocessor arrangement and uses a shared tagged memory architecture in the data flow computer. The intelligent memory module assigns locations for holding data values in correspondence with arcs leading to a node in a data dependency graph. Each primitive computation is associated with a corresponding memory cell, including a number of slots for operands needed to execute a primitive computation, a primitive identifying pointer, and linking slots for distributing the result of the cell computation to other cells requiring that result as an operand. Circuitry is provided for utilizing tag bits to determine automatically when all operands required by a processor are available and for scheduling the primitive for execution in a queue. Each memory cell of the module may be associated with any of the primitives, and the particular primitive to be executed by the processor associated with the cell is identified by providing an index, such as the cell number for the primitive, to the primitive lookup table of starting addresses. The module thus serves to perform functions previously performed by a number of sections of data flow architectures and coexists with conventional shared memory therein. A multiprocessing system including the module operates in a hybrid mode, wherein the same processing modules are used to perform some processing in a sequential mode, under immediate control of an operating system, while performing other processing in a data flow mode.
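
    The firing rule described in the patent, in which a cell executes once its tag bits show that every operand slot is filled and its result is then forwarded along linking slots, can be mimicked in software. The Python sketch below is a sequential simulation under stated assumptions: the primitive table, the (cell, slot) link format and the FIFO ready queue are illustrative choices, not the patented circuitry.

```python
import operator
from collections import deque

# Hypothetical primitive lookup table: index -> (function, number of operands).
PRIMITIVES = {0: (operator.add, 2), 1: (operator.mul, 2), 2: (operator.neg, 1)}


class Cell:
    """One memory cell: operand slots, tag bits, primitive index, result links."""

    def __init__(self, primitive, targets=()):
        _, arity = PRIMITIVES[primitive]
        self.primitive = primitive
        self.slots = [None] * arity
        self.present = [False] * arity  # tag bit per operand slot
        self.targets = list(targets)    # (cell index, slot) pairs fed by the result
        self.result = None


def run_dataflow(cells, initial_tokens):
    """Deliver initial tokens, then fire cells whose operands are all present."""
    ready = deque()

    def deliver(cell_id, slot, value):
        cell = cells[cell_id]
        cell.slots[slot] = value
        cell.present[slot] = True
        if all(cell.present):
            ready.append(cell_id)  # schedule the primitive for execution

    for cell_id, slot, value in initial_tokens:
        deliver(cell_id, slot, value)
    while ready:
        cell = cells[ready.popleft()]
        func, _ = PRIMITIVES[cell.primitive]
        cell.result = func(*cell.slots)
        for target_cell, target_slot in cell.targets:
            deliver(target_cell, target_slot, cell.result)
    return cells


# Graph for -((2 + 3) * 4): add -> mul (slot 0) -> neg.
graph = [Cell(0, [(1, 0)]), Cell(1, [(2, 0)]), Cell(2)]
run_dataflow(graph, [(0, 0, 2), (0, 1, 3), (1, 1, 4)])
print(graph[2].result)  # -20
```

    The multiply cell receives its constant operand first but waits until the add result arrives, which is the behavior the tag bits are meant to enforce without any central program counter.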

  11. Parallelization strategies for continuum-generalized method of moments on the multi-thread systems

    NASA Astrophysics Data System (ADS)

    Bustamam, A.; Handhika, T.; Ernastuti, Kerami, D.

    2017-07-01

    Continuum-Generalized Method of Moments (C-GMM) addresses a shortcoming of the Generalized Method of Moments (GMM), which is not as efficient as the Maximum Likelihood estimator, by using a continuum of moment conditions in a GMM framework. However, this computation can take a very long time because the regularization parameter must be optimized. Unfortunately, these calculations are processed sequentially, even though modern computers have hierarchical memory systems and hyperthreading technology that allow parallel computing. This paper aims to speed up the C-GMM calculation by designing a parallel algorithm for C-GMM on multi-thread systems. First, parallel regions are detected in the original C-GMM algorithm. Two parallel regions in the original algorithm contribute significantly to the reduction of computational time: the outer loop and the inner loop. This parallel algorithm is then implemented with a standard shared-memory application programming interface, Open Multi-Processing (OpenMP). The experiment shows that outer-loop parallelization is the best strategy for any number of observations.
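
    The difference between the two strategies can be sketched with a thread pool. The criterion below is a hypothetical stand-in, not the C-GMM objective, and treating the outer loop as a search over candidate regularization parameters is an assumption for illustration. Outer-loop parallelization creates one fork/join for the whole search, whereas inner-loop parallelization pays a fork/join per candidate, which is one plausible reason the outer loop scales better; both return the same parameter.

```python
from concurrent.futures import ThreadPoolExecutor


def criterion_term(alpha, x):
    """Hypothetical per-observation term; a stand-in, not the C-GMM objective."""
    return (x - alpha) ** 2 / (1.0 + alpha)


def inner(alpha, data):
    """Inner loop: accumulate the criterion over observations."""
    return sum(criterion_term(alpha, x) for x in data)


def best_alpha_outer(alphas, data, workers=4):
    """Outer-loop parallelization: one task per candidate, a single fork/join."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        scores = list(pool.map(lambda a: inner(a, data), alphas))
    return min(zip(scores, alphas))[1]


def best_alpha_inner(alphas, data, workers=4, chunk=256):
    """Inner-loop parallelization: observations split across threads, with a
    fork/join for every candidate alpha."""
    chunks = [data[i:i + chunk] for i in range(0, len(data), chunk)]
    best = None
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for a in alphas:
            # sum() consumes every result before the loop variable `a` changes.
            score = sum(pool.map(lambda c: inner(a, c), chunks))
            if best is None or score < best[0]:
                best = (score, a)
    return best[1]


observations = [float(i % 7) for i in range(1000)]
candidates = [0.5 * k for k in range(10)]
print(best_alpha_outer(candidates, observations),
      best_alpha_inner(candidates, observations))  # 3.5 3.5
```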

  12. Expressing Parallelism with ROOT

    NASA Astrophysics Data System (ADS)

    Piparo, D.; Tejedor, E.; Guiraud, E.; Ganis, G.; Mato, P.; Moneta, L.; Valls Pla, X.; Canal, P.

    2017-10-01

    The need for processing the ever-increasing amount of data generated by the LHC experiments in a more efficient way has motivated ROOT to further develop its support for parallelism. Such support is being tackled both for shared-memory and distributed-memory environments. The incarnations of the aforementioned parallelism are multi-threading, multi-processing and cluster-wide executions. In the area of multi-threading, we discuss the new implicit parallelism and related interfaces, as well as the new building blocks to safely operate with ROOT objects in a multi-threaded environment. Regarding multi-processing, we review the new MultiProc framework, comparing it with similar tools (e.g. multiprocessing module in Python). Finally, as an alternative to PROOF for cluster-wide executions, we introduce the efforts on integrating ROOT with state-of-the-art distributed data processing technologies like Spark, both in terms of programming model and runtime design (with EOS as one of the main components). For all the levels of parallelism, we discuss, based on real-life examples and measurements, how our proposals can increase the productivity of scientists.

  13. Expressing Parallelism with ROOT

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Piparo, D.; Tejedor, E.; Guiraud, E.

    The need for processing the ever-increasing amount of data generated by the LHC experiments in a more efficient way has motivated ROOT to further develop its support for parallelism. Such support is being tackled both for shared-memory and distributed-memory environments. The incarnations of the aforementioned parallelism are multi-threading, multi-processing and cluster-wide executions. In the area of multi-threading, we discuss the new implicit parallelism and related interfaces, as well as the new building blocks to safely operate with ROOT objects in a multi-threaded environment. Regarding multi-processing, we review the new MultiProc framework, comparing it with similar tools (e.g. multiprocessing module in Python). Finally, as an alternative to PROOF for cluster-wide executions, we introduce the efforts on integrating ROOT with state-of-the-art distributed data processing technologies like Spark, both in terms of programming model and runtime design (with EOS as one of the main components). For all the levels of parallelism, we discuss, based on real-life examples and measurements, how our proposals can increase the productivity of scientists.

  14. Sharing Histories-a transformative learning/teaching method to empower community health workers to support health behavior change of mothers.

    PubMed

    Altobelli, Laura C

    2017-08-23

    One of the keys to improving health globally is promoting mothers' adoption of healthy home practices for improved nutrition and illness prevention in the first 1000 days of life from conception. Customarily, mothers are taught health messages which, even if simplified, are hard to remember. The challenge is how to promote learning and behavior change of mothers more effectively in low-resource settings where access to health information is poor, educational levels are low, and traditional beliefs are strong. In addressing that challenge, a new learning/teaching method called "Sharing Histories" is in development to improve the performance of female community health workers (CHWs) in promoting mothers' behaviors for maternal, neonatal and child health (MNCH). This method builds self-confidence and empowerment of CHWs in learning sessions that are built on guided sharing of their own memories of childbearing and child care. CHWs can later share histories with the mother, building her trust and empowerment to change. For professional primary health care staff who are not educators, Sharing Histories is simple to learn and use so that the method can be easily incorporated into government health systems and ongoing CHW programs. I present here the Sharing Histories method, describe how it differs from other social and behavior change methods, and discuss selected literature from psychology, communications, and neuroscience that helps to explain how and why this method works as a transformative tool to engage, teach, transform, and empower CHWs to be more effective change agents with other mothers in their communities, thereby contributing to the attainment of the Sustainable Development Goals.

  15. [Assessing program sustainability in public health organizations: a tool-kit application in Haiti].

    PubMed

    Ridde, V; Pluye, P; Queuille, L

    2006-10-01

    Public health stakeholders are concerned about program sustainability. However, they usually conceive of sustainability in terms of financial criteria, for at least one reason: no simple frameworks are operationally and theoretically sound enough to evaluate program sustainability globally. The present paper aims to describe an application of one framework assessment tool used to evaluate the sustainability level and process of a Nutritional Care Unit managed by a Swiss humanitarian agency to fight severe child malnutrition in a Haitian area. The managing agency is committed to putting this Unit back into the structure of a local public hospital. The evaluation was performed within the sustainability framework proposed in a former article. Data were collected with a combination of tools: semi-structured interviews (n=33, medical and support staff from the agency and the hospital), participatory observation and document review. Data concerned the four characteristics of organizational routines (memory, adaptation, values and rules) that enable assessment of the level of sustainability. In addition, data were related to three types of events distinguishing routinization processes from implementation processes: specific events of routinization, routinization-implementation joint events, and specific events of implementation. Data analysis was thematic, and results were validated by actors through a feedback session and written comments. The current level of sustainability of the Nutritional Care Unit within the Hospital is weak: weak memory, high adaptation, weak sharing of values and rules. This may be explained by the sustainability process and the absence of specific routinization events. The relevance of such processes is reasonable, although it has been strongly challenged in the troubled Haitian context. Riots have been widespread in recent years, creating difficulties for the Hospital. This experience suggests the proposed framework and sustainability assessment tools are useful when the context permits scrutinization of program sustainability.

  16. Dynamic modulation of innate immunity programming and memory.

    PubMed

    Yuan, Ruoxi; Li, Liwu

    2016-01-01

    Recent progress harkens back to the old theme of immune memory, except this time in the area of innate immunity, to which the traditional paradigm ascribes only a rudimentary first-line defense function with no memory. However, both in vitro and in vivo studies reveal that innate leukocytes may adopt distinct activation states such as priming, tolerance, and exhaustion, depending upon the history of prior challenges. The dynamic programming and potential memory of innate leukocytes may have far-reaching consequences in health and disease. This review aims to provide some salient features of innate programming and memory, patho-physiological consequences, underlying mechanisms, and current pressing issues.

  17. Association of KIBRA and memory.

    PubMed

    Bates, Timothy C; Price, Jackie F; Harris, Sarah E; Marioni, Riccardo E; Fowkes, F Gerry R; Stewart, Marlene C; Murray, Gordon D; Whalley, Lawrence J; Starr, John M; Deary, Ian J

    2009-07-24

    We report on the association of KIBRA with memory in two samples of older individuals assessed on either memory for semantically unrelated word stimuli (Rey Auditory Verbal Learning Test, n=2091), or a measure of semantically related material (the WAIS Logical Memory Test of prose-passage recall, n=542). SNP rs17070145 was associated with delayed recall of semantically unrelated items, but not with immediate recall for these stimuli, nor with either immediate or delayed recall for semantically related material. The pattern of results suggests a role for the T→C substitution in intron 9 of KIBRA in a component of episodic memory involved in long-term storage, but independent of processes shared with immediate recall, such as the rehearsal involved in acquisition.

  18. We Have Met Our Past and Our Future: Thanks for the Walk down Memory Lane

    ERIC Educational Resources Information Center

    Wiseman, Robert C.

    2006-01-01

    In this article, the author takes readers for a walk down memory lane on the use of teaching aids. He shares his experience of the good old days of audiovisual aids: the opaque projector, motion pictures/films, recorders, and the overhead projector. Computers have arrived, and now people can make graphics, pictures, motion pictures, and many different…

  19. Dealing with Prospective Memory Demands While Performing an Ongoing Task: Shared Processing, Increased On-Task Focus, or Both?

    ERIC Educational Resources Information Center

    Rummel, Jan; Smeekens, Bridget A.; Kane, Michael J.

    2017-01-01

    Prospective memory (PM) is the cognitive ability to remember to fulfill intended action plans at the appropriate future moment. Current theories assume that PM fulfillment draws on attentional processes. Accordingly, pending PM intentions interfere with other ongoing tasks to the extent to which both tasks rely on the same processes. How do people…

  20. An Action Sequence Withheld in Memory Can Delay Execution of Visually Guided Actions: The Generalization of Response Compatibility Interference

    ERIC Educational Resources Information Center

    Wiediger, Matthew D.; Fournier, Lisa R.

    2008-01-01

    Withholding an action plan in memory for later execution can delay execution of another action, if the actions share a similar (compatible) action feature (i.e., response hand). This phenomenon, termed compatibility interference (CI), was found for identity-based actions that do not require visual guidance. The authors examined whether CI can…

  1. Relations of Maternal Style and Child Self-Concept to Autobiographical Memories in Chinese, Chinese Immigrant, and European American 3-Year-Olds

    ERIC Educational Resources Information Center

    Wang, Qi

    2006-01-01

    The relations of maternal reminiscing style and child self-concept to children's shared and independent autobiographical memories were examined in a sample of 189 three-year-olds and their mothers from Chinese families in China, first-generation Chinese immigrant families in the United States, and European American families. Mothers shared…

  2. Genetic and environmental influences on individual differences in emotion regulation and its relation to working memory in toddlerhood.

    PubMed

    Wang, Manjie; Saudino, Kimberly J

    2013-12-01

    This is the first study to explore genetic and environmental contributions to individual differences in emotion regulation in toddlers, and the first to examine the genetic and environmental etiology underlying the association between emotion regulation and working memory. In a sample of 304 same-sex twin pairs (140 MZ, 164 DZ) at age 3, emotion regulation was assessed using the Behavior Rating Scale of the Bayley Scales of Infant Development (BRS; Bayley, 1993), and working memory was measured by the visually cued recall (VCR) task (Zelazo, Jacques, Burack, & Frye, 2002) and several memory tasks from the Mental Scale of the BSID. Based on model-fitting analyses, both emotion regulation and working memory were significantly influenced by genetic and nonshared environmental factors. Shared environmental effects were significant for working memory, but not for emotion regulation. Only genetic factors significantly contributed to the covariation between emotion regulation and working memory.

  3. Genetic and Environmental Influences on Individual Differences in Emotion Regulation and Its Relation to Working Memory in Toddlerhood

    PubMed Central

    Wang, Manjie; Saudino, Kimberly J.

    2014-01-01

    This is the first study to explore genetic and environmental contributions to individual differences in emotion regulation in toddlers, and the first to examine the genetic and environmental etiology underlying the association between emotion regulation and working memory. In a sample of 304 same-sex twin pairs (140 MZ, 164 DZ) at age 3, emotion regulation was assessed using the Behavior Rating Scale of the Bayley Scales of Infant Development (BRS; Bayley, 1993), and working memory was measured by the visually cued recall (VCR) task (Zelazo et al., 2002) and several memory tasks from the Mental Scale of BSID. Based on model-fitting analyses, both emotion regulation and working memory were significantly influenced by genetic and nonshared environmental factors. Shared environmental effects were significant for working memory, but not for emotion regulation. Only genetic factors significantly contributed to the covariation between emotion regulation and working memory. PMID:24098922

  4. Autobiographical Memory Sharing in Everyday Life: Characteristics of a Good Story

    ERIC Educational Resources Information Center

    Baron, Jacqueline M.; Bluck, Susan

    2009-01-01

    Storytelling is a ubiquitous human activity that occurs across the lifespan as part of everyday life. Studies from three disparate literatures suggest that older adults (as compared to younger adults) are (a) less likely to recall story details, (b) more likely to go off-target when sharing stories, and, in contrast, (c) more likely to receive…

  5. Electrophysiological Activity Generated during the Implicit Association Test: A Study Using Event-Related Potentials

    ERIC Educational Resources Information Center

    O'Toole, Catriona; Barnes-Holmes, Dermot

    2009-01-01

    The Implicit Association Test (IAT) examines the differential association of 2 target concepts with 2 attribute concepts. Responding is predicted to be faster on consistent trials, when concepts that are associated in memory share a response key, than on inconsistent trials, when less associated items share a key. In the current study,…

  6. Shared Versus Distributed Memory Multiprocessors

    DTIC Science & Technology

    1991-01-01

    ...multiprocessors should have shared or distributed memory [...] Some researchers argue strongly for building distributed... Applications, MIT Press (1985). [6] D. Gajski et al., "Cedar," Proc. Compcon, pp. 306-309 (Spring 1989). [7] S. Ahuja, N. Carriero and D. Gelernter, "Linda

  7. The evolution of episodic memory

    PubMed Central

    Allen, Timothy A.; Fortin, Norbert J.

    2013-01-01

    One prominent view holds that episodic memory emerged recently in humans and lacks a “(neo)Darwinian evolution” [Tulving E (2002) Annu Rev Psychol 53:1–25]. Here, we review evidence supporting the alternative perspective that episodic memory has a long evolutionary history. We show that fundamental features of episodic memory capacity are present in mammals and birds and that the major brain regions responsible for episodic memory in humans have anatomical and functional homologs in other species. We propose that episodic memory capacity depends on a fundamental neural circuit that is similar across mammalian and avian species, suggesting that protoepisodic memory systems exist across amniotes and, possibly, all vertebrates. The implication is that episodic memory in diverse species may primarily be due to a shared underlying neural ancestry, rather than the result of evolutionary convergence. We also discuss potential advantages that episodic memory may offer, as well as species-specific divergences that have developed on top of the fundamental episodic memory architecture. We conclude by identifying possible time points for the emergence of episodic memory in evolution, to help guide further research in this area. PMID:23754432

  8. The Aging Well through Interaction and Scientific Education (AgeWISE) Program.

    PubMed

    O'Connor, Maureen K; Kraft, Malissa L; Daley, Ryan; Sugarman, Michael A; Clark, Erika L; Scoglio, Arielle A J; Shirk, Steven D

    2017-12-08

    We conducted a randomized controlled trial of the Aging Well through Interaction and Scientific Education (AgeWISE) program, a 12-week manualized cognitive rehabilitation program designed to provide psychoeducation to older adults about the aging brain, lifestyle factors associated with successful brain aging, and strategies to compensate for age-related cognitive decline. Forty-nine cognitively intact participants ≥ 60 years old were randomly assigned to the AgeWISE program (n = 25) or a no-treatment control group (n = 24). Questionnaire data were collected prior to group assignment and post intervention. Two-factor repeated-measures analyses of covariance (ANCOVAs) were used to compare group outcomes. Upon completion, participants in the AgeWISE program reported increases in memory contentment and their sense of control in improving memory; no significant changes were observed in the control group. Surprisingly, participation in the group was not associated with significant changes in knowledge of memory aging, perception of memory ability, or greater use of strategies. The AgeWISE program was successfully implemented and increased participants' memory contentment and their sense of control in improving memory in advancing age. This study supports the use of AgeWISE to improve perspectives on healthy cognitive aging.

  9. Frequent Statement and Dereference Elimination for Imperative and Object-Oriented Distributed Programs

    PubMed Central

    El-Zawawy, Mohamed A.

    2014-01-01

    This paper introduces new approaches for the analysis of frequent statement and dereference elimination for imperative and object-oriented distributed programs running on parallel machines equipped with hierarchical memories. The paper uses languages whose address spaces are globally partitioned. Distributed programs allow defining data layout and threads writing to and reading from other thread memories. Three type systems (for imperative distributed programs) are the tools of the proposed techniques. The first type system defines for every program point a set of calculated (ready) statements and memory accesses. The second type system uses an enriched version of types of the first type system and determines which of the ready statements and memory accesses are used later in the program. The third type system uses the information gathered so far to eliminate unnecessary statement computations and memory accesses (the analysis of frequent statement and dereference elimination). Extensions to these type systems are also presented to cover object-oriented distributed programs. Our work has two advantages over related work. First, the memory model used in this paper resembles the hierarchical style of concurrent parallel computers. Second, each analysis result is assigned a type derivation, which serves as a correctness proof. PMID:24892098

  10. Explaining prompts children to privilege inductively rich properties.

    PubMed

    Walker, Caren M; Lombrozo, Tania; Legare, Cristine H; Gopnik, Alison

    2014-11-01

    Four experiments with preschool-aged children test the hypothesis that engaging in explanation promotes inductive reasoning on the basis of shared causal properties as opposed to salient (but superficial) perceptual properties. In Experiments 1a and 1b, 3- to 5-year-old children prompted to explain during a causal learning task were more likely to override a tendency to generalize according to perceptual similarity and instead extend an internal feature to an object that shared a causal property. Experiment 2 replicated this effect of explanation in a case of label extension (i.e., categorization). Experiment 3 demonstrated that explanation improves memory for clusters of causally relevant (non-perceptual) features, but impairs memory for superficial (perceptual) features, providing evidence that effects of explanation are selective in scope and apply to memory as well as inference. In sum, our data support the proposal that engaging in explanation influences children's reasoning by privileging inductively rich, causal properties. Copyright © 2014 Elsevier B.V. All rights reserved.

  11. Power and Performance Trade-offs for Space Time Adaptive Processing

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Gawande, Nitin A.; Manzano Franco, Joseph B.; Tumeo, Antonino

    Computational efficiency (performance relative to power or energy) is one of the most important concerns when designing RADAR processing systems. This paper analyzes power and performance trade-offs for a typical Space Time Adaptive Processing (STAP) application. We study STAP implementations for CUDA and OpenMP on two computationally efficient architectures, an Intel Haswell Core i7-4770TE and an NVIDIA Kayla with a GK208 GPU. We analyze the power and performance of STAP's computationally intensive kernels across the two hardware testbeds. We also show the impact and trade-offs of GPU optimization techniques. We show that data parallelism can be exploited for efficient implementation on the Haswell CPU architecture. The GPU architecture is able to process large data sets without an increase in power requirement. The use of shared memory has a significant impact on the power requirement of the GPU. A balance between the use of shared memory and main memory accesses leads to improved performance in a typical STAP application.

  12. Parallel Navier-Stokes computations on shared and distributed memory architectures

    NASA Technical Reports Server (NTRS)

    Hayder, M. Ehtesham; Jayasimha, D. N.; Pillay, Sasi Kumar

    1995-01-01

    We study a high-order finite difference scheme to solve the time-accurate flow field of a jet using the compressible Navier-Stokes equations. As part of our ongoing efforts, we have implemented our numerical model on three parallel computing platforms to study the computational, communication, and scalability characteristics. The platforms chosen for this study are a cluster of workstations connected through fast networks (the LACE experimental testbed at NASA Lewis), a shared memory multiprocessor (the Cray YMP), and a distributed memory multiprocessor (the IBM SP1). Our focus in this study is on the LACE testbed. We present some results for the Cray YMP and the IBM SP1 mainly for comparison purposes. On the LACE testbed, we study: (1) the communication characteristics of Ethernet, FDDI, and the ALLNODE networks and (2) the overheads induced by the PVM message passing library used for parallelizing the application. We demonstrate that clustering of workstations is effective and has the potential to be computationally competitive with supercomputers at a fraction of the cost.

  13. Effects of a Memory and Visual-Motor Integration Program for Older Adults Based on Self-Efficacy Theory.

    PubMed

    Kim, Eun Hwi; Suh, Soon Rim

    2017-06-01

    This study was conducted to verify the effects of a memory and visual-motor integration program for older adults based on self-efficacy theory. A non-equivalent control group pretest-posttest design was implemented in this quasi-experimental study. The participants were 62 older adults from senior centers and older adult welfare facilities in D and G city (Experimental group=30, Control group=32). The experimental group took part in a 12-session memory and visual-motor integration program over 6 weeks. Data regarding memory self-efficacy, memory, visual-motor integration, and depression were collected from July to October of 2014 and analyzed with independent t-test and Mann-Whitney U test using PASW Statistics (SPSS) 18.0 to determine the effects of the interventions. Memory self-efficacy (t=2.20, p=.031), memory (Z=-2.92, p=.004), and visual-motor integration (Z=-2.49, p=.013) increased significantly in the experimental group as compared to the control group. However, depression (Z=-0.90, p=.367) did not decrease significantly. This program is effective for increasing memory, visual-motor integration, and memory self-efficacy in older adults. Therefore, it can be used to improve cognition and prevent dementia in older adults. © 2017 Korean Society of Nursing Science

  14. Virtual reality-based prospective memory training program for people with acquired brain injury.

    PubMed

    Yip, Ben C B; Man, David W K

    2013-01-01

    Patients with acquired brain injury (ABI) may display cognitive impairments leading to long-term disabilities, including prospective memory (PM) failure. Prospective memory is the ability to remember to execute an intended action in the future. PM problems would be a challenge to an ABI patient's successful community reintegration. While retrospective memory (RM) has been extensively studied, treatment programs for prospective memory are rarely reported. The development of a treatment program for PM, which is considered timely, can be cost-effective and appropriate to the patient's environment. A 12-session virtual reality (VR)-based cognitive rehabilitation program was developed using everyday PM activities as training content. Thirty-seven subjects were recruited to participate in a pretest-posttest control experimental study to evaluate its treatment effectiveness. Results suggest significantly greater improvement in both VR-based and real-life PM outcome measures, as well as in related cognitive attributes such as frontal lobe functions and semantic fluency. VR-based training may be well accepted by ABI patients, as encouraging improvement has been shown. Large-scale studies of a virtual reality-based prospective memory (VRPM) training program are indicated.

  15. The Memory Fitness Program: Cognitive Effects of a Healthy Aging Intervention

    PubMed Central

    Miller, Karen J.; Siddarth, Prabha; Gaines, Jean M.; Parrish, John M.; Ercoli, Linda M.; Marx, Katherine; Ronch, Judah; Pilgram, Barbara; Burke, Kasey; Barczak, Nancy; Babcock, Bridget; Small, Gary W.

    2014-01-01

    Context Age-related memory decline affects a large proportion of older adults. Cognitive training, physical exercise, and other lifestyle habits may help to minimize self-perception of memory loss and a decline in objective memory performance. Objective The purpose of this study was to determine whether a 6-week educational program on memory training, physical activity, stress reduction, and healthy diet led to improved memory performance in older adults. Design A convenience sample of 115 participants (mean age: 80.9 [SD: 6.0 years]) was recruited from two continuing care retirement communities. The intervention consisted of 60-minute classes held twice weekly with 15–20 participants per class. Testing of both objective and subjective cognitive performance occurred at baseline, preintervention, and postintervention. Objective cognitive measures evaluated changes in five domains: immediate verbal memory, delayed verbal memory, retention of verbal information, memory recognition, and verbal fluency. A standardized metamemory instrument assessed four domains of memory self-awareness: frequency and severity of forgetting, retrospective functioning, and mnemonics use. Results The intervention program resulted in significant improvements on objective measures of memory, including recognition of word pairs (t[114] = 3.62, p < 0.001) and retention of verbal information from list learning (t[114] = 2.98, p < 0.01). No improvement was found for verbal fluency. Regarding subjective memory measures, the retrospective functioning score increased significantly following the intervention (t[114] = 4.54, p < 0.0001), indicating perception of a better memory. Conclusions These findings indicate that a 6-week healthy lifestyle program can improve both encoding and recalling of new verbal information, as well as self-perception of memory ability in older adults residing in continuing care retirement communities. PMID:21765343

  16. Working Memory Training for Children with Cochlear Implants: A Pilot Study

    ERIC Educational Resources Information Center

    Kronenberger, William G.; Pisoni, David B.; Henning, Shirley C.; Colson, Bethany G.; Hazzard, Lindsey M.

    2011-01-01

    Purpose: This study investigated the feasibility and efficacy of a working memory training program for improving memory and language skills in a sample of 9 children who are deaf (age 7-15 years) with cochlear implants (CIs). Method: All children completed the Cogmed Working Memory Training program on a home computer over a 5-week period.…

  17. Interference from mere thinking: mental rehearsal temporarily disrupts recall of motor memory.

    PubMed

    Yin, Cong; Wei, Kunlin

    2014-08-01

    Interference between successively learned tasks is widely investigated to study motor memory. However, how simultaneously learned motor memories interact with each other has rarely been studied despite its prevalence in daily life. Assuming that motor memory shares common neural mechanisms with the declarative memory system, we made the counterintuitive prediction that mental rehearsal, as opposed to further practice, of one motor memory will temporarily impair the recall of another simultaneously learned memory. Subjects simultaneously learned two sensorimotor tasks, i.e., visuomotor rotation and gain. They retrieved one memory by either practice or mental rehearsal and then had their memory evaluated. We found that mental rehearsal, instead of execution, impaired the recall of the unretrieved memory. This impairment was content-independent, i.e., retrieving either gain or rotation impaired the other memory. Hence, conscious recollection of one motor memory interferes with the recall of another memory. This is analogous to retrieval-induced forgetting in declarative memory, suggesting a common neural process across memory systems. Our findings indicate that motor imagery is sufficient to induce interference between motor memories. Mental rehearsal, currently widely regarded as beneficial for motor performance, negatively affects memory recall when it is exercised for a subset of memorized items. Copyright © 2014 the American Physiological Society.

  18. Distributed-Memory Fast Maximal Independent Set

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Kanewala Appuhamilage, Thejaka Amila J.; Zalewski, Marcin J.; Lumsdaine, Andrew

    The Maximal Independent Set (MIS) graph problem arises in many applications such as computer vision, information theory, molecular biology, and process scheduling. The growing scale of MIS problems suggests the use of distributed-memory hardware as a cost-effective approach to providing the necessary compute and memory resources. Luby proposed four randomized algorithms to solve the MIS problem. All of these algorithms were designed for shared-memory machines and are analyzed using the PRAM model; they do not have direct, efficient distributed-memory implementations. In this paper, we extend two of Luby's seminal MIS algorithms, "Luby(A)" and "Luby(B)," to distributed-memory execution, and we evaluate their performance. We compare our results with the "Filtered MIS" implementation in the Combinatorial BLAS library for two types of synthetic graph inputs.
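    The abstract names Luby's randomized MIS algorithms without describing them. For readers unfamiliar with the idea, the following is a minimal sequential Python sketch of the random-priority scheme in the spirit of Luby's approach. It is not the authors' distributed-memory implementation, and it is not claimed to correspond exactly to either "Luby(A)" or "Luby(B)". The function name `luby_mis` and the adjacency-dict input format are illustrative assumptions.

```python
import random


def luby_mis(adj, seed=None):
    """Return a maximal independent set of an undirected graph.

    adj: dict mapping each vertex to the set of its neighbours (assumed
    symmetric). Sequential simulation of a random-priority, Luby-style round
    structure; an illustrative sketch, not a distributed implementation.
    """
    rng = random.Random(seed)
    remaining = set(adj)
    mis = set()
    while remaining:
        # Every surviving vertex draws a fresh random priority this round.
        prio = {v: rng.random() for v in remaining}
        # A vertex wins if its priority is strictly lower than that of every
        # surviving neighbour; strictness means no two adjacent vertices win.
        winners = {
            v for v in remaining
            if all(prio[v] < prio[u] for u in adj[v] if u in remaining)
        }
        mis |= winners
        # Winners and their surviving neighbours leave the graph, so every
        # removed vertex is either in the set or adjacent to a member.
        removed = set(winners)
        for v in winners:
            removed |= adj[v] & remaining
        remaining -= removed
    return mis
```

    Each round is local: a vertex only compares its priority with its neighbours'. In a distributed-memory setting, that comparison would require exchanging priorities for edges that cross partition boundaries, which is the communication such implementations have to manage.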

  19. Towards Memory-Aware Services and Browsing through Lifelogging Sensing

    PubMed Central

    Arcega, Lorena; Font, Jaime; Cetina, Carlos

    2013-01-01

    Every day we receive lots of information through our senses that is lost forever, because it lacked the strength or the repetition needed to generate a lasting memory. Combining the emerging Internet of Things and lifelogging sensors, we believe it is possible to build up a Digital Memory (Dig-Mem) in order to complement the fallible memory of people. This work shows how to realize the Dig-Mem in terms of interactions, affinities, activities, goals and protocols. We also complement this Dig-Mem with memory-aware services and a Dig-Mem browser. Furthermore, we propose an RFID Tag-Sharing technique to speed up the adoption of Dig-Mem. Experimentation reveals an improvement of the user understanding of Dig-Mem as time passes, compared to natural memories where the level of detail decreases over time. PMID:24196436

  20. Bermuda Triangle: a subsystem of the 168/E interfacing scheme used by Group B at SLAC

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Oxoby, G.J.; Levinson, L.J.; Trang, Q.H.

    1979-12-01

    The Bermuda Triangle system is a method of interfacing several 168/E microprocessors to a central system for control of the processors and overlaying their memories. The system is a three-way interface with I/O ports to a large buffer memory, a PDP11 Unibus and a bus to the 168/E processors. Data may be transferred bidirectionally between any two ports. Two Bermuda Triangles are used, one for the program memory and one for the data memory. The program buffer memory stores the overlay programs for the 168/E, and the data buffer memory, the incoming raw data, the data portion of the overlays, and the outgoing processed events. This buffering is necessary since the memories of 168/E microprocessors are small compared to the main program and the amount of data being processed. The link to the computer facility is via a Unibus to IBM channel interface. A PDP11/04 controls the data flow. 7 figures, 4 tables. (RWR)

  1. Guidance system operations plan for manned CSM earth orbital and lunar missions using program COLOSSUS 3. Section 7: Erasable memory programs

    NASA Technical Reports Server (NTRS)

    Hamilton, M. H.

    1972-01-01

    Erasable-memory programs designed for guidance computers used in command and lunar modules are presented. The purpose, functional description, assumptions, restrictions, and limitations are given for each program.

  2. Apollo guidance, navigation and control: Guidance system operations plans for manned LM earth orbital and lunar missions using Program COLOSSUS 3. Section 7: Erasable memory programs

    NASA Technical Reports Server (NTRS)

    Hamilton, M. H.

    1972-01-01

    Erasable-memory programs (EMPs) designed for the guidance computers used in the command (CMC) and lunar modules (LGC) are described. CMC programs are designated COLOSSUS 3, and the associated EMPs are identified by a three-digit number beginning with 5. LGC programs are designated LUMINARY 1E, and the associated EMPs are identified, with one exception, by a three-digit number beginning with 1. The exception is EMP 99. The EMPs vary in complexity from a simple flagbit setting to a long and intricate logical structure. They all, however, cause the computer to behave in a way not intended in the original design of the programs; they accomplish this off-nominal behavior by some alteration of erasable memory to interface with existing fixed-memory programs to effect a desired result.

  3. The efficacy of a multifactorial memory training in older adults living in residential care settings.

    PubMed

    Vranić, Andrea; Španić, Ana Marija; Carretti, Barbara; Borella, Erika

    2013-11-01

    Several studies have shown an increase in memory performance after teaching mnemonic techniques to older participants. However, transfer effects to non-trained tasks are generally either very small, or not found. The present study investigates the efficacy of a multifactorial memory training program for older adults living in a residential care center. The program combines teaching of memory strategies with activities based on metacognitive (metamemory) and motivational aspects. Specific training-related gains in the Immediate list recall task (criterion task), as well as transfer effects on measures of short-term memory, long-term memory, working memory, motivational (need for cognition), and metacognitive aspects (subjective measure of one's memory) were examined. Maintenance of training benefits was assessed after seven months. Fifty-one older adults living in a residential care center, with no cognitive impairments, participated in the study. Participants were randomly assigned to two programs: the experimental group attended the training program, while the active control group was involved in a program in which different psychological issues were discussed. A benefit in the criterion task and substantial general transfer effects were found for the trained group, but not for the active control, and they were maintained at the seven months follow-up. Our results suggest that training procedures, which combine teaching of strategies with metacognitive-motivational aspects, can improve cognitive functioning and attitude toward cognitive activities in older adults.

  4. Portable programming on parallel/networked computers using the Application Portable Parallel Library (APPL)

    NASA Technical Reports Server (NTRS)

    Quealy, Angela; Cole, Gary L.; Blech, Richard A.

    1993-01-01

    The Application Portable Parallel Library (APPL) is a subroutine-based library of communication primitives that is callable from applications written in FORTRAN or C. APPL provides a consistent programmer interface to a variety of distributed and shared-memory multiprocessor MIMD machines. The objective of APPL is to minimize the effort required to move parallel applications from one machine to another, or to a network of homogeneous machines. APPL encompasses many of the message-passing primitives that are currently available on commercial multiprocessor systems. This paper describes APPL (version 2.3.1) and its usage, reports the status of the APPL project, and indicates possible directions for the future. Several applications using APPL are discussed, as well as performance and overhead results.

  5. Stromal cells in chronic inflammation and tertiary lymphoid organ formation.

    PubMed

    Buckley, Christopher D; Barone, Francesca; Nayar, Saba; Bénézech, Cecile; Caamaño, Jorge

    2015-01-01

    Inflammation is an unstable state. It either resolves or persists. Why inflammation persists and the factors that define tissue tropism remain obscure. Increasing evidence suggests that tissue-resident stromal cells not only provide positional memory but also actively regulate the differential accumulation of inflammatory cells within inflamed tissues. Furthermore, at many sites of chronic inflammation, structures that mimic secondary lymphoid tissues are observed, suggesting that chronic inflammation and lymphoid tissue formation share common activation programs. Similarly, blood and lymphatic endothelial cells contribute to tissue homeostasis and disease persistence in chronic inflammation. This review highlights our increasing understanding of the role of stromal cells in inflammation and summarizes the novel immunological role that stromal cells exert in the persistence of inflammatory diseases.

  6. Preventing Loss of Independence through Exercise (PLIÉ): qualitative analysis of a clinical trial in older adults with dementia.

    PubMed

    Wu, Eveline; Barnes, Deborah E; Ackerman, Sara L; Lee, Jennifer; Chesney, Margaret; Mehling, Wolf E

    2015-01-01

    Preventing Loss of Independence through Exercise (PLIÉ) is a novel, integrative exercise program for individuals with dementia that combines elements of different conventional and complementary exercise modalities (e.g., tai-chi, yoga, Feldenkrais, and dance movement therapy) and focuses on training procedural memory for basic functional movements (e.g., sit-to-stand) while increasing mindful body awareness and facilitating social connection. This study presents analyses of qualitative data collected during a 36-week cross-over pilot clinical trial in 11 individuals. Qualitative data included exercise instructors' written notes, which were prepared after each class and also following biweekly telephone calls with caregivers and monthly home visits; three video-recorded classes; and written summaries prepared by research assistants following pre- and post-intervention quantitative assessments. Data were extracted for each study participant and placed onto a timeline by month of observation. Data were coded and analyzed to identify themes that were confirmed and refined through an iterative, collaborative process by the entire team, including a qualitative researcher (SA) and the exercise instructors. Three overarching themes emerged: (1) Functional changes included increasing body awareness, movement memory and functional skill. (2) Emotional changes included greater acceptance of resting, sharing of personal stories and feelings, and positive attitude toward exercise. (3) Social changes included more coherent social interactions and making friends. These qualitative results suggest that the PLIÉ program may be associated with beneficial functional, emotional, and social changes for individuals with mild to moderate dementia. Further study of the PLIÉ program in individuals with dementia is warranted.

  7. An Investigation of Unified Memory Access Performance in CUDA

    PubMed Central

    Landaverde, Raphael; Zhang, Tiansheng; Coskun, Ayse K.; Herbordt, Martin

    2015-01-01

    Managing memory between the CPU and GPU is a major challenge in GPU computing. A programming model, Unified Memory Access (UMA), has been recently introduced by Nvidia to simplify the complexities of memory management while claiming good overall performance. In this paper, we investigate this programming model and evaluate its performance and programming model simplifications based on our experimental results. We find that beyond on-demand data transfers to the CPU, the GPU is also able to request subsets of data it requires on demand. This feature allows UMA to outperform full data transfer methods for certain parallel applications and small data sizes. We also find, however, that for the majority of applications and memory access patterns, the performance overheads associated with UMA are significant, while the simplifications to the programming model restrict flexibility for adding future optimizations. PMID:26594668

  8. 75 FR 54590 - Notice of 2010 National Organic Certification Cost-Share Program

    Federal Register 2010, 2011, 2012, 2013, 2014

    2010-09-08

    ...] Notice of 2010 National Organic Certification Cost-Share Program AGENCY: Agricultural Marketing Service... Certification Cost-Share Funds. The AMS has allocated $22.0 million for this organic certification cost-share... National Organic Certification Cost-Share Program is authorized under 7 U.S.C. 6523, as amended by section...

  9. Parallel discrete event simulation: A shared memory approach

    NASA Technical Reports Server (NTRS)

    Reed, Daniel A.; Malony, Allen D.; Mccredie, Bradley D.

    1987-01-01

    With traditional event list techniques, evaluating a detailed discrete event simulation model can often require hours or even days of computation time. Parallel simulation mimics the interacting servers and queues of a real system by assigning each simulated entity to a processor. By eliminating the event list and maintaining only sufficient synchronization to ensure causality, parallel simulation can potentially provide speedups that are linear in the number of processors. A set of shared memory experiments is presented using the Chandy-Misra distributed simulation algorithm to simulate networks of queues. Parameters include queueing network topology and routing probabilities, number of processors, and assignment of network nodes to processors. These experiments show that Chandy-Misra distributed simulation is a questionable alternative to sequential simulation of most queueing network models.
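    The core of the Chandy-Misra approach is a conservative rule: a simulated entity (a logical process) may only handle a message once no input link can still deliver an earlier one, and null messages carrying a lower bound on future timestamps keep processes from blocking forever. The Python sketch below is a simplified illustration of that rule, not the authors' implementation; the class name, the FIFO-per-link layout, the `lookahead` value and the convention that a `None` payload marks a null message are all assumptions made for the example.

```python
from collections import deque


class LogicalProcess:
    """Simplified conservative logical process (hypothetical sketch).

    Each input link delivers messages in nondecreasing timestamp order.
    A message whose payload is None is treated as a null message.
    """

    def __init__(self, input_links, lookahead):
        self.clock = 0.0
        self.lookahead = lookahead  # assumed minimum delay this LP adds to its output
        self.links = {link: deque() for link in input_links}
        # Link clock: timestamp of the last message seen on that link.
        self.link_clock = {link: 0.0 for link in input_links}

    def receive(self, link, timestamp, payload):
        self.links[link].append((timestamp, payload))
        self.link_clock[link] = timestamp

    def step(self):
        """Process the earliest pending message if it is safe, else return None."""
        heads = [(queue[0][0], link) for link, queue in self.links.items() if queue]
        if not heads:
            return None
        timestamp, link = min(heads)
        # Links are FIFO, so no link can later deliver anything earlier than its
        # link clock; the event is safe only if it does not exceed any of them.
        if timestamp > min(self.link_clock.values()):
            return None  # block and wait for real or null messages
        _, payload = self.links[link].popleft()
        self.clock = timestamp
        return timestamp, payload

    def null_message_time(self):
        """Timestamp for a null message to neighbours: a promise that no future
        output from this LP will carry an earlier timestamp."""
        return self.clock + self.lookahead
```

For instance, a process with links `a` and `b` that has only received a job on `a` at time 2.0 must block until `b` delivers something (possibly a null message) stamped 2.0 or later.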

  10. Exploration versus exploitation in space, mind, and society

    PubMed Central

    Hills, Thomas T.; Todd, Peter M.; Lazer, David; Redish, A. David; Couzin, Iain D.

    2015-01-01

    Search is a ubiquitous property of life. Although diverse domains have worked on search problems largely in isolation, recent trends across disciplines indicate that the formal properties of these problems share similar structures and, often, similar solutions. Moreover, internal search (e.g., memory search) shows similar characteristics to external search (e.g., spatial foraging), including shared neural mechanisms consistent with a common evolutionary origin across species. Search problems and their solutions also scale from individuals to societies, underlying and constraining problem solving, memory, information search, and scientific and cultural innovation. In summary, search represents a core feature of cognition, with a vast influence on its evolution and processes across contexts and requiring input from multiple domains to understand its implications and scope. PMID:25487706

  11. Residual stresses in injection molded shape memory polymer parts

    NASA Astrophysics Data System (ADS)

    Katmer, Sukran; Esen, Huseyin; Karatas, Cetin

    2016-03-01

    Shape memory polymers (SMPs) are materials that exhibit the shape memory effect (SME): the ability to change shape when triggered by a stimulus such as temperature, moisture, pH, electric current, magnetic field, or light. A process known as programming is applied to SMP parts to change them from their permanent shape to a temporary shape. In this study we experimentally investigated the effects of injection molding and programming processes on residual stresses in molded thermoplastic polyurethane shape memory polymer parts. The residual stresses were measured by the layer removal method. The study shows that injection molding and programming process conditions have a significant influence on residual stresses in molded shape memory polyurethane parts.

  12. C++QEDv2: The multi-array concept and compile-time algorithms in the definition of composite quantum systems

    NASA Astrophysics Data System (ADS)

    Vukics, András

    2012-06-01

    C++QED is a versatile framework for simulating open quantum dynamics. It allows users to build arbitrarily complex quantum systems from elementary free subsystems and interactions, and to simulate their time evolution with the available time-evolution drivers. Through this framework, we introduce a design which should be generic for high-level representations of composite quantum systems. It relies heavily on the object-oriented and generic programming paradigms on the one hand, and on compile-time algorithms, in particular C++ template-metaprogramming techniques, on the other. The core of the design is the data structure which represents the state vectors of composite quantum systems. This data structure models the multi-array concept. The use of template metaprogramming is not only crucial to the design; with it, all computations pertaining to the layout of the simulated system can be shifted to compile time, reducing runtime.

    Program summary
    Program title: C++QED
    Catalogue identifier: AELU_v1_0
    Program summary URL: http://cpc.cs.qub.ac.uk/summaries/AELU_v1_0.html
    Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland
    Licensing provisions: http://cpc.cs.qub.ac.uk/licence/aelu_v1_0.html. The C++QED package contains other software packages, Blitz, Boost and FLENS, all of which may be distributed freely but have individual license requirements. Please see individual packages for license conditions.
    No. of lines in distributed program, including test data, etc.: 597 974
    No. of bytes in distributed program, including test data, etc.: 4 874 839
    Distribution format: tar.gz
    Programming language: C++
    Computer: i386-i686, x86_64
    Operating system: In principle cross-platform, as yet tested only on UNIX-like systems (including Mac OS X).
    RAM: The framework itself takes about 60 MB, which is fully shared. The additional memory taken by the program which defines the actual physical system (script) is typically less than 1 MB. The memory storing the actual data scales with the system dimension for state-vector manipulations, and with the square of the dimension for density-operator manipulations. This can easily reach several GB, and the memory of the machine often limits the size of the simulated system.
    Classification: 4.3, 4.13, 6.2, 20
    External routines: Boost C++ libraries (http://www.boost.org/), GNU Scientific Library (http://www.gnu.org/software/gsl/), Blitz++ (http://www.oonumerics.org/blitz/), Linear Algebra Package - Flexible Library for Efficient Numerical Solutions (http://flens.sourceforge.net/).
    Nature of problem: Definition of (open) composite quantum systems out of elementary building blocks [1]. Manipulation of such systems, with emphasis on dynamical simulations such as Master-equation evolution [2] and Monte Carlo wave-function simulation [3].
    Solution method: Master equation, Monte Carlo wave-function method.
    Restrictions: Total dimensionality of the system: a few thousand for the Master equation; several million for a Monte Carlo wave-function trajectory.
    Unusual features: Because of the heavy use of compile-time algorithms, compilation of programs written in the framework may take a long time and much memory (up to several GB).
    Additional comments: The framework is not a program, but provides and implements an application-programming interface for developing simulations in the indicated problem domain.
    Supplementary information: http://cppqed.sourceforge.net/.
    Running time: Depending on the magnitude of the problem, can vary from a few seconds to weeks.
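    The RAM entry above says data storage grows linearly with the system dimension for state vectors and quadratically for density operators, where the dimension of a composite system is the product of its subsystem dimensions. The short Python calculation below makes that concrete; it assumes 16 bytes per double-precision complex amplitude, ignores any storage optimizations the framework may apply, and uses hypothetical subsystem sizes.

```python
from math import prod

BYTES_PER_AMPLITUDE = 16  # assumption: double-precision complex numbers


def simulation_memory_bytes(subsystem_dims):
    """Rough storage estimate for a composite system built from subsystems.

    The composite dimension is the product of the subsystem dimensions.
    Returns (state_vector_bytes, density_operator_bytes).
    """
    dim = prod(subsystem_dims)
    return dim * BYTES_PER_AMPLITUDE, dim * dim * BYTES_PER_AMPLITUDE


# Hypothetical example: two 10-level subsystems coupled to a 50-level mode.
sv_bytes, rho_bytes = simulation_memory_bytes([10, 10, 50])
print(f"state vector: {sv_bytes / 1e3:.0f} kB, density operator: {rho_bytes / 1e9:.1f} GB")
```

With subsystem dimensions 10, 10 and 50 the composite dimension is 5000, giving roughly 80 kB for a state vector but 0.4 GB for a density operator, which is consistent with the much lower dimensionality limit quoted for Master-equation runs.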

  13. Efficient partitioning and assignment on programs for multiprocessor execution

    NASA Technical Reports Server (NTRS)

    Standley, Hilda M.

    1993-01-01

    The general problem studied is that of segmenting or partitioning programs for distribution across a multiprocessor system. Efficient partitioning and the assignment of program elements are of great importance, since the time consumed in this overhead activity may easily dominate the computation, effectively eliminating any gains made by the use of parallelism. In this study, the partitioning of sequentially structured programs (written in FORTRAN) is evaluated. Heuristics developed for similar applications are examined. Finally, a model for queueing networks with finite queues is developed which may be used to analyze multiprocessor system architectures with a shared memory approach to the problem of partitioning. The properties of sequentially written programs form obstacles to large-scale (at the procedure or subroutine level) parallelization. Data dependencies of even the minutest nature, reflecting the sequential development of the program, severely limit parallelism. The design of heuristic algorithms is tied to the experience gained in the parallel splitting. Parallelism obtained through the physical separation of data has seen some success, especially at the data element level. Data parallelism on a grander scale requires models that accurately reflect the effects of blocking caused by finite queues. A model for the approximation of the performance of finite queueing networks is developed. This model makes use of the decomposition approach combined with the efficiency of product form solutions.
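    The abstract's point about blocking in finite queues can be illustrated with the smallest building block such models start from: a single M/M/1/K station, whose blocking probability has a closed form. The sketch below uses that textbook formula and is not Standley's decomposition model; the function names are hypothetical, and Poisson arrivals with exponential service are assumed.

```python
def mm1k_blocking_probability(arrival_rate, service_rate, capacity):
    """Probability that an arriving job finds an M/M/1/K station full.

    `capacity` (K) counts the job in service plus the waiting positions.
    """
    rho = arrival_rate / service_rate
    if abs(rho - 1.0) < 1e-12:
        return 1.0 / (capacity + 1)  # all K+1 states are equally likely when rho == 1
    return (1.0 - rho) * rho**capacity / (1.0 - rho ** (capacity + 1))


def effective_throughput(arrival_rate, service_rate, capacity):
    """Rate of jobs actually admitted; blocked arrivals are not admitted."""
    blocked = mm1k_blocking_probability(arrival_rate, service_rate, capacity)
    return arrival_rate * (1.0 - blocked)
```

For example, with arrival rate 1, service rate 2 and room for K = 2 jobs, one arrival in seven is blocked. In a decomposition approach, per-station quantities like this are combined across the network; the abstract does not specify exactly how.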

  14. Get the gist? The effects of processing depth on false recognition in short-term and long-term memory.

    PubMed

    Flegal, Kristin E; Reuter-Lorenz, Patricia A

    2014-07-01

    Gist-based processing has been proposed to account for robust false memories in the converging-associates task. The deep-encoding processes known to enhance verbatim memory also strengthen gist memory and increase distortions of long-term memory (LTM). Recent research has demonstrated that compelling false memory illusions are relatively delay-invariant, also occurring under canonical short-term memory (STM) conditions. To investigate the contributions of gist to false memory at short and long delays, processing depth was manipulated as participants encoded lists of four semantically related words and were probed immediately, following a filled 3- to 4-s retention interval, or approximately 20 min later, in a surprise recognition test. In two experiments, the encoding manipulation dissociated STM and LTM on the frequency, but not the phenomenology, of false memory. Deep encoding at STM increases false recognition rates at LTM, but confidence ratings and remember/know judgments are similar across delays and do not differ as a function of processing depth. These results suggest that some shared and some unique processes underlie false memory illusions at short and long delays.

  15. Multi-GPU and multi-CPU accelerated FDTD scheme for vibroacoustic applications

    NASA Astrophysics Data System (ADS)

    Francés, J.; Otero, B.; Bleda, S.; Gallego, S.; Neipp, C.; Márquez, A.; Beléndez, A.

    2015-06-01

    The Finite-Difference Time-Domain (FDTD) method is applied to the analysis of vibroacoustic problems and to the study of the propagation of longitudinal and transversal waves in stratified media. This work demonstrates the potential of the scheme and the relevance of each acceleration strategy for large-scale FDTD computations. We propose two new implementations of the two-dimensional FDTD scheme, using multi-CPU and multi-GPU, respectively. In the first implementation, an open-source message passing interface implementation (OMPI) is used to fully exploit the resources of a dual-processor workstation with two Intel Xeon processors. In addition, the CPU code version uses Streaming SIMD Extensions (SSE) and Advanced Vector Extensions (AVX), combined with shared memory approaches that take advantage of multi-core platforms. The second implementation, the multi-GPU code version, is based on the Peer-to-Peer communications available in CUDA on two GPUs (NVIDIA GTX 670). The paper then analyses the influence of the different code versions, including shared memory approaches, vector instructions and multi-processors (both CPU and GPU), and compares them to determine the degree of improvement gained by distributed solutions based on multi-CPU and multi-GPU. The performance of both approaches was analysed, and it was shown that adding shared memory schemes to CPU computing substantially improves the performance of vector instructions, enlarging the range of simulation sizes that use the CPU cache efficiently. In this case, GPU computing is about twice as fast as the fine-tuned CPU version, with both one and two nodes.
    However, for large-scale computations, explicit vector instructions are not worthwhile, since memory bandwidth is the limiting factor and performance tends to match that of the sequential version with auto-vectorisation and a shared memory approach. In this scenario, GPU computing is the best option, since it provides homogeneous behaviour. More specifically, the GPU speedup reaches an upper limit of 12 for both one and two GPUs, while peak performance reaches 80 GFlops with one GPU and 146 GFlops with two GPUs. Finally, the method is applied to an earth crust profile to demonstrate the potential of the approach and the need for acceleration strategies in this type of application.
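    The FDTD method advances two coupled fields in alternating half steps on a staggered grid. The NumPy sketch below shows that leapfrog update for the simplest case, 1D linear acoustics (pressure and particle velocity) in a uniform medium with rigid ends; the paper's scheme is two-dimensional and treats longitudinal and transversal waves in stratified media, so this is only a reduced illustration. Grid size, material constants and the Gaussian initial pulse are arbitrary assumptions. Within each half step every cell's update depends only on the other field's previous values, which is why such whole-array updates map well onto vector instructions and GPU threads.

```python
import numpy as np


def fdtd_1d_acoustic(n_cells=201, steps=40, courant=0.5,
                     c=343.0, rho=1.21, dx=0.01, sigma_cells=5.0):
    """Leapfrog FDTD for 1D linear acoustics on a staggered grid.

    Pressure p lives at cell centres, particle velocity v on cell faces;
    the two outermost faces are rigid walls (v = 0).
    """
    dt = courant * dx / c  # stability (CFL) in 1D requires c*dt/dx <= 1
    i = np.arange(n_cells)
    centre = n_cells // 2
    p = np.exp(-0.5 * ((i - centre) / sigma_cells) ** 2)  # initial pressure pulse
    v = np.zeros(n_cells + 1)
    for _ in range(steps):
        # Momentum equation: rho dv/dt = -dp/dx (interior faces only).
        v[1:-1] -= dt / (rho * dx) * (p[1:] - p[:-1])
        # Continuity equation: dp/dt = -rho c^2 dv/dx.
        p -= rho * c * c * dt / dx * (v[1:] - v[:-1])
    return p


p = fdtd_1d_acoustic()
```

With the default Courant number of 0.5, the initial pulse splits into two roughly half-amplitude pulses travelling in opposite directions at one cell every two steps.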

  16. A theory of working memory without consciousness or sustained activity

    PubMed Central

    Trübutschek, Darinka; Marti, Sébastien; Ojeda, Andrés; King, Jean-Rémi; Mi, Yuanyuan; Tsodyks, Misha; Dehaene, Stanislas

    2017-01-01

    Working memory and conscious perception are thought to share similar brain mechanisms, yet recent reports of non-conscious working memory challenge this view. Combining visual masking with magnetoencephalography, we investigate the reality of non-conscious working memory and dissect its neural mechanisms. In a spatial delayed-response task, participants reported the location of a subjectively unseen target above chance-level after several seconds. Conscious perception and conscious working memory were characterized by similar signatures: a sustained desynchronization in the alpha/beta band over frontal cortex, and a decodable representation of target location in posterior sensors. During non-conscious working memory, such activity vanished. Our findings contradict models that identify working memory with sustained neural firing, but are compatible with recent proposals of ‘activity-silent’ working memory. We present a theoretical framework and simulations showing how slowly decaying synaptic changes allow cell assemblies to go dormant during the delay, yet be retrieved above chance-level after several seconds. DOI: http://dx.doi.org/10.7554/eLife.23871.001 PMID:28718763

  17. Thread mapping using system-level model for shared memory multicores

    NASA Astrophysics Data System (ADS)

    Mitra, Reshmi

    Exploring thread-to-core mapping options for a parallel application on a multicore architecture is computationally very expensive. For the same algorithm, the mapping strategy (MS) with the best response time may change with data size and thread counts. The primary challenge is to design a fast, accurate and automatic framework for exploring these MSs for large data-intensive applications. This is to ensure that users can explore the design space within reasonable machine hours, without a thorough understanding of how the code interacts with the platform. Response time is related to the cycles per instruction retired (CPI), taking into account both active and sleep states of the pipeline. This work establishes a hybrid approach, based on a Markov Chain Model (MCM) and a Model Tree (MT), for system-level steady-state CPI prediction. It is designed for shared memory multicore processors with coarse-grained multithreading. The thread status is represented by the MCM states. The program characteristics are modeled as the transition probabilities, representing the system moving between active and suspended thread states. The MT model extrapolates these probabilities for the actual application size (AS) from the performance at smaller AS. This aspect of the framework, along with the use of mathematical expressions for the actual AS performance information, results in a tremendous reduction in CPI prediction time. The framework is validated using an electromagnetics application. The average performance prediction error for steady-state CPI results with 12 different MSs is less than 1%. The total run time of the model is on the order of minutes, whereas the actual application execution time is on the order of days.
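    The abstract models thread status as the states of a Markov chain whose transition probabilities capture program behaviour, and predicts a steady-state CPI from it. The Python sketch below shows only the generic first step, computing a chain's steady-state distribution; the two-state matrix, the state labels and the CPI mapping at the end are hypothetical stand-ins, not the paper's model, and the Model Tree extrapolation is not shown.

```python
import numpy as np


def stationary_distribution(P):
    """Steady-state probabilities pi with pi @ P == pi and sum(pi) == 1."""
    P = np.asarray(P, dtype=float)
    n = P.shape[0]
    # Stack the balance equations (P^T - I) pi = 0 with the normalisation row.
    A = np.vstack([P.T - np.eye(n), np.ones(n)])
    b = np.zeros(n + 1)
    b[-1] = 1.0
    pi, *_ = np.linalg.lstsq(A, b, rcond=None)
    return pi


# Hypothetical two-state thread model: state 0 = active, state 1 = suspended.
# Row i holds the probabilities of moving from state i to each state per step.
P = [[0.9, 0.1],
     [0.3, 0.7]]
pi = stationary_distribution(P)

# Toy CPI mapping (an assumption, not the paper's model): instructions retire
# only while active, at an assumed CPI of 1.2 in that state.
active_cpi = 1.2
steady_state_cpi = active_cpi / pi[0]
```

Here the thread is active 75% of the time in steady state, so the toy mapping yields an overall CPI of 1.6.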

  18. Bayer image parallel decoding based on GPU

    NASA Astrophysics Data System (ADS)

    Hu, Rihui; Xu, Zhiyong; Wei, Yuxing; Sun, Shaohua

    2012-11-01

    In the photoelectrical tracking system, Bayer images are traditionally decompressed on the CPU. However, this is too slow when the images become large, for example 2K×2K×16 bit. To accelerate Bayer image decoding, this paper introduces a parallel speedup method for NVIDIA graphics processing units (GPUs) that support the CUDA architecture. The decoding procedure can be divided into three parts: a serial part, a task-parallel part, and a data-parallel part that includes inverse quantization, the inverse discrete wavelet transform (IDWT) and image post-processing. To reduce execution time, the task-parallel part is optimized with OpenMP. The data-parallel part gains efficiency by executing on the GPU as a CUDA parallel program. The optimization techniques include instruction optimization, shared memory access optimization, coalesced memory access optimization and texture memory optimization. In particular, the IDWT is significantly sped up by rewriting the two-dimensional (2D) serial IDWT as a 1D parallel IDWT. In experiments with a 1K×1K×16 bit Bayer image, the data-parallel part is more than 10 times faster than the CPU-based implementation. Finally, a CPU+GPU heterogeneous decompression system was designed. The experimental results show that it achieves a 3 to 5 times speed increase compared to the serial CPU method.
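    Rewriting a 2D IDWT as 1D passes relies on the transform being separable: the 2D inverse is a batch of independent 1D inverses along one axis followed by a batch along the other, and each 1D line can be assigned to its own GPU thread. The abstract does not say which wavelet the decoder uses, so the NumPy sketch below assumes a single-level averaging Haar wavelet purely for illustration; the function names are hypothetical, and even image dimensions are assumed.

```python
import numpy as np


def haar_1d_forward(x, axis):
    """One level of an (averaging) Haar transform along one axis: [approx | detail]."""
    x = np.moveaxis(np.asarray(x, dtype=float), axis, -1)
    even, odd = x[..., 0::2], x[..., 1::2]
    out = np.concatenate([(even + odd) / 2, (even - odd) / 2], axis=-1)
    return np.moveaxis(out, -1, axis)


def haar_1d_inverse(y, axis):
    """Undo haar_1d_forward along one axis; each 1D line is independent."""
    y = np.moveaxis(np.asarray(y, dtype=float), axis, -1)
    half = y.shape[-1] // 2
    approx, detail = y[..., :half], y[..., half:]
    out = np.empty_like(y)
    out[..., 0::2] = approx + detail
    out[..., 1::2] = approx - detail
    return np.moveaxis(out, -1, axis)


def haar_2d_forward(img):
    return haar_1d_forward(haar_1d_forward(img, axis=1), axis=0)  # rows, then columns


def haar_2d_inverse(coeffs):
    # The 2D inverse is just two batches of independent 1D inverses:
    # every column, then every row (the reverse of the forward order).
    return haar_1d_inverse(haar_1d_inverse(coeffs, axis=0), axis=1)
```

A row `[4, 2, 6, 8]` transforms to averages `[3, 7]` followed by details `[1, -1]`, and the inverse restores the original samples exactly.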

  19. Implementing Molecular Dynamics for Hybrid High Performance Computers - 1. Short Range Forces

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Brown, W Michael; Wang, Peng; Plimpton, Steven J

    The use of accelerators such as general-purpose graphics processing units (GPGPUs) has become popular in scientific computing applications due to their low cost, impressive floating-point capabilities, high memory bandwidth, and low electrical power requirements. Hybrid high performance computers, machines with more than one type of floating-point processor, are now becoming more prevalent due to these advantages. In this work, we discuss several important issues in porting a large molecular dynamics code for use on parallel hybrid machines: 1) choosing a hybrid parallel decomposition that works on central processing units (CPUs) with distributed memory and accelerator cores with shared memory, 2) minimizing the amount of code that must be ported for efficient acceleration, 3) utilizing the available processing power from both many-core CPUs and accelerators, and 4) choosing a programming model for acceleration. We present our solution to each of these issues for short-range force calculation in the molecular dynamics package LAMMPS. We describe algorithms for efficient short-range force calculation on hybrid high performance machines. We describe a new approach for dynamic load balancing of work between CPU and accelerator cores. We describe the Geryon library that allows a single code to compile with both CUDA and OpenCL for use on a variety of accelerators. Finally, we present results on a parallel test cluster containing 32 Fermi GPGPUs and 180 CPU cores.

  20. Parallelization of elliptic solver for solving 1D Boussinesq model

    NASA Astrophysics Data System (ADS)

    Tarwidi, D.; Adytia, D.

    2018-03-01

    In this paper, a parallel implementation of an elliptic solver for the 1D Boussinesq model is presented. The numerical solution of the Boussinesq model is obtained by applying a staggered grid scheme to the continuity, momentum, and elliptic equations of the model. The tridiagonal system arising from the numerical scheme for the elliptic equation is solved by the cyclic reduction algorithm. The parallel implementation of cyclic reduction is executed on multicore processors with shared memory architectures using OpenMP. To measure the performance of the parallel program, the number of grid points is varied from 2^8 to 2^14. Two numerical test cases, the propagation of a solitary wave and of a standing wave, are used to evaluate the parallel program. The numerical results are verified against analytical solutions for the solitary and standing waves. The best speedups for the solitary and standing wave test cases are about 2.07 with 2^14 grid points and 1.86 with 2^13 grid points, respectively, both obtained with 8 threads. Moreover, the best efficiency of the parallel program is 76.2% and 73.5% for the solitary and standing wave test cases, respectively.
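    Cyclic reduction solves a tridiagonal system by combining each odd-indexed equation with its two neighbours, which eliminates half of the unknowns per level and leaves a system of half the size; the eliminated unknowns are then recovered by back substitution. Within each level all equations are independent, which is what makes the method suitable for OpenMP. In the Python/NumPy sketch below, whole-array operations stand in for the parallel loops. It is a generic textbook variant, not the authors' code; it accepts any n ≥ 1 (not only powers of two) and assumes a diagonally dominant system so that no pivoting is needed.

```python
import numpy as np


def cyclic_reduction(a, b, c, d):
    """Solve a[i]*x[i-1] + b[i]*x[i] + c[i]*x[i+1] = d[i] for i = 0..n-1.

    a[0] and c[n-1] are ignored. Assumes elimination without pivoting is
    safe (e.g. a diagonally dominant system).
    """
    a, b, c, d = (np.array(v, dtype=float) for v in (a, b, c, d))
    a[0] = 0.0
    c[-1] = 0.0
    n = b.size
    if n == 1:
        return d / b

    # Forward reduction: every odd-indexed equation absorbs its two neighbours,
    # eliminating the even-indexed unknowns. Each i is independent, which is
    # what a parallel loop (e.g. an OpenMP for) would distribute across threads.
    odd = np.arange(1, n, 2)
    lo = odd - 1
    hi = np.minimum(odd + 1, n - 1)       # clamped; masked out by gamma below
    alpha = -a[odd] / b[lo]
    gamma = np.where(odd + 1 < n, -c[odd] / b[hi], 0.0)
    a_red = alpha * a[lo]
    b_red = b[odd] + alpha * c[lo] + gamma * a[hi]
    c_red = gamma * c[hi]
    d_red = d[odd] + alpha * d[lo] + gamma * d[hi]

    x = np.zeros(n)
    x[odd] = cyclic_reduction(a_red, b_red, c_red, d_red)  # half-size system

    # Back substitution: recover even-indexed unknowns, again independently.
    even = np.arange(0, n, 2)
    left = np.maximum(even - 1, 0)        # a[0] == 0, so the clamp is harmless
    right = np.minimum(even + 1, n - 1)   # c[n-1] == 0, so the clamp is harmless
    x[even] = (d[even] - a[even] * x[left] - c[even] * x[right]) / b[even]
    return x
```

Comparing the result against a dense solver such as `np.linalg.solve` on the assembled matrix is an easy way to validate an implementation like this before parallelizing it.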

  1. Synthetic environment employing a craft for providing user perspective reference

    DOEpatents

    Maples, Creve; Peterson, Craig A.

    1997-10-21

    A multi-dimensional user oriented synthetic environment system allows application programs to be programmed and accessed with input/output device independent, generic functional commands which are a distillation of the actual functions performed by any application program. A shared memory structure allows the translation of device specific commands to device independent, generic functional commands. Complete flexibility of the mapping of synthetic environment data to the user is thereby allowed. Accordingly, synthetic environment data may be provided to the user on parallel user information processing channels allowing the subcognitive mind to act as a filter, eliminating irrelevant information and allowing the processing of increased amounts of data by the user. The user is further provided with a craft surrounding the user within the synthetic environment, which craft imparts important visual referential and motion parallax cues, enabling the user to better appreciate distances and directions within the synthetic environment. Display of this craft in close proximity to the user's point of perspective may be accomplished without substantially degrading the image resolution of the displayed portions of the synthetic environment.

  2. A warm and friendly memorial session for Helmut Oeschler

    NASA Astrophysics Data System (ADS)

    Cleymans, Jean; Hippolyte, Boris; Kalweit, Alexander; Müntz, Christian; Stroth, Joachim

    2018-02-01

    A full session was organized in memory of Helmut Oeschler during the 2017 edition of the Strangeness in Quark Matter Conference. It was heart-warming to discuss with the audience his main achievements and share anecdotes about this exceptionally praised and appreciated colleague, who was also a great friend for many at the conference. A brief summary of the session is provided with these proceedings.

  3. Feasibility study of current pulse induced 2-bit/4-state multilevel programming in phase-change memory

    NASA Astrophysics Data System (ADS)

    Liu, Yan; Fan, Xi; Chen, Houpeng; Wang, Yueqing; Liu, Bo; Song, Zhitang; Feng, Songlin

    2017-08-01

    Multilevel data storage in phase-change memory (PCM) has attracted increasing attention in the memory market as a way to implement high-capacity memory systems and reduce cost per bit. In this brief, we present a universal programming method using a SET staircase current pulse in PCM cells, which can exploit the optimum programming scheme to achieve 2-bit/4-state resistance levels with equal logarithmic intervals. The SET staircase waveform can be optimized by real-time TCAD simulation to realize multilevel data storage efficiently in an arbitrary phase-change material. Experimental results from a 1 k-bit PCM test chip validate the proposed multilevel programming scheme. The scheme improves information storage density, the robustness of resistance levels and energy efficiency, while avoiding process complexity.
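    Resistance levels with equal logarithmic intervals means that successive target resistances differ by a constant ratio, so the levels are evenly spaced on a log scale. The Python sketch below computes such targets for 2 bits (4 levels); the 10 kOhm to 1 MOhm window and the binary symbol ordering are hypothetical values, not figures from the paper.

```python
import math


def log_spaced_levels(r_min, r_max, n_levels=4):
    """Target resistances spaced by an equal ratio, i.e. equal steps in log10(R)."""
    ratio = (r_max / r_min) ** (1.0 / (n_levels - 1))
    return [r_min * ratio**k for k in range(n_levels)]


# Hypothetical window: 10 kOhm (fully SET) to 1 MOhm (RESET); 4 levels store 2 bits.
levels = log_spaced_levels(1e4, 1e6)
symbol_to_level = {format(k, "02b"): r for k, r in enumerate(levels)}
```

Within a 100× window, the ratio between adjacent levels is 100^(1/3) ≈ 4.64, i.e. each step is 2/3 of a decade.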

  4. Knowledge of memory functions in European and Asian American adults and children: the relation to autobiographical memory.

    PubMed

    Wang, Qi; Koh, Jessie Bee Kim; Song, Qingfang; Hou, Yubo

    2015-01-01

    This study investigated explicit knowledge of autobiographical memory functions using a newly developed questionnaire. European and Asian American adults (N = 57) and school-aged children (N = 68) indicated their agreement with 13 statements about why people think about and share memories pertaining to four broad functions: self, social, directive and emotion regulation. Children were interviewed for personal memories concurrently with the memory function knowledge assessment and again 3 months later. It was found that adults agreed to the self, social and directive purposes of memory to a greater extent than did children, whereas European American children agreed to the emotion regulation purposes of memory to a greater extent than did European American adults. Furthermore, European American children endorsed more self and emotion regulation functions than did Asian American children, whereas Asian American adults endorsed more directive functions than did European American adults. Children's endorsement of memory functions, particularly social functions, was associated with more detailed and personally meaningful memories. These findings are informative for the understanding of developmental and cultural influences on memory function knowledge and of the relation of such knowledge to autobiographical memory development.

  5. Remembering Nancy. 25 Members of the Montessori Community Share Their Reflections on the Death of the AMS Founder.

    ERIC Educational Resources Information Center

    Turner, Joy; And Others

    1995-01-01

    Twenty-five members of the Montessori community share their memories of Dr. Nancy McCormick Rambusch, charismatic founder of the American Montessori movement, early childhood professional, and innovative educator, who died of pancreatic cancer on October 27, 1994. Rambusch's work of 40 years now flowers as an institutionalized educational program…

  6. Shared Values as Anchors of a Learning Community: A Case Study in Information Systems Design

    ERIC Educational Resources Information Center

    Giordano, Daniela

    2004-01-01

    This paper examines the role in both individual and organizational learning of the system of values sustained by a community undertaking a design task. The discussion is based on the results of a longitudinal study of a community of novice information system designers supported by a Web-based shared design memory which allows reuse of design…

  7. Literacy outcomes of children with early childhood speech sound disorders: impact of endophenotypes.

    PubMed

    Lewis, Barbara A; Avrich, Allison A; Freebairn, Lisa A; Hansen, Amy J; Sucheston, Lara E; Kuo, Iris; Taylor, H Gerry; Iyengar, Sudha K; Stein, Catherine M

    2011-12-01

    To demonstrate that early childhood speech sound disorders (SSD) and later school-age reading, written expression, and spelling skills are influenced by shared endophenotypes that may be in part genetic. Children with SSD and their siblings were assessed at early childhood (ages 4-6 years) and followed at school age (7-12 years). The relationship of shared endophenotypes with early childhood SSD and school-age outcomes and the shared genetic influences on these outcomes were examined. Structural equation modeling demonstrated that oral motor skills, phonological awareness, phonological memory, vocabulary, and speeded naming have varying influences on reading decoding, spelling, spoken language, and written expression at school age. Genetic linkage studies demonstrated linkage for reading, spelling, and written expression measures to regions on chromosomes 1, 3, 6, and 15 that were previously linked to oral motor skills, articulation, phonological memory, and vocabulary at early childhood testing. Endophenotypes predict school-age literacy outcomes over and above that predicted by clinical diagnoses of SSD or language impairment. Findings suggest that these shared endophenotypes and common genetic influences affect early childhood SSD and later school-age reading, spelling, spoken language, and written expression skills.

  8. Literacy Outcomes of Children With Early Childhood Speech Sound Disorders: Impact of Endophenotypes

    PubMed Central

    Lewis, Barbara A.; Avrich, Allison A.; Freebairn, Lisa A.; Hansen, Amy J.; Sucheston, Lara E.; Kuo, Iris; Taylor, H. Gerry; Iyengar, Sudha K.; Stein, Catherine M.

    2012-01-01

    Purpose: To demonstrate that early childhood speech sound disorders (SSD) and later school-age reading, written expression, and spelling skills are influenced by shared endophenotypes that may be in part genetic. Method: Children with SSD and their siblings were assessed at early childhood (ages 4–6 years) and followed at school age (7–12 years). The relationship of shared endophenotypes with early childhood SSD and school-age outcomes and the shared genetic influences on these outcomes were examined. Results: Structural equation modeling demonstrated that oral motor skills, phonological awareness, phonological memory, vocabulary, and speeded naming have varying influences on reading decoding, spelling, spoken language, and written expression at school age. Genetic linkage studies demonstrated linkage for reading, spelling, and written expression measures to regions on chromosomes 1, 3, 6, and 15 that were previously linked to oral motor skills, articulation, phonological memory, and vocabulary at early childhood testing. Conclusions: Endophenotypes predict school-age literacy outcomes over and above that predicted by clinical diagnoses of SSD or language impairment. Findings suggest that these shared endophenotypes and common genetic influences affect early childhood SSD and later school-age reading, spelling, spoken language, and written expression skills. PMID:21930616

  9. A Biometric Latent Curve Analysis of Memory Decline in Older Men of the NAS-NRC Twin Registry

    PubMed Central

    McArdle, John J.; Plassman, Brenda L.

    2010-01-01

    Previous research has shown cognitive abilities to have different biometric patterns of age-changes. Here we examined the variation in episodic memory (Words Recalled) for over 6,000 twin pairs who were initially aged 59-75 and were subsequently re-assessed up to three more times over 12 years. In cross-sectional analyses, variation in Education was explained by strong additive genetic influences (~43%) together with shared family influences (~35%) that were independent of age. The longitudinal phenotypic analysis of the Word Recall task showed systematic linear declines over age, but with positive influences of Education and Retesting. The longitudinal biometric estimation yielded: (a) a separation of non-shared environmental influences and transient measurement error (~50%); (b) strong additive genetic components of this latent curve (~70% at age 60), with increases over age that reach about 90% by age 90; (c) minor influences of shared family environment (~17% at age 60) that were effectively eliminated by age 75; and (d) non-shared environmental effects that play an important role over most of the life-span (peak of 42% at age 70) but whose relative role diminishes after age 75. PMID:19404731

  10. Method for programming a flash memory

    DOEpatents

    Brosky, Alexander R.; Locke, William N.; Maher, Conrado M.

    2016-08-23

    A method of programming a flash memory is described. The method includes partitioning a flash memory into a first group having a first level of write-protection, a second group having a second level of write-protection, and a third group having a third level of write-protection. The write-protection of the second and third groups is disabled using an installation adapter. The third group is programmed using a Software Installation Device.

  11. Efficient accesses of data structures using processing near memory

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Jayasena, Nuwan S.; Zhang, Dong Ping; Diez, Paula Aguilera

    Systems, apparatuses, and methods for implementing efficient queues and other data structures. A queue may be shared among multiple processors and/or threads without using explicit software atomic instructions to coordinate access to the queue. System software may allocate an atomic queue and corresponding queue metadata in system memory and return, to the requesting thread, a handle referencing the queue metadata. Any number of threads may utilize the handle for accessing the atomic queue. The logic for ensuring the atomicity of accesses to the atomic queue may reside in a management unit in the memory controller coupled to the memory where the atomic queue is allocated.
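    As a rough illustration of the handle-based access pattern described above, the following Python sketch models the memory controller's management unit in software. The class ManagementUnit, its methods, the integer handles, and the use of a lock to stand in for the controller's hardware serialization are all hypothetical assumptions made for illustration, not the patented design. The point it shows is that the calling threads hold only a handle and never issue atomic operations themselves; atomicity is enforced in one central place.

```python
import threading
from collections import deque


class ManagementUnit:
    """Hypothetical software model of memory-controller logic that owns atomicity.

    ASSUMPTION: the internal lock stands in for the hardware serializing
    queue requests; callers never perform atomic operations themselves.
    """

    def __init__(self):
        self._lock = threading.Lock()
        self._queues = {}       # handle -> queue storage (the "metadata" in this model)
        self._next_handle = 0

    def allocate_queue(self):
        """Allocate a queue and return an opaque integer handle to it."""
        with self._lock:
            handle = self._next_handle
            self._next_handle += 1
            self._queues[handle] = deque()
            return handle

    def enqueue(self, handle, item):
        """Append an item; serialized centrally, not by the caller."""
        with self._lock:
            self._queues[handle].append(item)

    def dequeue(self, handle):
        """Remove and return the oldest item, or None if the queue is empty."""
        with self._lock:
            q = self._queues[handle]
            return q.popleft() if q else None


def producer(unit, handle, start, count):
    # A thread that only knows the handle; it uses no locks or atomics.
    for i in range(start, start + count):
        unit.enqueue(handle, i)


if __name__ == "__main__":
    unit = ManagementUnit()
    h = unit.allocate_queue()
    workers = [threading.Thread(target=producer, args=(unit, h, t * 1000, 1000))
               for t in range(4)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    drained = []
    while (item := unit.dequeue(h)) is not None:
        drained.append(item)
    print(len(drained))  # 4000
```

    Because every request passes through the same serialization point, each producer's items keep their relative order, while the interleaving across producers is not fixed. In the hardware scheme the abstract describes, that serialization point would be the memory controller rather than a software lock.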

  12. Shared reality in interpersonal relationships.

    PubMed

    Andersen, Susan M; Przybylinski, Elizabeth

    2017-11-24

    Close relationships afford us opportunities to create and maintain meaning systems as shared perceptions of ourselves and the world. Establishing a sense of mutual understanding allows for creating and maintaining lasting social bonds, and as such, is important in human relations. In a related vein, it has long been known that knowledge of significant others in one's life is stored in memory and evoked with new persons (in the social-cognitive process of 'transference'), imbuing new encounters with significance and leading to predictable cognitive, evaluative, motivational, and behavioral consequences, as well as shifts in the self and self-regulation, depending on the particular significant other evoked. In these pages, we briefly review the literature on meaning as interpersonally defined and then selectively review research on transference in interpersonal perception. Based on this, we then highlight a recent series of studies focused on shared meaning systems in transference. The highlighted studies show that values and beliefs that develop in close relationships (as shared reality) are linked in memory to significant-other knowledge, and thus, are indirectly activated (made accessible) when cues in a new person implicitly activate that significant-other knowledge (in transference), with these shared beliefs then actively pursued with the new person and even protected against threat. This also confers a sense of mutual understanding, and all told, serves both relational and epistemic functions. In concluding, we consider as well the relevance of co-construction of shared reality in such processes. Copyright © 2017 Elsevier Ltd. All rights reserved.

  13. 76 FR 55000 - Notice of Agricultural Management Assistance Organic Certification Cost-Share Program

    Federal Register 2010, 2011, 2012, 2013, 2014

    2011-09-06

    ...] Notice of Agricultural Management Assistance Organic Certification Cost-Share Program AGENCY... Departments of Agriculture for the Agricultural Management Assistance Organic Certification Cost-Share Program... organic certification cost-share funds. The AMS has allocated $1.5 million for this organic certification...

  14. 78 FR 5164 - Notice of Agricultural Management Assistance Organic Certification Cost-Share Program

    Federal Register 2010, 2011, 2012, 2013, 2014

    2013-01-24

    ...] Notice of Agricultural Management Assistance Organic Certification Cost-Share Program AGENCY... Departments of Agriculture for the Agricultural Management Assistance Organic Certification Cost-Share Program... organic certification cost-share funds. The AMS has allocated $1.425 million for this organic...

  15. Memory bias for negative emotional words in recognition memory is driven by effects of category membership

    PubMed Central

    White, Corey N.; Kapucu, Aycan; Bruno, Davide; Rotello, Caren M.; Ratcliff, Roger

    2014-01-01

    Recognition memory studies often find that emotional items are more likely than neutral items to be labeled as studied. Previous work suggests this bias is driven by increased memory strength/familiarity for emotional items. We explored strength and bias interpretations of this effect with the conjecture that emotional stimuli might seem more familiar because they share features with studied items from the same category. Categorical effects were manipulated in a recognition task by presenting lists with a small, medium, or large proportion of emotional words. The liberal memory bias for emotional words was only observed when a medium or large proportion of categorized words were presented in the lists. Similar, though weaker, effects were observed with categorized words that were not emotional (animal names). These results suggest that liberal memory bias for emotional items may be largely driven by effects of category membership. PMID:24303902

  16. Memory: Enduring Traces of Perceptual and Reflective Attention

    PubMed Central

    Chun, Marvin M.; Johnson, Marcia K.

    2011-01-01

    Attention and memory are typically studied as separate topics, but they are highly intertwined. Here we discuss the relation between memory and two fundamental types of attention: perceptual and reflective. Memory is the persisting consequence of cognitive activities initiated by and/or focused on external information from the environment (perceptual attention) and initiated by and/or focused on internal mental representations (reflective attention). We consider three key questions for advancing a cognitive neuroscience of attention and memory: To what extent do perception and reflection share representational areas? To what extent are the control processes that select, maintain, and manipulate perceptual and reflective information subserved by common areas and networks? During perception and reflection, to what extent are common areas responsible for binding features together to create complex, episodic memories and for reviving them later? Considering similarities and differences in perceptual and reflective attention helps integrate a broad range of findings and raises important unresolved issues. PMID:22099456

  17. Memory bias for negative emotional words in recognition memory is driven by effects of category membership.

    PubMed

    White, Corey N; Kapucu, Aycan; Bruno, Davide; Rotello, Caren M; Ratcliff, Roger

    2014-01-01

    Recognition memory studies often find that emotional items are more likely than neutral items to be labelled as studied. Previous work suggests this bias is driven by increased memory strength/familiarity for emotional items. We explored strength and bias interpretations of this effect with the conjecture that emotional stimuli might seem more familiar because they share features with studied items from the same category. Categorical effects were manipulated in a recognition task by presenting lists with a small, medium or large proportion of emotional words. The liberal memory bias for emotional words was only observed when a medium or large proportion of categorised words were presented in the lists. Similar, though weaker, effects were observed with categorised words that were not emotional (animal names). These results suggest that liberal memory bias for emotional items may be largely driven by effects of category membership.

  18. Working Memory and Parent-Rated Components of Attention in Middle Childhood: A Behavioral Genetic Study

    PubMed Central

    Deater-Deckard, Kirby; Cutting, Laurie; Thompson, Lee A.; Petrill, Stephen A.

    2012-01-01

    The purpose of the current study was to investigate potential genetic and environmental correlations between working memory and three behavioral aspects of the attention network (i.e., executive, alerting, and orienting) using a twin design. Data were from 90 monozygotic (39% male) and 112 same-sex dizygotic (41% male) twins. Individual differences in working memory performance (digit span) and parent-rated measures of executive, alerting, and orienting attention included modest to moderate genetic variance, modest shared environmental variance, and modest to moderate nonshared environmental variance. As hypothesized, working memory performance was correlated with executive and alerting attention, but not orienting attention. The correlation between working memory, executive attention, and alerting attention was completely accounted for by overlapping genetic covariance, suggesting a common genetic mechanism or mechanisms underlying the links between working memory and certain parent-rated indicators of attentive behavior. PMID:21948215

  19. Developmental reversals in false memory: Effects of emotional valence and arousal.

    PubMed

    Brainerd, C J; Holliday, R E; Reyna, V F; Yang, Y; Toglia, M P

    2010-10-01

    Do the emotional valence and arousal of events distort children's memories? Do valence and arousal modulate counterintuitive age increases in false memory? We investigated those questions in children, adolescents, and adults using the Cornell/Cortland Emotion Lists, a word list pool that induces false memories and in which valence and arousal can be manipulated factorially. False memories increased with age for unpresented semantic associates of word lists, and net accuracy (the ratio of true memory to total memory) decreased with age. These surprising developmental trends were more pronounced for negatively valenced materials than for positively valenced materials, they were more pronounced for high-arousal materials than for low-arousal materials, and developmental increases in the effects of arousal were small in comparison with developmental increases in the effects of valence. These findings have ramifications for legal applications of false memory research; materials that share the emotional hallmark of crimes (events that are negatively valenced and arousing) produced the largest age increases in false memory and the largest age declines in net accuracy. Copyright 2010 Elsevier Inc. All rights reserved.

  20. Spatial working memory interferes with explicit, but not probabilistic cuing of spatial attention.

    PubMed

    Won, Bo-Yeong; Jiang, Yuhong V

    2015-05-01

    Recent empirical and theoretical work has depicted a close relationship between visual attention and visual working memory. For example, rehearsal in spatial working memory depends on spatial attention, whereas adding a secondary spatial working memory task impairs attentional deployment in visual search. These findings have led to the proposal that working memory is attention directed toward internal representations. Here, we show that the close relationship between these 2 constructs is limited to some but not all forms of spatial attention. In 5 experiments, participants held color arrays, dot locations, or a sequence of dots in working memory. During the memory retention interval, they performed a T-among-L visual search task. Crucially, the probable target location was cued either implicitly through location probability learning or explicitly with a central arrow or verbal instruction. Our results showed that whereas imposing a visual working memory load diminished the effectiveness of explicit cuing, it did not interfere with probability cuing. We conclude that spatial working memory shares similar mechanisms with explicit, goal-driven attention but is dissociated from implicitly learned attention. (c) 2015 APA, all rights reserved.
