distributed shared memory: Topics by Science.gov

Sample records for distributed shared memory

Shared versus distributed memory multiprocessors

NASA Technical Reports Server (NTRS)

Jordan, Harry F.

1991-01-01

The question of whether multiprocessors should have shared or distributed memory has attracted a great deal of attention. Some researchers argue strongly for building distributed memory machines, while others argue just as strongly for programming shared memory multiprocessors. A great deal of research is underway on both types of parallel systems. Special emphasis is placed on systems with a very large number of processors for computation intensive tasks and considers research and implementation trends. It appears that the two types of systems will likely converge to a common form for large scale multiprocessors.
Distributed simulation using a real-time shared memory network

NASA Technical Reports Server (NTRS)

Simon, Donald L.; Mattern, Duane L.; Wong, Edmond; Musgrave, Jeffrey L.

1993-01-01

The Advanced Control Technology Branch of the NASA Lewis Research Center performs research in the area of advanced digital controls for aeronautic and space propulsion systems. This work requires the real-time implementation of both control software and complex dynamical models of the propulsion system. We are implementing these systems in a distributed, multi-vendor computer environment. Therefore, a need exists for real-time communication and synchronization between the distributed multi-vendor computers. A shared memory network is a potential solution which offers several advantages over other real-time communication approaches. A candidate shared memory network was tested for basic performance. The shared memory network was then used to implement a distributed simulation of a ramjet engine. The accuracy and execution time of the distributed simulation was measured and compared to the performance of the non-partitioned simulation. The ease of partitioning the simulation, the minimal time required to develop for communication between the processors and the resulting execution time all indicate that the shared memory network is a real-time communication technique worthy of serious consideration.
Supporting shared data structures on distributed memory architectures

NASA Technical Reports Server (NTRS)

Koelbel, Charles; Mehrotra, Piyush; Vanrosendale, John

1990-01-01

Programming nonshared memory systems is more difficult than programming shared memory systems, since there is no support for shared data structures. Current programming languages for distributed memory architectures force the user to decompose all data structures into separate pieces, with each piece owned by one of the processors in the machine, and with all communication explicitly specified by low-level message-passing primitives. A new programming environment is presented for distributed memory architectures, providing a global name space and allowing direct access to remote parts of data values. The analysis and program transformations required to implement this environment are described, and the efficiency of the resulting code on the NCUBE/7 and IPSC/2 hypercubes are described.
Implementation of a parallel unstructured Euler solver on shared and distributed memory architectures

NASA Technical Reports Server (NTRS)

Mavriplis, D. J.; Das, Raja; Saltz, Joel; Vermeland, R. E.

1992-01-01

An efficient three dimensional unstructured Euler solver is parallelized on a Cray Y-MP C90 shared memory computer and on an Intel Touchstone Delta distributed memory computer. This paper relates the experiences gained and describes the software tools and hardware used in this study. Performance comparisons between two differing architectures are made.
Combining Distributed and Shared Memory Models: Approach and Evolution of the Global Arrays Toolkit

DOE Office of Scientific and Technical Information (OSTI.GOV)

Nieplocha, Jarek; Harrison, Robert J.; Kumar, Mukul

2002-07-29

Both shared memory and distributed memory models have advantages and shortcomings. Shared memory model is much easier to use but it ignores data locality/placement. Given the hierarchical nature of the memory subsystems in the modern computers this characteristic might have a negative impact on performance and scalability. Various techniques, such as code restructuring to increase data reuse and introducing blocking in data accesses, can address the problem and yield performance competitive with message passing[Singh], however at the cost of compromising the ease of use feature. Distributed memory models such as message passing or one-sided communication offer performance and scalability butmore » they compromise the ease-of-use. In this context, the message-passing model is sometimes referred to as?assembly programming for the scientific computing?. The Global Arrays toolkit[GA1, GA2] attempts to offer the best features of both models. It implements a shared-memory programming model in which data locality is managed explicitly by the programmer. This management is achieved by explicit calls to functions that transfer data between a global address space (a distributed array) and local storage. In this respect, the GA model has similarities to the distributed shared-memory models that provide an explicit acquire/release protocol. However, the GA model acknowledges that remote data is slower to access than local data and allows data locality to be explicitly specified and hence managed. The GA model exposes to the programmer the hierarchical memory of modern high-performance computer systems, and by recognizing the communication overhead for remote data transfer, it promotes data reuse and locality of reference. This paper describes the characteristics of the Global Arrays programming model, capabilities of the toolkit, and discusses its evolution.« less
Mnemonic transmission, social contagion, and emergence of collective memory: Influence of emotional valence, group structure, and information distribution.

PubMed

Choi, Hae-Yoon; Kensinger, Elizabeth A; Rajaram, Suparna

2017-09-01

Social transmission of memory and its consequence on collective memory have generated enduring interdisciplinary interest because of their widespread significance in interpersonal, sociocultural, and political arenas. We tested the influence of 3 key factors-emotional salience of information, group structure, and information distribution-on mnemonic transmission, social contagion, and collective memory. Participants individually studied emotionally salient (negative or positive) and nonemotional (neutral) picture-word pairs that were completely shared, partially shared, or unshared within participant triads, and then completed 3 consecutive recalls in 1 of 3 conditions: individual-individual-individual (control), collaborative-collaborative (identical group; insular structure)-individual, and collaborative-collaborative (reconfigured group; diverse structure)-individual. Collaboration enhanced negative memories especially in insular group structure and especially for shared information, and promoted collective forgetting of positive memories. Diverse group structure reduced this negativity effect. Unequally distributed information led to social contagion that creates false memories; diverse structure propagated a greater variety of false memories whereas insular structure promoted confidence in false recognition and false collective memory. A simultaneous assessment of network structure, information distribution, and emotional valence breaks new ground to specify how network structure shapes the spread of negative memories and false memories, and the emergence of collective memory. (PsycINFO Database Record (c) 2017 APA, all rights reserved).
Address tracing for parallel machines

NASA Technical Reports Server (NTRS)

Stunkel, Craig B.; Janssens, Bob; Fuchs, W. Kent

1991-01-01

Recently implemented parallel system address-tracing methods based on several metrics are surveyed. The issues specific to collection of traces for both shared and distributed memory parallel computers are highlighted. Five general categories of address-trace collection methods are examined: hardware-captured, interrupt-based, simulation-based, altered microcode-based, and instrumented program-based traces. The problems unique to shared memory and distributed memory multiprocessors are examined separately.
Comparison of two paradigms for distributed shared memory

DOE Office of Scientific and Technical Information (OSTI.GOV)

Levelt, W.G.; Kaashoek, M.F.; Bal, H.E.

1990-08-01

The paper compares two paradigms for Distributed Shared Memory on loosely coupled computing systems: the shared data-object model as used in Orca, a programming language specially designed for loosely coupled computing systems and the Shared Virtual Memory model. For both paradigms the authors have implemented two systems, one using only point-to-point messages, the other using broadcasting as well. They briefly describe these two paradigms and their implementations. Then they compare their performance on four applications: the traveling salesman problem, alpha-beta search, matrix multiplication and the all pairs shortest paths problem. The measurements show that both paradigms can be used efficientlymore » for programming large-grain parallel applications. Significant speedups were obtained on all applications. The unstructured Shared Virtual Memory paradigm achieves the best absolute performance, although this is largely due to the preliminary nature of the Orca compiler used. The structured shared data-object model achieves the highest speedups and is much easier to program and to debug.« less
Parallel computing for probabilistic fatigue analysis

NASA Technical Reports Server (NTRS)

Sues, Robert H.; Lua, Yuan J.; Smith, Mark D.

1993-01-01

This paper presents the results of Phase I research to investigate the most effective parallel processing software strategies and hardware configurations for probabilistic structural analysis. We investigate the efficiency of both shared and distributed-memory architectures via a probabilistic fatigue life analysis problem. We also present a parallel programming approach, the virtual shared-memory paradigm, that is applicable across both types of hardware. Using this approach, problems can be solved on a variety of parallel configurations, including networks of single or multiprocessor workstations. We conclude that it is possible to effectively parallelize probabilistic fatigue analysis codes; however, special strategies will be needed to achieve large-scale parallelism to keep large number of processors busy and to treat problems with the large memory requirements encountered in practice. We also conclude that distributed-memory architecture is preferable to shared-memory for achieving large scale parallelism; however, in the future, the currently emerging hybrid-memory architectures will likely be optimal.
Performance Modeling and Measurement of Parallelized Code for Distributed Shared Memory Multiprocessors

NASA Technical Reports Server (NTRS)

Waheed, Abdul; Yan, Jerry

1998-01-01

This paper presents a model to evaluate the performance and overhead of parallelizing sequential code using compiler directives for multiprocessing on distributed shared memory (DSM) systems. With increasing popularity of shared address space architectures, it is essential to understand their performance impact on programs that benefit from shared memory multiprocessing. We present a simple model to characterize the performance of programs that are parallelized using compiler directives for shared memory multiprocessing. We parallelized the sequential implementation of NAS benchmarks using native Fortran77 compiler directives for an Origin2000, which is a DSM system based on a cache-coherent Non Uniform Memory Access (ccNUMA) architecture. We report measurement based performance of these parallelized benchmarks from four perspectives: efficacy of parallelization process; scalability; parallelization overhead; and comparison with hand-parallelized and -optimized version of the same benchmarks. Our results indicate that sequential programs can conveniently be parallelized for DSM systems using compiler directives but realizing performance gains as predicted by the performance model depends primarily on minimizing architecture-specific data locality overhead.
Effects of cacheing on multitasking efficiency and programming strategy on an ELXSI 6400

DOE Office of Scientific and Technical Information (OSTI.GOV)

Montry, G.R.; Benner, R.E.

1985-12-01

The impact of a cache/shared memory architecture, and, in particular, the cache coherency problem, upon concurrent algorithm and program development is discussed. In this context, a simple set of programming strategies are proposed which streamline code development and improve code performance when multitasking in a cache/shared memory or distributed memory environment.
Memory access in shared virtual memory

DOE Office of Scientific and Technical Information (OSTI.GOV)

Berrendorf, R.

1992-01-01

Shared virtual memory (SVM) is a virtual memory layer with a single address space on top of a distributed real memory on parallel computers. We examine the behavior and performance of SVM running a parallel program with medium-grained, loop-level parallelism on top of it. A simulator for the underlying parallel architecture can be used to examine the behavior of SVM more deeply. The influence of several parameters, such as the number of processors, page size, cold or warm start, and restricted page replication, is studied.
Memory access in shared virtual memory

DOE Office of Scientific and Technical Information (OSTI.GOV)

Berrendorf, R.

1992-09-01

Shared virtual memory (SVM) is a virtual memory layer with a single address space on top of a distributed real memory on parallel computers. We examine the behavior and performance of SVM running a parallel program with medium-grained, loop-level parallelism on top of it. A simulator for the underlying parallel architecture can be used to examine the behavior of SVM more deeply. The influence of several parameters, such as the number of processors, page size, cold or warm start, and restricted page replication, is studied.
Contention Modeling for Multithreaded Distributed Shared Memory Machines: The Cray XMT

DOE Office of Scientific and Technical Information (OSTI.GOV)

Secchi, Simone; Tumeo, Antonino; Villa, Oreste

Distributed Shared Memory (DSM) machines are a wide class of multi-processor computing systems where a large virtually-shared address space is mapped on a network of physically distributed memories. High memory latency and network contention are two of the main factors that limit performance scaling of such architectures. Modern high-performance computing DSM systems have evolved toward exploitation of massive hardware multi-threading and fine-grained memory hashing to tolerate irregular latencies, avoid network hot-spots and enable high scaling. In order to model the performance of such large-scale machines, parallel simulation has been proved to be a promising approach to achieve good accuracy inmore » reasonable times. One of the most critical factors in solving the simulation speed-accuracy trade-off is network modeling. The Cray XMT is a massively multi-threaded supercomputing architecture that belongs to the DSM class, since it implements a globally-shared address space abstraction on top of a physically distributed memory substrate. In this paper, we discuss the development of a contention-aware network model intended to be integrated in a full-system XMT simulator. We start by measuring the effects of network contention in a 128-processor XMT machine and then investigate the trade-off that exists between simulation accuracy and speed, by comparing three network models which operate at different levels of accuracy. The comparison and model validation is performed by executing a string-matching algorithm on the full-system simulator and on the XMT, using three datasets that generate noticeably different contention patterns.« less
Vascular system modeling in parallel environment - distributed and shared memory approaches

PubMed Central

Jurczuk, Krzysztof; Kretowski, Marek; Bezy-Wendling, Johanne

2011-01-01

The paper presents two approaches in parallel modeling of vascular system development in internal organs. In the first approach, new parts of tissue are distributed among processors and each processor is responsible for perfusing its assigned parts of tissue to all vascular trees. Communication between processors is accomplished by passing messages and therefore this algorithm is perfectly suited for distributed memory architectures. The second approach is designed for shared memory machines. It parallelizes the perfusion process during which individual processing units perform calculations concerning different vascular trees. The experimental results, performed on a computing cluster and multi-core machines, show that both algorithms provide a significant speedup. PMID:21550891
Discrete-Slots Models of Visual Working-Memory Response Times

PubMed Central

Donkin, Christopher; Nosofsky, Robert M.; Gold, Jason M.; Shiffrin, Richard M.

2014-01-01

Much recent research has aimed to establish whether visual working memory (WM) is better characterized by a limited number of discrete all-or-none slots or by a continuous sharing of memory resources. To date, however, researchers have not considered the response-time (RT) predictions of discrete-slots versus shared-resources models. To complement the past research in this field, we formalize a family of mixed-state, discrete-slots models for explaining choice and RTs in tasks of visual WM change detection. In the tasks under investigation, a small set of visual items is presented, followed by a test item in 1 of the studied positions for which a change judgment must be made. According to the models, if the studied item in that position is retained in 1 of the discrete slots, then a memory-based evidence-accumulation process determines the choice and the RT; if the studied item in that position is missing, then a guessing-based accumulation process operates. Observed RT distributions are therefore theorized to arise as probabilistic mixtures of the memory-based and guessing distributions. We formalize an analogous set of continuous shared-resources models. The model classes are tested on individual subjects with both qualitative contrasts and quantitative fits to RT-distribution data. The discrete-slots models provide much better qualitative and quantitative accounts of the RT and choice data than do the shared-resources models, although there is some evidence for “slots plus resources” when memory set size is very small. PMID:24015956
Parallel processing for scientific computations

NASA Technical Reports Server (NTRS)

Alkhatib, Hasan S.

1995-01-01

The scope of this project dealt with the investigation of the requirements to support distributed computing of scientific computations over a cluster of cooperative workstations. Various experiments on computations for the solution of simultaneous linear equations were performed in the early phase of the project to gain experience in the general nature and requirements of scientific applications. A specification of a distributed integrated computing environment, DICE, based on a distributed shared memory communication paradigm has been developed and evaluated. The distributed shared memory model facilitates porting existing parallel algorithms that have been designed for shared memory multiprocessor systems to the new environment. The potential of this new environment is to provide supercomputing capability through the utilization of the aggregate power of workstations cooperating in a cluster interconnected via a local area network. Workstations, generally, do not have the computing power to tackle complex scientific applications, making them primarily useful for visualization, data reduction, and filtering as far as complex scientific applications are concerned. There is a tremendous amount of computing power that is left unused in a network of workstations. Very often a workstation is simply sitting idle on a desk. A set of tools can be developed to take advantage of this potential computing power to create a platform suitable for large scientific computations. The integration of several workstations into a logical cluster of distributed, cooperative, computing stations presents an alternative to shared memory multiprocessor systems. In this project we designed and evaluated such a system.
Memory Network For Distributed Data Processors

NASA Technical Reports Server (NTRS)

Bolen, David; Jensen, Dean; Millard, ED; Robinson, Dave; Scanlon, George

1992-01-01

Universal Memory Network (UMN) is modular, digital data-communication system enabling computers with differing bus architectures to share 32-bit-wide data between locations up to 3 km apart with less than one millisecond of latency. Makes it possible to design sophisticated real-time and near-real-time data-processing systems without data-transfer "bottlenecks". This enterprise network permits transmission of volume of data equivalent to an encyclopedia each second. Facilities benefiting from Universal Memory Network include telemetry stations, simulation facilities, power-plants, and large laboratories or any facility sharing very large volumes of data. Main hub of UMN is reflection center including smaller hubs called Shared Memory Interfaces.
Rapid solution of large-scale systems of equations

NASA Technical Reports Server (NTRS)

Storaasli, Olaf O.

1994-01-01

The analysis and design of complex aerospace structures requires the rapid solution of large systems of linear and nonlinear equations, eigenvalue extraction for buckling, vibration and flutter modes, structural optimization and design sensitivity calculation. Computers with multiple processors and vector capabilities can offer substantial computational advantages over traditional scalar computer for these analyses. These computers fall into two categories: shared memory computers and distributed memory computers. This presentation covers general-purpose, highly efficient algorithms for generation/assembly or element matrices, solution of systems of linear and nonlinear equations, eigenvalue and design sensitivity analysis and optimization. All algorithms are coded in FORTRAN for shared memory computers and many are adapted to distributed memory computers. The capability and numerical performance of these algorithms will be addressed.
Multiprocessor shared-memory information exchange

DOE Office of Scientific and Technical Information (OSTI.GOV)

Santoline, L.L.; Bowers, M.D.; Crew, A.W.

1989-02-01

In distributed microprocessor-based instrumentation and control systems, the inter-and intra-subsystem communication requirements ultimately form the basis for the overall system architecture. This paper describes a software protocol which addresses the intra-subsystem communications problem. Specifically the protocol allows for multiple processors to exchange information via a shared-memory interface. The authors primary goal is to provide a reliable means for information to be exchanged between central application processor boards (masters) and dedicated function processor boards (slaves) in a single computer chassis. The resultant Multiprocessor Shared-Memory Information Exchange (MSMIE) protocol, a standard master-slave shared-memory interface suitable for use in nuclear safety systems, ismore » designed to pass unidirectional buffers of information between the processors while providing a minimum, deterministic cycle time for this data exchange.« less

Shared prefetching to reduce execution skew in multi-threaded systems

DOE Office of Scientific and Technical Information (OSTI.GOV)

Eichenberger, Alexandre E; Gunnels, John A

Mechanisms are provided for optimizing code to perform prefetching of data into a shared memory of a computing device that is shared by a plurality of threads that execute on the computing device. A memory stream of a portion of code that is shared by the plurality of threads is identified. A set of prefetch instructions is distributed across the plurality of threads. Prefetch instructions are inserted into the instruction sequences of the plurality of threads such that each instruction sequence has a separate sub-portion of the set of prefetch instructions, thereby generating optimized code. Executable code is generated basedmore » on the optimized code and stored in a storage device. The executable code, when executed, performs the prefetches associated with the distributed set of prefetch instructions in a shared manner across the plurality of threads.« less
Technical support for digital systems technology development. Task order 1: ISP contention analysis and control

NASA Technical Reports Server (NTRS)

Stehle, Roy H.; Ogier, Richard G.

1993-01-01

Alternatives for realizing a packet-based network switch for use on a frequency division multiple access/time division multiplexed (FDMA/TDM) geostationary communication satellite were investigated. Each of the eight downlink beams supports eight directed dwells. The design needed to accommodate multicast packets with very low probability of loss due to contention. Three switch architectures were designed and analyzed. An output-queued, shared bus system yielded a functionally simple system, utilizing a first-in, first-out (FIFO) memory per downlink dwell, but at the expense of a large total memory requirement. A shared memory architecture offered the most efficiency in memory requirements, requiring about half the memory of the shared bus design. The processing requirement for the shared-memory system adds system complexity that may offset the benefits of the smaller memory. An alternative design using a shared memory buffer per downlink beam decreases circuit complexity through a distributed design, and requires at most 1000 packets of memory more than the completely shared memory design. Modifications to the basic packet switch designs were proposed to accommodate circuit-switched traffic, which must be served on a periodic basis with minimal delay. Methods for dynamically controlling the downlink dwell lengths were developed and analyzed. These methods adapt quickly to changing traffic demands, and do not add significant complexity or cost to the satellite and ground station designs. Methods for reducing the memory requirement by not requiring the satellite to store full packets were also proposed and analyzed. In addition, optimal packet and dwell lengths were computed as functions of memory size for the three switch architectures.
System and method for programmable bank selection for banked memory subsystems

DOEpatents

Blumrich, Matthias A.; Chen, Dong; Gara, Alan G.; Giampapa, Mark E.; Hoenicke, Dirk; Ohmacht, Martin; Salapura, Valentina; Sugavanam, Krishnan

2010-09-07

A programmable memory system and method for enabling one or more processor devices access to shared memory in a computing environment, the shared memory including one or more memory storage structures having addressable locations for storing data. The system comprises: one or more first logic devices associated with a respective one or more processor devices, each first logic device for receiving physical memory address signals and programmable for generating a respective memory storage structure select signal upon receipt of pre-determined address bit values at selected physical memory address bit locations; and, a second logic device responsive to each of the respective select signal for generating an address signal used for selecting a memory storage structure for processor access. The system thus enables each processor device of a computing environment memory storage access distributed across the one or more memory storage structures.
Distributed-Memory Fast Maximal Independent Set

DOE Office of Scientific and Technical Information (OSTI.GOV)

Kanewala Appuhamilage, Thejaka Amila J.; Zalewski, Marcin J.; Lumsdaine, Andrew

The Maximal Independent Set (MIS) graph problem arises in many applications such as computer vision, information theory, molecular biology, and process scheduling. The growing scale of MIS problems suggests the use of distributed-memory hardware as a cost-effective approach to providing necessary compute and memory resources. Luby proposed four randomized algorithms to solve the MIS problem. All those algorithms are designed focusing on shared-memory machines and are analyzed using the PRAM model. These algorithms do not have direct efficient distributed-memory implementations. In this paper, we extend two of Luby’s seminal MIS algorithms, “Luby(A)” and “Luby(B),” to distributed-memory execution, and we evaluatemore » their performance. We compare our results with the “Filtered MIS” implementation in the Combinatorial BLAS library for two types of synthetic graph inputs.« less
Vienna FORTRAN: A FORTRAN language extension for distributed memory multiprocessors

NASA Technical Reports Server (NTRS)

Chapman, Barbara; Mehrotra, Piyush; Zima, Hans

1991-01-01

Exploiting the performance potential of distributed memory machines requires a careful distribution of data across the processors. Vienna FORTRAN is a language extension of FORTRAN which provides the user with a wide range of facilities for such mapping of data structures. However, programs in Vienna FORTRAN are written using global data references. Thus, the user has the advantage of a shared memory programming paradigm while explicitly controlling the placement of data. The basic features of Vienna FORTRAN are presented along with a set of examples illustrating the use of these features.
Programming distributed memory architectures using Kali

NASA Technical Reports Server (NTRS)

Mehrotra, Piyush; Vanrosendale, John

1990-01-01

Programming nonshared memory systems is more difficult than programming shared memory systems, in part because of the relatively low level of current programming environments for such machines. A new programming environment is presented, Kali, which provides a global name space and allows direct access to remote data values. In order to retain efficiency, Kali provides a system on annotations, allowing the user to control those aspects of the program critical to performance, such as data distribution and load balancing. The primitives and constructs provided by the language is described, and some of the issues raised in translating a Kali program for execution on distributed memory systems are also discussed.
Implementation of Parallel Dynamic Simulation on Shared-Memory vs. Distributed-Memory Environments

DOE Office of Scientific and Technical Information (OSTI.GOV)

Jin, Shuangshuang; Chen, Yousu; Wu, Di

2015-12-09

Power system dynamic simulation computes the system response to a sequence of large disturbance, such as sudden changes in generation or load, or a network short circuit followed by protective branch switching operation. It consists of a large set of differential and algebraic equations, which is computational intensive and challenging to solve using single-processor based dynamic simulation solution. High-performance computing (HPC) based parallel computing is a very promising technology to speed up the computation and facilitate the simulation process. This paper presents two different parallel implementations of power grid dynamic simulation using Open Multi-processing (OpenMP) on shared-memory platform, and Messagemore » Passing Interface (MPI) on distributed-memory clusters, respectively. The difference of the parallel simulation algorithms and architectures of the two HPC technologies are illustrated, and their performances for running parallel dynamic simulation are compared and demonstrated.« less
Parallelization of NAS Benchmarks for Shared Memory Multiprocessors

NASA Technical Reports Server (NTRS)

Waheed, Abdul; Yan, Jerry C.; Saini, Subhash (Technical Monitor)

1998-01-01

This paper presents our experiences of parallelizing the sequential implementation of NAS benchmarks using compiler directives on SGI Origin2000 distributed shared memory (DSM) system. Porting existing applications to new high performance parallel and distributed computing platforms is a challenging task. Ideally, a user develops a sequential version of the application, leaving the task of porting to new generations of high performance computing systems to parallelization tools and compilers. Due to the simplicity of programming shared-memory multiprocessors, compiler developers have provided various facilities to allow the users to exploit parallelism. Native compilers on SGI Origin2000 support multiprocessing directives to allow users to exploit loop-level parallelism in their programs. Additionally, supporting tools can accomplish this process automatically and present the results of parallelization to the users. We experimented with these compiler directives and supporting tools by parallelizing sequential implementation of NAS benchmarks. Results reported in this paper indicate that with minimal effort, the performance gain is comparable with the hand-parallelized, carefully optimized, message-passing implementations of the same benchmarks.
Parallel discrete event simulation using shared memory

NASA Technical Reports Server (NTRS)

Reed, Daniel A.; Malony, Allen D.; Mccredie, Bradley D.

1988-01-01

With traditional event-list techniques, evaluating a detailed discrete-event simulation-model can often require hours or even days of computation time. By eliminating the event list and maintaining only sufficient synchronization to ensure causality, parallel simulation can potentially provide speedups that are linear in the numbers of processors. A set of shared-memory experiments, using the Chandy-Misra distributed-simulation algorithm, to simulate networks of queues is presented. Parameters of the study include queueing network topology and routing probabilities, number of processors, and assignment of network nodes to processors. These experiments show that Chandy-Misra distributed simulation is a questionable alternative to sequential-simulation of most queueing network models.
Parallelization of KENO-Va Monte Carlo code

NASA Astrophysics Data System (ADS)

Ramón, Javier; Peña, Jorge

1995-07-01

KENO-Va is a code integrated within the SCALE system developed by Oak Ridge that solves the transport equation through the Monte Carlo Method. It is being used at the Consejo de Seguridad Nuclear (CSN) to perform criticality calculations for fuel storage pools and shipping casks. Two parallel versions of the code: one for shared memory machines and other for distributed memory systems using the message-passing interface PVM have been generated. In both versions the neutrons of each generation are tracked in parallel. In order to preserve the reproducibility of the results in both versions, advanced seeds for random numbers were used. The CONVEX C3440 with four processors and shared memory at CSN was used to implement the shared memory version. A FDDI network of 6 HP9000/735 was employed to implement the message-passing version using proprietary PVM. The speedup obtained was 3.6 in both cases.
Execution time supports for adaptive scientific algorithms on distributed memory machines

NASA Technical Reports Server (NTRS)

Berryman, Harry; Saltz, Joel; Scroggs, Jeffrey

1990-01-01

Optimizations are considered that are required for efficient execution of code segments that consists of loops over distributed data structures. The PARTI (Parallel Automated Runtime Toolkit at ICASE) execution time primitives are designed to carry out these optimizations and can be used to implement a wide range of scientific algorithms on distributed memory machines. These primitives allow the user to control array mappings in a way that gives an appearance of shared memory. Computations can be based on a global index set. Primitives are used to carry out gather and scatter operations on distributed arrays. Communications patterns are derived at runtime, and the appropriate send and receive messages are automatically generated.
Execution time support for scientific programs on distributed memory machines

NASA Technical Reports Server (NTRS)

Berryman, Harry; Saltz, Joel; Scroggs, Jeffrey

1990-01-01

Optimizations are considered that are required for efficient execution of code segments that consists of loops over distributed data structures. The PARTI (Parallel Automated Runtime Toolkit at ICASE) execution time primitives are designed to carry out these optimizations and can be used to implement a wide range of scientific algorithms on distributed memory machines. These primitives allow the user to control array mappings in a way that gives an appearance of shared memory. Computations can be based on a global index set. Primitives are used to carry out gather and scatter operations on distributed arrays. Communications patterns are derived at runtime, and the appropriate send and receive messages are automatically generated.
Location-Unbound Color-Shape Binding Representations in Visual Working Memory.

PubMed

Saiki, Jun

2016-02-01

The mechanism by which nonspatial features, such as color and shape, are bound in visual working memory, and the role of those features' location in their binding, remains unknown. In the current study, I modified a redundancy-gain paradigm to investigate these issues. A set of features was presented in a two-object memory display, followed by a single object probe. Participants judged whether the probe contained any features of the memory display, regardless of its location. Response time distributions revealed feature coactivation only when both features of a single object in the memory display appeared together in the probe, regardless of the response time benefit from the probe and memory objects sharing the same location. This finding suggests that a shared location is necessary in the formation of bound representations but unnecessary in their maintenance. Electroencephalography data showed that amplitude modulations reflecting location-unbound feature coactivation were different from those reflecting the location-sharing benefit, consistent with the behavioral finding that feature-location binding is unnecessary in the maintenance of color-shape binding. © The Author(s) 2015.
Virtual memory support for distributed computing environments using a shared data object model

NASA Astrophysics Data System (ADS)

Huang, F.; Bacon, J.; Mapp, G.

1995-12-01

Conventional storage management systems provide one interface for accessing memory segments and another for accessing secondary storage objects. This hinders application programming and affects overall system performance due to mandatory data copying and user/kernel boundary crossings, which in the microkernel case may involve context switches. Memory-mapping techniques may be used to provide programmers with a unified view of the storage system. This paper extends such techniques to support a shared data object model for distributed computing environments in which good support for coherence and synchronization is essential. The approach is based on a microkernel, typed memory objects, and integrated coherence control. A microkernel architecture is used to support multiple coherence protocols and the addition of new protocols. Memory objects are typed and applications can choose the most suitable protocols for different types of object to avoid protocol mismatch. Low-level coherence control is integrated with high-level concurrency control so that the number of messages required to maintain memory coherence is reduced and system-wide synchronization is realized without severely impacting the system performance. These features together contribute a novel approach to the support for flexible coherence under application control.
Parallel Programming Paradigms

DTIC Science & Technology

1987-07-01

Unclassified IS.. DECLASSIFICATIONIOOWNGRADIN G 16. DISTRIBUTION STATEMENT (of this Report) Distribution of this report is unlimited. 17...8416878 and by the Office of Naval Research Contracts No. N00014-86-K-0264 and No. N00014-85- K-0328. 8 ?~~ O . G 1 49 II Parallel Programming Paradigms...processors -. "to fetch from the same memory cell (list head) and thus seems to favor a shared memory - g implementation [37). In this dissertation, we
Distributed shared memory for roaming large volumes.

PubMed

Castanié, Laurent; Mion, Christophe; Cavin, Xavier; Lévy, Bruno

2006-01-01

We present a cluster-based volume rendering system for roaming very large volumes. This system allows to move a gigabyte-sized probe inside a total volume of several tens or hundreds of gigabytes in real-time. While the size of the probe is limited by the total amount of texture memory on the cluster, the size of the total data set has no theoretical limit. The cluster is used as a distributed graphics processing unit that both aggregates graphics power and graphics memory. A hardware-accelerated volume renderer runs in parallel on the cluster nodes and the final image compositing is implemented using a pipelined sort-last rendering algorithm. Meanwhile, volume bricking and volume paging allow efficient data caching. On each rendering node, a distributed hierarchical cache system implements a global software-based distributed shared memory on the cluster. In case of a cache miss, this system first checks page residency on the other cluster nodes instead of directly accessing local disks. Using two Gigabit Ethernet network interfaces per node, we accelerate data fetching by a factor of 4 compared to directly accessing local disks. The system also implements asynchronous disk access and texture loading, which makes it possible to overlap data loading, volume slicing and rendering for optimal volume roaming.
Reducing Interprocessor Dependence in Recoverable Distributed Shared Memory

NASA Technical Reports Server (NTRS)

Janssens, Bob; Fuchs, W. Kent

1994-01-01

Checkpointing techniques in parallel systems use dependency tracking and/or message logging to ensure that a system rolls back to a consistent state. Traditional dependency tracking in distributed shared memory (DSM) systems is expensive because of high communication frequency. In this paper we show that, if designed correctly, a DSM system only needs to consider dependencies due to the transfer of blocks of data, resulting in reduced dependency tracking overhead and reduced potential for rollback propagation. We develop an ownership timestamp scheme to tolerate the loss of block state information and develop a passive server model of execution where interactions between processors are considered atomic. With our scheme, dependencies are significantly reduced compared to the traditional message-passing model.
An experimental distributed microprocessor implementation with a shared memory communications and control medium

NASA Technical Reports Server (NTRS)

Mejzak, R. S.

1980-01-01

The distributed processing concept is defined in terms of control primitives, variables, and structures and their use in performing a decomposed discrete Fourier transform (DET) application function. The design assumes interprocessor communications to be anonymous. In this scheme, all processors can access an entire common database by employing control primitives. Access to selected areas within the common database is random, enforced by a hardware lock, and determined by task and subtask pointers. This enables the number of processors to be varied in the configuration without any modifications to the control structure. Decompositional elements of the DFT application function in terms of tasks and subtasks are also described. The experimental hardware configuration consists of IMSAI 8080 chassis which are independent, 8 bit microcomputer units. These chassis are linked together to form a multiple processing system by means of a shared memory facility. This facility consists of hardware which provides a bus structure to enable up to six microcomputers to be interconnected. It provides polling and arbitration logic so that only one processor has access to shared memory at any one time.
Running ATLAS workloads within massively parallel distributed applications using Athena Multi-Process framework (AthenaMP)

NASA Astrophysics Data System (ADS)

Calafiura, Paolo; Leggett, Charles; Seuster, Rolf; Tsulaia, Vakhtang; Van Gemmeren, Peter

2015-12-01

AthenaMP is a multi-process version of the ATLAS reconstruction, simulation and data analysis framework Athena. By leveraging Linux fork and copy-on-write mechanisms, it allows for sharing of memory pages between event processors running on the same compute node with little to no change in the application code. Originally targeted to optimize the memory footprint of reconstruction jobs, AthenaMP has demonstrated that it can reduce the memory usage of certain configurations of ATLAS production jobs by a factor of 2. AthenaMP has also evolved to become the parallel event-processing core of the recently developed ATLAS infrastructure for fine-grained event processing (Event Service) which allows the running of AthenaMP inside massively parallel distributed applications on hundreds of compute nodes simultaneously. We present the architecture of AthenaMP, various strategies implemented by AthenaMP for scheduling workload to worker processes (for example: Shared Event Queue and Shared Distributor of Event Tokens) and the usage of AthenaMP in the diversity of ATLAS event processing workloads on various computing resources: Grid, opportunistic resources and HPC.
Automated quantitative muscle biopsy analysis system

NASA Technical Reports Server (NTRS)

Castleman, Kenneth R. (Inventor)

1980-01-01

An automated system to aid the diagnosis of neuromuscular diseases by producing fiber size histograms utilizing histochemically stained muscle biopsy tissue. Televised images of the microscopic fibers are processed electronically by a multi-microprocessor computer, which isolates, measures, and classifies the fibers and displays the fiber size distribution. The architecture of the multi-microprocessor computer, which is iterated to any required degree of complexity, features a series of individual microprocessors P.sub.n each receiving data from a shared memory M.sub.n-1 and outputing processed data to a separate shared memory M.sub.n+1 under control of a program stored in dedicated memory M.sub.n.

Parallel performance investigations of an unstructured mesh Navier-Stokes solver

NASA Technical Reports Server (NTRS)

Mavriplis, Dimitri J.

2000-01-01

A Reynolds-averaged Navier-Stokes solver based on unstructured mesh techniques for analysis of high-lift configurations is described. The method makes use of an agglomeration multigrid solver for convergence acceleration. Implicit line-smoothing is employed to relieve the stiffness associated with highly stretched meshes. A GMRES technique is also implemented to speed convergence at the expense of additional memory usage. The solver is cache efficient and fully vectorizable, and is parallelized using a two-level hybrid MPI-OpenMP implementation suitable for shared and/or distributed memory architectures, as well as clusters of shared memory machines. Convergence and scalability results are illustrated for various high-lift cases.
Cooperative Data Sharing: Simple Support for Clusters of SMP Nodes

NASA Technical Reports Server (NTRS)

DiNucci, David C.; Balley, David H. (Technical Monitor)

1997-01-01

Libraries like PVM and MPI send typed messages to allow for heterogeneous cluster computing. Lower-level libraries, such as GAM, provide more efficient access to communication by removing the need to copy messages between the interface and user space in some cases. still lower-level interfaces, such as UNET, get right down to the hardware level to provide maximum performance. However, these are all still interfaces for passing messages from one process to another, and have limited utility in a shared-memory environment, due primarily to the fact that message passing is just another term for copying. This drawback is made more pertinent by today's hybrid architectures (e.g. clusters of SMPs), where it is difficult to know beforehand whether two communicating processes will share memory. As a result, even portable language tools (like HPF compilers) must either map all interprocess communication, into message passing with the accompanying performance degradation in shared memory environments, or they must check each communication at run-time and implement the shared-memory case separately for efficiency. Cooperative Data Sharing (CDS) is a single user-level API which abstracts all communication between processes into the sharing and access coordination of memory regions, in a model which might be described as "distributed shared messages" or "large-grain distributed shared memory". As a result, the user programs to a simple latency-tolerant abstract communication specification which can be mapped efficiently to either a shared-memory or message-passing based run-time system, depending upon the available architecture. Unlike some distributed shared memory interfaces, the user still has complete control over the assignment of data to processors, the forwarding of data to its next likely destination, and the queuing of data until it is needed, so even the relatively high latency present in clusters can be accomodated. CDS does not require special use of an MMU, which can add overhead to some DSM systems, and does not require an SPMD programming model. unlike some message-passing interfaces, CDS allows the user to implement efficient demand-driven applications where processes must "fight" over data, and does not perform copying if processes share memory and do not attempt concurrent writes. CDS also supports heterogeneous computing, dynamic process creation, handlers, and a very simple thread-arbitration mechanism. Additional support for array subsections is currently being considered. The CDS1 API, which forms the kernel of CDS, is built primarily upon only 2 communication primitives, one process initiation primitive, and some data translation (and marshalling) routines, memory allocation routines, and priority control routines. The entire current collection of 28 routines provides enough functionality to implement most (or all) of MPI 1 and 2, which has a much larger interface consisting of hundreds of routines. still, the API is small enough to consider integrating into standard os interfaces for handling inter-process communication in a network-independent way. This approach would also help to solve many of the problems plaguing other higher-level standards such as MPI and PVM which must, in some cases, "play OS" to adequately address progress and process control issues. The CDS2 API, a higher level of interface roughly equivalent in functionality to MPI and to be built entirely upon CDS1, is still being designed. It is intended to add support for the equivalent of communicators, reduction and other collective operations, process topologies, additional support for process creation, and some automatic memory management. CDS2 will not exactly match MPI, because the copy-free semantics of communication from CDS1 will be supported. CDS2 application programs will be free to carefully also use CDS1. CDS1 has been implemented on networks of workstations running unmodified Unix-based operating systems, using UDP/IP and vendor-supplied high- performance locks. Although its inter-node performance is currently unimpressive due to rudimentary implementation technique, it even now outperforms highly-optimized MPI implementation on intra-node communication due to its support for non-copy communication. The similarity of the CDS1 architecture to that of other projects such as UNET and TRAP suggests that the inter-node performance can be increased significantly to surpass MPI or PVM, and it may be possible to migrate some of its functionality to communication controllers.
Support for Debugging Automatically Parallelized Programs

NASA Technical Reports Server (NTRS)

Hood, Robert; Jost, Gabriele; Biegel, Bryan (Technical Monitor)

2001-01-01

This viewgraph presentation provides information on the technical aspects of debugging computer code that has been automatically converted for use in a parallel computing system. Shared memory parallelization and distributed memory parallelization entail separate and distinct challenges for a debugging program. A prototype system has been developed which integrates various tools for the debugging of automatically parallelized programs including the CAPTools Database which provides variable definition information across subroutines as well as array distribution information.
Efficient iteration in data-parallel programs with irregular and dynamically distributed data structures

DOE Office of Scientific and Technical Information (OSTI.GOV)

Littlefield, R.J.

1990-02-01

To implement an efficient data-parallel program on a non-shared memory MIMD multicomputer, data and computations must be properly partitioned to achieve good load balance and locality of reference. Programs with irregular data reference patterns often require irregular partitions. Although good partitions may be easy to determine, they can be difficult or impossible to implement in programming languages that provide only regular data distributions, such as blocked or cyclic arrays. We are developing Onyx, a programming system that provides a shared memory model of distributed data structures and extends the concept of data distribution to include irregular and dynamic distributions. Thismore » provides a powerful means to specify irregular partitions. Perhaps surprisingly, programs using it can also execute efficiently. In this paper, we describe and evaluate the Onyx implementation of a model problem that repeatedly executes an irregular but fixed data reference pattern. On an NCUBE hypercube, the speed of the Onyx implementation is comparable to that of carefully handwritten message-passing code.« less
Programming in Vienna Fortran

NASA Technical Reports Server (NTRS)

Chapman, Barbara; Mehrotra, Piyush; Zima, Hans

1992-01-01

Exploiting the full performance potential of distributed memory machines requires a careful distribution of data across the processors. Vienna Fortran is a language extension of Fortran which provides the user with a wide range of facilities for such mapping of data structures. In contrast to current programming practice, programs in Vienna Fortran are written using global data references. Thus, the user has the advantages of a shared memory programming paradigm while explicitly controlling the data distribution. In this paper, we present the language features of Vienna Fortran for FORTRAN 77, together with examples illustrating the use of these features.
Parallel discrete event simulation: A shared memory approach

NASA Technical Reports Server (NTRS)

Reed, Daniel A.; Malony, Allen D.; Mccredie, Bradley D.

1987-01-01

With traditional event list techniques, evaluating a detailed discrete event simulation model can often require hours or even days of computation time. Parallel simulation mimics the interacting servers and queues of a real system by assigning each simulated entity to a processor. By eliminating the event list and maintaining only sufficient synchronization to insure causality, parallel simulation can potentially provide speedups that are linear in the number of processors. A set of shared memory experiments is presented using the Chandy-Misra distributed simulation algorithm to simulate networks of queues. Parameters include queueing network topology and routing probabilities, number of processors, and assignment of network nodes to processors. These experiments show that Chandy-Misra distributed simulation is a questionable alternative to sequential simulation of most queueing network models.
Global Arrays

DOE Office of Scientific and Technical Information (OSTI.GOV)

Krishnamoorthy, Sriram; Daily, Jeffrey A.; Vishnu, Abhinav

2015-11-01

Global Arrays (GA) is a distributed-memory programming model that allows for shared-memory-style programming combined with one-sided communication, to create a set of tools that combine high performance with ease-of-use. GA exposes a relatively straightforward programming abstraction, while supporting fully-distributed data structures, locality of reference, and high-performance communication. GA was originally formulated in the early 1990’s to provide a communication layer for the Northwest Chemistry (NWChem) suite of chemistry modeling codes that was being developed concurrently.
Common data buffer

NASA Technical Reports Server (NTRS)

Byrne, F.

1981-01-01

Time-shared interface speeds data processing in distributed computer network. Two-level high-speed scanning approach routes information to buffer, portion of which is reserved for series of "first-in, first-out" memory stacks. Buffer address structure and memory are protected from noise or failed components by error correcting code. System is applicable to any computer or processing language.
A Massively Parallel Code for Polarization Calculations

NASA Astrophysics Data System (ADS)

Akiyama, Shizuka; Höflich, Peter

2001-03-01

We present an implementation of our Monte-Carlo radiation transport method for rapidly expanding, NLTE atmospheres for massively parallel computers which utilizes both the distributed and shared memory models. This allows us to take full advantage of the fast communication and low latency inherent to nodes with multiple CPUs, and to stretch the limits of scalability with the number of nodes compared to a version which is based on the shared memory model. Test calculations on a local 20-node Beowulf cluster with dual CPUs showed an improved scalability by about 40%.
A Distributed Operating System for BMD Applications.

DTIC Science & Technology

1982-01-01

Defense) applications executing on distributed hardware with local and shared memories. The objective was to develop real - time operating system functions...make the Basic Real - Time Operating System , and the set of new EPL language primitives that provide BMD application processes with efficient mechanisms
Parallel Navier-Stokes computations on shared and distributed memory architectures

NASA Technical Reports Server (NTRS)

Hayder, M. Ehtesham; Jayasimha, D. N.; Pillay, Sasi Kumar

1995-01-01

We study a high order finite difference scheme to solve the time accurate flow field of a jet using the compressible Navier-Stokes equations. As part of our ongoing efforts, we have implemented our numerical model on three parallel computing platforms to study the computational, communication, and scalability characteristics. The platforms chosen for this study are a cluster of workstations connected through fast networks (the LACE experimental testbed at NASA Lewis), a shared memory multiprocessor (the Cray YMP), and a distributed memory multiprocessor (the IBM SPI). Our focus in this study is on the LACE testbed. We present some results for the Cray YMP and the IBM SP1 mainly for comparison purposes. On the LACE testbed, we study: (1) the communication characteristics of Ethernet, FDDI, and the ALLNODE networks and (2) the overheads induced by the PVM message passing library used for parallelizing the application. We demonstrate that clustering of workstations is effective and has the potential to be computationally competitive with supercomputers at a fraction of the cost.
Ensuring correct rollback recovery in distributed shared memory systems

NASA Technical Reports Server (NTRS)

Janssens, Bob; Fuchs, W. Kent

1995-01-01

Distributed shared memory (DSM) implemented on a cluster of workstations is an increasingly attractive platform for executing parallel scientific applications. Checkpointing and rollback techniques can be used in such a system to allow the computation to progress in spite of the temporary failure of one or more processing nodes. This paper presents the design of an independent checkpointing method for DSM that takes advantage of DSM's specific properties to reduce error-free and rollback overhead. The scheme reduces the dependencies that need to be considered for correct rollback to those resulting from transfers of pages. Furthermore, in-transit messages can be recovered without the use of logging. We extend the scheme to a DSM implementation using lazy release consistency, where the frequency of dependencies is further reduced.
Shared Versus Distributed Memory Multiprocessors

DTIC Science & Technology

1991-01-01

multiprocessors should hawe shared or dis.trimuted meieo-% ha~ trr ~ g ’’~ de~i c4~accio;, S Cm teaicners argue S trongly tor Outiding (li15 tri huted...Applications, MIT Press (1985). 161 D. Gajski et el., "Cedar," Proc. Compcon, pp. 306-309 (Spring 19S9). 171 S. Ahuja, N. Carriero and D. Gelernter, "Linda
Efficient computation of aerodynamic influence coefficients for aeroelastic analysis on a transputer network

NASA Technical Reports Server (NTRS)

Janetzke, David C.; Murthy, Durbha V.

1991-01-01

Aeroelastic analysis is multi-disciplinary and computationally expensive. Hence, it can greatly benefit from parallel processing. As part of an effort to develop an aeroelastic capability on a distributed memory transputer network, a parallel algorithm for the computation of aerodynamic influence coefficients is implemented on a network of 32 transputers. The aerodynamic influence coefficients are calculated using a 3-D unsteady aerodynamic model and a parallel discretization. Efficiencies up to 85 percent were demonstrated using 32 processors. The effect of subtask ordering, problem size, and network topology are presented. A comparison to results on a shared memory computer indicates that higher speedup is achieved on the distributed memory system.
UPC++ Programmer’s Guide (v1.0 2017.9)

DOE Office of Scientific and Technical Information (OSTI.GOV)

Bachan, J.; Baden, S.; Bonachea, D.

UPC++ is a C++11 library that provides Asynchronous Partitioned Global Address Space (APGAS) programming. It is designed for writing parallel programs that run efficiently and scale well on distributed-memory parallel computers. The APGAS model is single program, multiple-data (SPMD), with each separate thread of execution (referred to as a rank, a term borrowed from MPI) having access to local memory as it would in C++. However, APGAS also provides access to a global address space, which is allocated in shared segments that are distributed over the ranks. UPC++ provides numerous methods for accessing and using global memory. In UPC++, allmore » operations that access remote memory are explicit, which encourages programmers to be aware of the cost of communication and data movement. Moreover, all remote-memory access operations are by default asynchronous, to enable programmers to write code that scales well even on hundreds of thousands of cores.« less
UPC++ Programmer’s Guide, v1.0-2018.3.0

DOE Office of Scientific and Technical Information (OSTI.GOV)

Bachan, J.; Baden, S.; Bonachea, Dan

UPC++ is a C++11 library that provides Partitioned Global Address Space (PGAS) programming. It is designed for writing parallel programs that run efficiently and scale well on distributed-memory parallel computers. The PGAS model is single program, multiple-data (SPMD), with each separate thread of execution (referred to as a rank, a term borrowed from MPI) having access to local memory as it would in C++. However, PGAS also provides access to a global address space, which is allocated in shared segments that are distributed over the ranks. UPC++ provides numerous methods for accessing and using global memory. In UPC++, all operationsmore » that access remote memory are explicit, which encourages programmers to be aware of the cost of communication and data movement. Moreover, all remote-memory access operations are by default asynchronous, to enable programmers to write code that scales well even on hundreds of thousands of cores.« less
Enhancing Application Performance Using Mini-Apps: Comparison of Hybrid Parallel Programming Paradigms

NASA Technical Reports Server (NTRS)

Lawson, Gary; Poteat, Michael; Sosonkina, Masha; Baurle, Robert; Hammond, Dana

2016-01-01

In this work, several mini-apps have been created to enhance a real-world application performance, namely the VULCAN code for complex flow analysis developed at the NASA Langley Research Center. These mini-apps explore hybrid parallel programming paradigms with Message Passing Interface (MPI) for distributed memory access and either Shared MPI (SMPI) or OpenMP for shared memory accesses. Performance testing shows that MPI+SMPI yields the best execution performance, while requiring the largest number of code changes. A maximum speedup of 23X was measured for MPI+SMPI, but only 10X was measured for MPI+OpenMP.
A communication-avoiding, hybrid-parallel, rank-revealing orthogonalization method.

DOE Office of Scientific and Technical Information (OSTI.GOV)

Hoemmen, Mark

2010-11-01

Orthogonalization consumes much of the run time of many iterative methods for solving sparse linear systems and eigenvalue problems. Commonly used algorithms, such as variants of Gram-Schmidt or Householder QR, have performance dominated by communication. Here, 'communication' includes both data movement between the CPU and memory, and messages between processors in parallel. Our Tall Skinny QR (TSQR) family of algorithms requires asymptotically fewer messages between processors and data movement between CPU and memory than typical orthogonalization methods, yet achieves the same accuracy as Householder QR factorization. Furthermore, in block orthogonalizations, TSQR is faster and more accurate than existing approaches formore » orthogonalizing the vectors within each block ('normalization'). TSQR's rank-revealing capability also makes it useful for detecting deflation in block iterative methods, for which existing approaches sacrifice performance, accuracy, or both. We have implemented a version of TSQR that exploits both distributed-memory and shared-memory parallelism, and supports real and complex arithmetic. Our implementation is optimized for the case of orthogonalizing a small number (5-20) of very long vectors. The shared-memory parallel component uses Intel's Threading Building Blocks, though its modular design supports other shared-memory programming models as well, including computation on the GPU. Our implementation achieves speedups of 2 times or more over competing orthogonalizations. It is available now in the development branch of the Trilinos software package, and will be included in the 10.8 release.« less
Scaling Irregular Applications through Data Aggregation and Software Multithreading

DOE Office of Scientific and Technical Information (OSTI.GOV)

Morari, Alessandro; Tumeo, Antonino; Chavarría-Miranda, Daniel

Bioinformatics, data analytics, semantic databases, knowledge discovery are emerging high performance application areas that exploit dynamic, linked data structures such as graphs, unbalanced trees or unstructured grids. These data structures usually are very large, requiring significantly more memory than available on single shared memory systems. Additionally, these data structures are difficult to partition on distributed memory systems. They also present poor spatial and temporal locality, thus generating unpredictable memory and network accesses. The Partitioned Global Address Space (PGAS) programming model seems suitable for these applications, because it allows using a shared memory abstraction across distributed-memory clusters. However, current PGAS languagesmore » and libraries are built to target regular remote data accesses and block transfers. Furthermore, they usually rely on the Single Program Multiple Data (SPMD) parallel control model, which is not well suited to the fine grained, dynamic and unbalanced parallelism of irregular applications. In this paper we present {\\bf GMT} (Global Memory and Threading library), a custom runtime library that enables efficient execution of irregular applications on commodity clusters. GMT integrates a PGAS data substrate with simple fork/join parallelism and provides automatic load balancing on a per node basis. It implements multi-level aggregation and lightweight multithreading to maximize memory and network bandwidth with fine-grained data accesses and tolerate long data access latencies. A key innovation in the GMT runtime is its thread specialization (workers, helpers and communication threads) that realize the overall functionality. We compare our approach with other PGAS models, such as UPC running using GASNet, and hand-optimized MPI code on a set of typical large-scale irregular applications, demonstrating speedups of an order of magnitude.« less
Real-time implementations of image segmentation algorithms on shared memory multicore architecture: a survey (Conference Presentation)

NASA Astrophysics Data System (ADS)

Akil, Mohamed

2017-05-01

The real-time processing is getting more and more important in many image processing applications. Image segmentation is one of the most fundamental tasks image analysis. As a consequence, many different approaches for image segmentation have been proposed. The watershed transform is a well-known image segmentation tool. The watershed transform is a very data intensive task. To achieve acceleration and obtain real-time processing of watershed algorithms, parallel architectures and programming models for multicore computing have been developed. This paper focuses on the survey of the approaches for parallel implementation of sequential watershed algorithms on multicore general purpose CPUs: homogeneous multicore processor with shared memory. To achieve an efficient parallel implementation, it's necessary to explore different strategies (parallelization/distribution/distributed scheduling) combined with different acceleration and optimization techniques to enhance parallelism. In this paper, we give a comparison of various parallelization of sequential watershed algorithms on shared memory multicore architecture. We analyze the performance measurements of each parallel implementation and the impact of the different sources of overhead on the performance of the parallel implementations. In this comparison study, we also discuss the advantages and disadvantages of the parallel programming models. Thus, we compare the OpenMP (an application programming interface for multi-Processing) with Ptheads (POSIX Threads) to illustrate the impact of each parallel programming model on the performance of the parallel implementations.

A multiarchitecture parallel-processing development environment

NASA Technical Reports Server (NTRS)

Townsend, Scott; Blech, Richard; Cole, Gary

1993-01-01

A description is given of the hardware and software of a multiprocessor test bed - the second generation Hypercluster system. The Hypercluster architecture consists of a standard hypercube distributed-memory topology, with multiprocessor shared-memory nodes. By using standard, off-the-shelf hardware, the system can be upgraded to use rapidly improving computer technology. The Hypercluster's multiarchitecture nature makes it suitable for researching parallel algorithms in computational field simulation applications (e.g., computational fluid dynamics). The dedicated test-bed environment of the Hypercluster and its custom-built software allows experiments with various parallel-processing concepts such as message passing algorithms, debugging tools, and computational 'steering'. Such research would be difficult, if not impossible, to achieve on shared, commercial systems.
What Multilevel Parallel Programs do when you are not Watching: A Performance Analysis Case Study Comparing MPI/OpenMP, MLP, and Nested OpenMP

NASA Technical Reports Server (NTRS)

Jost, Gabriele; Labarta, Jesus; Gimenez, Judit

2004-01-01

With the current trend in parallel computer architectures towards clusters of shared memory symmetric multi-processors, parallel programming techniques have evolved that support parallelism beyond a single level. When comparing the performance of applications based on different programming paradigms, it is important to differentiate between the influence of the programming model itself and other factors, such as implementation specific behavior of the operating system (OS) or architectural issues. Rewriting-a large scientific application in order to employ a new programming paradigms is usually a time consuming and error prone task. Before embarking on such an endeavor it is important to determine that there is really a gain that would not be possible with the current implementation. A detailed performance analysis is crucial to clarify these issues. The multilevel programming paradigms considered in this study are hybrid MPI/OpenMP, MLP, and nested OpenMP. The hybrid MPI/OpenMP approach is based on using MPI [7] for the coarse grained parallelization and OpenMP [9] for fine grained loop level parallelism. The MPI programming paradigm assumes a private address space for each process. Data is transferred by explicitly exchanging messages via calls to the MPI library. This model was originally designed for distributed memory architectures but is also suitable for shared memory systems. The second paradigm under consideration is MLP which was developed by Taft. The approach is similar to MPi/OpenMP, using a mix of coarse grain process level parallelization and loop level OpenMP parallelization. As it is the case with MPI, a private address space is assumed for each process. The MLP approach was developed for ccNUMA architectures and explicitly takes advantage of the availability of shared memory. A shared memory arena which is accessible by all processes is required. Communication is done by reading from and writing to the shared memory.
DOE Office of Scientific and Technical Information (OSTI.GOV)

Sewell, Christopher Meyer

This is a set of slides from a guest lecture for a class at the University of Texas, El Paso on visualization and data analysis for high-performance computing. The topics covered are the following: trends in high-performance computing; scientific visualization, such as OpenGL, ray tracing and volume rendering, VTK, and ParaView; data science at scale, such as in-situ visualization, image databases, distributed memory parallelism, shared memory parallelism, VTK-m, "big data", and then an analysis example.
DOE Office of Scientific and Technical Information (OSTI.GOV)

Jones, J.P.; Bangs, A.L.; Butler, P.L.

Hetero Helix is a programming environment which simulates shared memory on a heterogeneous network of distributed-memory computers. The machines in the network may vary with respect to their native operating systems and internal representation of numbers. Hetero Helix presents a simple programming model to developers, and also considers the needs of designers, system integrators, and maintainers. The key software technology underlying Hetero Helix is the use of a compiler'' which analyzes the data structures in shared memory and automatically generates code which translates data representations from the format native to each machine into a common format, and vice versa. Themore » design of Hetero Helix was motivated in particular by the requirements of robotics applications. Hetero Helix has been used successfully in an integration effort involving 27 CPUs in a heterogeneous network and a body of software totaling roughly 100,00 lines of code. 25 refs., 6 figs.« less
Effects of Ordering Strategies and Programming Paradigms on Sparse Matrix Computations

NASA Technical Reports Server (NTRS)

Oliker, Leonid; Li, Xiaoye; Husbands, Parry; Biswas, Rupak; Biegel, Bryan (Technical Monitor)

2002-01-01

The Conjugate Gradient (CG) algorithm is perhaps the best-known iterative technique to solve sparse linear systems that are symmetric and positive definite. For systems that are ill-conditioned, it is often necessary to use a preconditioning technique. In this paper, we investigate the effects of various ordering and partitioning strategies on the performance of parallel CG and ILU(O) preconditioned CG (PCG) using different programming paradigms and architectures. Results show that for this class of applications: ordering significantly improves overall performance on both distributed and distributed shared-memory systems, that cache reuse may be more important than reducing communication, that it is possible to achieve message-passing performance using shared-memory constructs through careful data ordering and distribution, and that a hybrid MPI+OpenMP paradigm increases programming complexity with little performance gains. A implementation of CG on the Cray MTA does not require special ordering or partitioning to obtain high efficiency and scalability, giving it a distinct advantage for adaptive applications; however, it shows limited scalability for PCG due to a lack of thread level parallelism.
Performing an allreduce operation using shared memory

DOEpatents

Archer, Charles J [Rochester, MN; Dozsa, Gabor [Ardsley, NY; Ratterman, Joseph D [Rochester, MN; Smith, Brian E [Rochester, MN

2012-04-17

Methods, apparatus, and products are disclosed for performing an allreduce operation using shared memory that include: receiving, by at least one of a plurality of processing cores on a compute node, an instruction to perform an allreduce operation; establishing, by the core that received the instruction, a job status object for specifying a plurality of shared memory allreduce work units, the plurality of shared memory allreduce work units together performing the allreduce operation on the compute node; determining, by an available core on the compute node, a next shared memory allreduce work unit in the job status object; and performing, by that available core on the compute node, that next shared memory allreduce work unit.
Performing an allreduce operation using shared memory

DOEpatents

Archer, Charles J; Dozsa, Gabor; Ratterman, Joseph D; Smith, Brian E

2014-06-10

Methods, apparatus, and products are disclosed for performing an allreduce operation using shared memory that include: receiving, by at least one of a plurality of processing cores on a compute node, an instruction to perform an allreduce operation; establishing, by the core that received the instruction, a job status object for specifying a plurality of shared memory allreduce work units, the plurality of shared memory allreduce work units together performing the allreduce operation on the compute node; determining, by an available core on the compute node, a next shared memory allreduce work unit in the job status object; and performing, by that available core on the compute node, that next shared memory allreduce work unit.
Static analysis of the hull plate using the finite element method

NASA Astrophysics Data System (ADS)

Ion, A.

2015-11-01

This paper aims at presenting the static analysis for two levels of a container ship's construction as follows: the first level is at the girder / hull plate and the second level is conducted at the entire strength hull of the vessel. This article will describe the work for the static analysis of a hull plate. We shall use the software package ANSYS Mechanical 14.5. The program is run on a computer with four Intel Xeon X5260 CPU processors at 3.33 GHz, 32 GB memory installed. In terms of software, the shared memory parallel version of ANSYS refers to running ANSYS across multiple cores on a SMP system. The distributed memory parallel version of ANSYS (Distributed ANSYS) refers to running ANSYS across multiple processors on SMP systems or DMP systems.
Sensorimotor memory of object weight distribution during multidigit grasp.

PubMed

Albert, Frederic; Santello, Marco; Gordon, Andrew M

2009-10-09

We studied the ability to transfer three-digit force sharing patterns learned through consecutive lifts of an object with an asymmetric center of mass (CM). After several object lifts, we asked subjects to rotate and translate the object to the contralateral hand and perform one additional lift. This task was performed under two weight conditions (550 and 950 g) to determine the extent to which subjects would be able to transfer weight and CM information. Learning transfer was quantified by measuring the extent to which force sharing patterns and peak object roll on the first post-translation trial resembled those measured on the pre-translation trial with the same CM. We found that the overall gain of fingertip forces was transferred following object rotation, but that the scaling of individual digit forces was specific to the learned digit-object configuration, and thus was not transferred following rotation. As a result, on the first post-translation trial there was a significantly larger object roll following object lift-off than on the pre-translation trial. This suggests that sensorimotor memories for weight, requiring scaling of fingertip force gain, may differ from memories for mass distribution.
Proceedings: Sisal `93

DOE Office of Scientific and Technical Information (OSTI.GOV)

Feo, J.T.

1993-10-01

This report contain papers on: Programmability and performance issues; The case of an iterative partial differential equation solver; Implementing the kernal of the Australian Region Weather Prediction Model in Sisal; Even and quarter-even prime length symmetric FFTs and their Sisal Implementations; Top-down thread generation for Sisal; Overlapping communications and computations on NUMA architechtures; Compiling technique based on dataflow analysis for funtional programming language Valid; Copy elimination for true multidimensional arrays in Sisal 2.0; Increasing parallelism for an optimization that reduces copying in IF2 graphs; Caching in on Sisal; Cache performance of Sisal Vs. FORTRAN; FFT algorithms on a shared-memory multiprocessor;more » A parallel implementation of nonnumeric search problems in Sisal; Computer vision algorithms in Sisal; Compilation of Sisal for a high-performance data driven vector processor; Sisal on distributed memory machines; A virtual shared addressing system for distributed memory Sisal; Developing a high-performance FFT algorithm in Sisal for a vector supercomputer; Implementation issues for IF2 on a static data-flow architechture; and Systematic control of parallelism in array-based data-flow computation. Selected papers have been indexed separately for inclusion in the Energy Science and Technology Database.« less
Tuning collective communication for Partitioned Global Address Space programming models

DOE PAGES

Nishtala, Rajesh; Zheng, Yili; Hargrove, Paul H.; ...

2011-06-12

Partitioned Global Address Space (PGAS) languages offer programmers the convenience of a shared memory programming style combined with locality control necessary to run on large-scale distributed memory systems. Even within a PGAS language programmers often need to perform global communication operations such as broadcasts or reductions, which are best performed as collective operations in which a group of threads work together to perform the operation. In this study we consider the problem of implementing collective communication within PGAS languages and explore some of the design trade-offs in both the interface and implementation. In particular, PGAS collectives have semantic issues thatmore » are different than in send–receive style message passing programs, and different implementation approaches that take advantage of the one-sided communication style in these languages. We present an implementation framework for PGAS collectives as part of the GASNet communication layer, which supports shared memory, distributed memory and hybrids. The framework supports a broad set of algorithms for each collective, over which the implementation may be automatically tuned. In conclusion, we demonstrate the benefit of optimized GASNet collectives using application benchmarks written in UPC, and demonstrate that the GASNet collectives can deliver scalable performance on a variety of state-of-the-art parallel machines including a Cray XT4, an IBM BlueGene/P, and a Sun Constellation system with InfiniBand interconnect.« less
Efficient Numeric and Geometric Computations using Heterogeneous Shared Memory Architectures

DTIC Science & Technology

2017-10-04

Report: Efficient Numeric and Geometric Computations using Heterogeneous Shared Memory Architectures The views, opinions and/or findings contained in this...Chapel Hill Title: Efficient Numeric and Geometric Computations using Heterogeneous Shared Memory Architectures Report Term: 0-Other Email: dm...algorithms for scientific and geometric computing by exploiting the power and performance efficiency of heterogeneous shared memory architectures . These
Programming model for distributed intelligent systems

NASA Technical Reports Server (NTRS)

Sztipanovits, J.; Biegl, C.; Karsai, G.; Bogunovic, N.; Purves, B.; Williams, R.; Christiansen, T.

1988-01-01

A programming model and architecture which was developed for the design and implementation of complex, heterogeneous measurement and control systems is described. The Multigraph Architecture integrates artificial intelligence techniques with conventional software technologies, offers a unified framework for distributed and shared memory based parallel computational models and supports multiple programming paradigms. The system can be implemented on different hardware architectures and can be adapted to strongly different applications.
Distributed Systems Technology Survey.

DTIC Science & Technology

1987-03-01

and prolocols. 2. Hardware Technology Ecnomic factor we a majo reonm for the prolierat of dlstbted systoe. Processors, memory, an magne tc ndoptical...destined messages and pertorn the a pro te forwarding. There gImsno agreement that a ightweight process mechanism is essential to support com- monly used...Xerox PARC environment [311. Shared file servers, discussed below, are essential to the success of such a scheme. 11. ecurlity A distributed
Performance and Application of Parallel OVERFLOW Codes on Distributed and Shared Memory Platforms

NASA Technical Reports Server (NTRS)

Djomehri, M. Jahed; Rizk, Yehia M.

1999-01-01

The presentation discusses recent studies on the performance of the two parallel versions of the aerodynamics CFD code, OVERFLOW_MPI and _MLP. Developed at NASA Ames, the serial version, OVERFLOW, is a multidimensional Navier-Stokes flow solver based on overset (Chimera) grid technology. The code has recently been parallelized in two ways. One is based on the explicit message-passing interface (MPI) across processors and uses the _MPI communication package. This approach is primarily suited for distributed memory systems and workstation clusters. The second, termed the multi-level parallel (MLP) method, is simple and uses shared memory for all communications. The _MLP code is suitable on distributed-shared memory systems. For both methods, the message passing takes place across the processors or processes at the advancement of each time step. This procedure is, in effect, the Chimera boundary conditions update, which is done in an explicit "Jacobi" style. In contrast, the update in the serial code is done in more of the "Gauss-Sidel" fashion. The programming efforts for the _MPI code is more complicated than for the _MLP code; the former requires modification of the outer and some inner shells of the serial code, whereas the latter focuses only on the outer shell of the code. The _MPI version offers a great deal of flexibility in distributing grid zones across a specified number of processors in order to achieve load balancing. The approach is capable of partitioning zones across multiple processors or sending each zone and/or cluster of several zones into a single processor. The message passing across the processors consists of Chimera boundary and/or an overlap of "halo" boundary points for each partitioned zone. The MLP version is a new coarse-grain parallel concept at the zonal and intra-zonal levels. A grouping strategy is used to distribute zones into several groups forming sub-processes which will run in parallel. The total volume of grid points in each group are approximately balanced. A proper number of threads are initially allocated to each group, and in subsequent iterations during the run-time, the number of threads are adjusted to achieve load balancing across the processes. Each process exploits the multitasking directives already established in Overflow.
Data traffic reduction schemes for sparse Cholesky factorizations

NASA Technical Reports Server (NTRS)

Naik, Vijay K.; Patrick, Merrell L.

1988-01-01

Load distribution schemes are presented which minimize the total data traffic in the Cholesky factorization of dense and sparse, symmetric, positive definite matrices on multiprocessor systems with local and shared memory. The total data traffic in factoring an n x n sparse, symmetric, positive definite matrix representing an n-vertex regular 2-D grid graph using n (sup alpha), alpha is equal to or less than 1, processors are shown to be O(n(sup 1 + alpha/2)). It is O(n(sup 3/2)), when n (sup alpha), alpha is equal to or greater than 1, processors are used. Under the conditions of uniform load distribution, these results are shown to be asymptotically optimal. The schemes allow efficient use of up to O(n) processors before the total data traffic reaches the maximum value of O(n(sup 3/2)). The partitioning employed within the scheme, allows a better utilization of the data accessed from shared memory than those of previously published methods.
Working Memory and Decision-Making in a Frontoparietal Circuit Model

PubMed Central

2017-01-01

Working memory (WM) and decision-making (DM) are fundamental cognitive functions involving a distributed interacting network of brain areas, with the posterior parietal cortex (PPC) and prefrontal cortex (PFC) at the core. However, the shared and distinct roles of these areas and the nature of their coordination in cognitive function remain poorly understood. Biophysically based computational models of cortical circuits have provided insights into the mechanisms supporting these functions, yet they have primarily focused on the local microcircuit level, raising questions about the principles for distributed cognitive computation in multiregional networks. To examine these issues, we developed a distributed circuit model of two reciprocally interacting modules representing PPC and PFC circuits. The circuit architecture includes hierarchical differences in local recurrent structure and implements reciprocal long-range projections. This parsimonious model captures a range of behavioral and neuronal features of frontoparietal circuits across multiple WM and DM paradigms. In the context of WM, both areas exhibit persistent activity, but, in response to intervening distractors, PPC transiently encodes distractors while PFC filters distractors and supports WM robustness. With regard to DM, the PPC module generates graded representations of accumulated evidence supporting target selection, while the PFC module generates more categorical responses related to action or choice. These findings suggest computational principles for distributed, hierarchical processing in cortex during cognitive function and provide a framework for extension to multiregional models. SIGNIFICANCE STATEMENT Working memory and decision-making are fundamental “building blocks” of cognition, and deficits in these functions are associated with neuropsychiatric disorders such as schizophrenia. These cognitive functions engage distributed networks with prefrontal cortex (PFC) and posterior parietal cortex (PPC) at the core. It is not clear, however, what the contributions of PPC and PFC are in light of the computations that subserve working memory and decision-making. We constructed a biophysical model of a reciprocally connected frontoparietal circuit that revealed shared and distinct functions for the PFC and PPC across working memory and decision-making tasks. Our parsimonious model connects circuit-level properties to cognitive functions and suggests novel design principles beyond those of local circuits for cognitive processing in multiregional brain networks. PMID:29114071
Working Memory and Decision-Making in a Frontoparietal Circuit Model.

PubMed

Murray, John D; Jaramillo, Jorge; Wang, Xiao-Jing

2017-12-13

Working memory (WM) and decision-making (DM) are fundamental cognitive functions involving a distributed interacting network of brain areas, with the posterior parietal cortex (PPC) and prefrontal cortex (PFC) at the core. However, the shared and distinct roles of these areas and the nature of their coordination in cognitive function remain poorly understood. Biophysically based computational models of cortical circuits have provided insights into the mechanisms supporting these functions, yet they have primarily focused on the local microcircuit level, raising questions about the principles for distributed cognitive computation in multiregional networks. To examine these issues, we developed a distributed circuit model of two reciprocally interacting modules representing PPC and PFC circuits. The circuit architecture includes hierarchical differences in local recurrent structure and implements reciprocal long-range projections. This parsimonious model captures a range of behavioral and neuronal features of frontoparietal circuits across multiple WM and DM paradigms. In the context of WM, both areas exhibit persistent activity, but, in response to intervening distractors, PPC transiently encodes distractors while PFC filters distractors and supports WM robustness. With regard to DM, the PPC module generates graded representations of accumulated evidence supporting target selection, while the PFC module generates more categorical responses related to action or choice. These findings suggest computational principles for distributed, hierarchical processing in cortex during cognitive function and provide a framework for extension to multiregional models. SIGNIFICANCE STATEMENT Working memory and decision-making are fundamental "building blocks" of cognition, and deficits in these functions are associated with neuropsychiatric disorders such as schizophrenia. These cognitive functions engage distributed networks with prefrontal cortex (PFC) and posterior parietal cortex (PPC) at the core. It is not clear, however, what the contributions of PPC and PFC are in light of the computations that subserve working memory and decision-making. We constructed a biophysical model of a reciprocally connected frontoparietal circuit that revealed shared and distinct functions for the PFC and PPC across working memory and decision-making tasks. Our parsimonious model connects circuit-level properties to cognitive functions and suggests novel design principles beyond those of local circuits for cognitive processing in multiregional brain networks. Copyright © 2017 the authors 0270-6474/17/3712167-20$15.00/0.
The effect of the order in which episodic autobiographical memories versus autobiographical knowledge are shared on feelings of closeness.

PubMed

Brandon, Nicole R; Beike, Denise R; Cole, Holly E

2017-07-01

Autobiographical memories (AMs) can be used to create and maintain closeness with others [Alea, N., & Bluck, S. (2003). Why are you telling me that? A conceptual model of the social function of autobiographical memory. Memory, 11(2), 165-178]. However, the differential effects of memory specificity are not well established. Two studies with 148 participants tested whether the order in which autobiographical knowledge (AK) and specific episodic AM (EAM) are shared affects feelings of closeness. Participants read two memories hypothetically shared by each of four strangers. The strangers first shared either AK or an EAM, and then shared either AK or an EAM. Participants were randomly assigned to read either positive or negative AMs from the strangers. Findings suggest that people feel closer to those who share positive AMs in the same way they construct memories: starting with general and moving to specific.
Hypercluster Parallel Processor

NASA Technical Reports Server (NTRS)

Blech, Richard A.; Cole, Gary L.; Milner, Edward J.; Quealy, Angela

1992-01-01

Hypercluster computer system includes multiple digital processors, operation of which coordinated through specialized software. Configurable according to various parallel-computing architectures of shared-memory or distributed-memory class, including scalar computer, vector computer, reduced-instruction-set computer, and complex-instruction-set computer. Designed as flexible, relatively inexpensive system that provides single programming and operating environment within which one can investigate effects of various parallel-computing architectures and combinations on performance in solution of complicated problems like those of three-dimensional flows in turbomachines. Hypercluster software and architectural concepts are in public domain.

Unconditional security of entanglement-based continuous-variable quantum secret sharing

NASA Astrophysics Data System (ADS)

Kogias, Ioannis; Xiang, Yu; He, Qiongyi; Adesso, Gerardo

2017-01-01

The need for secrecy and security is essential in communication. Secret sharing is a conventional protocol to distribute a secret message to a group of parties, who cannot access it individually but need to cooperate in order to decode it. While several variants of this protocol have been investigated, including realizations using quantum systems, the security of quantum secret sharing schemes still remains unproven almost two decades after their original conception. Here we establish an unconditional security proof for entanglement-based continuous-variable quantum secret sharing schemes, in the limit of asymptotic keys and for an arbitrary number of players. We tackle the problem by resorting to the recently developed one-sided device-independent approach to quantum key distribution. We demonstrate theoretically the feasibility of our scheme, which can be implemented by Gaussian states and homodyne measurements, with no need for ideal single-photon sources or quantum memories. Our results contribute to validating quantum secret sharing as a viable primitive for quantum technologies.
Kmerind: A Flexible Parallel Library for K-mer Indexing of Biological Sequences on Distributed Memory Systems.

PubMed

Pan, Tony; Flick, Patrick; Jain, Chirag; Liu, Yongchao; Aluru, Srinivas

2017-10-09

Counting and indexing fixed length substrings, or k-mers, in biological sequences is a key step in many bioinformatics tasks including genome alignment and mapping, genome assembly, and error correction. While advances in next generation sequencing technologies have dramatically reduced the cost and improved latency and throughput, few bioinformatics tools can efficiently process the datasets at the current generation rate of 1.8 terabases every 3 days. We present Kmerind, a high performance parallel k-mer indexing library for distributed memory environments. The Kmerind library provides a set of simple and consistent APIs with sequential semantics and parallel implementations that are designed to be flexible and extensible. Kmerind's k-mer counter performs similarly or better than the best existing k-mer counting tools even on shared memory systems. In a distributed memory environment, Kmerind counts k-mers in a 120 GB sequence read dataset in less than 13 seconds on 1024 Xeon CPU cores, and fully indexes their positions in approximately 17 seconds. Querying for 1% of the k-mers in these indices can be completed in 0.23 seconds and 28 seconds, respectively. Kmerind is the first k-mer indexing library for distributed memory environments, and the first extensible library for general k-mer indexing and counting. Kmerind is available at https://github.com/ParBLiSS/kmerind.
Directions in parallel programming: HPF, shared virtual memory and object parallelism in pC++

NASA Technical Reports Server (NTRS)

Bodin, Francois; Priol, Thierry; Mehrotra, Piyush; Gannon, Dennis

1994-01-01

Fortran and C++ are the dominant programming languages used in scientific computation. Consequently, extensions to these languages are the most popular for programming massively parallel computers. We discuss two such approaches to parallel Fortran and one approach to C++. The High Performance Fortran Forum has designed HPF with the intent of supporting data parallelism on Fortran 90 applications. HPF works by asking the user to help the compiler distribute and align the data structures with the distributed memory modules in the system. Fortran-S takes a different approach in which the data distribution is managed by the operating system and the user provides annotations to indicate parallel control regions. In the case of C++, we look at pC++ which is based on a concurrent aggregate parallel model.
Work stealing for GPU-accelerated parallel programs in a global address space framework: WORK STEALING ON GPU-ACCELERATED SYSTEMS

DOE Office of Scientific and Technical Information (OSTI.GOV)

Arafat, Humayun; Dinan, James; Krishnamoorthy, Sriram

Task parallelism is an attractive approach to automatically load balance the computation in a parallel system and adapt to dynamism exhibited by parallel systems. Exploiting task parallelism through work stealing has been extensively studied in shared and distributed-memory contexts. In this paper, we study the design of a system that uses work stealing for dynamic load balancing of task-parallel programs executed on hybrid distributed-memory CPU-graphics processing unit (GPU) systems in a global-address space framework. We take into account the unique nature of the accelerator model employed by GPUs, the significant performance difference between GPU and CPU execution as a functionmore » of problem size, and the distinct CPU and GPU memory domains. We consider various alternatives in designing a distributed work stealing algorithm for CPU-GPU systems, while taking into account the impact of task distribution and data movement overheads. These strategies are evaluated using microbenchmarks that capture various execution configurations as well as the state-of-the-art CCSD(T) application module from the computational chemistry domain.« less
Work stealing for GPU-accelerated parallel programs in a global address space framework

DOE Office of Scientific and Technical Information (OSTI.GOV)

Arafat, Humayun; Dinan, James; Krishnamoorthy, Sriram

Task parallelism is an attractive approach to automatically load balance the computation in a parallel system and adapt to dynamism exhibited by parallel systems. Exploiting task parallelism through work stealing has been extensively studied in shared and distributed-memory contexts. In this paper, we study the design of a system that uses work stealing for dynamic load balancing of task-parallel programs executed on hybrid distributed-memory CPU-graphics processing unit (GPU) systems in a global-address space framework. We take into account the unique nature of the accelerator model employed by GPUs, the significant performance difference between GPU and CPU execution as a functionmore » of problem size, and the distinct CPU and GPU memory domains. We consider various alternatives in designing a distributed work stealing algorithm for CPU-GPU systems, while taking into account the impact of task distribution and data movement overheads. These strategies are evaluated using microbenchmarks that capture various execution configurations as well as the state-of-the-art CCSD(T) application module from the computational chemistry domain« less
Guess Who's Coming to Graduation?

ERIC Educational Resources Information Center

Miller, Ann

2010-01-01

In this article, the author shares her experience in assisting seniors and two teachers at Kalamazoo (Michigan) Central High School when they joined a national contest to have the President of the United States deliver the commencement address to the class of 2010. One of the author's favorite memories surrounds the distribution of commencement…
Shared Semantics and the Use of Organizational Memories for E-Mail Communications.

ERIC Educational Resources Information Center

Schwartz, David G.

1998-01-01

Examines the use of shared semantics information to link concepts in an organizational memory to e-mail communications. Presents a framework for determining shared semantics based on organizational and personal user profiles. Illustrates how shared semantics are used by the HyperMail system to help link organizational memories (OM) content to…
Using Coarrays to Parallelize Legacy Fortran Applications: Strategy and Case Study

DOE PAGES

Radhakrishnan, Hari; Rouson, Damian W. I.; Morris, Karla; ...

2015-01-01

This paper summarizes a strategy for parallelizing a legacy Fortran 77 program using the object-oriented (OO) and coarray features that entered Fortran in the 2003 and 2008 standards, respectively. OO programming (OOP) facilitates the construction of an extensible suite of model-verification and performance tests that drive the development. Coarray parallel programming facilitates a rapid evolution from a serial application to a parallel application capable of running on multicore processors and many-core accelerators in shared and distributed memory. We delineate 17 code modernization steps used to refactor and parallelize the program and study the resulting performance. Our initial studies were donemore » using the Intel Fortran compiler on a 32-core shared memory server. Scaling behavior was very poor, and profile analysis using TAU showed that the bottleneck in the performance was due to our implementation of a collective, sequential summation procedure. We were able to improve the scalability and achieve nearly linear speedup by replacing the sequential summation with a parallel, binary tree algorithm. We also tested the Cray compiler, which provides its own collective summation procedure. Intel provides no collective reductions. With Cray, the program shows linear speedup even in distributed-memory execution. We anticipate similar results with other compilers once they support the new collective procedures proposed for Fortran 2015.« less
A message passing kernel for the hypercluster parallel processing test bed

NASA Technical Reports Server (NTRS)

Blech, Richard A.; Quealy, Angela; Cole, Gary L.

1989-01-01

A Message-Passing Kernel (MPK) for the Hypercluster parallel-processing test bed is described. The Hypercluster is being developed at the NASA Lewis Research Center to support investigations of parallel algorithms and architectures for computational fluid and structural mechanics applications. The Hypercluster resembles the hypercube architecture except that each node consists of multiple processors communicating through shared memory. The MPK efficiently routes information through the Hypercluster, using a message-passing protocol when necessary and faster shared-memory communication whenever possible. The MPK also interfaces all of the processors with the Hypercluster operating system (HYCLOPS), which runs on a Front-End Processor (FEP). This approach distributes many of the I/O tasks to the Hypercluster processors and eliminates the need for a separate I/O support program on the FEP.
A compositional reservoir simulator on distributed memory parallel computers

DOE Office of Scientific and Technical Information (OSTI.GOV)

Rame, M.; Delshad, M.

1995-12-31

This paper presents the application of distributed memory parallel computes to field scale reservoir simulations using a parallel version of UTCHEM, The University of Texas Chemical Flooding Simulator. The model is a general purpose highly vectorized chemical compositional simulator that can simulate a wide range of displacement processes at both field and laboratory scales. The original simulator was modified to run on both distributed memory parallel machines (Intel iPSC/960 and Delta, Connection Machine 5, Kendall Square 1 and 2, and CRAY T3D) and a cluster of workstations. A domain decomposition approach has been taken towards parallelization of the code. Amore » portion of the discrete reservoir model is assigned to each processor by a set-up routine that attempts a data layout as even as possible from the load-balance standpoint. Each of these subdomains is extended so that data can be shared between adjacent processors for stencil computation. The added routines that make parallel execution possible are written in a modular fashion that makes the porting to new parallel platforms straight forward. Results of the distributed memory computing performance of Parallel simulator are presented for field scale applications such as tracer flood and polymer flood. A comparison of the wall-clock times for same problems on a vector supercomputer is also presented.« less
High-Performance Computation of Distributed-Memory Parallel 3D Voronoi and Delaunay Tessellation

DOE Office of Scientific and Technical Information (OSTI.GOV)

Peterka, Tom; Morozov, Dmitriy; Phillips, Carolyn

2014-11-14

Computing a Voronoi or Delaunay tessellation from a set of points is a core part of the analysis of many simulated and measured datasets: N-body simulations, molecular dynamics codes, and LIDAR point clouds are just a few examples. Such computational geometry methods are common in data analysis and visualization; but as the scale of simulations and observations surpasses billions of particles, the existing serial and shared-memory algorithms no longer suffice. A distributed-memory scalable parallel algorithm is the only feasible approach. The primary contribution of this paper is a new parallel Delaunay and Voronoi tessellation algorithm that automatically determines which neighbormore » points need to be exchanged among the subdomains of a spatial decomposition. Other contributions include periodic and wall boundary conditions, comparison of our method using two popular serial libraries, and application to numerous science datasets.« less
PANDA: A distributed multiprocessor operating system

DOE Office of Scientific and Technical Information (OSTI.GOV)

Chubb, P.

1989-01-01

PANDA is a design for a distributed multiprocessor and an operating system. PANDA is designed to allow easy expansion of both hardware and software. As such, the PANDA kernel provides only message passing and memory and process management. The other features needed for the system (device drivers, secondary storage management, etc.) are provided as replaceable user tasks. The thesis presents PANDA's design and implementation, both hardware and software. PANDA uses multiple 68010 processors sharing memory on a VME bus, each such node potentially connected to others via a high speed network. The machine is completely homogeneous: there are no differencesmore » between processors that are detectable by programs running on the machine. A single two-processor node has been constructed. Each processor contains memory management circuits designed to allow processors to share page tables safely. PANDA presents a programmers' model similar to the hardware model: a job is divided into multiple tasks, each having its own address space. Within each task, multiple processes share code and data. Tasks can send messages to each other, and set up virtual circuits between themselves. Peripheral devices such as disc drives are represented within PANDA by tasks. PANDA divides secondary storage into volumes, each volume being accessed by a volume access task, or VAT. All knowledge about the way that data is stored on a disc is kept in its volume's VAT. The design is such that PANDA should provide a useful testbed for file systems and device drivers, as these can be installed without recompiling PANDA itself, and without rebooting the machine.« less
Hierarchical Process Composition: Dynamic Maintenance of Structure in a Distributed Environment

DTIC Science & Technology

1988-01-01

One prominent hne of research stresses the independence of address space and thread of control, and the resulting efficiencies due to shared memory...cooperating processes. StarOS focuses on case of use and a general capability mechanism, while Medusa stresses the effect of distributed hardware on system...process structure and the asynchrony among agents and between agents and sources of failure. By stressing dynamic structure, we are led to adopt an
PELEC

DOE Office of Scientific and Technical Information (OSTI.GOV)

2017-05-17

PeleC is an adaptive-mesh compressible hydrodynamics code for reacting flows. It solves the compressible Navier-Stokes with multispecies transport in a block structured framework. The resulting algorithm is well suited for flows with localized resolution requirements and robust to discontinuities. User controllable refinement crieteria has the potential to result in extremely small numerical dissipation and dispersion, making this code appropriate for both research and applied usage. The code is built on the AMReX library which facilitates hierarchical parallelism and manages distributed memory parallism. PeleC algorithms are implemented to express shared memory parallelism.
Parallel Computing for Probabilistic Response Analysis of High Temperature Composites

NASA Technical Reports Server (NTRS)

Sues, R. H.; Lua, Y. J.; Smith, M. D.

1994-01-01

The objective of this Phase I research was to establish the required software and hardware strategies to achieve large scale parallelism in solving PCM problems. To meet this objective, several investigations were conducted. First, we identified the multiple levels of parallelism in PCM and the computational strategies to exploit these parallelisms. Next, several software and hardware efficiency investigations were conducted. These involved the use of three different parallel programming paradigms and solution of two example problems on both a shared-memory multiprocessor and a distributed-memory network of workstations.
Parallel ALLSPD-3D: Speeding Up Combustor Analysis Via Parallel Processing

NASA Technical Reports Server (NTRS)

Fricker, David M.

1997-01-01

The ALLSPD-3D Computational Fluid Dynamics code for reacting flow simulation was run on a set of benchmark test cases to determine its parallel efficiency. These test cases included non-reacting and reacting flow simulations with varying numbers of processors. Also, the tests explored the effects of scaling the simulation with the number of processors in addition to distributing a constant size problem over an increasing number of processors. The test cases were run on a cluster of IBM RS/6000 Model 590 workstations with ethernet and ATM networking plus a shared memory SGI Power Challenge L workstation. The results indicate that the network capabilities significantly influence the parallel efficiency, i.e., a shared memory machine is fastest and ATM networking provides acceptable performance. The limitations of ethernet greatly hamper the rapid calculation of flows using ALLSPD-3D.
The Global File System

NASA Technical Reports Server (NTRS)

Soltis, Steven R.; Ruwart, Thomas M.; OKeefe, Matthew T.

1996-01-01

The global file system (GFS) is a prototype design for a distributed file system in which cluster nodes physically share storage devices connected via a network-like fiber channel. Networks and network-attached storage devices have advanced to a level of performance and extensibility so that the previous disadvantages of shared disk architectures are no longer valid. This shared storage architecture attempts to exploit the sophistication of storage device technologies whereas a server architecture diminishes a device's role to that of a simple component. GFS distributes the file system responsibilities across processing nodes, storage across the devices, and file system resources across the entire storage pool. GFS caches data on the storage devices instead of the main memories of the machines. Consistency is established by using a locking mechanism maintained by the storage devices to facilitate atomic read-modify-write operations. The locking mechanism is being prototyped in the Silicon Graphics IRIX operating system and is accessed using standard Unix commands and modules.
Spaceborne Processor Array

NASA Technical Reports Server (NTRS)

Chow, Edward T.; Schatzel, Donald V.; Whitaker, William D.; Sterling, Thomas

2008-01-01

A Spaceborne Processor Array in Multifunctional Structure (SPAMS) can lower the total mass of the electronic and structural overhead of spacecraft, resulting in reduced launch costs, while increasing the science return through dynamic onboard computing. SPAMS integrates the multifunctional structure (MFS) and the Gilgamesh Memory, Intelligence, and Network Device (MIND) multi-core in-memory computer architecture into a single-system super-architecture. This transforms every inch of a spacecraft into a sharable, interconnected, smart computing element to increase computing performance while simultaneously reducing mass. The MIND in-memory architecture provides a foundation for high-performance, low-power, and fault-tolerant computing. The MIND chip has an internal structure that includes memory, processing, and communication functionality. The Gilgamesh is a scalable system comprising multiple MIND chips interconnected to operate as a single, tightly coupled, parallel computer. The array of MIND components shares a global, virtual name space for program variables and tasks that are allocated at run time to the distributed physical memory and processing resources. Individual processor- memory nodes can be activated or powered down at run time to provide active power management and to configure around faults. A SPAMS system is comprised of a distributed Gilgamesh array built into MFS, interfaces into instrument and communication subsystems, a mass storage interface, and a radiation-hardened flight computer.
Parallelization and automatic data distribution for nuclear reactor simulations

DOE Office of Scientific and Technical Information (OSTI.GOV)

Liebrock, L.M.

1997-07-01

Detailed attempts at realistic nuclear reactor simulations currently take many times real time to execute on high performance workstations. Even the fastest sequential machine can not run these simulations fast enough to ensure that the best corrective measure is used during a nuclear accident to prevent a minor malfunction from becoming a major catastrophe. Since sequential computers have nearly reached the speed of light barrier, these simulations will have to be run in parallel to make significant improvements in speed. In physical reactor plants, parallelism abounds. Fluids flow, controls change, and reactions occur in parallel with only adjacent components directlymore » affecting each other. These do not occur in the sequentialized manner, with global instantaneous effects, that is often used in simulators. Development of parallel algorithms that more closely approximate the real-world operation of a reactor may, in addition to speeding up the simulations, actually improve the accuracy and reliability of the predictions generated. Three types of parallel architecture (shared memory machines, distributed memory multicomputers, and distributed networks) are briefly reviewed as targets for parallelization of nuclear reactor simulation. Various parallelization models (loop-based model, shared memory model, functional model, data parallel model, and a combined functional and data parallel model) are discussed along with their advantages and disadvantages for nuclear reactor simulation. A variety of tools are introduced for each of the models. Emphasis is placed on the data parallel model as the primary focus for two-phase flow simulation. Tools to support data parallel programming for multiple component applications and special parallelization considerations are also discussed.« less
Aho-Corasick String Matching on Shared and Distributed Memory Parallel Architectures

DOE Office of Scientific and Technical Information (OSTI.GOV)

Tumeo, Antonino; Villa, Oreste; Chavarría-Miranda, Daniel

String matching is at the core of many critical applications, including network intrusion detection systems, search engines, virus scanners, spam filters, DNA and protein sequencing, and data mining. For all of these applications string matching requires a combination of (sometimes all) the following characteristics: high and/or predictable performance, support for large data sets and flexibility of integration and customization. Many software based implementations targeting conventional cache-based microprocessors fail to achieve high and predictable performance requirements, while Field-Programmable Gate Array (FPGA) implementations and dedicated hardware solutions fail to support large data sets (dictionary sizes) and are difficult to integrate and customize.more » The advent of multicore, multithreaded, and GPU-based systems is opening the possibility for software based solutions to reach very high performance at a sustained rate. This paper compares several software-based implementations of the Aho-Corasick string searching algorithm for high performance systems. We discuss the implementation of the algorithm on several types of shared-memory high-performance architectures (Niagara 2, large x86 SMPs and Cray XMT), distributed memory with homogeneous processing elements (InfiniBand cluster of x86 multicores) and heterogeneous processing elements (InfiniBand cluster of x86 multicores with NVIDIA Tesla C10 GPUs). We describe in detail how each solution achieves the objectives of supporting large dictionaries, sustaining high performance, and enabling customization and flexibility using various data sets.« less

Accelerate quasi Monte Carlo method for solving systems of linear algebraic equations through shared memory

NASA Astrophysics Data System (ADS)

Lai, Siyan; Xu, Ying; Shao, Bo; Guo, Menghan; Lin, Xiaola

2017-04-01

In this paper we study on Monte Carlo method for solving systems of linear algebraic equations (SLAE) based on shared memory. Former research demostrated that GPU can effectively speed up the computations of this issue. Our purpose is to optimize Monte Carlo method simulation on GPUmemoryachritecture specifically. Random numbers are organized to storein shared memory, which aims to accelerate the parallel algorithm. Bank conflicts can be avoided by our Collaborative Thread Arrays(CTA)scheme. The results of experiments show that the shared memory based strategy can speed up the computaions over than 3X at most.
CaLRS: A Critical-Aware Shared LLC Request Scheduling Algorithm on GPGPU

PubMed Central

Ma, Jianliang; Meng, Jinglei; Chen, Tianzhou; Wu, Minghui

2015-01-01

Ultra high thread-level parallelism in modern GPUs usually introduces numerous memory requests simultaneously. So there are always plenty of memory requests waiting at each bank of the shared LLC (L2 in this paper) and global memory. For global memory, various schedulers have already been developed to adjust the request sequence. But we find few work has ever focused on the service sequence on the shared LLC. We measured that a big number of GPU applications always queue at LLC bank for services, which provide opportunity to optimize the service order on LLC. Through adjusting the GPU memory request service order, we can improve the schedulability of SM. So we proposed a critical-aware shared LLC request scheduling algorithm (CaLRS) in this paper. The priority representative of memory request is critical for CaLRS. We use the number of memory requests that originate from the same warp but have not been serviced when they arrive at the shared LLC bank to represent the criticality of each warp. Experiments show that the proposed scheme can boost the SM schedulability effectively by promoting the scheduling priority of the memory requests with high criticality and improves the performance of GPU indirectly. PMID:25729772
Time Constraints and Resource Sharing in Adults' Working Memory Spans

ERIC Educational Resources Information Center

Barrouillet, Pierre; Bernardin, Sophie; Camos, Valerie

2004-01-01

This article presents a new model that accounts for working memory spans in adults, the time-based resource-sharing model. The model assumes that both components (i.e., processing and maintenance) of the main working memory tasks require attention and that memory traces decay as soon as attention is switched away. Because memory retrievals are…
Multi-processor including data flow accelerator module

DOEpatents

Davidson, George S.; Pierce, Paul E.

1990-01-01

An accelerator module for a data flow computer includes an intelligent memory. The module is added to a multiprocessor arrangement and uses a shared tagged memory architecture in the data flow computer. The intelligent memory module assigns locations for holding data values in correspondence with arcs leading to a node in a data dependency graph. Each primitive computation is associated with a corresponding memory cell, including a number of slots for operands needed to execute a primitive computation, a primitive identifying pointer, and linking slots for distributing the result of the cell computation to other cells requiring that result as an operand. Circuitry is provided for utilizing tag bits to determine automatically when all operands required by a processor are available and for scheduling the primitive for execution in a queue. Each memory cell of the module may be associated with any of the primitives, and the particular primitive to be executed by the processor associated with the cell is identified by providing an index, such as the cell number for the primitive, to the primitive lookup table of starting addresses. The module thus serves to perform functions previously performed by a number of sections of data flow architectures and coexists with conventional shared memory therein. A multiprocessing system including the module operates in a hybrid mode, wherein the same processing modules are used to perform some processing in a sequential mode, under immediate control of an operating system, while performing other processing in a data flow mode.
Externalising the autobiographical self: sharing personal memories online facilitated memory retention.

PubMed

Wang, Qi; Lee, Dasom; Hou, Yubo

2017-07-01

Internet technology provides a new means of recalling and sharing personal memories in the digital age. What is the mnemonic consequence of posting personal memories online? Theories of transactive memory and autobiographical memory would make contrasting predictions. In the present study, college students completed a daily diary for a week, listing at the end of each day all the events that happened to them on that day. They also reported whether they posted any of the events online. Participants received a surprise memory test after the completion of the diary recording and then another test a week later. At both tests, events posted online were significantly more likely than those not posted online to be recalled. It appears that sharing memories online may provide unique opportunities for rehearsal and meaning-making that facilitate memory retention.
Using process groups to implement failure detection in asynchronous environments

NASA Technical Reports Server (NTRS)

Ricciardi, Aleta M.; Birman, Kenneth P.

1991-01-01

Agreement on the membership of a group of processes in a distributed system is a basic problem that arises in a wide range of applications. Such groups occur when a set of processes cooperate to perform some task, share memory, monitor one another, subdivide a computation, and so forth. The group membership problems is discussed as it relates to failure detection in asynchronous, distributed systems. A rigorous, formal specification for group membership is presented under this interpretation. A solution is then presented for this problem.
An enhanced Ada run-time system for real-time embedded processors

NASA Technical Reports Server (NTRS)

Sims, J. T.

1991-01-01

An enhanced Ada run-time system has been developed to support real-time embedded processor applications. The primary focus of this development effort has been on the tasking system and the memory management facilities of the run-time system. The tasking system has been extended to support efficient and precise periodic task execution as required for control applications. Event-driven task execution providing a means of task-asynchronous control and communication among Ada tasks is supported in this system. Inter-task control is even provided among tasks distributed on separate physical processors. The memory management system has been enhanced to provide object allocation and protected access support for memory shared between disjoint processors, each of which is executing a distinct Ada program.
Implementations of BLAST for parallel computers.

PubMed

Jülich, A

1995-02-01

The BLAST sequence comparison programs have been ported to a variety of parallel computers-the shared memory machine Cray Y-MP 8/864 and the distributed memory architectures Intel iPSC/860 and nCUBE. Additionally, the programs were ported to run on workstation clusters. We explain the parallelization techniques and consider the pros and cons of these methods. The BLAST programs are very well suited for parallelization for a moderate number of processors. We illustrate our results using the program blastp as an example. As input data for blastp, a 799 residue protein query sequence and the protein database PIR were used.
A mixed parallel strategy for the solution of coupled multi-scale problems at finite strains

NASA Astrophysics Data System (ADS)

Lopes, I. A. Rodrigues; Pires, F. M. Andrade; Reis, F. J. P.

2018-02-01

A mixed parallel strategy for the solution of homogenization-based multi-scale constitutive problems undergoing finite strains is proposed. The approach aims to reduce the computational time and memory requirements of non-linear coupled simulations that use finite element discretization at both scales (FE^2). In the first level of the algorithm, a non-conforming domain decomposition technique, based on the FETI method combined with a mortar discretization at the interface of macroscopic subdomains, is employed. A master-slave scheme, which distributes tasks by macroscopic element and adopts dynamic scheduling, is then used for each macroscopic subdomain composing the second level of the algorithm. This strategy allows the parallelization of FE^2 simulations in computers with either shared memory or distributed memory architectures. The proposed strategy preserves the quadratic rates of asymptotic convergence that characterize the Newton-Raphson scheme. Several examples are presented to demonstrate the robustness and efficiency of the proposed parallel strategy.
Memory systems in schizophrenia: Modularity is preserved but deficits are generalized.

PubMed

Haut, Kristen M; Karlsgodt, Katherine H; Bilder, Robert M; Congdon, Eliza; Freimer, Nelson B; London, Edythe D; Sabb, Fred W; Ventura, Joseph; Cannon, Tyrone D

2015-10-01

Schizophrenia patients exhibit impaired working and episodic memory, but this may represent generalized impairment across memory modalities or performance deficits restricted to particular memory systems in subgroups of patients. Furthermore, it is unclear whether deficits are unique from those associated with other disorders. Healthy controls (n=1101) and patients with schizophrenia (n=58), bipolar disorder (n=49) and attention-deficit-hyperactivity-disorder (n=46) performed 18 tasks addressing primarily verbal and spatial episodic and working memory. Effect sizes for group contrasts were compared across tasks and the consistency of subjects' distributional positions across memory domains was measured. Schizophrenia patients performed poorly relative to the other groups on every test. While low to moderate correlation was found between memory domains (r=.320), supporting modularity of these systems, there was limited agreement between measures regarding each individual's task performance (ICC=.292) and in identifying those individuals falling into the lowest quintile (kappa=0.259). A general ability factor accounted for nearly all of the group differences in performance and agreement across measures in classifying low performers. Pathophysiological processes involved in schizophrenia appear to act primarily on general abilities required in all tasks rather than on specific abilities within different memory domains and modalities. These effects represent a general shift in the overall distribution of general ability (i.e., each case functioning at a lower level than they would have if not for the illness), rather than presence of a generally low-performing subgroup of patients. There is little evidence that memory impairments in schizophrenia are shared with bipolar disorder and ADHD. Copyright © 2015 Elsevier B.V. All rights reserved.
Memory systems in schizophrenia: Modularity is preserved but deficits are generalized

PubMed Central

Haut, Kristen M.; Karlsgodt, Katherine H.; Bilder, Robert M.; Congdon, Eliza; Freimer, Nelson; London, Edythe D.; Sabb, Fred W.; Ventura, Joseph; Cannon, Tyrone D.

2015-01-01

Objective Schizophrenia patients exhibit impaired working and episodic memory, but this may represent generalized impairment across memory modalities or performance deficits restricted to particular memory systems in subgroups of patients. Furthermore, it is unclear whether deficits are unique from those associated with other disorders. Method Healthy controls (n=1101) and patients with schizophrenia (n=58), bipolar disorder (n=49) and attention-deficit-hyperactivity-disorder (n=46) performed 18 tasks addressing primarily verbal and spatial episodic and working memory. Effect sizes for group contrasts were compared across tasks and the consistency of subjects’ distributional positions across memory domains was measured. Results Schizophrenia patients performed poorly relative to the other groups on every test. While low to moderate correlation was found between memory domains (r=.320), supporting modularity of these systems, there was limited agreement between measures regarding each individual’s task performance (ICC=.292) and in identifying those individuals falling into the lowest quintile (kappa=0.259). A general ability factor accounted for nearly all of the group differences in performance and agreement across measures in classifying low performers. Conclusions Pathophysiological processes involved in schizophrenia appear to act primarily on general abilities required in all tasks rather than on specific abilities within different memory domains and modalities. These effects represent a general shift in the overall distribution of general ability (i.e., each case functioning at a lower level than they would have if not for the illness), rather than presence of a generally low-performing subgroup of patients. There is little evidence that memory impairments in schizophrenia are shared with bipolar disorder and ADHD. PMID:26299707
Visual and Spatial Working Memory Are Not that Dissociated after All: A Time-Based Resource-Sharing Account

ERIC Educational Resources Information Center

Vergauwe, Evie; Barrouillet, Pierre; Camos, Valerie

2009-01-01

Examinations of interference between visual and spatial materials in working memory have suggested domain- and process-based fractionations of visuo-spatial working memory. The present study examined the role of central time-based resource sharing in visuo-spatial working memory and assessed its role in obtained interference patterns. Visual and…
Rapid recovery from transient faults in the fault-tolerant processor with fault-tolerant shared memory

NASA Technical Reports Server (NTRS)

Harper, Richard E.; Butler, Bryan P.

1990-01-01

The Draper fault-tolerant processor with fault-tolerant shared memory (FTP/FTSM), which is designed to allow application tasks to continue execution during the memory alignment process, is described. Processor performance is not affected by memory alignment. In addition, the FTP/FTSM incorporates a hardware scrubber device to perform the memory alignment quickly during unused memory access cycles. The FTP/FTSM architecture is described, followed by an estimate of the time required for channel reintegration.
Relations of maternal style and child self-concept to autobiographical memories in chinese, chinese immigrant, and European american 3-year-olds.

PubMed

Wang, Qi

2006-01-01

The relations of maternal reminiscing style and child self-concept to children's shared and independent autobiographical memories were examined in a sample of 189 three-year-olds and their mothers from Chinese families in China, first-generation Chinese immigrant families in the United States, and European American families. Mothers shared memories with their children and completed questionnaires; children recounted autobiographical events and described themselves with a researcher. Independent of culture, gender, child age, and language skills, maternal elaborations and evaluations were associated with children's shared memory reports, and maternal evaluations and child agentic self-focus were associated with children's independent memory reports. Maternal style and child self-concept further mediated cultural influences on children's memory. The findings provide insight into the social-cultural construction of autobiographical memory.
Memory Allocation: Mechanisms and Function.

PubMed

Josselyn, Sheena A; Frankland, Paul W

2018-04-25

Memories for events are thought to be represented in sparse, distributed neuronal ensembles (or engrams). In this article, we review how neurons are chosen to become part of a particular engram, via a process of neuronal allocation. Experiments in rodents indicate that eligible neurons compete for allocation to a given engram, with more excitable neurons winning this competition. Moreover, fluctuations in neuronal excitability determine how engrams interact, promoting either memory integration (via coallocation to overlapping engrams) or separation (via disallocation to nonoverlapping engrams). In parallel with rodent studies, recent findings in humans verify the importance of this memory integration process for linking memories that occur close in time or share related content. A deeper understanding of allocation promises to provide insights into the logic underlying how knowledge is normally organized in the brain and the disorders in which this process has gone awry. Expected final online publication date for the Annual Review of Neuroscience Volume 41 is July 8, 2018. Please see http://www.annualreviews.org/page/journal/pubdates for revised estimates.
Performance Evaluation of Remote Memory Access (RMA) Programming on Shared Memory Parallel Computers

NASA Technical Reports Server (NTRS)

Jin, Hao-Qiang; Jost, Gabriele; Biegel, Bryan A. (Technical Monitor)

2002-01-01

The purpose of this study is to evaluate the feasibility of remote memory access (RMA) programming on shared memory parallel computers. We discuss different RMA based implementations of selected CFD application benchmark kernels and compare them to corresponding message passing based codes. For the message-passing implementation we use MPI point-to-point and global communication routines. For the RMA based approach we consider two different libraries supporting this programming model. One is a shared memory parallelization library (SMPlib) developed at NASA Ames, the other is the MPI-2 extensions to the MPI Standard. We give timing comparisons for the different implementation strategies and discuss the performance.
SAHAYOG: A Testbed for Load Sharing under Failure,

DTIC Science & Technology

1987-07-01

messages, shared memory and semaphores . To communicate using messages, processes create message queues using system-provided prim- itives. The message...The size of the memory that is to be shared is decided by the process when it makes a request for memory allocation. The semaphore option of IPC can be...used to prevent two or more concurrent processes from executing their critical sections at the same time. Semaphores must be used when the processes
Implementation and evaluation of shared-memory communication and synchronization operations in MPICH2 using the Nemesis communication subsystem.

DOE Office of Scientific and Technical Information (OSTI.GOV)

Buntinas, D.; Mercier, G.; Gropp, W.

2007-09-01

This paper presents the implementation of MPICH2 over the Nemesis communication subsystem and the evaluation of its shared-memory performance. We describe design issues as well as some of the optimization techniques we employed. We conducted a performance evaluation over shared memory using microbenchmarks. The evaluation shows that MPICH2 Nemesis has very low communication overhead, making it suitable for smaller-grained applications.
Working memory resources are shared across sensory modalities.

PubMed

Salmela, V R; Moisala, M; Alho, K

2014-10-01

A common assumption in the working memory literature is that the visual and auditory modalities have separate and independent memory stores. Recent evidence on visual working memory has suggested that resources are shared between representations, and that the precision of representations sets the limit for memory performance. We tested whether memory resources are also shared across sensory modalities. Memory precision for two visual (spatial frequency and orientation) and two auditory (pitch and tone duration) features was measured separately for each feature and for all possible feature combinations. Thus, only the memory load was varied, from one to four features, while keeping the stimuli similar. In Experiment 1, two gratings and two tones-both containing two varying features-were presented simultaneously. In Experiment 2, two gratings and two tones-each containing only one varying feature-were presented sequentially. The memory precision (delayed discrimination threshold) for a single feature was close to the perceptual threshold. However, as the number of features to be remembered was increased, the discrimination thresholds increased more than twofold. Importantly, the decrease in memory precision did not depend on the modality of the other feature(s), or on whether the features were in the same or in separate objects. Hence, simultaneously storing one visual and one auditory feature had an effect on memory precision equal to those of simultaneously storing two visual or two auditory features. The results show that working memory is limited by the precision of the stored representations, and that working memory can be described as a resource pool that is shared across modalities.
Enhancing Application Performance Using Mini-Apps: Comparison of Hybrid Parallel Programming Paradigms

NASA Technical Reports Server (NTRS)

Lawson, Gary; Sosonkina, Masha; Baurle, Robert; Hammond, Dana

2017-01-01

In many fields, real-world applications for High Performance Computing have already been developed. For these applications to stay up-to-date, new parallel strategies must be explored to yield the best performance; however, restructuring or modifying a real-world application may be daunting depending on the size of the code. In this case, a mini-app may be employed to quickly explore such options without modifying the entire code. In this work, several mini-apps have been created to enhance a real-world application performance, namely the VULCAN code for complex flow analysis developed at the NASA Langley Research Center. These mini-apps explore hybrid parallel programming paradigms with Message Passing Interface (MPI) for distributed memory access and either Shared MPI (SMPI) or OpenMP for shared memory accesses. Performance testing shows that MPI+SMPI yields the best execution performance, while requiring the largest number of code changes. A maximum speedup of 23 was measured for MPI+SMPI, but only 11 was measured for MPI+OpenMP.

Parallel Computation of the Regional Ocean Modeling System (ROMS)

DOE Office of Scientific and Technical Information (OSTI.GOV)

Wang, P; Song, Y T; Chao, Y

2005-04-05

The Regional Ocean Modeling System (ROMS) is a regional ocean general circulation modeling system solving the free surface, hydrostatic, primitive equations over varying topography. It is free software distributed world-wide for studying both complex coastal ocean problems and the basin-to-global scale ocean circulation. The original ROMS code could only be run on shared-memory systems. With the increasing need to simulate larger model domains with finer resolutions and on a variety of computer platforms, there is a need in the ocean-modeling community to have a ROMS code that can be run on any parallel computer ranging from 10 to hundreds ofmore » processors. Recently, we have explored parallelization for ROMS using the MPI programming model. In this paper, an efficient parallelization strategy for such a large-scale scientific software package, based on an existing shared-memory computing model, is presented. In addition, scientific applications and data-performance issues on a couple of SGI systems, including Columbia, the world's third-fastest supercomputer, are discussed.« less
Working Memory Span Development: A Time-Based Resource-Sharing Model Account

ERIC Educational Resources Information Center

Barrouillet, Pierre; Gavens, Nathalie; Vergauwe, Evie; Gaillard, Vinciane; Camos, Valerie

2009-01-01

The time-based resource-sharing model (P. Barrouillet, S. Bernardin, & V. Camos, 2004) assumes that during complex working memory span tasks, attention is frequently and surreptitiously switched from processing to reactivate decaying memory traces before their complete loss. Three experiments involving children from 5 to 14 years of age…
Direct access inter-process shared memory

DOEpatents

Brightwell, Ronald B; Pedretti, Kevin; Hudson, Trammell B

2013-10-22

A technique for directly sharing physical memory between processes executing on processor cores is described. The technique includes loading a plurality of processes into the physical memory for execution on a corresponding plurality of processor cores sharing the physical memory. An address space is mapped to each of the processes by populating a first entry in a top level virtual address table for each of the processes. The address space of each of the processes is cross-mapped into each of the processes by populating one or more subsequent entries of the top level virtual address table with the first entry in the top level virtual address table from other processes.
Grouping and binding in visual short-term memory.

PubMed

Quinlan, Philip T; Cohen, Dale J

2012-09-01

Findings of 2 experiments are reported that challenge the current understanding of visual short-term memory (VSTM). In both experiments, a single study display, containing 6 colored shapes, was presented briefly and then probed with a single colored shape. At stake is how VSTM retains a record of different objects that share common features: In the 1st experiment, 2 study items sometimes shared a common feature (either a shape or a color). The data revealed a color sharing effect, in which memory was much better for items that shared a common color than for items that did not. The 2nd experiment showed that the size of the color sharing effect depended on whether a single pair of items shared a common color or whether 2 pairs of items were so defined-memory for all items improved when 2 color groups were presented. In explaining performance, an account is advanced in which items compete for a fixed number of slots, but then memory recall for any given stored item is prone to error. A critical assumption is that items that share a common color are stored together in a slot as a chunk. The evidence provides further support for the idea that principles of perceptual organization may determine the manner in which items are stored in VSTM. PsycINFO Database Record (c) 2012 APA, all rights reserved.
Long-distance quantum key distribution with imperfect devices

DOE Office of Scientific and Technical Information (OSTI.GOV)

Lo Piparo, Nicoló; Razavi, Mohsen

2014-12-04

Quantum key distribution over probabilistic quantum repeaters is addressed. We compare, under practical assumptions, two such schemes in terms of their secure key generation rate per memory, R{sub QKD}. The two schemes under investigation are the one proposed by Duan et al. in [Nat. 414, 413 (2001)] and that of Sangouard et al. proposed in [Phys. Rev. A 76, 050301 (2007)]. We consider various sources of imperfections in the latter protocol, such as a nonzero double-photon probability for the source, dark count per pulse, channel loss and inefficiencies in photodetectors and memories, to find the rate for different nesting levels.more » We determine the maximum value of the double-photon probability beyond which it is not possible to share a secret key anymore. We find the crossover distance for up to three nesting levels. We finally compare the two protocols.« less
Hardware/software codesign for embedded RISC core

NASA Astrophysics Data System (ADS)

Liu, Peng

2001-12-01

This paper describes hardware/software codesign method of the extendible embedded RISC core VIRGO, which based on MIPS-I instruction set architecture. VIRGO is described by Verilog hardware description language that has five-stage pipeline with shared 32-bit cache/memory interface, and it is controlled by distributed control scheme. Every pipeline stage has one small controller, which controls the pipeline stage status and cooperation among the pipeline phase. Since description use high level language and structure is distributed, VIRGO core has highly extension that can meet the requirements of application. We take look at the high-definition television MPEG2 MPHL decoder chip, constructed the hardware/software codesign virtual prototyping machine that can research on VIRGO core instruction set architecture, and system on chip memory size requirements, and system on chip software, etc. We also can evaluate the system on chip design and RISC instruction set based on the virtual prototyping machine platform.
SKIRT: Hybrid parallelization of radiative transfer simulations

NASA Astrophysics Data System (ADS)

Verstocken, S.; Van De Putte, D.; Camps, P.; Baes, M.

2017-07-01

We describe the design, implementation and performance of the new hybrid parallelization scheme in our Monte Carlo radiative transfer code SKIRT, which has been used extensively for modelling the continuum radiation of dusty astrophysical systems including late-type galaxies and dusty tori. The hybrid scheme combines distributed memory parallelization, using the standard Message Passing Interface (MPI) to communicate between processes, and shared memory parallelization, providing multiple execution threads within each process to avoid duplication of data structures. The synchronization between multiple threads is accomplished through atomic operations without high-level locking (also called lock-free programming). This improves the scaling behaviour of the code and substantially simplifies the implementation of the hybrid scheme. The result is an extremely flexible solution that adjusts to the number of available nodes, processors and memory, and consequently performs well on a wide variety of computing architectures.
Shared memories reveal shared structure in neural activity across individuals

PubMed Central

Chen, J.; Leong, Y.C.; Honey, C.J.; Yong, C.H.; Norman, K.A.; Hasson, U.

2016-01-01

Our lives revolve around sharing experiences and memories with others. When different people recount the same events, how similar are their underlying neural representations? Participants viewed a fifty-minute movie, then verbally described the events during functional MRI, producing unguided detailed descriptions lasting up to forty minutes. As each person spoke, event-specific spatial patterns were reinstated in default-network, medial-temporal, and high-level visual areas. Individual event patterns were both highly discriminable from one another and similar between people, suggesting consistent spatial organization. In many high-order areas, patterns were more similar between people recalling the same event than between recall and perception, indicating systematic reshaping of percept into memory. These results reveal the existence of a common spatial organization for memories in high-level cortical areas, where encoded information is largely abstracted beyond sensory constraints; and that neural patterns during perception are altered systematically across people into shared memory representations for real-life events. PMID:27918531
Hierarchical resilience with lightweight threads.

DOE Office of Scientific and Technical Information (OSTI.GOV)

Wheeler, Kyle Bruce

2011-10-01

This paper proposes methodology for providing robustness and resilience for a highly threaded distributed- and shared-memory environment based on well-defined inputs and outputs to lightweight tasks. These inputs and outputs form a failure 'barrier', allowing tasks to be restarted or duplicated as necessary. These barriers must be expanded based on task behavior, such as communication between tasks, but do not prohibit any given behavior. One of the trends in high-performance computing codes seems to be a trend toward self-contained functions that mimic functional programming. Software designers are trending toward a model of software design where their core functions are specifiedmore » in side-effect free or low-side-effect ways, wherein the inputs and outputs of the functions are well-defined. This provides the ability to copy the inputs to wherever they need to be - whether that's the other side of the PCI bus or the other side of the network - do work on that input using local memory, and then copy the outputs back (as needed). This design pattern is popular among new distributed threading environment designs. Such designs include the Barcelona STARS system, distributed OpenMP systems, the Habanero-C and Habanero-Java systems from Vivek Sarkar at Rice University, the HPX/ParalleX model from LSU, as well as our own Scalable Parallel Runtime effort (SPR) and the Trilinos stateless kernels. This design pattern is also shared by CUDA and several OpenMP extensions for GPU-type accelerators (e.g. the PGI OpenMP extensions).« less
I remember you: a role for memory in social cognition and the functional neuroanatomy of their interaction.

PubMed

Spreng, R Nathan; Mar, Raymond A

2012-01-05

Remembering events from the personal past (autobiographical memory) and inferring the thoughts and feelings of other people (mentalizing) share a neural substrate. The shared functional neuroanatomy of these processes has been demonstrated in a meta-analysis of independent task domains (Spreng, Mar & Kim, 2009) and within subjects performing both tasks (Rabin, Gilboa, Stuss, Mar, & Rosenbaum, 2010; Spreng & Grady, 2010). Here, we examine spontaneous low-frequency fluctuations in fMRI BOLD signal during rest from two separate regions key to memory and mentalizing, the left hippocampus and right temporal parietal junction, respectively. Activity in these two regions was then correlated with the entire brain in a resting-state functional connectivity analysis. Although the left hippocampus and right temporal parietal junction were not correlated with each other, both were correlated with a distributed network of brain regions. These regions were consistent with the previously observed overlap between autobiographical memory and mentalizing evoked brain activity found in past studies. Reliable patterns of overlap included the superior temporal sulcus, anterior temporal lobe, lateral inferior parietal cortex (angular gyrus), posterior cingulate cortex, dorsomedial and ventral prefrontal cortex, inferior frontal gyrus, and the amygdala. We propose that the functional overlap facilitates the integration of personal and interpersonal information and provides a means for personal experiences to become social conceptual knowledge. This knowledge, in turn, informs strategic social behavior in support of personal goals. In closing, we argue for a new perspective within social cognitive neuroscience, emphasizing the importance of memory in social cognition. Copyright © 2010 Elsevier B.V. All rights reserved.
I remember you: A role for memory in social cognition and the functional neuroanatomy of their interaction

PubMed Central

Spreng, R. Nathan; Mar, Raymond A.

2011-01-01

Remembering events from the personal past (autobiographical memory) and inferring the thoughts and feelings of other people (mentalizing) share a neural substrate. The shared functional neuroanatomy of these processes has been demonstrated in a meta-analysis of independent task domains (Spreng, Mar & Kim, 2009) and within subjects performing both tasks (Rabin, Gilboa, Stuss, Mar, & Rosenbaum, 2010; Spreng & Grady, 2010). Here, we examine spontaneous low-frequency fluctuations in fMRI BOLD signal during rest from two separate regions key to memory and mentalizing, the left hippocampus and right temporal parietal junction, respectively. Activity in these two regions was then correlated with the entire brain in a resting-state functional connectivity analysis. Although the left hippocampus and right temporal parietal junction were not correlated with each other, both were correlated with a distributed network of brain regions. These regions were consistent with the previously observed overlap between autobiographical memory and mentalizing evoked brain activity found in past studies. Reliable patterns of overlap included the superior temporal sulcus, anterior temporal lobe, lateral inferior parietal cortex (angular gyrus), posterior cingulate cortex, dorsomedial and ventral prefrontal cortex, inferior frontal gyrus, and the amygdala. We propose that the functional overlap facilitates the integration of personal and interpersonal information and provides a means for personal experiences to become social conceptual knowledge. This knowledge, in turn, informs strategic social behavior in support of personal goals. In closing, we argue for a new perspective within social cognitive neuroscience, emphasizing the importance of memory in social cognition. PMID:21172325
Effects of Network Structure, Competition and Memory Time on Social Spreading Phenomena

NASA Astrophysics Data System (ADS)

Gleeson, James P.; O'Sullivan, Kevin P.; Baños, Raquel A.; Moreno, Yamir

2016-04-01

Online social media has greatly affected the way in which we communicate with each other. However, little is known about what fundamental mechanisms drive dynamical information flow in online social systems. Here, we introduce a generative model for online sharing behavior that is analytically tractable and that can reproduce several characteristics of empirical micro-blogging data on hashtag usage, such as (time-dependent) heavy-tailed distributions of meme popularity. The presented framework constitutes a null model for social spreading phenomena that, in contrast to purely empirical studies or simulation-based models, clearly distinguishes the roles of two distinct factors affecting meme popularity: the memory time of users and the connectivity structure of the social network.
Sleep Benefits Memory for Semantic Category Structure While Preserving Exemplar-Specific Information.

PubMed

Schapiro, Anna C; McDevitt, Elizabeth A; Chen, Lang; Norman, Kenneth A; Mednick, Sara C; Rogers, Timothy T

2017-11-01

Semantic memory encompasses knowledge about both the properties that typify concepts (e.g. robins, like all birds, have wings) as well as the properties that individuate conceptually related items (e.g. robins, in particular, have red breasts). We investigate the impact of sleep on new semantic learning using a property inference task in which both kinds of information are initially acquired equally well. Participants learned about three categories of novel objects possessing some properties that were shared among category exemplars and others that were unique to an exemplar, with exposure frequency varying across categories. In Experiment 1, memory for shared properties improved and memory for unique properties was preserved across a night of sleep, while memory for both feature types declined over a day awake. In Experiment 2, memory for shared properties improved across a nap, but only for the lower-frequency category, suggesting a prioritization of weakly learned information early in a sleep period. The increase was significantly correlated with amount of REM, but was also observed in participants who did not enter REM, suggesting involvement of both REM and NREM sleep. The results provide the first evidence that sleep improves memory for the shared structure of object categories, while simultaneously preserving object-unique information.
The Raid distributed database system

NASA Technical Reports Server (NTRS)

Bhargava, Bharat; Riedl, John

1989-01-01

Raid, a robust and adaptable distributed database system for transaction processing (TP), is described. Raid is a message-passing system, with server processes on each site to manage concurrent processing, consistent replicated copies during site failures, and atomic distributed commitment. A high-level layered communications package provides a clean location-independent interface between servers. The latest design of the package delivers messages via shared memory in a configuration with several servers linked into a single process. Raid provides the infrastructure to investigate various methods for supporting reliable distributed TP. Measurements on TP and server CPU time are presented, along with data from experiments on communications software, consistent replicated copy control during site failures, and concurrent distributed checkpointing. A software tool for evaluating the implementation of TP algorithms in an operating-system kernel is proposed.
Automatic Generation of Directive-Based Parallel Programs for Shared Memory Parallel Systems

NASA Technical Reports Server (NTRS)

Jin, Hao-Qiang; Yan, Jerry; Frumkin, Michael

2000-01-01

The shared-memory programming model is a very effective way to achieve parallelism on shared memory parallel computers. As great progress was made in hardware and software technologies, performance of parallel programs with compiler directives has demonstrated large improvement. The introduction of OpenMP directives, the industrial standard for shared-memory programming, has minimized the issue of portability. Due to its ease of programming and its good performance, the technique has become very popular. In this study, we have extended CAPTools, a computer-aided parallelization toolkit, to automatically generate directive-based, OpenMP, parallel programs. We outline techniques used in the implementation of the tool and present test results on the NAS parallel benchmarks and ARC3D, a CFD application. This work demonstrates the great potential of using computer-aided tools to quickly port parallel programs and also achieve good performance.
Structurally Integrated Versus Structurally Segregated Memory Representations: Implications for the Design of Instructional Materials.

ERIC Educational Resources Information Center

Hayes-Roth, Barbara

Two kinds of memory organization are distinguished: segregrated versus integrated. In segregated memory organizations, related learned propositions have separate memory representations. In integrated memory organizations, memory representations of related propositions share common subrepresentations. Segregated memory organizations facilitate…
Getting connected: Both associative and semantic links structure semantic memory for newly learned persons.

PubMed

Wiese, Holger; Schweinberger, Stefan R

2015-01-01

The present study examined whether semantic memory for newly learned people is structured by visual co-occurrence, shared semantics, or both. Participants were trained with pairs of simultaneously presented (i.e., co-occurring) preexperimentally unfamiliar faces, which either did or did not share additionally provided semantic information (occupation, place of living, etc.). Semantic information could also be shared between faces that did not co-occur. A subsequent priming experiment revealed faster responses for both co-occurrence/no shared semantics and no co-occurrence/shared semantics conditions, than for an unrelated condition. Strikingly, priming was strongest in the co-occurrence/shared semantics condition, suggesting additive effects of these factors. Additional analysis of event-related brain potentials yielded priming in the N400 component only for combined effects of visual co-occurrence and shared semantics, with more positive amplitudes in this than in the unrelated condition. Overall, these findings suggest that both semantic relatedness and visual co-occurrence are important when novel information is integrated into person-related semantic memory.
Memory T and memory B cells share a transcriptional program of self-renewal with long-term hematopoietic stem cells

PubMed Central

Luckey, Chance John; Bhattacharya, Deepta; Goldrath, Ananda W.; Weissman, Irving L.; Benoist, Christophe; Mathis, Diane

2006-01-01

The only cells of the hematopoietic system that undergo self-renewal for the lifetime of the organism are long-term hematopoietic stem cells and memory T and B cells. To determine whether there is a shared transcriptional program among these self-renewing populations, we first compared the gene-expression profiles of naïve, effector and memory CD8+ T cells with those of long-term hematopoietic stem cells, short-term hematopoietic stem cells, and lineage-committed progenitors. Transcripts augmented in memory CD8+ T cells relative to naïve and effector T cells were selectively enriched in long-term hematopoietic stem cells and were progressively lost in their short-term and lineage-committed counterparts. Furthermore, transcripts selectively decreased in memory CD8+ T cells were selectively down-regulated in long-term hematopoietic stem cells and progressively increased with differentiation. To confirm that this pattern was a general property of immunologic memory, we turned to independently generated gene expression profiles of memory, naïve, germinal center, and plasma B cells. Once again, memory-enriched and -depleted transcripts were also appropriately augmented and diminished in long-term hematopoietic stem cells, and their expression correlated with progressive loss of self-renewal function. Thus, there appears to be a common signature of both up- and down-regulated transcripts shared between memory T cells, memory B cells, and long-term hematopoietic stem cells. This signature was not consistently enriched in neural or embryonic stem cell populations and, therefore, appears to be restricted to the hematopoeitic system. These observations provide evidence that the shared phenotype of self-renewal in the hematopoietic system is linked at the molecular level. PMID:16492737
Categorical and associative relations increase false memory relative to purely associative relations.

PubMed

Coane, Jennifer H; McBride, Dawn M; Termonen, Miia-Liisa; Cutting, J Cooper

2016-01-01

The goal of the present study was to examine the contributions of associative strength and similarity in terms of shared features to the production of false memories in the Deese/Roediger-McDermott list-learning paradigm. Whereas the activation/monitoring account suggests that false memories are driven by automatic associative activation from list items to nonpresented lures, combined with errors in source monitoring, other accounts (e.g., fuzzy trace theory, global-matching models) emphasize the importance of semantic-level similarity, and thus predict that shared features between list and lure items will increase false memory. Participants studied lists of nine items related to a nonpresented lure. Half of the lists consisted of items that were associated but did not share features with the lure, and the other half included items that were equally associated but also shared features with the lure (in many cases, these were taxonomically related items). The two types of lists were carefully matched in terms of a variety of lexical and semantic factors, and the same lures were used across list types. In two experiments, false recognition of the critical lures was greater following the study of lists that shared features with the critical lure, suggesting that similarity at a categorical or taxonomic level contributes to false memory above and beyond associative strength. We refer to this phenomenon as a "feature boost" that reflects additive effects of shared meaning and association strength and is generally consistent with accounts of false memory that have emphasized thematic or feature-level similarity among studied and nonstudied representations.
High Performance Programming Using Explicit Shared Memory Model on Cray T3D1

NASA Technical Reports Server (NTRS)

Simon, Horst D.; Saini, Subhash; Grassi, Charles

1994-01-01

The Cray T3D system is the first-phase system in Cray Research, Inc.'s (CRI) three-phase massively parallel processing (MPP) program. This system features a heterogeneous architecture that closely couples DEC's Alpha microprocessors and CRI's parallel-vector technology, i.e., the Cray Y-MP and Cray C90. An overview of the Cray T3D hardware and available programming models is presented. Under Cray Research adaptive Fortran (CRAFT) model four programming methods (data parallel, work sharing, message-passing using PVM, and explicit shared memory model) are available to the users. However, at this time data parallel and work sharing programming models are not available to the user community. The differences between standard PVM and CRI's PVM are highlighted with performance measurements such as latencies and communication bandwidths. We have found that the performance of neither standard PVM nor CRI s PVM exploits the hardware capabilities of the T3D. The reasons for the bad performance of PVM as a native message-passing library are presented. This is illustrated by the performance of NAS Parallel Benchmarks (NPB) programmed in explicit shared memory model on Cray T3D. In general, the performance of standard PVM is about 4 to 5 times less than obtained by using explicit shared memory model. This degradation in performance is also seen on CM-5 where the performance of applications using native message-passing library CMMD on CM-5 is also about 4 to 5 times less than using data parallel methods. The issues involved (such as barriers, synchronization, invalidating data cache, aligning data cache etc.) while programming in explicit shared memory model are discussed. Comparative performance of NPB using explicit shared memory programming model on the Cray T3D and other highly parallel systems such as the TMC CM-5, Intel Paragon, Cray C90, IBM-SP1, etc. is presented.

Hybrid Parallelism for Volume Rendering on Large-, Multi-, and Many-Core Systems

DOE Office of Scientific and Technical Information (OSTI.GOV)

Howison, Mark; Bethel, E. Wes; Childs, Hank

2012-01-01

With the computing industry trending towards multi- and many-core processors, we study how a standard visualization algorithm, ray-casting volume rendering, can benefit from a hybrid parallelism approach. Hybrid parallelism provides the best of both worlds: using distributed-memory parallelism across a large numbers of nodes increases available FLOPs and memory, while exploiting shared-memory parallelism among the cores within each node ensures that each node performs its portion of the larger calculation as efficiently as possible. We demonstrate results from weak and strong scaling studies, at levels of concurrency ranging up to 216,000, and with datasets as large as 12.2 trillion cells.more » The greatest benefit from hybrid parallelism lies in the communication portion of the algorithm, the dominant cost at higher levels of concurrency. We show that reducing the number of participants with a hybrid approach significantly improves performance.« less
An Evaluation of Architectural Platforms for Parallel Navier-Stokes Computations

NASA Technical Reports Server (NTRS)

Jayasimha, D. N.; Hayder, M. E.; Pillay, S. K.

1996-01-01

We study the computational, communication, and scalability characteristics of a computational fluid dynamics application, which solves the time accurate flow field of a jet using the compressible Navier-Stokes equations, on a variety of parallel architecture platforms. The platforms chosen for this study are a cluster of workstations (the LACE experimental testbed at NASA Lewis), a shared memory multiprocessor (the Cray YMP), and distributed memory multiprocessors with different topologies - the IBM SP and the Cray T3D. We investigate the impact of various networks connecting the cluster of workstations on the performance of the application and the overheads induced by popular message passing libraries used for parallelization. The work also highlights the importance of matching the memory bandwidth to the processor speed for good single processor performance. By studying the performance of an application on a variety of architectures, we are able to point out the strengths and weaknesses of each of the example computing platforms.
Parallelizing Navier-Stokes Computations on a Variety of Architectural Platforms

NASA Technical Reports Server (NTRS)

Jayasimha, D. N.; Hayder, M. E.; Pillay, S. K.

1997-01-01

We study the computational, communication, and scalability characteristics of a Computational Fluid Dynamics application, which solves the time accurate flow field of a jet using the compressible Navier-Stokes equations, on a variety of parallel architectural platforms. The platforms chosen for this study are a cluster of workstations (the LACE experimental testbed at NASA Lewis), a shared memory multiprocessor (the Cray YMP), distributed memory multiprocessors with different topologies-the IBM SP and the Cray T3D. We investigate the impact of various networks, connecting the cluster of workstations, on the performance of the application and the overheads induced by popular message passing libraries used for parallelization. The work also highlights the importance of matching the memory bandwidth to the processor speed for good single processor performance. By studying the performance of an application on a variety of architectures, we are able to point out the strengths and weaknesses of each of the example computing platforms.
The attentional blink reveals serial working memory encoding: evidence from virtual and human event-related potentials.

PubMed

Craston, Patrick; Wyble, Brad; Chennu, Srivas; Bowman, Howard

2009-03-01

Observers often miss a second target (T2) if it follows an identified first target item (T1) within half a second in rapid serial visual presentation (RSVP), a finding termed the attentional blink. If two targets are presented in immediate succession, however, accuracy is excellent (Lag 1 sparing). The resource sharing hypothesis proposes a dynamic distribution of resources over a time span of up to 600 msec during the attentional blink. In contrast, the ST(2) model argues that working memory encoding is serial during the attentional blink and that, due to joint consolidation, Lag 1 is the only case where resources are shared. Experiment 1 investigates the P3 ERP component evoked by targets in RSVP. The results suggest that, in this context, P3 amplitude is an indication of bottom-up strength rather than a measure of cognitive resource allocation. Experiment 2, employing a two-target paradigm, suggests that T1 consolidation is not affected by the presentation of T2 during the attentional blink. However, if targets are presented in immediate succession (Lag 1 sparing), they are jointly encoded into working memory. We use the ST(2) model's neural network implementation, which replicates a range of behavioral results related to the attentional blink, to generate "virtual ERPs" by summing across activation traces. We compare virtual to human ERPs and show how the results suggest a serial nature of working memory encoding as implied by the ST(2) model.
Shared Memory Parallelization of an Implicit ADI-type CFD Code

NASA Technical Reports Server (NTRS)

Hauser, Th.; Huang, P. G.

1999-01-01

A parallelization study designed for ADI-type algorithms is presented using the OpenMP specification for shared-memory multiprocessor programming. Details of optimizations specifically addressed to cache-based computer architectures are described and performance measurements for the single and multiprocessor implementation are summarized. The paper demonstrates that optimization of memory access on a cache-based computer architecture controls the performance of the computational algorithm. A hybrid MPI/OpenMP approach is proposed for clusters of shared memory machines to further enhance the parallel performance. The method is applied to develop a new LES/DNS code, named LESTool. A preliminary DNS calculation of a fully developed channel flow at a Reynolds number of 180, Re(sub tau) = 180, has shown good agreement with existing data.
Multi-GPU and multi-CPU accelerated FDTD scheme for vibroacoustic applications

NASA Astrophysics Data System (ADS)

Francés, J.; Otero, B.; Bleda, S.; Gallego, S.; Neipp, C.; Márquez, A.; Beléndez, A.

2015-06-01

The Finite-Difference Time-Domain (FDTD) method is applied to the analysis of vibroacoustic problems and to study the propagation of longitudinal and transversal waves in a stratified media. The potential of the scheme and the relevance of each acceleration strategy for massively computations in FDTD are demonstrated in this work. In this paper, we propose two new specific implementations of the bi-dimensional scheme of the FDTD method using multi-CPU and multi-GPU, respectively. In the first implementation, an open source message passing interface (OMPI) has been included in order to massively exploit the resources of a biprocessor station with two Intel Xeon processors. Moreover, regarding CPU code version, the streaming SIMD extensions (SSE) and also the advanced vectorial extensions (AVX) have been included with shared memory approaches that take advantage of the multi-core platforms. On the other hand, the second implementation called the multi-GPU code version is based on Peer-to-Peer communications available in CUDA on two GPUs (NVIDIA GTX 670). Subsequently, this paper presents an accurate analysis of the influence of the different code versions including shared memory approaches, vector instructions and multi-processors (both CPU and GPU) and compares them in order to delimit the degree of improvement of using distributed solutions based on multi-CPU and multi-GPU. The performance of both approaches was analysed and it has been demonstrated that the addition of shared memory schemes to CPU computing improves substantially the performance of vector instructions enlarging the simulation sizes that use efficiently the cache memory of CPUs. In this case GPU computing is slightly twice times faster than the fine tuned CPU version in both cases one and two nodes. However, for massively computations explicit vector instructions do not worth it since the memory bandwidth is the limiting factor and the performance tends to be the same than the sequential version with auto-vectorisation and also shared memory approach. In this scenario GPU computing is the best option since it provides a homogeneous behaviour. More specifically, the speedup of GPU computing achieves an upper limit of 12 for both one and two GPUs, whereas the performance reaches peak values of 80 GFlops and 146 GFlops for the performance for one GPU and two GPUs respectively. Finally, the method is applied to an earth crust profile in order to demonstrate the potential of our approach and the necessity of applying acceleration strategies in these type of applications.
The Locus Preservation Hypothesis: Shared Linguistic Profiles across Developmental Disorders and the Resilient Part of the Human Language Faculty

PubMed Central

Leivada, Evelina; Kambanaros, Maria; Grohmann, Kleanthes K.

2017-01-01

Grammatical markers are not uniformly impaired across speakers of different languages, even when speakers share a diagnosis and the marker in question is grammaticalized in a similar way in these languages. The aim of this work is to demarcate, from a cross-linguistic perspective, the linguistic phenotype of three genetically heterogeneous developmental disorders: specific language impairment, Down syndrome, and autism spectrum disorder. After a systematic review of linguistic profiles targeting mainly English-, Greek-, Catalan-, and Spanish-speaking populations with developmental disorders (n = 880), shared loci of impairment are identified and certain domains of grammar are shown to be more vulnerable than others. The distribution of impaired loci is captured by the Locus Preservation Hypothesis which suggests that specific parts of the language faculty are immune to impairment across developmental disorders. Through the Locus Preservation Hypothesis, a classical chicken and egg question can be addressed: Do poor conceptual resources and memory limitations result in an atypical grammar or does a grammatical breakdown lead to conceptual and memory limitations? Overall, certain morphological markers reveal themselves as highly susceptible to impairment, while syntactic operations are preserved, granting support to the first scenario. The origin of resilient syntax is explained from a phylogenetic perspective in connection to the “syntax-before-phonology” hypothesis. PMID:29081756
Shared processing in multiple object tracking and visual working memory in the absence of response order and task order confounds

PubMed Central

Howe, Piers D. L.

2017-01-01

To understand how the visual system represents multiple moving objects and how those representations contribute to tracking, it is essential that we understand how the processes of attention and working memory interact. In the work described here we present an investigation of that interaction via a series of tracking and working memory dual-task experiments. Previously, it has been argued that tracking is resistant to disruption by a concurrent working memory task and that any apparent disruption is in fact due to observers making a response to the working memory task, rather than due to competition for shared resources. Contrary to this, in our experiments we find that when task order and response order confounds are avoided, all participants show a similar decrease in both tracking and working memory performance. However, if task and response order confounds are not adequately controlled for we find substantial individual differences, which could explain the previous conflicting reports on this topic. Our results provide clear evidence that tracking and working memory tasks share processing resources. PMID:28410383
Shared processing in multiple object tracking and visual working memory in the absence of response order and task order confounds.

PubMed

Lapierre, Mark D; Cropper, Simon J; Howe, Piers D L

2017-01-01

To understand how the visual system represents multiple moving objects and how those representations contribute to tracking, it is essential that we understand how the processes of attention and working memory interact. In the work described here we present an investigation of that interaction via a series of tracking and working memory dual-task experiments. Previously, it has been argued that tracking is resistant to disruption by a concurrent working memory task and that any apparent disruption is in fact due to observers making a response to the working memory task, rather than due to competition for shared resources. Contrary to this, in our experiments we find that when task order and response order confounds are avoided, all participants show a similar decrease in both tracking and working memory performance. However, if task and response order confounds are not adequately controlled for we find substantial individual differences, which could explain the previous conflicting reports on this topic. Our results provide clear evidence that tracking and working memory tasks share processing resources.
Visual and spatial working memory are not that dissociated after all: a time-based resource-sharing account.

PubMed

Vergauwe, Evie; Barrouillet, Pierre; Camos, Valérie

2009-07-01

Examinations of interference between visual and spatial materials in working memory have suggested domain- and process-based fractionations of visuo-spatial working memory. The present study examined the role of central time-based resource sharing in visuo-spatial working memory and assessed its role in obtained interference patterns. Visual and spatial storage were combined with both visual and spatial on-line processing components in computer-paced working memory span tasks (Experiment 1) and in a selective interference paradigm (Experiment 2). The cognitive load of the processing components was manipulated to investigate its impact on concurrent maintenance for both within-domain and between-domain combinations of processing and storage components. In contrast to both domain- and process-based fractionations of visuo-spatial working memory, the results revealed that recall performance was determined by the cognitive load induced by the processing of items, rather than by the domain to which those items pertained. These findings are interpreted as evidence for a time-based resource-sharing mechanism in visuo-spatial working memory.
[Neuroscience and collective memory: memory schemas linking brain, societies and cultures].

PubMed

Legrand, Nicolas; Gagnepain, Pierre; Peschanski, Denis; Eustache, Francis

2015-01-01

During the last two decades, the effect of intersubjective relationships on cognition has been an emerging topic in cognitive neurosciences leading through a so-called "social turn" to the formation of new domains integrating society and cultures to this research area. Such inquiry has been recently extended to collective memory studies. Collective memory refers to shared representations that are constitutive of the identity of a group and distributed among all its members connected by a common history. After briefly describing those evolutions in the study of human brain and behaviors, we review recent researches that have brought together cognitive psychology, neuroscience and social sciences into collective memory studies. Using the reemerging concept of memory schema, we propose a theoretical framework allowing to account for collective memories formation with a specific focus on the encoding process of historical events. We suggest that (1) if the concept of schema has been mainly used to describe rather passive framework of knowledge, such structure may also be implied in more active fashions in the understanding of significant collective events. And, (2) if some schema researches have restricted themselves to the individual level of inquiry, we describe a strong coherence between memory and cultural frameworks. Integrating the neural basis and properties of memory schema to collective memory studies may pave the way toward a better understanding of the reciprocal interaction between individual memories and cultural resources such as media or education. © Société de Biologie, 2016.
Why are you telling me that? A conceptual model of the social function of autobiographical memory.

PubMed

Alea, Nicole; Bluck, Susan

2003-03-01

In an effort to stimulate and guide empirical work within a functional framework, this paper provides a conceptual model of the social functions of autobiographical memory (AM) across the lifespan. The model delineates the processes and variables involved when AMs are shared to serve social functions. Components of the model include: lifespan contextual influences, the qualitative characteristics of memory (emotionality and level of detail recalled), the speaker's characteristics (age, gender, and personality), the familiarity and similarity of the listener to the speaker, the level of responsiveness during the memory-sharing process, and the nature of the social relationship in which the memory sharing occurs (valence and length of the relationship). These components are shown to influence the type of social function served and/or, the extent to which social functions are served. Directions for future empirical work to substantiate the model and hypotheses derived from the model are provided.
Linking agent-based models and stochastic models of financial markets

PubMed Central

Feng, Ling; Li, Baowen; Podobnik, Boris; Preis, Tobias; Stanley, H. Eugene

2012-01-01

It is well-known that financial asset returns exhibit fat-tailed distributions and long-term memory. These empirical features are the main objectives of modeling efforts using (i) stochastic processes to quantitatively reproduce these features and (ii) agent-based simulations to understand the underlying microscopic interactions. After reviewing selected empirical and theoretical evidence documenting the behavior of traders, we construct an agent-based model to quantitatively demonstrate that “fat” tails in return distributions arise when traders share similar technical trading strategies and decisions. Extending our behavioral model to a stochastic model, we derive and explain a set of quantitative scaling relations of long-term memory from the empirical behavior of individual market participants. Our analysis provides a behavioral interpretation of the long-term memory of absolute and squared price returns: They are directly linked to the way investors evaluate their investments by applying technical strategies at different investment horizons, and this quantitative relationship is in agreement with empirical findings. Our approach provides a possible behavioral explanation for stochastic models for financial systems in general and provides a method to parameterize such models from market data rather than from statistical fitting. PMID:22586086
Linking agent-based models and stochastic models of financial markets.

PubMed

Feng, Ling; Li, Baowen; Podobnik, Boris; Preis, Tobias; Stanley, H Eugene

2012-05-29

It is well-known that financial asset returns exhibit fat-tailed distributions and long-term memory. These empirical features are the main objectives of modeling efforts using (i) stochastic processes to quantitatively reproduce these features and (ii) agent-based simulations to understand the underlying microscopic interactions. After reviewing selected empirical and theoretical evidence documenting the behavior of traders, we construct an agent-based model to quantitatively demonstrate that "fat" tails in return distributions arise when traders share similar technical trading strategies and decisions. Extending our behavioral model to a stochastic model, we derive and explain a set of quantitative scaling relations of long-term memory from the empirical behavior of individual market participants. Our analysis provides a behavioral interpretation of the long-term memory of absolute and squared price returns: They are directly linked to the way investors evaluate their investments by applying technical strategies at different investment horizons, and this quantitative relationship is in agreement with empirical findings. Our approach provides a possible behavioral explanation for stochastic models for financial systems in general and provides a method to parameterize such models from market data rather than from statistical fitting.
Destination memory impairment in older people.

PubMed

Gopie, Nigel; Craik, Fergus I M; Hasher, Lynn

2010-12-01

Older adults are assumed to have poor destination memory-knowing to whom they tell particular information-and anecdotes about them repeating stories to the same people are cited as informal evidence for this claim. Experiment 1 assessed young and older adults' destination memory by having participants tell facts (e.g., "A dime has 118 ridges around its edge") to pictures of famous people (e.g., Oprah Winfrey). Surprise recognition memory tests, which also assessed confidence, revealed that older adults, compared to young adults, were disproportionately impaired on destination memory relative to spared memory for the individual components (i.e., facts, faces) of the episode. Older adults also were more confident that they had not told a fact to a particular person when they actually had (i.e., a miss); this presumably causes them to repeat information more often than young adults. When the direction of information transfer was reversed in Experiment 2, such that the famous people shared information with the participants (i.e., a source memory experiment), age-related memory differences disappeared. In contrast to the destination memory experiment, older adults in the source memory experiment were more confident than young adults that someone had shared a fact with them when a different person actually had shared the fact (i.e., a false alarm). Overall, accuracy and confidence jointly influence age-related changes to destination memory, a fundamental component of successful communication. (c) 2010 APA, all rights reserved).
Thread Migration in the Presence of Pointers

NASA Technical Reports Server (NTRS)

Cronk, David; Haines, Matthew; Mehrotra, Piyush

1996-01-01

Dynamic migration of lightweight threads supports both data locality and load balancing. However, migrating threads that contain pointers referencing data in both the stack and heap remains an open problem. In this paper we describe a technique by which threads with pointers referencing both stack and non-shared heap data can be migrated such that the pointers remain valid after migration. As a result, threads containing pointers can now be migrated between processors in a homogeneous distributed memory environment.
Experiences using OpenMP based on Computer Directed Software DSM on a PC Cluster

NASA Technical Reports Server (NTRS)

Hess, Matthias; Jost, Gabriele; Mueller, Matthias; Ruehle, Roland

2003-01-01

In this work we report on our experiences running OpenMP programs on a commodity cluster of PCs running a software distributed shared memory (DSM) system. We describe our test environment and report on the performance of a subset of the NAS Parallel Benchmarks that have been automaticaly parallelized for OpenMP. We compare the performance of the OpenMP implementations with that of their message passing counterparts and discuss performance differences.
[Series: Medical Applications of the PHITS Code (2): Acceleration by Parallel Computing].

PubMed

Furuta, Takuya; Sato, Tatsuhiko

2015-01-01

Time-consuming Monte Carlo dose calculation becomes feasible owing to the development of computer technology. However, the recent development is due to emergence of the multi-core high performance computers. Therefore, parallel computing becomes a key to achieve good performance of software programs. A Monte Carlo simulation code PHITS contains two parallel computing functions, the distributed-memory parallelization using protocols of message passing interface (MPI) and the shared-memory parallelization using open multi-processing (OpenMP) directives. Users can choose the two functions according to their needs. This paper gives the explanation of the two functions with their advantages and disadvantages. Some test applications are also provided to show their performance using a typical multi-core high performance workstation.
PCI bus content-addressable-memory (CAM) implementation on FPGA for pattern recognition/image retrieval in a distributed environment

NASA Astrophysics Data System (ADS)

Megherbi, Dalila B.; Yan, Yin; Tanmay, Parikh; Khoury, Jed; Woods, C. L.

2004-11-01

Recently surveillance and Automatic Target Recognition (ATR) applications are increasing as the cost of computing power needed to process the massive amount of information continues to fall. This computing power has been made possible partly by the latest advances in FPGAs and SOPCs. In particular, to design and implement state-of-the-Art electro-optical imaging systems to provide advanced surveillance capabilities, there is a need to integrate several technologies (e.g. telescope, precise optics, cameras, image/compute vision algorithms, which can be geographically distributed or sharing distributed resources) into a programmable system and DSP systems. Additionally, pattern recognition techniques and fast information retrieval, are often important components of intelligent systems. The aim of this work is using embedded FPGA as a fast, configurable and synthesizable search engine in fast image pattern recognition/retrieval in a distributed hardware/software co-design environment. In particular, we propose and show a low cost Content Addressable Memory (CAM)-based distributed embedded FPGA hardware architecture solution with real time recognition capabilities and computing for pattern look-up, pattern recognition, and image retrieval. We show how the distributed CAM-based architecture offers a performance advantage of an order-of-magnitude over RAM-based architecture (Random Access Memory) search for implementing high speed pattern recognition for image retrieval. The methods of designing, implementing, and analyzing the proposed CAM based embedded architecture are described here. Other SOPC solutions/design issues are covered. Finally, experimental results, hardware verification, and performance evaluations using both the Xilinx Virtex-II and the Altera Apex20k are provided to show the potential and power of the proposed method for low cost reconfigurable fast image pattern recognition/retrieval at the hardware/software co-design level.
HYDRA : High-speed simulation architecture for precision spacecraft formation simulation

NASA Technical Reports Server (NTRS)

Martin, Bryan J.; Sohl, Garett.

2003-01-01

e Hierarchical Distributed Reconfigurable Architecture- is a scalable simulation architecture that provides flexibility and ease-of-use which take advantage of modern computation and communication hardware. It also provides the ability to implement distributed - or workstation - based simulations and high-fidelity real-time simulation from a common core. Originally designed to serve as a research platform for examining fundamental challenges in formation flying simulation for future space missions, it is also finding use in other missions and applications, all of which can take advantage of the underlying Object-Oriented structure to easily produce distributed simulations. Hydra automates the process of connecting disparate simulation components (Hydra Clients) through a client server architecture that uses high-level descriptions of data associated with each client to find and forge desirable connections (Hydra Services) at run time. Services communicate through the use of Connectors, which abstract messaging to provide single-interface access to any desired communication protocol, such as from shared-memory message passing to TCP/IP to ACE and COBRA. Hydra shares many features with the HLA, although providing more flexibility in connectivity services and behavior overriding.

Interference due to shared features between action plans is influenced by working memory span.

PubMed

Fournier, Lisa R; Behmer, Lawrence P; Stubblefield, Alexandra M

2014-12-01

In this study, we examined the interactions between the action plans that we hold in memory and the actions that we carry out, asking whether the interference due to shared features between action plans is due to selection demands imposed on working memory. Individuals with low and high working memory spans learned arbitrary motor actions in response to two different visual events (A and B), presented in a serial order. They planned a response to the first event (A) and while maintaining this action plan in memory they then executed a speeded response to the second event (B). Afterward, they executed the action plan for the first event (A) maintained in memory. Speeded responses to the second event (B) were delayed when it shared an action feature (feature overlap) with the first event (A), relative to when it did not (no feature overlap). The size of the feature-overlap delay was greater for low-span than for high-span participants. This indicates that interference due to overlapping action plans is greater when fewer working memory resources are available, suggesting that this interference is due to selection demands imposed on working memory. Thus, working memory plays an important role in managing current and upcoming action plans, at least for newly learned tasks. Also, managing multiple action plans is compromised in individuals who have low versus high working memory spans.
Destination Memory Impairment in Older People

PubMed Central

Gopie, Nigel; Craik, Fergus I. M.; Hasher, Lynn

2012-01-01

Older adults are assumed to have poor destination memory— knowing to whom they tell particular information—and anecdotes about them repeating stories to the same people are cited as informal evidence for this claim. Experiment 1 assessed young and older adults’ destination memory by having participants tell facts (e.g., “A dime has 118 ridges around its edge”) to pictures of famous people (e.g., Oprah Winfrey). Surprise recognition memory tests, which also assessed confidence, revealed that older adults, compared to young adults, were disproportionately impaired on destination memory relative to spared memory for the individual components (i.e., facts, faces) of the episode. Older adults also were more confident that they had not told a fact to a particular person when they actually had (i.e., a miss); this presumably causes them to repeat information more often than young adults. When the direction of information transfer was reversed in Experiment 2, such that the famous people shared information with the participants (i.e., a source memory experiment), age-related memory differences disappeared. In contrast to the destination memory experiment, older adults in the source memory experiment were more confident than young adults that someone had shared a fact with them when a different person actually had shared the fact (i.e., a false alarm). Overall, accuracy and confidence jointly influence age-related changes to destination memory, a fundamental component of successful communication. PMID:20718537
Flexible language constructs for large parallel programs

NASA Technical Reports Server (NTRS)

Rosing, Matthew; Schnabel, Robert

1993-01-01

The goal of the research described is to develop flexible language constructs for writing large data parallel numerical programs for distributed memory (MIMD) multiprocessors. Previously, several models have been developed to support synchronization and communication. Models for global synchronization include SIMD (Single Instruction Multiple Data), SPMD (Single Program Multiple Data), and sequential programs annotated with data distribution statements. The two primary models for communication include implicit communication based on shared memory and explicit communication based on messages. None of these models by themselves seem sufficient to permit the natural and efficient expression of the variety of algorithms that occur in large scientific computations. An overview of a new language that combines many of these programming models in a clean manner is given. This is done in a modular fashion such that different models can be combined to support large programs. Within a module, the selection of a model depends on the algorithm and its efficiency requirements. An overview of the language and discussion of some of the critical implementation details is given.
Low latency memory access and synchronization

DOE Office of Scientific and Technical Information (OSTI.GOV)

Blumrich, Matthias A.; Chen, Dong; Coteus, Paul W.

A low latency memory system access is provided in association with a weakly-ordered multiprocessor system. Each processor in the multiprocessor shares resources, and each shared resource has an associated lock within a locking device that provides support for synchronization between the multiple processors in the multiprocessor and the orderly sharing of the resources. A processor only has permission to access a resource when it owns the lock associated with that resource, and an attempt by a processor to own a lock requires only a single load operation, rather than a traditional atomic load followed by store, such that the processormore » only performs a read operation and the hardware locking device performs a subsequent write operation rather than the processor. A simple prefetching for non-contiguous data structures is also disclosed. A memory line is redefined so that in addition to the normal physical memory data, every line includes a pointer that is large enough to point to any other line in the memory, wherein the pointers to determine which memory line to prefetch rather than some other predictive algorithm. This enables hardware to effectively prefetch memory access patterns that are non-contiguous, but repetitive.« less
Low latency memory access and synchronization

DOE Office of Scientific and Technical Information (OSTI.GOV)

Blumrich, Matthias A.; Chen, Dong; Coteus, Paul W.

A low latency memory system access is provided in association with a weakly-ordered multiprocessor system. Bach processor in the multiprocessor shares resources, and each shared resource has an associated lock within a locking device that provides support for synchronization between the multiple processors in the multiprocessor and the orderly sharing of the resources. A processor only has permission to access a resource when it owns the lock associated with that resource, and an attempt by a processor to own a lock requires only a single load operation, rather than a traditional atomic load followed by store, such that the processormore » only performs a read operation and the hardware locking device performs a subsequent write operation rather than the processor. A simple prefetching for non-contiguous data structures is also disclosed. A memory line is redefined so that in addition to the normal physical memory data, every line includes a pointer that is large enough to point to any other line in the memory, wherein the pointers to determine which memory line to prefetch rather than some other predictive algorithm. This enables hardware to effectively prefetch memory access patterns that are non-contiguous, but repetitive.« less
Shared reality in intergroup communication: Increasing the epistemic authority of an out-group audience.

PubMed

Echterhoff, Gerald; Kopietz, René; Higgins, E Tory

2017-06-01

Communicators typically tune messages to their audience's attitude. Such audience tuning biases communicators' memory for the topic toward the audience's attitude to the extent that they create a shared reality with the audience. To investigate shared reality in intergroup communication, we first established that a reduced memory bias after tuning messages to an out-group (vs. in-group) audience is a subtle index of communicators' denial of shared reality to that out-group audience (Experiments 1a and 1b). We then examined whether the audience-tuning memory bias might emerge when the out-group audience's epistemic authority is enhanced, either by increasing epistemic expertise concerning the communication topic or by creating epistemic consensus among members of a multiperson out-group audience. In Experiment 2, when Germans communicated to a Turkish audience with an attitude about a Turkish (vs. German) target, the audience-tuning memory bias appeared. In Experiment 3, when the audience of German communicators consisted of 3 Turks who all held the same attitude toward the target, the memory bias again appeared. The association between message valence and memory valence was consistently higher when the audience's epistemic authority was high (vs. low). An integrative analysis across all studies also suggested that the memory bias increases with increasing strength of epistemic inputs (epistemic expertise, epistemic consensus, and audience-tuned message production). The findings suggest novel ways of overcoming intergroup biases in intergroup relations. (PsycINFO Database Record (c) 2017 APA, all rights reserved).
Integrating Cache Performance Modeling and Tuning Support in Parallelization Tools

NASA Technical Reports Server (NTRS)

Waheed, Abdul; Yan, Jerry; Saini, Subhash (Technical Monitor)

1998-01-01

With the resurgence of distributed shared memory (DSM) systems based on cache-coherent Non Uniform Memory Access (ccNUMA) architectures and increasing disparity between memory and processors speeds, data locality overheads are becoming the greatest bottlenecks in the way of realizing potential high performance of these systems. While parallelization tools and compilers facilitate the users in porting their sequential applications to a DSM system, a lot of time and effort is needed to tune the memory performance of these applications to achieve reasonable speedup. In this paper, we show that integrating cache performance modeling and tuning support within a parallelization environment can alleviate this problem. The Cache Performance Modeling and Prediction Tool (CPMP), employs trace-driven simulation techniques without the overhead of generating and managing detailed address traces. CPMP predicts the cache performance impact of source code level "what-if" modifications in a program to assist a user in the tuning process. CPMP is built on top of a customized version of the Computer Aided Parallelization Tools (CAPTools) environment. Finally, we demonstrate how CPMP can be applied to tune a real Computational Fluid Dynamics (CFD) application.
The Developmental Influence of Primary Memory Capacity on Working Memory and Academic Achievement

PubMed Central

2015-01-01

In this study, we investigate the development of primary memory capacity among children. Children between the ages of 5 and 8 completed 3 novel tasks (split span, interleaved lists, and a modified free-recall task) that measured primary memory by estimating the number of items in the focus of attention that could be spontaneously recalled in serial order. These tasks were calibrated against traditional measures of simple and complex span. Clear age-related changes in these primary memory estimates were observed. There were marked individual differences in primary memory capacity, but each novel measure was predictive of simple span performance. Among older children, each measure shared variance with reading and mathematics performance, whereas for younger children, the interleaved lists task was the strongest single predictor of academic ability. We argue that these novel tasks have considerable potential for the measurement of primary memory capacity and provide new, complementary ways of measuring the transient memory processes that predict academic performance. The interleaved lists task also shared features with interference control tasks, and our findings suggest that young children have a particular difficulty in resisting distraction and that variance in the ability to resist distraction is also shared with measures of educational attainment. PMID:26075630
The developmental influence of primary memory capacity on working memory and academic achievement.

PubMed

Hall, Debbora; Jarrold, Christopher; Towse, John N; Zarandi, Amy L

2015-08-01

In this study, we investigate the development of primary memory capacity among children. Children between the ages of 5 and 8 completed 3 novel tasks (split span, interleaved lists, and a modified free-recall task) that measured primary memory by estimating the number of items in the focus of attention that could be spontaneously recalled in serial order. These tasks were calibrated against traditional measures of simple and complex span. Clear age-related changes in these primary memory estimates were observed. There were marked individual differences in primary memory capacity, but each novel measure was predictive of simple span performance. Among older children, each measure shared variance with reading and mathematics performance, whereas for younger children, the interleaved lists task was the strongest single predictor of academic ability. We argue that these novel tasks have considerable potential for the measurement of primary memory capacity and provide new, complementary ways of measuring the transient memory processes that predict academic performance. The interleaved lists task also shared features with interference control tasks, and our findings suggest that young children have a particular difficulty in resisting distraction and that variance in the ability to resist distraction is also shared with measures of educational attainment. (c) 2015 APA, all rights reserved).
Conditional load and store in a shared memory

DOEpatents

Blumrich, Matthias A; Ohmacht, Martin

2015-02-03

A method, system and computer program product for implementing load-reserve and store-conditional instructions in a multi-processor computing system. The computing system includes a multitude of processor units and a shared memory cache, and each of the processor units has access to the memory cache. In one embodiment, the method comprises providing the memory cache with a series of reservation registers, and storing in these registers addresses reserved in the memory cache for the processor units as a result of issuing load-reserve requests. In this embodiment, when one of the processor units makes a request to store data in the memory cache using a store-conditional request, the reservation registers are checked to determine if an address in the memory cache is reserved for that processor unit. If an address in the memory cache is reserved for that processor, the data are stored at this address.
Performance Analysis of Multilevel Parallel Applications on Shared Memory Architectures

NASA Technical Reports Server (NTRS)

Biegel, Bryan A. (Technical Monitor); Jost, G.; Jin, H.; Labarta J.; Gimenez, J.; Caubet, J.

2003-01-01

Parallel programming paradigms include process level parallelism, thread level parallelization, and multilevel parallelism. This viewgraph presentation describes a detailed performance analysis of these paradigms for Shared Memory Architecture (SMA). This analysis uses the Paraver Performance Analysis System. The presentation includes diagrams of a flow of useful computations.
Measuring Transactiving Memory Systems Using Network Analysis

ERIC Educational Resources Information Center

King, Kylie Goodell

2017-01-01

Transactive memory systems (TMSs) describe the structures and processes that teams use to share information, work together, and accomplish shared goals. First introduced over three decades ago, TMSs have been measured in a variety of ways. This dissertation proposes the use of network analysis in measuring TMS. This is accomplished by describing…
Operator Influence of Unexploded Ordnance Sensor Technologies

DTIC Science & Technology

2007-03-01

chart display ActiveX control Mscomct2.dll – date/time display ActiveX control Pnpscr.dll – Systran SCRAMNet replicated shared memory device...response value database rgm_p2.dll – Phase 2 shared memory API and implementation Commercial components StripM.ocx – strip chart display ActiveX
Runtime support for parallelizing data mining algorithms

NASA Astrophysics Data System (ADS)

Jin, Ruoming; Agrawal, Gagan

2002-03-01

With recent technological advances, shared memory parallel machines have become more scalable, and offer large main memories and high bus bandwidths. They are emerging as good platforms for data warehousing and data mining. In this paper, we focus on shared memory parallelization of data mining algorithms. We have developed a series of techniques for parallelization of data mining algorithms, including full replication, full locking, fixed locking, optimized full locking, and cache-sensitive locking. Unlike previous work on shared memory parallelization of specific data mining algorithms, all of our techniques apply to a large number of common data mining algorithms. In addition, we propose a reduction-object based interface for specifying a data mining algorithm. We show how our runtime system can apply any of the technique we have developed starting from a common specification of the algorithm.
An OpenACC-Based Unified Programming Model for Multi-accelerator Systems

DOE Office of Scientific and Technical Information (OSTI.GOV)

Kim, Jungwon; Lee, Seyong; Vetter, Jeffrey S

2015-01-01

This paper proposes a novel SPMD programming model of OpenACC. Our model integrates the different granularities of parallelism from vector-level parallelism to node-level parallelism into a single, unified model based on OpenACC. It allows programmers to write programs for multiple accelerators using a uniform programming model whether they are in shared or distributed memory systems. We implement a prototype of our model and evaluate its performance with a GPU-based supercomputer using three benchmark applications.
3D Navier-Stokes Flow Analysis for Shared and Distributed Memory MIMD Computers

DTIC Science & Technology

1992-09-15

arithmetical averaged density or Stefan -Boltzmann constant (= 5.67032 x 10-8 ) Oai+1/2 intermediate term for Harten-Yee fluxes - k, O’ constants for k...system of algebraic equations. These equations I are solved using point Gauss- Seidel relaxation. This relaxation scheme is modified to be a lower-upper...interaction of the radiation with the gas. The radiative heat flux per unit area is then I = -(T [EwT - awTdb] (19) Here a is the Stefan Boltzmann
Experiences Using OpenMP Based on Compiler Directed Software DSM on a PC Cluster

NASA Technical Reports Server (NTRS)

Hess, Matthias; Jost, Gabriele; Mueller, Matthias; Ruehle, Roland; Biegel, Bryan (Technical Monitor)

2002-01-01

In this work we report on our experiences running OpenMP (message passing) programs on a commodity cluster of PCs (personal computers) running a software distributed shared memory (DSM) system. We describe our test environment and report on the performance of a subset of the NAS (NASA Advanced Supercomputing) Parallel Benchmarks that have been automatically parallelized for OpenMP. We compare the performance of the OpenMP implementations with that of their message passing counterparts and discuss performance differences.
A portable approach for PIC on emerging architectures

NASA Astrophysics Data System (ADS)

Decyk, Viktor

2016-03-01

A portable approach for designing Particle-in-Cell (PIC) algorithms on emerging exascale computers, is based on the recognition that 3 distinct programming paradigms are needed. They are: low level vector (SIMD) processing, middle level shared memory parallel programing, and high level distributed memory programming. In addition, there is a memory hierarchy associated with each level. Such algorithms can be initially developed using vectorizing compilers, OpenMP, and MPI. This is the approach recommended by Intel for the Phi processor. These algorithms can then be translated and possibly specialized to other programming models and languages, as needed. For example, the vector processing and shared memory programming might be done with CUDA instead of vectorizing compilers and OpenMP, but generally the algorithm itself is not greatly changed. The UCLA PICKSC web site at http://www.idre.ucla.edu/ contains example open source skeleton codes (mini-apps) illustrating each of these three programming models, individually and in combination. Fortran2003 now supports abstract data types, and design patterns can be used to support a variety of implementations within the same code base. Fortran2003 also supports interoperability with C so that implementations in C languages are also easy to use. Finally, main codes can be translated into dynamic environments such as Python, while still taking advantage of high performing compiled languages. Parallel languages are still evolving with interesting developments in co-Array Fortran, UPC, and OpenACC, among others, and these can also be supported within the same software architecture. Work supported by NSF and DOE Grants.
Concurrent working memory load can facilitate selective attention: evidence for specialized load.

PubMed

Park, Soojin; Kim, Min-Shik; Chun, Marvin M

2007-10-01

Load theory predicts that concurrent working memory load impairs selective attention and increases distractor interference (N. Lavie, A. Hirst, J. W. de Fockert, & E. Viding). Here, the authors present new evidence that the type of concurrent working memory load determines whether load impairs selective attention or not. Working memory load was paired with a same/different matching task that required focusing on targets while ignoring distractors. When working memory items shared the same limited-capacity processing mechanisms with targets in the matching task, distractor interference increased. However, when working memory items shared processing with distractors in the matching task, distractor interference decreased, facilitating target selection. A specialized load account is proposed to describe the dissociable effects of working memory load on selective processing depending on whether the load overlaps with targets or with distractors. (c) 2007 APA
Transactive memory systems scale for couples: development and validation

PubMed Central

Hewitt, Lauren Y.; Roberts, Lynne D.

2015-01-01

People in romantic relationships can develop shared memory systems by pooling their cognitive resources, allowing each person access to more information but with less cognitive effort. Research examining such memory systems in romantic couples largely focuses on remembering word lists or performing lab-based tasks, but these types of activities do not capture the processes underlying couples’ transactive memory systems, and may not be representative of the ways in which romantic couples use their shared memory systems in everyday life. We adapted an existing measure of transactive memory systems for use with romantic couples (TMSS-C), and conducted an initial validation study. In total, 397 participants who each identified as being a member of a romantic relationship of at least 3 months duration completed the study. The data provided a good fit to the anticipated three-factor structure of the components of couples’ transactive memory systems (specialization, credibility and coordination), and there was reasonable evidence of both convergent and divergent validity, as well as strong evidence of test–retest reliability across a 2-week period. The TMSS-C provides a valuable tool that can quickly and easily capture the underlying components of romantic couples’ transactive memory systems. It has potential to help us better understand this intriguing feature of romantic relationships, and how shared memory systems might be associated with other important features of romantic relationships. PMID:25999873

Fast analysis of molecular dynamics trajectories with graphics processing units-Radial distribution function histogramming

DOE Office of Scientific and Technical Information (OSTI.GOV)

Levine, Benjamin G., E-mail: ben.levine@temple.ed; Stone, John E., E-mail: johns@ks.uiuc.ed; Kohlmeyer, Axel, E-mail: akohlmey@temple.ed

2011-05-01

The calculation of radial distribution functions (RDFs) from molecular dynamics trajectory data is a common and computationally expensive analysis task. The rate limiting step in the calculation of the RDF is building a histogram of the distance between atom pairs in each trajectory frame. Here we present an implementation of this histogramming scheme for multiple graphics processing units (GPUs). The algorithm features a tiling scheme to maximize the reuse of data at the fastest levels of the GPU's memory hierarchy and dynamic load balancing to allow high performance on heterogeneous configurations of GPUs. Several versions of the RDF algorithm aremore » presented, utilizing the specific hardware features found on different generations of GPUs. We take advantage of larger shared memory and atomic memory operations available on state-of-the-art GPUs to accelerate the code significantly. The use of atomic memory operations allows the fast, limited-capacity on-chip memory to be used much more efficiently, resulting in a fivefold increase in performance compared to the version of the algorithm without atomic operations. The ultimate version of the algorithm running in parallel on four NVIDIA GeForce GTX 480 (Fermi) GPUs was found to be 92 times faster than a multithreaded implementation running on an Intel Xeon 5550 CPU. On this multi-GPU hardware, the RDF between two selections of 1,000,000 atoms each can be calculated in 26.9 s per frame. The multi-GPU RDF algorithms described here are implemented in VMD, a widely used and freely available software package for molecular dynamics visualization and analysis.« less
Fast Analysis of Molecular Dynamics Trajectories with Graphics Processing Units—Radial Distribution Function Histogramming

PubMed Central

Stone, John E.; Kohlmeyer, Axel

2011-01-01

The calculation of radial distribution functions (RDFs) from molecular dynamics trajectory data is a common and computationally expensive analysis task. The rate limiting step in the calculation of the RDF is building a histogram of the distance between atom pairs in each trajectory frame. Here we present an implementation of this histogramming scheme for multiple graphics processing units (GPUs). The algorithm features a tiling scheme to maximize the reuse of data at the fastest levels of the GPU’s memory hierarchy and dynamic load balancing to allow high performance on heterogeneous configurations of GPUs. Several versions of the RDF algorithm are presented, utilizing the specific hardware features found on different generations of GPUs. We take advantage of larger shared memory and atomic memory operations available on state-of-the-art GPUs to accelerate the code significantly. The use of atomic memory operations allows the fast, limited-capacity on-chip memory to be used much more efficiently, resulting in a fivefold increase in performance compared to the version of the algorithm without atomic operations. The ultimate version of the algorithm running in parallel on four NVIDIA GeForce GTX 480 (Fermi) GPUs was found to be 92 times faster than a multithreaded implementation running on an Intel Xeon 5550 CPU. On this multi-GPU hardware, the RDF between two selections of 1,000,000 atoms each can be calculated in 26.9 seconds per frame. The multi-GPU RDF algorithms described here are implemented in VMD, a widely used and freely available software package for molecular dynamics visualization and analysis. PMID:21547007
DMA shared byte counters in a parallel computer

DOEpatents

Chen, Dong; Gara, Alan G.; Heidelberger, Philip; Vranas, Pavlos

2010-04-06

A parallel computer system is constructed as a network of interconnected compute nodes. Each of the compute nodes includes at least one processor, a memory and a DMA engine. The DMA engine includes a processor interface for interfacing with the at least one processor, DMA logic, a memory interface for interfacing with the memory, a DMA network interface for interfacing with the network, injection and reception byte counters, injection and reception FIFO metadata, and status registers and control registers. The injection FIFOs maintain memory locations of the injection FIFO metadata memory locations including its current head and tail, and the reception FIFOs maintain the reception FIFO metadata memory locations including its current head and tail. The injection byte counters and reception byte counters may be shared between messages.
Division of attention as a function of the number of steps, visual shifts, and memory load

NASA Technical Reports Server (NTRS)

Chechile, R. A.; Butler, K.; Gutowski, W.; Palmer, E. A.

1986-01-01

The effects on divided attention of visual shifts and long-term memory retrieval during a monitoring task are considered. A concurrent vigilance task was standardized under all experimental conditions. The results show that subjects can perform nearly perfectly on all of the time-shared tasks if long-term memory retrieval is not required for monitoring. With the requirement of memory retrieval, however, there was a large decrease in accuracy for all of the time-shared activities. It was concluded that the attentional demand of longterm memory retrieval is appreciable (even for a well-learned motor sequence), and thus memory retrieval results in a sizable reduction in the capability of subjects to divide their attention. A selected bibliography on the divided attention literature is provided.
Personal semantics: Is it distinct from episodic and semantic memory? An electrophysiological study of memory for autobiographical facts and repeated events in honor of Shlomo Bentin.

PubMed

Renoult, Louis; Tanguay, Annick; Beaudry, Myriam; Tavakoli, Paniz; Rabipour, Sheida; Campbell, Kenneth; Moscovitch, Morris; Levine, Brian; Davidson, Patrick S R

2016-03-01

Declarative memory is thought to consist of two independent systems: episodic and semantic. Episodic memory represents personal and contextually unique events, while semantic memory represents culturally-shared, acontextual factual knowledge. Personal semantics refers to aspects of declarative memory that appear to fall somewhere in between the extremes of episodic and semantic. Examples include autobiographical knowledge and memories of repeated personal events. These two aspects of personal semantics have been studied little and rarely compared to both semantic and episodic memory. We recorded the event-related potentials (ERPs) of 27 healthy participants while they verified the veracity of sentences probing four types of questions: general (i.e., semantic) facts, autobiographical facts, repeated events, and unique (i.e., episodic) events. Behavioral results showed equivalent reaction times in all 4 conditions. True sentences were verified faster than false sentences, except for unique events for which no significant difference was observed. Electrophysiological results showed that the N400 (which is classically associated with retrieval from semantic memory) was maximal for general facts and the LPC (which is classically associated with retrieval from episodic memory) was maximal for unique events. For both ERP components, the two personal semantic conditions (i.e., autobiographical facts and repeated events) systematically differed from semantic memory. In addition, N400 amplitudes also differentiated autobiographical facts from unique events. Autobiographical facts and repeated events did not differ significantly from each other but their corresponding scalp distributions differed from those associated with general facts. Our results suggest that the neural correlates of personal semantics can be distinguished from those of semantic and episodic memory, and may provide clues as to how unique events are transformed to semantic memory. Copyright © 2015 Elsevier Ltd. All rights reserved.
Kanerva's sparse distributed memory: An associative memory algorithm well-suited to the Connection Machine

NASA Technical Reports Server (NTRS)

Rogers, David

1988-01-01

The advent of the Connection Machine profoundly changes the world of supercomputers. The highly nontraditional architecture makes possible the exploration of algorithms that were impractical for standard Von Neumann architectures. Sparse distributed memory (SDM) is an example of such an algorithm. Sparse distributed memory is a particularly simple and elegant formulation for an associative memory. The foundations for sparse distributed memory are described, and some simple examples of using the memory are presented. The relationship of sparse distributed memory to three important computational systems is shown: random-access memory, neural networks, and the cerebellum of the brain. Finally, the implementation of the algorithm for sparse distributed memory on the Connection Machine is discussed.
Welcoming nora: a family event.

PubMed

Walsh, Allison J; Walsh, Paul R; Walsh, Jane M; Walsh, Gavin T

2011-01-01

In this column, Allison and Paul Walsh share the story of the birth of Nora, their third baby and their second child to be born at home. Allison and Paul share their individual memories of labor and birth. But their story is only part of the story of Nora's birth. Nora's birth was a family event, with Allison and Paul's other children very much part of the experience. Jane and Gavin share their own memories of their baby sister's birth.
Fast distributed large-pixel-count hologram computation using a GPU cluster.

PubMed

Pan, Yuechao; Xu, Xuewu; Liang, Xinan

2013-09-10

Large-pixel-count holograms are one essential part for big size holographic three-dimensional (3D) display, but the generation of such holograms is computationally demanding. In order to address this issue, we have built a graphics processing unit (GPU) cluster with 32.5 Tflop/s computing power and implemented distributed hologram computation on it with speed improvement techniques, such as shared memory on GPU, GPU level adaptive load balancing, and node level load distribution. Using these speed improvement techniques on the GPU cluster, we have achieved 71.4 times computation speed increase for 186M-pixel holograms. Furthermore, we have used the approaches of diffraction limits and subdivision of holograms to overcome the GPU memory limit in computing large-pixel-count holograms. 745M-pixel and 1.80G-pixel holograms were computed in 343 and 3326 s, respectively, for more than 2 million object points with RGB colors. Color 3D objects with 1.02M points were successfully reconstructed from 186M-pixel hologram computed in 8.82 s with all the above three speed improvement techniques. It is shown that distributed hologram computation using a GPU cluster is a promising approach to increase the computation speed of large-pixel-count holograms for large size holographic display.
Colouring in the Blanks: Memory Drawings of the 1990 Kuwait Invasion

ERIC Educational Resources Information Center

Pepin-Wakefield, Yvonne

2009-01-01

This study used drawing tasks to examine the similarities and differences between females and males who shared a collective traumatic event in early childhood. Could these childhood memories be recorded, measured, and compared for gender differences in drawings by young adults who had shared a similar experience as children? Exploration of this…
Functions of Memory Sharing and Mother-Child Reminiscing Behaviors: Individual and Cultural Variations

ERIC Educational Resources Information Center

Kulkofsky, Sarah; Wang, Qi; Koh, Jessie Bee Kim

2009-01-01

This study examined maternal beliefs about the functions of memory sharing and the relations between these beliefs and mother-child reminiscing behaviors in a cross-cultural context. Sixty-three European American and 47 Chinese mothers completed an open-ended questionnaire concerning their beliefs about the functions of parent-child memory…
Unleashing the Power of Distributed CPU/GPU Architectures: Massive Astronomical Data Analysis and Visualization Case Study

NASA Astrophysics Data System (ADS)

Hassan, A. H.; Fluke, C. J.; Barnes, D. G.

2012-09-01

Upcoming and future astronomy research facilities will systematically generate terabyte-sized data sets moving astronomy into the Petascale data era. While such facilities will provide astronomers with unprecedented levels of accuracy and coverage, the increases in dataset size and dimensionality will pose serious computational challenges for many current astronomy data analysis and visualization tools. With such data sizes, even simple data analysis tasks (e.g. calculating a histogram or computing data minimum/maximum) may not be achievable without access to a supercomputing facility. To effectively handle such dataset sizes, which exceed today's single machine memory and processing limits, we present a framework that exploits the distributed power of GPUs and many-core CPUs, with a goal of providing data analysis and visualizing tasks as a service for astronomers. By mixing shared and distributed memory architectures, our framework effectively utilizes the underlying hardware infrastructure handling both batched and real-time data analysis and visualization tasks. Offering such functionality as a service in a “software as a service” manner will reduce the total cost of ownership, provide an easy to use tool to the wider astronomical community, and enable a more optimized utilization of the underlying hardware infrastructure.
Stillbirth and stigma: the spoiling and repair of multiple social identities.

PubMed

Brierley-Jones, Lyn; Crawley, Rosalind; Lomax, Samantha; Ayers, Susan

This study investigated mothers' experiences surrounding stillbirth in the United Kingdom, their memory making and sharing opportunities, and the effect these opportunities had on them. Qualitative data were generated from free text responses to open-ended questions. Thematic content analysis revealed that "stigma" was experienced by most women and Goffman's (1963) work on stigma was subsequently used as an analytical framework. Results suggest that stillbirth can spoil the identities of "patient," "mother," and "full citizen." Stigma was reported as arising from interactions with professionals, family, friends, work colleagues, and even casual acquaintances. Stillbirth produces common learning experiences often requiring "identity work" (Murphy, 2012). Memory making and sharing may be important in this work and further research is needed. Stigma can reduce the memory sharing opportunities for women after stillbirth and this may explain some of the differential mental health effects of memory making after stillbirth that is documented in the literature.
Keep your bias to yourself: How deliberating with differently biased others affects mock-jurors' guilt decisions, perceptions of the defendant, memories, and evidence interpretation.

PubMed

Ruva, Christine L; Guenther, Christina C

2017-10-01

This experiment explored how mock-jurors' (N = 648) guilt decisions, perceptions of the defendant, memories, and evidence interpretation varied as a function of jury type and pretrial publicity (PTP); utilizing a 2 (jury type: pure-PTP vs. mixed-PTP) × 3 (PTP: defendant, victim, and irrelevant) factorial design. Mock-juries (N = 126) were composed of jurors exposed to the same type of PTP (pure-PTP; e.g., defendant-PTP) or different types of PTP (mixed-PTP; e.g., half exposed to defendant-PTP and half to irrelevant-PTP). Before deliberations jurors exposed to defendant-PTP were most likely to vote guilty; while those exposed to victim-PTP were least likely. After deliberations, jury type and PTP affected jurors' guilt decisions. Specifically, jurors deliberating on pure-PTP juries had verdict distributions that closely resembled the predeliberation distributions. The verdict distributions of jurors on mixed-PTP juries suggested that jurors were influenced by those they deliberated with. Jurors not exposed to PTP appeared to incorporate bias from PTP-exposed jurors. Only PTP had significant effects on postdeliberation measures of memory and evidence interpretation. Mediation analyses revealed that evidence interpretation and defendant credibility assessments mediated the effect of PTP on guilt ratings. Taken together these findings suggest that during deliberations PTP bias can spread to jurors not previously exposed to PTP. In addition, juries composed of jurors exposed to different PTP slants, as opposed to a single PTP slant, can result in less biased decisions. Finally, deliberating with others who do not share similar biases may have little, if any, impact on biased evidence interpretation or memory errors. (PsycINFO Database Record (c) 2017 APA, all rights reserved).
A simple modern correctness condition for a space-based high-performance multiprocessor

NASA Technical Reports Server (NTRS)

Probst, David K.; Li, Hon F.

1992-01-01

A number of U.S. national programs, including space-based detection of ballistic missile launches, envisage putting significant computing power into space. Given sufficient progress in low-power VLSI, multichip-module packaging and liquid-cooling technologies, we will see design of high-performance multiprocessors for individual satellites. In very high speed implementations, performance depends critically on tolerating large latencies in interprocessor communication; without latency tolerance, performance is limited by the vastly differing time scales in processor and data-memory modules, including interconnect times. The modern approach to tolerating remote-communication cost in scalable, shared-memory multiprocessors is to use a multithreaded architecture, and alter the semantics of shared memory slightly, at the price of forcing the programmer either to reason about program correctness in a relaxed consistency model or to agree to program in a constrained style. The literature on multiprocessor correctness conditions has become increasingly complex, and sometimes confusing, which may hinder its practical application. We propose a simple modern correctness condition for a high-performance, shared-memory multiprocessor; the correctness condition is based on a simple interface between the multiprocessor architecture and a high-performance, shared-memory multiprocessor; the correctness condition is based on a simple interface between the multiprocessor architecture and the parallel programming system.
Brain Information Sharing During Visual Short-Term Memory Binding Yields a Memory Biomarker for Familial Alzheimer's Disease.

PubMed

Parra, Mario A; Mikulan, Ezequiel; Trujillo, Natalia; Sala, Sergio Della; Lopera, Francisco; Manes, Facundo; Starr, John; Ibanez, Agustin

2017-01-01

Alzheimer's disease (AD) as a disconnection syndrome which disrupts both brain information sharing and memory binding functions. The extent to which these two phenotypic expressions share pathophysiological mechanisms remains unknown. To unveil the electrophysiological correlates of integrative memory impairments in AD towards new memory biomarkers for its prodromal stages. Patients with 100% risk of familial AD (FAD) and healthy controls underwent assessment with the Visual Short-Term Memory binding test (VSTMBT) while we recorded their EEG. We applied a novel brain connectivity method (Weighted Symbolic Mutual Information) to EEG data. Patients showed significant deficits during the VSTMBT. A reduction of brain connectivity was observed during resting as well as during correct VSTM binding, particularly over frontal and posterior regions. An increase of connectivity was found during VSTM binding performance over central regions. While decreased connectivity was found in cases in more advanced stages of FAD, increased brain connectivity appeared in cases in earlier stages. Such altered patterns of task-related connectivity were found in 89% of the assessed patients. VSTM binding in the prodromal stages of FAD are associated to altered patterns of brain connectivity thus confirming the link between integrative memory deficits and impaired brain information sharing in prodromal FAD. While significant loss of brain connectivity seems to be a feature of the advanced stages of FAD increased brain connectivity characterizes its earlier stages. These findings are discussed in the light of recent proposals about the earliest pathophysiological mechanisms of AD and their clinical expression. Copyright© Bentham Science Publishers; For any queries, please email at epub@benthamscience.org.
The N400 reveals how personal semantics is processed: Insights into the nature and organization of self-knowledge

PubMed Central

Federmeier, Kara D.

2017-01-01

There is growing recognition that some important forms of long-term memory are difficult to classify into one of the well-studied memory subtypes. One example is personal semantics. Like the episodes that are stored as part of one’s autobiography, personal semantics is linked to an individual, yet, like general semantic memory, it is detached from a specific encoding context. Access to general semantics elicits an electrophysiological response known as the N400, which has been characterized across three decades of research; surprisingly, this response has not been fully examined in the context of personal semantics. In this study, we assessed responses to congruent and incongruent statements about people’s own, personal preferences. We found that access to personal preferences elicited N400 responses, with congruency effects that were similar in latency and distribution to those for general semantic statements elicited from the same participants. These results suggest that the processing of personal and general semantics share important functional and neurobiological features. PMID:26825011
Proceedings of the second SISAL users` conference

DOE Office of Scientific and Technical Information (OSTI.GOV)

Feo, J T; Frerking, C; Miller, P J

1992-12-01

This report contains papers on the following topics: A sisal code for computing the fourier transform on S{sub N}; five ways to fill your knapsack; simulating material dislocation motion in sisal; candis as an interface for sisal; parallelisation and performance of the burg algorithm on a shared-memory multiprocessor; use of genetic algorithm in sisal to solve the file design problem; implementing FFT`s in sisal; programming and evaluating the performance of signal processing applications in the sisal programming environment; sisal and Von Neumann-based languages: translation and intercommunication; an IF2 code generator for ADAM architecture; program partitioning for NUMA multiprocessor computer systems;more » mapping functional parallelism on distributed memory machines; implicit array copying: prevention is better than cure ; mathematical syntax for sisal; an approach for optimizing recursive functions; implementing arrays in sisal 2.0; Fol: an object oriented extension to the sisal language; twine: a portable, extensible sisal execution kernel; and investigating the memory performance of the optimizing sisal compiler.« less
DOE Office of Scientific and Technical Information (OSTI.GOV)

Ibrahim, Khaled Z.; Epifanovsky, Evgeny; Williams, Samuel W.

Coupled-cluster methods provide highly accurate models of molecular structure by explicit numerical calculation of tensors representing the correlation between electrons. These calculations are dominated by a sequence of tensor contractions, motivating the development of numerical libraries for such operations. While based on matrix-matrix multiplication, these libraries are specialized to exploit symmetries in the molecular structure and in electronic interactions, and thus reduce the size of the tensor representation and the complexity of contractions. The resulting algorithms are irregular and their parallelization has been previously achieved via the use of dynamic scheduling or specialized data decompositions. We introduce our efforts tomore » extend the Libtensor framework to work in the distributed memory environment in a scalable and energy efficient manner. We achieve up to 240 speedup compared with the best optimized shared memory implementation. We attain scalability to hundreds of thousands of compute cores on three distributed-memory architectures, (Cray XC30&XC40, BlueGene/Q), and on a heterogeneous GPU-CPU system (Cray XK7). As the bottlenecks shift from being compute-bound DGEMM's to communication-bound collectives as the size of the molecular system scales, we adopt two radically different parallelization approaches for handling load-imbalance. Nevertheless, we preserve a uni ed interface to both programming models to maintain the productivity of computational quantum chemists.« less
Over-Distribution in Source Memory

PubMed Central

Brainerd, C. J.; Reyna, V. F.; Holliday, R. E.; Nakamura, K.

2012-01-01

Semantic false memories are confounded with a second type of error, over-distribution, in which items are attributed to contradictory episodic states. Over-distribution errors have proved to be more common than false memories when the two are disentangled. We investigated whether over-distribution is prevalent in another classic false memory paradigm: source monitoring. It is. Conventional false memory responses (source misattributions) were predominantly over-distribution errors, but unlike semantic false memory, over-distribution also accounted for more than half of true memory responses (correct source attributions). Experimental control of over-distribution was achieved via a series of manipulations that affected either recollection of contextual details or item memory (concreteness, frequency, list-order, number of presentation contexts, and individual differences in verbatim memory). A theoretical model was used to analyze the data (conjoint process dissociation) that predicts that predicts that (a) over-distribution is directly proportional to item memory but inversely proportional to recollection and (b) item memory is not a necessary precondition for recollection of contextual details. The results were consistent with both predictions. PMID:21942494
Decomposing the relationship between cognitive functioning and self-referent memory beliefs in older adulthood: What’s memory got to do with it?

PubMed Central

Payne, Brennan R.; Gross, Alden L.; Hill, Patrick L.; Parisi, Jeanine M.; Rebok, George W.; Stine-Morrow, Elizabeth A. L.

2018-01-01

With advancing age, episodic memory performance shows marked declines along with concurrent reports of lower subjective memory beliefs. Given that normative age-related declines in episodic memory co-occur with declines in other cognitive domains, we examined the relationship between memory beliefs and multiple domains of cognitive functioning. Confirmatory bi-factor structural equation models were used to parse the shared and independent variance among factors representing episodic memory, psychomotor speed, and executive reasoning in one large cohort study (Senior Odyssey, N = 462), and replicated using another large cohort of healthy older adults (ACTIVE, N = 2,802). Accounting for a general fluid cognitive functioning factor (comprised of the shared variance among measures of episodic memory, speed, and reasoning) attenuated the relationship between objective memory performance and subjective memory beliefs in both samples. Moreover, the general cognitive functioning factor was the strongest predictor of memory beliefs in both samples. These findings are consistent with the notion that dispositional memory beliefs may reflect perceptions of cognition more broadly. This may be one reason why memory beliefs have broad predictive validity for interventions that target fluid cognitive ability. PMID:27685541

Decomposing the relationship between cognitive functioning and self-referent memory beliefs in older adulthood: what's memory got to do with it?

PubMed

Payne, Brennan R; Gross, Alden L; Hill, Patrick L; Parisi, Jeanine M; Rebok, George W; Stine-Morrow, Elizabeth A L

2017-07-01

With advancing age, episodic memory performance shows marked declines along with concurrent reports of lower subjective memory beliefs. Given that normative age-related declines in episodic memory co-occur with declines in other cognitive domains, we examined the relationship between memory beliefs and multiple domains of cognitive functioning. Confirmatory bi-factor structural equation models were used to parse the shared and independent variance among factors representing episodic memory, psychomotor speed, and executive reasoning in one large cohort study (Senior Odyssey, N = 462), and replicated using another large cohort of healthy older adults (ACTIVE, N = 2802). Accounting for a general fluid cognitive functioning factor (comprised of the shared variance among measures of episodic memory, speed, and reasoning) attenuated the relationship between objective memory performance and subjective memory beliefs in both samples. Moreover, the general cognitive functioning factor was the strongest predictor of memory beliefs in both samples. These findings are consistent with the notion that dispositional memory beliefs may reflect perceptions of cognition more broadly. This may be one reason why memory beliefs have broad predictive validity for interventions that target fluid cognitive ability.
Parallel, Distributed Scripting with Python

DOE Office of Scientific and Technical Information (OSTI.GOV)

Miller, P J

2002-05-24

Parallel computers used to be, for the most part, one-of-a-kind systems which were extremely difficult to program portably. With SMP architectures, the advent of the POSIX thread API and OpenMP gave developers ways to portably exploit on-the-box shared memory parallelism. Since these architectures didn't scale cost-effectively, distributed memory clusters were developed. The associated MPI message passing libraries gave these systems a portable paradigm too. Having programmers effectively use this paradigm is a somewhat different question. Distributed data has to be explicitly transported via the messaging system in order for it to be useful. In high level languages, the MPI librarymore » gives access to data distribution routines in C, C++, and FORTRAN. But we need more than that. Many reasonable and common tasks are best done in (or as extensions to) scripting languages. Consider sysadm tools such as password crackers, file purgers, etc ... These are simple to write in a scripting language such as Python (an open source, portable, and freely available interpreter). But these tasks beg to be done in parallel. Consider the a password checker that checks an encrypted password against a 25,000 word dictionary. This can take around 10 seconds in Python (6 seconds in C). It is trivial to parallelize if you can distribute the information and co-ordinate the work.« less
Common neural substrates for visual working memory and attention.

PubMed

Mayer, Jutta S; Bittner, Robert A; Nikolić, Danko; Bledowski, Christoph; Goebel, Rainer; Linden, David E J

2007-06-01

Humans are severely limited in their ability to memorize visual information over short periods of time. Selective attention has been implicated as a limiting factor. Here we used functional magnetic resonance imaging to test the hypothesis that this limitation is due to common neural resources shared by visual working memory (WM) and selective attention. We combined visual search and delayed discrimination of complex objects and independently modulated the demands on selective attention and WM encoding. Participants were presented with a search array and performed easy or difficult visual search in order to encode one or three complex objects into visual WM. Overlapping activation for attention-demanding visual search and WM encoding was observed in distributed posterior and frontal regions. In the right prefrontal cortex and bilateral insula blood oxygen-level-dependent activation additively increased with increased WM load and attentional demand. Conversely, several visual, parietal and premotor areas showed overlapping activation for the two task components and were severely reduced in their WM load response under the condition with high attentional demand. Regions in the left prefrontal cortex were selectively responsive to WM load. Areas selectively responsive to high attentional demand were found within the right prefrontal and bilateral occipital cortex. These results indicate that encoding into visual WM and visual selective attention require to a high degree access to common neural resources. We propose that competition for resources shared by visual attention and WM encoding can limit processing capabilities in distributed posterior brain regions.
Method for prefetching non-contiguous data structures

DOEpatents

Blumrich, Matthias A [Ridgefield, CT; Chen, Dong [Croton On Hudson, NY; Coteus, Paul W [Yorktown Heights, NY; Gara, Alan G [Mount Kisco, NY; Giampapa, Mark E [Irvington, NY; Heidelberger, Philip [Cortlandt Manor, NY; Hoenicke, Dirk [Ossining, NY; Ohmacht, Martin [Brewster, NY; Steinmacher-Burow, Burkhard D [Mount Kisco, NY; Takken, Todd E [Mount Kisco, NY; Vranas, Pavlos M [Bedford Hills, NY

2009-05-05

A low latency memory system access is provided in association with a weakly-ordered multiprocessor system. Each processor in the multiprocessor shares resources, and each shared resource has an associated lock within a locking device that provides support for synchronization between the multiple processors in the multiprocessor and the orderly sharing of the resources. A processor only has permission to access a resource when it owns the lock associated with that resource, and an attempt by a processor to own a lock requires only a single load operation, rather than a traditional atomic load followed by store, such that the processor only performs a read operation and the hardware locking device performs a subsequent write operation rather than the processor. A simple perfecting for non-contiguous data structures is also disclosed. A memory line is redefined so that in addition to the normal physical memory data, every line includes a pointer that is large enough to point to any other line in the memory, wherein the pointers to determine which memory line to prefect rather than some other predictive algorithm. This enables hardware to effectively prefect memory access patterns that are non-contiguous, but repetitive.
A shared resource between declarative memory and motor memory.

PubMed

Keisler, Aysha; Shadmehr, Reza

2010-11-03

The neural systems that support motor adaptation in humans are thought to be distinct from those that support the declarative system. Yet, during motor adaptation changes in motor commands are supported by a fast adaptive process that has important properties (rapid learning, fast decay) that are usually associated with the declarative system. The fast process can be contrasted to a slow adaptive process that also supports motor memory, but learns gradually and shows resistance to forgetting. Here we show that after people stop performing a motor task, the fast motor memory can be disrupted by a task that engages declarative memory, but the slow motor memory is immune from this interference. Furthermore, we find that the fast/declarative component plays a major role in the consolidation of the slow motor memory. Because of the competitive nature of declarative and nondeclarative memory during consolidation, impairment of the fast/declarative component leads to improvements in the slow/nondeclarative component. Therefore, the fast process that supports formation of motor memory is not only neurally distinct from the slow process, but it shares critical resources with the declarative memory system.
A shared resource between declarative memory and motor memory

PubMed Central

Keisler, Aysha; Shadmehr, Reza

2010-01-01

The neural systems that support motor adaptation in humans are thought to be distinct from those that support the declarative system. Yet, during motor adaptation changes in motor commands are supported by a fast adaptive process that has important properties (rapid learning, fast decay) that are usually associated with the declarative system. The fast process can be contrasted to a slow adaptive process that also supports motor memory, but learns gradually and shows resistance to forgetting. Here we show that after people stop performing a motor task, the fast motor memory can be disrupted by a task that engages declarative memory, but the slow motor memory is immune from this interference. Furthermore, we find that the fast/declarative component plays a major role in the consolidation of the slow motor memory. Because of the competitive nature of declarative and non-declarative memory during consolidation, impairment of the fast/declarative component leads to improvements in the slow/non-declarative component. Therefore, the fast process that supports formation of motor memory is not only neurally distinct from the slow process, but it shares critical resources with the declarative memory system. PMID:21048140
NAS Parallel Benchmark. Results 11-96: Performance Comparison of HPF and MPI Based NAS Parallel Benchmarks. 1.0

NASA Technical Reports Server (NTRS)

Saini, Subash; Bailey, David; Chancellor, Marisa K. (Technical Monitor)

1997-01-01

High Performance Fortran (HPF), the high-level language for parallel Fortran programming, is based on Fortran 90. HALF was defined by an informal standards committee known as the High Performance Fortran Forum (HPFF) in 1993, and modeled on TMC's CM Fortran language. Several HPF features have since been incorporated into the draft ANSI/ISO Fortran 95, the next formal revision of the Fortran standard. HPF allows users to write a single parallel program that can execute on a serial machine, a shared-memory parallel machine, or a distributed-memory parallel machine. HPF eliminates the complex, error-prone task of explicitly specifying how, where, and when to pass messages between processors on distributed-memory machines, or when to synchronize processors on shared-memory machines. HPF is designed in a way that allows the programmer to code an application at a high level, and then selectively optimize portions of the code by dropping into message-passing or calling tuned library routines as 'extrinsics'. Compilers supporting High Performance Fortran features first appeared in late 1994 and early 1995 from Applied Parallel Research (APR) Digital Equipment Corporation, and The Portland Group (PGI). IBM introduced an HPF compiler for the IBM RS/6000 SP/2 in April of 1996. Over the past two years, these implementations have shown steady improvement in terms of both features and performance. The performance of various hardware/ programming model (HPF and MPI (message passing interface)) combinations will be compared, based on latest NAS (NASA Advanced Supercomputing) Parallel Benchmark (NPB) results, thus providing a cross-machine and cross-model comparison. Specifically, HPF based NPB results will be compared with MPI based NPB results to provide perspective on performance currently obtainable using HPF versus MPI or versus hand-tuned implementations such as those supplied by the hardware vendors. In addition we would also present NPB (Version 1.0) performance results for the following systems: DEC Alpha Server 8400 5/440, Fujitsu VPP Series (VX, VPP300, and VPP700), HP/Convex Exemplar SPP2000, IBM RS/6000 SP P2SC node (120 MHz) NEC SX-4/32, SGI/CRAY T3E, SGI Origin2000.
Shared Representations in Language Processing and Verbal Short-Term Memory: The Case of Grammatical Gender

ERIC Educational Resources Information Center

Schweppe, Judith; Rummer, Ralf

2007-01-01

The general idea of language-based accounts of short-term memory is that retention of linguistic materials is based on representations within the language processing system. In the present sentence recall study, we address the question whether the assumption of shared representations holds for morphosyntactic information (here: grammatical gender…
The Precategorical Nature of Visual Short-Term Memory

ERIC Educational Resources Information Center

Quinlan, Philip T.; Cohen, Dale J.

2016-01-01

We conducted a series of recognition experiments that assessed whether visual short-term memory (VSTM) is sensitive to shared category membership of to-be-remembered (tbr) images of common objects. In Experiment 1 some of the tbr items shared the same basic level category (e.g., hand axe): Such items were no better retained than others. In the…
Fault tolerant onboard packet switch architecture for communication satellites: Shared memory per beam approach

NASA Technical Reports Server (NTRS)

Shalkhauser, Mary JO; Quintana, Jorge A.; Soni, Nitin J.

1994-01-01

The NASA Lewis Research Center is developing a multichannel communication signal processing satellite (MCSPS) system which will provide low data rate, direct to user, commercial communications services. The focus of current space segment developments is a flexible, high-throughput, fault tolerant onboard information switching processor. This information switching processor (ISP) is a destination-directed packet switch which performs both space and time switching to route user information among numerous user ground terminals. Through both industry study contracts and in-house investigations, several packet switching architectures were examined. A contention-free approach, the shared memory per beam architecture, was selected for implementation. The shared memory per beam architecture, fault tolerance insertion, implementation, and demonstration plans are described.
The performance of disk arrays in shared-memory database machines

NASA Technical Reports Server (NTRS)

Katz, Randy H.; Hong, Wei

1993-01-01

In this paper, we examine how disk arrays and shared memory multiprocessors lead to an effective method for constructing database machines for general-purpose complex query processing. We show that disk arrays can lead to cost-effective storage systems if they are configured from suitably small formfactor disk drives. We introduce the storage system metric data temperature as a way to evaluate how well a disk configuration can sustain its workload, and we show that disk arrays can sustain the same data temperature as a more expensive mirrored-disk configuration. We use the metric to evaluate the performance of disk arrays in XPRS, an operational shared-memory multiprocessor database system being developed at the University of California, Berkeley.
A scalable approach to solving dense linear algebra problems on hybrid CPU-GPU systems

DOE PAGES

Song, Fengguang; Dongarra, Jack

2014-10-01

Aiming to fully exploit the computing power of all CPUs and all graphics processing units (GPUs) on hybrid CPU-GPU systems to solve dense linear algebra problems, in this paper we design a class of heterogeneous tile algorithms to maximize the degree of parallelism, to minimize the communication volume, and to accommodate the heterogeneity between CPUs and GPUs. The new heterogeneous tile algorithms are executed upon our decentralized dynamic scheduling runtime system, which schedules a task graph dynamically and transfers data between compute nodes automatically. The runtime system uses a new distributed task assignment protocol to solve data dependencies between tasksmore » without any coordination between processing units. By overlapping computation and communication through dynamic scheduling, we are able to attain scalable performance for the double-precision Cholesky factorization and QR factorization. Finally, our approach demonstrates a performance comparable to Intel MKL on shared-memory multicore systems and better performance than both vendor (e.g., Intel MKL) and open source libraries (e.g., StarPU) in the following three environments: heterogeneous clusters with GPUs, conventional clusters without GPUs, and shared-memory systems with multiple GPUs.« less
Parallel Implementation of Triangular Cellular Automata for Computing Two-Dimensional Elastodynamic Response on Arbitrary Domains

NASA Astrophysics Data System (ADS)

Leamy, Michael J.; Springer, Adam C.

In this research we report parallel implementation of a Cellular Automata-based simulation tool for computing elastodynamic response on complex, two-dimensional domains. Elastodynamic simulation using Cellular Automata (CA) has recently been presented as an alternative, inherently object-oriented technique for accurately and efficiently computing linear and nonlinear wave propagation in arbitrarily-shaped geometries. The local, autonomous nature of the method should lead to straight-forward and efficient parallelization. We address this notion on symmetric multiprocessor (SMP) hardware using a Java-based object-oriented CA code implementing triangular state machines (i.e., automata) and the MPI bindings written in Java (MPJ Express). We use MPJ Express to reconfigure our existing CA code to distribute a domain's automata to cores present on a dual quad-core shared-memory system (eight total processors). We note that this message passing parallelization strategy is directly applicable to computer clustered computing, which will be the focus of follow-on research. Results on the shared memory platform indicate nearly-ideal, linear speed-up. We conclude that the CA-based elastodynamic simulator is easily configured to run in parallel, and yields excellent speed-up on SMP hardware.
A scalable approach to solving dense linear algebra problems on hybrid CPU-GPU systems

DOE Office of Scientific and Technical Information (OSTI.GOV)

Song, Fengguang; Dongarra, Jack

Aiming to fully exploit the computing power of all CPUs and all graphics processing units (GPUs) on hybrid CPU-GPU systems to solve dense linear algebra problems, in this paper we design a class of heterogeneous tile algorithms to maximize the degree of parallelism, to minimize the communication volume, and to accommodate the heterogeneity between CPUs and GPUs. The new heterogeneous tile algorithms are executed upon our decentralized dynamic scheduling runtime system, which schedules a task graph dynamically and transfers data between compute nodes automatically. The runtime system uses a new distributed task assignment protocol to solve data dependencies between tasksmore » without any coordination between processing units. By overlapping computation and communication through dynamic scheduling, we are able to attain scalable performance for the double-precision Cholesky factorization and QR factorization. Finally, our approach demonstrates a performance comparable to Intel MKL on shared-memory multicore systems and better performance than both vendor (e.g., Intel MKL) and open source libraries (e.g., StarPU) in the following three environments: heterogeneous clusters with GPUs, conventional clusters without GPUs, and shared-memory systems with multiple GPUs.« less
Optical memories in digital computing

NASA Technical Reports Server (NTRS)

Alford, C. O.; Gaylord, T. K.

1979-01-01

High capacity optical memories with relatively-high data-transfer rate and multiport simultaneous access capability may serve as basis for new computer architectures. Several computer structures that might profitably use memories are: a) simultaneous record-access system, b) simultaneously-shared memory computer system, and c) parallel digital processing structure.
NavP: Structured and Multithreaded Distributed Parallel Programming

NASA Technical Reports Server (NTRS)

Pan, Lei

2007-01-01

We present Navigational Programming (NavP) -- a distributed parallel programming methodology based on the principles of migrating computations and multithreading. The four major steps of NavP are: (1) Distribute the data using the data communication pattern in a given algorithm; (2) Insert navigational commands for the computation to migrate and follow large-sized distributed data; (3) Cut the sequential migrating thread and construct a mobile pipeline; and (4) Loop back for refinement. NavP is significantly different from the current prevailing Message Passing (MP) approach. The advantages of NavP include: (1) NavP is structured distributed programming and it does not change the code structure of an original algorithm. This is in sharp contrast to MP as MP implementations in general do not resemble the original sequential code; (2) NavP implementations are always competitive with the best MPI implementations in terms of performance. Approaches such as DSM or HPF have failed to deliver satisfying performance as of today in contrast, even if they are relatively easy to use compared to MP; (3) NavP provides incremental parallelization, which is beyond the reach of MP; and (4) NavP is a unifying approach that allows us to exploit both fine- (multithreading on shared memory) and coarse- (pipelined tasks on distributed memory) grained parallelism. This is in contrast to the currently popular hybrid use of MP+OpenMP, which is known to be complex to use. We present experimental results that demonstrate the effectiveness of NavP.
Reader set encoding for directory of shared cache memory in multiprocessor system

DOEpatents

Ahn, Dnaiel; Ceze, Luis H.; Gara, Alan; Ohmacht, Martin; Xiaotong, Zhuang

2014-06-10

In a parallel processing system with speculative execution, conflict checking occurs in a directory lookup of a cache memory that is shared by all processors. In each case, the same physical memory address will map to the same set of that cache, no matter which processor originated that access. The directory includes a dynamic reader set encoding, indicating what speculative threads have read a particular line. This reader set encoding is used in conflict checking. A bitset encoding is used to specify particular threads that have read the line.
Semantic graphs and associative memories

NASA Astrophysics Data System (ADS)

Pomi, Andrés; Mizraji, Eduardo

2004-12-01

Graphs have been increasingly utilized in the characterization of complex networks from diverse origins, including different kinds of semantic networks. Human memories are associative and are known to support complex semantic nets; these nets are represented by graphs. However, it is not known how the brain can sustain these semantic graphs. The vision of cognitive brain activities, shown by modern functional imaging techniques, assigns renewed value to classical distributed associative memory models. Here we show that these neural network models, also known as correlation matrix memories, naturally support a graph representation of the stored semantic structure. We demonstrate that the adjacency matrix of this graph of associations is just the memory coded with the standard basis of the concept vector space, and that the spectrum of the graph is a code invariant of the memory. As long as the assumptions of the model remain valid this result provides a practical method to predict and modify the evolution of the cognitive dynamics. Also, it could provide us with a way to comprehend how individual brains that map the external reality, almost surely with different particular vector representations, are nevertheless able to communicate and share a common knowledge of the world. We finish presenting adaptive association graphs, an extension of the model that makes use of the tensor product, which provides a solution to the known problem of branching in semantic nets.
Insights on consciousness from taste memory research.

PubMed

Gallo, Milagros

2016-01-01

Taste research in rodents supports the relevance of memory in order to determine the content of consciousness by modifying both taste perception and later action. Associated with this issue is the fact that taste and visual modalities share anatomical circuits traditionally related to conscious memory. This challenges the view of taste memory as a type of non-declarative unconscious memory.
Expressing Parallelism with ROOT

NASA Astrophysics Data System (ADS)

Piparo, D.; Tejedor, E.; Guiraud, E.; Ganis, G.; Mato, P.; Moneta, L.; Valls Pla, X.; Canal, P.

2017-10-01

The need for processing the ever-increasing amount of data generated by the LHC experiments in a more efficient way has motivated ROOT to further develop its support for parallelism. Such support is being tackled both for shared-memory and distributed-memory environments. The incarnations of the aforementioned parallelism are multi-threading, multi-processing and cluster-wide executions. In the area of multi-threading, we discuss the new implicit parallelism and related interfaces, as well as the new building blocks to safely operate with ROOT objects in a multi-threaded environment. Regarding multi-processing, we review the new MultiProc framework, comparing it with similar tools (e.g. multiprocessing module in Python). Finally, as an alternative to PROOF for cluster-wide executions, we introduce the efforts on integrating ROOT with state-of-the-art distributed data processing technologies like Spark, both in terms of programming model and runtime design (with EOS as one of the main components). For all the levels of parallelism, we discuss, based on real-life examples and measurements, how our proposals can increase the productivity of scientists.

Expressing Parallelism with ROOT

DOE Office of Scientific and Technical Information (OSTI.GOV)

Piparo, D.; Tejedor, E.; Guiraud, E.

The need for processing the ever-increasing amount of data generated by the LHC experiments in a more efficient way has motivated ROOT to further develop its support for parallelism. Such support is being tackled both for shared-memory and distributed-memory environments. The incarnations of the aforementioned parallelism are multi-threading, multi-processing and cluster-wide executions. In the area of multi-threading, we discuss the new implicit parallelism and related interfaces, as well as the new building blocks to safely operate with ROOT objects in a multi-threaded environment. Regarding multi-processing, we review the new MultiProc framework, comparing it with similar tools (e.g. multiprocessing module inmore » Python). Finally, as an alternative to PROOF for cluster-wide executions, we introduce the efforts on integrating ROOT with state-of-the-art distributed data processing technologies like Spark, both in terms of programming model and runtime design (with EOS as one of the main components). For all the levels of parallelism, we discuss, based on real-life examples and measurements, how our proposals can increase the productivity of scientists.« less
Flexible Language Constructs for Large Parallel Programs

DOE PAGES

Rosing, Matt; Schnabel, Robert

1994-01-01

The goal of the research described in this article is to develop flexible language constructs for writing large data parallel numerical programs for distributed memory (multiple instruction multiple data [MIMD]) multiprocessors. Previously, several models have been developed to support synchronization and communication. Models for global synchronization include single instruction multiple data (SIMD), single program multiple data (SPMD), and sequential programs annotated with data distribution statements. The two primary models for communication include implicit communication based on shared memory and explicit communication based on messages. None of these models by themselves seem sufficient to permit the natural and efficient expression ofmore » the variety of algorithms that occur in large scientific computations. In this article, we give an overview of a new language that combines many of these programming models in a clean manner. This is done in a modular fashion such that different models can be combined to support large programs. Within a module, the selection of a model depends on the algorithm and its efficiency requirements. In this article, we give an overview of the language and discuss some of the critical implementation details.« less
Investigating Ground Swarm Robotics Using Agent Based Simulation

DTIC Science & Technology

2006-12-01

Incorporation of virtual pheromones as a shared memory map is modeled as an additional capability that is found to enhance the robustness and reliability of the...virtual pheromones as a shared memory map is modeled as an additional capability that is found to enhance the robustness and reliability of the swarm... PHEROMONES .......................................... 42 1. Repel Friends under Inorganic SA.................................................. 45 2. Max
STEMsalabim: A high-performance computing cluster friendly code for scanning transmission electron microscopy image simulations of thin specimens.

PubMed

Oelerich, Jan Oliver; Duschek, Lennart; Belz, Jürgen; Beyer, Andreas; Baranovskii, Sergei D; Volz, Kerstin

2017-06-01

We present a new multislice code for the computer simulation of scanning transmission electron microscope (STEM) images based on the frozen lattice approximation. Unlike existing software packages, the code is optimized to perform well on highly parallelized computing clusters, combining distributed and shared memory architectures. This enables efficient calculation of large lateral scanning areas of the specimen within the frozen lattice approximation and fine-grained sweeps of parameter space. Copyright © 2017 Elsevier B.V. All rights reserved.
A Parallel Rendering Algorithm for MIMD Architectures

NASA Technical Reports Server (NTRS)

Crockett, Thomas W.; Orloff, Tobias

1991-01-01

Applications such as animation and scientific visualization demand high performance rendering of complex three dimensional scenes. To deliver the necessary rendering rates, highly parallel hardware architectures are required. The challenge is then to design algorithms and software which effectively use the hardware parallelism. A rendering algorithm targeted to distributed memory MIMD architectures is described. For maximum performance, the algorithm exploits both object-level and pixel-level parallelism. The behavior of the algorithm is examined both analytically and experimentally. Its performance for large numbers of processors is found to be limited primarily by communication overheads. An experimental implementation for the Intel iPSC/860 shows increasing performance from 1 to 128 processors across a wide range of scene complexities. It is shown that minimal modifications to the algorithm will adapt it for use on shared memory architectures as well.
Efficient diagonalization of the sparse matrices produced within the framework of the UK R-matrix molecular codes

NASA Astrophysics Data System (ADS)

Galiatsatos, P. G.; Tennyson, J.

2012-11-01

The most time consuming step within the framework of the UK R-matrix molecular codes is that of the diagonalization of the inner region Hamiltonian matrix (IRHM). Here we present the method that we follow to speed up this step. We use shared memory machines (SMM), distributed memory machines (DMM), the OpenMP directive based parallel language, the MPI function based parallel language, the sparse matrix diagonalizers ARPACK and PARPACK, a variation for real symmetric matrices of the official coordinate sparse matrix format and finally a parallel sparse matrix-vector product (PSMV). The efficient application of the previous techniques rely on two important facts: the sparsity of the matrix is large enough (more than 98%) and in order to get back converged results we need a small only part of the matrix spectrum.
A parallel implementation of a multisensor feature-based range-estimation method

NASA Technical Reports Server (NTRS)

Suorsa, Raymond E.; Sridhar, Banavar

1993-01-01

There are many proposed vision based methods to perform obstacle detection and avoidance for autonomous or semi-autonomous vehicles. All methods, however, will require very high processing rates to achieve real time performance. A system capable of supporting autonomous helicopter navigation will need to extract obstacle information from imagery at rates varying from ten frames per second to thirty or more frames per second depending on the vehicle speed. Such a system will need to sustain billions of operations per second. To reach such high processing rates using current technology, a parallel implementation of the obstacle detection/ranging method is required. This paper describes an efficient and flexible parallel implementation of a multisensor feature-based range-estimation algorithm, targeted for helicopter flight, realized on both a distributed-memory and shared-memory parallel computer.
The role of similarity in updating numerical information in working memory: decomposing the numerical distance effect.

PubMed

Lendínez, Cristina; Pelegrina, Santiago; Lechuga, M Teresa

2014-01-01

The present study investigates the process of updating representations in working memory (WM) and how similarity between the information involved influences this process. In WM updating tasks, the similarity in terms of numerical distance between the number to be substituted and the new one facilitates the updating process. We aimed to disentangle the possible effect of two dimensions of similarity that may contribute to this numerical effect: numerical distance itself and common digits shared between the numbers involved. Three experiments were conducted in which different ranges of distances and the coincidence between the digits of the two numbers involved in updating were manipulated. Results showed that the two dimensions of similarity had an effect on updating times. The greater the similarity between the information maintained in memory and the new information that substituted it, the faster the updating. This is consistent both with the idea of distributed representations based on features, and with a selective updating process based on a feature overwriting mechanism. Thus, updating in WM can be understood as a selective substitution process influenced by similarity in which only certain parts of the representation stored in memory are changed.
Centrally managed unified shared virtual address space

DOE Office of Scientific and Technical Information (OSTI.GOV)

Wilkes, John

Systems, apparatuses, and methods for managing a unified shared virtual address space. A host may execute system software and manage a plurality of nodes coupled to the host. The host may send work tasks to the nodes, and for each node, the host may externally manage the node's view of the system's virtual address space. Each node may have a central processing unit (CPU) style memory management unit (MMU) with an internal translation lookaside buffer (TLB). In one embodiment, the host may be coupled to a given node via an input/output memory management unit (IOMMU) interface, where the IOMMU frontendmore » interface shares the TLB with the given node's MMU. In another embodiment, the host may control the given node's view of virtual address space via memory-mapped control registers.« less
Attention and Visuospatial Working Memory Share the Same Processing Resources

PubMed Central

Feng, Jing; Pratt, Jay; Spence, Ian

2012-01-01

Attention and visuospatial working memory (VWM) share very similar characteristics; both have the same upper bound of about four items in capacity and they recruit overlapping brain regions. We examined whether both attention and VWM share the same processing resources using a novel dual-task costs approach based on a load-varying dual-task technique. With sufficiently large loads on attention and VWM, considerable interference between the two processes was observed. A further load increase on either process produced reciprocal increases in interference on both processes, indicating that attention and VWM share common resources. More critically, comparison among four experiments on the reciprocal interference effects, as measured by the dual-task costs, demonstrates no significant contribution from additional processing other than the shared processes. These results support the notion that attention and VWM share the same processing resources. PMID:22529826
System and method for memory allocation in a multiclass memory system

DOEpatents

Loh, Gabriel; Meswani, Mitesh; Ignatowski, Michael; Nutter, Mark

2016-06-28

A system for memory allocation in a multiclass memory system includes a processor coupleable to a plurality of memories sharing a unified memory address space, and a library store to store a library of software functions. The processor identifies a type of a data structure in response to a memory allocation function call to the library for allocating memory to the data structure. Using the library, the processor allocates portions of the data structure among multiple memories of the multiclass memory system based on the type of the data structure.
A Formal Model of Capacity Limits in Working Memory

ERIC Educational Resources Information Center

Oberauer, Klaus; Kliegl, Reinhold

2006-01-01

A mathematical model of working-memory capacity limits is proposed on the key assumption of mutual interference between items in working memory. Interference is assumed to arise from overwriting of features shared by these items. The model was fit to time-accuracy data of memory-updating tasks from four experiments using nonlinear mixed effect…
Ordering of guarded and unguarded stores for no-sync I/O

DOEpatents

Gara, Alan; Ohmacht, Martin

2013-06-25

A parallel computing system processes at least one store instruction. A first processor core issues a store instruction. A first queue, associated with the first processor core, stores the store instruction. A second queue, associated with a first local cache memory device of the first processor core, stores the store instruction. The first processor core updates first data in the first local cache memory device according to the store instruction. The third queue, associated with at least one shared cache memory device, stores the store instruction. The first processor core invalidates second data, associated with the store instruction, in the at least one shared cache memory. The first processor core invalidates third data, associated with the store instruction, in other local cache memory devices of other processor cores. The first processor core flushing only the first queue.
Explicit time integration of finite element models on a vectorized, concurrent computer with shared memory

NASA Technical Reports Server (NTRS)

Gilbertsen, Noreen D.; Belytschko, Ted

1990-01-01

The implementation of a nonlinear explicit program on a vectorized, concurrent computer with shared memory is described and studied. The conflict between vectorization and concurrency is described and some guidelines are given for optimal block sizes. Several example problems are summarized to illustrate the types of speed-ups which can be achieved by reprogramming as compared to compiler optimization.
LU Factorization with Partial Pivoting for a Multi-CPU, Multi-GPU Shared Memory System

DOE Office of Scientific and Technical Information (OSTI.GOV)

Kurzak, Jakub; Luszczek, Pitior; Faverge, Mathieu

2012-03-01

LU factorization with partial pivoting is a canonical numerical procedure and the main component of the High Performance LINPACK benchmark. This article presents an implementation of the algorithm for a hybrid, shared memory, system with standard CPU cores and GPU accelerators. Performance in excess of one TeraFLOPS is achieved using four AMD Magny Cours CPUs and four NVIDIA Fermi GPUs.
Neural Mechanisms of Interference Control Underlie the Relationship between Fluid Intelligence and Working Memory Span

ERIC Educational Resources Information Center

Burgess, Gregory C.; Gray, Jeremy R.; Conway, Andrew R. A.; Braver, Todd S.

2011-01-01

Fluid intelligence (gF) and working memory (WM) span predict success in demanding cognitive situations. Recent studies show that much of the variance in gF and WM span is shared, suggesting common neural mechanisms. This study provides a direct investigation of the degree to which shared variance in gF and WM span can be explained by neural…
Support of Multidimensional Parallelism in the OpenMP Programming Model

NASA Technical Reports Server (NTRS)

Jin, Hao-Qiang; Jost, Gabriele

2003-01-01

OpenMP is the current standard for shared-memory programming. While providing ease of parallel programming, the OpenMP programming model also has limitations which often effect the scalability of applications. Examples for these limitations are work distribution and point-to-point synchronization among threads. We propose extensions to the OpenMP programming model which allow the user to easily distribute the work in multiple dimensions and synchronize the workflow among the threads. The proposed extensions include four new constructs and the associated runtime library. They do not require changes to the source code and can be implemented based on the existing OpenMP standard. We illustrate the concept in a prototype translator and test with benchmark codes and a cloud modeling code.
Audience-tuning effects on memory: the role of shared reality.

PubMed

Echterhoff, Gerald; Higgins, E Tory; Groll, Stephan

2005-09-01

After tuning to an audience, communicators' own memories for the topic often reflect the biased view expressed in their messages. Three studies examined explanations for this bias. Memories for a target person were biased when feedback signaled the audience's successful identification of the target but not after failed identification (Experiment 1). Whereas communicators tuning to an in-group audience exhibited the bias, communicators tuning to an out-group audience did not (Experiment 2). These differences did not depend on communicators' mood but were mediated by communicators' trust in their audience's judgment about other people (Experiments 2 and 3). Message and memory were more closely associated for high than for low trusters. Apparently, audience-tuning effects depend on the communicators' experience of a shared reality.
Cross-scale efficient tensor contractions for coupled cluster computations through multiple programming model backends

DOE Office of Scientific and Technical Information (OSTI.GOV)

Ibrahim, Khaled Z.; Epifanovsky, Evgeny; Williams, Samuel

Coupled-cluster methods provide highly accurate models of molecular structure through explicit numerical calculation of tensors representing the correlation between electrons. These calculations are dominated by a sequence of tensor contractions, motivating the development of numerical libraries for such operations. While based on matrix–matrix multiplication, these libraries are specialized to exploit symmetries in the molecular structure and in electronic interactions, and thus reduce the size of the tensor representation and the complexity of contractions. The resulting algorithms are irregular and their parallelization has been previously achieved via the use of dynamic scheduling or specialized data decompositions. We introduce our efforts tomore » extend the Libtensor framework to work in the distributed memory environment in a scalable and energy-efficient manner. We achieve up to 240× speedup compared with the optimized shared memory implementation of Libtensor. We attain scalability to hundreds of thousands of compute cores on three distributed-memory architectures (Cray XC30 and XC40, and IBM Blue Gene/Q), and on a heterogeneous GPU-CPU system (Cray XK7). As the bottlenecks shift from being compute-bound DGEMM's to communication-bound collectives as the size of the molecular system scales, we adopt two radically different parallelization approaches for handling load-imbalance, tasking and bulk synchronous models. Nevertheless, we preserve a unified interface to both programming models to maintain the productivity of computational quantum chemists.« less
Cross-scale efficient tensor contractions for coupled cluster computations through multiple programming model backends

DOE PAGES

Ibrahim, Khaled Z.; Epifanovsky, Evgeny; Williams, Samuel; ...

2017-03-08

Coupled-cluster methods provide highly accurate models of molecular structure through explicit numerical calculation of tensors representing the correlation between electrons. These calculations are dominated by a sequence of tensor contractions, motivating the development of numerical libraries for such operations. While based on matrix–matrix multiplication, these libraries are specialized to exploit symmetries in the molecular structure and in electronic interactions, and thus reduce the size of the tensor representation and the complexity of contractions. The resulting algorithms are irregular and their parallelization has been previously achieved via the use of dynamic scheduling or specialized data decompositions. We introduce our efforts tomore » extend the Libtensor framework to work in the distributed memory environment in a scalable and energy-efficient manner. We achieve up to 240× speedup compared with the optimized shared memory implementation of Libtensor. We attain scalability to hundreds of thousands of compute cores on three distributed-memory architectures (Cray XC30 and XC40, and IBM Blue Gene/Q), and on a heterogeneous GPU-CPU system (Cray XK7). As the bottlenecks shift from being compute-bound DGEMM's to communication-bound collectives as the size of the molecular system scales, we adopt two radically different parallelization approaches for handling load-imbalance, tasking and bulk synchronous models. Nevertheless, we preserve a unified interface to both programming models to maintain the productivity of computational quantum chemists.« less

Examining age-related shared variance between face cognition, vision, and self-reported physical health: a test of the common cause hypothesis for social cognition

PubMed Central

Olderbak, Sally; Hildebrandt, Andrea; Wilhelm, Oliver

2015-01-01

The shared decline in cognitive abilities, sensory functions (e.g., vision and hearing), and physical health with increasing age is well documented with some research attributing this shared age-related decline to a single common cause (e.g., aging brain). We evaluate the extent to which the common cause hypothesis predicts associations between vision and physical health with social cognition abilities specifically face perception and face memory. Based on a sample of 443 adults (17–88 years old), we test a series of structural equation models, including Multiple Indicator Multiple Cause (MIMIC) models, and estimate the extent to which vision and self-reported physical health are related to face perception and face memory through a common factor, before and after controlling for their fluid cognitive component and the linear effects of age. Results suggest significant shared variance amongst these constructs, with a common factor explaining some, but not all, of the shared age-related variance. Also, we found that the relations of face perception, but not face memory, with vision and physical health could be completely explained by fluid cognition. Overall, results suggest that a single common cause explains most, but not all age-related shared variance with domain specific aging mechanisms evident. PMID:26321998
Solutions and debugging for data consistency in multiprocessors with noncoherent caches

DOE Office of Scientific and Technical Information (OSTI.GOV)

Bernstein, D.; Mendelson, B.; Breternitz, M. Jr.

1995-02-01

We analyze two important problems that arise in shared-memory multiprocessor systems. The stale data problem involves ensuring that data items in local memory of individual processors are current, independent of writes done by other processors. False sharing occurs when two processors have copies of the same shared data block but update different portions of the block. The false sharing problem involves guaranteeing that subsequent writes are properly combined. In modern architectures these problems are usually solved in hardware, by exploiting mechanisms for hardware controlled cache consistency. This leads to more expensive and nonscalable designs. Therefore, we are concentrating on softwaremore » methods for ensuring cache consistency that would allow for affordable and scalable multiprocessing systems. Unfortunately, providing software control is nontrivial, both for the compiler writer and for the application programmer. For this reason we are developing a debugging environment that will facilitate the development of compiler-based techniques and will help the programmer to tune his or her application using explicit cache management mechanisms. We extend the notion of a race condition for IBM Shared Memory System POWER/4, taking into consideration its noncoherent caches, and propose techniques for detection of false sharing problems. Identification of the stale data problem is discussed as well, and solutions are suggested.« less
Automatic Generation of OpenMP Directives and Its Application to Computational Fluid Dynamics Codes

NASA Technical Reports Server (NTRS)

Yan, Jerry; Jin, Haoqiang; Frumkin, Michael; Yan, Jerry (Technical Monitor)

2000-01-01

The shared-memory programming model is a very effective way to achieve parallelism on shared memory parallel computers. As great progress was made in hardware and software technologies, performance of parallel programs with compiler directives has demonstrated large improvement. The introduction of OpenMP directives, the industrial standard for shared-memory programming, has minimized the issue of portability. In this study, we have extended CAPTools, a computer-aided parallelization toolkit, to automatically generate OpenMP-based parallel programs with nominal user assistance. We outline techniques used in the implementation of the tool and discuss the application of this tool on the NAS Parallel Benchmarks and several computational fluid dynamics codes. This work demonstrates the great potential of using the tool to quickly port parallel programs and also achieve good performance that exceeds some of the commercial tools.
Static Memory Deduplication for Performance Optimization in Cloud Computing.

PubMed

Jia, Gangyong; Han, Guangjie; Wang, Hao; Yang, Xuan

2017-04-27

In a cloud computing environment, the number of virtual machines (VMs) on a single physical server and the number of applications running on each VM are continuously growing. This has led to an enormous increase in the demand of memory capacity and subsequent increase in the energy consumption in the cloud. Lack of enough memory has become a major bottleneck for scalability and performance of virtualization interfaces in cloud computing. To address this problem, memory deduplication techniques which reduce memory demand through page sharing are being adopted. However, such techniques suffer from overheads in terms of number of online comparisons required for the memory deduplication. In this paper, we propose a static memory deduplication (SMD) technique which can reduce memory capacity requirement and provide performance optimization in cloud computing. The main innovation of SMD is that the process of page detection is performed offline, thus potentially reducing the performance cost, especially in terms of response time. In SMD, page comparisons are restricted to the code segment, which has the highest shared content. Our experimental results show that SMD efficiently reduces memory capacity requirement and improves performance. We demonstrate that, compared to other approaches, the cost in terms of the response time is negligible.
Static Memory Deduplication for Performance Optimization in Cloud Computing

PubMed Central

Jia, Gangyong; Han, Guangjie; Wang, Hao; Yang, Xuan

2017-01-01

In a cloud computing environment, the number of virtual machines (VMs) on a single physical server and the number of applications running on each VM are continuously growing. This has led to an enormous increase in the demand of memory capacity and subsequent increase in the energy consumption in the cloud. Lack of enough memory has become a major bottleneck for scalability and performance of virtualization interfaces in cloud computing. To address this problem, memory deduplication techniques which reduce memory demand through page sharing are being adopted. However, such techniques suffer from overheads in terms of number of online comparisons required for the memory deduplication. In this paper, we propose a static memory deduplication (SMD) technique which can reduce memory capacity requirement and provide performance optimization in cloud computing. The main innovation of SMD is that the process of page detection is performed offline, thus potentially reducing the performance cost, especially in terms of response time. In SMD, page comparisons are restricted to the code segment, which has the highest shared content. Our experimental results show that SMD efficiently reduces memory capacity requirement and improves performance. We demonstrate that, compared to other approaches, the cost in terms of the response time is negligible. PMID:28448434
Scheduling for Locality in Shared-Memory Multiprocessors

DTIC Science & Technology

1993-05-01

Submitted in Partial Fulfillment of the Requirements for the Degree ’)iIC Q(JALfryT INSPECTED 5 DOCTOR OF PHILOSOPHY I Accesion For Supervised by NTIS CRAM... architecture on parallel program performance, explain the implications of this trend on popular parallel programming models, and propose system software to 0...decomoosition and scheduling algorithms. I. SUIUECT TERMS IS. NUMBER OF PAGES shared-memory multiprocessors; architecture trends; loop 110 scheduling
Advanced Development of Certified OS Kernels

DTIC Science & Technology

2015-06-01

It provides an infrastructure to map a physical page into multiple processes’ page maps in different address spaces. Their ownership mechanism ensures...of their shared memory infrastructure . Trap module The trap module specifies the behaviors of exception handlers and mCertiKOS system calls. In...layers), 1 pm for the shared memory infrastructure (3 layers), 3.5 pm for the thread management (10 layers), 1 pm for the process management (4 layers
6 DOF Nonlinear AUV Simulation Toolbox

DTIC Science & Technology

1997-01-01

is to supply a flexible 3D -simulation platform for motion visualization, in-lab debugging and testing of mission-specific strategies as well as those...Explorer are modular designed [Smith] in order to cut time and cost for vehicle recontlguration. A flexible 3D -simulation platform is desired to... 3D models. Current implemented modules include a nonlinear dynamic model for the OEX, shared memory and semaphore manager tools, shared memory monitor
A cache-aided multiprocessor rollback recovery scheme

NASA Technical Reports Server (NTRS)

Wu, Kun-Lung; Fuchs, W. Kent

1989-01-01

This paper demonstrates how previous uniprocessor cache-aided recovery schemes can be applied to multiprocessor architectures, for recovering from transient processor failures, utilizing private caches and a global shared memory. As with cache-aided uniprocessor recovery, the multiprocessor cache-aided recovery scheme of this paper can be easily integrated into standard bus-based snoopy cache coherence protocols. A consistent shared memory state is maintained without the necessity of global check-pointing.
Targeted Memory Reactivation during Sleep Adaptively Promotes the Strengthening or Weakening of Overlapping Memories.

PubMed

Oyarzún, Javiera P; Morís, Joaquín; Luque, David; de Diego-Balaguer, Ruth; Fuentemilla, Lluís

2017-08-09

System memory consolidation is conceptualized as an active process whereby newly encoded memory representations are strengthened through selective memory reactivation during sleep. However, our learning experience is highly overlapping in content (i.e., shares common elements), and memories of these events are organized in an intricate network of overlapping associated events. It remains to be explored whether and how selective memory reactivation during sleep has an impact on these overlapping memories acquired during awake time. Here, we test in a group of adult women and men the prediction that selective memory reactivation during sleep entails the reactivation of associated events and that this may lead the brain to adaptively regulate whether these associated memories are strengthened or pruned from memory networks on the basis of their relative associative strength with the shared element. Our findings demonstrate the existence of efficient regulatory neural mechanisms governing how complex memory networks are shaped during sleep as a function of their associative memory strength. SIGNIFICANCE STATEMENT Numerous studies have demonstrated that system memory consolidation is an active, selective, and sleep-dependent process in which only subsets of new memories become stabilized through their reactivation. However, the learning experience is highly overlapping in content and thus events are encoded in an intricate network of related memories. It remains to be explored whether and how memory reactivation has an impact on overlapping memories acquired during awake time. Here, we show that sleep memory reactivation promotes strengthening and weakening of overlapping memories based on their associative memory strength. These results suggest the existence of an efficient regulatory neural mechanism that avoids the formation of cluttered memory representation of multiple events and promotes stabilization of complex memory networks. Copyright © 2017 the authors 0270-6474/17/377748-11$15.00/0.
The N400 reveals how personal semantics is processed: Insights into the nature and organization of self-knowledge.

PubMed

Coronel, Jason C; Federmeier, Kara D

2016-04-01

There is growing recognition that some important forms of long-term memory are difficult to classify into one of the well-studied memory subtypes. One example is personal semantics. Like the episodes that are stored as part of one's autobiography, personal semantics is linked to an individual, yet, like general semantic memory, it is detached from a specific encoding context. Access to general semantics elicits an electrophysiological response known as the N400, which has been characterized across three decades of research; surprisingly, this response has not been fully examined in the context of personal semantics. In this study, we assessed responses to congruent and incongruent statements about people's own, personal preferences. We found that access to personal preferences elicited N400 responses, with congruency effects that were similar in latency and distribution to those for general semantic statements elicited from the same participants. These results suggest that the processing of personal and general semantics share important functional and neurobiological features. Copyright © 2016 Elsevier Ltd. All rights reserved.
Autobiographical memory functions of nostalgia in comparison to rumination and counterfactual thinking: similarity and uniqueness.

PubMed

Cheung, Wing-Yee; Wildschut, Tim; Sedikides, Constantine

2018-02-01

We compared and contrasted nostalgia with rumination and counterfactual thinking in terms of their autobiographical memory functions. Specifically, we assessed individual differences in nostalgia, rumination, and counterfactual thinking, which we then linked to self-reported functions or uses of autobiographical memory (Self-Regard, Boredom Reduction, Death Preparation, Intimacy Maintenance, Conversation, Teach/Inform, and Bitterness Revival). We tested which memory functions are shared and which are uniquely linked to nostalgia. The commonality among nostalgia, rumination, and counterfactual thinking resides in their shared positive associations with all memory functions: individuals who evinced a stronger propensity towards past-oriented thought (as manifested in nostalgia, rumination, and counterfactual thinking) reported greater overall recruitment of memories in the service of present functioning. The uniqueness of nostalgia resides in its comparatively strong positive associations with Intimacy Maintenance, Teach/Inform, and Self-Regard and weak association with Bitterness Revival. In all, nostalgia possesses a more positive functional signature than do rumination and counterfactual thinking.
Mnemonic convergence in social networks: The emergent properties of cognition at a collective level.

PubMed

Coman, Alin; Momennejad, Ida; Drach, Rae D; Geana, Andra

2016-07-19

The development of shared memories, beliefs, and norms is a fundamental characteristic of human communities. These emergent outcomes are thought to occur owing to a dynamic system of information sharing and memory updating, which fundamentally depends on communication. Here we report results on the formation of collective memories in laboratory-created communities. We manipulated conversational network structure in a series of real-time, computer-mediated interactions in fourteen 10-member communities. The results show that mnemonic convergence, measured as the degree of overlap among community members' memories, is influenced by both individual-level information-processing phenomena and by the conversational social network structure created during conversational recall. By studying laboratory-created social networks, we show how large-scale social phenomena (i.e., collective memory) can emerge out of microlevel local dynamics (i.e., mnemonic reinforcement and suppression effects). The social-interactionist approach proposed herein points to optimal strategies for spreading information in social networks and provides a framework for measuring and forging collective memories in communities of individuals.
Using memories to understand others: the role of episodic memory in theory of mind impairment in Alzheimer disease.

PubMed

Moreau, Noémie; Viallet, François; Champagne-Lavau, Maud

2013-09-01

Theory of mind (TOM) refers to the ability to infer one's own and other's mental states. Growing evidence highlighted the presence of impairment on the most complex TOM tasks in Alzheimer disease (AD). However, how TOM deficit is related to other cognitive dysfunctions and more specifically to episodic memory impairment - the prominent feature of this disease - is still under debate. Recent neuroanatomical findings have shown that remembering past events and inferring others' states of mind share the same cerebral network suggesting the two abilities share a common process .This paper proposes to review emergent evidence of TOM impairment in AD patients and to discuss the evidence of a relationship between TOM and episodic memory. We will discuss about AD patients' deficit in TOM being possibly related to their difficulties in recollecting memories of past social interactions. Copyright © 2013 Elsevier B.V. All rights reserved.
Mental time travel and the shaping of the human mind

PubMed Central

Suddendorf, Thomas; Addis, Donna Rose; Corballis, Michael C.

2009-01-01

Episodic memory, enabling conscious recollection of past episodes, can be distinguished from semantic memory, which stores enduring facts about the world. Episodic memory shares a core neural network with the simulation of future episodes, enabling mental time travel into both the past and the future. The notion that there might be something distinctly human about mental time travel has provoked ingenious attempts to demonstrate episodic memory or future simulation in non-human animals, but we argue that they have not yet established a capacity comparable to the human faculty. The evolution of the capacity to simulate possible future events, based on episodic memory, enhanced fitness by enabling action in preparation of different possible scenarios that increased present or future survival and reproduction chances. Human language may have evolved in the first instance for the sharing of past and planned future events, and, indeed, fictional ones, further enhancing fitness in social settings. PMID:19528013
DOE Office of Scientific and Technical Information (OSTI.GOV)

Sayan Ghosh, Jeff Hammond

OpenSHMEM is a community effort to unifyt and standardize the SHMEM programming model. MPI (Message Passing Interface) is a well-known community standard for parallel programming using distributed memory. The most recen t release of MPI, version 3.0, was designed in part to support programming models like SHMEM.OSHMPI is an implementation of the OpenSHMEM standard using MPI-3 for the Linux operating system. It is the first implementation of SHMEM over MPI one-sided communication and has the potential to be widely adopted due to the portability and widely availability of Linux and MPI-3. OSHMPI has been tested on a variety of systemsmore » and implementations of MPI-3, includingInfiniBand clusters using MVAPICH2 and SGI shared-memory supercomputers using MPICH. Current support is limited to Linux but may be extended to Apple OSX if there is sufficient interest. The code is opensource via https://github.com/jeffhammond/oshmpi« less
Exploring Manycore Multinode Systems for Irregular Applications with FPGA Prototyping

DOE Office of Scientific and Technical Information (OSTI.GOV)

Ceriani, Marco; Palermo, Gianluca; Secchi, Simone

We present a prototype of a multi-core architecture implemented on FPGA, designed to enable efficient execution of irregular applications on distributed shared memory machines, while maintaining high performance on regular workloads. The architecture is composed of off-the-shelf soft-core cores, local interconnection and memory interface, integrated with custom components that optimize it for irregular applications. It relies on three key elements: a global address space, multithreading, and fine-grained synchronization. Global addresses are scrambled to reduce the formation of network hot-spots, while the latency of the transactions is covered by integrating an hardware scheduler within the custom load/store buffers to take advantagemore » from the availability of multiple executions threads, increasing the efficiency in a transparent way to the application. We evaluated a dual node system irregular kernels showing scalability in the number of cores and threads.« less
A sample implementation for parallelizing Divide-and-Conquer algorithms on the GPU.

PubMed

Mei, Gang; Zhang, Jiayin; Xu, Nengxiong; Zhao, Kunyang

2018-01-01

The strategy of Divide-and-Conquer (D&C) is one of the frequently used programming patterns to design efficient algorithms in computer science, which has been parallelized on shared memory systems and distributed memory systems. Tzeng and Owens specifically developed a generic paradigm for parallelizing D&C algorithms on modern Graphics Processing Units (GPUs). In this paper, by following the generic paradigm proposed by Tzeng and Owens, we provide a new and publicly available GPU implementation of the famous D&C algorithm, QuickHull, to give a sample and guide for parallelizing D&C algorithms on the GPU. The experimental results demonstrate the practicality of our sample GPU implementation. Our research objective in this paper is to present a sample GPU implementation of a classical D&C algorithm to help interested readers to develop their own efficient GPU implementations with fewer efforts.
A Framework for Parallel Unstructured Grid Generation for Complex Aerodynamic Simulations

NASA Technical Reports Server (NTRS)

Zagaris, George; Pirzadeh, Shahyar Z.; Chrisochoides, Nikos

2009-01-01

A framework for parallel unstructured grid generation targeting both shared memory multi-processors and distributed memory architectures is presented. The two fundamental building-blocks of the framework consist of: (1) the Advancing-Partition (AP) method used for domain decomposition and (2) the Advancing Front (AF) method used for mesh generation. Starting from the surface mesh of the computational domain, the AP method is applied recursively to generate a set of sub-domains. Next, the sub-domains are meshed in parallel using the AF method. The recursive nature of domain decomposition naturally maps to a divide-and-conquer algorithm which exhibits inherent parallelism. For the parallel implementation, the Master/Worker pattern is employed to dynamically balance the varying workloads of each task on the set of available CPUs. Performance results by this approach are presented and discussed in detail as well as future work and improvements.
A general purpose subroutine for fast fourier transform on a distributed memory parallel machine

NASA Technical Reports Server (NTRS)

Dubey, A.; Zubair, M.; Grosch, C. E.

1992-01-01

One issue which is central in developing a general purpose Fast Fourier Transform (FFT) subroutine on a distributed memory parallel machine is the data distribution. It is possible that different users would like to use the FFT routine with different data distributions. Thus, there is a need to design FFT schemes on distributed memory parallel machines which can support a variety of data distributions. An FFT implementation on a distributed memory parallel machine which works for a number of data distributions commonly encountered in scientific applications is presented. The problem of rearranging the data after computing the FFT is also addressed. The performance of the implementation on a distributed memory parallel machine Intel iPSC/860 is evaluated.

Intergenerational transmission of historical memories and social-distance attitudes in post-war second-generation Croatians.

PubMed

Svob, Connie; Brown, Norman R; Takšić, Vladimir; Katulić, Katarina; Žauhar, Valnea

2016-08-01

Intergenerational transmission of memory is a process by which biographical knowledge contributes to the construction of collective memory (representation of a shared past). We investigated the intergenerational transmission of war-related memories and social-distance attitudes in second-generation post-war Croatians. We compared 2 groups of young adults from (1) Eastern Croatia (extensively affected by the war) and (2) Western Croatia (affected relatively less by the war). Participants were asked to (a) recall the 10 most important events that occurred in one of their parents' lives, (b) estimate the calendar years of each, and (c) provide scale ratings on them. Additionally, (d) all participants completed a modified Bogardus Social Distance scale, as well as an (e) War Events Checklist for their parents' lives. There were several findings. First, approximately two-thirds of Eastern Croatians and one-half of Western Croatians reported war-related events from their parents' lives. Second, war-related memories impacted the second-generation's identity to a greater extent than did non-war-related memories; this effect was significantly greater in Eastern Croatians than in Western Croatians. Third, war-related events displayed markedly different mnemonic characteristics than non-war-related events. Fourth, the temporal distribution of events surrounding the war produced an upheaval bump, suggesting major transitions (e.g., war) contribute to the way collective memory is formed. And, finally, outright social ostracism and aggression toward out-groups were rarely expressed, independent of region. Nonetheless, social-distance scores were notably higher in Eastern Croatia than in Western Croatia.
A shared neural ensemble links distinct contextual memories encoded close in time

NASA Astrophysics Data System (ADS)

Cai, Denise J.; Aharoni, Daniel; Shuman, Tristan; Shobe, Justin; Biane, Jeremy; Song, Weilin; Wei, Brandon; Veshkini, Michael; La-Vu, Mimi; Lou, Jerry; Flores, Sergio E.; Kim, Isaac; Sano, Yoshitake; Zhou, Miou; Baumgaertel, Karsten; Lavi, Ayal; Kamata, Masakazu; Tuszynski, Mark; Mayford, Mark; Golshani, Peyman; Silva, Alcino J.

2016-06-01

Recent studies suggest that a shared neural ensemble may link distinct memories encoded close in time. According to the memory allocation hypothesis, learning triggers a temporary increase in neuronal excitability that biases the representation of a subsequent memory to the neuronal ensemble encoding the first memory, such that recall of one memory increases the likelihood of recalling the other memory. Here we show in mice that the overlap between the hippocampal CA1 ensembles activated by two distinct contexts acquired within a day is higher than when they are separated by a week. Several findings indicate that this overlap of neuronal ensembles links two contextual memories. First, fear paired with one context is transferred to a neutral context when the two contexts are acquired within a day but not across a week. Second, the first memory strengthens the second memory within a day but not across a week. Older mice, known to have lower CA1 excitability, do not show the overlap between ensembles, the transfer of fear between contexts, or the strengthening of the second memory. Finally, in aged mice, increasing cellular excitability and activating a common ensemble of CA1 neurons during two distinct context exposures rescued the deficit in linking memories. Taken together, these findings demonstrate that contextual memories encoded close in time are linked by directing storage into overlapping ensembles. Alteration of these processes by ageing could affect the temporal structure of memories, thus impairing efficient recall of related information.
Factor structure of overall autobiographical memory usage: the directive, self and social functions revisited.

PubMed

Rasmussen, Anne S; Habermas, Tilmann

2011-08-01

According to theory, autobiographical memory serves three broad functions of overall usage: directive, self, and social. However, there is evidence to suggest that the tripartite model may be better conceptualised in terms of a four-factor model with two social functions. In the present study we examined the two models in Danish and German samples, using the Thinking About Life Experiences Questionnaire (TALE; Bluck, Alea, Habermas, & Rubin, 2005), which measures the overall usage of the three functions generalised across concrete memories. Confirmatory factor analysis supported the four-factor model and rejected the theoretical three-factor model in both samples. The results are discussed in relation to cultural differences in overall autobiographical memory usage as well as sharing versus non-sharing aspects of social remembering.
The application of a sparse, distributed memory to the detection, identification and manipulation of physical objects

NASA Technical Reports Server (NTRS)

Kanerva, P.

1986-01-01

To determine the relation of the sparse, distributed memory to other architectures, a broad review of the literature was made. The memory is called a pattern memory because they work with large patterns of features (high-dimensional vectors). A pattern is stored in a pattern memory by distributing it over a large number of storage elements and by superimposing it over other stored patterns. A pattern is retrieved by mathematical or statistical reconstruction from the distributed elements. Three pattern memories are discussed.
Distributed computing for membrane-based modeling of action potential propagation.

PubMed

Porras, D; Rogers, J M; Smith, W M; Pollard, A E

2000-08-01

Action potential propagation simulations with physiologic membrane currents and macroscopic tissue dimensions are computationally expensive. We, therefore, analyzed distributed computing schemes to reduce execution time in workstation clusters by parallelizing solutions with message passing. Four schemes were considered in two-dimensional monodomain simulations with the Beeler-Reuter membrane equations. Parallel speedups measured with each scheme were compared to theoretical speedups, recognizing the relationship between speedup and code portions that executed serially. A data decomposition scheme based on total ionic current provided the best performance. Analysis of communication latencies in that scheme led to a load-balancing algorithm in which measured speedups at 89 +/- 2% and 75 +/- 8% of theoretical speedups were achieved in homogeneous and heterogeneous clusters of workstations. Speedups in this scheme with the Luo-Rudy dynamic membrane equations exceeded 3.0 with eight distributed workstations. Cluster speedups were comparable to those measured during parallel execution on a shared memory machine.
Experimental evaluation of multiprocessor cache-based error recovery

NASA Technical Reports Server (NTRS)

Janssens, Bob; Fuchs, W. K.

1991-01-01

Several variations of cache-based checkpointing for rollback error recovery in shared-memory multiprocessors have been recently developed. By modifying the cache replacement policy, these techniques use the inherent redundancy in the memory hierarchy to periodically checkpoint the computation state. Three schemes, different in the manner in which they avoid rollback propagation, are evaluated. By simulation with address traces from parallel applications running on an Encore Multimax shared-memory multiprocessor, the performance effect of integrating the recovery schemes in the cache coherence protocol are evaluated. The results indicate that the cache-based schemes can provide checkpointing capability with low performance overhead but uncontrollable high variability in the checkpoint interval.
The FORCE - A highly portable parallel programming language

NASA Technical Reports Server (NTRS)

Jordan, Harry F.; Benten, Muhammad S.; Alaghband, Gita; Jakob, Ruediger

1989-01-01

This paper explains why the FORCE parallel programming language is easily portable among six different shared-memory multiprocessors, and how a two-level macro preprocessor makes it possible to hide low-level machine dependencies and to build machine-independent high-level constructs on top of them. These FORCE constructs make it possible to write portable parallel programs largely independent of the number of processes and the specific shared-memory multiprocessor executing them.
The FORCE: A highly portable parallel programming language

NASA Technical Reports Server (NTRS)

Jordan, Harry F.; Benten, Muhammad S.; Alaghband, Gita; Jakob, Ruediger

1989-01-01

Here, it is explained why the FORCE parallel programming language is easily portable among six different shared-memory microprocessors, and how a two-level macro preprocessor makes it possible to hide low level machine dependencies and to build machine-independent high level constructs on top of them. These FORCE constructs make it possible to write portable parallel programs largely independent of the number of processes and the specific shared memory multiprocessor executing them.
Hybrid MPI+OpenMP Programming of an Overset CFD Solver and Performance Investigations

NASA Technical Reports Server (NTRS)

Djomehri, M. Jahed; Jin, Haoqiang H.; Biegel, Bryan (Technical Monitor)

2002-01-01

This report describes a two level parallelization of a Computational Fluid Dynamic (CFD) solver with multi-zone overset structured grids. The approach is based on a hybrid MPI+OpenMP programming model suitable for shared memory and clusters of shared memory machines. The performance investigations of the hybrid application on an SGI Origin2000 (O2K) machine is reported using medium and large scale test problems.
Performing a local reduction operation on a parallel computer

DOEpatents

Blocksome, Michael A; Faraj, Daniel A

2013-06-04

A parallel computer including compute nodes, each including two reduction processing cores, a network write processing core, and a network read processing core, each processing core assigned an input buffer. Copying, in interleaved chunks by the reduction processing cores, contents of the reduction processing cores' input buffers to an interleaved buffer in shared memory; copying, by one of the reduction processing cores, contents of the network write processing core's input buffer to shared memory; copying, by another of the reduction processing cores, contents of the network read processing core's input buffer to shared memory; and locally reducing in parallel by the reduction processing cores: the contents of the reduction processing core's input buffer; every other interleaved chunk of the interleaved buffer; the copied contents of the network write processing core's input buffer; and the copied contents of the network read processing core's input buffer.
Performing a local reduction operation on a parallel computer

DOEpatents

Blocksome, Michael A.; Faraj, Daniel A.

2012-12-11

A parallel computer including compute nodes, each including two reduction processing cores, a network write processing core, and a network read processing core, each processing core assigned an input buffer. Copying, in interleaved chunks by the reduction processing cores, contents of the reduction processing cores' input buffers to an interleaved buffer in shared memory; copying, by one of the reduction processing cores, contents of the network write processing core's input buffer to shared memory; copying, by another of the reduction processing cores, contents of the network read processing core's input buffer to shared memory; and locally reducing in parallel by the reduction processing cores: the contents of the reduction processing core's input buffer; every other interleaved chunk of the interleaved buffer; the copied contents of the network write processing core's input buffer; and the copied contents of the network read processing core's input buffer.
Improving security of the ping-pong protocol

NASA Astrophysics Data System (ADS)

Zawadzki, Piotr

2013-01-01

A security layer for the asymptotically secure ping-pong protocol is proposed and analyzed in the paper. The operation of the improvement exploits inevitable errors introduced by the eavesdropping in the control and message modes. Its role is similar to the privacy amplification algorithms known from the quantum key distribution schemes. Messages are processed in blocks which guarantees that an eavesdropper is faced with a computationally infeasible problem as long as the system parameters are within reasonable limits. The introduced additional information preprocessing does not require quantum memory registers and confidential communication is possible without prior key agreement or some shared secret.
Memory loss

MedlinePlus

... this page: //medlineplus.gov/ency/article/003257.htm Memory loss To use the sharing features on this ... Bethesda, MD 20894 U.S. Department of Health and Human Services National Institutes of Health Page last updated: ...
We Remember, We Forget: Collaborative Remembering in Older Couples

ERIC Educational Resources Information Center

Harris, Celia B.; Keil, Paul G.; Sutton, John; Barnier, Amanda J.; McIlwain, Doris J. F.

2011-01-01

Transactive memory theory describes the processes by which benefits for memory can occur when remembering is shared in dyads or groups. In contrast, cognitive psychology experiments demonstrate that social influences on memory disrupt and inhibit individual recall. However, most research in cognitive psychology has focused on groups of strangers…
76 FR 12821 - 150th Anniversary of the Inauguration of Abraham Lincoln

Federal Register 2010, 2011, 2012, 2013, 2014

2011-03-09

... together by shared memories and common hopes. As we observe the 150th anniversary of his Inauguration, we... his memory enabled America to move beyond a young collection of States to become a free and unified... memory and uphold the principles he so nobly advanced. [[Page 12822
Expert Systems on Multiprocessor Architectures. Volume 2. Technical Reports

DTIC Science & Technology

1991-06-01

Report RC 12936 (#58037). IBM T. J. Wartson Reiearch Center. July 1987. � Alan Jay Smith. Cache memories. Coniputing Sitrry., 1.1(3): I.3-5:30...basic-shared is an instrument for ashared memory design. The components panels are processor- qload-scrolling-bar-panel, memory-qload-scrolling-bar-panel
Blanket Gate Would Address Blocks Of Memory

NASA Technical Reports Server (NTRS)

Lambe, John; Moopenn, Alexander; Thakoor, Anilkumar P.

1988-01-01

Circuit-chip area used more efficiently. Proposed gate structure selectively allows and restricts access to blocks of memory in electronic neural-type network. By breaking memory into independent blocks, gate greatly simplifies problem of reading from and writing to memory. Since blocks not used simultaneously, share operational amplifiers that prompt and read information stored in memory cells. Fewer operational amplifiers needed, and chip area occupied reduced correspondingly. Cost per bit drops as result.
The potential of multi-port optical memories in digital computing

NASA Technical Reports Server (NTRS)

Alford, C. O.; Gaylord, T. K.

1975-01-01

A high-capacity memory with a relatively high data transfer rate and multi-port simultaneous access capability may serve as the basis for new computer architectures. The implementation of a multi-port optical memory is discussed. Several computer structures are presented that might profitably use such a memory. These structures include (1) a simultaneous record access system, (2) a simultaneously shared memory computer system, and (3) a parallel digital processing structure.
The declarative/procedural model of lexicon and grammar.

PubMed

Ullman, M T

2001-01-01

Our use of language depends upon two capacities: a mental lexicon of memorized words and a mental grammar of rules that underlie the sequential and hierarchical composition of lexical forms into predictably structured larger words, phrases, and sentences. The declarative/procedural model posits that the lexicon/grammar distinction in language is tied to the distinction between two well-studied brain memory systems. On this view, the memorization and use of at least simple words (those with noncompositional, that is, arbitrary form-meaning pairings) depends upon an associative memory of distributed representations that is subserved by temporal-lobe circuits previously implicated in the learning and use of fact and event knowledge. This "declarative memory" system appears to be specialized for learning arbitrarily related information (i.e., for associative binding). In contrast, the acquisition and use of grammatical rules that underlie symbol manipulation is subserved by frontal/basal-ganglia circuits previously implicated in the implicit (nonconscious) learning and expression of motor and cognitive "skills" and "habits" (e.g., from simple motor acts to skilled game playing). This "procedural" system may be specialized for computing sequences. This novel view of lexicon and grammar offers an alternative to the two main competing theoretical frameworks. It shares the perspective of traditional dual-mechanism theories in positing that the mental lexicon and a symbol-manipulating mental grammar are subserved by distinct computational components that may be linked to distinct brain structures. However, it diverges from these theories where they assume components dedicated to each of the two language capacities (that is, domain-specific) and in their common assumption that lexical memory is a rote list of items. Conversely, while it shares with single-mechanism theories the perspective that the two capacities are subserved by domain-independent computational mechanisms, it diverges from them where they link both capacities to a single associative memory system with broad anatomic distribution. The declarative/procedural model, but neither traditional dual- nor single-mechanism models, predicts double dissociations between lexicon and grammar, with associations among associative memory properties, memorized words and facts, and temporal-lobe structures, and among symbol-manipulation properties, grammatical rule products, motor skills, and frontal/basal-ganglia structures. In order to contrast lexicon and grammar while holding other factors constant, we have focused our investigations of the declarative/procedural model on morphologically complex word forms. Morphological transformations that are (largely) unproductive (e.g., in go-went, solemn-solemnity) are hypothesized to depend upon declarative memory. These have been contrasted with morphological transformations that are fully productive (e.g., in walk-walked, happy-happiness), whose computation is posited to be solely dependent upon grammatical rules subserved by the procedural system. Here evidence is presented from studies that use a range of psycholinguistic and neurolinguistic approaches with children and adults. It is argued that converging evidence from these studies supports the declarative/procedural model of lexicon and grammar.
Sparse distributed memory overview

NASA Technical Reports Server (NTRS)

Raugh, Mike

1990-01-01

The Sparse Distributed Memory (SDM) project is investigating the theory and applications of massively parallel computing architecture, called sparse distributed memory, that will support the storage and retrieval of sensory and motor patterns characteristic of autonomous systems. The immediate objectives of the project are centered in studies of the memory itself and in the use of the memory to solve problems in speech, vision, and robotics. Investigation of methods for encoding sensory data is an important part of the research. Examples of NASA missions that may benefit from this work are Space Station, planetary rovers, and solar exploration. Sparse distributed memory offers promising technology for systems that must learn through experience and be capable of adapting to new circumstances, and for operating any large complex system requiring automatic monitoring and control. Sparse distributed memory is a massively parallel architecture motivated by efforts to understand how the human brain works. Sparse distributed memory is an associative memory, able to retrieve information from cues that only partially match patterns stored in the memory. It is able to store long temporal sequences derived from the behavior of a complex system, such as progressive records of the system's sensory data and correlated records of the system's motor controls.

Methodology for fast detection of false sharing in threaded scientific codes

DOEpatents

Chung, I-Hsin; Cong, Guojing; Murata, Hiroki; Negishi, Yasushi; Wen, Hui-Fang

2014-11-25

A profiling tool identifies a code region with a false sharing potential. A static analysis tool classifies variables and arrays in the identified code region. A mapping detection library correlates memory access instructions in the identified code region with variables and arrays in the identified code region while a processor is running the identified code region. The mapping detection library identifies one or more instructions at risk, in the identified code region, which are subject to an analysis by a false sharing detection library. A false sharing detection library performs a run-time analysis of the one or more instructions at risk while the processor is re-running the identified code region. The false sharing detection library determines, based on the performed run-time analysis, whether two different portions of the cache memory line are accessed by the generated binary code.
Remote quantum entanglement between two micromechanical oscillators.

PubMed

Riedinger, Ralf; Wallucks, Andreas; Marinković, Igor; Löschnauer, Clemens; Aspelmeyer, Markus; Hong, Sungkun; Gröblacher, Simon

2018-04-01

Entanglement, an essential feature of quantum theory that allows for inseparable quantum correlations to be shared between distant parties, is a crucial resource for quantum networks 1 . Of particular importance is the ability to distribute entanglement between remote objects that can also serve as quantum memories. This has been previously realized using systems such as warm 2,3 and cold atomic vapours 4,5 , individual atoms 6 and ions 7,8 , and defects in solid-state systems 9-11 . Practical communication applications require a combination of several advantageous features, such as a particular operating wavelength, high bandwidth and long memory lifetimes. Here we introduce a purely micromachined solid-state platform in the form of chip-based optomechanical resonators made of nanostructured silicon beams. We create and demonstrate entanglement between two micromechanical oscillators across two chips that are separated by 20 centimetres . The entangled quantum state is distributed by an optical field at a designed wavelength near 1,550 nanometres. Therefore, our system can be directly incorporated in a realistic fibre-optic quantum network operating in the conventional optical telecommunication band. Our results are an important step towards the development of large-area quantum networks based on silicon photonics.
Partitioning problems in parallel, pipelined and distributed computing

NASA Technical Reports Server (NTRS)

Bokhari, S.

1985-01-01

The problem of optimally assigning the modules of a parallel program over the processors of a multiple computer system is addressed. A Sum-Bottleneck path algorithm is developed that permits the efficient solution of many variants of this problem under some constraints on the structure of the partitions. In particular, the following problems are solved optimally for a single-host, multiple satellite system: partitioning multiple chain structured parallel programs, multiple arbitrarily structured serial programs and single tree structured parallel programs. In addition, the problems of partitioning chain structured parallel programs across chain connected systems and across shared memory (or shared bus) systems are also solved under certain constraints. All solutions for parallel programs are equally applicable to pipelined programs. These results extend prior research in this area by explicitly taking concurrency into account and permit the efficient utilization of multiple computer architectures for a wide range of problems of practical interest.
States of mind: Emotions, body feelings, and thoughts share distributed neural networks

PubMed Central

Oosterwijk, Suzanne; Lindquist, Kristen A.; Anderson, Eric; Dautoff, Rebecca; Moriguchi, Yoshiya; Barrett, Lisa Feldman

2012-01-01

Scientists have traditionally assumed that different kinds of mental states (e.g., fear, disgust, love, memory, planning, concentration, etc.) correspond to different psychological faculties that have domain-specific correlates in the brain. Yet, growing evidence points to the constructionist hypothesis that mental states emerge from the combination of domain-general psychological processes that map to large-scale distributed brain networks. In this paper, we report a novel study testing a constructionist model of the mind in which participants generated three kinds of mental states (emotions, body feelings, or thoughts) while we measured activity within large-scale distributed brain networks using fMRI. We examined the similarity and differences in the pattern of network activity across these three classes of mental states. Consistent with a constructionist hypothesis, a combination of large-scale distributed networks contributed to emotions, thoughts, and body feelings, although these mental states differed in the relative contribution of those networks. Implications for a constructionist functional architecture of diverse mental states are discussed. PMID:22677148
Scalable Triadic Analysis of Large-Scale Graphs: Multi-Core vs. Multi-Processor vs. Multi-Threaded Shared Memory Architectures

DOE Office of Scientific and Technical Information (OSTI.GOV)

Chin, George; Marquez, Andres; Choudhury, Sutanay

2012-09-01

Triadic analysis encompasses a useful set of graph mining methods that is centered on the concept of a triad, which is a subgraph of three nodes and the configuration of directed edges across the nodes. Such methods are often applied in the social sciences as well as many other diverse fields. Triadic methods commonly operate on a triad census that counts the number of triads of every possible edge configuration in a graph. Like other graph algorithms, triadic census algorithms do not scale well when graphs reach tens of millions to billions of nodes. To enable the triadic analysis ofmore » large-scale graphs, we developed and optimized a triad census algorithm to efficiently execute on shared memory architectures. We will retrace the development and evolution of a parallel triad census algorithm. Over the course of several versions, we continually adapted the code’s data structures and program logic to expose more opportunities to exploit parallelism on shared memory that would translate into improved computational performance. We will recall the critical steps and modifications that occurred during code development and optimization. Furthermore, we will compare the performances of triad census algorithm versions on three specific systems: Cray XMT, HP Superdome, and AMD multi-core NUMA machine. These three systems have shared memory architectures but with markedly different hardware capabilities to manage parallelism.« less
An Adaptive Insertion and Promotion Policy for Partitioned Shared Caches

NASA Astrophysics Data System (ADS)

Mahrom, Norfadila; Liebelt, Michael; Raof, Rafikha Aliana A.; Daud, Shuhaizar; Hafizah Ghazali, Nur

2018-03-01

Cache replacement policies in chip multiprocessors (CMP) have been investigated extensively and proven able to enhance shared cache management. However, competition among multiple processors executing different threads that require simultaneous access to a shared memory may cause cache contention and memory coherence problems on the chip. These issues also exist due to some drawbacks of the commonly used Least Recently Used (LRU) policy employed in multiprocessor systems, which are because of the cache lines residing in the cache longer than required. In image processing analysis of for example extra pulmonary tuberculosis (TB), an accurate diagnosis for tissue specimen is required. Therefore, a fast and reliable shared memory management system to execute algorithms for processing vast amount of specimen image is needed. In this paper, the effects of the cache replacement policy in a partitioned shared cache are investigated. The goal is to quantify whether better performance can be achieved by using less complex replacement strategies. This paper proposes a Middle Insertion 2 Positions Promotion (MI2PP) policy to eliminate cache misses that could adversely affect the access patterns and the throughput of the processors in the system. The policy employs a static predefined insertion point, near distance promotion, and the concept of ownership in the eviction policy to effectively improve cache thrashing and to avoid resource stealing among the processors.
Evaluating the performance of the particle finite element method in parallel architectures

NASA Astrophysics Data System (ADS)

Gimenez, Juan M.; Nigro, Norberto M.; Idelsohn, Sergio R.

2014-05-01

This paper presents a high performance implementation for the particle-mesh based method called particle finite element method two (PFEM-2). It consists of a material derivative based formulation of the equations with a hybrid spatial discretization which uses an Eulerian mesh and Lagrangian particles. The main aim of PFEM-2 is to solve transport equations as fast as possible keeping some level of accuracy. The method was found to be competitive with classical Eulerian alternatives for these targets, even in their range of optimal application. To evaluate the goodness of the method with large simulations, it is imperative to use of parallel environments. Parallel strategies for Finite Element Method have been widely studied and many libraries can be used to solve Eulerian stages of PFEM-2. However, Lagrangian stages, such as streamline integration, must be developed considering the parallel strategy selected. The main drawback of PFEM-2 is the large amount of memory needed, which limits its application to large problems with only one computer. Therefore, a distributed-memory implementation is urgently needed. Unlike a shared-memory approach, using domain decomposition the memory is automatically isolated, thus avoiding race conditions; however new issues appear due to data distribution over the processes. Thus, a domain decomposition strategy for both particle and mesh is adopted, which minimizes the communication between processes. Finally, performance analysis running over multicore and multinode architectures are presented. The Courant-Friedrichs-Lewy number used influences the efficiency of the parallelization and, in some cases, a weighted partitioning can be used to improve the speed-up. However the total cputime for cases presented is lower than that obtained when using classical Eulerian strategies.
A Multi-Level Parallelization Concept for High-Fidelity Multi-Block Solvers

NASA Technical Reports Server (NTRS)

Hatay, Ferhat F.; Jespersen, Dennis C.; Guruswamy, Guru P.; Rizk, Yehia M.; Byun, Chansup; Gee, Ken; VanDalsem, William R. (Technical Monitor)

1997-01-01

The integration of high-fidelity Computational Fluid Dynamics (CFD) analysis tools with the industrial design process benefits greatly from the robust implementations that are transportable across a wide range of computer architectures. In the present work, a hybrid domain-decomposition and parallelization concept was developed and implemented into the widely-used NASA multi-block Computational Fluid Dynamics (CFD) packages implemented in ENSAERO and OVERFLOW. The new parallel solver concept, PENS (Parallel Euler Navier-Stokes Solver), employs both fine and coarse granularity in data partitioning as well as data coalescing to obtain the desired load-balance characteristics on the available computer platforms. This multi-level parallelism implementation itself introduces no changes to the numerical results, hence the original fidelity of the packages are identically preserved. The present implementation uses the Message Passing Interface (MPI) library for interprocessor message passing and memory accessing. By choosing an appropriate combination of the available partitioning and coalescing capabilities only during the execution stage, the PENS solver becomes adaptable to different computer architectures from shared-memory to distributed-memory platforms with varying degrees of parallelism. The PENS implementation on the IBM SP2 distributed memory environment at the NASA Ames Research Center obtains 85 percent scalable parallel performance using fine-grain partitioning of single-block CFD domains using up to 128 wide computational nodes. Multi-block CFD simulations of complete aircraft simulations achieve 75 percent perfect load-balanced executions using data coalescing and the two levels of parallelism. SGI PowerChallenge, SGI Origin 2000, and a cluster of workstations are the other platforms where the robustness of the implementation is tested. The performance behavior on the other computer platforms with a variety of realistic problems will be included as this on-going study progresses.
Hierarchical Traces for Reduced NSM Memory Requirements

NASA Astrophysics Data System (ADS)

Dahl, Torbjørn S.

This paper presents work on using hierarchical long term memory to reduce the memory requirements of nearest sequence memory (NSM) learning, a previously published, instance-based reinforcement learning algorithm. A hierarchical memory representation reduces the memory requirements by allowing traces to share common sub-sequences. We present moderated mechanisms for estimating discounted future rewards and for dealing with hidden state using hierarchical memory. We also present an experimental analysis of how the sub-sequence length affects the memory compression achieved and show that the reduced memory requirements do not effect the speed of learning. Finally, we analyse and discuss the persistence of the sub-sequences independent of specific trace instances.
The Contribution of Working Memory to Fluid Reasoning: Capacity, Control, or Both?

ERIC Educational Resources Information Center

Chuderski, Adam; Necka, Edward

2012-01-01

Fluid reasoning shares a large part of its variance with working memory capacity (WMC). The literature on working memory (WM) suggests that the capacity of the focus of attention responsible for simultaneous maintenance and integration of information within WM, as well as the effectiveness of executive control exerted over WM, determines…
Feature-Based Memory-Driven Attentional Capture: Visual Working Memory Content Affects Visual Attention

ERIC Educational Resources Information Center

Olivers, Christian N. L.; Meijer, Frank; Theeuwes, Jan

2006-01-01

In 7 experiments, the authors explored whether visual attention (the ability to select relevant visual information) and visual working memory (the ability to retain relevant visual information) share the same content representations. The presence of singleton distractors interfered more strongly with a visual search task when it was accompanied by…
Time-Related Decay or Interference-Based Forgetting in Working Memory?

ERIC Educational Resources Information Center

Portrat, Sophie; Barrouillet, Pierre; Camos, Valerie

2008-01-01

The time-based resource-sharing model of working memory assumes that memory traces suffer from a time-related decay when attention is occupied by concurrent activities. Using complex continuous span tasks in which temporal parameters are carefully controlled, P. Barrouillet, S. Bernardin, S. Portrat, E. Vergauwe, & V. Camos (2007) recently…
Developmental Change in Working Memory Strategies: From Passive Maintenance to Active Refreshing

ERIC Educational Resources Information Center

Camos, Valerie; Barrouillet, Pierre

2011-01-01

Change in strategies is often mentioned as a source of memory development. However, though performance in working memory tasks steadily improves during childhood, theories differ in linking this development to strategy changes. Whereas some theories, such as the time-based resource-sharing model, invoke the age-related increase in use and…
Cache write generate for parallel image processing on shared memory architectures.

PubMed

Wittenbrink, C M; Somani, A K; Chen, C H

1996-01-01

We investigate cache write generate, our cache mode invention. We demonstrate that for parallel image processing applications, the new mode improves main memory bandwidth, CPU efficiency, cache hits, and cache latency. We use register level simulations validated by the UW-Proteus system. Many memory, cache, and processor configurations are evaluated.
Mnemonic convergence in social networks: The emergent properties of cognition at a collective level

PubMed Central

Coman, Alin; Momennejad, Ida; Drach, Rae D.; Geana, Andra

2016-01-01

The development of shared memories, beliefs, and norms is a fundamental characteristic of human communities. These emergent outcomes are thought to occur owing to a dynamic system of information sharing and memory updating, which fundamentally depends on communication. Here we report results on the formation of collective memories in laboratory-created communities. We manipulated conversational network structure in a series of real-time, computer-mediated interactions in fourteen 10-member communities. The results show that mnemonic convergence, measured as the degree of overlap among community members’ memories, is influenced by both individual-level information-processing phenomena and by the conversational social network structure created during conversational recall. By studying laboratory-created social networks, we show how large-scale social phenomena (i.e., collective memory) can emerge out of microlevel local dynamics (i.e., mnemonic reinforcement and suppression effects). The social-interactionist approach proposed herein points to optimal strategies for spreading information in social networks and provides a framework for measuring and forging collective memories in communities of individuals. PMID:27357678
Importance of balanced architectures in the design of high-performance imaging systems

NASA Astrophysics Data System (ADS)

Sgro, Joseph A.; Stanton, Paul C.

1999-03-01

Imaging systems employed in demanding military and industrial applications, such as automatic target recognition and computer vision, typically require real-time high-performance computing resources. While high- performances computing systems have traditionally relied on proprietary architectures and custom components, recent advances in high performance general-purpose microprocessor technology have produced an abundance of low cost components suitable for use in high-performance computing systems. A common pitfall in the design of high performance imaging system, particularly systems employing scalable multiprocessor architectures, is the failure to balance computational and memory bandwidth. The performance of standard cluster designs, for example, in which several processors share a common memory bus, is typically constrained by memory bandwidth. The symptom characteristic of this problem is failure to the performance of the system to scale as more processors are added. The problem becomes exacerbated if I/O and memory functions share the same bus. The recent introduction of microprocessors with large internal caches and high performance external memory interfaces makes it practical to design high performance imaging system with balanced computational and memory bandwidth. Real word examples of such designs will be presented, along with a discussion of adapting algorithm design to best utilize available memory bandwidth.
Associative-memory representations emerge as shared spatial patterns of theta activity spanning the primate temporal cortex

PubMed Central

Nakahara, Kiyoshi; Adachi, Ken; Kawasaki, Keisuke; Matsuo, Takeshi; Sawahata, Hirohito; Majima, Kei; Takeda, Masaki; Sugiyama, Sayaka; Nakata, Ryota; Iijima, Atsuhiko; Tanigawa, Hisashi; Suzuki, Takafumi; Kamitani, Yukiyasu; Hasegawa, Isao

2016-01-01

Highly localized neuronal spikes in primate temporal cortex can encode associative memory; however, whether memory formation involves area-wide reorganization of ensemble activity, which often accompanies rhythmicity, or just local microcircuit-level plasticity, remains elusive. Using high-density electrocorticography, we capture local-field potentials spanning the monkey temporal lobes, and show that the visual pair-association (PA) memory is encoded in spatial patterns of theta activity in areas TE, 36, and, partially, in the parahippocampal cortex, but not in the entorhinal cortex. The theta patterns elicited by learned paired associates are distinct between pairs, but similar within pairs. This pattern similarity, emerging through novel PA learning, allows a machine-learning decoder trained on theta patterns elicited by a particular visual item to correctly predict the identity of those elicited by its paired associate. Our results suggest that the formation and sharing of widespread cortical theta patterns via learning-induced reorganization are involved in the mechanisms of associative memory representation. PMID:27282247
The costs of changing an intended action: movement planning, but not execution, interferes with verbal working memory.

PubMed

Spiegel, M A; Koester, D; Weigelt, M; Schack, T

2012-02-16

How much cognitive effort does it take to change a movement plan? In previous studies, it has been shown that humans plan and represent actions in advance, but it remains unclear whether or not action planning and verbal working memory share cognitive resources. Using a novel experimental paradigm, we combined in two experiments a grasp-to-place task with a verbal working memory task. Participants planned a placing movement toward one of two target positions and subsequently encoded and maintained visually presented letters. Both experiments revealed that re-planning the intended action reduced letter recall performance; execution time, however, was not influenced by action modifications. The results of Experiment 2 suggest that the action's interference with verbal working memory arose during the planning rather than the execution phase of the movement. Together, our results strongly suggest that movement planning and verbal working memory share common cognitive resources. Copyright © 2011 Elsevier Ireland Ltd. All rights reserved.
ABINIT: Plane-Wave-Based Density-Functional Theory on High Performance Computers

NASA Astrophysics Data System (ADS)

Torrent, Marc

2014-03-01

For several years, a continuous effort has been produced to adapt electronic structure codes based on Density-Functional Theory to the future computing architectures. Among these codes, ABINIT is based on a plane-wave description of the wave functions which allows to treat systems of any kind. Porting such a code on petascale architectures pose difficulties related to the many-body nature of the DFT equations. To improve the performances of ABINIT - especially for what concerns standard LDA/GGA ground-state and response-function calculations - several strategies have been followed: A full multi-level parallelisation MPI scheme has been implemented, exploiting all possible levels and distributing both computation and memory. It allows to increase the number of distributed processes and could not be achieved without a strong restructuring of the code. The core algorithm used to solve the eigen problem (``Locally Optimal Blocked Congugate Gradient''), a Blocked-Davidson-like algorithm, is based on a distribution of processes combining plane-waves and bands. In addition to the distributed memory parallelization, a full hybrid scheme has been implemented, using standard shared-memory directives (openMP/openACC) or porting some comsuming code sections to Graphics Processing Units (GPU). As no simple performance model exists, the complexity of use has been increased; the code efficiency strongly depends on the distribution of processes among the numerous levels. ABINIT is able to predict the performances of several process distributions and automatically choose the most favourable one. On the other hand, a big effort has been carried out to analyse the performances of the code on petascale architectures, showing which sections of codes have to be improved; they all are related to Matrix Algebra (diagonalisation, orthogonalisation). The different strategies employed to improve the code scalability will be described. They are based on an exploration of new diagonalization algorithm, as well as the use of external optimized librairies. Part of this work has been supported by the european Prace project (PaRtnership for Advanced Computing in Europe) in the framework of its workpackage 8.
OSCAR API for Real-Time Low-Power Multicores and Its Performance on Multicores and SMP Servers

NASA Astrophysics Data System (ADS)

Kimura, Keiji; Mase, Masayoshi; Mikami, Hiroki; Miyamoto, Takamichi; Shirako, Jun; Kasahara, Hironori

OSCAR (Optimally Scheduled Advanced Multiprocessor) API has been designed for real-time embedded low-power multicores to generate parallel programs for various multicores from different vendors by using the OSCAR parallelizing compiler. The OSCAR API has been developed by Waseda University in collaboration with Fujitsu Laboratory, Hitachi, NEC, Panasonic, Renesas Technology, and Toshiba in an METI/NEDO project entitled "Multicore Technology for Realtime Consumer Electronics." By using the OSCAR API as an interface between the OSCAR compiler and backend compilers, the OSCAR compiler enables hierarchical multigrain parallel processing with memory optimization under capacity restriction for cache memory, local memory, distributed shared memory, and on-chip/off-chip shared memory; data transfer using a DMA controller; and power reduction control using DVFS (Dynamic Voltage and Frequency Scaling), clock gating, and power gating for various embedded multicores. In addition, a parallelized program automatically generated by the OSCAR compiler with OSCAR API can be compiled by the ordinary OpenMP compilers since the OSCAR API is designed on a subset of the OpenMP. This paper describes the OSCAR API and its compatibility with the OSCAR compiler by showing code examples. Performance evaluations of the OSCAR compiler and the OSCAR API are carried out using an IBM Power5+ workstation, an IBM Power6 high-end SMP server, and a newly developed consumer electronics multicore chip RP2 by Renesas, Hitachi and Waseda. From the results of scalability evaluation, it is found that on an average, the OSCAR compiler with the OSCAR API can exploit 5.8 times speedup over the sequential execution on the Power5+ workstation with eight cores and 2.9 times speedup on RP2 with four cores, respectively. In addition, the OSCAR compiler can accelerate an IBM XL Fortran compiler up to 3.3 times on the Power6 SMP server. Due to low-power optimization on RP2, the OSCAR compiler with the OSCAR API achieves a maximum power reduction of 84% in the real-time execution mode.

Synapsin Determines Memory Strength after Punishment- and Relief-Learning

PubMed Central

Niewalda, Thomas; Michels, Birgit; Jungnickel, Roswitha; Diegelmann, Sören; Kleber, Jörg; Kähne, Thilo

2015-01-01

Adverse life events can induce two kinds of memory with opposite valence, dependent on timing: “negative” memories for stimuli preceding them and “positive” memories for stimuli experienced at the moment of “relief.” Such punishment memory and relief memory are found in insects, rats, and man. For example, fruit flies (Drosophila melanogaster) avoid an odor after odor-shock training (“forward conditioning” of the odor), whereas after shock-odor training (“backward conditioning” of the odor) they approach it. Do these timing-dependent associative processes share molecular determinants? We focus on the role of Synapsin, a conserved presynaptic phosphoprotein regulating the balance between the reserve pool and the readily releasable pool of synaptic vesicles. We find that a lack of Synapsin leaves task-relevant sensory and motor faculties unaffected. In contrast, both punishment memory and relief memory scores are reduced. These defects reflect a true lessening of associative memory strength, as distortions in nonassociative processing (e.g., susceptibility to handling, adaptation, habituation, sensitization), discrimination ability, and changes in the time course of coincidence detection can be ruled out as alternative explanations. Reductions in punishment- and relief-memory strength are also observed upon an RNAi-mediated knock-down of Synapsin, and are rescued both by acutely restoring Synapsin and by locally restoring it in the mushroom bodies of mutant flies. Thus, both punishment memory and relief memory require the Synapsin protein and in this sense share genetic and molecular determinants. We note that corresponding molecular commonalities between punishment memory and relief memory in humans would constrain pharmacological attempts to selectively interfere with excessive associative punishment memories, e.g., after traumatic experiences. PMID:25972175
Synapsin determines memory strength after punishment- and relief-learning.

PubMed

Niewalda, Thomas; Michels, Birgit; Jungnickel, Roswitha; Diegelmann, Sören; Kleber, Jörg; Kähne, Thilo; Gerber, Bertram

2015-05-13

Adverse life events can induce two kinds of memory with opposite valence, dependent on timing: "negative" memories for stimuli preceding them and "positive" memories for stimuli experienced at the moment of "relief." Such punishment memory and relief memory are found in insects, rats, and man. For example, fruit flies (Drosophila melanogaster) avoid an odor after odor-shock training ("forward conditioning" of the odor), whereas after shock-odor training ("backward conditioning" of the odor) they approach it. Do these timing-dependent associative processes share molecular determinants? We focus on the role of Synapsin, a conserved presynaptic phosphoprotein regulating the balance between the reserve pool and the readily releasable pool of synaptic vesicles. We find that a lack of Synapsin leaves task-relevant sensory and motor faculties unaffected. In contrast, both punishment memory and relief memory scores are reduced. These defects reflect a true lessening of associative memory strength, as distortions in nonassociative processing (e.g., susceptibility to handling, adaptation, habituation, sensitization), discrimination ability, and changes in the time course of coincidence detection can be ruled out as alternative explanations. Reductions in punishment- and relief-memory strength are also observed upon an RNAi-mediated knock-down of Synapsin, and are rescued both by acutely restoring Synapsin and by locally restoring it in the mushroom bodies of mutant flies. Thus, both punishment memory and relief memory require the Synapsin protein and in this sense share genetic and molecular determinants. We note that corresponding molecular commonalities between punishment memory and relief memory in humans would constrain pharmacological attempts to selectively interfere with excessive associative punishment memories, e.g., after traumatic experiences. Copyright © 2015 Niewalda et al.
Audience tuning effects in the context of situated and embodied processes.

PubMed

Semin, Gün R

2018-03-05

This review provides an overview of the research on communication and the 'Saying is Believing' paradigm in the context of different perspectives on communication. The process of 'audience tuning' is shaped by a variety of situated factors in contexts that affect the communicators' confidence in their message. The overwhelming common denominator is that the combination of features that create ambiguity yields the optimal condition for the formation of shared realities. I conclude with an argument that the implied invariance of memory processes in shared reality work needs to be more attentive to the regulatory function of memories driving the expression of shared realities. Copyright © 2018 Elsevier Ltd. All rights reserved.
A 300MHz Embedded Flash Memory with Pipeline Architecture and Offset-Free Sense Amplifiers for Dual-Core Automotive Microcontrollers

NASA Astrophysics Data System (ADS)

Kajiyama, Shinya; Fujito, Masamichi; Kasai, Hideo; Mizuno, Makoto; Yamaguchi, Takanori; Shinagawa, Yutaka

A novel 300MHz embedded flash memory for dual-core microcontrollers with a shared ROM architecture is proposed. One of its features is a three-stage pipeline read operation, which enables reduced access pitch and therefore reduces performance penalty due to conflict of shared ROM accesses. Another feature is a highly sensitive sense amplifier that achieves efficient pipeline operation with two-cycle latency one-cycle pitch as a result of a shortened sense time of 0.63ns. The combination of the pipeline architecture and proposed sense amplifiers significantly reduces access-conflict penalties with shared ROM and enhances performance of 32-bit RISC dual-core microcontrollers by 30%.
A general model for memory interference in a multiprocessor system with memory hierarchy

NASA Technical Reports Server (NTRS)

Taha, Badie A.; Standley, Hilda M.

1989-01-01

The problem of memory interference in a multiprocessor system with a hierarchy of shared buses and memories is addressed. The behavior of the processors is represented by a sequence of memory requests with each followed by a determined amount of processing time. A statistical queuing network model for determining the extent of memory interference in multiprocessor systems with clusters of memory hierarchies is presented. The performance of the system is measured by the expected number of busy memory clusters. The results of the analytic model are compared with simulation results, and the correlation between them is found to be very high.
Domain-general involvement of the posterior frontolateral cortex in time-based resource-sharing in working memory: An fMRI study.

PubMed

Vergauwe, Evie; Hartstra, Egbert; Barrouillet, Pierre; Brass, Marcel

2015-07-15

Working memory is often defined in cognitive psychology as a system devoted to the simultaneous processing and maintenance of information. In line with the time-based resource-sharing model of working memory (TBRS; Barrouillet and Camos, 2015; Barrouillet et al., 2004), there is accumulating evidence that, when memory items have to be maintained while performing a concurrent activity, memory performance depends on the cognitive load of this activity, independently of the domain involved. The present study used fMRI to identify regions in the brain that are sensitive to variations in cognitive load in a domain-general way. More precisely, we aimed at identifying brain areas that activate during maintenance of memory items as a direct function of the cognitive load induced by both verbal and spatial concurrent tasks. Results show that the right IFJ and bilateral SPL/IPS are the only areas showing an increased involvement as cognitive load increases and do so in a domain general manner. When correlating the fMRI signal with the approximated cognitive load as defined by the TBRS model, it was shown that the main focus of the cognitive load-related activation is located in the right IFJ. The present findings indicate that the IFJ makes domain-general contributions to time-based resource-sharing in working memory and allowed us to generate the novel hypothesis by which the IFJ might be the neural basis for the process of rapid switching. We argue that the IFJ might be a crucial part of a central attentional bottleneck in the brain because of its inability to upload more than one task rule at once. Copyright © 2015 Elsevier Inc. All rights reserved.
Information and processes underlying semantic and episodic memory across tasks, items, and individuals.

PubMed

Cox, Gregory E; Hemmer, Pernille; Aue, William R; Criss, Amy H

2018-04-01

The development of memory theory has been constrained by a focus on isolated tasks rather than the processes and information that are common to situations in which memory is engaged. We present results from a study in which 453 participants took part in five different memory tasks: single-item recognition, associative recognition, cued recall, free recall, and lexical decision. Using hierarchical Bayesian techniques, we jointly analyzed the correlations between tasks within individuals-reflecting the degree to which tasks rely on shared cognitive processes-and within items-reflecting the degree to which tasks rely on the same information conveyed by the item. Among other things, we find that (a) the processes involved in lexical access and episodic memory are largely separate and rely on different kinds of information, (b) access to lexical memory is driven primarily by perceptual aspects of a word, (c) all episodic memory tasks rely to an extent on a set of shared processes which make use of semantic features to encode both single words and associations between words, and (d) recall involves additional processes likely related to contextual cuing and response production. These results provide a large-scale picture of memory across different tasks which can serve to drive the development of comprehensive theories of memory. (PsycINFO Database Record (c) 2018 APA, all rights reserved).
What Drives Memory-Driven Attentional Capture? The Effects of Memory Type, Display Type, and Search Type

ERIC Educational Resources Information Center

Olivers, Christian N. L.

2009-01-01

An important question is whether visual attention (the ability to select relevant visual information) and visual working memory (the ability to retain relevant visual information) share the same content representations. Some past research has indicated that they do: Singleton distractors interfered more strongly with a visual search task when they…
Discrete Resource Allocation in Visual Working Memory

ERIC Educational Resources Information Center

Barton, Brian; Ester, Edward F.; Awh, Edward

2009-01-01

Are resources in visual working memory allocated in a continuous or a discrete fashion? On one hand, flexible resource models suggest that capacity is determined by a central resource pool that can be flexibly divided such that items of greater complexity receive a larger share of resources. On the other hand, if capacity in working memory is…
Natural Conversations as a Source of False Memories in Children: Implications for the Testimony of Young Witnesses

PubMed Central

Principe, Gabrielle F.; Schindewolf, Erica

2012-01-01

Research on factors that can affect the accuracy of children’s autobiographical remembering has important implications for understanding the abilities of young witnesses to provide legal testimony. In this article, we review our own recent research on one factor that has much potential to induce errors in children’s event recall, namely natural memory sharing conversations with peers and parents. Our studies provide compelling evidence that not only can the content of conversations about the past intrude into later memory but that such exchanges can prompt the generation of entirely false narratives that are more detailed than true accounts of experienced events. Further, our work show that deeper and more creative participation in memory sharing dialogues can boost the damaging effects of conversationally conveyed misinformation. Implications of this collection of findings for children’s testimony are discussed. PMID:23129880
On nonlinear finite element analysis in single-, multi- and parallel-processors

NASA Technical Reports Server (NTRS)

Utku, S.; Melosh, R.; Islam, M.; Salama, M.

1982-01-01

Numerical solution of nonlinear equilibrium problems of structures by means of Newton-Raphson type iterations is reviewed. Each step of the iteration is shown to correspond to the solution of a linear problem, therefore the feasibility of the finite element method for nonlinear analysis is established. Organization and flow of data for various types of digital computers, such as single-processor/single-level memory, single-processor/two-level-memory, vector-processor/two-level-memory, and parallel-processors, with and without sub-structuring (i.e. partitioning) are given. The effect of the relative costs of computation, memory and data transfer on substructuring is shown. The idea of assigning comparable size substructures to parallel processors is exploited. Under Cholesky type factorization schemes, the efficiency of parallel processing is shown to decrease due to the occasional shared data, just as that due to the shared facilities.
Fencing direct memory access data transfers in a parallel active messaging interface of a parallel computer

DOEpatents

Blocksome, Michael A.; Mamidala, Amith R.

2013-09-03

Fencing direct memory access (`DMA`) data transfers in a parallel active messaging interface (`PAMI`) of a parallel computer, the PAMI including data communications endpoints, each endpoint including specifications of a client, a context, and a task, the endpoints coupled for data communications through the PAMI and through DMA controllers operatively coupled to segments of shared random access memory through which the DMA controllers deliver data communications deterministically, including initiating execution through the PAMI of an ordered sequence of active DMA instructions for DMA data transfers between two endpoints, effecting deterministic DMA data transfers through a DMA controller and a segment of shared memory; and executing through the PAMI, with no FENCE accounting for DMA data transfers, an active FENCE instruction, the FENCE instruction completing execution only after completion of all DMA instructions initiated prior to execution of the FENCE instruction for DMA data transfers between the two endpoints.
Fencing direct memory access data transfers in a parallel active messaging interface of a parallel computer

DOEpatents

Blocksome, Michael A; Mamidala, Amith R

2014-02-11

Fencing direct memory access (`DMA`) data transfers in a parallel active messaging interface (`PAMI`) of a parallel computer, the PAMI including data communications endpoints, each endpoint including specifications of a client, a context, and a task, the endpoints coupled for data communications through the PAMI and through DMA controllers operatively coupled to segments of shared random access memory through which the DMA controllers deliver data communications deterministically, including initiating execution through the PAMI of an ordered sequence of active DMA instructions for DMA data transfers between two endpoints, effecting deterministic DMA data transfers through a DMA controller and a segment of shared memory; and executing through the PAMI, with no FENCE accounting for DMA data transfers, an active FENCE instruction, the FENCE instruction completing execution only after completion of all DMA instructions initiated prior to execution of the FENCE instruction for DMA data transfers between the two endpoints.
Immunological memory is associative

DOE Office of Scientific and Technical Information (OSTI.GOV)

Smith, D.J.; Forrest, S.; Perelson, A.S.

1996-12-31

The purpose of this paper is to show that immunological memory is an associative and robust memory that belongs to the class of sparse distributed memories. This class of memories derives its associative and robust nature by sparsely sampling the input space and distributing the data among many independent agents. Other members of this class include a model of the cerebellar cortex and Sparse Distributed Memory (SDM). First we present a simplified account of the immune response and immunological memory. Next we present SDM, and then we show the correlations between immunological memory and SDM. Finally, we show how associativemore » recall in the immune response can be both beneficial and detrimental to the fitness of an individual.« less
Protection of Mission-Critical Applications from Untrusted Execution Environment: Resource Efficient Replication and Migration of Virtual Machines

DTIC Science & Technology

2015-09-28

the performance of log-and- replay can degrade significantly for VMs configured with multiple virtual CPUs, since the shared memory communication...whether based on checkpoint replication or log-and- replay , existing HA ap- proaches use in- memory backups. The backup VM sits in the memory of a...efficiently. 15. SUBJECT TERMS High-availability virtual machines, live migration, memory and traffic overheads, application suspension, Java
The Spectral Element Method for Geophysical Flows

NASA Astrophysics Data System (ADS)

Taylor, Mark

1998-11-01

We will describe SEAM, a Spectral Element Atmospheric Model. SEAM solves the 3D primitive equations used in climate modeling and medium range forecasting. SEAM uses a spectral element discretization for the surface of the globe and finite differences in the vertical direction. The model is spectrally accurate, as demonstrated by a variety of test cases. It is well suited for modern distributed-shared memory computers, sustaining over 24 GFLOPS on a 240 processor HP Exemplar. This performance has allowed us to run several interesting simulations in full spherical geometry at high resolution (over 22 million grid points).
Authentication and Key Establishment in Dynamic Wireless Sensor Networks

PubMed Central

Qiu, Ying; Zhou, Jianying; Baek, Joonsang; Lopez, Javier

2010-01-01

When a sensor node roams within a very large and distributed wireless sensor network, which consists of numerous sensor nodes, its routing path and neighborhood keep changing. In order to provide a high level of security in this environment, the moving sensor node needs to be authenticated to new neighboring nodes and a key established for secure communication. The paper proposes an efficient and scalable protocol to establish and update the authentication key in a dynamic wireless sensor network environment. The protocol guarantees that two sensor nodes share at least one key with probability 1 (100%) with less memory and energy cost, while not causing considerable communication overhead. PMID:22319321
Implementing Shared Memory Parallelism in MCBEND

NASA Astrophysics Data System (ADS)

Bird, Adam; Long, David; Dobson, Geoff

2017-09-01

MCBEND is a general purpose radiation transport Monte Carlo code from AMEC Foster Wheelers's ANSWERS® Software Service. MCBEND is well established in the UK shielding community for radiation shielding and dosimetry assessments. The existing MCBEND parallel capability effectively involves running the same calculation on many processors. This works very well except when the memory requirements of a model restrict the number of instances of a calculation that will fit on a machine. To more effectively utilise parallel hardware OpenMP has been used to implement shared memory parallelism in MCBEND. This paper describes the reasoning behind the choice of OpenMP, notes some of the challenges of multi-threading an established code such as MCBEND and assesses the performance of the parallel method implemented in MCBEND.
Preliminary evidence that abscisic acid improves spatial memory in rats.

PubMed

Qi, Cong-Cong; Ge, Jin-Fang; Zhou, Jiang-Ning

2015-02-01

Abscisic acid (ABA) is a crucial phytohormone that exists in a wide range of animals, including humans, and has multiple bioactivities. As direct derivatives of carotenoids, ABA and retinoic acid (RA) share similar molecular structures, and RA has been reported to improve spatial memory in rodents. To explore the potential effects of ABA on spatial learning and memory in rodents, 20mg/kg ABA was administered to young rats for 6weeks, and its effects on behaviour performance were evaluated through a series of behavioural tests. ABA pharmacokinetic analysis revealed that the exogenous ABA was distributed widely in the rat brain, characterised by rapid absorption and slow elimination. The behavioural tests showed that ABA increased both the duration spent in the target quadrant and the frequency it was entered in the probe test of the Morris water maze (MWM) and decreased the latency to locate the target quadrant. Moreover, ABA decreased the latency to enter the novel arm in the Y-maze test, accompanied by increases in the total entries and distance travelled in the three arms. However, there were no significant differences between the ABA-treated and control rats in the open field test and elevated plus-maze test. These results preliminarily indicate that ABA improves spatial memory in MWM and exploratory activity in Y-maze in young rats. Copyright © 2014 Elsevier Inc. All rights reserved.
Effects of Aging on True and False Memory Formation: An fMRI Study

ERIC Educational Resources Information Center

Dennis, Nancy A.; Kim, Hongkeun; Cabeza, Roberto

2007-01-01

Compared to young, older adults are more likely to forget events that occurred in the past as well as remember events that never happened. Previous studies examining false memories and aging have shown that these memories are more likely to occur when new items share perceptual or semantic similarities with those presented during encoding. It is…

Ad Hoc Categories and False Memories: Memory Illusions for Categories Created On-The-Spot

ERIC Educational Resources Information Center

Soro, Jerônimo C.; Ferreira, Mário B.; Semin, Gün R.; Mata, André; Carneiro, Paula

2017-01-01

Three experiments were designed to test whether experimentally created ad hoc associative networks evoke false memories. We used the DRM (Deese, Roediger, McDermott) paradigm with lists of ad hoc categories composed of exemplars aggregated toward specific goals (e.g., going for a picnic) that do not share any consistent set of features. Experiment…
Transactive memory in organizational groups: the effects of content, consensus, specialization, and accuracy on group performance.

PubMed

Austin, John R

2003-10-01

Previous research on transactive memory has found a positive relationship between transactive memory system development and group performance in single project laboratory and ad hoc groups. Closely related research on shared mental models and expertise recognition supports these findings. In this study, the author examined the relationship between transactive memory systems and performance in mature, continuing groups. A group's transactive memory system, measured as a combination of knowledge stock, knowledge specialization, transactive memory consensus, and transactive memory accuracy, is positively related to group goal performance, external group evaluations, and internal group evaluations. The positive relationship with group performance was found to hold for both task and external relationship transactive memory systems.
Social Transmission of False Memory in Small Groups and Large Networks.

PubMed

Maswood, Raeya; Rajaram, Suparna

2018-05-21

Sharing information and memories is a key feature of social interactions, making social contexts important for developing and transmitting accurate memories and also false memories. False memory transmission can have wide-ranging effects, including shaping personal memories of individuals as well as collective memories of a network of people. This paper reviews a collection of key findings and explanations in cognitive research on the transmission of false memories in small groups. It also reviews the emerging experimental work on larger networks and collective false memories. Given the reconstructive nature of memory, the abundance of misinformation in everyday life, and the variety of social structures in which people interact, an understanding of transmission of false memories has both scientific and societal implications. © 2018 Cognitive Science Society, Inc.
Ultrahigh-order Maxwell solver with extreme scalability for electromagnetic PIC simulations of plasmas

NASA Astrophysics Data System (ADS)

Vincenti, Henri; Vay, Jean-Luc

2018-07-01

The advent of massively parallel supercomputers, with their distributed-memory technology using many processing units, has favored the development of highly-scalable local low-order solvers at the expense of harder-to-scale global very high-order spectral methods. Indeed, FFT-based methods, which were very popular on shared memory computers, have been largely replaced by finite-difference (FD) methods for the solution of many problems, including plasmas simulations with electromagnetic Particle-In-Cell methods. For some problems, such as the modeling of so-called "plasma mirrors" for the generation of high-energy particles and ultra-short radiations, we have shown that the inaccuracies of standard FD-based PIC methods prevent the modeling on present supercomputers at sufficient accuracy. We demonstrate here that a new method, based on the use of local FFTs, enables ultrahigh-order accuracy with unprecedented scalability, and thus for the first time the accurate modeling of plasma mirrors in 3D.
Distributed representations in memory: Insights from functional brain imaging

PubMed Central

Rissman, Jesse; Wagner, Anthony D.

2015-01-01

Forging new memories for facts and events, holding critical details in mind on a moment-to-moment basis, and retrieving knowledge in the service of current goals all depend on a complex interplay between neural ensembles throughout the brain. Over the past decade, researchers have increasingly leveraged powerful analytical tools (e.g., multi-voxel pattern analysis) to decode the information represented within distributed fMRI activity patterns. In this review, we discuss how these methods can sensitively index neural representations of perceptual and semantic content, and how leverage on the engagement of distributed representations provides unique insights into distinct aspects of memory-guided behavior. We emphasize that, in addition to characterizing the contents of memories, analyses of distributed patterns shed light on the processes that influence how information is encoded, maintained, or retrieved, and thus inform memory theory. We conclude by highlighting open questions about memory that can be addressed through distributed pattern analyses. PMID:21943171
MULTI: a shared memory approach to cooperative molecular modeling.

PubMed

Darden, T; Johnson, P; Smith, H

1991-03-01

A general purpose molecular modeling system, MULTI, based on the UNIX shared memory and semaphore facilities for interprocess communication is described. In addition to the normal querying or monitoring of geometric data, MULTI also provides processes for manipulating conformations, and for displaying peptide or nucleic acid ribbons, Connolly surfaces, close nonbonded contacts, crystal-symmetry related images, least-squares superpositions, and so forth. This paper outlines the basic techniques used in MULTI to ensure cooperation among these specialized processes, and then describes how they can work together to provide a flexible modeling environment.
SMT-Aware Instantaneous Footprint Optimization

DOE Office of Scientific and Technical Information (OSTI.GOV)

Roy, Probir; Liu, Xu; Song, Shuaiwen

Modern architectures employ simultaneous multithreading (SMT) to increase thread-level parallelism. SMT threads share many functional units and the whole memory hierarchy of a physical core. Without a careful code design, SMT threads can easily contend with each other for these shared resources, causing severe performance degradation. Minimizing SMT thread contention for HPC applications running on dedicated platforms is very challenging, because they usually spawn threads within Single Program Multiple Data (SPMD) models. To address this important issue, we introduce a simple scheme for SMT-aware code optimization, which aims to reduce the memory contention across SMT threads.
Performance Analysis of Multilevel Parallel Applications on Shared Memory Architectures

NASA Technical Reports Server (NTRS)

Jost, Gabriele; Jin, Haoqiang; Labarta, Jesus; Gimenez, Judit; Caubet, Jordi; Biegel, Bryan A. (Technical Monitor)

2002-01-01

In this paper we describe how to apply powerful performance analysis techniques to understand the behavior of multilevel parallel applications. We use the Paraver/OMPItrace performance analysis system for our study. This system consists of two major components: The OMPItrace dynamic instrumentation mechanism, which allows the tracing of processes and threads and the Paraver graphical user interface for inspection and analyses of the generated traces. We describe how to use the system to conduct a detailed comparative study of a benchmark code implemented in five different programming paradigms applicable for shared memory
Cache-based error recovery for shared memory multiprocessor systems

NASA Technical Reports Server (NTRS)

Wu, Kun-Lung; Fuchs, W. Kent; Patel, Janak H.

1989-01-01

A multiprocessor cache-based checkpointing and recovery scheme for of recovering from transient processor errors in a shared-memory multiprocessor with private caches is presented. New implementation techniques that use checkpoint identifiers and recovery stacks to reduce performance degradation in processor utilization during normal execution are examined. This cache-based checkpointing technique prevents rollback propagation, provides for rapid recovery, and can be integrated into standard cache coherence protocols. An analytical model is used to estimate the relative performance of the scheme during normal execution. Extensions that take error latency into account are presented.
EOS developments

NASA Astrophysics Data System (ADS)

Sindrilaru, Elvin A.; Peters, Andreas J.; Adde, Geoffray M.; Duellmann, Dirk

2017-10-01

CERN has been developing and operating EOS as a disk storage solution successfully for over 6 years. The CERN deployment provides 135 PB and stores 1.2 billion replicas distributed over two computer centres. Deployment includes four LHC instances, a shared instance for smaller experiments and since last year an instance for individual user data as well. The user instance represents the backbone of the CERNBOX service for file sharing. New use cases like synchronisation and sharing, the planned migration to reduce AFS usage at CERN and the continuous growth has brought EOS to new challenges. Recent developments include the integration and evaluation of various technologies to do the transition from a single active in-memory namespace to a scale-out implementation distributed over many meta-data servers. The new architecture aims to separate the data from the application logic and user interface code, thus providing flexibility and scalability to the namespace component. Another important goal is to provide EOS as a CERN-wide mounted filesystem with strong authentication making it a single storage repository accessible via various services and front- ends (/eos initiative). This required new developments in the security infrastructure of the EOS FUSE implementation. Furthermore, there were a series of improvements targeting the end-user experience like tighter consistency and latency optimisations. In collaboration with Seagate as Openlab partner, EOS has a complete integration of OpenKinetic object drive cluster as a high-throughput, high-availability, low-cost storage solution. This contribution will discuss these three main development projects and present new performance metrics.
Avoiding and tolerating latency in large-scale next-generation shared-memory multiprocessors

NASA Technical Reports Server (NTRS)

Probst, David K.

1993-01-01

A scalable solution to the memory-latency problem is necessary to prevent the large latencies of synchronization and memory operations inherent in large-scale shared-memory multiprocessors from reducing high performance. We distinguish latency avoidance and latency tolerance. Latency is avoided when data is brought to nearby locales for future reference. Latency is tolerated when references are overlapped with other computation. Latency-avoiding locales include: processor registers, data caches used temporally, and nearby memory modules. Tolerating communication latency requires parallelism, allowing the overlap of communication and computation. Latency-tolerating techniques include: vector pipelining, data caches used spatially, prefetching in various forms, and multithreading in various forms. Relaxing the consistency model permits increased use of avoidance and tolerance techniques. Each model is a mapping from the program text to sets of partial orders on program operations; it is a convention about which temporal precedences among program operations are necessary. Information about temporal locality and parallelism constrains the use of avoidance and tolerance techniques. Suitable architectural primitives and compiler technology are required to exploit the increased freedom to reorder and overlap operations in relaxed models.
Supervisory control and diagnostics system for the mirror fusion test facility: overview and status 1980

DOE Office of Scientific and Technical Information (OSTI.GOV)

McGoldrick, P.R.

1981-01-01

The Mirror Fusion Test Facility (MFTF) is a complex facility requiring a highly-computerized Supervisory Control and Diagnostics System (SCDS) to monitor and provide control over ten subsystems; three of which require true process control. SCDS will provide physicists with a method of studying machine and plasma behavior by acquiring and processing up to four megabytes of plasma diagnostic information every five minutes. A high degree of availability and throughput is provided by a distributed computer system (nine 32-bit minicomputers on shared memory). Data, distributed across SCDS, is managed by a high-bandwidth Distributed Database Management System. The MFTF operators' control roommore » consoles use color television monitors with touch sensitive screens; this is a totally new approach. The method of handling deviations to normal machine operation and how the operator should be notified and assisted in the resolution of problems has been studied and a system designed.« less
Power/Performance Trade-offs of Small Batched LU Based Solvers on GPUs

DOE Office of Scientific and Technical Information (OSTI.GOV)

Villa, Oreste; Fatica, Massimiliano; Gawande, Nitin A.

In this paper we propose and analyze a set of batched linear solvers for small matrices on Graphic Processing Units (GPUs), evaluating the various alternatives depending on the size of the systems to solve. We discuss three different solutions that operate with different level of parallelization and GPU features. The first, exploiting the CUBLAS library, manages matrices of size up to 32x32 and employs Warp level (one matrix, one Warp) parallelism and shared memory. The second works at Thread-block level parallelism (one matrix, one Thread-block), still exploiting shared memory but managing matrices up to 76x76. The third is Thread levelmore » parallel (one matrix, one thread) and can reach sizes up to 128x128, but it does not exploit shared memory and only relies on the high memory bandwidth of the GPU. The first and second solution only support partial pivoting, the third one easily supports partial and full pivoting, making it attractive to problems that require greater numerical stability. We analyze the trade-offs in terms of performance and power consumption as function of the size of the linear systems that are simultaneously solved. We execute the three implementations on a Tesla M2090 (Fermi) and on a Tesla K20 (Kepler).« less
A neuropsychological comparison of obsessive-compulsive disorder and trichotillomania.

PubMed

Chamberlain, Samuel R; Fineberg, Naomi A; Blackwell, Andrew D; Clark, Luke; Robbins, Trevor W; Sahakian, Barbara J

2007-03-02

Obsessive-compulsive disorder (OCD) and trichotillomania (compulsive hair-pulling) share overlapping co-morbidity, familial transmission, and phenomenology. However, the extent to which these disorders share a common cognitive phenotype has yet to be elucidated using patients without confounding co-morbidities. To compare neurocognitive functioning in co-morbidity-free patients with OCD and trichotillomania, focusing on domains of learning and memory, executive function, affective processing, reflection-impulsivity and decision-making. Twenty patients with OCD, 20 patients with trichotillomania, and 20 matched controls undertook neuropsychological assessment after meeting stringent inclusion criteria. Groups were matched for age, education, verbal IQ, and gender. The OCD and trichotillomania groups were impaired on spatial working memory. Only OCD patients showed additional impairments on executive planning and visual pattern recognition memory, and missed more responses to sad target words than other groups on an affective go/no-go task. Furthermore, OCD patients failed to modulate their behaviour between conditions on the reflection-impulsivity test, suggestive of cognitive inflexibility. Both clinical groups showed intact decision-making and probabilistic reversal learning. OCD and trichotillomania shared overlapping spatial working memory problems, but neuropsychological dysfunction in OCD spanned additional domains that were intact in trichotillomania. Findings are discussed in relation to likely fronto-striatal neural substrates and future research directions.
Three Types of Memory in Emergency Medical Services Communication

ERIC Educational Resources Information Center

Angeli, Elizabeth L.

2015-01-01

This article examines memory and distributed cognition involved in the writing practices of emergency medical services (EMS) professionals. Results from a 16-month study indicate that EMS professionals rely on distributed cognition and three kinds of memory: individual, collaborative, and professional. Distributed cognition and the three types of…
A parallel approximate string matching under Levenshtein distance on graphics processing units using warp-shuffle operations

PubMed Central

Ho, ThienLuan; Oh, Seung-Rohk

2017-01-01

Approximate string matching with k-differences has a number of practical applications, ranging from pattern recognition to computational biology. This paper proposes an efficient memory-access algorithm for parallel approximate string matching with k-differences on Graphics Processing Units (GPUs). In the proposed algorithm, all threads in the same GPUs warp share data using warp-shuffle operation instead of accessing the shared memory. Moreover, we implement the proposed algorithm by exploiting the memory structure of GPUs to optimize its performance. Experiment results for real DNA packages revealed that the performance of the proposed algorithm and its implementation archived up to 122.64 and 1.53 times compared to that of sequential algorithm on CPU and previous parallel approximate string matching algorithm on GPUs, respectively. PMID:29016700
Relative time sharing: new findings and an extension of the resource allocation model of temporal processing.

PubMed

Buhusi, Catalin V; Meck, Warren H

2009-07-12

Individuals time as if using a stopwatch that can be stopped or reset on command. Here, we review behavioural and neurobiological data supporting the time-sharing hypothesis that perceived time depends on the attentional and memory resources allocated to the timing process. Neuroimaging studies in humans suggest that timekeeping tasks engage brain circuits typically involved in attention and working memory. Behavioural, pharmacological, lesion and electrophysiological studies in lower animals support this time-sharing hypothesis. When subjects attend to a second task, or when intruder events are presented, estimated durations are shorter, presumably due to resources being taken away from timing. Here, we extend the time-sharing hypothesis by proposing that resource reallocation is proportional to the perceived contrast, both in temporal and non-temporal features, between intruders and the timed events. New findings support this extension by showing that the effect of an intruder event is dependent on the relative duration of the intruder to the intertrial interval. The conclusion is that the brain circuits engaged by timekeeping comprise not only those primarily involved in time accumulation, but also those involved in the maintenance of attentional and memory resources for timing, and in the monitoring and reallocation of those resources among tasks.
States of mind: emotions, body feelings, and thoughts share distributed neural networks.

PubMed

Oosterwijk, Suzanne; Lindquist, Kristen A; Anderson, Eric; Dautoff, Rebecca; Moriguchi, Yoshiya; Barrett, Lisa Feldman

2012-09-01

Scientists have traditionally assumed that different kinds of mental states (e.g., fear, disgust, love, memory, planning, concentration, etc.) correspond to different psychological faculties that have domain-specific correlates in the brain. Yet, growing evidence points to the constructionist hypothesis that mental states emerge from the combination of domain-general psychological processes that map to large-scale distributed brain networks. In this paper, we report a novel study testing a constructionist model of the mind in which participants generated three kinds of mental states (emotions, body feelings, or thoughts) while we measured activity within large-scale distributed brain networks using fMRI. We examined the similarity and differences in the pattern of network activity across these three classes of mental states. Consistent with a constructionist hypothesis, a combination of large-scale distributed networks contributed to emotions, thoughts, and body feelings, although these mental states differed in the relative contribution of those networks. Implications for a constructionist functional architecture of diverse mental states are discussed. Copyright © 2012 Elsevier Inc. All rights reserved.
Testing New Programming Paradigms with NAS Parallel Benchmarks

NASA Technical Reports Server (NTRS)

Jin, H.; Frumkin, M.; Schultz, M.; Yan, J.

2000-01-01

Over the past decade, high performance computing has evolved rapidly, not only in hardware architectures but also with increasing complexity of real applications. Technologies have been developing to aim at scaling up to thousands of processors on both distributed and shared memory systems. Development of parallel programs on these computers is always a challenging task. Today, writing parallel programs with message passing (e.g. MPI) is the most popular way of achieving scalability and high performance. However, writing message passing programs is difficult and error prone. Recent years new effort has been made in defining new parallel programming paradigms. The best examples are: HPF (based on data parallelism) and OpenMP (based on shared memory parallelism). Both provide simple and clear extensions to sequential programs, thus greatly simplify the tedious tasks encountered in writing message passing programs. HPF is independent of memory hierarchy, however, due to the immaturity of compiler technology its performance is still questionable. Although use of parallel compiler directives is not new, OpenMP offers a portable solution in the shared-memory domain. Another important development involves the tremendous progress in the internet and its associated technology. Although still in its infancy, Java promisses portability in a heterogeneous environment and offers possibility to "compile once and run anywhere." In light of testing these new technologies, we implemented new parallel versions of the NAS Parallel Benchmarks (NPBs) with HPF and OpenMP directives, and extended the work with Java and Java-threads. The purpose of this study is to examine the effectiveness of alternative programming paradigms. NPBs consist of five kernels and three simulated applications that mimic the computation and data movement of large scale computational fluid dynamics (CFD) applications. We started with the serial version included in NPB2.3. Optimization of memory and cache usage was applied to several benchmarks, noticeably BT and SP, resulting in better sequential performance. In order to overcome the lack of an HPF performance model and guide the development of the HPF codes, we employed an empirical performance model for several primitives found in the benchmarks. We encountered a few limitations of HPF, such as lack of supporting the "REDISTRIBUTION" directive and no easy way to handle irregular computation. The parallelization with OpenMP directives was done at the outer-most loop level to achieve the largest granularity. The performance of six HPF and OpenMP benchmarks is compared with their MPI counterparts for the Class-A problem size in the figure in next page. These results were obtained on an SGI Origin2000 (195MHz) with MIPSpro-f77 compiler 7.2.1 for OpenMP and MPI codes and PGI pghpf-2.4.3 compiler with MPI interface for HPF programs.
OFF, Open source Finite volume Fluid dynamics code: A free, high-order solver based on parallel, modular, object-oriented Fortran API

NASA Astrophysics Data System (ADS)

Zaghi, S.

2014-07-01

OFF, an open source (free software) code for performing fluid dynamics simulations, is presented. The aim of OFF is to solve, numerically, the unsteady (and steady) compressible Navier-Stokes equations of fluid dynamics by means of finite volume techniques: the research background is mainly focused on high-order (WENO) schemes for multi-fluids, multi-phase flows over complex geometries. To this purpose a highly modular, object-oriented application program interface (API) has been developed. In particular, the concepts of data encapsulation and inheritance available within Fortran language (from standard 2003) have been stressed in order to represent each fluid dynamics "entity" (e.g. the conservative variables of a finite volume, its geometry, etc…) by a single object so that a large variety of computational libraries can be easily (and efficiently) developed upon these objects. The main features of OFF can be summarized as follows: Programming LanguageOFF is written in standard (compliant) Fortran 2003; its design is highly modular in order to enhance simplicity of use and maintenance without compromising the efficiency; Parallel Frameworks Supported the development of OFF has been also targeted to maximize the computational efficiency: the code is designed to run on shared-memory multi-cores workstations and distributed-memory clusters of shared-memory nodes (supercomputers); the code's parallelization is based on Open Multiprocessing (OpenMP) and Message Passing Interface (MPI) paradigms; Usability, Maintenance and Enhancement in order to improve the usability, maintenance and enhancement of the code also the documentation has been carefully taken into account; the documentation is built upon comprehensive comments placed directly into the source files (no external documentation files needed): these comments are parsed by means of doxygen free software producing high quality html and latex documentation pages; the distributed versioning system referred as git has been adopted in order to facilitate the collaborative maintenance and improvement of the code; CopyrightsOFF is a free software that anyone can use, copy, distribute, study, change and improve under the GNU Public License version 3. The present paper is a manifesto of OFF code and presents the currently implemented features and ongoing developments. This work is focused on the computational techniques adopted and a detailed description of the main API characteristics is reported. OFF capabilities are demonstrated by means of one and two dimensional examples and a three dimensional real application.

Optimized Infrastructure for the Earth System Prediction Capability

DTIC Science & Technology

2013-09-30

for referencing memory between its native coupling datatype (MCT Attribute Vectors) and ESMF Arrays. This will reduce the copies required and will...introduced ability within CESM to share memory between ESMF and MCT datatypes makes using both tools together much easier. Using both is appealing
Infectious Cognition: Risk Perception Affects Socially Shared Retrieval-Induced Forgetting of Medical Information.

PubMed

Coman, Alin; Berry, Jessica N

2015-12-01

When speakers selectively retrieve previously learned information, listeners often concurrently, and covertly, retrieve their memories of that information. This concurrent retrieval typically enhances memory for mentioned information (the rehearsal effect) and impairs memory for unmentioned but related information (socially shared retrieval-induced forgetting, SSRIF), relative to memory for unmentioned and unrelated information. Building on research showing that anxiety leads to increased attention to threat-relevant information, we explored whether concurrent retrieval is facilitated in high-anxiety real-world contexts. Participants first learned category-exemplar facts about meningococcal disease. Following a manipulation of perceived risk of infection (low vs. high risk), they listened to a mock radio show in which some of the facts were selectively practiced. Final recall tests showed that the rehearsal effect was equivalent between the two risk conditions, but SSRIF was significantly larger in the high-risk than in the low-risk condition. Thus, the tendency to exaggerate consequences of news events was found to have deleterious consequences. © The Author(s) 2015.
Fencing direct memory access data transfers in a parallel active messaging interface of a parallel computer

DOE Office of Scientific and Technical Information (OSTI.GOV)

Blocksome, Michael A.; Mamidala, Amith R.

2013-09-03

Fencing direct memory access (`DMA`) data transfers in a parallel active messaging interface (`PAMI`) of a parallel computer, the PAMI including data communications endpoints, each endpoint including specifications of a client, a context, and a task, the endpoints coupled for data communications through the PAMI and through DMA controllers operatively coupled to segments of shared random access memory through which the DMA controllers deliver data communications deterministically, including initiating execution through the PAMI of an ordered sequence of active DMA instructions for DMA data transfers between two endpoints, effecting deterministic DMA data transfers through a DMA controller and a segmentmore » of shared memory; and executing through the PAMI, with no FENCE accounting for DMA data transfers, an active FENCE instruction, the FENCE instruction completing execution only after completion of all DMA instructions initiated prior to execution of the FENCE instruction for DMA data transfers between the two endpoints.« less
Automation of Data Traffic Control on DSM Architecture

NASA Technical Reports Server (NTRS)

Frumkin, Michael; Jin, Hao-Qiang; Yan, Jerry

2001-01-01

The design of distributed shared memory (DSM) computers liberates users from the duty to distribute data across processors and allows for the incremental development of parallel programs using, for example, OpenMP or Java threads. DSM architecture greatly simplifies the development of parallel programs having good performance on a few processors. However, to achieve a good program scalability on DSM computers requires that the user understand data flow in the application and use various techniques to avoid data traffic congestions. In this paper we discuss a number of such techniques, including data blocking, data placement, data transposition and page size control and evaluate their efficiency on the NAS (NASA Advanced Supercomputing) Parallel Benchmarks. We also present a tool which automates the detection of constructs causing data congestions in Fortran array oriented codes and advises the user on code transformations for improving data traffic in the application.
Networking and AI systems: Requirements and benefits

NASA Technical Reports Server (NTRS)

1988-01-01

The price performance benefits of network systems is well documented. The ability to share expensive resources sold timesharing for mainframes, department clusters of minicomputers, and now local area networks of workstations and servers. In the process, other fundamental system requirements emerged. These have now been generalized with open system requirements for hardware, software, applications and tools. The ability to interconnect a variety of vendor products has led to a specification of interfaces that allow new techniques to extend existing systems for new and exciting applications. As an example of the message passing system, local area networks provide a testbed for many of the issues addressed by future concurrent architectures: synchronization, load balancing, fault tolerance and scalability. Gold Hill has been working with a number of vendors on distributed architectures that range from a network of workstations to a hypercube of microprocessors with distributed memory. Results from early applications are promising both for performance and scalability.
The Work's Not Over- Roll Up Your Sleeves and Make a Difference!

NASA Astrophysics Data System (ADS)

Sarquis, Mickey

1997-01-01

As my 17-year tenure as the first editor of the Secondary School Chemistry Section draws to a close, John Moore has invited me to share some reflections on my experiences. It's hard for me to believe that this many years have passed; in some ways, it seems like only yesterday that I took on this position. Looking back over my term as Section editor recalls wonderful memories, but it also stimulates me to seek out and take on new challenges as I move into a new phase of involvement in chemical education. In response to John's kind invitation, I'd like to share some of these memories and ideas with you who share my vision of quality chemical education, particularly at the secondary level.
Arousal-biased competition in perception and memory

PubMed Central

Mather, Mara; Sutherland, Matthew R.

2010-01-01

Our everyday surroundings besiege us with information. The battle is for a share of our limited attention and memory, with the brain selecting the winners and discarding the losers. Previous research shows that both bottom-up and top-down factors bias competition in favor of high priority stimuli. We propose that arousal during an event increases this bias both in perception and in long-term memory of the event. Arousal-biased competition theory provides specific predictions about when arousal will enhance and when it will impair memory for events, accounting for some puzzling contradictions in the emotional memory literature. PMID:21660127
Continuous Time Random Walks with memory and financial distributions

NASA Astrophysics Data System (ADS)

Montero, Miquel; Masoliver, Jaume

2017-11-01

We study financial distributions from the perspective of Continuous Time Random Walks with memory. We review some of our previous developments and apply them to financial problems. We also present some new models with memory that can be useful in characterizing tendency effects which are inherent in most markets. We also briefly study the effect on return distributions of fractional behaviors in the distribution of pausing times between successive transactions.
Glucocorticoids in the prefrontal cortex enhance memory consolidation and impair working memory by a common neural mechanism

PubMed Central

Barsegyan, Areg; Mackenzie, Scott M.; Kurose, Brian D.; McGaugh, James L.; Roozendaal, Benno

2010-01-01

It is well established that acute administration of adrenocortical hormones enhances the consolidation of memories of emotional experiences and, concurrently, impairs working memory. These different glucocorticoid effects on these two memory functions have generally been considered to be independently regulated processes. Here we report that a glucocorticoid receptor agonist administered into the medial prefrontal cortex (mPFC) of male Sprague-Dawley rats both enhances memory consolidation and impairs working memory. Both memory effects are mediated by activation of a membrane-bound steroid receptor and depend on noradrenergic activity within the mPFC to increase levels of cAMP-dependent protein kinase. These findings provide direct evidence that glucocorticoid effects on both memory consolidation and working memory share a common neural influence within the mPFC. PMID:20810923
Benefits and Costs of Context Reinstatement in Episodic Memory: An ERP Study.

PubMed

Bramão, Inês; Johansson, Mikael

2017-01-01

This study investigated context-dependent episodic memory retrieval. An influential idea in the memory literature is that performance benefits when the retrieval context overlaps with the original encoding context. However, such memory facilitation may not be driven by the encoding-retrieval overlap per se but by the presence of diagnostic features in the reinstated context that discriminate the target episode from competing episodes. To test this prediction, the encoding-retrieval overlap and the diagnostic value of the context were manipulated in a novel associative recognition memory task. Participants were asked to memorize word pairs presented together with diagnostic (unique) and nondiagnostic (shared) background scenes. At test, participants recognized the word pairs in the presence and absence of the previously encoded contexts. Behavioral data show facilitated memory performance in the presence of the original context but, importantly, only when the context was diagnostic of the target episode. The electrophysiological data reveal an early anterior ERP encoding-retrieval overlap effect that tracks the cost associated with having nondiagnostic contexts present at retrieval, that is, shared by multiple previous episodes, and a later posterior encoding-retrieval overlap effect that reflects facilitated access to the target episode during retrieval in diagnostic contexts. Taken together, our results underscore the importance of the diagnostic value of the context and suggest that context-dependent episodic memory effects are multiple determined.
Development of the cloud sharing system for residential earthquake responses using smartphones

NASA Astrophysics Data System (ADS)

Shohei, N.; Fujiwara, H.; Azuma, H.; Hao, K. X.

2015-12-01

Earthquake responses at residential depends on its building structure, site amplification, epicenter distance, and etc. Until recently, it was impossible to obtain the individual residential response by conventional seismometer in terms of costs. However, current technology makes it possible with the Micro Electro Mechanical Systems (MEMS) sensors inside mobile terminals like smartphones. We developed the cloud sharing system for residential earthquake response in local community utilizing mobile terminals, such as an iPhone, iPad, iPod touch as a collaboration between NIED and Hakusan Corp. The triggered earthquake acceleration waveforms are recorded at sampling frequencies of 100Hz and stored on their memories once an threshold value was exceeded or ordered information received from the Earthquake Early Warning system. The recorded data is automatically transmitted and archived on the cloud server once the wireless communication is available. Users can easily get the uploaded data by use of a web browser through Internet. The cloud sharing system is designed for residential and only shared in local community internal. Residents can freely add sensors and register information about installation points in each region. And if an earthquake occurs, they can easily view the local distribution of seismic intensities and even analyze waves.To verify this cloud-based seismic wave sharing system, we have performed on site experiments under the cooperation of several local communities, The system and experimental results will be introduced and demonstrated in the presentation.
Total recall in distributive associative memories

NASA Technical Reports Server (NTRS)

Danforth, Douglas G.

1991-01-01

Iterative error correction of asymptotically large associative memories is equivalent to a one-step learning rule. This rule is the inverse of the activation function of the memory. Spectral representations of nonlinear activation functions are used to obtain the inverse in closed form for Sparse Distributed Memory, Selected-Coordinate Design, and Radial Basis Functions.
Hypercluster - Parallel processing for computational mechanics

NASA Technical Reports Server (NTRS)

Blech, Richard A.

1988-01-01

An account is given of the development status, performance capabilities and implications for further development of NASA-Lewis' testbed 'hypercluster' parallel computer network, in which multiple processors communicate through a shared memory. Processors have local as well as shared memory; the hypercluster is expanded in the same manner as the hypercube, with processor clusters replacing the normal single processor node. The NASA-Lewis machine has three nodes with a vector personality and one node with a scalar personality. Each of the vector nodes uses four board-level vector processors, while the scalar node uses four general-purpose microcomputer boards.
How can individual differences in autobiographical memory distributions of older adults be explained?

PubMed

Wolf, Tabea; Zimprich, Daniel

2016-10-01

The reminiscence bump phenomenon has frequently been reported for the recall of autobiographical memories. The present study complements previous research by examining individual differences in the distribution of word-cued autobiographical memories. More importantly, we introduce predictor variables that might account for individual differences in the mean (location) and the standard deviation (scale) of individual memory distributions. All variables were derived from different theoretical accounts for the reminiscence bump phenomenon. We used a mixed location-scale logitnormal model, to analyse the 4602 autobiographical memories reported by 118 older participants. Results show reliable individual differences in the location and the scale. After controlling for age and gender, individual proportions of first-time experiences and individual proportions of positive memories, as well as the ratings on Openness to new Experiences and Self-Concept Clarity accounted for 29% of individual differences in location and 42% of individual differences in scale of autobiographical memory distributions. Results dovetail with a life-story account for the reminiscence bump which integrates central components of previous accounts.
Arranging computer architectures to create higher-performance controllers

NASA Technical Reports Server (NTRS)

Jacklin, Stephen A.

1988-01-01

Techniques for integrating microprocessors, array processors, and other intelligent devices in control systems are reviewed, with an emphasis on the (re)arrangement of components to form distributed or parallel processing systems. Consideration is given to the selection of the host microprocessor, increasing the power and/or memory capacity of the host, multitasking software for the host, array processors to reduce computation time, the allocation of real-time and non-real-time events to different computer subsystems, intelligent devices to share the computational burden for real-time events, and intelligent interfaces to increase communication speeds. The case of a helicopter vibration-suppression and stabilization controller is analyzed as an example, and significant improvements in computation and throughput rates are demonstrated.
Automatic Data Traffic Control on DSM Architecture

NASA Technical Reports Server (NTRS)

Frumkin, Michael; Jin, Hao-Qiang; Yan, Jerry; Kwak, Dochan (Technical Monitor)

2000-01-01

We study data traffic on distributed shared memory machines and conclude that data placement and grouping improve performance of scientific codes. We present several methods which user can employ to improve data traffic in his code. We report on implementation of a tool which detects the code fragments causing data congestions and advises user on improvements of data routing in these fragments. The capabilities of the tool include deduction of data alignment and affinity from the source code; detection of the code constructs having abnormally high cache or TLB misses; generation of data placement constructs. We demonstrate the capabilities of the tool on experiments with NAS parallel benchmarks and with a simple computational fluid dynamics application ARC3D.
Contributions of Medial Temporal Lobe and Striatal Memory Systems to Learning and Retrieving Overlapping Spatial Memories

PubMed Central

Brown, Thackery I.; Stern, Chantal E.

2014-01-01

Many life experiences share information with other memories. In order to make decisions based on overlapping memories, we need to distinguish between experiences to determine the appropriate behavior for the current situation. Previous work suggests that the medial temporal lobe (MTL) and medial caudate interact to support the retrieval of overlapping navigational memories in different contexts. The present study used functional magnetic resonance imaging (fMRI) in humans to test the prediction that the MTL and medial caudate play complementary roles in learning novel mazes that cross paths with, and must be distinguished from, previously learned routes. During fMRI scanning, participants navigated virtual routes that were well learned from prior training while also learning new mazes. Critically, some routes learned during scanning shared hallways with those learned during pre-scan training. Overlap between mazes required participants to use contextual cues to select between alternative behaviors. Results demonstrated parahippocampal cortex activity specific for novel spatial cues that distinguish between overlapping routes. The hippocampus and medial caudate were active for learning overlapping spatial memories, and increased their activity for previously learned routes when they became context dependent. Our findings provide novel evidence that the MTL and medial caudate play complementary roles in the learning, updating, and execution of context-dependent navigational behaviors. PMID:23448868
Symbiosis of executive and selective attention in working memory

PubMed Central

Vandierendonck, André

2014-01-01

The notion of working memory (WM) was introduced to account for the usage of short-term memory resources by other cognitive tasks such as reasoning, mental arithmetic, language comprehension, and many others. This collaboration between memory and other cognitive tasks can only be achieved by a dedicated WM system that controls task coordination. To that end, WM models include executive control. Nevertheless, other attention control systems may be involved in coordination of memory and cognitive tasks calling on memory resources. The present paper briefly reviews the evidence concerning the role of selective attention in WM activities. A model is proposed in which selective attention control is directly linked to the executive control part of the WM system. The model assumes that apart from storage of declarative information, the system also includes an executive WM module that represents the current task set. Control processes are automatically triggered when particular conditions in these modules are met. As each task set represents the parameter settings and the actions needed to achieve the task goal, it will depend on the specific settings and actions whether selective attention control will have to be shared among the active tasks. Only when such sharing is required, task performance will be affected by the capacity limits of the control system involved. PMID:25152723
Symbiosis of executive and selective attention in working memory.

PubMed

Vandierendonck, André

2014-01-01

The notion of working memory (WM) was introduced to account for the usage of short-term memory resources by other cognitive tasks such as reasoning, mental arithmetic, language comprehension, and many others. This collaboration between memory and other cognitive tasks can only be achieved by a dedicated WM system that controls task coordination. To that end, WM models include executive control. Nevertheless, other attention control systems may be involved in coordination of memory and cognitive tasks calling on memory resources. The present paper briefly reviews the evidence concerning the role of selective attention in WM activities. A model is proposed in which selective attention control is directly linked to the executive control part of the WM system. The model assumes that apart from storage of declarative information, the system also includes an executive WM module that represents the current task set. Control processes are automatically triggered when particular conditions in these modules are met. As each task set represents the parameter settings and the actions needed to achieve the task goal, it will depend on the specific settings and actions whether selective attention control will have to be shared among the active tasks. Only when such sharing is required, task performance will be affected by the capacity limits of the control system involved.
The prevalence and quality of silent, socially silent, and disclosed autobiographical memories across adulthood.

PubMed

Alea, Nicole

2010-02-01

Two separate studies examined the prevalence and quality of silent (infrequently recalled), socially silent (i.e., recalled but not shared), and disclosed autobiographical memories. In Study 1 young and older men and women remembered positive events. Positive memories were more likely to be disclosed than to be kept socially silent or completely silent. However, socially silent and disclosed memories did not differ in memory quality: the memories were equally vivid, significant, and emotional. Silent memories were less qualitatively rich. This pattern of results was generally replicated in Study 2 with a lifespan sample for both positive and negative memories, and with additional qualitative variables. The exception was that negative memories were kept silent more often. Age differences were minimal. Women disclosed their autobiographical memories more, but men told a greater variety of people. Results are discussed in terms of the functions that memory telling and silences might serve for individuals.

Neuronal nicotinic acetylcholine receptors: neuroplastic changes underlying alcohol and nicotine addictions

PubMed Central

Feduccia, Allison A.; Chatterjee, Susmita; Bartlett, Selena E.

2012-01-01

Addictive drugs can activate systems involved in normal reward-related learning, creating long-lasting memories of the drug's reinforcing effects and the environmental cues surrounding the experience. These memories significantly contribute to the maintenance of compulsive drug use as well as cue-induced relapse which can occur even after long periods of abstinence. Synaptic plasticity is thought to be a prominent molecular mechanism underlying drug-induced learning and memories. Ethanol and nicotine are both widely abused drugs that share a common molecular target in the brain, the neuronal nicotinic acetylcholine receptors (nAChRs). The nAChRs are ligand-gated ion channels that are vastly distributed throughout the brain and play a key role in synaptic neurotransmission. In this review, we will delineate the role of nAChRs in the development of ethanol and nicotine addiction. We will characterize both ethanol and nicotine's effects on nAChR-mediated synaptic transmission and plasticity in several key brain areas that are important for addiction. Finally, we will discuss some of the behavioral outcomes of drug-induced synaptic plasticity in animal models. An understanding of the molecular and cellular changes that occur following administration of ethanol and nicotine will lead to better therapeutic strategies. PMID:22876217
The attention-weighted sample-size model of visual short-term memory: Attention capture predicts resource allocation and memory load.

PubMed

Smith, Philip L; Lilburn, Simon D; Corbett, Elaine A; Sewell, David K; Kyllingsbæk, Søren

2016-09-01

We investigated the capacity of visual short-term memory (VSTM) in a phase discrimination task that required judgments about the configural relations between pairs of black and white features. Sewell et al. (2014) previously showed that VSTM capacity in an orientation discrimination task was well described by a sample-size model, which views VSTM as a resource comprised of a finite number of noisy stimulus samples. The model predicts the invariance of [Formula: see text] , the sum of squared sensitivities across items, for displays of different sizes. For phase discrimination, the set-size effect significantly exceeded that predicted by the sample-size model for both simultaneously and sequentially presented stimuli. Instead, the set-size effect and the serial position curves with sequential presentation were predicted by an attention-weighted version of the sample-size model, which assumes that one of the items in the display captures attention and receives a disproportionate share of resources. The choice probabilities and response time distributions from the task were well described by a diffusion decision model in which the drift rates embodied the assumptions of the attention-weighted sample-size model. Copyright © 2016 The Authors. Published by Elsevier Inc. All rights reserved.
Do familiar teammates request and accept more backup? Transactive memory in air traffic control.

PubMed

Smith-Jentsch, Kimberly A; Kraiger, Kurt; Cannon-Bowers, Janis A; Salas, Eduardo

2009-04-01

The present study investigated factors that explain when and why different groups of teammates are more likely to request and accept backup from one another when needed in an environment characterized by extreme time pressure and severe consequences of error: commercial air traffic control (ATC). Transactive memory theory states that teammates develop consensus regarding the distribution of their relative expertise as well as confidence in that expertise over time and that this facilitates coordination processes. The present study investigated whether this theory could help to explain between-team differences in requesting and accepting backup when needed. The present study used cross-sectional data collected from 51 commercial ATC teams. Hypotheses were tested using multiple regression analysis. Teammates with greater experience working together requested and accepted backup from one another more than those with lesser experience working together. Teammate knowledge consensus and perceived team efficacy appear to have mediated this relationship. Transactive memory theory extends to high-stress environments in which members' expertise is highly overlapping. Teammates' shared mental models about one another increase the likelihood that they will request and accept backup. Teammate familiarity should be considered when choosing among potential replacement team members. Training strategies that accelerate the development of teammate knowledge consensus and team efficacy are warranted.
Personal semantics: at the crossroads of semantic and episodic memory.

PubMed

Renoult, Louis; Davidson, Patrick S R; Palombo, Daniela J; Moscovitch, Morris; Levine, Brian

2012-11-01

Declarative memory is usually described as consisting of two systems: semantic and episodic memory. Between these two poles, however, may lie a third entity: personal semantics (PS). PS concerns knowledge of one's past. Although typically assumed to be an aspect of semantic memory, it is essentially absent from existing models of knowledge. Furthermore, like episodic memory (EM), PS is idiosyncratically personal (i.e., not culturally-shared). We show that, depending on how it is operationalized, the neural correlates of PS can look more similar to semantic memory, more similar to EM, or dissimilar to both. We consider three different perspectives to better integrate PS into existing models of declarative memory and suggest experimental strategies for disentangling PS from semantic and episodic memory. Copyright © 2012 Elsevier Ltd. All rights reserved.
Accelerating adaptive inverse distance weighting interpolation algorithm on a graphics processing unit

PubMed Central

Xu, Liangliang; Xu, Nengxiong

2017-01-01

This paper focuses on designing and implementing parallel adaptive inverse distance weighting (AIDW) interpolation algorithms by using the graphics processing unit (GPU). The AIDW is an improved version of the standard IDW, which can adaptively determine the power parameter according to the data points’ spatial distribution pattern and achieve more accurate predictions than those predicted by IDW. In this paper, we first present two versions of the GPU-accelerated AIDW, i.e. the naive version without profiting from the shared memory and the tiled version taking advantage of the shared memory. We also implement the naive version and the tiled version using two data layouts, structure of arrays and array of aligned structures, on both single and double precision. We then evaluate the performance of parallel AIDW by comparing it with its corresponding serial algorithm on three different machines equipped with the GPUs GT730M, M5000 and K40c. The experimental results indicate that: (i) there is no significant difference in the computational efficiency when different data layouts are employed; (ii) the tiled version is always slightly faster than the naive version; and (iii) on single precision the achieved speed-up can be up to 763 (on the GPU M5000), while on double precision the obtained highest speed-up is 197 (on the GPU K40c). To benefit the community, all source code and testing data related to the presented parallel AIDW algorithm are publicly available. PMID:28989754
Accelerating adaptive inverse distance weighting interpolation algorithm on a graphics processing unit.

PubMed

Mei, Gang; Xu, Liangliang; Xu, Nengxiong

2017-09-01

This paper focuses on designing and implementing parallel adaptive inverse distance weighting (AIDW) interpolation algorithms by using the graphics processing unit (GPU). The AIDW is an improved version of the standard IDW, which can adaptively determine the power parameter according to the data points' spatial distribution pattern and achieve more accurate predictions than those predicted by IDW. In this paper, we first present two versions of the GPU-accelerated AIDW, i.e. the naive version without profiting from the shared memory and the tiled version taking advantage of the shared memory. We also implement the naive version and the tiled version using two data layouts, structure of arrays and array of aligned structures, on both single and double precision. We then evaluate the performance of parallel AIDW by comparing it with its corresponding serial algorithm on three different machines equipped with the GPUs GT730M, M5000 and K40c. The experimental results indicate that: (i) there is no significant difference in the computational efficiency when different data layouts are employed; (ii) the tiled version is always slightly faster than the naive version; and (iii) on single precision the achieved speed-up can be up to 763 (on the GPU M5000), while on double precision the obtained highest speed-up is 197 (on the GPU K40c). To benefit the community, all source code and testing data related to the presented parallel AIDW algorithm are publicly available.
Data traffic reduction schemes for Cholesky factorization on asynchronous multiprocessor systems

NASA Technical Reports Server (NTRS)

Naik, Vijay K.; Patrick, Merrell L.

1989-01-01

Communication requirements of Cholesky factorization of dense and sparse symmetric, positive definite matrices are analyzed. The communication requirement is characterized by the data traffic generated on multiprocessor systems with local and shared memory. Lower bound proofs are given to show that when the load is uniformly distributed the data traffic associated with factoring an n x n dense matrix using n to the alpha power (alpha less than or equal 2) processors is omega(n to the 2 + alpha/2 power). For n x n sparse matrices representing a square root of n x square root of n regular grid graph the data traffic is shown to be omega(n to the 1 + alpha/2 power), alpha less than or equal 1. Partitioning schemes that are variations of block assignment scheme are described and it is shown that the data traffic generated by these schemes are asymptotically optimal. The schemes allow efficient use of up to O(n to the 2nd power) processors in the dense case and up to O(n) processors in the sparse case before the total data traffic reaches the maximum value of O(n to the 3rd power) and O(n to the 3/2 power), respectively. It is shown that the block based partitioning schemes allow a better utilization of the data accessed from shared memory and thus reduce the data traffic than those based on column-wise wrap around assignment schemes.
Etiological Distinction of Working Memory Components in Relation to Mathematics

PubMed Central

Lukowski, Sarah L.; Soden, Brooke; Hart, Sara A.; Thompson, Lee A.; Kovas, Yulia; Petrill, Stephen A.

2014-01-01

Working memory has been consistently associated with mathematics achievement, although the etiology of these relations remains poorly understood. The present study examined the genetic and environmental underpinnings of math story problem solving, timed calculation, and untimed calculation alongside working memory components in 12-year-old monozygotic (n = 105) and same-sex dizygotic (n = 143) twin pairs. Results indicated significant phenotypic correlation between each working memory component and all mathematics outcomes (r = 0.18 – 0.33). Additive genetic influences shared between the visuo-spatial sketchpad and mathematics achievement was significant, accounting for roughly 89% of the observed correlation. In addition, genetic covariance was found between the phonological loop and math story problem solving. In contrast, despite there being a significant observed relationship between phonological loop and timed and untimed calculation, there was no significant genetic or environmental covariance between the phonological loop and timed or untimed calculation skills. Further analyses indicated that genetic overlap between the visuo-spatial sketchpad and math story problem solving and math fluency was distinct from general genetic factors, whereas g, phonological loop, and mathematics shared generalist genes. Thus, although each working memory component was related to mathematics, the etiology of their relationships may be distinct. PMID:25477699
Test Sequence Priming in Recognition Memory

ERIC Educational Resources Information Center

Johns, Elizabeth E.; Mewhort, D. J. K.

2009-01-01

The authors examined priming within the test sequence in 3 recognition memory experiments. A probe primed its successor whenever both probes shared a feature with the same studied item ("interjacent priming"), indicating that the study item like the probe is central to the decision. Interjacent priming occurred even when the 2 probes did…
Two Maintenance Mechanisms of Verbal Information in Working Memory

ERIC Educational Resources Information Center

Camos, V.; Lagner, P.; Barrouillet, P.

2009-01-01

The present study evaluated the interplay between two mechanisms of maintenance of verbal information in working memory, namely articulatory rehearsal as described in Baddeley's model, and attentional refreshing as postulated in Barrouillet and Camos's Time-Based Resource-Sharing (TBRS) model. In four experiments using complex span paradigm, we…
Down Memory Lane: Recollections of Lamaze International's First 50 Years

PubMed Central

Zwelling, Elaine

2010-01-01

The 42-year involvement of one member of Lamaze International is chronicled through a decade-by-decade review of personal memories. The history of Lamaze International is shared through the recollections of her roles as a childbirth educator, faculty member, and member of the board of directors. PMID:21629385
Paul Ricoeur, Memory, and the Historical Gaze: Implications for Education Histories

ERIC Educational Resources Information Center

Colby, Sherri Rae

2012-01-01

In this article, the author shares the potential applications of Paul Ricoeur's philosophies of history, memory, and narrative to the interpretation of educational histories, and those histories' life spans: moving cyclically from early conception, to evidentiary construction, to published dissemination; and ultimately to death or immortality. Her…
Metadata based management and sharing of distributed biomedical data

PubMed Central

Vergara-Niedermayr, Cristobal; Liu, Peiya

2014-01-01

Biomedical research data sharing is becoming increasingly important for researchers to reuse experiments, pool expertise and validate approaches. However, there are many hurdles for data sharing, including the unwillingness to share, lack of flexible data model for providing context information, difficulty to share syntactically and semantically consistent data across distributed institutions, and high cost to provide tools to share the data. SciPort is a web-based collaborative biomedical data sharing platform to support data sharing across distributed organisations. SciPort provides a generic metadata model to flexibly customise and organise the data. To enable convenient data sharing, SciPort provides a central server based data sharing architecture with a one-click data sharing from a local server. To enable consistency, SciPort provides collaborative distributed schema management across distributed sites. To enable semantic consistency, SciPort provides semantic tagging through controlled vocabularies. SciPort is lightweight and can be easily deployed for building data sharing communities. PMID:24834105
The Generalized Quantum Episodic Memory Model.

PubMed

Trueblood, Jennifer S; Hemmer, Pernille

2017-11-01

Recent evidence suggests that experienced events are often mapped to too many episodic states, including those that are logically or experimentally incompatible with one another. For example, episodic over-distribution patterns show that the probability of accepting an item under different mutually exclusive conditions violates the disjunction rule. A related example, called subadditivity, occurs when the probability of accepting an item under mutually exclusive and exhaustive instruction conditions sums to a number >1. Both the over-distribution effect and subadditivity have been widely observed in item and source-memory paradigms. These phenomena are difficult to explain using standard memory frameworks, such as signal-detection theory. A dual-trace model called the over-distribution (OD) model (Brainerd & Reyna, 2008) can explain the episodic over-distribution effect, but not subadditivity. Our goal is to develop a model that can explain both effects. In this paper, we propose the Generalized Quantum Episodic Memory (GQEM) model, which extends the Quantum Episodic Memory (QEM) model developed by Brainerd, Wang, and Reyna (2013). We test GQEM by comparing it to the OD model using data from a novel item-memory experiment and a previously published source-memory experiment (Kellen, Singmann, & Klauer, 2014) examining the over-distribution effect. Using the best-fit parameters from the over-distribution experiments, we conclude by showing that the GQEM model can also account for subadditivity. Overall these results add to a growing body of evidence suggesting that quantum probability theory is a valuable tool in modeling recognition memory. Copyright © 2016 Cognitive Science Society, Inc.
Differentiation and Response Bias in Episodic Memory: Evidence from Reaction Time Distributions

ERIC Educational Resources Information Center

Criss, Amy H.

2010-01-01

In differentiation models, the processes of encoding and retrieval produce an increase in the distribution of memory strength for targets and a decrease in the distribution of memory strength for foils as the amount of encoding increases. This produces an increase in the hit rate and decrease in the false-alarm rate for a strongly encoded compared…
The DASH Project: An Overview

DTIC Science & Technology

1988-02-29

by memory copyin g will degrade system performance on shared-memory multiprocessors. Virtual memor y (VM) remapping, as opposed to memory copying...Bershad, G.D. Giuseppe Facchetti, Kevin Fall, G . Scott Graham, Ellen Nelson , P. Venkat Rangan, Bruno Sartirana, Shin-Yuan Tzou, Raj Vaswani, and Robert...Remote Execution in NEST", IEEE Trans. on Software Eng. 13, 8 (August 1987), 905-912. 3. G . T. Almes, A. P. Black, E. Lazowska and J. Noe, "The Eden
Modular architectures for quantum networks

NASA Astrophysics Data System (ADS)

Pirker, A.; Wallnöfer, J.; Dür, W.

2018-05-01

We consider the problem of generating multipartite entangled states in a quantum network upon request. We follow a top-down approach, where the required entanglement is initially present in the network in form of network states shared between network devices, and then manipulated in such a way that the desired target state is generated. This minimizes generation times, and allows for network structures that are in principle independent of physical links. We present a modular and flexible architecture, where a multi-layer network consists of devices of varying complexity, including quantum network routers, switches and clients, that share certain resource states. We concentrate on the generation of graph states among clients, which are resources for numerous distributed quantum tasks. We assume minimal functionality for clients, i.e. they do not participate in the complex and distributed generation process of the target state. We present architectures based on shared multipartite entangled Greenberger–Horne–Zeilinger states of different size, and fully connected decorated graph states, respectively. We compare the features of these architectures to an approach that is based on bipartite entanglement, and identify advantages of the multipartite approach in terms of memory requirements and complexity of state manipulation. The architectures can handle parallel requests, and are designed in such a way that the network state can be dynamically extended if new clients or devices join the network. For generation or dynamical extension of the network states, we propose a quantum network configuration protocol, where entanglement purification is used to establish high fidelity states. The latter also allows one to show that the entanglement generated among clients is private, i.e. the network is secure.
A numerical differentiation library exploiting parallel architectures

NASA Astrophysics Data System (ADS)

Voglis, C.; Hadjidoukas, P. E.; Lagaris, I. E.; Papageorgiou, D. G.

2009-08-01

We present a software library for numerically estimating first and second order partial derivatives of a function by finite differencing. Various truncation schemes are offered resulting in corresponding formulas that are accurate to order O(h), O(h), and O(h), h being the differencing step. The derivatives are calculated via forward, backward and central differences. Care has been taken that only feasible points are used in the case where bound constraints are imposed on the variables. The Hessian may be approximated either from function or from gradient values. There are three versions of the software: a sequential version, an OpenMP version for shared memory architectures and an MPI version for distributed systems (clusters). The parallel versions exploit the multiprocessing capability offered by computer clusters, as well as modern multi-core systems and due to the independent character of the derivative computation, the speedup scales almost linearly with the number of available processors/cores. Program summaryProgram title: NDL (Numerical Differentiation Library) Catalogue identifier: AEDG_v1_0 Program summary URL:http://cpc.cs.qub.ac.uk/summaries/AEDG_v1_0.html Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland Licensing provisions: Standard CPC licence, http://cpc.cs.qub.ac.uk/licence/licence.html No. of lines in distributed program, including test data, etc.: 73 030 No. of bytes in distributed program, including test data, etc.: 630 876 Distribution format: tar.gz Programming language: ANSI FORTRAN-77, ANSI C, MPI, OPENMP Computer: Distributed systems (clusters), shared memory systems Operating system: Linux, Solaris Has the code been vectorised or parallelized?: Yes RAM: The library uses O(N) internal storage, N being the dimension of the problem Classification: 4.9, 4.14, 6.5 Nature of problem: The numerical estimation of derivatives at several accuracy levels is a common requirement in many computational tasks, such as optimization, solution of nonlinear systems, etc. The parallel implementation that exploits systems with multiple CPUs is very important for large scale and computationally expensive problems. Solution method: Finite differencing is used with carefully chosen step that minimizes the sum of the truncation and round-off errors. The parallel versions employ both OpenMP and MPI libraries. Restrictions: The library uses only double precision arithmetic. Unusual features: The software takes into account bound constraints, in the sense that only feasible points are used to evaluate the derivatives, and given the level of the desired accuracy, the proper formula is automatically employed. Running time: Running time depends on the function's complexity. The test run took 15 ms for the serial distribution, 0.6 s for the OpenMP and 4.2 s for the MPI parallel distribution on 2 processors.
Initial Performance Results on IBM POWER6

NASA Technical Reports Server (NTRS)

Saini, Subbash; Talcott, Dale; Jespersen, Dennis; Djomehri, Jahed; Jin, Haoqiang; Mehrotra, Piysuh

2008-01-01

The POWER5+ processor has a faster memory bus than that of the previous generation POWER5 processor (533 MHz vs. 400 MHz), but the measured per-core memory bandwidth of the latter is better than that of the former (5.7 GB/s vs. 4.3 GB/s). The reason for this is that in the POWER5+, the two cores on the chip share the L2 cache, L3 cache and memory bus. The memory controller is also on the chip and is shared by the two cores. This serializes the path to memory. For consistently good performance on a wide range of applications, the performance of the processor, the memory subsystem, and the interconnects (both latency and bandwidth) should be balanced. Recognizing this, IBM has designed the Power6 processor so as to avoid the bottlenecks due to the L2 cache, memory controller and buffer chips of the POWER5+. Unlike the POWER5+, each core in the POWER6 has its own L2 cache (4 MB - double that of the Power5+), memory controller and buffer chips. Each core in the POWER6 runs at 4.7 GHz instead of 1.9 GHz in POWER5+. In this paper, we evaluate the performance of a dual-core Power6 based IBM p6-570 system, and we compare its performance with that of a dual-core Power5+ based IBM p575+ system. In this evaluation, we have used the High- Performance Computing Challenge (HPCC) benchmarks, NAS Parallel Benchmarks (NPB), and four real-world applications--three from computational fluid dynamics and one from climate modeling.
Implementation and performance of parallel Prolog interpreter

DOE Office of Scientific and Technical Information (OSTI.GOV)

Wei, S.; Kale, L.V.; Balkrishna, R.

1988-01-01

In this paper, the authors discuss the implementation of a parallel Prolog interpreter on different parallel machines. The implementation is based on the REDUCE--OR process model which exploits both AND and OR parallelism in logic programs. It is machine independent as it runs on top of the chare-kernel--a machine-independent parallel programming system. The authors also give the performance of the interpreter running a diverse set of benchmark pargrams on parallel machines including shared memory systems: an Alliant FX/8, Sequent and a MultiMax, and a non-shared memory systems: Intel iPSC/32 hypercube, in addition to its performance on a multiprocessor simulation system.

Statistical prediction with Kanerva's sparse distributed memory

NASA Technical Reports Server (NTRS)

Rogers, David

1989-01-01

A new viewpoint of the processing performed by Kanerva's sparse distributed memory (SDM) is presented. In conditions of near- or over-capacity, where the associative-memory behavior of the model breaks down, the processing performed by the model can be interpreted as that of a statistical predictor. Mathematical results are presented which serve as the framework for a new statistical viewpoint of sparse distributed memory and for which the standard formulation of SDM is a special case. This viewpoint suggests possible enhancements to the SDM model, including a procedure for improving the predictiveness of the system based on Holland's work with genetic algorithms, and a method for improving the capacity of SDM even when used as an associative memory.
Feature-based memory-driven attentional capture: visual working memory content affects visual attention.

PubMed

Olivers, Christian N L; Meijer, Frank; Theeuwes, Jan

2006-10-01

In 7 experiments, the authors explored whether visual attention (the ability to select relevant visual information) and visual working memory (the ability to retain relevant visual information) share the same content representations. The presence of singleton distractors interfered more strongly with a visual search task when it was accompanied by an additional memory task. Singleton distractors interfered even more when they were identical or related to the object held in memory, but only when it was difficult to verbalize the memory content. Furthermore, this content-specific interaction occurred for features that were relevant to the memory task but not for irrelevant features of the same object or for once-remembered objects that could be forgotten. Finally, memory-related distractors attracted more eye movements but did not result in longer fixations. The results demonstrate memory-driven attentional capture on the basis of content-specific representations. Copyright 2006 APA.
Self-defining memories, scripts, and the life story: narrative identity in personality and psychotherapy.

PubMed

Singer, Jefferson A; Blagov, Pavel; Berry, Meredith; Oost, Kathryn M

2013-12-01

An integrative model of narrative identity builds on a dual memory system that draws on episodic memory and a long-term self to generate autobiographical memories. Autobiographical memories related to critical goals in a lifetime period lead to life-story memories, which in turn become self-defining memories when linked to an individual's enduring concerns. Self-defining memories that share repetitive emotion-outcome sequences yield narrative scripts, abstracted templates that filter cognitive-affective processing. The life story is the individual's overarching narrative that provides unity and purpose over the life course. Healthy narrative identity combines memory specificity with adaptive meaning-making to achieve insight and well-being, as demonstrated through a literature review of personality and clinical research, as well as new findings from our own research program. A clinical case study drawing on this narrative identity model is also presented with implications for treatment and research. © 2012 Wiley Periodicals, Inc.
Differential memory in the earth's magnetotail

NASA Technical Reports Server (NTRS)

Burkhart, G. R.; Chen, J.

1991-01-01

The process of 'differential memory' in the earth's magnetotail is studied in the framework of the modified Harris magnetotail geometry. It is verified that differential memory can generate non-Maxwellian features in the modified Harris field model. The time scales and the potentially observable distribution functions associated with the process of differential memory are investigated, and it is shown that non-Maxwelllian distributions can evolve as a test particle response to distribution function boundary conditions in a Harris field magnetotail model. The non-Maxwellian features which arise from distribution function mapping have definite time scales associated with them, which are generally shorter than the earthward convection time scale but longer than the typical Alfven crossing time.
Low Latency Messages on Distributed Memory Multiprocessors

DOE PAGES

Rosing, Matt; Saltz, Joel

1995-01-01

This article describes many of the issues in developing an efficient interface for communication on distributed memory machines. Although the hardware component of message latency is less than 1 ws on many distributed memory machines, the software latency associated with sending and receiving typed messages is on the order of 50 μs. The reason for this imbalance is that the software interface does not match the hardware. By changing the interface to match the hardware more closely, applications with fine grained communication can be put on these machines. This article describes several tests performed and many of the issues involvedmore » in supporting low latency messages on distributed memory machines.« less
Limits of the memory coefficient in measuring correlated bursts

NASA Astrophysics Data System (ADS)

Jo, Hang-Hyun; Hiraoka, Takayuki

2018-03-01

Temporal inhomogeneities in event sequences of natural and social phenomena have been characterized in terms of interevent times and correlations between interevent times. The inhomogeneities of interevent times have been extensively studied, while the correlations between interevent times, often called correlated bursts, are far from being fully understood. For measuring the correlated bursts, two relevant approaches were suggested, i.e., memory coefficient and burst size distribution. Here a burst size denotes the number of events in a bursty train detected for a given time window. Empirical analyses have revealed that the larger memory coefficient tends to be associated with the heavier tail of the burst size distribution. In particular, empirical findings in human activities appear inconsistent, such that the memory coefficient is close to 0, while burst size distributions follow a power law. In order to comprehend these observations, by assuming the conditional independence between consecutive interevent times, we derive the analytical form of the memory coefficient as a function of parameters describing interevent time and burst size distributions. Our analytical result can explain the general tendency of the larger memory coefficient being associated with the heavier tail of burst size distribution. We also find that the apparently inconsistent observations in human activities are compatible with each other, indicating that the memory coefficient has limits to measure the correlated bursts.
Working Memory in Children: A Time-Constrained Functioning Similar to Adults

ERIC Educational Resources Information Center

Portrat, Sophie; Camos, Valerie; Barrouillet, Pierre

2009-01-01

Within the time-based resource-sharing (TBRS) model, we tested a new conception of the relationships between processing and storage in which the core mechanisms of working memory (WM) are time constrained. However, our previous studies were restricted to adults. The current study aimed at demonstrating that these mechanisms are present and…
Close Associations and Memory in Brainwriting Groups

ERIC Educational Resources Information Center

Coskun, Hamit

2011-01-01

The present experiment examined whether or not the type of associations (close (e.g. apple-pear) and distant (e.g. apple-fish) word associations) and memory instruction (paying attention to the ideas of others) had effects on the idea generation performances in the brainwriting paradigm in which all participants shared their ideas by using paper…
Accumulating Evidence about What Prospective Memory Costs Actually Reveal

ERIC Educational Resources Information Center

Strickland, Luke; Heathcote, Andrew; Remington, Roger W.; Loft, Shayne

2017-01-01

Event-based prospective memory (PM) tasks require participants to substitute an atypical PM response for an ongoing task response when presented with PM targets. Responses to ongoing tasks are often slower with the addition of PM demands ("PM costs"). Prominent PM theories attribute costs to capacity-sharing between the ongoing and PM…
How communication goals determine when audience tuning biases memory.

PubMed

Echterhoff, Gerald; Higgins, E Tory; Kopietz, René; Groll, Stephan

2008-02-01

After tuning their message to suit their audience's attitude, communicators' own memories for the original information (e.g., a target person's behaviors) often reflect the biased view expressed in their message--producing an audience-congruent memory bias. Exploring the motivational circumstances of message production, the authors investigated whether this bias depends on the goals driving audience tuning. In 4 experiments, the memory bias was found to a greater extent when audience tuning served the creation of a shared reality than when it served alternative, nonshared reality goals (being polite toward a stigmatized-group audience; obtaining incentives; being entertaining; complying with a blatant demand). In addition, the authors found that these effects were mediated by the epistemic trust in the audience-congruent view but not by the rehearsal or accurate retrieval of the original input information, the ability to discriminate between the original and the message information, or a contrast away from extremely tuned messages. The central role of epistemic trust, a measure of the communicators' experience of shared reality, was supported in meta-analyses across the experiments. PsycINFO Database Record (c) 2008 APA, all rights reserved.
Multi-variants synthesis of Petri nets for FPGA devices

NASA Astrophysics Data System (ADS)

Bukowiec, Arkadiusz; Doligalski, Michał

2015-09-01

There is presented new method of synthesis of application specific logic controllers for FPGA devices. The specification of control algorithm is made with use of control interpreted Petri net (PT type). It allows specifying parallel processes in easy way. The Petri net is decomposed into state-machine type subnets. In this case, each subnet represents one parallel process. For this purpose there are applied algorithms of coloring of Petri nets. There are presented two approaches of such decomposition: with doublers of macroplaces or with one global wait place. Next, subnets are implemented into two-level logic circuit of the controller. The levels of logic circuit are obtained as a result of its architectural decomposition. The first level combinational circuit is responsible for generation of next places and second level decoder is responsible for generation output symbols. There are worked out two variants of such circuits: with one shared operational memory or with many flexible distributed memories as a decoder. Variants of Petri net decomposition and structures of logic circuits can be combined together without any restrictions. It leads to existence of four variants of multi-variants synthesis.
DOE Office of Scientific and Technical Information (OSTI.GOV)

Ghysels, Pieter; Li, Xiaoye S.; Rouet, Francois -Henry

Here, we present a sparse linear system solver that is based on a multifrontal variant of Gaussian elimination and exploits low-rank approximation of the resulting dense frontal matrices. We use hierarchically semiseparable (HSS) matrices, which have low-rank off-diagonal blocks, to approximate the frontal matrices. For HSS matrix construction, a randomized sampling algorithm is used together with interpolative decompositions. The combination of the randomized compression with a fast ULV HSS factoriz ation leads to a solver with lower computational complexity than the standard multifrontal method for many applications, resulting in speedups up to 7 fold for problems in our test suite.more » The implementation targets many-core systems by using task parallelism with dynamic runtime scheduling. Numerical experiments show performance improvements over state-of-the-art sparse direct solvers. The implementation achieves high performance and good scalability on a range of modern shared memory parallel systems, including the Intel Xeon Phi (MIC). The code is part of a software package called STRUMPACK - STRUctured Matrices PACKage, which also has a distributed memory component for dense rank-structured matrices.« less
An Efficient Multicore Implementation of a Novel HSS-Structured Multifrontal Solver Using Randomized Sampling

DOE PAGES

Ghysels, Pieter; Li, Xiaoye S.; Rouet, Francois -Henry; ...

2016-10-27

Here, we present a sparse linear system solver that is based on a multifrontal variant of Gaussian elimination and exploits low-rank approximation of the resulting dense frontal matrices. We use hierarchically semiseparable (HSS) matrices, which have low-rank off-diagonal blocks, to approximate the frontal matrices. For HSS matrix construction, a randomized sampling algorithm is used together with interpolative decompositions. The combination of the randomized compression with a fast ULV HSS factoriz ation leads to a solver with lower computational complexity than the standard multifrontal method for many applications, resulting in speedups up to 7 fold for problems in our test suite.more » The implementation targets many-core systems by using task parallelism with dynamic runtime scheduling. Numerical experiments show performance improvements over state-of-the-art sparse direct solvers. The implementation achieves high performance and good scalability on a range of modern shared memory parallel systems, including the Intel Xeon Phi (MIC). The code is part of a software package called STRUMPACK - STRUctured Matrices PACKage, which also has a distributed memory component for dense rank-structured matrices.« less
Notes on implementation of sparsely distributed memory

NASA Technical Reports Server (NTRS)

Keeler, J. D.; Denning, P. J.

1986-01-01

The Sparsely Distributed Memory (SDM) developed by Kanerva is an unconventional memory design with very interesting and desirable properties. The memory works in a manner that is closely related to modern theories of human memory. The SDM model is discussed in terms of its implementation in hardware. Two appendices discuss the unconventional approaches of the SDM: Appendix A treats a resistive circuit for fast, parallel address decoding; and Appendix B treats a systolic array for high throughput read and write operations.
Shared filtering processes link attentional and visual short-term memory capacity limits.

PubMed

Bettencourt, Katherine C; Michalka, Samantha W; Somers, David C

2011-09-30

Both visual attention and visual short-term memory (VSTM) have been shown to have capacity limits of 4 ± 1 objects, driving the hypothesis that they share a visual processing buffer. However, these capacity limitations also show strong individual differences, making the degree to which these capacities are related unclear. Moreover, other research has suggested a distinction between attention and VSTM buffers. To explore the degree to which capacity limitations reflect the use of a shared visual processing buffer, we compared individual subject's capacities on attentional and VSTM tasks completed in the same testing session. We used a multiple object tracking (MOT) and a VSTM change detection task, with varying levels of distractors, to measure capacity. Significant correlations in capacity were not observed between the MOT and VSTM tasks when distractor filtering demands differed between the tasks. Instead, significant correlations were seen when the tasks shared spatial filtering demands. Moreover, these filtering demands impacted capacity similarly in both attention and VSTM tasks. These observations fail to support the view that visual attention and VSTM capacity limits result from a shared buffer but instead highlight the role of the resource demands of underlying processes in limiting capacity.
A class Hierarchical, object-oriented approach to virtual memory management

NASA Technical Reports Server (NTRS)

Russo, Vincent F.; Campbell, Roy H.; Johnston, Gary M.

1989-01-01

The Choices family of operating systems exploits class hierarchies and object-oriented programming to facilitate the construction of customized operating systems for shared memory and networked multiprocessors. The software is being used in the Tapestry laboratory to study the performance of algorithms, mechanisms, and policies for parallel systems. Described here are the architectural design and class hierarchy of the Choices virtual memory management system. The software and hardware mechanisms and policies of a virtual memory system implement a memory hierarchy that exploits the trade-off between response times and storage capacities. In Choices, the notion of a memory hierarchy is captured by abstract classes. Concrete subclasses of those abstractions implement a virtual address space, segmentation, paging, physical memory management, secondary storage, and remote (that is, networked) storage. Captured in the notion of a memory hierarchy are classes that represent memory objects. These classes provide a storage mechanism that contains encapsulated data and have methods to read or write the memory object. Each of these classes provides specializations to represent the memory hierarchy.
Genetic and environmental contributions to the associations between intraindividual variability in reaction time and cognitive function.

PubMed

Finkel, Deborah; Pedersen, Nancy L

2014-01-01

Intraindividual variability (IIV) in reaction time has been related to cognitive decline, but questions remain about the nature of this relationship. Mean and range in movement and decision time for simple reaction time were available from 241 individuals aged 51-86 years at the fifth testing wave of the Swedish Adoption/Twin Study of Aging. Cognitive performance on four factors was also available: verbal, spatial, memory, and speed. Analyses indicated that range in reaction time could be used as an indicator of IIV. Heritability estimates were 35% for mean reaction and 20% for range in reaction. Multivariate analysis indicated that the genetic variance on the memory, speed, and spatial factors is shared with genetic variance for mean or range in reaction time. IIV shares significant genetic variance with fluid ability in late adulthood, over and above and genetic variance shared with mean reaction time.
A Study of Shared-Memory Mutual Exclusion Protocols Using CADP

NASA Astrophysics Data System (ADS)

Mateescu, Radu; Serwe, Wendelin

Mutual exclusion protocols are an essential building block of concurrent systems: indeed, such a protocol is required whenever a shared resource has to be protected against concurrent non-atomic accesses. Hence, many variants of mutual exclusion protocols exist in the shared-memory setting, such as Peterson's or Dekker's well-known protocols. Although the functional correctness of these protocols has been studied extensively, relatively little attention has been paid to their non-functional aspects, such as their performance in the long run. In this paper, we report on experiments with the performance evaluation of mutual exclusion protocols using Interactive Markov Chains. Steady-state analysis provides an additional criterion for comparing protocols, which complements the verification of their functional properties. We also carefully re-examined the functional properties, whose accurate formulation as temporal logic formulas in the action-based setting turns out to be quite involved.
Handling debugger breakpoints in a shared instruction system

DOEpatents

Gooding, Thomas Michael; Shok, Richard Michael

2014-01-21

A debugger debugs processes that execute shared instructions so that a breakpoint set for one process will not cause a breakpoint to occur in the other processes. A breakpoint is set by recording the original instruction at the desired location and writing a trap instruction to the shared instructions at that location. When a process encounters the breakpoint, the process passes control to the debugger for breakpoint processing if the breakpoint was set at that location for that process. If the trap was not set at that location for that process, the cacheline containing the trap is copied to a small scratchpad memory, and the virtual memory mappings are changed to translate the virtual address of the cacheline to the scratchpad. The original instruction is then written to replace the trap instruction in the scratchpad, so that process can execute the instructions in the scatchpad thereby avoiding the trap instruction.
An Army dentist in the combat zone during WWII.

PubMed

Orden, C Q

2001-11-01

It is 60 years since the bombing of Pearl Harbor and the outbreak of World War II for the United States. Some of the men and women who served in the armed forces at the time are willing to share some of their reminiscences with those of us who could not serve for one reason or another or who may not even have been born at the time. One of the dentists who is willing to share some of his memories is Dr. Charles Q. Orden of New York. Unless these people share their memories much history will be lost forever in the next years, and we will all be poorer for it. We sincerely thank Dr. Orden for his offer of information and for allowing us to reproduce Fig. 1 in which he is seen as the dentist using a field dental chair, a foot-powered drill, and with a black dental corpsman as his assistant.

WWWinda Orchestrator: a mechanism for coordinating distributed flocks of Java Applets

NASA Astrophysics Data System (ADS)

Gutfreund, Yechezkal-Shimon; Nicol, John R.

1997-01-01

The WWWinda Orchestrator is a simple but powerful tool for coordinating distributed Java applets. Loosely derived from the Linda programming language developed by David Gelernter and Nicholas Carriero of Yale, WWWinda implements a distributed shared object space called TupleSpace where applets can post, read, or permanently store arbitrary Java objects. In this manner, applets can easily share information without being aware of the underlying communication mechanisms. WWWinda is a very useful for orchestrating flocks of distributed Java applets. Coordination event scan be posted to WWWinda TupleSpace and used to orchestrate the actions of remote applets. Applets can easily share information via the TupleSpace. The technology combines several functions in one simple metaphor: distributed web objects, remote messaging between applets, distributed synchronization mechanisms, object- oriented database, and a distributed event signaling mechanisms. WWWinda can be used a s platform for implementing shared VRML environments, shared groupware environments, controlling remote devices such as cameras, distributed Karaoke, distributed gaming, and shared audio and video experiences.
Memory Systems Do Not Divide on Consciousness: Reinterpreting Memory in Terms of Activation and Binding

PubMed Central

Reder, Lynne M.; Park, Heekyeong; Kieffaber, Paul D.

2009-01-01

There is a popular hypothesis that performance on implicit and explicit memory tasks reflects 2 distinct memory systems. Explicit memory is said to store those experiences that can be consciously recollected, and implicit memory is said to store experiences and affect subsequent behavior but to be unavailable to conscious awareness. Although this division based on awareness is a useful taxonomy for memory tasks, the authors review the evidence that the unconscious character of implicit memory does not necessitate that it be treated as a separate system of human memory. They also argue that some implicit and explicit memory tasks share the same memory representations and that the important distinction is whether the task (implicit or explicit) requires the formation of a new association. The authors review and critique dissociations from the behavioral, amnesia, and neuroimaging literatures that have been advanced in support of separate explicit and implicit memory systems by highlighting contradictory evidence and by illustrating how the data can be accounted for using a simple computational memory model that assumes the same memory representation for those disparate tasks. PMID:19210052
Distributed memory compiler methods for irregular problems: Data copy reuse and runtime partitioning

NASA Technical Reports Server (NTRS)

Das, Raja; Ponnusamy, Ravi; Saltz, Joel; Mavriplis, Dimitri

1991-01-01

Outlined here are two methods which we believe will play an important role in any distributed memory compiler able to handle sparse and unstructured problems. We describe how to link runtime partitioners to distributed memory compilers. In our scheme, programmers can implicitly specify how data and loop iterations are to be distributed between processors. This insulates users from having to deal explicitly with potentially complex algorithms that carry out work and data partitioning. We also describe a viable mechanism for tracking and reusing copies of off-processor data. In many programs, several loops access the same off-processor memory locations. As long as it can be verified that the values assigned to off-processor memory locations remain unmodified, we show that we can effectively reuse stored off-processor data. We present experimental data from a 3-D unstructured Euler solver run on iPSC/860 to demonstrate the usefulness of our methods.
Portable programming on parallel/networked computers using the Application Portable Parallel Library (APPL)

NASA Technical Reports Server (NTRS)

Quealy, Angela; Cole, Gary L.; Blech, Richard A.

1993-01-01

The Application Portable Parallel Library (APPL) is a subroutine-based library of communication primitives that is callable from applications written in FORTRAN or C. APPL provides a consistent programmer interface to a variety of distributed and shared-memory multiprocessor MIMD machines. The objective of APPL is to minimize the effort required to move parallel applications from one machine to another, or to a network of homogeneous machines. APPL encompasses many of the message-passing primitives that are currently available on commercial multiprocessor systems. This paper describes APPL (version 2.3.1) and its usage, reports the status of the APPL project, and indicates possible directions for the future. Several applications using APPL are discussed, as well as performance and overhead results.
Flow of Emotional Messages in Artificial Social Networks

NASA Astrophysics Data System (ADS)

Chmiel, Anna; Hołyst, Janusz A.

Models of message flows in an artificial group of users communicating via the Internet are introduced and investigated using numerical simulations. We assumed that messages possess an emotional character with a positive valence and that the willingness to send the next affective message to a given person increases with the number of messages received from this person. As a result, the weights of links between group members evolve over time. Memory effects are introduced, taking into account that the preferential selection of message receivers depends on the communication intensity during the recent period only. We also model the phenomenon of secondary social sharing when the reception of an emotional e-mail triggers the distribution of several emotional e-mails to other people.
Maximal clique enumeration with data-parallel primitives

DOE Office of Scientific and Technical Information (OSTI.GOV)

Lessley, Brenton; Perciano, Talita; Mathai, Manish

The enumeration of all maximal cliques in an undirected graph is a fundamental problem arising in several research areas. We consider maximal clique enumeration on shared-memory, multi-core architectures and introduce an approach consisting entirely of data-parallel operations, in an effort to achieve efficient and portable performance across different architectures. We study the performance of the algorithm via experiments varying over benchmark graphs and architectures. Overall, we observe that our algorithm achieves up to a 33-time speedup and 9-time speedup over state-of-the-art distributed and serial algorithms, respectively, for graphs with higher ratios of maximal cliques to total cliques. Further, we attainmore » additional speedups on a GPU architecture, demonstrating the portable performance of our data-parallel design.« less
Teleoperated position control of a PUMA robot

NASA Technical Reports Server (NTRS)

Austin, Edmund; Fong, Chung P.

1987-01-01

A laboratory distributed computer control teleoperator system is developed to support NASA's future space telerobotic operation. This teleoperator system uses a universal force-reflecting hand controller in the local iste as the operator's input device. In the remote site, a PUMA controller recieves the Cartesian position commands and implements PID control laws to position the PUMA robot. The local site uses two microprocessors while the remote site uses three. The processors communicate with each other through shared memory. The PUMA robot controller was interfaced through custom made electronics to bypass VAL. The development status of this teleoperator system is reported. The execution time of each processor is analyzed, and the overall system throughput rate is reported. Methods to improve the efficiency and performance are discussed.
Distinct and shared cognitive functions mediate event- and time-based prospective memory impairment in normal ageing

PubMed Central

Gonneaud, Julie; Kalpouzos, Grégoria; Bon, Laetitia; Viader, Fausto; Eustache, Francis; Desgranges, Béatrice

2011-01-01

Prospective memory (PM) is the ability to remember to perform an action at a specific point in the future. Regarded as multidimensional, PM involves several cognitive functions that are known to be impaired in normal aging. In the present study, we set out to investigate the cognitive correlates of PM impairment in normal aging. Manipulating cognitive load, we assessed event- and time-based PM, as well as several cognitive functions, including executive functions, working memory and retrospective episodic memory, in healthy subjects covering the entire adulthood. We found that normal aging was characterized by PM decline in all conditions and that event-based PM was more sensitive to the effects of aging than time-based PM. Whatever the conditions, PM was linked to inhibition and processing speed. However, while event-based PM was mainly mediated by binding and retrospective memory processes, time-based PM was mainly related to inhibition. The only distinction between high- and low-load PM cognitive correlates lays in an additional, but marginal, correlation between updating and the high-load PM condition. The association of distinct cognitive functions, as well as shared mechanisms with event- and time-based PM confirms that each type of PM relies on a different set of processes. PMID:21678154
Neuropsychological characteristics of child and adolescent offspring of patients with schizophrenia or bipolar disorder.

PubMed

de la Serna, Elena; Sugranyes, Gisela; Sanchez-Gistau, Vanessa; Rodriguez-Toscano, Elisa; Baeza, Immaculada; Vila, Montserrat; Romero, Soledad; Sanchez-Gutierrez, Teresa; Penzol, Mª José; Moreno, Dolores; Castro-Fornieles, Josefina

2017-05-01

Schizophrenia (SZ) and bipolar disorder (BD) are considered neurobiological disorders which share some clinical, cognitive and neuroimaging characteristics. Studying child and adolescent offspring of patients diagnosed with bipolar disorder (BDoff) or schizophrenia (SZoff) is regarded as a reliable method for investigating early alterations and vulnerability factors for these disorders. This study compares the neuropsychological characteristics of SZoff, BDoff and a community control offspring group (CC) with the aim of examining shared and differential cognitive characteristics among groups. 41 SZoff, 90 BDoff and 107 CC were recruited. They were all assessed with a complete neuropsychological battery which included intelligence quotient, working memory (WM), processing speed, verbal memory and learning, visual memory, executive functions and sustained attention. SZoff and BDoff showed worse performance in some cognitive areas compared with CC. Some of these difficulties (visual memory) were common to both offspring groups, whereas others, such as verbal learning and WM in SZoff or PSI in BDoff, were group-specific. The cognitive difficulties in visual memory shown by both the SZoff and BDoff groups might point to a common endophenotype in the two disorders. Difficulties in other cognitive functions would be specific depending on the family diagnosis. Copyright © 2016 Elsevier B.V. All rights reserved.
Working memory costs of task switching.

PubMed

Liefooghe, Baptist; Barrouillet, Pierre; Vandierendonck, André; Camos, Valérie

2008-05-01

Although many accounts of task switching emphasize the importance of working memory as a substantial source of the switch cost, there is a lack of evidence demonstrating that task switching actually places additional demands on working memory. The present study addressed this issue by implementing task switching in continuous complex span tasks with strictly controlled time parameters. A series of 4 experiments demonstrate that recall performance decreased as a function of the number of task switches and that the concurrent load of item maintenance had no influence on task switching. These results indicate that task switching induces a cost on working memory functioning. Implications for theories of task switching, working memory, and resource sharing are addressed.
Implementation of a Fully-Balanced Periodic Tridiagonal Solver on a Parallel Distributed Memory Architecture

DTIC Science & Technology

1994-05-01

PARALLEL DISTRIBUTED MEMORY ARCHITECTURE LTJh T. M. Eidson 0 - 8 l 9 5 " G. Erlebacher _ _ _. _ DTIe QUALITY INSPECTED a Contract NAS I - 19480 May 1994...DISTRIBUTED MEMORY ARCHITECTURE T.M. Eidson * High Technology Corporation Hampton, VA 23665 G. Erlebachert Institute for Computer Applications in Science and...developed and evaluated. Simple model calculations as well as timing results are pres.nted to evaluate the various strategies. The particular
Examining procedural working memory processing in obsessive-compulsive disorder.

PubMed

Shahar, Nitzan; Teodorescu, Andrei R; Anholt, Gideon E; Karmon-Presser, Anat; Meiran, Nachshon

2017-07-01

Previous research has suggested that a deficit in working memory might underlie the difficulty of obsessive-compulsive disorder (OCD) patients to control their thoughts and actions. However, a recent meta-analyses found only small effect sizes for working memory deficits in OCD. Recently, a distinction has been made between declarative and procedural working memory. Working memory in OCD was tested mostly using declarative measurements. However, OCD symptoms typically concerns actions, making procedural working-memory more relevant. Here, we tested the operation of procedural working memory in OCD. Participants with OCD and healthy controls performed a battery of choice reaction tasks under high and low procedural working memory demands. Reaction-times (RT) were estimated using ex-Gaussian distribution fitting, revealing no group differences in the size of the RT distribution tail (i.e., τ parameter), known to be sensitive to procedural working memory manipulations. Group differences, unrelated to working memory manipulations, were found in the leading-edge of the RT distribution and analyzed using a two-stage evidence accumulation model. Modeling results suggested that perceptual difficulties might underlie the current group differences. In conclusion, our results suggest that procedural working-memory processing is most likely intact in OCD, and raise a novel, yet untested assumption regarding perceptual deficits in OCD. Copyright © 2017 Elsevier Ireland Ltd. All rights reserved.
Method of up-front load balancing for local memory parallel processors

NASA Technical Reports Server (NTRS)

Baffes, Paul Thomas (Inventor)

1990-01-01

In a parallel processing computer system with multiple processing units and shared memory, a method is disclosed for uniformly balancing the aggregate computational load in, and utilizing minimal memory by, a network having identical computations to be executed at each connection therein. Read-only and read-write memory are subdivided into a plurality of process sets, which function like artificial processing units. Said plurality of process sets is iteratively merged and reduced to the number of processing units without exceeding the balance load. Said merger is based upon the value of a partition threshold, which is a measure of the memory utilization. The turnaround time and memory savings of the instant method are functions of the number of processing units available and the number of partitions into which the memory is subdivided. Typical results of the preferred embodiment yielded memory savings of from sixty to seventy five percent.
Hippocampal and ventral medial prefrontal activation during retrieval-mediated learning supports novel inference.

PubMed

Zeithamova, Dagmar; Dominick, April L; Preston, Alison R

2012-07-12

Memory enables flexible use of past experience to inform new behaviors. Although leading theories hypothesize that this fundamental flexibility results from the formation of integrated memory networks relating multiple experiences, the neural mechanisms that support memory integration are not well understood. Here, we demonstrate that retrieval-mediated learning, whereby prior event details are reinstated during encoding of related experiences, supports participants' ability to infer relationships between distinct events that share content. Furthermore, we show that activation changes in a functionally coupled hippocampal and ventral medial prefrontal cortical circuit track the formation of integrated memories and successful inferential memory performance. These findings characterize the respective roles of these regions in retrieval-mediated learning processes that support relational memory network formation and inferential memory in the human brain. More broadly, these data reveal fundamental mechanisms through which memory representations are constructed into prospectively useful formats. Copyright © 2012 Elsevier Inc. All rights reserved.
Hippocampal and ventral medial prefrontal activation during retrieval-mediated learning supports novel inference

PubMed Central

Zeithamova, Dagmar; Dominick, April L.; Preston, Alison R.

2012-01-01

SUMMARY Memory enables flexible use of past experience to inform new behaviors. Though leading theories hypothesize that this fundamental flexibility results from the formation of integrated memory networks relating multiple experiences, the neural mechanisms that support memory integration are not well understood. Here, we demonstrate that retrieval-mediated learning, whereby prior event details are reinstated during encoding of related experiences, supports participants’ ability to infer relationships between distinct events that share content. Furthermore, we show that activation changes in a functionally coupled hippocampal and ventral medial prefrontal cortical circuit track the formation of integrated memories and successful inferential memory performance. These findings characterize the respective roles of these regions in retrieval-mediated learning processes that support relational memory network formation and inferential memory in the human brain. More broadly, these data reveal fundamental mechanisms through which memory representations are constructed into prospectively useful formats. PMID:22794270
Jim Thomas: A Collection of Memories

DOE Office of Scientific and Technical Information (OSTI.GOV)

Wong, Pak C.

Jim Thomas, a guest editor and a long-time associate editor of Information Visualization (IVS), died in Richland, WA, on August 6, 2010 due to complications from a brain tumor. His friends and colleagues from around the world have since expressed their sadness and paid tribute to a visionary scientist in multiple public forums. For those who didn't get the chance to know Jim, I share a collection of my own memories of Jim Thomas and memories from some of his colleagues.
Effects of motor congruence on visual working memory.

PubMed

Quak, Michel; Pecher, Diane; Zeelenberg, Rene

2014-10-01

Grounded-cognition theories suggest that memory shares processing resources with perception and action. The motor system could be used to help memorize visual objects. In two experiments, we tested the hypothesis that people use motor affordances to maintain object representations in working memory. Participants performed a working memory task on photographs of manipulable and nonmanipulable objects. The manipulable objects were objects that required either a precision grip (i.e., small items) or a power grip (i.e., large items) to use. A concurrent motor task that could be congruent or incongruent with the manipulable objects caused no difference in working memory performance relative to nonmanipulable objects. Moreover, the precision- or power-grip motor task did not affect memory performance on small and large items differently. These findings suggest that the motor system plays no part in visual working memory.
Many Roads Lead to Recognition: Electrophysiological Correlates of Familiarity Derived from Short-Term Masked Repetition Priming

ERIC Educational Resources Information Center

Lucas, Heather D.; Taylor, Jason R.; Henson, Richard N.; Paller, Ken A.

2012-01-01

The neural mechanisms that underlie familiarity memory have been extensively investigated, but a consensus understanding remains elusive. Behavioral evidence suggests that familiarity sometimes shares sources with instances of implicit memory known as priming, in that the same increases in processing fluency that give rise to priming can engender…
Forward Association, Backward Association, and the False-Memory Illusion

ERIC Educational Resources Information Center

Brainerd, C. J.; Wright, Ron

2005-01-01

In the Deese-Roediger-McDermott false-memory illusion, forward associative strength (FAS) is unrelated to the strength of the illusion; this is puzzling, because high-FAS lists ought to share more semantic features with critical unpresented words than should low-FAS lists. The authors show that this null result is probably a truncated range…
The Impact of Storage on Processing: How Is Information Maintained in Working Memory?

ERIC Educational Resources Information Center

Vergauwe, Evie; Camos, Valérie; Barrouillet, Pierre

2014-01-01

Working memory is typically defined as a system devoted to the simultaneous maintenance and processing of information. However, the interplay between these 2 functions is still a matter of debate in the literature, with views ranging from complete independence to complete dependence. The time-based resource-sharing model assumes that a central…

Selecting Learning Tasks: Effects of Adaptation and Shared Control on Learning Efficiency and Task Involvement

ERIC Educational Resources Information Center

Corbalan, Gemma; Kester, Liesbeth; van Merrienboer, Jeroen J. G.

2008-01-01

Complex skill acquisition by performing authentic learning tasks is constrained by limited working memory capacity [Baddeley, A. D. (1992). Working memory. "Science, 255", 556-559]. To prevent cognitive overload, task difficulty and support of each newly selected learning task can be adapted to the learner's competence level and perceived task…
Android Protection Mechanism: A Signed Code Security Mechanism for Smartphone Applications

DTIC Science & Technology

2011-03-01

status registers, exceptions, endian support, unaligned access support, synchronization primitives , the Jazelle Extension, and saturated integer...supports comprehensive non-blocking shared-memory synchronization primitives that scale for multiple-processor system designs. This is an improvement... synchronization . Memory semaphores can be loaded and altered without interruption because the load and store operations are atomic. Processor
Senior Citizens' Personal Stories...Literacy through Narrative...Sharing the Richness of the Past.

ERIC Educational Resources Information Center

Lineberry, Colleen

Using simple writing strategies, senior citizens at an elder camp workshop collected memories in journals. In some cases, readings were used to trigger memories. The exercise enabled students to make connections between their own life experiences and the life experiences of others. Workshops encouraging participants to tell their own stories for…
Processes and Content of Narrative Identity Development in Adolescence: Gender and Well-Being

ERIC Educational Resources Information Center

McLean, Kate C.; Breen, Andrea V.

2009-01-01

The present study examined narrative identity in adolescence (14-18 years) in terms of narrative content and processes of identity development. Age- and gender-related differences in narrative patterns in turning point memories and gender differences in the content and functions for sharing those memories were examined, as was the relationship…
The Performance of a Lifetime

ERIC Educational Resources Information Center

Burdette, Kimberly

2007-01-01

In this article, the author recalls and shares the first half of her college journey. Her memories do not play back to her in bursts of sounds or colors; friends or lovers; feelings, touches, tastes, or ideas. They play, rather, as silent images of herself that flicker disjointedly across her mind, the lens of her memory having recorded her…
Schools of the Past: A Treasury of Photographs. Fastback 80.

ERIC Educational Resources Information Center

Davis, O. L., Jr.

The experience of schooling in America is recalled through a memory-sharing essay and an album of photographs. The intent of the article is to prompt readers to remember their personal schooling experiences and relate them to the larger framework of national memories. The essay, focusing on schools at the turn of the 20th century, discusses…
Shared Etiology of Phonological Memory and Vocabulary Deficits in School-Age Children

ERIC Educational Resources Information Center

Peterson, Robin L.; Pennington, Bruce F.; Samuelsson, Stefan; Byrne, Brian; Olson, Richard K.

2013-01-01

Purpose: The goal of this study was to investigate the etiologic basis for the association between deficits in phonological memory (PM) and vocabulary in school-age children. Method: Children with deficits in PM or vocabulary were identified within the International Longitudinal Twin Study (ILTS; Samuelsson et al., 2005). The ILTS includes 1,045…
The CA3 Network as a Memory Store for Spatial Representations

ERIC Educational Resources Information Center

Papp, Gergely; Witter, Menno P.; Treves, Alessandro

2007-01-01

Comparative neuroanatomy suggests that the CA3 region of the mammalian hippocampus is directly homologous with the medio-dorsal pallium in birds and reptiles, with which it largely shares the basic organization of primitive cortex. Autoassociative memory models, which are generically applicable to cortical networks, then help assess how well CA3…
Memories Are Made of This

ERIC Educational Resources Information Center

Chang, Christine

2010-01-01

In this article, the author shares her memories of Sally Smith, the founder of The Lab School of Washington, where she works as the director of the Occupational Therapy. When the author first met Smith, Smith asked her what brought her to The Lab School at that point in her career. She told Smith that her background was rather eclectic, since she…
They're Why We're Here

ERIC Educational Resources Information Center

Razook, Nim

2009-01-01

The author began teaching at the University of Oklahoma in the late 1970s. In this article, the author shares two memories of those times on campus. The first was looking out his office window and seeing Iranian students marching on campus, shouting, "The Shah is a Fascist Pig." The second memory provoked this paper. It made the author…
Rearview Memories

ERIC Educational Resources Information Center

Gross, Gwen E.

2008-01-01

In this article, the author shares her experience when she was still a student until she became a superintendent. In her 17th year in the superintendency, the author finds the joys of her work all around her, grateful to be bestowed with the gift of leadership. She shares with colleagues a few especially meaningful moments from her professional…
Episodic memory in aspects of large-scale brain networks

PubMed Central

Jeong, Woorim; Chung, Chun Kee; Kim, June Sic

2015-01-01

Understanding human episodic memory in aspects of large-scale brain networks has become one of the central themes in neuroscience over the last decade. Traditionally, episodic memory was regarded as mostly relying on medial temporal lobe (MTL) structures. However, recent studies have suggested involvement of more widely distributed cortical network and the importance of its interactive roles in the memory process. Both direct and indirect neuro-modulations of the memory network have been tried in experimental treatments of memory disorders. In this review, we focus on the functional organization of the MTL and other neocortical areas in episodic memory. Task-related neuroimaging studies together with lesion studies suggested that specific sub-regions of the MTL are responsible for specific components of memory. However, recent studies have emphasized that connectivity within MTL structures and even their network dynamics with other cortical areas are essential in the memory process. Resting-state functional network studies also have revealed that memory function is subserved by not only the MTL system but also a distributed network, particularly the default-mode network (DMN). Furthermore, researchers have begun to investigate memory networks throughout the entire brain not restricted to the specific resting-state network (RSN). Altered patterns of functional connectivity (FC) among distributed brain regions were observed in patients with memory impairments. Recently, studies have shown that brain stimulation may impact memory through modulating functional networks, carrying future implications of a novel interventional therapy for memory impairment. PMID:26321939
Modeling Confidence and Response Time in Recognition Memory

ERIC Educational Resources Information Center

Ratcliff, Roger; Starns, Jeffrey J.

2009-01-01

A new model for confidence judgments in recognition memory is presented. In the model, the match between a single test item and memory produces a distribution of evidence, with better matches corresponding to distributions with higher means. On this match dimension, confidence criteria are placed, and the areas between the criteria under the…
Ageing-related stereotypes in memory: When the beliefs come true.

PubMed

Bouazzaoui, Badiâa; Follenfant, Alice; Ric, François; Fay, Séverine; Croizet, Jean-Claude; Atzeni, Thierry; Taconnat, Laurence

2016-01-01

Age-related stereotype concerns culturally shared beliefs about the inevitable decline of memory with age. In this study, stereotype priming and stereotype threat manipulations were used to explore the impact of age-related stereotype on metamemory beliefs and episodic memory performance. Ninety-two older participants who reported the same perceived memory functioning were divided into two groups: a threatened group and a non-threatened group (control). First, the threatened group was primed with an ageing stereotype questionnaire. Then, both groups were administered memory complaints and memory self-efficacy questionnaires to measure metamemory beliefs. Finally, both groups were administered the Logical Memory task to measure episodic memory, for the threatened group the instructions were manipulated to enhance the stereotype threat. Results indicated that the threatened individuals reported more memory complaints and less memory efficacy, and had lower scores than the control group on the logical memory task. A multiple mediation analysis revealed that the stereotype threat effect on the episodic memory performance was mediated by both memory complaints and memory self-efficacy. This study revealed that stereotype threat impacts belief in one's own memory functioning, which in turn impairs episodic memory performance.
Efficient ICCG on a shared memory multiprocessor

NASA Technical Reports Server (NTRS)

Hammond, Steven W.; Schreiber, Robert

1989-01-01

Different approaches are discussed for exploiting parallelism in the ICCG (Incomplete Cholesky Conjugate Gradient) method for solving large sparse symmetric positive definite systems of equations on a shared memory parallel computer. Techniques for efficiently solving triangular systems and computing sparse matrix-vector products are explored. Three methods for scheduling the tasks in solving triangular systems are implemented on the Sequent Balance 21000. Sample problems that are representative of a large class of problems solved using iterative methods are used. We show that a static analysis to determine data dependences in the triangular solve can greatly improve its parallel efficiency. We also show that ignoring symmetry and storing the whole matrix can reduce solution time substantially.
MPF: A portable message passing facility for shared memory multiprocessors

NASA Technical Reports Server (NTRS)

Malony, Allen D.; Reed, Daniel A.; Mcguire, Patrick J.

1987-01-01

The design, implementation, and performance evaluation of a message passing facility (MPF) for shared memory multiprocessors are presented. The MPF is based on a message passing model conceptually similar to conversations. Participants (parallel processors) can enter or leave a conversation at any time. The message passing primitives for this model are implemented as a portable library of C function calls. The MPF is currently operational on a Sequent Balance 21000, and several parallel applications were developed and tested. Several simple benchmark programs are presented to establish interprocess communication performance for common patterns of interprocess communication. Finally, performance figures are presented for two parallel applications, linear systems solution, and iterative solution of partial differential equations.
Which neuropsychological functions predict various processing speed components in children with and without attention-deficit/hyperactivity disorder?

PubMed

Vadnais, Sarah A; Kibby, Michelle Y; Jagger-Rickels, Audreyana C

2018-01-01

We identified statistical predictors of four processing speed (PS) components in a sample of 151 children with and without attention-deficit/hyperactivity disorder (ADHD). Performance on perceptual speed was predicted by visual attention/short-term memory, whereas incidental learning/psychomotor speed was predicted by verbal working memory. Rapid naming was predictive of each PS component assessed, and inhibition predicted all but one task, suggesting a shared need to identify/retrieve stimuli rapidly and inhibit incorrect responding across PS components. Hence, we found both shared and unique predictors of perceptual, cognitive, and output speed, suggesting more specific terminology should be used in future research on PS in ADHD.
Parallel k-means++ for Multiple Shared-Memory Architectures

DOE Office of Scientific and Technical Information (OSTI.GOV)

Mackey, Patrick S.; Lewis, Robert R.

2016-09-22

In recent years k-means++ has become a popular initialization technique for improved k-means clustering. To date, most of the work done to improve its performance has involved parallelizing algorithms that are only approximations of k-means++. In this paper we present a parallelization of the exact k-means++ algorithm, with a proof of its correctness. We develop implementations for three distinct shared-memory architectures: multicore CPU, high performance GPU, and the massively multithreaded Cray XMT platform. We demonstrate the scalability of the algorithm on each platform. In addition we present a visual approach for showing which platform performed k-means++ the fastest for varyingmore » data sizes.« less
Shared virtual memory and generalized speedup

NASA Technical Reports Server (NTRS)

Sun, Xian-He; Zhu, Jianping

1994-01-01

Generalized speedup is defined as parallel speed over sequential speed. The generalized speedup and its relation with other existing performance metrics, such as traditional speedup, efficiency, scalability, etc., are carefully studied. In terms of the introduced asymptotic speed, it was shown that the difference between the generalized speedup and the traditional speedup lies in the definition of the efficiency of uniprocessor processing, which is a very important issue in shared virtual memory machines. A scientific application was implemented on a KSR-1 parallel computer. Experimental and theoretical results show that the generalized speedup is distinct from the traditional speedup and provides a more reasonable measurement. In the study of different speedups, various causes of superlinear speedup are also presented.
Error recovery in shared memory multiprocessors using private caches

NASA Technical Reports Server (NTRS)

Wu, Kun-Lung; Fuchs, W. Kent; Patel, Janak H.

1990-01-01

The problem of recovering from processor transient faults in shared memory multiprocesses systems is examined. A user-transparent checkpointing and recovery scheme using private caches is presented. Processes can recover from errors due to faulty processors by restarting from the checkpointed computation state. Implementation techniques using checkpoint identifiers and recovery stacks are examined as a means of reducing performance degradation in processor utilization during normal execution. This cache-based checkpointing technique prevents rollback propagation, provides rapid recovery, and can be integrated into standard cache coherence protocols. An analytical model is used to estimate the relative performance of the scheme during normal execution. Extensions to take error latency into account are presented.

Towards memory-aware services and browsing through lifelogging sensing.

PubMed

Arcega, Lorena; Font, Jaime; Cetina, Carlos

2013-11-05

Every day we receive lots of information through our senses that is lost forever, because it lacked the strength or the repetition needed to generate a lasting memory. Combining the emerging Internet of Things and lifelogging sensors, we believe it is possible to build up a Digital Memory (Dig-Mem) in order to complement the fallible memory of people. This work shows how to realize the Dig-Mem in terms of interactions, affinities, activities, goals and protocols. We also complement this Dig-Mem with memory-aware services and a Dig-Mem browser. Furthermore, we propose a RFID Tag-Sharing technique to speed up the adoption of Dig-Mem. Experimentation reveals an improvement of the user understanding of Dig-Mem as time passes, compared to natural memories where the level of detail decreases over time.
A neural network model of semantic memory linking feature-based object representation and words.

PubMed

Cuppini, C; Magosso, E; Ursino, M

2009-06-01

Recent theories in cognitive neuroscience suggest that semantic memory is a distributed process, which involves many cortical areas and is based on a multimodal representation of objects. The aim of this work is to extend a previous model of object representation to realize a semantic memory, in which sensory-motor representations of objects are linked with words. The model assumes that each object is described as a collection of features, coded in different cortical areas via a topological organization. Features in different objects are segmented via gamma-band synchronization of neural oscillators. The feature areas are further connected with a lexical area, devoted to the representation of words. Synapses among the feature areas, and among the lexical area and the feature areas are trained via a time-dependent Hebbian rule, during a period in which individual objects are presented together with the corresponding words. Simulation results demonstrate that, during the retrieval phase, the network can deal with the simultaneous presence of objects (from sensory-motor inputs) and words (from acoustic inputs), can correctly associate objects with words and segment objects even in the presence of incomplete information. Moreover, the network can realize some semantic links among words representing objects with shared features. These results support the idea that semantic memory can be described as an integrated process, whose content is retrieved by the co-activation of different multimodal regions. In perspective, extended versions of this model may be used to test conceptual theories, and to provide a quantitative assessment of existing data (for instance concerning patients with neural deficits).
Implementing Molecular Dynamics on Hybrid High Performance Computers - Three-Body Potentials

DOE Office of Scientific and Technical Information (OSTI.GOV)

Brown, W Michael; Yamada, Masako

The use of coprocessors or accelerators such as graphics processing units (GPUs) has become popular in scientific computing applications due to their low cost, impressive floating-point capabilities, high memory bandwidth, and low electrical power re- quirements. Hybrid high-performance computers, defined as machines with nodes containing more than one type of floating-point processor (e.g. CPU and GPU), are now becoming more prevalent due to these advantages. Although there has been extensive research into methods to efficiently use accelerators to improve the performance of molecular dynamics (MD) employing pairwise potential energy models, little is reported in the literature for models that includemore » many-body effects. 3-body terms are required for many popular potentials such as MEAM, Tersoff, REBO, AIREBO, Stillinger-Weber, Bond-Order Potentials, and others. Because the per-atom simulation times are much higher for models incorporating 3-body terms, there is a clear need for efficient algo- rithms usable on hybrid high performance computers. Here, we report a shared-memory force-decomposition for 3-body potentials that avoids memory conflicts to allow for a deterministic code with substantial performance improvements on hybrid machines. We describe modifications necessary for use in distributed memory MD codes and show results for the simulation of water with Stillinger-Weber on the hybrid Titan supercomputer. We compare performance of the 3-body model to the SPC/E water model when using accelerators. Finally, we demonstrate that our approach can attain a speedup of 5.1 with acceleration on Titan for production simulations to study water droplet freezing on a surface.« less
Parallel Clustering Algorithm for Large-Scale Biological Data Sets

PubMed Central

Wang, Minchao; Zhang, Wu; Ding, Wang; Dai, Dongbo; Zhang, Huiran; Xie, Hao; Chen, Luonan; Guo, Yike; Xie, Jiang

2014-01-01

Backgrounds Recent explosion of biological data brings a great challenge for the traditional clustering algorithms. With increasing scale of data sets, much larger memory and longer runtime are required for the cluster identification problems. The affinity propagation algorithm outperforms many other classical clustering algorithms and is widely applied into the biological researches. However, the time and space complexity become a great bottleneck when handling the large-scale data sets. Moreover, the similarity matrix, whose constructing procedure takes long runtime, is required before running the affinity propagation algorithm, since the algorithm clusters data sets based on the similarities between data pairs. Methods Two types of parallel architectures are proposed in this paper to accelerate the similarity matrix constructing procedure and the affinity propagation algorithm. The memory-shared architecture is used to construct the similarity matrix, and the distributed system is taken for the affinity propagation algorithm, because of its large memory size and great computing capacity. An appropriate way of data partition and reduction is designed in our method, in order to minimize the global communication cost among processes. Result A speedup of 100 is gained with 128 cores. The runtime is reduced from serval hours to a few seconds, which indicates that parallel algorithm is capable of handling large-scale data sets effectively. The parallel affinity propagation also achieves a good performance when clustering large-scale gene data (microarray) and detecting families in large protein superfamilies. PMID:24705246
Implementing Molecular Dynamics for Hybrid High Performance Computers - 1. Short Range Forces

DOE Office of Scientific and Technical Information (OSTI.GOV)

Brown, W Michael; Wang, Peng; Plimpton, Steven J

The use of accelerators such as general-purpose graphics processing units (GPGPUs) have become popular in scientific computing applications due to their low cost, impressive floating-point capabilities, high memory bandwidth, and low electrical power requirements. Hybrid high performance computers, machines with more than one type of floating-point processor, are now becoming more prevalent due to these advantages. In this work, we discuss several important issues in porting a large molecular dynamics code for use on parallel hybrid machines - 1) choosing a hybrid parallel decomposition that works on central processing units (CPUs) with distributed memory and accelerator cores with shared memory,more » 2) minimizing the amount of code that must be ported for efficient acceleration, 3) utilizing the available processing power from both many-core CPUs and accelerators, and 4) choosing a programming model for acceleration. We present our solution to each of these issues for short-range force calculation in the molecular dynamics package LAMMPS. We describe algorithms for efficient short range force calculation on hybrid high performance machines. We describe a new approach for dynamic load balancing of work between CPU and accelerator cores. We describe the Geryon library that allows a single code to compile with both CUDA and OpenCL for use on a variety of accelerators. Finally, we present results on a parallel test cluster containing 32 Fermi GPGPUs and 180 CPU cores.« less
Common capacity-limited neural mechanisms of selective attention and spatial working memory encoding

PubMed Central

Fusser, Fabian; Linden, David E J; Rahm, Benjamin; Hampel, Harald; Haenschel, Corinna; Mayer, Jutta S

2011-01-01

One characteristic feature of visual working memory (WM) is its limited capacity, and selective attention has been implicated as limiting factor. A possible reason why attention constrains the number of items that can be encoded into WM is that the two processes share limited neural resources. Functional magnetic resonance imaging (fMRI) studies have indeed demonstrated commonalities between the neural substrates of WM and attention. Here we investigated whether such overlapping activations reflect interacting neural mechanisms that could result in capacity limitations. To independently manipulate the demands on attention and WM encoding within one single task, we combined visual search and delayed discrimination of spatial locations. Participants were presented with a search array and performed easy or difficult visual search in order to encode one, three or five positions of target items into WM. Our fMRI data revealed colocalised activation for attention-demanding visual search and WM encoding in distributed posterior and frontal regions. However, further analysis yielded two patterns of results. Activity in prefrontal regions increased additively with increased demands on WM and attention, indicating regional overlap without functional interaction. Conversely, the WM load-dependent activation in visual, parietal and premotor regions was severely reduced during high attentional demand. We interpret this interaction as indicating the sites of shared capacity-limited neural resources. Our findings point to differential contributions of prefrontal and posterior regions to the common neural mechanisms that support spatial WM encoding and attention, providing new imaging evidence for attention-based models of WM encoding. PMID:21781193
Fully device-independent quantum key distribution.

PubMed

Vazirani, Umesh; Vidick, Thomas

2014-10-03

Quantum cryptography promises levels of security that are impossible to replicate in a classical world. Can this security be guaranteed even when the quantum devices on which the protocol relies are untrusted? This central question dates back to the early 1990s when the challenge of achieving device-independent quantum key distribution was first formulated. We answer this challenge by rigorously proving the device-independent security of a slight variant of Ekert's original entanglement-based protocol against the most general (coherent) attacks. The resulting protocol is robust: While assuming only that the devices can be modeled by the laws of quantum mechanics and are spatially isolated from each other and from any adversary's laboratory, it achieves a linear key rate and tolerates a constant noise rate in the devices. In particular, the devices may have quantum memory and share arbitrary quantum correlations with the eavesdropper. The proof of security is based on a new quantitative understanding of the monogamous nature of quantum correlations in the context of a multiparty protocol.
Automatic Parallelization of Numerical Python Applications using the Global Arrays Toolkit

DOE Office of Scientific and Technical Information (OSTI.GOV)

Daily, Jeffrey A.; Lewis, Robert R.

2011-11-30

Global Arrays is a software system from Pacific Northwest National Laboratory that enables an efficient, portable, and parallel shared-memory programming interface to manipulate distributed dense arrays. The NumPy module is the de facto standard for numerical calculation in the Python programming language, a language whose use is growing rapidly in the scientific and engineering communities. NumPy provides a powerful N-dimensional array class as well as other scientific computing capabilities. However, like the majority of the core Python modules, NumPy is inherently serial. Using a combination of Global Arrays and NumPy, we have reimplemented NumPy as a distributed drop-in replacement calledmore » Global Arrays in NumPy (GAiN). Serial NumPy applications can become parallel, scalable GAiN applications with only minor source code changes. Scalability studies of several different GAiN applications will be presented showing the utility of developing serial NumPy codes which can later run on more capable clusters or supercomputers.« less
Developing Smartphone Apps for Education, Outreach, Science, and Engineering

NASA Astrophysics Data System (ADS)

Weatherwax, A. T.; Fitzsimmons, Z.; Czajkowski, J.; Breimer, E.; Hellman, S. B.; Hunter, S.; Dematteo, J.; Savery, T.; Melsert, K.; Sneeringer, J.

2010-12-01

The increased popularity of mobile phone apps provide scientists with a new avenue for sharing and distributing data and knowledge with colleagues, while also providing meaningful education and outreach products for consumption by the general public. Our initial development of iPhone and Android apps centered on the distribution of exciting auroral images taken at the South Pole for education and outreach purposes. These portable platforms, with limited resources when compared to computers, presented a unique set of design and implementation challenges that we will discuss in this presentation. For example, the design must account for limited memory; screen size; processing power; battery life; and potentially high data transport costs. Some of these unique requirements created an environment that enabled undergraduate and high-school students to participate in the creation of these apps. Additionally, during development it became apparent that these apps could also serve as data analysis and engineering tools. Our presentation will further discuss our plans to use apps not only for Education and Public Outreach, but for teaching, science and engineering.
Fully Device-Independent Quantum Key Distribution

NASA Astrophysics Data System (ADS)

Vazirani, Umesh; Vidick, Thomas

2014-10-01

Quantum cryptography promises levels of security that are impossible to replicate in a classical world. Can this security be guaranteed even when the quantum devices on which the protocol relies are untrusted? This central question dates back to the early 1990s when the challenge of achieving device-independent quantum key distribution was first formulated. We answer this challenge by rigorously proving the device-independent security of a slight variant of Ekert's original entanglement-based protocol against the most general (coherent) attacks. The resulting protocol is robust: While assuming only that the devices can be modeled by the laws of quantum mechanics and are spatially isolated from each other and from any adversary's laboratory, it achieves a linear key rate and tolerates a constant noise rate in the devices. In particular, the devices may have quantum memory and share arbitrary quantum correlations with the eavesdropper. The proof of security is based on a new quantitative understanding of the monogamous nature of quantum correlations in the context of a multiparty protocol.
Multiprocessor architecture: Synthesis and evaluation

NASA Technical Reports Server (NTRS)

Standley, Hilda M.

1990-01-01

Multiprocessor computed architecture evaluation for structural computations is the focus of the research effort described. Results obtained are expected to lead to more efficient use of existing architectures and to suggest designs for new, application specific, architectures. The brief descriptions given outline a number of related efforts directed toward this purpose. The difficulty is analyzing an existing architecture or in designing a new computer architecture lies in the fact that the performance of a particular architecture, within the context of a given application, is determined by a number of factors. These include, but are not limited to, the efficiency of the computation algorithm, the programming language and support environment, the quality of the program written in the programming language, the multiplicity of the processing elements, the characteristics of the individual processing elements, the interconnection network connecting processors and non-local memories, and the shared memory organization covering the spectrum from no shared memory (all local memory) to one global access memory. These performance determiners may be loosely classified as being software or hardware related. This distinction is not clear or even appropriate in many cases. The effect of the choice of algorithm is ignored by assuming that the algorithm is specified as given. Effort directed toward the removal of the effect of the programming language and program resulted in the design of a high-level parallel programming language. Two characteristics of the fundamental structure of the architecture (memory organization and interconnection network) are examined.
Frequent Statement and Dereference Elimination for Imperative and Object-Oriented Distributed Programs

PubMed Central

El-Zawawy, Mohamed A.

2014-01-01

This paper introduces new approaches for the analysis of frequent statement and dereference elimination for imperative and object-oriented distributed programs running on parallel machines equipped with hierarchical memories. The paper uses languages whose address spaces are globally partitioned. Distributed programs allow defining data layout and threads writing to and reading from other thread memories. Three type systems (for imperative distributed programs) are the tools of the proposed techniques. The first type system defines for every program point a set of calculated (ready) statements and memory accesses. The second type system uses an enriched version of types of the first type system and determines which of the ready statements and memory accesses are used later in the program. The third type system uses the information gather so far to eliminate unnecessary statement computations and memory accesses (the analysis of frequent statement and dereference elimination). Extensions to these type systems are also presented to cover object-oriented distributed programs. Two advantages of our work over related work are the following. The hierarchical style of concurrent parallel computers is similar to the memory model used in this paper. In our approach, each analysis result is assigned a type derivation (serves as a correctness proof). PMID:24892098
Early Experiences Writing Performance Portable OpenMP 4 Codes

DOE Office of Scientific and Technical Information (OSTI.GOV)

Joubert, Wayne; Hernandez, Oscar R

In this paper, we evaluate the recently available directives in OpenMP 4 to parallelize a computational kernel using both the traditional shared memory approach and the newer accelerator targeting capabilities. In addition, we explore various transformations that attempt to increase application performance portability, and examine the expressiveness and performance implications of using these approaches. For example, we want to understand if the target map directives in OpenMP 4 improve data locality when mapped to a shared memory system, as opposed to the traditional first touch policy approach in traditional OpenMP. To that end, we use recent Cray and Intel compilersmore » to measure the performance variations of a simple application kernel when executed on the OLCF s Titan supercomputer with NVIDIA GPUs and the Beacon system with Intel Xeon Phi accelerators attached. To better understand these trade-offs, we compare our results from traditional OpenMP shared memory implementations to the newer accelerator programming model when it is used to target both the CPU and an attached heterogeneous device. We believe the results and lessons learned as presented in this paper will be useful to the larger user community by providing guidelines that can assist programmers in the development of performance portable code.« less
The Automatic Parallelisation of Scientific Application Codes Using a Computer Aided Parallelisation Toolkit

NASA Technical Reports Server (NTRS)

Ierotheou, C.; Johnson, S.; Leggett, P.; Cross, M.; Evans, E.; Jin, Hao-Qiang; Frumkin, M.; Yan, J.; Biegel, Bryan (Technical Monitor)

2001-01-01

The shared-memory programming model is a very effective way to achieve parallelism on shared memory parallel computers. Historically, the lack of a programming standard for using directives and the rather limited performance due to scalability have affected the take-up of this programming model approach. Significant progress has been made in hardware and software technologies, as a result the performance of parallel programs with compiler directives has also made improvements. The introduction of an industrial standard for shared-memory programming with directives, OpenMP, has also addressed the issue of portability. In this study, we have extended the computer aided parallelization toolkit (developed at the University of Greenwich), to automatically generate OpenMP based parallel programs with nominal user assistance. We outline the way in which loop types are categorized and how efficient OpenMP directives can be defined and placed using the in-depth interprocedural analysis that is carried out by the toolkit. We also discuss the application of the toolkit on the NAS Parallel Benchmarks and a number of real-world application codes. This work not only demonstrates the great potential of using the toolkit to quickly parallelize serial programs but also the good performance achievable on up to 300 processors for hybrid message passing and directive-based parallelizations.
Probability distribution of extreme share returns in Malaysia

NASA Astrophysics Data System (ADS)

Zin, Wan Zawiah Wan; Safari, Muhammad Aslam Mohd; Jaaman, Saiful Hafizah; Yie, Wendy Ling Shin

2014-09-01

The objective of this study is to investigate the suitable probability distribution to model the extreme share returns in Malaysia. To achieve this, weekly and monthly maximum daily share returns are derived from share prices data obtained from Bursa Malaysia over the period of 2000 to 2012. The study starts with summary statistics of the data which will provide a clue on the likely candidates for the best fitting distribution. Next, the suitability of six extreme value distributions, namely the Gumbel, Generalized Extreme Value (GEV), Generalized Logistic (GLO) and Generalized Pareto (GPA), the Lognormal (GNO) and the Pearson (PE3) distributions are evaluated. The method of L-moments is used in parameter estimation. Based on several goodness of fit tests and L-moment diagram test, the Generalized Pareto distribution and the Pearson distribution are found to be the best fitted distribution to represent the weekly and monthly maximum share returns in Malaysia stock market during the studied period, respectively.
An empirical investigation of sparse distributed memory using discrete speech recognition

NASA Technical Reports Server (NTRS)

Danforth, Douglas G.

1990-01-01

Presented here is a step by step analysis of how the basic Sparse Distributed Memory (SDM) model can be modified to enhance its generalization capabilities for classification tasks. Data is taken from speech generated by a single talker. Experiments are used to investigate the theory of associative memories and the question of generalization from specific instances.
Switching behavior of resistive change memory using oxide nanowires

NASA Astrophysics Data System (ADS)

Aono, Takashige; Sugawa, Kosuke; Shimizu, Tomohiro; Shingubara, Shoso; Takase, Kouichi

2018-06-01

Resistive change random access memory (ReRAM), which is expected to be the next-generation nonvolatile memory, often has wide switching voltage distributions due to many kinds of conductive filaments. In this study, we have tried to suppress the distribution through the structural restriction of the filament-forming area using NiO nanowires. The capacitor with Ni metal nanowires whose surface is oxidized showed good switching behaviors with narrow distributions. The knowledge gained from our study will be very helpful in producing practical ReRAM devices.
Association of KIBRA and memory.

PubMed

Bates, Timothy C; Price, Jackie F; Harris, Sarah E; Marioni, Riccardo E; Fowkes, F Gerry R; Stewart, Marlene C; Murray, Gordon D; Whalley, Lawrence J; Starr, John M; Deary, Ian J

2009-07-24

We report on the association of KIBRA with memory in two samples of older individuals assessed on either memory for semantically unrelated word stimuli (Rey Auditory Verbal Learning Test, n=2091), or a measure of semantically related material (the WAIS Logical Memory Test of prose-passage recall, n=542). SNP rs17070145 was associated with delayed recall of semantically unrelated items, but not with immediate recall for these stimuli, nor with either immediate or delayed recall for semantically related material. The pattern of results suggests a role for the T-->C substitution in intron 9 of KIBRA in a component of episodic memory involved in long-term storage but independent of processes shared with immediate recall such as rehearsal involved in acquisition and rehearsal or processes.
Cultural scripts guide recall of intensely positive life events.

PubMed

Collins, Katherine A; Pillemer, David B; Ivcevic, Zorana; Gooze, Rachel A

2007-06-01

In four studies, we examined the temporal distribution of positive and negative memories of momentous life events. College students and middle-aged adults reported events occurring from the ages of 8 to 18 years in which they had felt especially good or especially bad about themselves. Distributions of positive memories showed a marked peak at ages 17 and 18. In contrast, distributions of negative memories were relatively flat. These patterns were consistent for males and females and for younger and older adults. Content analyses indicated that a substantial proportion of positive memories from late adolescence described culturally prescribed landmark events surrounding the major life transition from high school to college. When the participants were asked for recollections from life periods that lack obvious age-linked milestone events, age distributions of positive and negative memories were similar. The results support and extend Berntsen and Rubin's (2004) conclusion that cultural expectations, or life scripts, organize recall of positive, but not negative, events.
We Have Met Our Past and Our Future: Thanks for the Walk down Memory Lane

ERIC Educational Resources Information Center

Wiseman, Robert C.

2006-01-01

In this article, the author takes the readers for a walk down memory lane on the use of teaching aids. He shares his experience of the good old days of Audio Visual--opaque projector, motion pictures/films, recorders, and overhead projector. Computers have arrived, and now people can make graphics, pictures, motion pictures, and many different…

Dealing with Prospective Memory Demands While Performing an Ongoing Task: Shared Processing, Increased On-Task Focus, or Both?

ERIC Educational Resources Information Center

Rummel, Jan; Smeekens, Bridget A.; Kane, Michael J.

2017-01-01

Prospective memory (PM) is the cognitive ability to remember to fulfill intended action plans at the appropriate future moment. Current theories assume that PM fulfillment draws on attentional processes. Accordingly, pending PM intentions interfere with other ongoing tasks to the extent to which both tasks rely on the same processes. How do people…
An Action Sequence Withheld in Memory Can Delay Execution of Visually Guided Actions: The Generalization of Response Compatibility Interference

ERIC Educational Resources Information Center

Wiediger, Matthew D.; Fournier, Lisa R.

2008-01-01

Withholding an action plan in memory for later execution can delay execution of another action, if the actions share a similar (compatible) action feature (i.e., response hand). This phenomenon, termed compatibility interference (CI), was found for identity-based actions that do not require visual guidance. The authors examined whether CI can…
Relations of Maternal Style and Child Self-Concept to Autobiographical Memories in Chinese, Chinese Immigrant, and European American 3-Year-Olds

ERIC Educational Resources Information Center

Wang, Qi

2006-01-01

The relations of maternal reminiscing style and child self-concept to children's shared and independent autobiographical memories were examined in a sample of 189 three-year-olds and their mothers from Chinese families in China, first-generation Chinese immigrant families in the United States, and European American families. Mothers shared…
Genetic and environmental influences on individual differences in emotion regulation and its relation to working memory in toddlerhood.

PubMed

Wang, Manjie; Saudino, Kimberly J

2013-12-01

This is the first study to explore genetic and environmental contributions to individual differences in emotion regulation in toddlers, and the first to examine the genetic and environmental etiology underlying the association between emotion regulation and working memory. In a sample of 304 same-sex twin pairs (140 MZ, 164 DZ) at age 3, emotion regulation was assessed using the Behavior Rating Scale of the Bayley Scales of Infant Development (BRS; Bayley, 1993), and working memory was measured by the visually cued recall (VCR) task (Zelazo, Jacques, Burack, & Frye, 2002) and several memory tasks from the Mental Scale of the BSID. Based on model-fitting analyses, both emotion regulation and working memory were significantly influenced by genetic and nonshared environmental factors. Shared environmental effects were significant for working memory, but not for emotion regulation. Only genetic factors significantly contributed to the covariation between emotion regulation and working memory.
Genetic and Environmental Influences on Individual Differences in Emotion Regulation and Its Relation to Working Memory in Toddlerhood

PubMed Central

Wang, Manjie; Saudino, Kimberly J.

2014-01-01

This is the first study to explore genetic and environmental contributions to individual differences in emotion regulation in toddlers, and the first to examine the genetic and environmental etiology underlying the association between emotion regulation and working memory. In a sample of 304 same-sex twin pairs (140 MZ, 164 DZ) at age 3, emotion regulation was assessed using the Behavior Rating Scale of the Bayley Scales of Infant Development (BRS; Bayley, 1993), and working memory was measured by the visually cued recall (VCR) task (Zelazo et al., 2002) and several memory tasks from the Mental Scale of BSID. Based on model-fitting analyses, both emotion regulation and working memory were significantly influenced by genetic and nonshared environmental factors. Shared environmental effects were significant for working memory, but not for emotion regulation. Only genetic factors significantly contributed to the covariation between emotion regulation and working memory. PMID:24098922
Autobiographical Memory Sharing in Everyday Life: Characteristics of a Good Story

ERIC Educational Resources Information Center

Baron, Jacqueline M.; Bluck, Susan

2009-01-01

Storytelling is a ubiquitous human activity that occurs across the lifespan as part of everyday life. Studies from three disparate literatures suggest that older adults (as compared to younger adults) are (a) less likely to recall story details, (b) more likely to go off-target when sharing stories, and, in contrast, (c) more likely to receive…
Electrophysiological Activity Generated during the Implicit Association Test: A Study Using Event-Related Potentials

ERIC Educational Resources Information Center

O'Toole, Catriona; Barnes-Holmes, Dermot

2009-01-01

The Implicit Association Test (IAT) examines the differential association of 2 target concepts with 2 attribute concepts. Responding is predicted to be faster on consistent trials, when concepts that are associated in memory share a response key, than on inconsistent trials, when less associated items share a key. In the current study,…
The evolution of episodic memory

PubMed Central

Allen, Timothy A.; Fortin, Norbert J.

2013-01-01

One prominent view holds that episodic memory emerged recently in humans and lacks a “(neo)Darwinian evolution” [Tulving E (2002) Annu Rev Psychol 53:1–25]. Here, we review evidence supporting the alternative perspective that episodic memory has a long evolutionary history. We show that fundamental features of episodic memory capacity are present in mammals and birds and that the major brain regions responsible for episodic memory in humans have anatomical and functional homologs in other species. We propose that episodic memory capacity depends on a fundamental neural circuit that is similar across mammalian and avian species, suggesting that protoepisodic memory systems exist across amniotes and, possibly, all vertebrates. The implication is that episodic memory in diverse species may primarily be due to a shared underlying neural ancestry, rather than the result of evolutionary convergence. We also discuss potential advantages that episodic memory may offer, as well as species-specific divergences that have developed on top of the fundamental episodic memory architecture. We conclude by identifying possible time points for the emergence of episodic memory in evolution, to help guide further research in this area. PMID:23754432
Explaining prompts children to privilege inductively rich properties.

PubMed

Walker, Caren M; Lombrozo, Tania; Legare, Cristine H; Gopnik, Alison

2014-11-01

Four experiments with preschool-aged children test the hypothesis that engaging in explanation promotes inductive reasoning on the basis of shared causal properties as opposed to salient (but superficial) perceptual properties. In Experiments 1a and 1b, 3- to 5-year-old children prompted to explain during a causal learning task were more likely to override a tendency to generalize according to perceptual similarity and instead extend an internal feature to an object that shared a causal property. Experiment 2 replicated this effect of explanation in a case of label extension (i.e., categorization). Experiment 3 demonstrated that explanation improves memory for clusters of causally relevant (non-perceptual) features, but impairs memory for superficial (perceptual) features, providing evidence that effects of explanation are selective in scope and apply to memory as well as inference. In sum, our data support the proposal that engaging in explanation influences children's reasoning by privileging inductively rich, causal properties. Copyright © 2014 Elsevier B.V. All rights reserved.
Power and Performance Trade-offs for Space Time Adaptive Processing

DOE Office of Scientific and Technical Information (OSTI.GOV)

Gawande, Nitin A.; Manzano Franco, Joseph B.; Tumeo, Antonino

Computational efficiency – performance relative to power or energy – is one of the most important concerns when designing RADAR processing systems. This paper analyzes power and performance trade-offs for a typical Space Time Adaptive Processing (STAP) application. We study STAP implementations for CUDA and OpenMP on two computationally efficient architectures, Intel Haswell Core I7-4770TE and NVIDIA Kayla with a GK208 GPU. We analyze the power and performance of STAP’s computationally intensive kernels across the two hardware testbeds. We also show the impact and trade-offs of GPU optimization techniques. We show that data parallelism can be exploited for efficient implementationmore » on the Haswell CPU architecture. The GPU architecture is able to process large size data sets without increase in power requirement. The use of shared memory has a significant impact on the power requirement for the GPU. A balance between the use of shared memory and main memory access leads to an improved performance in a typical STAP application.« less
A Tensor Product Formulation of Strassen's Matrix Multiplication Algorithm with Memory Reduction

DOE PAGES

Kumar, B.; Huang, C. -H.; Sadayappan, P.; ...

1995-01-01

In this article, we present a program generation strategy of Strassen's matrix multiplication algorithm using a programming methodology based on tensor product formulas. In this methodology, block recursive programs such as the fast Fourier Transforms and Strassen's matrix multiplication algorithm are expressed as algebraic formulas involving tensor products and other matrix operations. Such formulas can be systematically translated to high-performance parallel/vector codes for various architectures. In this article, we present a nonrecursive implementation of Strassen's algorithm for shared memory vector processors such as the Cray Y-MP. A previous implementation of Strassen's algorithm synthesized from tensor product formulas required working storagemore » of size O(7 n ) for multiplying 2 n × 2 n matrices. We present a modified formulation in which the working storage requirement is reduced to O(4 n ). The modified formulation exhibits sufficient parallelism for efficient implementation on a shared memory multiprocessor. Performance results on a Cray Y-MP8/64 are presented.« less
Distributed learning enhances relational memory consolidation.

PubMed

Litman, Leib; Davachi, Lila

2008-09-01

It has long been known that distributed learning (DL) provides a mnemonic advantage over massed learning (ML). However, the underlying mechanisms that drive this robust mnemonic effect remain largely unknown. In two experiments, we show that DL across a 24 hr interval does not enhance immediate memory performance but instead slows the rate of forgetting relative to ML. Furthermore, we demonstrate that this savings in forgetting is specific to relational, but not item, memory. In the context of extant theories and knowledge of memory consolidation, these results suggest that an important mechanism underlying the mnemonic benefit of DL is enhanced memory consolidation. We speculate that synaptic strengthening mechanisms supporting long-term memory consolidation may be differentially mediated by the spacing of memory reactivation. These findings have broad implications for the scientific study of episodic memory consolidation and, more generally, for educational curriculum development and policy.
Interference from mere thinking: mental rehearsal temporarily disrupts recall of motor memory.

PubMed

Yin, Cong; Wei, Kunlin

2014-08-01

Interference between successively learned tasks is widely investigated to study motor memory. However, how simultaneously learned motor memories interact with each other has been rarely studied despite its prevalence in daily life. Assuming that motor memory shares common neural mechanisms with declarative memory system, we made unintuitive predictions that mental rehearsal, as opposed to further practice, of one motor memory will temporarily impair the recall of another simultaneously learned memory. Subjects simultaneously learned two sensorimotor tasks, i.e., visuomotor rotation and gain. They retrieved one memory by either practice or mental rehearsal and then had their memory evaluated. We found that mental rehearsal, instead of execution, impaired the recall of unretrieved memory. This impairment was content-independent, i.e., retrieving either gain or rotation impaired the other memory. Hence, conscious recollection of one motor memory interferes with the recall of another memory. This is analogous to retrieval-induced forgetting in declarative memory, suggesting a common neural process across memory systems. Our findings indicate that motor imagery is sufficient to induce interference between motor memories. Mental rehearsal, currently widely regarded as beneficial for motor performance, negatively affects memory recall when it is exercised for a subset of memorized items. Copyright © 2014 the American Physiological Society.
Reliable file sharing in distributed operating system using web RTC

NASA Astrophysics Data System (ADS)

Dukiya, Rajesh

2017-12-01

Since, the evolution of distributed operating system, distributed file system is come out to be important part in operating system. P2P is a reliable way in Distributed Operating System for file sharing. It was introduced in 1999, later it became a high research interest topic. Peer to Peer network is a type of network, where peers share network workload and other load related tasks. A P2P network can be a period of time connection, where a bunch of computers connected by a USB (Universal Serial Bus) port to transfer or enable disk sharing i.e. file sharing. Currently P2P requires special network that should be designed in P2P way. Nowadays, there is a big influence of browsers in our life. In this project we are going to study of file sharing mechanism in distributed operating system in web browsers, where we will try to find performance bottlenecks which our research will going to be an improvement in file sharing by performance and scalability in distributed file systems. Additionally, we will discuss the scope of Web Torrent file sharing and free-riding in peer to peer networks.
Towards Memory-Aware Services and Browsing through Lifelogging Sensing

PubMed Central

Arcega, Lorena; Font, Jaime; Cetina, Carlos

2013-01-01

Every day we receive lots of information through our senses that is lost forever, because it lacked the strength or the repetition needed to generate a lasting memory. Combining the emerging Internet of Things and lifelogging sensors, we believe it is possible to build up a Digital Memory (Dig-Mem) in order to complement the fallible memory of people. This work shows how to realize the Dig-Mem in terms of interactions, affinities, activities, goals and protocols. We also complement this Dig-Mem with memory-aware services and a Dig-Mem browser. Furthermore, we propose a RFID Tag-Sharing technique to speed up the adoption of Dig-Mem. Experimentation reveals an improvement of the user understanding of Dig-Mem as time passes, compared to natural memories where the level of detail decreases over time. PMID:24196436
The structure of the clouds distributed operating system

NASA Technical Reports Server (NTRS)

Dasgupta, Partha; Leblanc, Richard J., Jr.

1989-01-01

A novel system architecture, based on the object model, is the central structuring concept used in the Clouds distributed operating system. This architecture makes Clouds attractive over a wide class of machines and environments. Clouds is a native operating system, designed and implemented at Georgia Tech. and runs on a set of generated purpose computers connected via a local area network. The system architecture of Clouds is composed of a system-wide global set of persistent (long-lived) virtual address spaces, called objects that contain persistent data and code. The object concept is implemented at the operating system level, thus presenting a single level storage view to the user. Lightweight treads carry computational activity through the code stored in the objects. The persistent objects and threads gives rise to a programming environment composed of shared permanent memory, dispensing with the need for hardware-derived concepts such as the file systems and message systems. Though the hardware may be distributed and may have disks and networks, the Clouds provides the applications with a logically centralized system, based on a shared, structured, single level store. The current design of Clouds uses a minimalist philosophy with respect to both the kernel and the operating system. That is, the kernel and the operating system support a bare minimum of functionality. Clouds also adheres to the concept of separation of policy and mechanism. Most low-level operating system services are implemented above the kernel and most high level services are implemented at the user level. From the measured performance of using the kernel mechanisms, we are able to demonstrate that efficient implementations are feasible for the object model on commercially available hardware. Clouds provides a rich environment for conducting research in distributed systems. Some of the topics addressed in this paper include distributed programming environments, consistency of persistent data and fault-tolerance.
Simulation Analysis of Data Sharing in Shared Memory Multiprocessors

DTIC Science & Technology

1989-02-24

LIMITATION OF ABSTRACT Same as Report (SAR) 18. NUMBER OF PAGES 178 19a. NAME OF RESPONSIBLE PERSON a. REPORT unclassified b . ABSTRACT unclassified...work. Andrea Casotto (CELL), Steve McGrogan (SPICE), Srinivas Devadas (TOPOP1) and Hi-Keung Tony Ma (VERIFY) donated the parallel programs and a con...Effect of Block Size on B us Utilization 120 5-14 Ratio of Sharing Bus Cyc les to Total Bus Cycles 120 5-15 Oassification of Bus Cyc les for
76 FR 66012 - Partner's Distributive Share

Federal Register 2010, 2011, 2012, 2013, 2014

2011-10-25

[email protected] . SUPPLEMENTARY INFORMATION: Background Subchapter K is intended to permit taxpayers to... Revenue Code provides that a partner's distributive share of income, gain, loss, deduction, or credit... that a partner's distributive share of income, gain, loss, deduction, or credit (or item thereof) shall...
An Efficient Implementation For Real Time Applications Of The Wigner-Ville Distribution

NASA Astrophysics Data System (ADS)

Boashash, Boualem; Black, Peter; Whitehouse, Harper J.

1986-03-01

The Wigner-Ville Distribution (WVD) is a valuable tool for time-frequency signal analysis. In order to implement the WVD in real time an efficient algorithm and architecture have been developed which may be implemented with commercial components. This algorithm successively computes the analytic signal corresponding to the input signal, forms a weighted kernel function and analyses the kernel via a Discrete Fourier Transform (DFT). To evaluate the analytic signal required by the algorithm it is shown that the time domain definition implemented as a finite impulse response (FIR) filter is practical and more efficient than the frequency domain definition of the analytic signal. The windowed resolution of the WVD in the frequency domain is shown to be similar to the resolution of a windowed Fourier Transform. A real time signal processsor has been designed for evaluation of the WVD analysis system. The system is easily paralleled and can be configured to meet a variety of frequency and time resolutions. The arithmetic unit is based on a pair of high speed VLSI floating-point multiplier and adder chips. Dual operand buses and an independent result bus maximize data transfer rates. The system is horizontally microprogrammed and utilizes a full instruction pipeline. Each microinstruction specifies two operand addresses, a result location, the type of arithmetic and the memory configuration. input and output is via shared memory blocks with front-end processors to handle data transfers during the non access periods of the analyzer.
Exploration versus exploitation in space, mind, and society

PubMed Central

Hills, Thomas T.; Todd, Peter M.; Lazer, David; Redish, A. David; Couzin, Iain D.

2015-01-01

Search is a ubiquitous property of life. Although diverse domains have worked on search problems largely in isolation, recent trends across disciplines indicate that the formal properties of these problems share similar structures and, often, similar solutions. Moreover, internal search (e.g., memory search) shows similar characteristics to external search (e.g., spatial foraging), including shared neural mechanisms consistent with a common evolutionary origin across species. Search problems and their solutions also scale from individuals to societies, underlying and constraining problem solving, memory, information search, and scientific and cultural innovation. In summary, search represents a core feature of cognition, with a vast influence on its evolution and processes across contexts and requiring input from multiple domains to understand its implications and scope. PMID:25487706

Dynamic programming on a shared-memory multiprocessor

NASA Technical Reports Server (NTRS)

Edmonds, Phil; Chu, Eleanor; George, Alan

1993-01-01

Three new algorithms for solving dynamic programming problems on a shared-memory parallel computer are described. All three algorithms attempt to balance work load, while keeping synchronization cost low. In particular, for a multiprocessor having p processors, an analysis of the best algorithm shows that the arithmetic cost is O(n-cubed/6p) and that the synchronization cost is O(absolute value of log sub C n) if p much less than n, where C = (2p-1)/(2p + 1) and n is the size of the problem. The low synchronization cost is important for machines where synchronization is expensive. Analysis and experiments show that the best algorithm is effective in balancing the work load and producing high efficiency.
Implementing High-Performance Geometric Multigrid Solver with Naturally Grained Messages

DOE Office of Scientific and Technical Information (OSTI.GOV)

Shan, Hongzhang; Williams, Samuel; Zheng, Yili

2015-10-26

Structured-grid linear solvers often require manually packing and unpacking of communication data to achieve high performance.Orchestrating this process efficiently is challenging, labor-intensive, and potentially error-prone.In this paper, we explore an alternative approach that communicates the data with naturally grained messagesizes without manual packing and unpacking. This approach is the distributed analogue of shared-memory programming, taking advantage of the global addressspace in PGAS languages to provide substantial programming ease. However, its performance may suffer from the large number of small messages. We investigate theruntime support required in the UPC ++ library for this naturally grained version to close the performance gapmore » between the two approaches and attain comparable performance at scale using the High-Performance Geometric Multgrid (HPGMG-FV) benchmark as a driver.« less
Get the gist? The effects of processing depth on false recognition in short-term and long-term memory.

PubMed

Flegal, Kristin E; Reuter-Lorenz, Patricia A

2014-07-01

Gist-based processing has been proposed to account for robust false memories in the converging-associates task. The deep-encoding processes known to enhance verbatim memory also strengthen gist memory and increase distortions of long-term memory (LTM). Recent research has demonstrated that compelling false memory illusions are relatively delay-invariant, also occurring under canonical short-term memory (STM) conditions. To investigate the contributions of gist to false memory at short and long delays, processing depth was manipulated as participants encoded lists of four semantically related words and were probed immediately, following a filled 3- to 4-s retention interval, or approximately 20 min later, in a surprise recognition test. In two experiments, the encoding manipulation dissociated STM and LTM on the frequency, but not the phenomenology, of false memory. Deep encoding at STM increases false recognition rates at LTM, but confidence ratings and remember/know judgments are similar across delays and do not differ as a function of processing depth. These results suggest that some shared and some unique processes underlie false memory illusions at short and long delays.
A theory of working memory without consciousness or sustained activity

PubMed Central

Trübutschek, Darinka; Marti, Sébastien; Ojeda, Andrés; King, Jean-Rémi; Mi, Yuanyuan; Tsodyks, Misha; Dehaene, Stanislas

2017-01-01

Working memory and conscious perception are thought to share similar brain mechanisms, yet recent reports of non-conscious working memory challenge this view. Combining visual masking with magnetoencephalography, we investigate the reality of non-conscious working memory and dissect its neural mechanisms. In a spatial delayed-response task, participants reported the location of a subjectively unseen target above chance-level after several seconds. Conscious perception and conscious working memory were characterized by similar signatures: a sustained desynchronization in the alpha/beta band over frontal cortex, and a decodable representation of target location in posterior sensors. During non-conscious working memory, such activity vanished. Our findings contradict models that identify working memory with sustained neural firing, but are compatible with recent proposals of ‘activity-silent’ working memory. We present a theoretical framework and simulations showing how slowly decaying synaptic changes allow cell assemblies to go dormant during the delay, yet be retrieved above chance-level after several seconds. DOI: http://dx.doi.org/10.7554/eLife.23871.001 PMID:28718763
Parallelization Issues and Particle-In Codes.

NASA Astrophysics Data System (ADS)

Elster, Anne Cathrine

1994-01-01

"Everything should be made as simple as possible, but not simpler." Albert Einstein. The field of parallel scientific computing has concentrated on parallelization of individual modules such as matrix solvers and factorizers. However, many applications involve several interacting modules. Our analyses of a particle-in-cell code modeling charged particles in an electric field, show that these accompanying dependencies affect data partitioning and lead to new parallelization strategies concerning processor, memory and cache utilization. Our test-bed, a KSR1, is a distributed memory machine with a globally shared addressing space. However, most of the new methods presented hold generally for hierarchical and/or distributed memory systems. We introduce a novel approach that uses dual pointers on the local particle arrays to keep the particle locations automatically partially sorted. Complexity and performance analyses with accompanying KSR benchmarks, have been included for both this scheme and for the traditional replicated grids approach. The latter approach maintains load-balance with respect to particles. However, our results demonstrate it fails to scale properly for problems with large grids (say, greater than 128-by-128) running on as few as 15 KSR nodes, since the extra storage and computation time associated with adding the grid copies, becomes significant. Our grid partitioning scheme, although harder to implement, does not need to replicate the whole grid. Consequently, it scales well for large problems on highly parallel systems. It may, however, require load balancing schemes for non-uniform particle distributions. Our dual pointer approach may facilitate this through dynamically partitioned grids. We also introduce hierarchical data structures that store neighboring grid-points within the same cache -line by reordering the grid indexing. This alignment produces a 25% savings in cache-hits for a 4-by-4 cache. A consideration of the input data's effect on the simulation may lead to further improvements. For example, in the case of mean particle drift, it is often advantageous to partition the grid primarily along the direction of the drift. The particle-in-cell codes for this study were tested using physical parameters, which lead to predictable phenomena including plasma oscillations and two-stream instabilities. An overview of the most central references related to parallel particle codes is also given.
Multiple-User, Multitasking, Virtual-Memory Computer System

NASA Technical Reports Server (NTRS)

Generazio, Edward R.; Roth, Don J.; Stang, David B.

1993-01-01

Computer system designed and programmed to serve multiple users in research laboratory. Provides for computer control and monitoring of laboratory instruments, acquisition and anlaysis of data from those instruments, and interaction with users via remote terminals. System provides fast access to shared central processing units and associated large (from megabytes to gigabytes) memories. Underlying concept of system also applicable to monitoring and control of industrial processes.
A warm and friendly memorial session for Helmut Oeschler

NASA Astrophysics Data System (ADS)

Cleymans, Jean; Hippolyte, Boris; Kalweit, Alexander; Müntz, Christian; Stroth, Joachim

2018-02-01

A full session was organized in memory of Helmut Oeschler during the 2017 edition of the Strangeness in Quark Matter Conference. It was heart-warming to discuss with the audience his main achievements and share anecdotes about this exceptionally praised and appreciated colleague, who was also a great friend for many at the conference. A brief summary of the session is provided with these proceedings.
The Effect of the Underlying Distribution in Hurst Exponent Estimation

PubMed Central

Sánchez, Miguel Ángel; Trinidad, Juan E.; García, José; Fernández, Manuel

2015-01-01

In this paper, a heavy-tailed distribution approach is considered in order to explore the behavior of actual financial time series. We show that this kind of distribution allows to properly fit the empirical distribution of the stocks from S&P500 index. In addition to that, we explain in detail why the underlying distribution of the random process under study should be taken into account before using its self-similarity exponent as a reliable tool to state whether that financial series displays long-range dependence or not. Finally, we show that, under this model, no stocks from S&P500 index show persistent memory, whereas some of them do present anti-persistent memory and most of them present no memory at all. PMID:26020942
Knowledge of memory functions in European and Asian American adults and children: the relation to autobiographical memory.

PubMed

Wang, Qi; Koh, Jessie Bee Kim; Song, Qingfang; Hou, Yubo

2015-01-01

This study investigated explicit knowledge of autobiographical memory functions using a newly developed questionnaire. European and Asian American adults (N = 57) and school-aged children (N = 68) indicated their agreement with 13 statements about why people think about and share memories pertaining to four broad functions-self, social, directive and emotion regulation. Children were interviewed for personal memories concurrently with the memory function knowledge assessment and again 3 months later. It was found that adults agreed to the self, social and directive purposes of memory to a greater extent than did children, whereas European American children agreed to the emotion regulation purposes of memory to a greater extent than did European American adults. Furthermore, European American children endorsed more self and emotion regulation functions than did Asian American children, whereas Asian American adults endorsed more directive functions than did European American adults. Children's endorsement of memory functions, particularly social functions, was associated with more detailed and personally meaningful memories. These findings are informative for the understanding of developmental and cultural influences on memory function knowledge and of the relation of such knowledge to autobiographical memory development.
Remembering Nancy. 25 Members of the Montessori Community Share Their Reflections on the Death of the AMS Founder.

ERIC Educational Resources Information Center

Turner, Joy; And Others

1995-01-01

Twenty-five members of the Montessori community share their memories of Dr. Nancy McCormick Rambusch, charismatic founder of the American Montessori movement, early childhood professional, and innovative educator, who died of pancreatic cancer on October 27, 1994. Rambusch's work of 40 years now flowers as an institutionalized educational program…
Shared Values as Anchors of a Learning Community: A Case Study in Information Systems Design

ERIC Educational Resources Information Center

Giordano, Daniela

2004-01-01

This paper examines the role in both individual and organizational learning of the system of values sustained by a community undertaking a design task. The discussion is based on the results of a longitudinal study of a community of novice information system designers supported by a Web-based shared design memory which allows reuse of design…
Literacy outcomes of children with early childhood speech sound disorders: impact of endophenotypes.

PubMed

Lewis, Barbara A; Avrich, Allison A; Freebairn, Lisa A; Hansen, Amy J; Sucheston, Lara E; Kuo, Iris; Taylor, H Gerry; Iyengar, Sudha K; Stein, Catherine M

2011-12-01

To demonstrate that early childhood speech sound disorders (SSD) and later school-age reading, written expression, and spelling skills are influenced by shared endophenotypes that may be in part genetic. Children with SSD and their siblings were assessed at early childhood (ages 4-6 years) and followed at school age (7-12 years). The relationship of shared endophenotypes with early childhood SSD and school-age outcomes and the shared genetic influences on these outcomes were examined. Structural equation modeling demonstrated that oral motor skills, phonological awareness, phonological memory, vocabulary, and speeded naming have varying influences on reading decoding, spelling, spoken language, and written expression at school age. Genetic linkage studies demonstrated linkage for reading, spelling, and written expression measures to regions on chromosomes 1, 3, 6, and 15 that were previously linked to oral motor skills, articulation, phonological memory, and vocabulary at early childhood testing. Endophenotypes predict school-age literacy outcomes over and above that predicted by clinical diagnoses of SSD or language impairment. Findings suggest that these shared endophenotypes and common genetic influences affect early childhood SSD and later school-age reading, spelling, spoken language, and written expression skills.
Literacy Outcomes of Children With Early Childhood Speech Sound Disorders: Impact of Endophenotypes

PubMed Central

Lewis, Barbara A.; Avrich, Allison A.; Freebairn, Lisa A.; Hansen, Amy J.; Sucheston, Lara E.; Kuo, Iris; Taylor, H. Gerry; Iyengar, Sudha K.; Stein, Catherine M.

2012-01-01

Purpose To demonstrate that early childhood speech sound disorders (SSD) and later school-age reading, written expression, and spelling skills are influenced by shared endophenotypes that may be in part genetic. Method Children with SSD and their siblings were assessed at early childhood (ages 4–6 years) and followed at school age (7–12 years). The relationship of shared endophenotypes with early childhood SSD and school-age outcomes and the shared genetic influences on these outcomes were examined. Results Structural equation modeling demonstrated that oral motor skills, phonological awareness, phonological memory, vocabulary, and speeded naming have varying influences on reading decoding, spelling, spoken language, and written expression at school age. Genetic linkage studies demonstrated linkage for reading, spelling, and written expression measures to regions on chromosomes 1, 3, 6, and 15 that were previously linked to oral motor skills, articulation, phonological memory, and vocabulary at early childhood testing. Conclusions Endophenotypes predict school-age literacy outcomes over and above that predicted by clinical diagnoses of SSD or language impairment. Findings suggest that these shared endophenotypes and common genetic influences affect early childhood SSD and later school-age reading, spelling, spoken language, and written expression skills. PMID:21930616
A Biometric Latent Curve Analysis of Memory Decline in Older Men of the NAS-NRC Twin Registry

PubMed Central

McArdle, John J.; Plassman, Brenda L.

2010-01-01

Previous research has shown cognitive abilities to have different biometric patterns of age-changes. Here we examined the variation in episodic memory (Words Recalled) for over 6,000 twin pairs who were initially aged 59-75, and were subsequently re-assessed up to three more times over 12 years. In cross-sectional analyses, variation in Education was explained by strong additive genetic influences (~43%) together with shared family influences (~35%) that were independent of age. The longitudinal phenotypic analysis of the Word Recall task showed systematic linear declines over age, but with positive influences of Education and Retesting. The longitudinal biometric estimation yielded: (a) A separation of non-shared environmental influences and transient measurement error (~50%): (b) Strong additive genetic components of this latent curve (~70% at age 60) with increases over age that reach about 90% by age 90. (c) The minor influences of shared family environment (~17% at age 60) were effectively eliminated by age 75. (d) Non-shared environmental effects play an important role over most of the life-span (peak of 42% at age 70) but their relative role diminishes after age 75. PMID:19404731
Modeling Distributions of Immediate Memory Effects: No Strategies Needed?

ERIC Educational Resources Information Center

Beaman, C. Philip; Neath, Ian; Surprenant, Aimee M.

2008-01-01

Many models of immediate memory predict the presence or absence of various effects, but none have been tested to see whether they predict an appropriate distribution of effect sizes. The authors show that the feature model (J. S. Nairne, 1990) produces appropriate distributions of effect sizes for both the phonological confusion effect and the…
Efficient accesses of data structures using processing near memory

DOE Office of Scientific and Technical Information (OSTI.GOV)

Jayasena, Nuwan S.; Zhang, Dong Ping; Diez, Paula Aguilera

Systems, apparatuses, and methods for implementing efficient queues and other data structures. A queue may be shared among multiple processors and/or threads without using explicit software atomic instructions to coordinate access to the queue. System software may allocate an atomic queue and corresponding queue metadata in system memory and return, to the requesting thread, a handle referencing the queue metadata. Any number of threads may utilize the handle for accessing the atomic queue. The logic for ensuring the atomicity of accesses to the atomic queue may reside in a management unit in the memory controller coupled to the memory wheremore » the atomic queue is allocated.« less
C-MOS array design techniques: SUMC multiprocessor system study

NASA Technical Reports Server (NTRS)

Clapp, W. A.; Helbig, W. A.; Merriam, A. S.

1972-01-01

The current capabilities of LSI techniques for speed and reliability, plus the possibilities of assembling large configurations of LSI logic and storage elements, have demanded the study of multiprocessors and multiprocessing techniques, problems, and potentialities. Evaluated are three previous systems studies for a space ultrareliable modular computer multiprocessing system, and a new multiprocessing system is proposed that is flexibly configured with up to four central processors, four 1/0 processors, and 16 main memory units, plus auxiliary memory and peripheral devices. This multiprocessor system features a multilevel interrupt, qualified S/360 compatibility for ground-based generation of programs, virtual memory management of a storage hierarchy through 1/0 processors, and multiport access to multiple and shared memory units.
Scalable Parallel Density-based Clustering and Applications

NASA Astrophysics Data System (ADS)

Patwary, Mostofa Ali

2014-04-01

Recently, density-based clustering algorithms (DBSCAN and OPTICS) have gotten significant attention of the scientific community due to their unique capability of discovering arbitrary shaped clusters and eliminating noise data. These algorithms have several applications, which require high performance computing, including finding halos and subhalos (clusters) from massive cosmology data in astrophysics, analyzing satellite images, X-ray crystallography, and anomaly detection. However, parallelization of these algorithms are extremely challenging as they exhibit inherent sequential data access order, unbalanced workload resulting in low parallel efficiency. To break the data access sequentiality and to achieve high parallelism, we develop new parallel algorithms, both for DBSCAN and OPTICS, designed using graph algorithmic techniques. For example, our parallel DBSCAN algorithm exploits the similarities between DBSCAN and computing connected components. Using datasets containing up to a billion floating point numbers, we show that our parallel density-based clustering algorithms significantly outperform the existing algorithms, achieving speedups up to 27.5 on 40 cores on shared memory architecture and speedups up to 5,765 using 8,192 cores on distributed memory architecture. In our experiments, we found that while achieving the scalability, our algorithms produce clustering results with comparable quality to the classical algorithms.
Visualization in aerospace research with a large wall display system

NASA Astrophysics Data System (ADS)

Matsuo, Yuichi

2002-05-01

National Aerospace Laboratory of Japan has built a large- scale visualization system with a large wall-type display. The system has been operational since April 2001 and comprises a 4.6x1.5-meter (15x5-foot) rear projection screen with 3 BARCO 812 high-resolution CRT projectors. The reason we adopted the 3-gun CRT projectors is support for stereoscopic viewing, ease with color/luminosity matching and accuracy of edge-blending. The system is driven by a new SGI Onyx 3400 server of distributed shared-memory architecture with 32 CPUs, 64Gbytes memory, 1.5TBytes FC RAID disk and 6 IR3 graphics pipelines. Software is another important issue for us to make full use of the system. We have introduced some applications available in a multi- projector environment such as AVS/MPE, EnSight Gold and COVISE, and been developing some software tools that create volumetric images with using SGI graphics libraries. The system is mainly used for visualization fo computational fluid dynamics (CFD) simulation sin aerospace research. Visualized CFD results are of our help for designing an improved configuration of aerospace vehicles and analyzing their aerodynamic performances. These days we also use it for various collaborations among researchers.
Sparse distributed memory prototype: Principles of operation

NASA Technical Reports Server (NTRS)

Flynn, Michael J.; Kanerva, Pentti; Ahanin, Bahram; Bhadkamkar, Neal; Flaherty, Paul; Hickey, Philip

1988-01-01

Sparse distributed memory is a generalized random access memory (RAM) for long binary words. Such words can be written into and read from the memory, and they can be used to address the memory. The main attribute of the memory is sensitivity to similarity, meaning that a word can be read back not only by giving the original right address but also by giving one close to it as measured by the Hamming distance between addresses. Large memories of this kind are expected to have wide use in speech and scene analysis, in signal detection and verification, and in adaptive control of automated equipment. The memory can be realized as a simple, massively parallel computer. Digital technology has reached a point where building large memories is becoming practical. The research is aimed at resolving major design issues that have to be faced in building the memories. The design of a prototype memory with 256-bit addresses and from 8K to 128K locations for 256-bit words is described. A key aspect of the design is extensive use of dynamic RAM and other standard components.

Some links on this page may take you to non-federal websites. Their policies may differ from this site.