High Performance Programming Using Explicit Shared Memory Model on Cray T3D
NASA Technical Reports Server (NTRS)
Simon, Horst D.; Saini, Subhash; Grassi, Charles
1994-01-01
The Cray T3D system is the first-phase system in Cray Research, Inc.'s (CRI) three-phase massively parallel processing (MPP) program. This system features a heterogeneous architecture that closely couples DEC's Alpha microprocessors and CRI's parallel-vector technology, i.e., the Cray Y-MP and Cray C90. An overview of the Cray T3D hardware and available programming models is presented. Under the Cray Research adaptive Fortran (CRAFT) model, four programming methods (data parallel, work sharing, message-passing using PVM, and explicit shared memory model) are available to the users. However, at this time data parallel and work sharing programming models are not available to the user community. The differences between standard PVM and CRI's PVM are highlighted with performance measurements such as latencies and communication bandwidths. We have found that neither standard PVM nor CRI's PVM exploits the hardware capabilities of the T3D. The reasons for the poor performance of PVM as a native message-passing library are presented. This is illustrated by the performance of NAS Parallel Benchmarks (NPB) programmed in the explicit shared memory model on the Cray T3D. In general, the performance of standard PVM is about 4 to 5 times lower than that obtained by using the explicit shared memory model. This degradation in performance is also seen on the CM-5, where the performance of applications using the native message-passing library CMMD is also about 4 to 5 times lower than that obtained using data parallel methods. The issues involved (such as barriers, synchronization, invalidating data cache, aligning data cache, etc.) while programming in the explicit shared memory model are discussed. Comparative performance of NPB using the explicit shared memory programming model on the Cray T3D and other highly parallel systems such as the TMC CM-5, Intel Paragon, Cray C90, IBM-SP1, etc. is presented.
NASA Technical Reports Server (NTRS)
Waheed, Abdul; Yan, Jerry
1998-01-01
This paper presents a model to evaluate the performance and overhead of parallelizing sequential code using compiler directives for multiprocessing on distributed shared memory (DSM) systems. With increasing popularity of shared address space architectures, it is essential to understand their performance impact on programs that benefit from shared memory multiprocessing. We present a simple model to characterize the performance of programs that are parallelized using compiler directives for shared memory multiprocessing. We parallelized the sequential implementation of NAS benchmarks using native Fortran77 compiler directives for an Origin2000, which is a DSM system based on a cache-coherent Non Uniform Memory Access (ccNUMA) architecture. We report measurement based performance of these parallelized benchmarks from four perspectives: efficacy of parallelization process; scalability; parallelization overhead; and comparison with hand-parallelized and -optimized version of the same benchmarks. Our results indicate that sequential programs can conveniently be parallelized for DSM systems using compiler directives but realizing performance gains as predicted by the performance model depends primarily on minimizing architecture-specific data locality overhead.
Comparison of two paradigms for distributed shared memory
DOE Office of Scientific and Technical Information (OSTI.GOV)
Levelt, W.G.; Kaashoek, M.F.; Bal, H.E.
1990-08-01
The paper compares two paradigms for Distributed Shared Memory on loosely coupled computing systems: the shared data-object model as used in Orca, a programming language specially designed for loosely coupled computing systems, and the Shared Virtual Memory model. For both paradigms the authors have implemented two systems, one using only point-to-point messages, the other using broadcasting as well. They briefly describe these two paradigms and their implementations. Then they compare their performance on four applications: the traveling salesman problem, alpha-beta search, matrix multiplication and the all pairs shortest paths problem. The measurements show that both paradigms can be used efficiently for programming large-grain parallel applications. Significant speedups were obtained on all applications. The unstructured Shared Virtual Memory paradigm achieves the best absolute performance, although this is largely due to the preliminary nature of the Orca compiler used. The structured shared data-object model achieves the highest speedups and is much easier to program and to debug.
Combining Distributed and Shared Memory Models: Approach and Evolution of the Global Arrays Toolkit
DOE Office of Scientific and Technical Information (OSTI.GOV)
Nieplocha, Jarek; Harrison, Robert J.; Kumar, Mukul
2002-07-29
Both shared memory and distributed memory models have advantages and shortcomings. The shared memory model is much easier to use, but it ignores data locality/placement. Given the hierarchical nature of the memory subsystems in modern computers, this characteristic might have a negative impact on performance and scalability. Various techniques, such as code restructuring to increase data reuse and introducing blocking in data accesses, can address the problem and yield performance competitive with message passing [Singh], however at the cost of compromising the ease-of-use feature. Distributed memory models such as message passing or one-sided communication offer performance and scalability, but they compromise the ease of use. In this context, the message-passing model is sometimes referred to as "assembly programming for the scientific computing". The Global Arrays toolkit [GA1, GA2] attempts to offer the best features of both models. It implements a shared-memory programming model in which data locality is managed explicitly by the programmer. This management is achieved by explicit calls to functions that transfer data between a global address space (a distributed array) and local storage. In this respect, the GA model has similarities to the distributed shared-memory models that provide an explicit acquire/release protocol. However, the GA model acknowledges that remote data is slower to access than local data and allows data locality to be explicitly specified and hence managed. The GA model exposes to the programmer the hierarchical memory of modern high-performance computer systems, and by recognizing the communication overhead for remote data transfer, it promotes data reuse and locality of reference. This paper describes the characteristics of the Global Arrays programming model and the capabilities of the toolkit, and discusses its evolution.
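The explicit transfers between a global address space and local storage described in this abstract can be sketched roughly as follows. This is a toy, single-process model written for illustration only: the class `ToyGlobalArray` and its `get`/`put`/`owner` methods are hypothetical names, not the Global Arrays API, and the block distribution is an assumption.

```python
# Toy model of a distributed ("global") array with explicit get/put.
# All names are hypothetical; this is not the Global Arrays toolkit API.

class ToyGlobalArray:
    def __init__(self, length, nprocs):
        self.length = length
        self.nprocs = nprocs
        # Assumed block distribution: each owner holds one contiguous chunk.
        self.chunk = (length + nprocs - 1) // nprocs
        self.blocks = [
            [0.0] * max(0, min(self.chunk, length - p * self.chunk))
            for p in range(nprocs)
        ]

    def owner(self, i):
        """Return (owning process, offset in its block) for global index i."""
        return divmod(i, self.chunk)

    def get(self, lo, hi):
        """Copy global elements lo..hi-1 into a new local buffer."""
        out = []
        for i in range(lo, hi):
            p, off = self.owner(i)
            out.append(self.blocks[p][off])
        return out

    def put(self, lo, values):
        """Copy a local buffer into the global array starting at index lo."""
        for k, v in enumerate(values):
            p, off = self.owner(lo + k)
            self.blocks[p][off] = v


ga = ToyGlobalArray(length=10, nprocs=3)
ga.put(0, [float(i) for i in range(10)])
local = ga.get(3, 7)            # explicit transfer: global -> local
local = [2 * x for x in local]  # compute on local storage only
ga.put(3, local)                # explicit transfer: local -> global
result = ga.get(0, 10)
```

The point of the sketch is that the computation touches only the local buffer, so the programmer sees exactly where remote data movement happens.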
Automatic Generation of Directive-Based Parallel Programs for Shared Memory Parallel Systems
NASA Technical Reports Server (NTRS)
Jin, Hao-Qiang; Yan, Jerry; Frumkin, Michael
2000-01-01
The shared-memory programming model is a very effective way to achieve parallelism on shared memory parallel computers. As great progress was made in hardware and software technologies, performance of parallel programs with compiler directives has demonstrated large improvement. The introduction of OpenMP directives, the industrial standard for shared-memory programming, has minimized the issue of portability. Due to its ease of programming and its good performance, the technique has become very popular. In this study, we have extended CAPTools, a computer-aided parallelization toolkit, to automatically generate directive-based, OpenMP, parallel programs. We outline techniques used in the implementation of the tool and present test results on the NAS parallel benchmarks and ARC3D, a CFD application. This work demonstrates the great potential of using computer-aided tools to quickly port parallel programs and also achieve good performance.
Performance Evaluation of Remote Memory Access (RMA) Programming on Shared Memory Parallel Computers
NASA Technical Reports Server (NTRS)
Jin, Hao-Qiang; Jost, Gabriele; Biegel, Bryan A. (Technical Monitor)
2002-01-01
The purpose of this study is to evaluate the feasibility of remote memory access (RMA) programming on shared memory parallel computers. We discuss different RMA based implementations of selected CFD application benchmark kernels and compare them to corresponding message passing based codes. For the message-passing implementation we use MPI point-to-point and global communication routines. For the RMA based approach we consider two different libraries supporting this programming model. One is a shared memory parallelization library (SMPlib) developed at NASA Ames, the other is the MPI-2 extensions to the MPI Standard. We give timing comparisons for the different implementation strategies and discuss the performance.
Automatic Generation of OpenMP Directives and Its Application to Computational Fluid Dynamics Codes
NASA Technical Reports Server (NTRS)
Yan, Jerry; Jin, Haoqiang; Frumkin, Michael; Yan, Jerry (Technical Monitor)
2000-01-01
The shared-memory programming model is a very effective way to achieve parallelism on shared memory parallel computers. As great progress was made in hardware and software technologies, performance of parallel programs with compiler directives has demonstrated large improvement. The introduction of OpenMP directives, the industrial standard for shared-memory programming, has minimized the issue of portability. In this study, we have extended CAPTools, a computer-aided parallelization toolkit, to automatically generate OpenMP-based parallel programs with nominal user assistance. We outline techniques used in the implementation of the tool and discuss the application of this tool on the NAS Parallel Benchmarks and several computational fluid dynamics codes. This work demonstrates the great potential of using the tool to quickly port parallel programs and also achieve good performance that exceeds some of the commercial tools.
NASA Technical Reports Server (NTRS)
Jost, Gabriele; Labarta, Jesus; Gimenez, Judit
2004-01-01
With the current trend in parallel computer architectures towards clusters of shared memory symmetric multi-processors, parallel programming techniques have evolved that support parallelism beyond a single level. When comparing the performance of applications based on different programming paradigms, it is important to differentiate between the influence of the programming model itself and other factors, such as implementation specific behavior of the operating system (OS) or architectural issues. Rewriting a large scientific application in order to employ a new programming paradigm is usually a time consuming and error prone task. Before embarking on such an endeavor it is important to determine that there is really a gain that would not be possible with the current implementation. A detailed performance analysis is crucial to clarify these issues. The multilevel programming paradigms considered in this study are hybrid MPI/OpenMP, MLP, and nested OpenMP. The hybrid MPI/OpenMP approach is based on using MPI [7] for the coarse grained parallelization and OpenMP [9] for fine grained loop level parallelism. The MPI programming paradigm assumes a private address space for each process. Data is transferred by explicitly exchanging messages via calls to the MPI library. This model was originally designed for distributed memory architectures but is also suitable for shared memory systems. The second paradigm under consideration is MLP, which was developed by Taft. The approach is similar to MPI/OpenMP, using a mix of coarse grain process level parallelization and loop level OpenMP parallelization. As is the case with MPI, a private address space is assumed for each process. The MLP approach was developed for ccNUMA architectures and explicitly takes advantage of the availability of shared memory. A shared memory arena which is accessible by all processes is required. Communication is done by reading from and writing to the shared memory.
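The two levels of parallelism described in this abstract can be sketched as a work plan: a coarse level assigns whole zones to processes, and a fine level splits each zone's loop iterations among threads. The sketch below only computes the assignment and does not call MPI or OpenMP; the function names and the zone sizes are hypothetical.

```python
def split_range(n, parts):
    """Split range(n) into `parts` contiguous chunks whose sizes differ by at most 1."""
    base, extra = divmod(n, parts)
    chunks, start = [], 0
    for p in range(parts):
        size = base + (1 if p < extra else 0)
        chunks.append(range(start, start + size))
        start += size
    return chunks


def two_level_plan(zone_sizes, nprocs, nthreads):
    """Coarse level: zones -> processes. Fine level: loop iterations -> threads."""
    plan = {}
    for proc, zone_ids in enumerate(split_range(len(zone_sizes), nprocs)):
        for z in zone_ids:
            plan[z] = (proc, split_range(zone_sizes[z], nthreads))
    return plan


# Hypothetical problem: four zones with 8, 5, 6 and 4 loop iterations each.
plan = two_level_plan(zone_sizes=[8, 5, 6, 4], nprocs=2, nthreads=2)
for zone, (proc, thread_chunks) in plan.items():
    print(zone, proc, [list(c) for c in thread_chunks])
```

In a hybrid code the outer assignment would correspond to process-level decomposition and the inner one to loop-level work sharing within each process.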
Scheduling for Locality in Shared-Memory Multiprocessors
1993-05-01
Submitted in Partial Fulfillment of the Requirements for the Degree Doctor of Philosophy. ... architecture on parallel program performance, explain the implications of this trend on popular parallel programming models, and propose system software to ... decomposition and scheduling algorithms. Subject terms: shared-memory multiprocessors; architecture trends; loop scheduling. Number of pages: 110.
Hybrid MPI+OpenMP Programming of an Overset CFD Solver and Performance Investigations
NASA Technical Reports Server (NTRS)
Djomehri, M. Jahed; Jin, Haoqiang H.; Biegel, Bryan (Technical Monitor)
2002-01-01
This report describes a two-level parallelization of a Computational Fluid Dynamics (CFD) solver with multi-zone overset structured grids. The approach is based on a hybrid MPI+OpenMP programming model suitable for shared memory and clusters of shared memory machines. The performance investigations of the hybrid application on an SGI Origin2000 (O2K) machine are reported using medium and large scale test problems.
Avoiding and tolerating latency in large-scale next-generation shared-memory multiprocessors
NASA Technical Reports Server (NTRS)
Probst, David K.
1993-01-01
A scalable solution to the memory-latency problem is necessary to prevent the large latencies of synchronization and memory operations inherent in large-scale shared-memory multiprocessors from reducing high performance. We distinguish latency avoidance and latency tolerance. Latency is avoided when data is brought to nearby locales for future reference. Latency is tolerated when references are overlapped with other computation. Latency-avoiding locales include: processor registers, data caches used temporally, and nearby memory modules. Tolerating communication latency requires parallelism, allowing the overlap of communication and computation. Latency-tolerating techniques include: vector pipelining, data caches used spatially, prefetching in various forms, and multithreading in various forms. Relaxing the consistency model permits increased use of avoidance and tolerance techniques. Each model is a mapping from the program text to sets of partial orders on program operations; it is a convention about which temporal precedences among program operations are necessary. Information about temporal locality and parallelism constrains the use of avoidance and tolerance techniques. Suitable architectural primitives and compiler technology are required to exploit the increased freedom to reorder and overlap operations in relaxed models.
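Latency tolerance by prefetching, as distinguished in this abstract, can be illustrated with a small sketch in which the request for the next block is issued before computing on the current one. The function `fetch_block` is a hypothetical stand-in for a slow remote reference, and a worker thread plays the role of the memory system.

```python
from concurrent.futures import ThreadPoolExecutor
import time


def fetch_block(i):
    """Hypothetical stand-in for a slow remote memory reference."""
    time.sleep(0.01)
    return list(range(i * 4, i * 4 + 4))


def sum_with_prefetch(nblocks):
    """Sum all blocks, requesting block i+1 before computing on block i."""
    total = 0
    with ThreadPoolExecutor(max_workers=1) as pool:
        pending = pool.submit(fetch_block, 0)
        for i in range(nblocks):
            block = pending.result()  # wait for the current block
            if i + 1 < nblocks:
                # Issue the next fetch now so it overlaps the computation below.
                pending = pool.submit(fetch_block, i + 1)
            total += sum(block)       # compute while the fetch is in flight
    return total


total = sum_with_prefetch(5)
```

Latency avoidance would instead keep the blocks in a nearby locale (a register, cache, or local memory module) so that the slow reference is not repeated at all.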
A simple modern correctness condition for a space-based high-performance multiprocessor
NASA Technical Reports Server (NTRS)
Probst, David K.; Li, Hon F.
1992-01-01
A number of U.S. national programs, including space-based detection of ballistic missile launches, envisage putting significant computing power into space. Given sufficient progress in low-power VLSI, multichip-module packaging and liquid-cooling technologies, we will see design of high-performance multiprocessors for individual satellites. In very high speed implementations, performance depends critically on tolerating large latencies in interprocessor communication; without latency tolerance, performance is limited by the vastly differing time scales in processor and data-memory modules, including interconnect times. The modern approach to tolerating remote-communication cost in scalable, shared-memory multiprocessors is to use a multithreaded architecture, and alter the semantics of shared memory slightly, at the price of forcing the programmer either to reason about program correctness in a relaxed consistency model or to agree to program in a constrained style. The literature on multiprocessor correctness conditions has become increasingly complex, and sometimes confusing, which may hinder its practical application. We propose a simple modern correctness condition for a high-performance, shared-memory multiprocessor; the correctness condition is based on a simple interface between the multiprocessor architecture and the parallel programming system.
NASA Technical Reports Server (NTRS)
Ierotheou, C.; Johnson, S.; Leggett, P.; Cross, M.; Evans, E.; Jin, Hao-Qiang; Frumkin, M.; Yan, J.; Biegel, Bryan (Technical Monitor)
2001-01-01
The shared-memory programming model is a very effective way to achieve parallelism on shared memory parallel computers. Historically, the lack of a programming standard for using directives and the rather limited performance due to scalability have affected the take-up of this programming model approach. Significant progress has been made in hardware and software technologies; as a result, the performance of parallel programs with compiler directives has also improved. The introduction of an industrial standard for shared-memory programming with directives, OpenMP, has also addressed the issue of portability. In this study, we have extended the computer aided parallelization toolkit (developed at the University of Greenwich) to automatically generate OpenMP based parallel programs with nominal user assistance. We outline the way in which loop types are categorized and how efficient OpenMP directives can be defined and placed using the in-depth interprocedural analysis that is carried out by the toolkit. We also discuss the application of the toolkit on the NAS Parallel Benchmarks and a number of real-world application codes. This work not only demonstrates the great potential of using the toolkit to quickly parallelize serial programs but also the good performance achievable on up to 300 processors for hybrid message passing and directive-based parallelizations.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Snyder, L.; Notkin, D.; Adams, L.
1990-03-31
This task relates to research on programming massively parallel computers. Previous work on the Ensamble concept of programming was extended and investigation into nonshared memory models of parallel computation was undertaken. Previous work on the Ensamble concept defined a set of programming abstractions and was used to organize the programming task into three distinct levels: composition of machine instructions, composition of processes, and composition of phases. It was applied to shared memory models of computations. During the present research period, these concepts were extended to nonshared memory models. During the present research period, one Ph.D. thesis was completed, and one book chapter and six conference proceedings were published.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Krishnamoorthy, Sriram; Daily, Jeffrey A.; Vishnu, Abhinav
2015-11-01
Global Arrays (GA) is a distributed-memory programming model that allows for shared-memory-style programming combined with one-sided communication, to create a set of tools that combine high performance with ease-of-use. GA exposes a relatively straightforward programming abstraction, while supporting fully-distributed data structures, locality of reference, and high-performance communication. GA was originally formulated in the early 1990’s to provide a communication layer for the Northwest Chemistry (NWChem) suite of chemistry modeling codes that was being developed concurrently.
Supporting shared data structures on distributed memory architectures
NASA Technical Reports Server (NTRS)
Koelbel, Charles; Mehrotra, Piyush; Vanrosendale, John
1990-01-01
Programming nonshared memory systems is more difficult than programming shared memory systems, since there is no support for shared data structures. Current programming languages for distributed memory architectures force the user to decompose all data structures into separate pieces, with each piece owned by one of the processors in the machine, and with all communication explicitly specified by low-level message-passing primitives. A new programming environment is presented for distributed memory architectures, providing a global name space and allowing direct access to remote parts of data values. The analysis and program transformations required to implement this environment are described, and the efficiency of the resulting code on the NCUBE/7 and IPSC/2 hypercubes is described.
NASA Technical Reports Server (NTRS)
Gilbertsen, Noreen D.; Belytschko, Ted
1990-01-01
The implementation of a nonlinear explicit program on a vectorized, concurrent computer with shared memory is described and studied. The conflict between vectorization and concurrency is described and some guidelines are given for optimal block sizes. Several example problems are summarized to illustrate the types of speed-ups which can be achieved by reprogramming as compared to compiler optimization.
Effects of cacheing on multitasking efficiency and programming strategy on an ELXSI 6400
DOE Office of Scientific and Technical Information (OSTI.GOV)
Montry, G.R.; Benner, R.E.
1985-12-01
The impact of a cache/shared memory architecture, and, in particular, the cache coherency problem, upon concurrent algorithm and program development is discussed. In this context, a simple set of programming strategies are proposed which streamline code development and improve code performance when multitasking in a cache/shared memory or distributed memory environment.
NASA Astrophysics Data System (ADS)
Akil, Mohamed
2017-05-01
Real-time processing is becoming more and more important in many image processing applications. Image segmentation is one of the most fundamental tasks in image analysis. As a consequence, many different approaches for image segmentation have been proposed. The watershed transform is a well-known image segmentation tool. The watershed transform is a very data intensive task. To achieve acceleration and obtain real-time processing of watershed algorithms, parallel architectures and programming models for multicore computing have been developed. This paper surveys the approaches for parallel implementation of sequential watershed algorithms on multicore general purpose CPUs: homogeneous multicore processors with shared memory. To achieve an efficient parallel implementation, it is necessary to explore different strategies (parallelization/distribution/distributed scheduling) combined with different acceleration and optimization techniques to enhance parallelism. In this paper, we give a comparison of various parallelizations of sequential watershed algorithms on shared memory multicore architecture. We analyze the performance measurements of each parallel implementation and the impact of the different sources of overhead on the performance of the parallel implementations. In this comparison study, we also discuss the advantages and disadvantages of the parallel programming models. Thus, we compare OpenMP (an application programming interface for multiprocessing) with Pthreads (POSIX Threads) to illustrate the impact of each parallel programming model on the performance of the parallel implementations.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Littlefield, R.J.
1990-02-01
To implement an efficient data-parallel program on a non-shared memory MIMD multicomputer, data and computations must be properly partitioned to achieve good load balance and locality of reference. Programs with irregular data reference patterns often require irregular partitions. Although good partitions may be easy to determine, they can be difficult or impossible to implement in programming languages that provide only regular data distributions, such as blocked or cyclic arrays. We are developing Onyx, a programming system that provides a shared memory model of distributed data structures and extends the concept of data distribution to include irregular and dynamic distributions. This provides a powerful means to specify irregular partitions. Perhaps surprisingly, programs using it can also execute efficiently. In this paper, we describe and evaluate the Onyx implementation of a model problem that repeatedly executes an irregular but fixed data reference pattern. On an NCUBE hypercube, the speed of the Onyx implementation is comparable to that of carefully handwritten message-passing code.
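The difference between the regular distributions mentioned in this abstract (blocked, cyclic) and an irregular one can be sketched as three ownership functions. The function names and the example owner table are hypothetical and are not taken from Onyx.

```python
def block_owner(i, n, p):
    """Owner of element i when n elements are split into p contiguous blocks."""
    size = (n + p - 1) // p
    return i // size


def cyclic_owner(i, p):
    """Owner of element i when elements are dealt round-robin to p processors."""
    return i % p


def irregular_owner(i, owner_table):
    """Owner of element i under an arbitrary, explicitly tabulated partition."""
    return owner_table[i]


n, p = 8, 3
block = [block_owner(i, n, p) for i in range(n)]
cyclic = [cyclic_owner(i, p) for i in range(n)]
table = [0, 0, 2, 1, 1, 1, 2, 0]  # hypothetical irregular partition
irregular = [irregular_owner(i, table) for i in range(n)]
```

A regular distribution is fully described by a formula, whereas an irregular one needs an explicit table or similar description, which is the kind of information a system supporting irregular distributions has to accept from the programmer.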
Implementation and performance of parallel Prolog interpreter
DOE Office of Scientific and Technical Information (OSTI.GOV)
Wei, S.; Kale, L.V.; Balkrishna, R.
1988-01-01
In this paper, the authors discuss the implementation of a parallel Prolog interpreter on different parallel machines. The implementation is based on the REDUCE-OR process model, which exploits both AND and OR parallelism in logic programs. It is machine independent as it runs on top of the chare-kernel, a machine-independent parallel programming system. The authors also give the performance of the interpreter running a diverse set of benchmark programs on parallel machines including shared memory systems (an Alliant FX/8, a Sequent, and a MultiMax) and a non-shared memory system (an Intel iPSC/32 hypercube), in addition to its performance on a multiprocessor simulation system.
Programming model for distributed intelligent systems
NASA Technical Reports Server (NTRS)
Sztipanovits, J.; Biegl, C.; Karsai, G.; Bogunovic, N.; Purves, B.; Williams, R.; Christiansen, T.
1988-01-01
A programming model and architecture which was developed for the design and implementation of complex, heterogeneous measurement and control systems is described. The Multigraph Architecture integrates artificial intelligence techniques with conventional software technologies, offers a unified framework for distributed and shared memory based parallel computational models and supports multiple programming paradigms. The system can be implemented on different hardware architectures and can be adapted to strongly different applications.
An OpenACC-Based Unified Programming Model for Multi-accelerator Systems
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kim, Jungwon; Lee, Seyong; Vetter, Jeffrey S
2015-01-01
This paper proposes a novel SPMD programming model of OpenACC. Our model integrates the different granularities of parallelism from vector-level parallelism to node-level parallelism into a single, unified model based on OpenACC. It allows programmers to write programs for multiple accelerators using a uniform programming model whether they are in shared or distributed memory systems. We implement a prototype of our model and evaluate its performance with a GPU-based supercomputer using three benchmark applications.
Strategies for Energy Efficient Resource Management of Hybrid Programming Models
DOE Office of Scientific and Technical Information (OSTI.GOV)
Li, Dong; Supinski, Bronis de; Schulz, Martin
2013-01-01
Many scientific applications are programmed using hybrid programming models that use both message-passing and shared-memory, due to the increasing prevalence of large-scale systems with multicore, multisocket nodes. Previous work has shown that energy efficiency can be improved using software-controlled execution schemes that consider both the programming model and the power-aware execution capabilities of the system. However, such approaches have focused on identifying optimal resource utilization for one programming model, either shared-memory or message-passing, in isolation. The potential solution space, thus the challenge, increases substantially when optimizing hybrid models since the possible resource configurations increase exponentially. Nonetheless, with the accelerating adoption of hybrid programming models, we increasingly need improved energy efficiency in hybrid parallel applications on large-scale systems. In this work, we present new software-controlled execution schemes that consider the effects of dynamic concurrency throttling (DCT) and dynamic voltage and frequency scaling (DVFS) in the context of hybrid programming models. Specifically, we present predictive models and novel algorithms based on statistical analysis that anticipate application power and time requirements under different concurrency and frequency configurations. We apply our models and methods to the NPB MZ benchmarks and selected applications from the ASC Sequoia codes. Overall, we achieve substantial energy savings (8.74% on average and up to 13.8%) with some performance gain (up to 7.5%) or negligible performance loss.
SMT-Aware Instantaneous Footprint Optimization
DOE Office of Scientific and Technical Information (OSTI.GOV)
Roy, Probir; Liu, Xu; Song, Shuaiwen
Modern architectures employ simultaneous multithreading (SMT) to increase thread-level parallelism. SMT threads share many functional units and the whole memory hierarchy of a physical core. Without a careful code design, SMT threads can easily contend with each other for these shared resources, causing severe performance degradation. Minimizing SMT thread contention for HPC applications running on dedicated platforms is very challenging, because they usually spawn threads within Single Program Multiple Data (SPMD) models. To address this important issue, we introduce a simple scheme for SMT-aware code optimization, which aims to reduce the memory contention across SMT threads.
MaMR: High-performance MapReduce programming model for material cloud applications
NASA Astrophysics Data System (ADS)
Jing, Weipeng; Tong, Danyu; Wang, Yangang; Wang, Jingyuan; Liu, Yaqiu; Zhao, Peng
2017-02-01
With the increasing data size in materials science, existing programming models no longer satisfy the application requirements. MapReduce is a programming model that enables the easy development of scalable parallel applications to process big data on cloud computing systems. However, this model does not directly support the processing of multiple related data, and the processing performance does not reflect the advantages of cloud computing. To enhance the capability of workflow applications in material data processing, we defined a programming model for material cloud applications that supports multiple different Map and Reduce functions running concurrently based on hybrid share-memory BSP called MaMR. An optimized data sharing strategy to supply the shared data to the different Map and Reduce stages was also designed. We added a new merge phase to MapReduce that can efficiently merge data from the map and reduce modules. Experiments showed that the model and framework present effective performance improvements compared to previous work.
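The idea of running several different Map and Reduce functions over related data and then combining their outputs in a merge phase can be sketched as follows. This is a sequential toy written for illustration: the helper names, the record layout, and the sample values are hypothetical, and it does not reproduce MaMR's concurrent execution or its data sharing strategy.

```python
from collections import defaultdict


def map_reduce(records, map_fn, reduce_fn):
    """Run one map/reduce pair: group mapped (key, value) pairs, then reduce each group."""
    groups = defaultdict(list)
    for rec in records:
        for key, value in map_fn(rec):
            groups[key].append(value)
    return {key: reduce_fn(values) for key, values in groups.items()}


def merge(left, right):
    """Merge phase: join two reduce outputs on the keys they share."""
    return {k: (left[k], right[k]) for k in left if k in right}


# Hypothetical material records: (material, property, measurement).
records = [
    ("Fe", "density", 7.8), ("Fe", "density", 7.9),
    ("Al", "density", 2.7), ("Fe", "hardness", 4.0),
    ("Al", "hardness", 2.75), ("Al", "hardness", 3.25),
]

# Two different map/reduce pairs over the same input.
density = map_reduce(
    records,
    lambda r: [(r[0], r[2])] if r[1] == "density" else [],
    max,
)
hardness = map_reduce(
    records,
    lambda r: [(r[0], r[2])] if r[1] == "hardness" else [],
    lambda vs: sum(vs) / len(vs),
)
merged = merge(density, hardness)
```

Here the two map/reduce pairs are independent of each other, which is what would allow them to run concurrently before the merge step joins their results.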
DOE Office of Scientific and Technical Information (OSTI.GOV)
Jones, J.P.; Bangs, A.L.; Butler, P.L.
Hetero Helix is a programming environment which simulates shared memory on a heterogeneous network of distributed-memory computers. The machines in the network may vary with respect to their native operating systems and internal representation of numbers. Hetero Helix presents a simple programming model to developers, and also considers the needs of designers, system integrators, and maintainers. The key software technology underlying Hetero Helix is the use of a "compiler" which analyzes the data structures in shared memory and automatically generates code which translates data representations from the format native to each machine into a common format, and vice versa. The design of Hetero Helix was motivated in particular by the requirements of robotics applications. Hetero Helix has been used successfully in an integration effort involving 27 CPUs in a heterogeneous network and a body of software totaling roughly 100,00 lines of code. 25 refs., 6 figs.
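The translation between a machine's native data representation and a common format can be illustrated with Python's standard `struct` module. The record layout (a 32-bit integer and a 64-bit float) and the choice of network byte order as the common format are assumptions made for this sketch; they are not taken from Hetero Helix, whose translation code is generated automatically.

```python
import struct

# Hypothetical shared record: a 32-bit integer id and a 64-bit float reading.
NATIVE_LITTLE = "<id"  # layout on a little-endian machine
NATIVE_BIG = ">id"     # layout on a big-endian machine
COMMON = "!id"         # assumed common format: network byte order


def to_common(native_bytes, native_fmt):
    """Translate a record from a machine's native layout to the common format."""
    return struct.pack(COMMON, *struct.unpack(native_fmt, native_bytes))


def from_common(common_bytes, native_fmt):
    """Translate a record from the common format to a machine's native layout."""
    return struct.pack(native_fmt, *struct.unpack(COMMON, common_bytes))


little = struct.pack(NATIVE_LITTLE, 42, 1.5)  # written by a little-endian machine
wire = to_common(little, NATIVE_LITTLE)       # stored in the common format
big = from_common(wire, NATIVE_BIG)           # read by a big-endian machine
values = struct.unpack(NATIVE_BIG, big)
```

Each machine needs only the pair of translations to and from the common format, rather than one translation per pair of machine types.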
Flexible language constructs for large parallel programs
NASA Technical Reports Server (NTRS)
Rosing, Matthew; Schnabel, Robert
1993-01-01
The goal of the research described is to develop flexible language constructs for writing large data parallel numerical programs for distributed memory (MIMD) multiprocessors. Previously, several models have been developed to support synchronization and communication. Models for global synchronization include SIMD (Single Instruction Multiple Data), SPMD (Single Program Multiple Data), and sequential programs annotated with data distribution statements. The two primary models for communication include implicit communication based on shared memory and explicit communication based on messages. None of these models by themselves seem sufficient to permit the natural and efficient expression of the variety of algorithms that occur in large scientific computations. An overview of a new language that combines many of these programming models in a clean manner is given. This is done in a modular fashion such that different models can be combined to support large programs. Within a module, the selection of a model depends on the algorithm and its efficiency requirements. An overview of the language and discussion of some of the critical implementation details is given.
A portable approach for PIC on emerging architectures
NASA Astrophysics Data System (ADS)
Decyk, Viktor
2016-03-01
A portable approach for designing Particle-in-Cell (PIC) algorithms on emerging exascale computers is based on the recognition that three distinct programming paradigms are needed. They are: low level vector (SIMD) processing, middle level shared memory parallel programming, and high level distributed memory programming. In addition, there is a memory hierarchy associated with each level. Such algorithms can be initially developed using vectorizing compilers, OpenMP, and MPI. This is the approach recommended by Intel for the Phi processor. These algorithms can then be translated and possibly specialized to other programming models and languages, as needed. For example, the vector processing and shared memory programming might be done with CUDA instead of vectorizing compilers and OpenMP, but generally the algorithm itself is not greatly changed. The UCLA PICKSC web site at http://www.idre.ucla.edu/ contains example open source skeleton codes (mini-apps) illustrating each of these three programming models, individually and in combination. Fortran2003 now supports abstract data types, and design patterns can be used to support a variety of implementations within the same code base. Fortran2003 also supports interoperability with C so that implementations in C languages are also easy to use. Finally, main codes can be translated into dynamic environments such as Python, while still taking advantage of high performing compiled languages. Parallel languages are still evolving with interesting developments in co-Array Fortran, UPC, and OpenACC, among others, and these can also be supported within the same software architecture. Work supported by NSF and DOE Grants.
MPF: A portable message passing facility for shared memory multiprocessors
NASA Technical Reports Server (NTRS)
Malony, Allen D.; Reed, Daniel A.; Mcguire, Patrick J.
1987-01-01
The design, implementation, and performance evaluation of a message passing facility (MPF) for shared memory multiprocessors are presented. The MPF is based on a message passing model conceptually similar to conversations. Participants (parallel processors) can enter or leave a conversation at any time. The message passing primitives for this model are implemented as a portable library of C function calls. The MPF is currently operational on a Sequent Balance 21000, and several parallel applications were developed and tested. Several simple benchmark programs are presented to establish interprocess communication performance for common patterns of interprocess communication. Finally, performance figures are presented for two parallel applications, linear systems solution, and iterative solution of partial differential equations.
Time Constraints and Resource Sharing in Adults' Working Memory Spans
ERIC Educational Resources Information Center
Barrouillet, Pierre; Bernardin, Sophie; Camos, Valerie
2004-01-01
This article presents a new model that accounts for working memory spans in adults, the time-based resource-sharing model. The model assumes that both components (i.e., processing and maintenance) of the main working memory tasks require attention and that memory traces decay as soon as attention is switched away. Because memory retrievals are…
UPC++ Programmer’s Guide (v1.0 2017.9)
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bachan, J.; Baden, S.; Bonachea, D.
UPC++ is a C++11 library that provides Asynchronous Partitioned Global Address Space (APGAS) programming. It is designed for writing parallel programs that run efficiently and scale well on distributed-memory parallel computers. The APGAS model is single program, multiple-data (SPMD), with each separate thread of execution (referred to as a rank, a term borrowed from MPI) having access to local memory as it would in C++. However, APGAS also provides access to a global address space, which is allocated in shared segments that are distributed over the ranks. UPC++ provides numerous methods for accessing and using global memory. In UPC++, all operations that access remote memory are explicit, which encourages programmers to be aware of the cost of communication and data movement. Moreover, all remote-memory access operations are by default asynchronous, to enable programmers to write code that scales well even on hundreds of thousands of cores.
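The programming style described above (explicit remote accesses that return immediately and are completed later) can be imitated in a few lines of Python. The sketch below is a single-process simulation and is not the UPC++ API: the per-rank lists stand in for shared segments, and `remote_put` and `remote_get` are hypothetical helpers that return standard-library futures.

```python
from concurrent.futures import ThreadPoolExecutor

N_RANKS = 4
# One "shared segment" per rank; together they stand in for the global address space.
segments = [[0] * 8 for _ in range(N_RANKS)]
pool = ThreadPoolExecutor(max_workers=2)


def remote_put(rank, offset, value):
    """Explicit, asynchronous write into another rank's segment; returns a future."""
    def do_put():
        segments[rank][offset] = value
    return pool.submit(do_put)


def remote_get(rank, offset):
    """Explicit, asynchronous read from another rank's segment; returns a future."""
    return pool.submit(lambda: segments[rank][offset])


# Issue several puts, overlap them with local work, then wait for completion.
puts = [remote_put(rank, 0, rank * 10) for rank in range(N_RANKS)]
local_sum = sum(range(100))      # local computation that overlaps the communication
for future in puts:
    future.result()              # after this call the corresponding put is complete
values = [remote_get(rank, 0).result() for rank in range(N_RANKS)]
pool.shutdown()
```

Because every remote access appears as a named call that yields a future, the points where the program pays for communication and where it waits are visible in the source, which is the property the abstract emphasizes.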
UPC++ Programmer’s Guide, v1.0-2018.3.0
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bachan, J.; Baden, S.; Bonachea, Dan
UPC++ is a C++11 library that provides Partitioned Global Address Space (PGAS) programming. It is designed for writing parallel programs that run efficiently and scale well on distributed-memory parallel computers. The PGAS model is single program, multiple-data (SPMD), with each separate thread of execution (referred to as a rank, a term borrowed from MPI) having access to local memory as it would in C++. However, PGAS also provides access to a global address space, which is allocated in shared segments that are distributed over the ranks. UPC++ provides numerous methods for accessing and using global memory. In UPC++, all operations that access remote memory are explicit, which encourages programmers to be aware of the cost of communication and data movement. Moreover, all remote-memory access operations are by default asynchronous, to enable programmers to write code that scales well even on hundreds of thousands of cores.
Scaling Irregular Applications through Data Aggregation and Software Multithreading
DOE Office of Scientific and Technical Information (OSTI.GOV)
Morari, Alessandro; Tumeo, Antonino; Chavarría-Miranda, Daniel
Bioinformatics, data analytics, semantic databases, knowledge discovery are emerging high performance application areas that exploit dynamic, linked data structures such as graphs, unbalanced trees or unstructured grids. These data structures usually are very large, requiring significantly more memory than available on single shared memory systems. Additionally, these data structures are difficult to partition on distributed memory systems. They also present poor spatial and temporal locality, thus generating unpredictable memory and network accesses. The Partitioned Global Address Space (PGAS) programming model seems suitable for these applications, because it allows using a shared memory abstraction across distributed-memory clusters. However, current PGAS languages and libraries are built to target regular remote data accesses and block transfers. Furthermore, they usually rely on the Single Program Multiple Data (SPMD) parallel control model, which is not well suited to the fine grained, dynamic and unbalanced parallelism of irregular applications. In this paper we present GMT (Global Memory and Threading library), a custom runtime library that enables efficient execution of irregular applications on commodity clusters. GMT integrates a PGAS data substrate with simple fork/join parallelism and provides automatic load balancing on a per node basis. It implements multi-level aggregation and lightweight multithreading to maximize memory and network bandwidth with fine-grained data accesses and tolerate long data access latencies. A key innovation in the GMT runtime is its thread specialization (workers, helpers and communication threads) that realize the overall functionality. We compare our approach with other PGAS models, such as UPC running using GASNet, and hand-optimized MPI code on a set of typical large-scale irregular applications, demonstrating speedups of an order of magnitude.
Singer, Jefferson A; Blagov, Pavel; Berry, Meredith; Oost, Kathryn M
2013-12-01
An integrative model of narrative identity builds on a dual memory system that draws on episodic memory and a long-term self to generate autobiographical memories. Autobiographical memories related to critical goals in a lifetime period lead to life-story memories, which in turn become self-defining memories when linked to an individual's enduring concerns. Self-defining memories that share repetitive emotion-outcome sequences yield narrative scripts, abstracted templates that filter cognitive-affective processing. The life story is the individual's overarching narrative that provides unity and purpose over the life course. Healthy narrative identity combines memory specificity with adaptive meaning-making to achieve insight and well-being, as demonstrated through a literature review of personality and clinical research, as well as new findings from our own research program. A clinical case study drawing on this narrative identity model is also presented with implications for treatment and research. © 2012 Wiley Periodicals, Inc.
The FORCE - A highly portable parallel programming language
NASA Technical Reports Server (NTRS)
Jordan, Harry F.; Benten, Muhammad S.; Alaghband, Gita; Jakob, Ruediger
1989-01-01
This paper explains why the FORCE parallel programming language is easily portable among six different shared-memory multiprocessors, and how a two-level macro preprocessor makes it possible to hide low-level machine dependencies and to build machine-independent high-level constructs on top of them. These FORCE constructs make it possible to write portable parallel programs largely independent of the number of processes and the specific shared-memory multiprocessor executing them.
The FORCE: A highly portable parallel programming language
NASA Technical Reports Server (NTRS)
Jordan, Harry F.; Benten, Muhammad S.; Alaghband, Gita; Jakob, Ruediger
1989-01-01
Here, it is explained why the FORCE parallel programming language is easily portable among six different shared-memory multiprocessors, and how a two-level macro preprocessor makes it possible to hide low level machine dependencies and to build machine-independent high level constructs on top of them. These FORCE constructs make it possible to write portable parallel programs largely independent of the number of processes and the specific shared memory multiprocessor executing them.
Virtual memory support for distributed computing environments using a shared data object model
NASA Astrophysics Data System (ADS)
Huang, F.; Bacon, J.; Mapp, G.
1995-12-01
Conventional storage management systems provide one interface for accessing memory segments and another for accessing secondary storage objects. This hinders application programming and affects overall system performance due to mandatory data copying and user/kernel boundary crossings, which in the microkernel case may involve context switches. Memory-mapping techniques may be used to provide programmers with a unified view of the storage system. This paper extends such techniques to support a shared data object model for distributed computing environments in which good support for coherence and synchronization is essential. The approach is based on a microkernel, typed memory objects, and integrated coherence control. A microkernel architecture is used to support multiple coherence protocols and the addition of new protocols. Memory objects are typed and applications can choose the most suitable protocols for different types of object to avoid protocol mismatch. Low-level coherence control is integrated with high-level concurrency control so that the number of messages required to maintain memory coherence is reduced and system-wide synchronization is realized without severely impacting the system performance. These features together contribute a novel approach to the support for flexible coherence under application control.
Shared versus distributed memory multiprocessors
NASA Technical Reports Server (NTRS)
Jordan, Harry F.
1991-01-01
The question of whether multiprocessors should have shared or distributed memory has attracted a great deal of attention. Some researchers argue strongly for building distributed memory machines, while others argue just as strongly for programming shared memory multiprocessors. A great deal of research is underway on both types of parallel systems. Special emphasis is placed on systems with a very large number of processors for computation intensive tasks and considers research and implementation trends. It appears that the two types of systems will likely converge to a common form for large scale multiprocessors.
Flexible Language Constructs for Large Parallel Programs
Rosing, Matt; Schnabel, Robert
1994-01-01
The goal of the research described in this article is to develop flexible language constructs for writing large data parallel numerical programs for distributed memory (multiple instruction multiple data [MIMD]) multiprocessors. Previously, several models have been developed to support synchronization and communication. Models for global synchronization include single instruction multiple data (SIMD), single program multiple data (SPMD), and sequential programs annotated with data distribution statements. The two primary models for communication include implicit communication based on shared memory and explicit communication based on messages. None of these models by themselves seem sufficient to permit the natural and efficient expression of the variety of algorithms that occur in large scientific computations. In this article, we give an overview of a new language that combines many of these programming models in a clean manner. This is done in a modular fashion such that different models can be combined to support large programs. Within a module, the selection of a model depends on the algorithm and its efficiency requirements. In this article, we give an overview of the language and discuss some of the critical implementation details.
Cooperative Data Sharing: Simple Support for Clusters of SMP Nodes
NASA Technical Reports Server (NTRS)
DiNucci, David C.; Bailey, David H. (Technical Monitor)
1997-01-01
Libraries like PVM and MPI send typed messages to allow for heterogeneous cluster computing. Lower-level libraries, such as GAM, provide more efficient access to communication by removing the need to copy messages between the interface and user space in some cases. Still lower-level interfaces, such as UNET, get right down to the hardware level to provide maximum performance. However, these are all still interfaces for passing messages from one process to another, and have limited utility in a shared-memory environment, due primarily to the fact that message passing is just another term for copying. This drawback is made more pertinent by today's hybrid architectures (e.g. clusters of SMPs), where it is difficult to know beforehand whether two communicating processes will share memory. As a result, even portable language tools (like HPF compilers) must either map all interprocess communication into message passing, with the accompanying performance degradation in shared memory environments, or they must check each communication at run-time and implement the shared-memory case separately for efficiency. Cooperative Data Sharing (CDS) is a single user-level API which abstracts all communication between processes into the sharing and access coordination of memory regions, in a model which might be described as "distributed shared messages" or "large-grain distributed shared memory". As a result, the user programs to a simple latency-tolerant abstract communication specification which can be mapped efficiently to either a shared-memory or message-passing based run-time system, depending upon the available architecture. Unlike some distributed shared memory interfaces, the user still has complete control over the assignment of data to processors, the forwarding of data to its next likely destination, and the queuing of data until it is needed, so even the relatively high latency present in clusters can be accommodated.
CDS does not require special use of an MMU, which can add overhead to some DSM systems, and does not require an SPMD programming model. Unlike some message-passing interfaces, CDS allows the user to implement efficient demand-driven applications where processes must "fight" over data, and does not perform copying if processes share memory and do not attempt concurrent writes. CDS also supports heterogeneous computing, dynamic process creation, handlers, and a very simple thread-arbitration mechanism. Additional support for array subsections is currently being considered. The CDS1 API, which forms the kernel of CDS, is built primarily upon only 2 communication primitives, one process initiation primitive, and some data translation (and marshalling) routines, memory allocation routines, and priority control routines. The entire current collection of 28 routines provides enough functionality to implement most (or all) of MPI 1 and 2, which has a much larger interface consisting of hundreds of routines. Still, the API is small enough to consider integrating into standard OS interfaces for handling inter-process communication in a network-independent way. This approach would also help to solve many of the problems plaguing other higher-level standards such as MPI and PVM which must, in some cases, "play OS" to adequately address progress and process control issues. The CDS2 API, a higher level of interface roughly equivalent in functionality to MPI and to be built entirely upon CDS1, is still being designed. It is intended to add support for the equivalent of communicators, reduction and other collective operations, process topologies, additional support for process creation, and some automatic memory management. CDS2 will not exactly match MPI, because the copy-free semantics of communication from CDS1 will be supported. CDS2 application programs will also be free to use CDS1 directly, with care.
CDS1 has been implemented on networks of workstations running unmodified Unix-based operating systems, using UDP/IP and vendor-supplied high-performance locks. Although its inter-node performance is currently unimpressive due to a rudimentary implementation technique, it even now outperforms highly optimized MPI implementations on intra-node communication due to its support for non-copy communication. The similarity of the CDS1 architecture to that of other projects such as UNET and TRAP suggests that the inter-node performance can be increased significantly to surpass MPI or PVM, and it may be possible to migrate some of its functionality to communication controllers.
Using Coarrays to Parallelize Legacy Fortran Applications: Strategy and Case Study
Radhakrishnan, Hari; Rouson, Damian W. I.; Morris, Karla; ...
2015-01-01
This paper summarizes a strategy for parallelizing a legacy Fortran 77 program using the object-oriented (OO) and coarray features that entered Fortran in the 2003 and 2008 standards, respectively. OO programming (OOP) facilitates the construction of an extensible suite of model-verification and performance tests that drive the development. Coarray parallel programming facilitates a rapid evolution from a serial application to a parallel application capable of running on multicore processors and many-core accelerators in shared and distributed memory. We delineate 17 code modernization steps used to refactor and parallelize the program and study the resulting performance. Our initial studies were done using the Intel Fortran compiler on a 32-core shared memory server. Scaling behavior was very poor, and profile analysis using TAU showed that the bottleneck in the performance was due to our implementation of a collective, sequential summation procedure. We were able to improve the scalability and achieve nearly linear speedup by replacing the sequential summation with a parallel, binary tree algorithm. We also tested the Cray compiler, which provides its own collective summation procedure. Intel provides no collective reductions. With Cray, the program shows linear speedup even in distributed-memory execution. We anticipate similar results with other compilers once they support the new collective procedures proposed for Fortran 2015.
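The difference between a sequential summation and a binary tree summation can be seen in a small serial Python model. This is a sketch of the general technique, not the authors' Fortran coarray code: each list element stands in for the partial sum held by one image, and the function names are hypothetical.

```python
def sequential_sum(partials):
    """One image adds every partial sum in turn: len(partials) - 1 dependent steps."""
    total = partials[0]
    steps = 0
    for value in partials[1:]:
        total += value
        steps += 1
    return total, steps


def tree_sum(partials):
    """Pairwise (binary tree) reduction: the additions within a round are independent."""
    values = list(partials)
    rounds = 0
    stride = 1
    while stride < len(values):
        # In a parallel run, each index i in this loop would belong to a different image.
        for i in range(0, len(values) - stride, 2 * stride):
            values[i] += values[i + stride]
        stride *= 2
        rounds += 1
    return values[0], rounds


partials = [float(i) for i in range(1, 33)]   # one partial sum per image, 32 images
seq_total, seq_steps = sequential_sum(partials)
tree_total, tree_rounds = tree_sum(partials)
```

For 32 images the sequential version needs 31 dependent additions, while the tree version needs 5 rounds, which is consistent with the improved scaling the abstract reports after the switch.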
Memory access in shared virtual memory
DOE Office of Scientific and Technical Information (OSTI.GOV)
Berrendorf, R.
1992-01-01
Shared virtual memory (SVM) is a virtual memory layer with a single address space on top of a distributed real memory on parallel computers. We examine the behavior and performance of SVM running a parallel program with medium-grained, loop-level parallelism on top of it. A simulator for the underlying parallel architecture can be used to examine the behavior of SVM more deeply. The influence of several parameters, such as the number of processors, page size, cold or warm start, and restricted page replication, is studied.
Memory access in shared virtual memory
DOE Office of Scientific and Technical Information (OSTI.GOV)
Berrendorf, R.
1992-09-01
Shared virtual memory (SVM) is a virtual memory layer with a single address space on top of a distributed real memory on parallel computers. We examine the behavior and performance of SVM running a parallel program with medium-grained, loop-level parallelism on top of it. A simulator for the underlying parallel architecture can be used to examine the behavior of SVM more deeply. The influence of several parameters, such as the number of processors, page size, cold or warm start, and restricted page replication, is studied.
Parallel Computation of the Regional Ocean Modeling System (ROMS)
DOE Office of Scientific and Technical Information (OSTI.GOV)
Wang, P; Song, Y T; Chao, Y
2005-04-05
The Regional Ocean Modeling System (ROMS) is a regional ocean general circulation modeling system solving the free surface, hydrostatic, primitive equations over varying topography. It is free software distributed world-wide for studying both complex coastal ocean problems and the basin-to-global scale ocean circulation. The original ROMS code could only be run on shared-memory systems. With the increasing need to simulate larger model domains with finer resolutions and on a variety of computer platforms, there is a need in the ocean-modeling community to have a ROMS code that can be run on any parallel computer ranging from 10 to hundreds of processors. Recently, we have explored parallelization for ROMS using the MPI programming model. In this paper, an efficient parallelization strategy for such a large-scale scientific software package, based on an existing shared-memory computing model, is presented. In addition, scientific applications and data-performance issues on a couple of SGI systems, including Columbia, the world's third-fastest supercomputer, are discussed.
Address tracing for parallel machines
NASA Technical Reports Server (NTRS)
Stunkel, Craig B.; Janssens, Bob; Fuchs, W. Kent
1991-01-01
Recently implemented parallel system address-tracing methods based on several metrics are surveyed. The issues specific to collection of traces for both shared and distributed memory parallel computers are highlighted. Five general categories of address-trace collection methods are examined: hardware-captured, interrupt-based, simulation-based, altered microcode-based, and instrumented program-based traces. The problems unique to shared memory and distributed memory multiprocessors are examined separately.
Performance Analysis of Multilevel Parallel Applications on Shared Memory Architectures
NASA Technical Reports Server (NTRS)
Biegel, Bryan A. (Technical Monitor); Jost, G.; Jin, H.; Labarta, J.; Gimenez, J.; Caubet, J.
2003-01-01
Parallel programming paradigms include process level parallelism, thread level parallelization, and multilevel parallelism. This viewgraph presentation describes a detailed performance analysis of these paradigms for Shared Memory Architecture (SMA). This analysis uses the Paraver Performance Analysis System. The presentation includes diagrams of a flow of useful computations.
High Performance Programming Using Explicit Shared Memory Model on the Cray T3D
NASA Technical Reports Server (NTRS)
Saini, Subhash; Simon, Horst D.; Lasinski, T. A. (Technical Monitor)
1994-01-01
The Cray T3D is the first-phase system in Cray Research Inc.'s (CRI) three-phase massively parallel processing program. In this report we describe the architecture of the T3D, as well as the CRAFT (Cray Research Adaptive Fortran) programming model, and contrast it with PVM, which is also supported on the T3D. We present some performance data based on the NAS Parallel Benchmarks to illustrate both architectural and software features of the T3D.
ERIC Educational Resources Information Center
Thaxton, Terry Ann
2011-01-01
In this article, the author takes a multidimensional and personal look at creative writing work in an assisted living facility. The people she works with at the facility have memory loss. She shares her experience working with these people and describes a storytelling workshop that was modeled after Timeslips, a program started by Anne Basting at…
Why are you telling me that? A conceptual model of the social function of autobiographical memory.
Alea, Nicole; Bluck, Susan
2003-03-01
In an effort to stimulate and guide empirical work within a functional framework, this paper provides a conceptual model of the social functions of autobiographical memory (AM) across the lifespan. The model delineates the processes and variables involved when AMs are shared to serve social functions. Components of the model include: lifespan contextual influences, the qualitative characteristics of memory (emotionality and level of detail recalled), the speaker's characteristics (age, gender, and personality), the familiarity and similarity of the listener to the speaker, the level of responsiveness during the memory-sharing process, and the nature of the social relationship in which the memory sharing occurs (valence and length of the relationship). These components are shown to influence the type of social function served and/or, the extent to which social functions are served. Directions for future empirical work to substantiate the model and hypotheses derived from the model are provided.
Programming distributed memory architectures using Kali
NASA Technical Reports Server (NTRS)
Mehrotra, Piyush; Vanrosendale, John
1990-01-01
Programming nonshared memory systems is more difficult than programming shared memory systems, in part because of the relatively low level of current programming environments for such machines. A new programming environment is presented, Kali, which provides a global name space and allows direct access to remote data values. In order to retain efficiency, Kali provides a system of annotations, allowing the user to control those aspects of the program critical to performance, such as data distribution and load balancing. The primitives and constructs provided by the language are described, and some of the issues raised in translating a Kali program for execution on distributed memory systems are also discussed.
Working Memory Span Development: A Time-Based Resource-Sharing Model Account
ERIC Educational Resources Information Center
Barrouillet, Pierre; Gavens, Nathalie; Vergauwe, Evie; Gaillard, Vinciane; Camos, Valerie
2009-01-01
The time-based resource-sharing model (P. Barrouillet, S. Bernardin, & V. Camos, 2004) assumes that during complex working memory span tasks, attention is frequently and surreptitiously switched from processing to reactivate decaying memory traces before their complete loss. Three experiments involving children from 5 to 14 years of age…
Tuning collective communication for Partitioned Global Address Space programming models
Nishtala, Rajesh; Zheng, Yili; Hargrove, Paul H.; ...
2011-06-12
Partitioned Global Address Space (PGAS) languages offer programmers the convenience of a shared memory programming style combined with locality control necessary to run on large-scale distributed memory systems. Even within a PGAS language programmers often need to perform global communication operations such as broadcasts or reductions, which are best performed as collective operations in which a group of threads work together to perform the operation. In this study we consider the problem of implementing collective communication within PGAS languages and explore some of the design trade-offs in both the interface and implementation. In particular, PGAS collectives have semantic issues that are different than in send–receive style message passing programs, and different implementation approaches that take advantage of the one-sided communication style in these languages. We present an implementation framework for PGAS collectives as part of the GASNet communication layer, which supports shared memory, distributed memory and hybrids. The framework supports a broad set of algorithms for each collective, over which the implementation may be automatically tuned. In conclusion, we demonstrate the benefit of optimized GASNet collectives using application benchmarks written in UPC, and demonstrate that the GASNet collectives can deliver scalable performance on a variety of state-of-the-art parallel machines including a Cray XT4, an IBM BlueGene/P, and a Sun Constellation system with InfiniBand interconnect.
Brandon, Nicole R; Beike, Denise R; Cole, Holly E
2017-07-01
Autobiographical memories (AMs) can be used to create and maintain closeness with others [Alea, N., & Bluck, S. (2003). Why are you telling me that? A conceptual model of the social function of autobiographical memory. Memory, 11(2), 165-178]. However, the differential effects of memory specificity are not well established. Two studies with 148 participants tested whether the order in which autobiographical knowledge (AK) and specific episodic AM (EAM) are shared affects feelings of closeness. Participants read two memories hypothetically shared by each of four strangers. The strangers first shared either AK or an EAM, and then shared either AK or an EAM. Participants were randomly assigned to read either positive or negative AMs from the strangers. Findings suggest that people feel closer to those who share positive AMs in the same way they construct memories: starting with general and moving to specific.
A communication-avoiding, hybrid-parallel, rank-revealing orthogonalization method.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hoemmen, Mark
2010-11-01
Orthogonalization consumes much of the run time of many iterative methods for solving sparse linear systems and eigenvalue problems. Commonly used algorithms, such as variants of Gram-Schmidt or Householder QR, have performance dominated by communication. Here, 'communication' includes both data movement between the CPU and memory, and messages between processors in parallel. Our Tall Skinny QR (TSQR) family of algorithms requires asymptotically fewer messages between processors and data movement between CPU and memory than typical orthogonalization methods, yet achieves the same accuracy as Householder QR factorization. Furthermore, in block orthogonalizations, TSQR is faster and more accurate than existing approaches for orthogonalizing the vectors within each block ('normalization'). TSQR's rank-revealing capability also makes it useful for detecting deflation in block iterative methods, for which existing approaches sacrifice performance, accuracy, or both. We have implemented a version of TSQR that exploits both distributed-memory and shared-memory parallelism, and supports real and complex arithmetic. Our implementation is optimized for the case of orthogonalizing a small number (5-20) of very long vectors. The shared-memory parallel component uses Intel's Threading Building Blocks, though its modular design supports other shared-memory programming models as well, including computation on the GPU. Our implementation achieves speedups of 2 times or more over competing orthogonalizations. It is available now in the development branch of the Trilinos software package, and will be included in the 10.8 release.
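The core TSQR idea (factor row blocks independently, then factor the stacked triangular factors) can be shown in a short pure-Python sketch. It is a serial, real-arithmetic illustration under several assumptions and is not the Trilinos implementation: it uses a single flat reduction step, computes only the R factor, assumes every row block has at least as many rows as the matrix has columns, and the function names are hypothetical.

```python
import math


def householder_r(a):
    """Return the n-by-n upper-triangular R factor of an m-by-n matrix (m >= n)."""
    a = [row[:] for row in a]
    m, n = len(a), len(a[0])
    for k in range(n):
        norm = math.sqrt(sum(a[i][k] ** 2 for i in range(k, m)))
        if norm == 0.0:
            continue
        # Reflector v = x - alpha * e1, with the sign chosen to avoid cancellation.
        alpha = -math.copysign(norm, a[k][k])
        v = [a[i][k] for i in range(k, m)]
        v[0] -= alpha
        vtv = sum(x * x for x in v)
        for j in range(k, n):
            dot = sum(v[i] * a[k + i][j] for i in range(m - k))
            scale = 2.0 * dot / vtv
            for i in range(m - k):
                a[k + i][j] -= scale * v[i]
    return [[a[i][j] if j >= i else 0.0 for j in range(n)] for i in range(n)]


def tsqr_r(a, block_rows):
    """R factor via TSQR: factor each row block on its own, then factor the stacked R's."""
    local_rs = [householder_r(a[s:s + block_rows]) for s in range(0, len(a), block_rows)]
    stacked = [row for r in local_rs for row in r]
    return householder_r(stacked)


# A small tall-and-skinny test matrix (8 rows, 2 columns).
A = [[float(i + 1), float((i * i) % 7)] for i in range(8)]
r_tsqr = tsqr_r(A, block_rows=2)
r_direct = householder_r(A)
```

The local factorizations touch disjoint row blocks, so in a parallel setting only the small R factors would need to be exchanged. The two results agree up to the sign of each row of R, which is the usual non-uniqueness of a QR factorization.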
Luckey, Chance John; Bhattacharya, Deepta; Goldrath, Ananda W.; Weissman, Irving L.; Benoist, Christophe; Mathis, Diane
2006-01-01
The only cells of the hematopoietic system that undergo self-renewal for the lifetime of the organism are long-term hematopoietic stem cells and memory T and B cells. To determine whether there is a shared transcriptional program among these self-renewing populations, we first compared the gene-expression profiles of naïve, effector and memory CD8+ T cells with those of long-term hematopoietic stem cells, short-term hematopoietic stem cells, and lineage-committed progenitors. Transcripts augmented in memory CD8+ T cells relative to naïve and effector T cells were selectively enriched in long-term hematopoietic stem cells and were progressively lost in their short-term and lineage-committed counterparts. Furthermore, transcripts selectively decreased in memory CD8+ T cells were selectively down-regulated in long-term hematopoietic stem cells and progressively increased with differentiation. To confirm that this pattern was a general property of immunologic memory, we turned to independently generated gene expression profiles of memory, naïve, germinal center, and plasma B cells. Once again, memory-enriched and -depleted transcripts were also appropriately augmented and diminished in long-term hematopoietic stem cells, and their expression correlated with progressive loss of self-renewal function. Thus, there appears to be a common signature of both up- and down-regulated transcripts shared between memory T cells, memory B cells, and long-term hematopoietic stem cells. This signature was not consistently enriched in neural or embryonic stem cell populations and, therefore, appears to be restricted to the hematopoietic system. These observations provide evidence that the shared phenotype of self-renewal in the hematopoietic system is linked at the molecular level. PMID:16492737
Testing New Programming Paradigms with NAS Parallel Benchmarks
NASA Technical Reports Server (NTRS)
Jin, H.; Frumkin, M.; Schultz, M.; Yan, J.
2000-01-01
Over the past decade, high performance computing has evolved rapidly, not only in hardware architectures but also with increasing complexity of real applications. Technologies have been developed with the aim of scaling up to thousands of processors on both distributed and shared memory systems. Development of parallel programs on these computers is always a challenging task. Today, writing parallel programs with message passing (e.g. MPI) is the most popular way of achieving scalability and high performance. However, writing message passing programs is difficult and error prone. In recent years, new efforts have been made to define new parallel programming paradigms. The best examples are HPF (based on data parallelism) and OpenMP (based on shared memory parallelism). Both provide simple and clear extensions to sequential programs, thus greatly simplifying the tedious tasks encountered in writing message passing programs. HPF is independent of memory hierarchy; however, due to the immaturity of compiler technology, its performance is still questionable. Although use of parallel compiler directives is not new, OpenMP offers a portable solution in the shared-memory domain. Another important development involves the tremendous progress in the internet and its associated technology. Although still in its infancy, Java promises portability in a heterogeneous environment and offers the possibility to "compile once and run anywhere." In light of testing these new technologies, we implemented new parallel versions of the NAS Parallel Benchmarks (NPBs) with HPF and OpenMP directives, and extended the work with Java and Java-threads. The purpose of this study is to examine the effectiveness of alternative programming paradigms. NPBs consist of five kernels and three simulated applications that mimic the computation and data movement of large scale computational fluid dynamics (CFD) applications. We started with the serial version included in NPB2.3.
Optimization of memory and cache usage was applied to several benchmarks, notably BT and SP, resulting in better sequential performance. In order to overcome the lack of an HPF performance model and guide the development of the HPF codes, we employed an empirical performance model for several primitives found in the benchmarks. We encountered a few limitations of HPF, such as lack of support for the "REDISTRIBUTION" directive and no easy way to handle irregular computation. The parallelization with OpenMP directives was done at the outer-most loop level to achieve the largest granularity. The performance of six HPF and OpenMP benchmarks is compared with their MPI counterparts for the Class-A problem size in the figure on the next page. These results were obtained on an SGI Origin2000 (195MHz) with MIPSpro-f77 compiler 7.2.1 for OpenMP and MPI codes and PGI pghpf-2.4.3 compiler with MPI interface for HPF programs.
Support of Multidimensional Parallelism in the OpenMP Programming Model
NASA Technical Reports Server (NTRS)
Jin, Hao-Qiang; Jost, Gabriele
2003-01-01
OpenMP is the current standard for shared-memory programming. While providing ease of parallel programming, the OpenMP programming model also has limitations which often affect the scalability of applications. Examples of these limitations are work distribution and point-to-point synchronization among threads. We propose extensions to the OpenMP programming model which allow the user to easily distribute the work in multiple dimensions and synchronize the workflow among the threads. The proposed extensions include four new constructs and the associated runtime library. They do not require changes to the source code and can be implemented based on the existing OpenMP standard. We illustrate the concept in a prototype translator and test with benchmark codes and a cloud modeling code.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Sayan Ghosh, Jeff Hammond
OpenSHMEM is a community effort to unify and standardize the SHMEM programming model. MPI (Message Passing Interface) is a well-known community standard for parallel programming using distributed memory. The most recent release of MPI, version 3.0, was designed in part to support programming models like SHMEM. OSHMPI is an implementation of the OpenSHMEM standard using MPI-3 for the Linux operating system. It is the first implementation of SHMEM over MPI one-sided communication and has the potential to be widely adopted due to the portability and wide availability of Linux and MPI-3. OSHMPI has been tested on a variety of systems and implementations of MPI-3, including InfiniBand clusters using MVAPICH2 and SGI shared-memory supercomputers using MPICH. Current support is limited to Linux but may be extended to Apple OSX if there is sufficient interest. The code is open source via https://github.com/jeffhammond/oshmpi
Parallel computing for probabilistic fatigue analysis
NASA Technical Reports Server (NTRS)
Sues, Robert H.; Lua, Yuan J.; Smith, Mark D.
1993-01-01
This paper presents the results of Phase I research to investigate the most effective parallel processing software strategies and hardware configurations for probabilistic structural analysis. We investigate the efficiency of both shared and distributed-memory architectures via a probabilistic fatigue life analysis problem. We also present a parallel programming approach, the virtual shared-memory paradigm, that is applicable across both types of hardware. Using this approach, problems can be solved on a variety of parallel configurations, including networks of single or multiprocessor workstations. We conclude that it is possible to effectively parallelize probabilistic fatigue analysis codes; however, special strategies will be needed to achieve large-scale parallelism, to keep a large number of processors busy, and to treat problems with the large memory requirements encountered in practice. We also conclude that distributed-memory architecture is preferable to shared-memory for achieving large scale parallelism; however, in the future, the currently emerging hybrid-memory architectures will likely be optimal.
Shared Memory Parallelization of an Implicit ADI-type CFD Code
NASA Technical Reports Server (NTRS)
Hauser, Th.; Huang, P. G.
1999-01-01
A parallelization study designed for ADI-type algorithms is presented using the OpenMP specification for shared-memory multiprocessor programming. Details of optimizations specifically addressed to cache-based computer architectures are described and performance measurements for the single and multiprocessor implementation are summarized. The paper demonstrates that optimization of memory access on a cache-based computer architecture controls the performance of the computational algorithm. A hybrid MPI/OpenMP approach is proposed for clusters of shared memory machines to further enhance the parallel performance. The method is applied to develop a new LES/DNS code, named LESTool. A preliminary DNS calculation of a fully developed channel flow at a Reynolds number of 180, Re(sub tau) = 180, has shown good agreement with existing data.
Discrete-Slots Models of Visual Working-Memory Response Times
Donkin, Christopher; Nosofsky, Robert M.; Gold, Jason M.; Shiffrin, Richard M.
2014-01-01
Much recent research has aimed to establish whether visual working memory (WM) is better characterized by a limited number of discrete all-or-none slots or by a continuous sharing of memory resources. To date, however, researchers have not considered the response-time (RT) predictions of discrete-slots versus shared-resources models. To complement the past research in this field, we formalize a family of mixed-state, discrete-slots models for explaining choice and RTs in tasks of visual WM change detection. In the tasks under investigation, a small set of visual items is presented, followed by a test item in 1 of the studied positions for which a change judgment must be made. According to the models, if the studied item in that position is retained in 1 of the discrete slots, then a memory-based evidence-accumulation process determines the choice and the RT; if the studied item in that position is missing, then a guessing-based accumulation process operates. Observed RT distributions are therefore theorized to arise as probabilistic mixtures of the memory-based and guessing distributions. We formalize an analogous set of continuous shared-resources models. The model classes are tested on individual subjects with both qualitative contrasts and quantitative fits to RT-distribution data. The discrete-slots models provide much better qualitative and quantitative accounts of the RT and choice data than do the shared-resources models, although there is some evidence for “slots plus resources” when memory set size is very small. PMID:24015956
Directions in parallel programming: HPF, shared virtual memory and object parallelism in pC++
NASA Technical Reports Server (NTRS)
Bodin, Francois; Priol, Thierry; Mehrotra, Piyush; Gannon, Dennis
1994-01-01
Fortran and C++ are the dominant programming languages used in scientific computation. Consequently, extensions to these languages are the most popular for programming massively parallel computers. We discuss two such approaches to parallel Fortran and one approach to C++. The High Performance Fortran Forum has designed HPF with the intent of supporting data parallelism on Fortran 90 applications. HPF works by asking the user to help the compiler distribute and align the data structures with the distributed memory modules in the system. Fortran-S takes a different approach in which the data distribution is managed by the operating system and the user provides annotations to indicate parallel control regions. In the case of C++, we look at pC++ which is based on a concurrent aggregate parallel model.
Tolerant (parallel) Programming
NASA Technical Reports Server (NTRS)
DiNucci, David C.; Bailey, David H. (Technical Monitor)
1997-01-01
In order to be truly portable, a program must be tolerant of a wide range of development and execution environments, and a parallel program is just one which must be tolerant of a very wide range. This paper first defines the term "tolerant programming", then describes many layers of tools to accomplish it. The primary focus is on F-Nets, a formal model for expressing computation as a folded partial-ordering of operations, thereby providing an architecture-independent expression of tolerant parallel algorithms. For implementing F-Nets, Cooperative Data Sharing (CDS) is a subroutine package for implementing communication efficiently in a large number of environments (e.g. shared memory and message passing). Software Cabling (SC), a very-high-level graphical programming language for building large F-Nets, possesses many of the features normally expected from today's computer languages (e.g. data abstraction, array operations). Finally, L2(sup 3) is a CASE tool which facilitates the construction, compilation, execution, and debugging of SC programs.
Multiprocessor architecture: Synthesis and evaluation
NASA Technical Reports Server (NTRS)
Standley, Hilda M.
1990-01-01
Multiprocessor computer architecture evaluation for structural computations is the focus of the research effort described. Results obtained are expected to lead to more efficient use of existing architectures and to suggest designs for new, application specific, architectures. The brief descriptions given outline a number of related efforts directed toward this purpose. The difficulty in analyzing an existing architecture or in designing a new computer architecture lies in the fact that the performance of a particular architecture, within the context of a given application, is determined by a number of factors. These include, but are not limited to, the efficiency of the computation algorithm, the programming language and support environment, the quality of the program written in the programming language, the multiplicity of the processing elements, the characteristics of the individual processing elements, the interconnection network connecting processors and non-local memories, and the shared memory organization covering the spectrum from no shared memory (all local memory) to one global access memory. These performance determiners may be loosely classified as being software or hardware related. This distinction is not clear or even appropriate in many cases. The effect of the choice of algorithm is ignored by assuming that the algorithm is specified as given. Effort directed toward the removal of the effect of the programming language and program resulted in the design of a high-level parallel programming language. Two characteristics of the fundamental structure of the architecture (memory organization and interconnection network) are examined.
A Formal Model of Capacity Limits in Working Memory
ERIC Educational Resources Information Center
Oberauer, Klaus; Kliegl, Reinhold
2006-01-01
A mathematical model of working-memory capacity limits is proposed on the key assumption of mutual interference between items in working memory. Interference is assumed to arise from overwriting of features shared by these items. The model was fit to time-accuracy data of memory-updating tasks from four experiments using nonlinear mixed effect…
Message Passing and Shared Address Space Parallelism on an SMP Cluster
NASA Technical Reports Server (NTRS)
Shan, Hongzhang; Singh, Jaswinder P.; Oliker, Leonid; Biswas, Rupak; Biegel, Bryan (Technical Monitor)
2002-01-01
Currently, message passing (MP) and shared address space (SAS) are the two leading parallel programming paradigms. MP has been standardized with MPI, and is the more common and mature approach; however, code development can be extremely difficult, especially for irregularly structured computations. SAS offers substantial ease of programming, but may suffer from performance limitations due to poor spatial locality and high protocol overhead. In this paper, we compare the performance of and the programming effort required for six applications under both programming models on a 32-processor PC-SMP cluster, a platform that is becoming increasingly attractive for high-end scientific computing. Our application suite consists of codes that typically do not exhibit scalable performance under shared-memory programming due to their high communication-to-computation ratios and/or complex communication patterns. Results indicate that SAS can achieve about half the parallel efficiency of MPI for most of our applications, while being competitive for the others. A hybrid MPI+SAS strategy shows only a small performance advantage over pure MPI in some cases. Finally, improved implementations of two MPI collective operations on PC-SMP clusters are presented.
Parallelization of NAS Benchmarks for Shared Memory Multiprocessors
NASA Technical Reports Server (NTRS)
Waheed, Abdul; Yan, Jerry C.; Saini, Subhash (Technical Monitor)
1998-01-01
This paper presents our experiences of parallelizing the sequential implementation of NAS benchmarks using compiler directives on SGI Origin2000 distributed shared memory (DSM) system. Porting existing applications to new high performance parallel and distributed computing platforms is a challenging task. Ideally, a user develops a sequential version of the application, leaving the task of porting to new generations of high performance computing systems to parallelization tools and compilers. Due to the simplicity of programming shared-memory multiprocessors, compiler developers have provided various facilities to allow the users to exploit parallelism. Native compilers on SGI Origin2000 support multiprocessing directives to allow users to exploit loop-level parallelism in their programs. Additionally, supporting tools can accomplish this process automatically and present the results of parallelization to the users. We experimented with these compiler directives and supporting tools by parallelizing sequential implementation of NAS benchmarks. Results reported in this paper indicate that with minimal effort, the performance gain is comparable with the hand-parallelized, carefully optimized, message-passing implementations of the same benchmarks.
Conditional load and store in a shared memory
Blumrich, Matthias A; Ohmacht, Martin
2015-02-03
A method, system and computer program product for implementing load-reserve and store-conditional instructions in a multi-processor computing system. The computing system includes a multitude of processor units and a shared memory cache, and each of the processor units has access to the memory cache. In one embodiment, the method comprises providing the memory cache with a series of reservation registers, and storing in these registers addresses reserved in the memory cache for the processor units as a result of issuing load-reserve requests. In this embodiment, when one of the processor units makes a request to store data in the memory cache using a store-conditional request, the reservation registers are checked to determine if an address in the memory cache is reserved for that processor unit. If an address in the memory cache is reserved for that processor, the data are stored at this address.
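The reservation-register scheme described above can be illustrated with a small software model. This is only a sketch under stated assumptions, not the patented hardware design: the class name `SharedCache`, its methods, and the one-register-per-processor layout are hypothetical. Clearing competing reservations after a successful store is standard load-reserve/store-conditional behavior and is assumed here, since the abstract does not spell it out.

```python
class SharedCache:
    """Toy model of a shared cache with one reservation register per processor."""

    def __init__(self, num_processors):
        self.memory = {}                              # address -> value
        self.reservations = [None] * num_processors   # reserved address per processor

    def load_reserve(self, proc, addr):
        """Read addr and record a reservation on it for processor proc."""
        self.reservations[proc] = addr
        return self.memory.get(addr, 0)

    def store_conditional(self, proc, addr, value):
        """Store value only if proc still holds a reservation on addr."""
        if self.reservations[proc] != addr:
            return False                              # reservation lost: store fails
        self.memory[addr] = value
        # Assumption: a successful store clears every reservation on this address.
        for p, reserved in enumerate(self.reservations):
            if reserved == addr:
                self.reservations[p] = None
        return True

def atomic_increment(cache, proc, addr):
    """Retry loop that builds an atomic increment from the two primitives."""
    while True:
        old = cache.load_reserve(proc, addr)
        if cache.store_conditional(proc, addr, old + 1):
            return old + 1
```

If two processors both reserve the same address, only the first store-conditional succeeds in this model. The second processor has to repeat its load-reserve, which is the retry pattern shown in `atomic_increment`.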
Contention Modeling for Multithreaded Distributed Shared Memory Machines: The Cray XMT
DOE Office of Scientific and Technical Information (OSTI.GOV)
Secchi, Simone; Tumeo, Antonino; Villa, Oreste
Distributed Shared Memory (DSM) machines are a wide class of multi-processor computing systems where a large virtually-shared address space is mapped on a network of physically distributed memories. High memory latency and network contention are two of the main factors that limit performance scaling of such architectures. Modern high-performance computing DSM systems have evolved toward exploitation of massive hardware multi-threading and fine-grained memory hashing to tolerate irregular latencies, avoid network hot-spots and enable high scaling. In order to model the performance of such large-scale machines, parallel simulation has been proved to be a promising approach to achieve good accuracy in reasonable times. One of the most critical factors in solving the simulation speed-accuracy trade-off is network modeling. The Cray XMT is a massively multi-threaded supercomputing architecture that belongs to the DSM class, since it implements a globally-shared address space abstraction on top of a physically distributed memory substrate. In this paper, we discuss the development of a contention-aware network model intended to be integrated in a full-system XMT simulator. We start by measuring the effects of network contention in a 128-processor XMT machine and then investigate the trade-off that exists between simulation accuracy and speed, by comparing three network models which operate at different levels of accuracy. The comparison and model validation is performed by executing a string-matching algorithm on the full-system simulator and on the XMT, using three datasets that generate noticeably different contention patterns.
Support for Debugging Automatically Parallelized Programs
NASA Technical Reports Server (NTRS)
Hood, Robert; Jost, Gabriele; Biegel, Bryan (Technical Monitor)
2001-01-01
This viewgraph presentation provides information on the technical aspects of debugging computer code that has been automatically converted for use in a parallel computing system. Shared memory parallelization and distributed memory parallelization entail separate and distinct challenges for a debugging program. A prototype system has been developed which integrates various tools for the debugging of automatically parallelized programs including the CAPTools Database which provides variable definition information across subroutines as well as array distribution information.
Rasmussen, Anne S; Habermas, Tilmann
2011-08-01
According to theory, autobiographical memory serves three broad functions of overall usage: directive, self, and social. However, there is evidence to suggest that the tripartite model may be better conceptualised in terms of a four-factor model with two social functions. In the present study we examined the two models in Danish and German samples, using the Thinking About Life Experiences Questionnaire (TALE; Bluck, Alea, Habermas, & Rubin, 2005), which measures the overall usage of the three functions generalised across concrete memories. Confirmatory factor analysis supported the four-factor model and rejected the theoretical three-factor model in both samples. The results are discussed in relation to cultural differences in overall autobiographical memory usage as well as sharing versus non-sharing aspects of social remembering.
NASA Technical Reports Server (NTRS)
Lawson, Gary; Poteat, Michael; Sosonkina, Masha; Baurle, Robert; Hammond, Dana
2016-01-01
In this work, several mini-apps have been created to enhance a real-world application performance, namely the VULCAN code for complex flow analysis developed at the NASA Langley Research Center. These mini-apps explore hybrid parallel programming paradigms with Message Passing Interface (MPI) for distributed memory access and either Shared MPI (SMPI) or OpenMP for shared memory accesses. Performance testing shows that MPI+SMPI yields the best execution performance, while requiring the largest number of code changes. A maximum speedup of 23X was measured for MPI+SMPI, but only 10X was measured for MPI+OpenMP.
Investigating Ground Swarm Robotics Using Agent Based Simulation
2006-12-01
Incorporation of virtual pheromones as a shared memory map is modeled as an additional capability that is found to enhance the robustness and reliability of the swarm...
Message Passing vs. Shared Address Space on a Cluster of SMPs
NASA Technical Reports Server (NTRS)
Shan, Hongzhang; Singh, Jaswinder Pal; Oliker, Leonid; Biswas, Rupak
2000-01-01
The convergence of scalable computer architectures using clusters of PCs (or PC-SMPs) with commodity networking has become an attractive platform for high end scientific computing. Currently, message-passing and shared address space (SAS) are the two leading programming paradigms for these systems. Message-passing has been standardized with MPI, and is the most common and mature programming approach. However, message-passing code development can be extremely difficult, especially for irregular structured computations. SAS offers substantial ease of programming, but may suffer from performance limitations due to poor spatial locality and high protocol overhead. In this paper, we compare the performance of and programming effort required for six applications under both programming models on a 32 CPU PC-SMP cluster. Our application suite consists of codes that typically do not exhibit high efficiency under shared memory programming, due to their high communication to computation ratios and complex communication patterns. Results indicate that SAS can achieve about half the parallel efficiency of MPI for most of our applications; however, on certain classes of problems SAS performance is competitive with MPI. We also present new algorithms for improving the PC cluster performance of MPI collective operations.
Integrating Cache Performance Modeling and Tuning Support in Parallelization Tools
NASA Technical Reports Server (NTRS)
Waheed, Abdul; Yan, Jerry; Saini, Subhash (Technical Monitor)
1998-01-01
With the resurgence of distributed shared memory (DSM) systems based on cache-coherent Non Uniform Memory Access (ccNUMA) architectures and the increasing disparity between memory and processor speeds, data locality overheads are becoming the greatest bottlenecks in the way of realizing potential high performance of these systems. While parallelization tools and compilers facilitate the users in porting their sequential applications to a DSM system, a lot of time and effort is needed to tune the memory performance of these applications to achieve reasonable speedup. In this paper, we show that integrating cache performance modeling and tuning support within a parallelization environment can alleviate this problem. The Cache Performance Modeling and Prediction Tool (CPMP) employs trace-driven simulation techniques without the overhead of generating and managing detailed address traces. CPMP predicts the cache performance impact of source code level "what-if" modifications in a program to assist a user in the tuning process. CPMP is built on top of a customized version of the Computer Aided Parallelization Tools (CAPTools) environment. Finally, we demonstrate how CPMP can be applied to tune a real Computational Fluid Dynamics (CFD) application.
Early Experiences Writing Performance Portable OpenMP 4 Codes
DOE Office of Scientific and Technical Information (OSTI.GOV)
Joubert, Wayne; Hernandez, Oscar R
In this paper, we evaluate the recently available directives in OpenMP 4 to parallelize a computational kernel using both the traditional shared memory approach and the newer accelerator targeting capabilities. In addition, we explore various transformations that attempt to increase application performance portability, and examine the expressiveness and performance implications of using these approaches. For example, we want to understand if the target map directives in OpenMP 4 improve data locality when mapped to a shared memory system, as opposed to the traditional first touch policy approach in traditional OpenMP. To that end, we use recent Cray and Intel compilers to measure the performance variations of a simple application kernel when executed on the OLCF's Titan supercomputer with NVIDIA GPUs and the Beacon system with Intel Xeon Phi accelerators attached. To better understand these trade-offs, we compare our results from traditional OpenMP shared memory implementations to the newer accelerator programming model when it is used to target both the CPU and an attached heterogeneous device. We believe the results and lessons learned as presented in this paper will be useful to the larger user community by providing guidelines that can assist programmers in the development of performance portable code.
A Tensor Product Formulation of Strassen's Matrix Multiplication Algorithm with Memory Reduction
Kumar, B.; Huang, C. -H.; Sadayappan, P.; ...
1995-01-01
In this article, we present a program generation strategy of Strassen's matrix multiplication algorithm using a programming methodology based on tensor product formulas. In this methodology, block recursive programs such as the fast Fourier Transforms and Strassen's matrix multiplication algorithm are expressed as algebraic formulas involving tensor products and other matrix operations. Such formulas can be systematically translated to high-performance parallel/vector codes for various architectures. In this article, we present a nonrecursive implementation of Strassen's algorithm for shared memory vector processors such as the Cray Y-MP. A previous implementation of Strassen's algorithm synthesized from tensor product formulas required working storage of size O(7^n) for multiplying 2^n × 2^n matrices. We present a modified formulation in which the working storage requirement is reduced to O(4^n). The modified formulation exhibits sufficient parallelism for efficient implementation on a shared memory multiprocessor. Performance results on a Cray Y-MP8/64 are presented.
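For readers unfamiliar with the underlying algorithm, the sketch below shows the textbook recursive form of Strassen's method for 2^n × 2^n matrices in plain Python. It is a swapped-in illustration, not the nonrecursive tensor-product formulation the article describes, and the helper names `add`, `sub`, and `strassen` are hypothetical. It shows the seven half-size products per level that replace the eight of the conventional block algorithm.

```python
def add(x, y):
    """Elementwise sum of two equally sized matrices."""
    return [[a + b for a, b in zip(r, s)] for r, s in zip(x, y)]

def sub(x, y):
    """Elementwise difference of two equally sized matrices."""
    return [[a - b for a, b in zip(r, s)] for r, s in zip(x, y)]

def strassen(a, b):
    """Multiply two square matrices whose size is a power of two."""
    n = len(a)
    if n == 1:
        return [[a[0][0] * b[0][0]]]
    h = n // 2
    # Split each operand into four h x h quadrants.
    a11 = [row[:h] for row in a[:h]]
    a12 = [row[h:] for row in a[:h]]
    a21 = [row[:h] for row in a[h:]]
    a22 = [row[h:] for row in a[h:]]
    b11 = [row[:h] for row in b[:h]]
    b12 = [row[h:] for row in b[:h]]
    b21 = [row[:h] for row in b[h:]]
    b22 = [row[h:] for row in b[h:]]
    # Seven recursive products instead of eight.
    m1 = strassen(add(a11, a22), add(b11, b22))
    m2 = strassen(add(a21, a22), b11)
    m3 = strassen(a11, sub(b12, b22))
    m4 = strassen(a22, sub(b21, b11))
    m5 = strassen(add(a11, a12), b22)
    m6 = strassen(sub(a21, a11), add(b11, b12))
    m7 = strassen(sub(a12, a22), add(b21, b22))
    # Recombine the products into the quadrants of the result.
    c11 = add(sub(add(m1, m4), m5), m7)
    c12 = add(m3, m5)
    c21 = add(m2, m4)
    c22 = add(add(sub(m1, m2), m3), m6)
    top = [r1 + r2 for r1, r2 in zip(c11, c12)]
    bottom = [r1 + r2 for r1, r2 in zip(c21, c22)]
    return top + bottom
```

With integer inputs the result matches the conventional triple-loop product exactly, which makes the sketch easy to check on small cases.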
NASA Technical Reports Server (NTRS)
Saini, Subash; Bailey, David; Chancellor, Marisa K. (Technical Monitor)
1997-01-01
High Performance Fortran (HPF), the high-level language for parallel Fortran programming, is based on Fortran 90. HPF was defined by an informal standards committee known as the High Performance Fortran Forum (HPFF) in 1993, and modeled on TMC's CM Fortran language. Several HPF features have since been incorporated into the draft ANSI/ISO Fortran 95, the next formal revision of the Fortran standard. HPF allows users to write a single parallel program that can execute on a serial machine, a shared-memory parallel machine, or a distributed-memory parallel machine. HPF eliminates the complex, error-prone task of explicitly specifying how, where, and when to pass messages between processors on distributed-memory machines, or when to synchronize processors on shared-memory machines. HPF is designed in a way that allows the programmer to code an application at a high level, and then selectively optimize portions of the code by dropping into message-passing or calling tuned library routines as 'extrinsics'. Compilers supporting High Performance Fortran features first appeared in late 1994 and early 1995 from Applied Parallel Research (APR), Digital Equipment Corporation, and The Portland Group (PGI). IBM introduced an HPF compiler for the IBM RS/6000 SP/2 in April of 1996. Over the past two years, these implementations have shown steady improvement in terms of both features and performance. The performance of various hardware/programming model (HPF and MPI (message passing interface)) combinations will be compared, based on latest NAS (NASA Advanced Supercomputing) Parallel Benchmark (NPB) results, thus providing a cross-machine and cross-model comparison. Specifically, HPF based NPB results will be compared with MPI based NPB results to provide perspective on performance currently obtainable using HPF versus MPI or versus hand-tuned implementations such as those supplied by the hardware vendors.
In addition, we will also present NPB (Version 1.0) performance results for the following systems: DEC Alpha Server 8400 5/440, Fujitsu VPP Series (VX, VPP300, and VPP700), HP/Convex Exemplar SPP2000, IBM RS/6000 SP P2SC node (120 MHz), NEC SX-4/32, SGI/CRAY T3E, SGI Origin2000.
Parallelization and automatic data distribution for nuclear reactor simulations
DOE Office of Scientific and Technical Information (OSTI.GOV)
Liebrock, L.M.
1997-07-01
Detailed attempts at realistic nuclear reactor simulations currently take many times real time to execute on high performance workstations. Even the fastest sequential machine cannot run these simulations fast enough to ensure that the best corrective measure is used during a nuclear accident to prevent a minor malfunction from becoming a major catastrophe. Since sequential computers have nearly reached the speed of light barrier, these simulations will have to be run in parallel to make significant improvements in speed. In physical reactor plants, parallelism abounds. Fluids flow, controls change, and reactions occur in parallel with only adjacent components directly affecting each other. These do not occur in the sequentialized manner, with global instantaneous effects, that is often used in simulators. Development of parallel algorithms that more closely approximate the real-world operation of a reactor may, in addition to speeding up the simulations, actually improve the accuracy and reliability of the predictions generated. Three types of parallel architecture (shared memory machines, distributed memory multicomputers, and distributed networks) are briefly reviewed as targets for parallelization of nuclear reactor simulation. Various parallelization models (loop-based model, shared memory model, functional model, data parallel model, and a combined functional and data parallel model) are discussed along with their advantages and disadvantages for nuclear reactor simulation. A variety of tools are introduced for each of the models. Emphasis is placed on the data parallel model as the primary focus for two-phase flow simulation. Tools to support data parallel programming for multiple component applications and special parallelization considerations are also discussed.
Automated quantitative muscle biopsy analysis system
NASA Technical Reports Server (NTRS)
Castleman, Kenneth R. (Inventor)
1980-01-01
An automated system to aid the diagnosis of neuromuscular diseases by producing fiber size histograms utilizing histochemically stained muscle biopsy tissue. Televised images of the microscopic fibers are processed electronically by a multi-microprocessor computer, which isolates, measures, and classifies the fibers and displays the fiber size distribution. The architecture of the multi-microprocessor computer, which is iterated to any required degree of complexity, features a series of individual microprocessors P_n, each receiving data from a shared memory M_(n-1) and outputting processed data to a separate shared memory M_(n+1) under control of a program stored in dedicated memory M_n.
Vienna FORTRAN: A FORTRAN language extension for distributed memory multiprocessors
NASA Technical Reports Server (NTRS)
Chapman, Barbara; Mehrotra, Piyush; Zima, Hans
1991-01-01
Exploiting the performance potential of distributed memory machines requires a careful distribution of data across the processors. Vienna FORTRAN is a language extension of FORTRAN which provides the user with a wide range of facilities for such mapping of data structures. However, programs in Vienna FORTRAN are written using global data references. Thus, the user has the advantage of a shared memory programming paradigm while explicitly controlling the placement of data. The basic features of Vienna FORTRAN are presented along with a set of examples illustrating the use of these features.
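The idea of global data references combined with explicit data placement can be illustrated with a minimal sketch. This is not Vienna FORTRAN syntax: the function name `block_owner` and the choice of a simple block distribution are assumptions made only for illustration. The program addresses an array by global index, and the mapping below decides which processor holds each element.

```python
def block_owner(i, n, p):
    """Map global index i of an n-element array to (processor, local_index).

    Assumes a plain block distribution over p processors: each processor
    holds one contiguous chunk of ceil(n / p) elements (the last may be short).
    """
    block = -(-n // p)  # ceiling division: elements per processor
    return i // block, i % block


# A 10-element array spread over 4 processors uses blocks of 3 elements.
for i in (0, 5, 9):
    print(i, block_owner(i, 10, 4))
```

In a language of this kind the compiler, not the programmer, would derive such a mapping from a distribution annotation and generate the required communication.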
Parallel Programming Paradigms
1987-07-01
Distribution of this report is unlimited. ...8416878 and by the Office of Naval Research Contracts No. N00014-86-K-0264 and No. N00014-85-K-0328. ... Parallel Programming Paradigms ... processors to fetch from the same memory cell (list head) and thus seems to favor a shared memory implementation [37]. In this dissertation, we
NASA Technical Reports Server (NTRS)
Chapman, Barbara; Mehrotra, Piyush; Zima, Hans
1992-01-01
Exploiting the full performance potential of distributed memory machines requires a careful distribution of data across the processors. Vienna Fortran is a language extension of Fortran which provides the user with a wide range of facilities for such mapping of data structures. In contrast to current programming practice, programs in Vienna Fortran are written using global data references. Thus, the user has the advantages of a shared memory programming paradigm while explicitly controlling the data distribution. In this paper, we present the language features of Vienna Fortran for FORTRAN 77, together with examples illustrating the use of these features.
6 DOF Nonlinear AUV Simulation Toolbox
1997-01-01
is to supply a flexible 3D-simulation platform for motion visualization, in-lab debugging and testing of mission-specific strategies as well as those...Explorer are modular designed [Smith] in order to cut time and cost for vehicle reconfiguration. A flexible 3D-simulation platform is desired to... 3D models. Current implemented modules include a nonlinear dynamic model for the OEX, shared memory and semaphore manager tools, shared memory monitor
Distributed simulation using a real-time shared memory network
NASA Technical Reports Server (NTRS)
Simon, Donald L.; Mattern, Duane L.; Wong, Edmond; Musgrave, Jeffrey L.
1993-01-01
The Advanced Control Technology Branch of the NASA Lewis Research Center performs research in the area of advanced digital controls for aeronautic and space propulsion systems. This work requires the real-time implementation of both control software and complex dynamical models of the propulsion system. We are implementing these systems in a distributed, multi-vendor computer environment. Therefore, a need exists for real-time communication and synchronization between the distributed multi-vendor computers. A shared memory network is a potential solution which offers several advantages over other real-time communication approaches. A candidate shared memory network was tested for basic performance. The shared memory network was then used to implement a distributed simulation of a ramjet engine. The accuracy and execution time of the distributed simulation were measured and compared to the performance of the non-partitioned simulation. The ease of partitioning the simulation, the minimal time required to develop communication between the processors, and the resulting execution time all indicate that the shared memory network is a real-time communication technique worthy of serious consideration.
[Artificial intelligence meeting neuropsychology. Semantic memory in normal and pathological aging].
Aimé, Xavier; Charlet, Jean; Maillet, Didier; Belin, Catherine
2015-03-01
Artificial intelligence (AI) is the subject of much research, but also many fantasies. It aims to reproduce human intelligence in its learning capacity, knowledge storage and computation. In 2014, the Defense Advanced Research Projects Agency (DARPA) started the Restoring Active Memory (RAM) program, which attempts to develop implantable technology to bridge gaps in the injured brain and restore normal memory function to people with memory loss caused by injury or disease. In another field of AI, computational ontologies (a formal and shared conceptualization) try to model knowledge in order to represent a structured and unambiguous meaning of the concepts of a target domain. The aim of these structures is to ensure a consensual understanding of their meaning and a univariant use (the same concept is used by all to categorize the same individuals). The first representations of knowledge in the AI domain are largely based on model tests of semantic memory. Semantic memory, as a component of long-term memory, is the memory of words, ideas, and concepts. It is the only declarative memory system that resists the effects of age so remarkably. In contrast, non-specific cognitive changes may decrease the performance of the elderly in various events and instead report difficulties of access to semantic representations that affect the semantics stock itself. Some dementias, like semantic dementia and Alzheimer's disease, are linked to alteration of semantic memory. We propose in this paper, using the computational ontologies model, a formal and relatively thin modeling, in the service of neuropsychology: 1) for the practitioner with decision support systems, 2) for the patient as an outsourced cognitive prosthesis, and 3) for the researcher to study semantic memory.
PANDA: A distributed multiprocessor operating system
DOE Office of Scientific and Technical Information (OSTI.GOV)
Chubb, P.
1989-01-01
PANDA is a design for a distributed multiprocessor and an operating system. PANDA is designed to allow easy expansion of both hardware and software. As such, the PANDA kernel provides only message passing and memory and process management. The other features needed for the system (device drivers, secondary storage management, etc.) are provided as replaceable user tasks. The thesis presents PANDA's design and implementation, both hardware and software. PANDA uses multiple 68010 processors sharing memory on a VME bus, each such node potentially connected to others via a high speed network. The machine is completely homogeneous: there are no differences between processors that are detectable by programs running on the machine. A single two-processor node has been constructed. Each processor contains memory management circuits designed to allow processors to share page tables safely. PANDA presents a programmers' model similar to the hardware model: a job is divided into multiple tasks, each having its own address space. Within each task, multiple processes share code and data. Tasks can send messages to each other, and set up virtual circuits between themselves. Peripheral devices such as disc drives are represented within PANDA by tasks. PANDA divides secondary storage into volumes, each volume being accessed by a volume access task, or VAT. All knowledge about the way that data is stored on a disc is kept in its volume's VAT. The design is such that PANDA should provide a useful testbed for file systems and device drivers, as these can be installed without recompiling PANDA itself, and without rebooting the machine.
Performance Analysis of Multilevel Parallel Applications on Shared Memory Architectures
NASA Technical Reports Server (NTRS)
Jost, Gabriele; Jin, Haoqiang; Labarta, Jesus; Gimenez, Judit; Caubet, Jordi; Biegel, Bryan A. (Technical Monitor)
2002-01-01
In this paper we describe how to apply powerful performance analysis techniques to understand the behavior of multilevel parallel applications. We use the Paraver/OMPItrace performance analysis system for our study. This system consists of two major components: The OMPItrace dynamic instrumentation mechanism, which allows the tracing of processes and threads and the Paraver graphical user interface for inspection and analyses of the generated traces. We describe how to use the system to conduct a detailed comparative study of a benchmark code implemented in five different programming paradigms applicable for shared memory
Dynamic programming on a shared-memory multiprocessor
NASA Technical Reports Server (NTRS)
Edmonds, Phil; Chu, Eleanor; George, Alan
1993-01-01
Three new algorithms for solving dynamic programming problems on a shared-memory parallel computer are described. All three algorithms attempt to balance work load, while keeping synchronization cost low. In particular, for a multiprocessor having p processors, an analysis of the best algorithm shows that the arithmetic cost is O(n^3/(6p)) and that the synchronization cost is O(|log_C n|) if p << n, where C = (2p-1)/(2p+1) and n is the size of the problem. The low synchronization cost is important for machines where synchronization is expensive. Analysis and experiments show that the best algorithm is effective in balancing the work load and producing high efficiency.
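To get a feel for the cost expressions quoted in the abstract above, the following sketch simply evaluates them for given n and p. The function name `dp_cost_estimates` is hypothetical, the reading of the arithmetic term as n cubed divided by 6p is an assumption, and constant factors hidden by the O-notation are ignored, so the numbers indicate growth rates only, not measured times.

```python
import math


def dp_cost_estimates(n, p):
    """Evaluate the leading cost terms for problem size n on p processors.

    Returns (C, arithmetic, synchronization), where
    C = (2p - 1) / (2p + 1), arithmetic = n^3 / (6p),
    and synchronization = |log_C n|.
    """
    c = (2 * p - 1) / (2 * p + 1)
    arithmetic = n ** 3 / (6 * p)
    synchronization = abs(math.log(n, c))  # log base C of n; C < 1, so take |.|
    return c, arithmetic, synchronization


c, arithmetic, synchronization = dp_cost_estimates(1000, 8)
print(f"C = {c:.4f}, arithmetic ~ {arithmetic:.3e}, synchronization ~ {synchronization:.1f}")
```

Because C approaches 1 as p grows, |log_C n| increases with p, while the arithmetic term shrinks; the abstract's condition that p be much smaller than n keeps the synchronization term small relative to the arithmetic work.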
NASA Astrophysics Data System (ADS)
Ginosar, Ran; Aviely, Peleg; Liran, Tuvia; Alon, Dov; Dobkin, Reuven; Goldberg, Michael
2013-08-01
RC64, a novel 64-core many-core signal processing chip targets DSP performance of 12.8 GIPS, 100 GOPS and 12.8 single precision GFLOPS while dissipating only 3 Watts. RC64 employs advanced DSP cores, a multi-bank shared memory and a hardware scheduler, supports DDR2 memory and communicates over five proprietary 6.4 Gbps channels. The programming model employs sequential fine-grain tasks and a separate task map to define task dependencies. RC64 is implemented as a 200 MHz ASIC on Tower 130nm CMOS technology, assembled in hermetically sealed ceramic QFP package and qualified to the highest space standards.
A general model for memory interference in a multiprocessor system with memory hierarchy
NASA Technical Reports Server (NTRS)
Taha, Badie A.; Standley, Hilda M.
1989-01-01
The problem of memory interference in a multiprocessor system with a hierarchy of shared buses and memories is addressed. The behavior of the processors is represented by a sequence of memory requests with each followed by a determined amount of processing time. A statistical queuing network model for determining the extent of memory interference in multiprocessor systems with clusters of memory hierarchies is presented. The performance of the system is measured by the expected number of busy memory clusters. The results of the analytic model are compared with simulation results, and the correlation between them is found to be very high.
SKIRT: Hybrid parallelization of radiative transfer simulations
NASA Astrophysics Data System (ADS)
Verstocken, S.; Van De Putte, D.; Camps, P.; Baes, M.
2017-07-01
We describe the design, implementation and performance of the new hybrid parallelization scheme in our Monte Carlo radiative transfer code SKIRT, which has been used extensively for modelling the continuum radiation of dusty astrophysical systems including late-type galaxies and dusty tori. The hybrid scheme combines distributed memory parallelization, using the standard Message Passing Interface (MPI) to communicate between processes, and shared memory parallelization, providing multiple execution threads within each process to avoid duplication of data structures. The synchronization between multiple threads is accomplished through atomic operations without high-level locking (also called lock-free programming). This improves the scaling behaviour of the code and substantially simplifies the implementation of the hybrid scheme. The result is an extremely flexible solution that adjusts to the number of available nodes, processors and memory, and consequently performs well on a wide variety of computing architectures.
Implementations of BLAST for parallel computers.
Jülich, A
1995-02-01
The BLAST sequence comparison programs have been ported to a variety of parallel computers: the shared memory machine Cray Y-MP 8/864 and the distributed memory architectures Intel iPSC/860 and nCUBE. Additionally, the programs were ported to run on workstation clusters. We explain the parallelization techniques and consider the pros and cons of these methods. The BLAST programs are very well suited for parallelization for a moderate number of processors. We illustrate our results using the program blastp as an example. As input data for blastp, a 799 residue protein query sequence and the protein database PIR were used.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ibrahim, Khaled Z.; Epifanovsky, Evgeny; Williams, Samuel; ...
2017-03-08
Coupled-cluster methods provide highly accurate models of molecular structure through explicit numerical calculation of tensors representing the correlation between electrons. These calculations are dominated by a sequence of tensor contractions, motivating the development of numerical libraries for such operations. While based on matrix–matrix multiplication, these libraries are specialized to exploit symmetries in the molecular structure and in electronic interactions, and thus reduce the size of the tensor representation and the complexity of contractions. The resulting algorithms are irregular and their parallelization has been previously achieved via the use of dynamic scheduling or specialized data decompositions. We introduce our efforts to extend the Libtensor framework to work in the distributed memory environment in a scalable and energy-efficient manner. We achieve up to 240× speedup compared with the optimized shared memory implementation of Libtensor. We attain scalability to hundreds of thousands of compute cores on three distributed-memory architectures (Cray XC30 and XC40, and IBM Blue Gene/Q), and on a heterogeneous GPU-CPU system (Cray XK7). As the bottlenecks shift from being compute-bound DGEMMs to communication-bound collectives as the size of the molecular system scales, we adopt two radically different parallelization approaches for handling load-imbalance, tasking and bulk synchronous models. Nevertheless, we preserve a unified interface to both programming models to maintain the productivity of computational quantum chemists.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Feo, J.T.
1993-10-01
This report contains papers on: Programmability and performance issues; The case of an iterative partial differential equation solver; Implementing the kernel of the Australian Region Weather Prediction Model in Sisal; Even and quarter-even prime length symmetric FFTs and their Sisal Implementations; Top-down thread generation for Sisal; Overlapping communications and computations on NUMA architectures; Compiling technique based on dataflow analysis for functional programming language Valid; Copy elimination for true multidimensional arrays in Sisal 2.0; Increasing parallelism for an optimization that reduces copying in IF2 graphs; Caching in on Sisal; Cache performance of Sisal Vs. FORTRAN; FFT algorithms on a shared-memory multiprocessor; A parallel implementation of nonnumeric search problems in Sisal; Computer vision algorithms in Sisal; Compilation of Sisal for a high-performance data driven vector processor; Sisal on distributed memory machines; A virtual shared addressing system for distributed memory Sisal; Developing a high-performance FFT algorithm in Sisal for a vector supercomputer; Implementation issues for IF2 on a static data-flow architecture; and Systematic control of parallelism in array-based data-flow computation. Selected papers have been indexed separately for inclusion in the Energy Science and Technology Database.
MULTI: a shared memory approach to cooperative molecular modeling.
Darden, T; Johnson, P; Smith, H
1991-03-01
A general purpose molecular modeling system, MULTI, based on the UNIX shared memory and semaphore facilities for interprocess communication is described. In addition to the normal querying or monitoring of geometric data, MULTI also provides processes for manipulating conformations, and for displaying peptide or nucleic acid ribbons, Connolly surfaces, close nonbonded contacts, crystal-symmetry related images, least-squares superpositions, and so forth. This paper outlines the basic techniques used in MULTI to ensure cooperation among these specialized processes, and then describes how they can work together to provide a flexible modeling environment.
Work stealing for GPU-accelerated parallel programs in a global address space framework
DOE Office of Scientific and Technical Information (OSTI.GOV)
Arafat, Humayun; Dinan, James; Krishnamoorthy, Sriram
Task parallelism is an attractive approach to automatically load balance the computation in a parallel system and adapt to dynamism exhibited by parallel systems. Exploiting task parallelism through work stealing has been extensively studied in shared and distributed-memory contexts. In this paper, we study the design of a system that uses work stealing for dynamic load balancing of task-parallel programs executed on hybrid distributed-memory CPU-graphics processing unit (GPU) systems in a global-address space framework. We take into account the unique nature of the accelerator model employed by GPUs, the significant performance difference between GPU and CPU execution as a function of problem size, and the distinct CPU and GPU memory domains. We consider various alternatives in designing a distributed work stealing algorithm for CPU-GPU systems, while taking into account the impact of task distribution and data movement overheads. These strategies are evaluated using microbenchmarks that capture various execution configurations as well as the state-of-the-art CCSD(T) application module from the computational chemistry domain.
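The general work-stealing idea mentioned in the abstract can be sketched as a small, deterministic simulation. This is a generic textbook-style illustration, not the authors' CPU-GPU algorithm: the function name `run_work_stealing`, the lock-step rounds, the random victim choice, and placing all tasks on worker 0 at the start are assumptions made for the example, and GPU execution, data movement and the global address space are not modeled.

```python
import random
from collections import deque


def run_work_stealing(task_costs, num_workers, seed=0):
    """Simulate work stealing in lock-step rounds.

    task_costs: positive integers, the number of rounds each task takes.
    Returns (tasks completed per worker, total rounds elapsed).
    """
    rng = random.Random(seed)
    # All tasks start on worker 0, creating an imbalance for stealing to fix.
    queues = [deque() for _ in range(num_workers)]
    queues[0].extend(task_costs)
    done = [0] * num_workers
    remaining = [0] * num_workers  # rounds left on each worker's current task
    rounds = 0
    while any(queues) or any(remaining):
        rounds += 1
        for w in range(num_workers):
            if remaining[w] == 0:
                if queues[w]:
                    # The owner takes the newest task from its own queue.
                    remaining[w] = queues[w].pop()
                else:
                    # An idle worker steals the oldest task from a random victim.
                    victims = [v for v in range(num_workers) if v != w and queues[v]]
                    if victims:
                        remaining[w] = queues[rng.choice(victims)].popleft()
            if remaining[w] > 0:
                remaining[w] -= 1
                if remaining[w] == 0:
                    done[w] += 1
    return done, rounds


done, rounds = run_work_stealing([3] * 16, num_workers=4)
print(done, rounds)
```

Taking from opposite ends of the queue is the usual design choice: the owner keeps working on recently added tasks while thieves remove the oldest ones, which limits contention between them.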
Merlin - Massively parallel heterogeneous computing
NASA Technical Reports Server (NTRS)
Wittie, Larry; Maples, Creve
1989-01-01
Hardware and software for Merlin, a new kind of massively parallel computing system, are described. Eight computers are linked as a 300-MIPS prototype to develop system software for a larger Merlin network with 16 to 64 nodes, totaling 600 to 3000 MIPS. These working prototypes help refine a mapped reflective memory technique that offers a new, very general way of linking many types of computer to form supercomputers. Processors share data selectively and rapidly on a word-by-word basis. Fast firmware virtual circuits are reconfigured to match topological needs of individual application programs. Merlin's low-latency memory-sharing interfaces solve many problems in the design of high-performance computing systems. The Merlin prototypes are intended to run parallel programs for scientific applications and to determine hardware and software needs for a future Teraflops Merlin network.
Vascular system modeling in parallel environment - distributed and shared memory approaches
Jurczuk, Krzysztof; Kretowski, Marek; Bezy-Wendling, Johanne
2011-01-01
The paper presents two approaches in parallel modeling of vascular system development in internal organs. In the first approach, new parts of tissue are distributed among processors and each processor is responsible for perfusing its assigned parts of tissue to all vascular trees. Communication between processors is accomplished by passing messages and therefore this algorithm is perfectly suited for distributed memory architectures. The second approach is designed for shared memory machines. It parallelizes the perfusion process during which individual processing units perform calculations concerning different vascular trees. The experimental results, performed on a computing cluster and multi-core machines, show that both algorithms provide a significant speedup. PMID:21550891
RC64, a Rad-Hard Many-Core High-Performance DSP for Space Applications
NASA Astrophysics Data System (ADS)
Ginosar, Ran; Aviely, Peleg; Gellis, Hagay; Liran, Tuvia; Israeli, Tsvika; Nesher, Roy; Lange, Fredy; Dobkin, Reuven; Meirov, Henri; Reznik, Dror
2015-09-01
RC64, a novel rad-hard 64-core signal processing chip targets DSP performance of 75 GMACs (16bit), 150 GOPS and 38 single precision GFLOPS while dissipating less than 10 Watts. RC64 integrates advanced DSP cores with a multi-bank shared memory and a hardware scheduler, also supporting DDR2/3 memory and twelve 3.125 Gbps full duplex high speed serial links using SpaceFibre and other protocols. The programming model employs sequential fine-grain tasks and a separate task map to define task dependencies. RC64 is implemented as a 300 MHz integrated circuit on a 65nm CMOS technology, assembled in hermetically sealed ceramic CCGA624 package and qualified to the highest space standards.
RC64, a Rad-Hard Many-Core High-Performance DSP for Space Applications
NASA Astrophysics Data System (ADS)
Ginosar, Ran; Aviely, Peleg; Liran, Tuvia; Alon, Dov; Mandler, Alberto; Lange, Fredy; Dobkin, Reuven; Goldberg, Miki
2014-08-01
RC64, a novel rad-hard 64-core signal processing chip targets DSP performance of 75 GMACs (16bit), 150 GOPS and 20 single precision GFLOPS while dissipating less than 10 Watts. RC64 integrates advanced DSP cores with a multi-bank shared memory and a hardware scheduler, also supporting DDR2/3 memory and twelve 2.5 Gbps full duplex high speed serial links using SpaceFibre and other protocols. The programming model employs sequential fine-grain tasks and a separate task map to define task dependencies. RC64 is implemented as a 300 MHz integrated circuit on a 65nm CMOS technology, assembled in hermetically sealed ceramic CCGA624 package and qualified to the highest space standards.
Crystallographic and general use programs for the XDS Sigma 5 computer
NASA Technical Reports Server (NTRS)
Snyder, R. L.
1973-01-01
Programs in basic FORTRAN IV are described, which fall into three categories: (1) interactive programs to be executed under time sharing (BTM); (2) noninteractive programs which are executed in batch processing mode (BPM); and (3) large noninteractive programs which require more memory than is available in the normal BPM/BTM operating system and must be run overnight on a special system called XRAY which releases about 45,000 words of memory to the user. Programs in categories (1) and (2) are stored as FORTRAN source files in the account FSNYDER. Programs in category (3) are stored in the XRAY system as load modules. The type of file in account FSNYDER is identified by the first two letters in the name.
Olderbak, Sally; Hildebrandt, Andrea; Wilhelm, Oliver
2015-01-01
The shared decline in cognitive abilities, sensory functions (e.g., vision and hearing), and physical health with increasing age is well documented, with some research attributing this shared age-related decline to a single common cause (e.g., aging brain). We evaluate the extent to which the common cause hypothesis predicts associations between vision and physical health with social cognition abilities, specifically face perception and face memory. Based on a sample of 443 adults (17–88 years old), we test a series of structural equation models, including Multiple Indicator Multiple Cause (MIMIC) models, and estimate the extent to which vision and self-reported physical health are related to face perception and face memory through a common factor, before and after controlling for their fluid cognitive component and the linear effects of age. Results suggest significant shared variance amongst these constructs, with a common factor explaining some, but not all, of the shared age-related variance. Also, we found that the relations of face perception, but not face memory, with vision and physical health could be completely explained by fluid cognition. Overall, results suggest that a single common cause explains most, but not all, age-related shared variance, with domain-specific aging mechanisms evident. PMID:26321998
A Massively Parallel Code for Polarization Calculations
NASA Astrophysics Data System (ADS)
Akiyama, Shizuka; Höflich, Peter
2001-03-01
We present an implementation of our Monte-Carlo radiation transport method for rapidly expanding, NLTE atmospheres for massively parallel computers which utilizes both the distributed and shared memory models. This allows us to take full advantage of the fast communication and low latency inherent to nodes with multiple CPUs, and to stretch the limits of scalability with the number of nodes compared to a version which is based on the shared memory model. Test calculations on a local 20-node Beowulf cluster with dual CPUs showed an improved scalability by about 40%.
Two Maintenance Mechanisms of Verbal Information in Working Memory
ERIC Educational Resources Information Center
Camos, V.; Lagner, P.; Barrouillet, P.
2009-01-01
The present study evaluated the interplay between two mechanisms of maintenance of verbal information in working memory, namely articulatory rehearsal as described in Baddeley's model, and attentional refreshing as postulated in Barrouillet and Camos's Time-Based Resource-Sharing (TBRS) model. In four experiments using complex span paradigm, we…
Coane, Jennifer H; McBride, Dawn M; Termonen, Miia-Liisa; Cutting, J Cooper
2016-01-01
The goal of the present study was to examine the contributions of associative strength and similarity in terms of shared features to the production of false memories in the Deese/Roediger-McDermott list-learning paradigm. Whereas the activation/monitoring account suggests that false memories are driven by automatic associative activation from list items to nonpresented lures, combined with errors in source monitoring, other accounts (e.g., fuzzy trace theory, global-matching models) emphasize the importance of semantic-level similarity, and thus predict that shared features between list and lure items will increase false memory. Participants studied lists of nine items related to a nonpresented lure. Half of the lists consisted of items that were associated but did not share features with the lure, and the other half included items that were equally associated but also shared features with the lure (in many cases, these were taxonomically related items). The two types of lists were carefully matched in terms of a variety of lexical and semantic factors, and the same lures were used across list types. In two experiments, false recognition of the critical lures was greater following the study of lists that shared features with the critical lure, suggesting that similarity at a categorical or taxonomic level contributes to false memory above and beyond associative strength. We refer to this phenomenon as a "feature boost" that reflects additive effects of shared meaning and association strength and is generally consistent with accounts of false memory that have emphasized thematic or feature-level similarity among studied and nonstudied representations.
Multiple-User, Multitasking, Virtual-Memory Computer System
NASA Technical Reports Server (NTRS)
Generazio, Edward R.; Roth, Don J.; Stang, David B.
1993-01-01
Computer system designed and programmed to serve multiple users in research laboratory. Provides for computer control and monitoring of laboratory instruments, acquisition and analysis of data from those instruments, and interaction with users via remote terminals. System provides fast access to shared central processing units and associated large (from megabytes to gigabytes) memories. Underlying concept of system also applicable to monitoring and control of industrial processes.
Li, Jian; Bloch, Pavel; Xu, Jing; Sarunic, Marinko V; Shannon, Lesley
2011-05-01
Fourier domain optical coherence tomography (FD-OCT) provides faster line rates, better resolution, and higher sensitivity for noninvasive, in vivo biomedical imaging compared to traditional time domain OCT (TD-OCT). However, because the signal processing for FD-OCT is computationally intensive, real-time FD-OCT applications demand powerful computing platforms to deliver acceptable performance. Graphics processing units (GPUs) have been used as coprocessors to accelerate FD-OCT by leveraging their relatively simple programming model to exploit thread-level parallelism. Unfortunately, GPUs do not "share" memory with their host processors, requiring additional data transfers between the GPU and CPU. In this paper, we implement a complete FD-OCT accelerator on a consumer grade GPU/CPU platform. Our data acquisition system uses spectrometer-based detection and a dual-arm interferometer topology with numerical dispersion compensation for retinal imaging. We demonstrate that the maximum line rate is dictated by the memory transfer time and not the processing time due to the GPU platform's memory model. Finally, we discuss how the performance trends of GPU-based accelerators compare to the expected future requirements of FD-OCT data rates.
Time-Related Decay or Interference-Based Forgetting in Working Memory?
ERIC Educational Resources Information Center
Portrat, Sophie; Barrouillet, Pierre; Camos, Valerie
2008-01-01
The time-based resource-sharing model of working memory assumes that memory traces suffer from a time-related decay when attention is occupied by concurrent activities. Using complex continuous span tasks in which temporal parameters are carefully controlled, P. Barrouillet, S. Bernardin, S. Portrat, E. Vergauwe, & V. Camos (2007) recently…
Developmental Change in Working Memory Strategies: From Passive Maintenance to Active Refreshing
ERIC Educational Resources Information Center
Camos, Valerie; Barrouillet, Pierre
2011-01-01
Change in strategies is often mentioned as a source of memory development. However, though performance in working memory tasks steadily improves during childhood, theories differ in linking this development to strategy changes. Whereas some theories, such as the time-based resource-sharing model, invoke the age-related increase in use and…
Performing an allreduce operation using shared memory
Archer, Charles J [Rochester, MN; Dozsa, Gabor [Ardsley, NY; Ratterman, Joseph D [Rochester, MN; Smith, Brian E [Rochester, MN
2012-04-17
Methods, apparatus, and products are disclosed for performing an allreduce operation using shared memory that include: receiving, by at least one of a plurality of processing cores on a compute node, an instruction to perform an allreduce operation; establishing, by the core that received the instruction, a job status object for specifying a plurality of shared memory allreduce work units, the plurality of shared memory allreduce work units together performing the allreduce operation on the compute node; determining, by an available core on the compute node, a next shared memory allreduce work unit in the job status object; and performing, by that available core on the compute node, that next shared memory allreduce work unit.
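The work-unit scheme described in this abstract can be sketched roughly as follows. This is a minimal illustration only, not the patented implementation: Python threads stand in for processing cores, and the names `JobStatus`, `claim`, `worker`, and `shared_memory_allreduce`, the chunk size, and the choice of summation as the reduction are all assumptions made for the example.

```python
import threading


class JobStatus:
    """Hypothetical job status object: lists the work units and tracks the next one."""

    def __init__(self, contributions, chunk):
        self.contributions = contributions          # one input buffer per participant
        n = len(contributions[0])
        # Each work unit is a half-open index range [lo, hi) of the result buffer.
        self.units = [(lo, min(lo + chunk, n)) for lo in range(0, n, chunk)]
        self.next_unit = 0
        self.lock = threading.Lock()
        self.result = [0] * n                       # shared result buffer

    def claim(self):
        """Return the next unclaimed work unit, or None when all are taken."""
        with self.lock:
            if self.next_unit >= len(self.units):
                return None
            unit = self.units[self.next_unit]
            self.next_unit += 1
            return unit


def worker(job):
    """An available 'core' repeatedly claims and performs the next work unit."""
    while True:
        unit = job.claim()
        if unit is None:
            return
        lo, hi = unit
        for j in range(lo, hi):
            # Reduce element j across all contributions (sum, by assumption).
            job.result[j] = sum(buf[j] for buf in job.contributions)


def shared_memory_allreduce(contributions, chunk=2, cores=4):
    job = JobStatus(contributions, chunk)
    threads = [threading.Thread(target=worker, args=(job,)) for _ in range(cores)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # Every participant reads the same shared result.
    return job.result


result = shared_memory_allreduce(
    [[1, 2, 3, 4, 5], [10, 20, 30, 40, 50], [100, 200, 300, 400, 500]]
)
```

Because each work unit writes a disjoint slice of the result buffer, only the claim step needs a lock in this sketch.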
Performing an allreduce operation using shared memory
Archer, Charles J; Dozsa, Gabor; Ratterman, Joseph D; Smith, Brian E
2014-06-10
Methods, apparatus, and products are disclosed for performing an allreduce operation using shared memory that include: receiving, by at least one of a plurality of processing cores on a compute node, an instruction to perform an allreduce operation; establishing, by the core that received the instruction, a job status object for specifying a plurality of shared memory allreduce work units, the plurality of shared memory allreduce work units together performing the allreduce operation on the compute node; determining, by an available core on the compute node, a next shared memory allreduce work unit in the job status object; and performing, by that available core on the compute node, that next shared memory allreduce work unit.
Simulation Analysis of Data Sharing in Shared Memory Multiprocessors
1989-02-24
LIMITATION OF ABSTRACT Same as Report (SAR) 18. NUMBER OF PAGES 178 19a. NAME OF RESPONSIBLE PERSON a. REPORT unclassified b. ABSTRACT unclassified...work. Andrea Casotto (CELL), Steve McGrogan (SPICE), Srinivas Devadas (TOPOP1) and Hi-Keung Tony Ma (VERIFY) donated the parallel programs and a con...Effect of Block Size on Bus Utilization 120 5-14 Ratio of Sharing Bus Cycles to Total Bus Cycles 120 5-15 Classification of Bus Cycles for
DOE Office of Scientific and Technical Information (OSTI.GOV)
Chin, George; Marquez, Andres; Choudhury, Sutanay
2012-09-01
Triadic analysis encompasses a useful set of graph mining methods that is centered on the concept of a triad, which is a subgraph of three nodes and the configuration of directed edges across the nodes. Such methods are often applied in the social sciences as well as many other diverse fields. Triadic methods commonly operate on a triad census that counts the number of triads of every possible edge configuration in a graph. Like other graph algorithms, triadic census algorithms do not scale well when graphs reach tens of millions to billions of nodes. To enable the triadic analysis of large-scale graphs, we developed and optimized a triad census algorithm to efficiently execute on shared memory architectures. We will retrace the development and evolution of a parallel triad census algorithm. Over the course of several versions, we continually adapted the code's data structures and program logic to expose more opportunities to exploit parallelism on shared memory that would translate into improved computational performance. We will recall the critical steps and modifications that occurred during code development and optimization. Furthermore, we will compare the performances of triad census algorithm versions on three specific systems: Cray XMT, HP Superdome, and AMD multi-core NUMA machine. These three systems have shared memory architectures but with markedly different hardware capabilities to manage parallelism.
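To make the notion of a triad census concrete, here is a minimal brute-force sketch. It is the naive enumeration over all node triples, not the optimized parallel algorithm the abstract describes, and the function names and the canonical-code scheme are assumptions chosen for illustration. Each triple is mapped to a canonical 6-bit code (the minimum over the six node orderings), so isomorphic edge configurations fall into the same class.

```python
from collections import Counter
from itertools import combinations, permutations


def triad_code(nodes, edges):
    """Canonical code of the directed subgraph induced by three nodes."""
    best = None
    for p in permutations(nodes):
        code, bit = 0, 0
        # One bit per ordered pair (i, j), i != j, in a fixed order.
        for i in range(3):
            for j in range(3):
                if i != j:
                    if (p[i], p[j]) in edges:
                        code |= 1 << bit
                    bit += 1
        best = code if best is None else min(best, code)
    return best


def triad_census(n, edges):
    """Count triads of each edge configuration in a directed graph on nodes 0..n-1."""
    edge_set = set(edges)
    return Counter(triad_code(t, edge_set) for t in combinations(range(n), 3))


# Small example graph: a directed 3-cycle plus one mutual pair.
census = triad_census(5, [(0, 1), (1, 2), (2, 0), (3, 4), (4, 3)])
```

The enumeration visits every one of the n-choose-3 triples, which is exactly why the naive approach stops scaling on very large graphs. The outer loop over triples is independent per triple, which is the kind of parallelism a shared-memory version could exploit.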
The force on the flex: Global parallelism and portability
NASA Technical Reports Server (NTRS)
Jordan, H. F.
1986-01-01
A parallel programming methodology, called the force, supports the construction of programs to be executed in parallel by an unspecified, but potentially large, number of processes. The methodology was originally developed on a pipelined, shared memory multiprocessor, the Denelcor HEP, and embodies the primitive operations of the force in a set of macros which expand into multiprocessor Fortran code. A small set of primitives is sufficient to write large parallel programs, and the system has been used to produce 10,000 line programs in computational fluid dynamics. The level of complexity of the force primitives is intermediate. It is high enough to mask detailed architectural differences between multiprocessors but low enough to give the user control over performance. The system is being ported to a medium scale multiprocessor, the Flex/32, which is a 20 processor system with a mixture of shared and local memory. Memory organization and the type of processor synchronization supported by the hardware on the two machines lead to some differences in efficient implementations of the force primitives, but the user interface remains the same. An initial implementation was done by retargeting the macros to Flexible Computer Corporation's ConCurrent C language. Subsequently, the macros were caused to directly produce the system calls which form the basis for ConCurrent C. The implementation of the Fortran based system is in step with Flexible Computer Corporation's implementation of a Fortran system in the parallel environment.
GPU COMPUTING FOR PARTICLE TRACKING
DOE Office of Scientific and Technical Information (OSTI.GOV)
Nishimura, Hiroshi; Song, Kai; Muriki, Krishna
2011-03-25
This is a feasibility study of using a modern Graphics Processing Unit (GPU) to parallelize the accelerator particle tracking code. To demonstrate the massive parallelization features provided by GPU computing, a simplified TracyGPU program is developed for dynamic aperture calculation. Performances, issues, and challenges from introducing GPU are also discussed. General purpose Computation on Graphics Processing Units (GPGPU) brings massive parallel computing capabilities to numerical calculation. However, the unique architecture of GPU requires a comprehensive understanding of the hardware and programming model to be able to well optimize existing applications. In the field of accelerator physics, the dynamic aperture calculation of a storage ring, which is often the most time consuming part of the accelerator modeling and simulation, can benefit from GPU due to its embarrassingly parallel feature, which fits well with the GPU programming model. In this paper, we use the Tesla C2050 GPU which consists of 14 multi-processors (MP) with 32 cores on each MP, therefore a total of 448 cores, to host thousands of threads dynamically. Thread is a logical execution unit of the program on GPU. In the GPU programming model, threads are grouped into a collection of blocks. Within each block, multiple threads share the same code, and up to 48 KB of shared memory. Multiple thread blocks form a grid, which is executed as a GPU kernel. A simplified code that is a subset of Tracy++ [2] is developed to demonstrate the possibility of using GPU to speed up the dynamic aperture calculation by having each thread track a particle.
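The grid/block/thread organization and the one-thread-per-particle mapping described above can be emulated on the CPU as a rough sketch. This is not TracyGPU or CUDA code: the `launch` helper runs the "threads" sequentially, and the tracking step (a drift followed by a thin-lens kick, with the constants `DRIFT_LENGTH` and `KICK_STRENGTH`) is a hypothetical stand-in for the real lattice tracking.

```python
import math

DRIFT_LENGTH = 1.0    # hypothetical drift length per turn
KICK_STRENGTH = 0.1   # hypothetical thin-lens focusing strength


def track_kernel(block_idx, thread_idx, block_dim, x, xp, turns):
    """Body executed by one logical thread: track exactly one particle."""
    i = block_idx * block_dim + thread_idx   # global thread index
    if i >= len(x):                          # surplus threads in the last block do nothing
        return
    for _ in range(turns):
        x[i] += DRIFT_LENGTH * xp[i]         # drift
        xp[i] -= KICK_STRENGTH * x[i]        # thin-lens kick


def launch(kernel, num_items, block_dim, *args):
    """Sequential emulation of a kernel launch over a grid of thread blocks."""
    grid_dim = math.ceil(num_items / block_dim)
    for block_idx in range(grid_dim):
        for thread_idx in range(block_dim):
            kernel(block_idx, thread_idx, block_dim, *args)
    return grid_dim


n = 1000
x = [1e-3 * i for i in range(n)]   # initial positions
xp = [0.0] * n                     # initial angles
blocks = launch(track_kernel, n, 256, x, xp, 1)
```

Since no particle reads another particle's state, the iterations are independent, which is the "embarrassingly parallel" property the abstract refers to. On real hardware the two loops in `launch` would be replaced by concurrent threads.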
Discrete Resource Allocation in Visual Working Memory
ERIC Educational Resources Information Center
Barton, Brian; Ester, Edward F.; Awh, Edward
2009-01-01
Are resources in visual working memory allocated in a continuous or a discrete fashion? On one hand, flexible resource models suggest that capacity is determined by a central resource pool that can be flexibly divided such that items of greater complexity receive a larger share of resources. On the other hand, if capacity in working memory is…
Payne, Brennan R.; Gross, Alden L.; Hill, Patrick L.; Parisi, Jeanine M.; Rebok, George W.; Stine-Morrow, Elizabeth A. L.
2018-01-01
With advancing age, episodic memory performance shows marked declines along with concurrent reports of lower subjective memory beliefs. Given that normative age-related declines in episodic memory co-occur with declines in other cognitive domains, we examined the relationship between memory beliefs and multiple domains of cognitive functioning. Confirmatory bi-factor structural equation models were used to parse the shared and independent variance among factors representing episodic memory, psychomotor speed, and executive reasoning in one large cohort study (Senior Odyssey, N = 462), and replicated using another large cohort of healthy older adults (ACTIVE, N = 2,802). Accounting for a general fluid cognitive functioning factor (comprised of the shared variance among measures of episodic memory, speed, and reasoning) attenuated the relationship between objective memory performance and subjective memory beliefs in both samples. Moreover, the general cognitive functioning factor was the strongest predictor of memory beliefs in both samples. These findings are consistent with the notion that dispositional memory beliefs may reflect perceptions of cognition more broadly. This may be one reason why memory beliefs have broad predictive validity for interventions that target fluid cognitive ability. PMID:27685541
Payne, Brennan R; Gross, Alden L; Hill, Patrick L; Parisi, Jeanine M; Rebok, George W; Stine-Morrow, Elizabeth A L
2017-07-01
With advancing age, episodic memory performance shows marked declines along with concurrent reports of lower subjective memory beliefs. Given that normative age-related declines in episodic memory co-occur with declines in other cognitive domains, we examined the relationship between memory beliefs and multiple domains of cognitive functioning. Confirmatory bi-factor structural equation models were used to parse the shared and independent variance among factors representing episodic memory, psychomotor speed, and executive reasoning in one large cohort study (Senior Odyssey, N = 462), and replicated using another large cohort of healthy older adults (ACTIVE, N = 2802). Accounting for a general fluid cognitive functioning factor (comprised of the shared variance among measures of episodic memory, speed, and reasoning) attenuated the relationship between objective memory performance and subjective memory beliefs in both samples. Moreover, the general cognitive functioning factor was the strongest predictor of memory beliefs in both samples. These findings are consistent with the notion that dispositional memory beliefs may reflect perceptions of cognition more broadly. This may be one reason why memory beliefs have broad predictive validity for interventions that target fluid cognitive ability.
NASA Astrophysics Data System (ADS)
Nishiura, Daisuke; Furuichi, Mikito; Sakaguchi, Hide
2015-09-01
The computational performance of a smoothed particle hydrodynamics (SPH) simulation is investigated for three types of current shared-memory parallel computer devices: many integrated core (MIC) processors, graphics processing units (GPUs), and multi-core CPUs. We are especially interested in efficient shared-memory allocation methods for each chipset, because the efficient data access patterns differ between compute unified device architecture (CUDA) programming for GPUs and OpenMP programming for MIC processors and multi-core CPUs. We first introduce several parallel implementation techniques for the SPH code, and then examine these on our target computer architectures to determine the most effective algorithms for each processor unit. In addition, we evaluate the effective computing performance and power efficiency of the SPH simulation on each architecture, as these are critical metrics for overall performance in a multi-device environment. In our benchmark test, the GPU is found to produce the best arithmetic performance as a standalone device unit, and gives the most efficient power consumption. The multi-core CPU obtains the most effective computing performance. The computational speed of the MIC processor on Xeon Phi approached that of two Xeon CPUs. This indicates that using MICs is an attractive choice for existing SPH codes on multi-core CPUs parallelized by OpenMP, as it gains computational acceleration without the need for significant changes to the source code.
Symbiosis of executive and selective attention in working memory
Vandierendonck, André
2014-01-01
The notion of working memory (WM) was introduced to account for the usage of short-term memory resources by other cognitive tasks such as reasoning, mental arithmetic, language comprehension, and many others. This collaboration between memory and other cognitive tasks can only be achieved by a dedicated WM system that controls task coordination. To that end, WM models include executive control. Nevertheless, other attention control systems may be involved in coordination of memory and cognitive tasks calling on memory resources. The present paper briefly reviews the evidence concerning the role of selective attention in WM activities. A model is proposed in which selective attention control is directly linked to the executive control part of the WM system. The model assumes that apart from storage of declarative information, the system also includes an executive WM module that represents the current task set. Control processes are automatically triggered when particular conditions in these modules are met. As each task set represents the parameter settings and the actions needed to achieve the task goal, it will depend on the specific settings and actions whether selective attention control will have to be shared among the active tasks. Only when such sharing is required, task performance will be affected by the capacity limits of the control system involved. PMID:25152723
Symbiosis of executive and selective attention in working memory.
Vandierendonck, André
2014-01-01
The notion of working memory (WM) was introduced to account for the usage of short-term memory resources by other cognitive tasks such as reasoning, mental arithmetic, language comprehension, and many others. This collaboration between memory and other cognitive tasks can only be achieved by a dedicated WM system that controls task coordination. To that end, WM models include executive control. Nevertheless, other attention control systems may be involved in coordination of memory and cognitive tasks calling on memory resources. The present paper briefly reviews the evidence concerning the role of selective attention in WM activities. A model is proposed in which selective attention control is directly linked to the executive control part of the WM system. The model assumes that apart from storage of declarative information, the system also includes an executive WM module that represents the current task set. Control processes are automatically triggered when particular conditions in these modules are met. As each task set represents the parameter settings and the actions needed to achieve the task goal, it will depend on the specific settings and actions whether selective attention control will have to be shared among the active tasks. Only when such sharing is required, task performance will be affected by the capacity limits of the control system involved.
Efficient Numeric and Geometric Computations using Heterogeneous Shared Memory Architectures
2017-10-04
Report: Efficient Numeric and Geometric Computations using Heterogeneous Shared Memory Architectures The views, opinions and/or findings contained in this...Chapel Hill Title: Efficient Numeric and Geometric Computations using Heterogeneous Shared Memory Architectures Report Term: 0-Other Email: dm...algorithms for scientific and geometric computing by exploiting the power and performance efficiency of heterogeneous shared memory architectures. These
Robert Hooke's model of memory.
Hintzman, Douglas L
2003-03-01
In 1682 the scientist and inventor Robert Hooke read a lecture to the Royal Society of London, in which he described a mechanistic model of human memory. Yet few psychologists today seem to have heard of Hooke's memory model. The lecture addressed questions of encoding, memory capacity, repetition, retrieval, and forgetting--some of these in a surprisingly modern way. Hooke's model shares several characteristics with the theory of Richard Semon, which came more than 200 years later, but it is more complete. Among the model's interesting properties are that (1) it allows for attention and other top-down influences on encoding; (2) it uses resonance to implement parallel, cue-dependent retrieval; (3) it explains memory for recency; (4) it offers a single-system account of repetition priming; and (5) the power law of forgetting can be derived from the model's assumptions in a straightforward way.
An Investigation of Unified Memory Access Performance in CUDA
Landaverde, Raphael; Zhang, Tiansheng; Coskun, Ayse K.; Herbordt, Martin
2015-01-01
Managing memory between the CPU and GPU is a major challenge in GPU computing. A programming model, Unified Memory Access (UMA), has been recently introduced by Nvidia to simplify the complexities of memory management while claiming good overall performance. In this paper, we investigate this programming model and evaluate its performance and programming model simplifications based on our experimental results. We find that beyond on-demand data transfers to the CPU, the GPU is also able to request subsets of data it requires on demand. This feature allows UMA to outperform full data transfer methods for certain parallel applications and small data sizes. We also find, however, that for the majority of applications and memory access patterns, the performance overheads associated with UMA are significant, while the simplifications to the programming model restrict flexibility for adding future optimizations. PMID:26594668
Hybrid Memory Management for Parallel Execution of Prolog on Shared Memory Multiprocessors
1990-06-01
organizing data to increase locality. The stack structure exhibits greater locality than the heap structure. Tradeoff decisions can also be made on...PROGRAM ELEMENT NUMBER 6. AUTHOR(S) 5d. PROJECT NUMBER 5e. TASK NUMBER 5f. WORK UNIT NUMBER 7. PERFORMING ORGANIZATION NAME(S) AND ADDRESS(ES...University of California at Berkeley, Department of Electrical Engineering and Computer Sciences, Berkeley, CA, 94720 8. PERFORMING ORGANIZATION REPORT
Personal semantics: at the crossroads of semantic and episodic memory.
Renoult, Louis; Davidson, Patrick S R; Palombo, Daniela J; Moscovitch, Morris; Levine, Brian
2012-11-01
Declarative memory is usually described as consisting of two systems: semantic and episodic memory. Between these two poles, however, may lie a third entity: personal semantics (PS). PS concerns knowledge of one's past. Although typically assumed to be an aspect of semantic memory, it is essentially absent from existing models of knowledge. Furthermore, like episodic memory (EM), PS is idiosyncratically personal (i.e., not culturally-shared). We show that, depending on how it is operationalized, the neural correlates of PS can look more similar to semantic memory, more similar to EM, or dissimilar to both. We consider three different perspectives to better integrate PS into existing models of declarative memory and suggest experimental strategies for disentangling PS from semantic and episodic memory. Copyright © 2012 Elsevier Ltd. All rights reserved.
Parallel discrete event simulation using shared memory
NASA Technical Reports Server (NTRS)
Reed, Daniel A.; Malony, Allen D.; Mccredie, Bradley D.
1988-01-01
With traditional event-list techniques, evaluating a detailed discrete-event simulation model can often require hours or even days of computation time. By eliminating the event list and maintaining only sufficient synchronization to ensure causality, parallel simulation can potentially provide speedups that are linear in the number of processors. A set of shared-memory experiments, using the Chandy-Misra distributed-simulation algorithm, to simulate networks of queues is presented. Parameters of the study include queueing network topology and routing probabilities, number of processors, and assignment of network nodes to processors. These experiments show that Chandy-Misra distributed simulation is a questionable alternative to sequential simulation of most queueing network models.
Thread mapping using system-level model for shared memory multicores
NASA Astrophysics Data System (ADS)
Mitra, Reshmi
Exploring thread-to-core mapping options for a parallel application on a multicore architecture is computationally very expensive. For the same algorithm, the mapping strategy (MS) with the best response time may change with data size and thread counts. The primary challenge is to design a fast, accurate and automatic framework for exploring these MSs for large data-intensive applications. This is to ensure that the users can explore the design space within reasonable machine hours, without a thorough understanding of how the code interacts with the platform. Response time is related to the cycles per instructions retired (CPI), taking into account both active and sleep states of the pipeline. This work establishes a hybrid approach, based on Markov Chain Model (MCM) and Model Tree (MT) for system-level steady state CPI prediction. It is designed for shared memory multicore processors with coarse-grained multithreading. The thread status is represented by the MCM states. The program characteristics are modeled as the transition probabilities, representing the system moving between active and suspended thread states. The MT model extrapolates these probabilities for the actual application size (AS) from the smaller AS performance. This aspect of the framework, along with the use of mathematical expressions for the actual AS performance information, results in a tremendous reduction in the CPI prediction time. The framework is validated using an electromagnetics application. The average performance prediction error for steady state CPI results with 12 different MSs is less than 1%. The total run time of the model is on the order of minutes, whereas the actual application execution time is on the order of days.
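As a rough illustration of the steady-state idea, consider the simplest possible chain with just two thread states, active and suspended. The sketch below is an assumption-laden toy, not the model from this work: the function name and the two transition probabilities are hypothetical, and the real framework uses more states plus a Model Tree to extrapolate the probabilities. It iterates the chain until the state distribution settles and returns the long-run fraction of time spent active.

```python
def stationary_active_fraction(p_suspend, p_resume, steps=10_000):
    """Long-run probability of the 'active' state in a two-state Markov chain.

    p_suspend: probability of moving active -> suspended in one step (hypothetical)
    p_resume:  probability of moving suspended -> active in one step (hypothetical)
    """
    active, suspended = 1.0, 0.0   # start fully in the active state
    for _ in range(steps):
        active, suspended = (
            active * (1.0 - p_suspend) + suspended * p_resume,
            active * p_suspend + suspended * (1.0 - p_resume),
        )
    return active


p, q = 0.2, 0.6
pi_active = stationary_active_fraction(p, q)
closed_form = q / (p + q)   # known closed form for a two-state chain
```

For a two-state chain the iteration converges to q / (p + q), so the closed form serves as a check on the numerical result. A steady-state quantity of this kind is cheap to compute compared with running the application, which is consistent with the abstract's contrast between minutes of model time and days of execution time.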
Using school grounds for nature studies: An exploratory study of elementary teachers' experiences
NASA Astrophysics Data System (ADS)
Willis, Tamra Lee
2001-06-01
The purpose of this study was to gain understanding of the experiences of elementary teachers who used school grounds to do nature studies. Following an inductive, naturalistic approach, the goal was to explore the phenomenon using words of teachers as guides to understanding. Interviews were conducted with a purposeful sampling of ten quality public school teachers in grades K-5 who were well-known for their schoolyard nature programs. Interview questions were focused by a theoretical framework of environmental cognition. Data were gathered about how teachers came to use the outdoors to teach and how they experienced teaching nature studies on the school grounds. A conceptual model of Quality Teachers of Schoolyard Nature Studies was delineated. The model consisted of three components: teacher past and present experiences with nature, teacher beliefs relevant to using the school grounds for nature studies, and teacher action efficacy pertaining to schoolyard nature programs. The model suggested a relationship between teachers' personal experiences with nature and their beliefs about sharing nature with children. In addition, the model connected teachers' beliefs about schoolyard nature to their action efficacy, i.e. action behavior reflected through motivation and commitment. The participants shared many common experiences and beliefs. Most had extensive childhood experiences in nature and memories of adults who shared nature with them. They did not consider themselves nature experts, but felt they knew the basics of natural science from their own experiences outdoors and from working with children. The teachers' beliefs about schoolyard nature studies developed from several dimensions of their lives: experiences with nature, experiences teaching, and experiences with students. 
They were motivated to share nature with students on the school grounds by their beliefs that students would come to appreciate and understand nature, just as they had during their own experiences. In addition, they believed that schoolyard nature programs benefitted student learning and enjoyment of learning. The action efficacy of the teachers was influenced by their beliefs about schoolyard nature programs and beliefs in their own competence to overcome challenges and achieve goals. Implications for educational practice and further research were cited.
Toward Enhancing OpenMP's Work-Sharing Directives
DOE Office of Scientific and Technical Information (OSTI.GOV)
Chapman, B M; Huang, L; Jin, H
2006-05-17
OpenMP provides a portable programming interface for shared memory parallel computers (SMPs). Although this interface has proven successful for small SMPs, it requires greater flexibility in light of the steadily growing size of individual SMPs and the recent advent of multithreaded chips. In this paper, we describe two application development experiences that exposed these expressivity problems in the current OpenMP specification. We then propose mechanisms to overcome these limitations, including thread subteams and thread topologies. Thus, we identify language features that improve OpenMP application performance on emerging and large-scale platforms while preserving ease of programming.
Effects of Ordering Strategies and Programming Paradigms on Sparse Matrix Computations
NASA Technical Reports Server (NTRS)
Oliker, Leonid; Li, Xiaoye; Husbands, Parry; Biswas, Rupak; Biegel, Bryan (Technical Monitor)
2002-01-01
The Conjugate Gradient (CG) algorithm is perhaps the best-known iterative technique to solve sparse linear systems that are symmetric and positive definite. For systems that are ill-conditioned, it is often necessary to use a preconditioning technique. In this paper, we investigate the effects of various ordering and partitioning strategies on the performance of parallel CG and ILU(0) preconditioned CG (PCG) using different programming paradigms and architectures. Results show that for this class of applications: ordering significantly improves overall performance on both distributed and distributed shared-memory systems, that cache reuse may be more important than reducing communication, that it is possible to achieve message-passing performance using shared-memory constructs through careful data ordering and distribution, and that a hybrid MPI+OpenMP paradigm increases programming complexity with little performance gain. An implementation of CG on the Cray MTA does not require special ordering or partitioning to obtain high efficiency and scalability, giving it a distinct advantage for adaptive applications; however, it shows limited scalability for PCG due to a lack of thread level parallelism.
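For readers unfamiliar with CG, a minimal unpreconditioned version is sketched below. It is a textbook serial formulation on a small dense matrix, written in pure Python for clarity, and is not the parallel sparse code studied in the paper; the function names, tolerance, and iteration cap are assumptions for the example.

```python
import math


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def matvec(A, v):
    return [dot(row, v) for row in A]


def conjugate_gradient(A, b, tol=1e-12, max_iter=None):
    """Solve A x = b for a symmetric positive definite matrix A."""
    n = len(b)
    x = [0.0] * n
    r = list(b)            # residual b - A x, with x = 0 initially
    p = list(r)            # first search direction
    rs = dot(r, r)
    for _ in range(max_iter or 10 * n):
        if math.sqrt(rs) < tol:
            break          # converged; also avoids dividing by a zero residual
        Ap = matvec(A, p)
        alpha = rs / dot(p, Ap)                        # step length along p
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = dot(r, r)
        # New direction: residual plus a multiple of the previous direction.
        p = [ri + (rs_new / rs) * pi for ri, pi in zip(r, p)]
        rs = rs_new
    return x


solution = conjugate_gradient([[4.0, 1.0], [1.0, 3.0]], [1.0, 2.0])
```

Each iteration is dominated by one matrix-vector product and a few dot products. In a sparse parallel setting, the memory access pattern of that product depends on how the unknowns are ordered, which is why the ordering and partitioning strategies examined in the abstract affect cache reuse and communication.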
Zhu, Hao; Sun, Yan; Rajagopal, Gunaretnam; Mondry, Adrian; Dhar, Pawan
2004-01-01
Background: Many arrhythmias are triggered by abnormal electrical activity at the ionic channel and cell level, and then evolve spatio-temporally within the heart. To understand arrhythmias better and to diagnose them more precisely by their ECG waveforms, a whole-heart model is required to explore the association between the massively parallel activities at the channel/cell level and the integrative electrophysiological phenomena at organ level. Methods: We have developed a method to build large-scale electrophysiological models by using extended cellular automata, and to run such models on a cluster of shared memory machines. We describe here the method, including the extension of a language-based cellular automaton to implement quantitative computing, the building of a whole-heart model with Visible Human Project data, the parallelization of the model on a cluster of shared memory computers with OpenMP and MPI hybrid programming, and a simulation algorithm that links cellular activity with the ECG. Results: We demonstrate that electrical activities at channel, cell, and organ levels can be traced and captured conveniently in our extended cellular automaton system. Examples of some ECG waveforms simulated with a 2-D slice are given to support the ECG simulation algorithm. A performance evaluation of the 3-D model on a four-node cluster is also given. Conclusions: Quantitative multicellular modeling with extended cellular automata is a highly efficient and widely applicable method to weave experimental data at different levels into computational models. This process can be used to investigate complex and collective biological activities that can be described neither by their governing differentiation equations nor by discrete parallel computation. Transparent cluster computing is a convenient and effective method to make time-consuming simulation feasible. Arrhythmias, as a typical case, can be effectively simulated with the methods described. PMID:15339335
Parallelization of elliptic solver for solving 1D Boussinesq model
NASA Astrophysics Data System (ADS)
Tarwidi, D.; Adytia, D.
2018-03-01
In this paper, a parallel implementation of an elliptic solver for the 1D Boussinesq model is presented. The numerical solution of the Boussinesq model is obtained by applying a staggered grid scheme to the continuity, momentum, and elliptic equations of the model. The tridiagonal system arising from the numerical scheme for the elliptic equation is solved by the cyclic reduction algorithm. The parallel implementation of cyclic reduction is executed on multicore processors with shared memory architectures using OpenMP. To measure the performance of the parallel program, the number of grid points is varied from 2^8 to 2^14. Two test cases, the propagation of a solitary wave and of a standing wave, are used to evaluate the parallel program. The numerical results are verified against the analytical solutions for solitary and standing waves. The best speedup for the solitary and standing wave test cases is about 2.07 with 2^14 grid points and 1.86 with 2^13 grid points, respectively, both obtained using 8 threads. Moreover, the best efficiency of the parallel program is 76.2% and 73.5% for the solitary and standing wave test cases, respectively.
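For readers unfamiliar with cyclic reduction, the following is a minimal serial sketch in Python, not the authors' OpenMP code. The function name `cyclic_reduction` and the coefficient layout (sub-diagonal `a`, diagonal `b`, super-diagonal `c`, right-hand side `d`, with `a[0]` and `c[-1]` set to zero) are assumptions made for illustration. At each level the odd-indexed equations are combined with their neighbours so that only every second unknown remains; the iterations of each `for` loop are independent of one another, which is the property a shared-memory implementation would exploit.

```python
def cyclic_reduction(a, b, c, d):
    """Solve a[i]*x[i-1] + b[i]*x[i] + c[i]*x[i+1] = d[i].

    Assumes a[0] == 0 and c[-1] == 0 (no neighbours outside the grid)
    and that no pivot b[i] becomes zero during the reduction.
    """
    n = len(b)
    if n == 1:
        return [d[0] / b[0]]

    # Reduction: keep unknowns 1, 3, 5, ... and eliminate their neighbours.
    ra, rb, rc, rd = [], [], [], []
    for i in range(1, n, 2):              # iterations are independent
        alpha = a[i] / b[i - 1]
        has_right = i + 1 < n
        gamma = c[i] / b[i + 1] if has_right else 0.0
        ra.append(-alpha * a[i - 1])
        rb.append(b[i] - alpha * c[i - 1] - (gamma * a[i + 1] if has_right else 0.0))
        rc.append(-gamma * c[i + 1] if has_right else 0.0)
        rd.append(d[i] - alpha * d[i - 1] - (gamma * d[i + 1] if has_right else 0.0))

    # Solve the half-size tridiagonal system for the odd-indexed unknowns.
    x = [0.0] * n
    x[1::2] = cyclic_reduction(ra, rb, rc, rd)

    # Back-substitution for the even-indexed unknowns (also independent).
    for i in range(0, n, 2):
        left = a[i] * x[i - 1] if i > 0 else 0.0
        right = c[i] * x[i + 1] if i + 1 < n else 0.0
        x[i] = (d[i] - left - right) / b[i]
    return x
```

A quick way to exercise the sketch is to build a right-hand side from a known solution of the standard (-1, 2, -1) tridiagonal matrix and check that the solver recovers it. The recursion depth grows only as log2 of the system size, so 2^14 unknowns need about 14 levels.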
Vergauwe, Evie; Hartstra, Egbert; Barrouillet, Pierre; Brass, Marcel
2015-07-15
Working memory is often defined in cognitive psychology as a system devoted to the simultaneous processing and maintenance of information. In line with the time-based resource-sharing model of working memory (TBRS; Barrouillet and Camos, 2015; Barrouillet et al., 2004), there is accumulating evidence that, when memory items have to be maintained while performing a concurrent activity, memory performance depends on the cognitive load of this activity, independently of the domain involved. The present study used fMRI to identify regions in the brain that are sensitive to variations in cognitive load in a domain-general way. More precisely, we aimed at identifying brain areas that activate during maintenance of memory items as a direct function of the cognitive load induced by both verbal and spatial concurrent tasks. Results show that the right IFJ and bilateral SPL/IPS are the only areas showing an increased involvement as cognitive load increases and do so in a domain general manner. When correlating the fMRI signal with the approximated cognitive load as defined by the TBRS model, it was shown that the main focus of the cognitive load-related activation is located in the right IFJ. The present findings indicate that the IFJ makes domain-general contributions to time-based resource-sharing in working memory and allowed us to generate the novel hypothesis by which the IFJ might be the neural basis for the process of rapid switching. We argue that the IFJ might be a crucial part of a central attentional bottleneck in the brain because of its inability to upload more than one task rule at once. Copyright © 2015 Elsevier Inc. All rights reserved.
Shared Semantics and the Use of Organizational Memories for E-Mail Communications.
ERIC Educational Resources Information Center
Schwartz, David G.
1998-01-01
Examines the use of shared semantics information to link concepts in an organizational memory to e-mail communications. Presents a framework for determining shared semantics based on organizational and personal user profiles. Illustrates how shared semantics are used by the HyperMail system to help link organizational memories (OM) content to…
A class hierarchical, object-oriented approach to virtual memory management
NASA Technical Reports Server (NTRS)
Russo, Vincent F.; Campbell, Roy H.; Johnston, Gary M.
1989-01-01
The Choices family of operating systems exploits class hierarchies and object-oriented programming to facilitate the construction of customized operating systems for shared memory and networked multiprocessors. The software is being used in the Tapestry laboratory to study the performance of algorithms, mechanisms, and policies for parallel systems. Described here are the architectural design and class hierarchy of the Choices virtual memory management system. The software and hardware mechanisms and policies of a virtual memory system implement a memory hierarchy that exploits the trade-off between response times and storage capacities. In Choices, the notion of a memory hierarchy is captured by abstract classes. Concrete subclasses of those abstractions implement a virtual address space, segmentation, paging, physical memory management, secondary storage, and remote (that is, networked) storage. Captured in the notion of a memory hierarchy are classes that represent memory objects. These classes provide a storage mechanism that contains encapsulated data and have methods to read or write the memory object. Each of these classes provides specializations to represent the memory hierarchy.
C-MOS array design techniques: SUMC multiprocessor system study
NASA Technical Reports Server (NTRS)
Clapp, W. A.; Helbig, W. A.; Merriam, A. S.
1972-01-01
The current capabilities of LSI techniques for speed and reliability, plus the possibilities of assembling large configurations of LSI logic and storage elements, have demanded the study of multiprocessors and multiprocessing techniques, problems, and potentialities. Evaluated are three previous systems studies for a space ultrareliable modular computer multiprocessing system, and a new multiprocessing system is proposed that is flexibly configured with up to four central processors, four I/O processors, and 16 main memory units, plus auxiliary memory and peripheral devices. This multiprocessor system features a multilevel interrupt, qualified S/360 compatibility for ground-based generation of programs, virtual memory management of a storage hierarchy through I/O processors, and multiport access to multiple and shared memory units.
Efficient partitioning and assignment on programs for multiprocessor execution
NASA Technical Reports Server (NTRS)
Standley, Hilda M.
1993-01-01
The general problem studied is that of segmenting or partitioning programs for distribution across a multiprocessor system. Efficient partitioning and the assignment of program elements are of great importance since the time consumed in this overhead activity may easily dominate the computation, effectively eliminating any gains made by the use of the parallelism. In this study, the partitioning of sequentially structured programs (written in FORTRAN) is evaluated. Heuristics developed for similar applications are examined. Finally, a model for queueing networks with finite queues is developed which may be used to analyze multiprocessor system architectures with a shared memory approach to the problem of partitioning. The properties of sequentially written programs form obstacles to large scale (at the procedure or subroutine level) parallelization. Data dependencies of even the minutest nature, reflecting the sequential development of the program, severely limit parallelism. The design of heuristic algorithms is tied to the experience gained in the parallel splitting. Parallelism obtained through the physical separation of data has seen some success, especially at the data element level. Data parallelism on a grander scale requires models that accurately reflect the effects of blocking caused by finite queues. A model for the approximation of the performance of finite queueing networks is developed. This model makes use of the decomposition approach combined with the efficiency of product form solutions.
A model of attention-guided visual perception and recognition.
Rybak, I A; Gusakova, V I; Golovan, A V; Podladchikova, L N; Shevtsova, N A
1998-08-01
A model of visual perception and recognition is described. The model contains: (i) a low-level subsystem which performs both a fovea-like transformation and detection of primary features (edges), and (ii) a high-level subsystem which includes separated 'what' (sensory memory) and 'where' (motor memory) structures. Image recognition occurs during the execution of a 'behavioral recognition program' formed during the primary viewing of the image. The recognition program contains both programmed attention window movements (stored in the motor memory) and predicted image fragments (stored in the sensory memory) for each consecutive fixation. The model shows the ability to recognize complex images (e.g. faces) invariantly with respect to shift, rotation and scale.
Cache-based error recovery for shared memory multiprocessor systems
NASA Technical Reports Server (NTRS)
Wu, Kun-Lung; Fuchs, W. Kent; Patel, Janak H.
1989-01-01
A multiprocessor cache-based checkpointing and recovery scheme for recovering from transient processor errors in a shared-memory multiprocessor with private caches is presented. New implementation techniques that use checkpoint identifiers and recovery stacks to reduce performance degradation in processor utilization during normal execution are examined. This cache-based checkpointing technique prevents rollback propagation, provides for rapid recovery, and can be integrated into standard cache coherence protocols. An analytical model is used to estimate the relative performance of the scheme during normal execution. Extensions that take error latency into account are presented.
NASA Technical Reports Server (NTRS)
Lawson, Gary; Sosonkina, Masha; Baurle, Robert; Hammond, Dana
2017-01-01
In many fields, real-world applications for High Performance Computing have already been developed. For these applications to stay up-to-date, new parallel strategies must be explored to yield the best performance; however, restructuring or modifying a real-world application may be daunting depending on the size of the code. In this case, a mini-app may be employed to quickly explore such options without modifying the entire code. In this work, several mini-apps have been created to enhance the performance of a real-world application, namely the VULCAN code for complex flow analysis developed at the NASA Langley Research Center. These mini-apps explore hybrid parallel programming paradigms with Message Passing Interface (MPI) for distributed memory access and either Shared MPI (SMPI) or OpenMP for shared memory accesses. Performance testing shows that MPI+SMPI yields the best execution performance, while requiring the largest number of code changes. A maximum speedup of 23 was measured for MPI+SMPI, but only 11 was measured for MPI+OpenMP.
Partitioning problems in parallel, pipelined and distributed computing
NASA Technical Reports Server (NTRS)
Bokhari, S.
1985-01-01
The problem of optimally assigning the modules of a parallel program over the processors of a multiple computer system is addressed. A Sum-Bottleneck path algorithm is developed that permits the efficient solution of many variants of this problem under some constraints on the structure of the partitions. In particular, the following problems are solved optimally for a single-host, multiple satellite system: partitioning multiple chain structured parallel programs, multiple arbitrarily structured serial programs and single tree structured parallel programs. In addition, the problems of partitioning chain structured parallel programs across chain connected systems and across shared memory (or shared bus) systems are also solved under certain constraints. All solutions for parallel programs are equally applicable to pipelined programs. These results extend prior research in this area by explicitly taking concurrency into account and permit the efficient utilization of multiple computer architectures for a wide range of problems of practical interest.
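To make the chain-partitioning problem concrete, here is a deliberately simplified sketch, not the Sum-Bottleneck path algorithm of the abstract. It assumes a single chain of modules with positive integer execution costs, ignores communication costs entirely, and asks for the split into at most `p` contiguous blocks that minimizes the load of the most heavily loaded processor. The function name `min_bottleneck` and the cost list are hypothetical.

```python
def min_bottleneck(weights, p):
    """Smallest achievable maximum block load when a chain of positive
    integer module costs is cut into at most p contiguous blocks."""

    def blocks_needed(limit):
        # Greedy scan: start a new block whenever the limit would be exceeded.
        count, load = 1, 0
        for w in weights:
            if load + w > limit:
                count += 1
                load = w
            else:
                load += w
        return count

    # The answer lies between the heaviest module and the whole chain.
    lo, hi = max(weights), sum(weights)
    while lo < hi:
        mid = (lo + hi) // 2
        if blocks_needed(mid) <= p:
            hi = mid          # feasible: try a tighter bottleneck
        else:
            lo = mid + 1      # infeasible: bottleneck must be larger
    return lo
```

For example, `min_bottleneck([4, 3, 2, 6, 1, 5], 3)` searches for the best three-way split of a six-module chain. The binary search works because feasibility is monotone in the load limit; accounting for communication between adjacent blocks, as the paper does, requires a different formulation.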
Reducing Interprocessor Dependence in Recoverable Distributed Shared Memory
NASA Technical Reports Server (NTRS)
Janssens, Bob; Fuchs, W. Kent
1994-01-01
Checkpointing techniques in parallel systems use dependency tracking and/or message logging to ensure that a system rolls back to a consistent state. Traditional dependency tracking in distributed shared memory (DSM) systems is expensive because of high communication frequency. In this paper we show that, if designed correctly, a DSM system only needs to consider dependencies due to the transfer of blocks of data, resulting in reduced dependency tracking overhead and reduced potential for rollback propagation. We develop an ownership timestamp scheme to tolerate the loss of block state information and develop a passive server model of execution where interactions between processors are considered atomic. With our scheme, dependencies are significantly reduced compared to the traditional message-passing model.
What we remember affects how we see: spatial working memory steers saccade programming.
Wong, Jason H; Peterson, Matthew S
2013-02-01
Relationships between visual attention, saccade programming, and visual working memory have been hypothesized for over a decade. Awh, Jonides, and Reuter-Lorenz (Journal of Experimental Psychology: Human Perception and Performance 24(3):780-90, 1998) and Awh et al. (Psychological Science 10(5):433-437, 1999) proposed that rehearsing a location in memory also leads to enhanced attentional processing at that location. In regard to eye movements, Belopolsky and Theeuwes (Attention, Perception & Psychophysics 71(3):620-631, 2009) found that holding a location in working memory affects saccade programming, albeit negatively. In three experiments, we attempted to replicate the findings of Belopolsky and Theeuwes (Attention, Perception & Psychophysics 71(3):620-631, 2009) and determine whether the spatial memory effect can occur in other saccade-cuing paradigms, including endogenous central arrow cues and exogenous irrelevant singletons. In the first experiment, our results were the opposite of those in Belopolsky and Theeuwes (Attention, Perception & Psychophysics 71(3):620-631, 2009), in that we found facilitation (shorter saccade latencies) instead of inhibition when the saccade target matched the region in spatial working memory. In Experiment 2, we sought to determine whether the spatial working memory effect would generalize to other endogenous cuing tasks, such as a central arrow that pointed to one of six possible peripheral locations. As in Experiment 1, we found that saccade programming was facilitated when the cued location coincided with the saccade target. In Experiment 3, we explored how spatial memory interacts with other types of cues, such as a peripheral color singleton target or irrelevant onset. In both cases, the eyes were more likely to go to either singleton when it coincided with the location held in spatial working memory. 
On the basis of these results, we conclude that spatial working memory and saccade programming are likely to share common overlapping circuitry.
The OpenMP Implementation of NAS Parallel Benchmarks and its Performance
NASA Technical Reports Server (NTRS)
Jin, Hao-Qiang; Frumkin, Michael; Yan, Jerry
1999-01-01
As the new ccNUMA architecture became popular in recent years, parallel programming with compiler directives on these machines has evolved to accommodate new needs. In this study, we examine the effectiveness of OpenMP directives for parallelizing the NAS Parallel Benchmarks. Implementation details will be discussed and performance will be compared with the MPI implementation. We have demonstrated that OpenMP can achieve very good results for parallelization on a shared memory system, but effective use of memory and cache is very important.
Hypercluster Parallel Processor
NASA Technical Reports Server (NTRS)
Blech, Richard A.; Cole, Gary L.; Milner, Edward J.; Quealy, Angela
1992-01-01
Hypercluster computer system includes multiple digital processors, operation of which coordinated through specialized software. Configurable according to various parallel-computing architectures of shared-memory or distributed-memory class, including scalar computer, vector computer, reduced-instruction-set computer, and complex-instruction-set computer. Designed as flexible, relatively inexpensive system that provides single programming and operating environment within which one can investigate effects of various parallel-computing architectures and combinations on performance in solution of complicated problems like those of three-dimensional flows in turbomachines. Hypercluster software and architectural concepts are in public domain.
A Parallel Saturation Algorithm on Shared Memory Architectures
NASA Technical Reports Server (NTRS)
Ezekiel, Jonathan; Siminiceanu
2007-01-01
Symbolic state-space generators are notoriously hard to parallelize. However, the Saturation algorithm implemented in the SMART verification tool differs from other sequential symbolic state-space generators in that it exploits the locality of firing events in asynchronous system models. This paper explores whether event locality can be utilized to efficiently parallelize Saturation on shared-memory architectures. Conceptually, we propose to parallelize the firing of events within a decision diagram node, which is technically realized via a thread pool. We discuss the challenges involved in our parallel design and conduct experimental studies on its prototypical implementation. On a dual-processor dual core PC, our studies show speed-ups for several example models, e.g., of up to 50% for a Kanban model, when compared to running our algorithm only on a single core.
Implementing Shared Memory Parallelism in MCBEND
NASA Astrophysics Data System (ADS)
Bird, Adam; Long, David; Dobson, Geoff
2017-09-01
MCBEND is a general purpose radiation transport Monte Carlo code from AMEC Foster Wheeler's ANSWERS® Software Service. MCBEND is well established in the UK shielding community for radiation shielding and dosimetry assessments. The existing MCBEND parallel capability effectively involves running the same calculation on many processors. This works very well except when the memory requirements of a model restrict the number of instances of a calculation that will fit on a machine. To more effectively utilise parallel hardware OpenMP has been used to implement shared memory parallelism in MCBEND. This paper describes the reasoning behind the choice of OpenMP, notes some of the challenges of multi-threading an established code such as MCBEND and assesses the performance of the parallel method implemented in MCBEND.
A revised limbic system model for memory, emotion and behaviour.
Catani, Marco; Dell'acqua, Flavio; Thiebaut de Schotten, Michel
2013-09-01
Emotion, memories and behaviour emerge from the coordinated activities of regions connected by the limbic system. Here, we propose an update of the limbic model based on the seminal work of Papez, Yakovlev and MacLean. In the revised model we identify three distinct but partially overlapping networks: (i) the hippocampal-diencephalic and parahippocampal-retrosplenial network dedicated to memory and spatial orientation; (ii) the temporo-amygdala-orbitofrontal network for the integration of visceral sensation and emotion with semantic memory and behaviour; (iii) the default-mode network involved in autobiographical memories and introspective self-directed thinking. The three networks share cortical nodes that are emerging as principal hubs in connectomic analysis. This revised network model of the limbic system reconciles recent functional imaging findings with anatomical accounts of clinical disorders commonly associated with limbic pathology. Copyright © 2013 Elsevier Ltd. All rights reserved.
Austin, John R
2003-10-01
Previous research on transactive memory has found a positive relationship between transactive memory system development and group performance in single project laboratory and ad hoc groups. Closely related research on shared mental models and expertise recognition supports these findings. In this study, the author examined the relationship between transactive memory systems and performance in mature, continuing groups. A group's transactive memory system, measured as a combination of knowledge stock, knowledge specialization, transactive memory consensus, and transactive memory accuracy, is positively related to group goal performance, external group evaluations, and internal group evaluations. The positive relationship with group performance was found to hold for both task and external relationship transactive memory systems.
Working Memory in Children: A Time-Constrained Functioning Similar to Adults
ERIC Educational Resources Information Center
Portrat, Sophie; Camos, Valerie; Barrouillet, Pierre
2009-01-01
Within the time-based resource-sharing (TBRS) model, we tested a new conception of the relationships between processing and storage in which the core mechanisms of working memory (WM) are time constrained. However, our previous studies were restricted to adults. The current study aimed at demonstrating that these mechanisms are present and…
NASA Technical Reports Server (NTRS)
Stehle, Roy H.; Ogier, Richard G.
1993-01-01
Alternatives for realizing a packet-based network switch for use on a frequency division multiple access/time division multiplexed (FDMA/TDM) geostationary communication satellite were investigated. Each of the eight downlink beams supports eight directed dwells. The design needed to accommodate multicast packets with very low probability of loss due to contention. Three switch architectures were designed and analyzed. An output-queued, shared bus system yielded a functionally simple system, utilizing a first-in, first-out (FIFO) memory per downlink dwell, but at the expense of a large total memory requirement. A shared memory architecture offered the most efficiency in memory requirements, requiring about half the memory of the shared bus design. The processing requirement for the shared-memory system adds system complexity that may offset the benefits of the smaller memory. An alternative design using a shared memory buffer per downlink beam decreases circuit complexity through a distributed design, and requires at most 1000 packets of memory more than the completely shared memory design. Modifications to the basic packet switch designs were proposed to accommodate circuit-switched traffic, which must be served on a periodic basis with minimal delay. Methods for dynamically controlling the downlink dwell lengths were developed and analyzed. These methods adapt quickly to changing traffic demands, and do not add significant complexity or cost to the satellite and ground station designs. Methods for reducing the memory requirement by not requiring the satellite to store full packets were also proposed and analyzed. In addition, optimal packet and dwell lengths were computed as functions of memory size for the three switch architectures.
A message passing kernel for the hypercluster parallel processing test bed
NASA Technical Reports Server (NTRS)
Blech, Richard A.; Quealy, Angela; Cole, Gary L.
1989-01-01
A Message-Passing Kernel (MPK) for the Hypercluster parallel-processing test bed is described. The Hypercluster is being developed at the NASA Lewis Research Center to support investigations of parallel algorithms and architectures for computational fluid and structural mechanics applications. The Hypercluster resembles the hypercube architecture except that each node consists of multiple processors communicating through shared memory. The MPK efficiently routes information through the Hypercluster, using a message-passing protocol when necessary and faster shared-memory communication whenever possible. The MPK also interfaces all of the processors with the Hypercluster operating system (HYCLOPS), which runs on a Front-End Processor (FEP). This approach distributes many of the I/O tasks to the Hypercluster processors and eliminates the need for a separate I/O support program on the FEP.
Proceedings of the second SISAL users' conference
DOE Office of Scientific and Technical Information (OSTI.GOV)
Feo, J T; Frerking, C; Miller, P J
1992-12-01
This report contains papers on the following topics: a SISAL code for computing the Fourier transform on S_N; five ways to fill your knapsack; simulating material dislocation motion in SISAL; Candis as an interface for SISAL; parallelisation and performance of the Burg algorithm on a shared-memory multiprocessor; use of genetic algorithm in SISAL to solve the file design problem; implementing FFTs in SISAL; programming and evaluating the performance of signal processing applications in the SISAL programming environment; SISAL and von Neumann-based languages: translation and intercommunication; an IF2 code generator for ADAM architecture; program partitioning for NUMA multiprocessor computer systems; mapping functional parallelism on distributed memory machines; implicit array copying: prevention is better than cure; mathematical syntax for SISAL; an approach for optimizing recursive functions; implementing arrays in SISAL 2.0; Fol: an object oriented extension to the SISAL language; Twine: a portable, extensible SISAL execution kernel; and investigating the memory performance of the optimizing SISAL compiler.
Kim, Dong H; Lloyd, Christopher; Fernandez, Douglas K; Spielman, Amanda; Bradshaw, David
2017-04-01
The passage of the Affordable Care Act saw the creation of Accountable Care Organizations (ACOs), a new approach to healthcare delivery moving from fee-for-service toward population health. This paper presents a case study of the Memorial Hermann ACO (MHACO), launched in response to the Medicare Shared Savings Program, with goals to align physician and hospital incentives, practice evidence-based medicine, develop care coordination, and increase efficiency. Building blocks included an affiliated primary care network, a clinical integration program (involving shared electronic medical record platforms and quality data reporting), and significant investments in information technology. Presented is the approach taken to form MHACO; the management structure, technology developed, and a 2-year experience. Incorporated in July 2012, the MHACO involved 22 000 Medicare patients. In 2015, Centers for Medicare and Medicaid Services released data showing a composite quality score between 80 and 85 (from a maximum 100) and nearly $53 million in total savings (or 11% of expected expenditure), making MHACO one of the most successful nationally. In fewer than 5 years, almost 500 ACOs have developed, and by some estimates, a quarter of Medicare patients are currently enrolled in an ACO. Although ACOs to date have focused on primary care, the future will increasingly involve specialists. At Memorial Hermann, neurosurgeons took an early role in forming collaborative partnerships with the hospital, and started programs that served as precursors to the ACO model. This paper ends with an overview of ACO development, likely changes going forward, and a discussion of the role of specialists in general, and of neurosurgeons in particular. Copyright © 2016 by the Congress of Neurological Surgeons.
High-performance computing — an overview
NASA Astrophysics Data System (ADS)
Marksteiner, Peter
1996-08-01
An overview of high-performance computing (HPC) is given. Different types of computer architectures used in HPC are discussed: vector supercomputers, high-performance RISC processors, various parallel computers like symmetric multiprocessors, workstation clusters, massively parallel processors. Software tools and programming techniques used in HPC are reviewed: vectorizing compilers, optimization and vector tuning, optimization for RISC processors; parallel programming techniques like shared-memory parallelism, message passing and data parallelism; and numerical libraries.
NASA Technical Reports Server (NTRS)
Jordan, Harry F.; Benten, Muhammad S.; Arenstorf, Norbert S.; Ramanan, Aruna V.
1987-01-01
A methodology for writing parallel programs for shared memory multiprocessors has been formalized as an extension to the Fortran language and implemented as a macro preprocessor. The extended language is known as the Force, and this manual describes how to write Force programs and execute them on the Flexible Computer Corporation Flex/32, the Encore Multimax and the Sequent Balance computers. The parallel extension macros are described in detail, but knowledge of Fortran is assumed.
Parallel discrete event simulation: A shared memory approach
NASA Technical Reports Server (NTRS)
Reed, Daniel A.; Malony, Allen D.; Mccredie, Bradley D.
1987-01-01
With traditional event list techniques, evaluating a detailed discrete event simulation model can often require hours or even days of computation time. Parallel simulation mimics the interacting servers and queues of a real system by assigning each simulated entity to a processor. By eliminating the event list and maintaining only sufficient synchronization to insure causality, parallel simulation can potentially provide speedups that are linear in the number of processors. A set of shared memory experiments is presented using the Chandy-Misra distributed simulation algorithm to simulate networks of queues. Parameters include queueing network topology and routing probabilities, number of processors, and assignment of network nodes to processors. These experiments show that Chandy-Misra distributed simulation is a questionable alternative to sequential simulation of most queueing network models.
Fast quantum Monte Carlo on a GPU
NASA Astrophysics Data System (ADS)
Lutsyshyn, Y.
2015-02-01
We present a scheme for the parallelization of quantum Monte Carlo method on graphical processing units, focusing on variational Monte Carlo simulation of bosonic systems. We use asynchronous execution schemes with shared memory persistence, and obtain an excellent utilization of the accelerator. The CUDA code is provided along with a package that simulates liquid helium-4. The program was benchmarked on several models of Nvidia GPU, including Fermi GTX560 and M2090, and the Kepler architecture K20 GPU. Special optimization was developed for the Kepler cards, including placement of data structures in the register space of the Kepler GPUs. Kepler-specific optimization is discussed.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ibrahim, Khaled Z.; Epifanovsky, Evgeny; Williams, Samuel W.
Coupled-cluster methods provide highly accurate models of molecular structure by explicit numerical calculation of tensors representing the correlation between electrons. These calculations are dominated by a sequence of tensor contractions, motivating the development of numerical libraries for such operations. While based on matrix-matrix multiplication, these libraries are specialized to exploit symmetries in the molecular structure and in electronic interactions, and thus reduce the size of the tensor representation and the complexity of contractions. The resulting algorithms are irregular and their parallelization has been previously achieved via the use of dynamic scheduling or specialized data decompositions. We introduce our efforts to extend the Libtensor framework to work in the distributed memory environment in a scalable and energy efficient manner. We achieve up to 240× speedup compared with the best optimized shared memory implementation. We attain scalability to hundreds of thousands of compute cores on three distributed-memory architectures (Cray XC30 & XC40, BlueGene/Q), and on a heterogeneous GPU-CPU system (Cray XK7). As the bottlenecks shift from being compute-bound DGEMMs to communication-bound collectives as the size of the molecular system scales, we adopt two radically different parallelization approaches for handling load-imbalance. Nevertheless, we preserve a unified interface to both programming models to maintain the productivity of computational quantum chemists.
NASA Astrophysics Data System (ADS)
Lai, Siyan; Xu, Ying; Shao, Bo; Guo, Menghan; Lin, Xiaola
2017-04-01
In this paper we study a Monte Carlo method for solving systems of linear algebraic equations (SLAE) based on shared memory. Former research demonstrated that GPUs can effectively speed up this computation. Our purpose is to optimize the Monte Carlo simulation specifically for the GPU memory architecture. Random numbers are organized for storage in shared memory, which aims to accelerate the parallel algorithm. Bank conflicts can be avoided by our Collaborative Thread Arrays (CTA) scheme. The experimental results show that the shared-memory-based strategy can speed up the computations by a factor of more than 3 in the best case.
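As a point of reference for the abstract above, the following is a minimal serial sketch of one classical random-walk (Neumann-Ulam style) estimator for a system written in the fixed-point form x = Hx + b. The abstract does not say which Monte Carlo variant or data layout the authors use, so the function name `mc_solve_component`, the transition probabilities, and the fixed walk length are all assumptions made for illustration; nothing here models GPU shared memory or bank conflicts.

```python
import random


def mc_solve_component(H, b, i, n_walks=5000, walk_len=20, seed=0):
    """Estimate x[i] for x = H x + b by averaging random-walk estimates.

    Assumes the Neumann series sum_k H^k b converges (for example, every
    absolute row sum of H is below 1). Hypothetical helper, not the paper's code.
    """
    rng = random.Random(seed)
    n = len(b)
    total = 0.0
    for _ in range(n_walks):
        state, weight, estimate = i, 1.0, b[i]
        for _ in range(walk_len):
            row = H[state]
            abs_row = [abs(h) for h in row]
            row_sum = sum(abs_row)
            if row_sum == 0.0:
                break  # no outgoing transitions: the walk ends here
            # Move to state j with probability |h_ij| / row_sum.
            nxt = rng.choices(range(n), weights=abs_row)[0]
            # Importance weight h_ij / p_ij equals sign(h_ij) * row_sum.
            weight *= row_sum if row[nxt] > 0 else -row_sum
            state = nxt
            estimate += weight * b[state]
        total += estimate
    return total / n_walks


if __name__ == "__main__":
    H = [[0.1, 0.2], [0.3, 0.1]]
    b = [1.0, 2.0]
    # Exact solution of (I - H) x = b is x = (1.3/0.75, 2.1/0.75).
    print([mc_solve_component(H, b, i) for i in range(2)])
```

Each walk is independent of the others, which is the property that makes this family of methods attractive for massively parallel hardware.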
The Impact of Storage on Processing: How Is Information Maintained in Working Memory?
ERIC Educational Resources Information Center
Vergauwe, Evie; Camos, Valérie; Barrouillet, Pierre
2014-01-01
Working memory is typically defined as a system devoted to the simultaneous maintenance and processing of information. However, the interplay between these 2 functions is still a matter of debate in the literature, with views ranging from complete independence to complete dependence. The time-based resource-sharing model assumes that a central…
The CA3 Network as a Memory Store for Spatial Representations
ERIC Educational Resources Information Center
Papp, Gergely; Witter, Menno P.; Treves, Alessandro
2007-01-01
Comparative neuroanatomy suggests that the CA3 region of the mammalian hippocampus is directly homologous with the medio-dorsal pallium in birds and reptiles, with which it largely shares the basic organization of primitive cortex. Autoassociative memory models, which are generically applicable to cortical networks, then help assess how well CA3…
Practical Formal Verification of MPI and Thread Programs
NASA Astrophysics Data System (ADS)
Gopalakrishnan, Ganesh; Kirby, Robert M.
Large-scale simulation codes in science and engineering are written using the Message Passing Interface (MPI). Shared memory threads are widely used directly, or to implement higher level programming abstractions. Traditional debugging methods for MPI or thread programs are incapable of providing useful formal guarantees about coverage. They get bogged down in the sheer number of interleavings (schedules), often missing shallow bugs. In this tutorial we will introduce two practical formal verification tools: ISP (for MPI C programs) and Inspect (for Pthread C programs). Unlike other formal verification tools, ISP and Inspect run directly on user source codes (much like a debugger). They pursue only the relevant set of process interleavings, using our own customized Dynamic Partial Order Reduction algorithms. For a given test harness, DPOR allows these tools to guarantee the absence of deadlocks, instrumented MPI object leaks and communication races (using ISP), and shared memory races (using Inspect). ISP and Inspect have been used to verify large pieces of code: in excess of 10,000 lines of MPI/C for ISP in under 5 seconds, and about 5,000 lines of Pthread/C code in a few hours (and much faster with the use of a cluster or by exploiting special cases such as symmetry) for Inspect. We will also demonstrate the Microsoft Visual Studio and Eclipse Parallel Tools Platform integrations of ISP (these will be available on the LiveCD).
CaLRS: A Critical-Aware Shared LLC Request Scheduling Algorithm on GPGPU
Ma, Jianliang; Meng, Jinglei; Chen, Tianzhou; Wu, Minghui
2015-01-01
Ultra-high thread-level parallelism in modern GPUs usually produces numerous simultaneous memory requests, so there are always plenty of memory requests waiting at each bank of the shared LLC (L2 in this paper) and global memory. For global memory, various schedulers have already been developed to adjust the request sequence, but we find that little work has focused on the service sequence at the shared LLC. We measured that in a large number of GPU applications requests always queue at the LLC banks for service, which provides an opportunity to optimize the service order at the LLC. By adjusting the service order of GPU memory requests, we can improve the schedulability of the SM. We therefore propose a critical-aware shared LLC request scheduling algorithm (CaLRS) in this paper. How the priority of a memory request is represented is critical for CaLRS. We use the number of memory requests that originate from the same warp but have not been serviced when they arrive at the shared LLC bank to represent the criticality of each warp. Experiments show that the proposed scheme can effectively boost SM schedulability by promoting the scheduling priority of memory requests with high criticality, and thereby indirectly improves GPU performance. PMID:25729772
Parallel 3D-TLM algorithm for simulation of the Earth-ionosphere cavity
NASA Astrophysics Data System (ADS)
Toledo-Redondo, Sergio; Salinas, Alfonso; Morente-Molinera, Juan Antonio; Méndez, Antonio; Fornieles, Jesús; Portí, Jorge; Morente, Juan Antonio
2013-03-01
A parallel 3D algorithm for solving time-domain electromagnetic problems with arbitrary geometries is presented. The technique employed is the Transmission Line Modeling (TLM) method implemented in Shared Memory (SM) environments. The benchmarking performed reveals that the maximum speedup depends on the memory size of the problem as well as multiple hardware factors, like the disposition of CPUs, cache, or memory. A maximum speedup of 15 has been measured for the largest problem. In certain circumstances of low memory requirements, superlinear speedup is achieved using our algorithm. The model is employed to model the Earth-ionosphere cavity, thus enabling a study of the natural electromagnetic phenomena that occur in it. The algorithm allows complete 3D simulations of the cavity with a resolution of 10 km, within a reasonable timescale.
RACER: Effective Race Detection Using AspectJ
NASA Technical Reports Server (NTRS)
Bodden, Eric; Havelund, Klaus
2008-01-01
Programming errors occur frequently in large software systems, and even more so if these systems are concurrent. In the past, researchers have developed specialized programs to aid programmers in detecting concurrent programming errors such as deadlocks, livelocks, starvation and data races. In this work we propose a language extension to the aspect-oriented programming language AspectJ, in the form of three new built-in pointcuts, lock(), unlock() and maybeShared(), which allow programmers to monitor program events where locks are granted or handed back, and where values are accessed that may be shared amongst multiple Java threads. We decide thread-locality using a static thread-local objects analysis developed by others. Using the three new primitive pointcuts, researchers can directly implement efficient monitoring algorithms to detect concurrent programming errors online. As an example, we present a new algorithm which we call RACER, an adaptation of the well-known ERASER algorithm to the memory model of Java. We implemented the new pointcuts as an extension to the AspectBench Compiler, implemented the RACER algorithm using this language extension and then applied the algorithm to the NASA K9 Rover Executive. Our experiments proved our implementation very effective. In the Rover Executive RACER finds 70 data races. Only one of these races was previously known. We further applied the algorithm to two other multi-threaded programs written by Computer Science researchers, in which we found races as well.
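The lockset idea behind ERASER-style race detection, which the abstract above builds on, can be sketched as follows. This is a simplified illustration, not RACER and not the AspectJ pointcuts: the class `LocksetMonitor` and its methods are hypothetical, events are fed in by hand rather than intercepted, and the full ERASER state machine is replaced by a plain "more than one thread has touched the variable" check.

```python
class LocksetMonitor:
    """Simplified lockset-refinement monitor (illustrative only)."""

    def __init__(self):
        self.held = {}        # thread id -> set of locks currently held
        self.candidates = {}  # variable -> locks held on every access so far
        self.accessors = {}   # variable -> set of threads that accessed it
        self.races = []       # variables reported as potentially racy

    def lock(self, thread, lock_name):
        self.held.setdefault(thread, set()).add(lock_name)

    def unlock(self, thread, lock_name):
        self.held.setdefault(thread, set()).discard(lock_name)

    def access(self, thread, var):
        held = self.held.get(thread, set())
        if var not in self.candidates:
            # First access: every lock held now is a candidate guard.
            self.candidates[var] = set(held)
        else:
            # Later accesses: keep only locks held on all accesses.
            self.candidates[var] &= held
        self.accessors.setdefault(var, set()).add(thread)
        unguarded = not self.candidates[var]
        shared = len(self.accessors[var]) > 1
        if unguarded and shared and var not in self.races:
            self.races.append(var)


if __name__ == "__main__":
    m = LocksetMonitor()
    # x is always accessed under lock "L"; y is accessed with no lock held.
    for t in ("t1", "t2"):
        m.lock(t, "L")
        m.access(t, "x")
        m.unlock(t, "L")
        m.access(t, "y")
    print(m.races)
```

A variable is reported once no single lock has been held across all of its accesses and at least two threads have touched it, so consistently guarded variables such as `x` stay silent.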
MOIL-opt: Energy-Conserving Molecular Dynamics on a GPU/CPU system
Ruymgaart, A. Peter; Cardenas, Alfredo E.; Elber, Ron
2011-01-01
We report an optimized version of the molecular dynamics program MOIL that runs on a shared memory system with OpenMP and exploits the power of a Graphics Processing Unit (GPU). The model is a heterogeneous computing system on a single node with several cores sharing the same memory and a GPU. This is a typical laboratory tool, which provides excellent performance at minimal cost. Besides performance, emphasis is placed on the accuracy and stability of the algorithm, probed by energy conservation for explicit-solvent, atomically detailed models. Especially for long simulations, energy conservation is critical due to the phenomenon known as “energy drift”, in which energy errors accumulate linearly as a function of simulation time. To achieve long-time dynamics with acceptable accuracy the drift must be particularly small. We identify several means of controlling long-time numerical accuracy while maintaining excellent speedup. To maintain a high level of energy conservation, SHAKE and the Ewald reciprocal summation are run in double precision. Double precision summation of real-space non-bonded interactions improves energy conservation. In our best option, the energy drift using 1 fs for a time step while constraining the distances of all bonds is undetectable in a 10 ns simulation of solvated DHFR (dihydrofolate reductase). Faster options, shaking only bonds with hydrogen atoms, are also very well behaved and have drifts of less than 1 kcal/mol per nanosecond for the same system. CPU/GPU implementations require changes in programming models. We consider the use of a list of neighbors and quadratic versus linear interpolation in lookup tables of different sizes. Quadratic interpolation with a smaller number of grid points is faster than linear lookup tables (with finer representation) without loss of accuracy. Atomic neighbor lists were found most efficient. Typical speedups are about a factor of 10 compared to a single-core single-precision code. PMID:22328867
Wang, Qi; Lee, Dasom; Hou, Yubo
2017-07-01
Internet technology provides a new means of recalling and sharing personal memories in the digital age. What is the mnemonic consequence of posting personal memories online? Theories of transactive memory and autobiographical memory would make contrasting predictions. In the present study, college students completed a daily diary for a week, listing at the end of each day all the events that happened to them on that day. They also reported whether they posted any of the events online. Participants received a surprise memory test after the completion of the diary recording and then another test a week later. At both tests, events posted online were significantly more likely than those not posted online to be recalled. It appears that sharing memories online may provide unique opportunities for rehearsal and meaning-making that facilitate memory retention.
Parallel Computing for Probabilistic Response Analysis of High Temperature Composites
NASA Technical Reports Server (NTRS)
Sues, R. H.; Lua, Y. J.; Smith, M. D.
1994-01-01
The objective of this Phase I research was to establish the required software and hardware strategies to achieve large scale parallelism in solving PCM problems. To meet this objective, several investigations were conducted. First, we identified the multiple levels of parallelism in PCM and the computational strategies to exploit these parallelisms. Next, several software and hardware efficiency investigations were conducted. These involved the use of three different parallel programming paradigms and solution of two example problems on both a shared-memory multiprocessor and a distributed-memory network of workstations.
A simple GPU-accelerated two-dimensional MUSCL-Hancock solver for ideal magnetohydrodynamics
NASA Astrophysics Data System (ADS)
Bard, Christopher M.; Dorelli, John C.
2014-02-01
We describe our experience using NVIDIA's CUDA (Compute Unified Device Architecture) C programming environment to implement a two-dimensional second-order MUSCL-Hancock ideal magnetohydrodynamics (MHD) solver on a GTX 480 Graphics Processing Unit (GPU). Taking a simple approach in which the MHD variables are stored exclusively in the global memory of the GTX 480 and accessed in a cache-friendly manner (without further optimizing memory access by, for example, staging data in the GPU's faster shared memory), we achieved a maximum speed-up of ≈126 for a 1024² grid relative to the sequential C code running on a single Intel Nehalem (2.8 GHz) core. This speedup is consistent with simple estimates based on the known floating point performance, memory throughput and parallel processing capacity of the GTX 480.
Optics Program Modified for Multithreaded Parallel Computing
NASA Technical Reports Server (NTRS)
Lou, John; Bedding, Dave; Basinger, Scott
2006-01-01
A powerful high-performance computer program for simulating and analyzing adaptive and controlled optical systems has been developed by modifying the serial version of the Modeling and Analysis for Controlled Optical Systems (MACOS) program to impart capabilities for multithreaded parallel processing on computing systems ranging from supercomputers down to Symmetric Multiprocessing (SMP) personal computers. The modifications included the incorporation of OpenMP, a portable and widely supported application programming interface that can be used to explicitly add multithreaded parallelism to an application program under a shared-memory programming model. OpenMP was applied to parallelize ray-tracing calculations, one of the major computing components in MACOS. Multithreading is also used in the diffraction propagation of light in MACOS, based on pthreads (POSIX Threads, where POSIX stands for Portable Operating System Interface). In tests of the parallelized version of MACOS, the speedup in ray-tracing calculations was found to be linear, or proportional to the number of processors, while the speedup in diffraction calculations ranged from 50 to 60 percent, depending on the type and number of processors. The parallelized version of MACOS is portable, and, to the user, its interface is basically the same as that of the original serial version of MACOS.
In Remembrance: September 11, 2001
ERIC Educational Resources Information Center
Haeseler, Martha P.
2002-01-01
In this article, the author shares her experience of being part of the creation of a memorial mosaic dedicated to those who had died on September 11, 2001. Working with veterans at a long-term outpatient program within a Veterans Administration (VA) Mental Hygiene Clinic, she found that the physical process of constructing something from…
Execution time support for scientific programs on distributed memory machines
NASA Technical Reports Server (NTRS)
Berryman, Harry; Saltz, Joel; Scroggs, Jeffrey
1990-01-01
Optimizations are considered that are required for efficient execution of code segments that consist of loops over distributed data structures. The PARTI (Parallel Automated Runtime Toolkit at ICASE) execution time primitives are designed to carry out these optimizations and can be used to implement a wide range of scientific algorithms on distributed memory machines. These primitives allow the user to control array mappings in a way that gives an appearance of shared memory. Computations can be based on a global index set. Primitives are used to carry out gather and scatter operations on distributed arrays. Communications patterns are derived at runtime, and the appropriate send and receive messages are automatically generated.
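To make the idea of gather and scatter over a global index set concrete, here is a small single-process sketch. It is not the PARTI API: the class `BlockDistributedArray`, its block mapping, and its method names are hypothetical, and the per-owner request lists merely stand in for the messages a real runtime would generate.

```python
class BlockDistributedArray:
    """Array split into contiguous blocks, one block per simulated processor."""

    def __init__(self, values, n_procs):
        n = len(values)
        self.block = -(-n // n_procs)  # ceiling division: elements per block
        self.local = [
            list(values[p * self.block:(p + 1) * self.block])
            for p in range(n_procs)
        ]

    def locate(self, global_index):
        """Translate a global index into (owner processor, local offset)."""
        return divmod(global_index, self.block)

    def gather(self, global_indices):
        """Fetch values by global index, grouping requests by owner first."""
        requests = {}  # owner -> list of (position in result, local offset)
        for pos, g in enumerate(global_indices):
            owner, offset = self.locate(g)
            requests.setdefault(owner, []).append((pos, offset))
        result = [None] * len(global_indices)
        for owner, items in requests.items():  # one "message" per owner
            for pos, offset in items:
                result[pos] = self.local[owner][offset]
        return result

    def scatter(self, global_indices, values):
        """Write values back to their owners by global index."""
        for g, v in zip(global_indices, values):
            owner, offset = self.locate(g)
            self.local[owner][offset] = v


if __name__ == "__main__":
    a = BlockDistributedArray(list(range(10, 20)), n_procs=3)
    print(a.gather([9, 0, 5]))
    a.scatter([5], [99])
    print(a.gather([5]))
```

The caller only ever sees global indices; the translation to an owner and a local offset happens inside the primitives, which is what gives the appearance of shared memory.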
MLP: A Parallel Programming Alternative to MPI for New Shared Memory Parallel Systems
NASA Technical Reports Server (NTRS)
Taft, James R.
1999-01-01
Recent developments at the NASA Ames Research Center's NAS Division have demonstrated that the new generation of NUMA based Symmetric Multi-Processing systems (SMPs), such as the Silicon Graphics Origin 2000, can successfully execute legacy vector oriented CFD production codes at sustained rates far exceeding processing rates possible on dedicated 16 CPU Cray C90 systems. This high level of performance is achieved via shared memory based Multi-Level Parallelism (MLP). This programming approach, developed at NAS and outlined below, is distinct from the message passing paradigm of MPI. It offers parallelism at both the fine and coarse grained level, with communication latencies that are approximately 50-100 times lower than typical MPI implementations on the same platform. Such latency reductions offer the promise of performance scaling to very large CPU counts. The method draws on, but is also distinct from, the newly defined OpenMP specification, which uses compiler directives to support a limited subset of multi-level parallel operations. The NAS MLP method is general, and applicable to a large class of NASA CFD codes.
Hierarchical resilience with lightweight threads.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Wheeler, Kyle Bruce
2011-10-01
This paper proposes a methodology for providing robustness and resilience for a highly threaded distributed- and shared-memory environment based on well-defined inputs and outputs to lightweight tasks. These inputs and outputs form a failure 'barrier', allowing tasks to be restarted or duplicated as necessary. These barriers must be expanded based on task behavior, such as communication between tasks, but do not prohibit any given behavior. One trend in high-performance computing codes seems to be toward self-contained functions that mimic functional programming. Software designers are trending toward a model of software design where their core functions are specified in side-effect free or low-side-effect ways, wherein the inputs and outputs of the functions are well-defined. This provides the ability to copy the inputs to wherever they need to be - whether that's the other side of the PCI bus or the other side of the network - do work on that input using local memory, and then copy the outputs back (as needed). This design pattern is popular among new distributed threading environment designs. Such designs include the Barcelona STARS system, distributed OpenMP systems, the Habanero-C and Habanero-Java systems from Vivek Sarkar at Rice University, the HPX/ParalleX model from LSU, as well as our own Scalable Parallel Runtime effort (SPR) and the Trilinos stateless kernels. This design pattern is also shared by CUDA and several OpenMP extensions for GPU-type accelerators (e.g. the PGI OpenMP extensions).
A robot arm simulation with a shared memory multiprocessor machine
NASA Technical Reports Server (NTRS)
Kim, Sung-Soo; Chuang, Li-Ping
1989-01-01
A parallel processing scheme for a single chain robot arm is presented for high speed computation on a shared memory multiprocessor. A recursive formulation that is derived from a virtual work form of the d'Alembert equations of motion is utilized for robot arm dynamics. A joint drive system that consists of a motor rotor and gears is included in the arm dynamics model, in order to take into account gyroscopic effects due to the spinning of the rotor. The fine grain parallelism of mechanical and control subsystem models is exploited, based on independent computation associated with bodies, joint drive systems, and controllers. Efficiency and effectiveness of the parallel scheme are demonstrated through simulations of a telerobotic manipulator arm. Two different mechanical subsystem models, i.e., with and without gyroscopic effects, are compared, to show the trade-off between efficiency and accuracy.
NASA Technical Reports Server (NTRS)
Arenstorf, Norbert S.; Jordan, Harry F.
1987-01-01
A barrier is a method for synchronizing a large number of concurrent computer processes. After considering some basic synchronization mechanisms, a collection of barrier algorithms with either linear or logarithmic depth are presented. A graphical model is described that profiles the execution of the barriers and other parallel programming constructs. This model shows how the interaction between the barrier algorithms and the work that they synchronize can impact their performance. One result is that logarithmic tree structured barriers show good performance when synchronizing fixed length work, while linear self-scheduled barriers show better performance when synchronizing fixed length work with an imbedded critical section. The linear barriers are better able to exploit the process skew associated with critical sections. Timing experiments, performed on an eighteen processor Flex/32 shared memory multiprocessor, that support these conclusions are detailed.
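For readers unfamiliar with the construct, the following is a minimal sketch of a central-counter barrier, the simplest linear-depth design. It is not one of the algorithms evaluated in the report above; the class `CounterBarrier` and the driver `run_phases` are hypothetical names, and Python's `threading.Condition` stands in for whatever low-level synchronization the original machine offered.

```python
import threading


class CounterBarrier:
    """Reusable central-counter barrier for a fixed number of threads."""

    def __init__(self, n_threads):
        self.n_threads = n_threads
        self.count = 0        # threads that have arrived in this episode
        self.generation = 0   # incremented each time the barrier opens
        self.cond = threading.Condition()

    def wait(self):
        with self.cond:
            my_generation = self.generation
            self.count += 1
            if self.count == self.n_threads:
                # Last arrival: reset the counter and release everyone.
                self.count = 0
                self.generation += 1
                self.cond.notify_all()
            else:
                # Sleep until the generation changes (guards against
                # spurious wakeups and makes the barrier reusable).
                while my_generation == self.generation:
                    self.cond.wait()


def run_phases(n_threads=4, n_phases=3):
    """Run workers through several phases and return the order of log entries."""
    barrier = CounterBarrier(n_threads)
    log = []
    log_lock = threading.Lock()

    def worker():
        for phase in range(n_phases):
            with log_lock:
                log.append(phase)  # the "work" of this phase
            barrier.wait()         # nobody starts phase+1 before all finish

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return log


if __name__ == "__main__":
    print(run_phases())
```

Because every arrival updates the same counter, the cost grows linearly with the number of threads; tree-structured barriers trade that single counter for a logarithmic number of smaller synchronization steps.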
Comparing the OpenMP, MPI, and Hybrid Programming Paradigm on an SMP Cluster
NASA Technical Reports Server (NTRS)
Jost, Gabriele; Jin, Haoqiang; anMey, Dieter; Hatay, Ferhat F.
2003-01-01
With the advent of parallel hardware and software technologies users are faced with the challenge to choose a programming paradigm best suited for the underlying computer architecture. With the current trend in parallel computer architectures towards clusters of shared memory symmetric multi-processors (SMP), parallel programming techniques have evolved to support parallelism beyond a single level. Which programming paradigm is the best will depend on the nature of the given problem, the hardware architecture, and the available software. In this study we will compare different programming paradigms for the parallelization of a selected benchmark application on a cluster of SMP nodes. We compare the timings of different implementations of the same CFD benchmark application employing the same numerical algorithm on a cluster of Sun Fire SMP nodes. The rest of the paper is structured as follows: In section 2 we briefly discuss the programming models under consideration. We describe our compute platform in section 3. The different implementations of our benchmark code are described in section 4 and the performance results are presented in section 5. We conclude our study in section 6.
Rodriguez, Ramon M; Suarez-Alvarez, Beatriz; Lavín, José L; Mosén-Ansorena, David; Baragaño Raneros, Aroa; Márquez-Kisinousky, Leonardo; Aransay, Ana M; Lopez-Larrea, Carlos
2017-01-15
Epigenetic mechanisms play a critical role during differentiation of T cells by contributing to the formation of stable and heritable transcriptional patterns. To better understand the mechanisms of memory maintenance in CD8+ T cells, we performed genome-wide analysis of DNA methylation, histone marking (acetylated lysine 9 in histone H3 and trimethylated lysine 9 in histone), and gene-expression profiles in naive, effector memory (EM), and terminally differentiated EM (TEMRA) cells. Our results indicate that DNA demethylation and histone acetylation are coordinated to generate the transcriptional program associated with memory cells. Conversely, EM and TEMRA cells share a very similar epigenetic landscape. Nonetheless, the TEMRA transcriptional program predicts an innate immunity phenotype associated with genes never reported in these cells, including several mediators of NK cell activation (VAV3 and LYN) and a large array of NK receptors (e.g., KIR2DL3, KIR2DL4, KIR2DL1, KIR3DL1, KIR2DS5). In addition, we identified up to 161 genes that encode transcriptional regulators, some of unknown function in CD8+ T cells, and that were differentially expressed in the course of differentiation. Overall, these results provide new insights into the regulatory networks involved in memory CD8+ T cell maintenance and T cell terminal differentiation.
A mechanism for efficient debugging of parallel programs
DOE Office of Scientific and Technical Information (OSTI.GOV)
Miller, B.P.; Choi, J.D.
1988-01-01
This paper addresses the design and implementation of an integrated debugging system for parallel programs running on shared memory multi-processors (SMMP). The authors describe the use of flowback analysis to provide information on causal relationships between events in a program's execution without re-executing the program for debugging. The authors introduce a mechanism called incremental tracing that, by using semantic analyses of the debugged program, makes the flowback analysis practical with only a small amount of trace generated during execution. They extend flowback analysis to apply to parallel programs and describe a method to detect race conditions in the interactions of the co-operating processes.
Software-Controlled Caches in the VMP Multiprocessor
1986-03-01
programming system level that Processors is tuned for the VMP design. In this vein, we are interested in exploring how far the software support can go to ...handled in software, analogously to the handling agement of the shared program state is familiar and of virtual memory page faults. Hardware support for...ensure good behavior, as opposed to how Each cache miss results in bus traffic. Table 2 pro- vides the bus cost for the "average" cache miss. Fig
On the impact of communication complexity in the design of parallel numerical algorithms
NASA Technical Reports Server (NTRS)
Gannon, D.; Vanrosendale, J.
1984-01-01
This paper describes two models of the cost of data movement in parallel numerical algorithms. One model is a generalization of an approach due to Hockney, and is suitable for shared memory multiprocessors where each processor has vector capabilities. The other model is applicable to highly parallel nonshared memory MIMD systems. In the second model, algorithm performance is characterized in terms of the communication network design. Techniques used in VLSI complexity theory are also brought in, and algorithm independent upper bounds on system performance are derived for several problems that are important to scientific computation.
On the impact of communication complexity on the design of parallel numerical algorithms
NASA Technical Reports Server (NTRS)
Gannon, D. B.; Van Rosendale, J.
1984-01-01
This paper describes two models of the cost of data movement in parallel numerical algorithms. One model is a generalization of an approach due to Hockney, and is suitable for shared memory multiprocessors where each processor has vector capabilities. The other model is applicable to highly parallel nonshared memory MIMD systems. In this second model, algorithm performance is characterized in terms of the communication network design. Techniques used in VLSI complexity theory are also brought in, and algorithm-independent upper bounds on system performance are derived for several problems that are important to scientific computation.
Li, Ji-Qing; Zhang, Yu-Shan; Ji, Chang-Ming; Wang, Ai-Jing; Lund, Jay R
2013-01-01
This paper examines long-term optimal operation using dynamic programming for a large hydropower system of 10 reservoirs in Northeast China. Besides considering flow and hydraulic head, the optimization explicitly includes time-varying electricity market prices to maximize benefit. Two techniques are used to reduce the 'curse of dimensionality' of dynamic programming with many reservoirs. Discrete differential dynamic programming (DDDP) reduces the search space and computer memory needed. Object-oriented programming (OOP) and the ability to dynamically allocate and release memory in the C++ language greatly reduce the cumulative computer memory required for solving multi-dimensional dynamic programming models. The case study shows that the model can reduce the 'curse of dimensionality' and achieve satisfactory results.
ERIC Educational Resources Information Center
Vergauwe, Evie; Barrouillet, Pierre; Camos, Valerie
2009-01-01
Examinations of interference between visual and spatial materials in working memory have suggested domain- and process-based fractionations of visuo-spatial working memory. The present study examined the role of central time-based resource sharing in visuo-spatial working memory and assessed its role in obtained interference patterns. Visual and…
A New Extension Model: The Memorial Middle School Agricultural Extension and Education Center
ERIC Educational Resources Information Center
Skelton, Peter; Seevers, Brenda
2010-01-01
The Memorial Middle School Agricultural Extension and Education Center is a new model for Extension. The center applies the Cooperative Extension Service System philosophy and mission to developing public education-based programs. Programming primarily serves middle school students and teachers through agricultural and natural resource science…
An implementation of the SNR high speed network communication protocol (Receiver part)
NASA Astrophysics Data System (ADS)
Wan, Wen-Jyh
1995-03-01
This thesis implements the receiver part of the SNR high speed network transport protocol. The approach was to use the Systems of Communicating Machines (SCM) as the formal definition of the protocol. Programs were developed on top of the Unix system using the C programming language. The Unix system features that were adopted for this implementation were multitasking, signals, shared memory, semaphores, sockets, timers and process control. The problems encountered, and solved, were signal loss, shared memory conflicts, process synchronization, scheduling, data alignment and errors in the SCM specification itself. The result was a correctly functioning program which implemented the SNR protocol. The system was tested using different connection modes, lost packets, duplicate packets and large data transfers. The contributions of this thesis are: (1) implementation of the receiver part of the SNR high speed transport protocol; (2) testing and integration with the transmitter part of the SNR transport protocol on an FDDI data link layered network; (3) demonstration of the functions of the SNR transport protocol such as connection management, sequenced delivery, flow control and error recovery using selective repeat methods of retransmission; and (4) modifications to the SNR transport protocol specification such as corrections for incorrect predicate conditions, definition of additional packet type formats, and solutions for signal loss and process contention problems.
Programming parallel architectures: The BLAZE family of languages
NASA Technical Reports Server (NTRS)
Mehrotra, Piyush
1988-01-01
Programming multiprocessor architectures is a critical research issue. An overview is given of the various approaches to programming these architectures that are currently being explored. It is argued that two of these approaches, interactive programming environments and functional parallel languages, are particularly attractive since they remove much of the burden of exploiting parallel architectures from the user. Also described is recent work by the author in the design of parallel languages. Research on languages for both shared and nonshared memory multiprocessors is described, as well as the relations of this work to other current language research projects.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Boman, Erik G.
This LDRD project was a campus exec fellowship to fund (in part) Donald Nguyen’s PhD research at UT-Austin. His work has focused on parallel programming models, and scheduling irregular algorithms on shared-memory systems using the Galois framework. Galois provides a simple but powerful way for users and applications to automatically obtain good parallel performance using certain supported data containers. The naïve user can write serial code, while advanced users can optimize performance by using advanced features, such as specifying the scheduling policy. Galois was used to parallelize two sparse matrix reordering schemes: RCM and Sloan. Such reordering is important in high-performance computing to obtain better data locality and thus reduce run times.
Debugging Fortran on a shared memory machine
DOE Office of Scientific and Technical Information (OSTI.GOV)
Allen, T.R.; Padua, D.A.
1987-01-01
Debugging on a parallel processor is more difficult than debugging on a serial machine because errors in a parallel program may introduce nondeterminism. The approach to parallel debugging presented here attempts to reduce the problem of debugging on a parallel machine to that of debugging on a serial machine by automatically detecting nondeterminism. 20 refs., 6 figs.
A Simple GPU-Accelerated Two-Dimensional MUSCL-Hancock Solver for Ideal Magnetohydrodynamics
NASA Technical Reports Server (NTRS)
Bard, Christopher; Dorelli, John C.
2013-01-01
We describe our experience using NVIDIA's CUDA (Compute Unified Device Architecture) C programming environment to implement a two-dimensional second-order MUSCL-Hancock ideal magnetohydrodynamics (MHD) solver on a GTX 480 Graphics Processing Unit (GPU). Taking a simple approach in which the MHD variables are stored exclusively in the global memory of the GTX 480 and accessed in a cache-friendly manner (without further optimizing memory access by, for example, staging data in the GPU's faster shared memory), we achieved a maximum speed-up of approximately 126 for a 1024 × 1024 grid relative to the sequential C code running on a single Intel Nehalem (2.8 GHz) core. This speedup is consistent with simple estimates based on the known floating point performance, memory throughput and parallel processing capacity of the GTX 480.
An Interactive Simulation Program for Exploring Computational Models of Auto-Associative Memory.
Fink, Christian G
2017-01-01
While neuroscience students typically learn about activity-dependent plasticity early in their education, they often struggle to conceptually connect modification at the synaptic scale with network-level neuronal dynamics, not to mention with their own everyday experience of recalling a memory. We have developed an interactive simulation program (based on the Hopfield model of auto-associative memory) that enables the user to visualize the connections generated by any pattern of neural activity, as well as to simulate the network dynamics resulting from such connectivity. An accompanying set of student exercises introduces the concepts of pattern completion, pattern separation, and sparse versus distributed neural representations. Results from a conceptual assessment administered before and after students worked through these exercises indicate that the simulation program is a useful pedagogical tool for illustrating fundamental concepts of computational models of memory.
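Pattern completion in a Hopfield network can be sketched in a few lines. This is not the authors' simulation program; the function names `train` and `recall`, the Hebbian learning rule with 1/n scaling, the asynchronous update order, and the two eight-unit patterns are assumptions chosen to keep the example small.

```python
def train(patterns):
    """Build a Hebbian weight matrix from patterns of +1/-1 values."""
    n = len(patterns[0])
    w = [[0.0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:              # no self-connections
                    w[i][j] += p[i] * p[j] / n
    return w

def recall(w, state, sweeps=5):
    """Update units one at a time for a fixed number of sweeps."""
    s = list(state)
    for _ in range(sweeps):
        for i in range(len(s)):
            h = sum(w[i][j] * s[j] for j in range(len(s)))
            s[i] = 1 if h >= 0 else -1
    return s

# Two orthogonal stored patterns (hypothetical example data).
p1 = [1, 1, 1, 1, -1, -1, -1, -1]
p2 = [1, -1, 1, -1, 1, -1, 1, -1]
w = train([p1, p2])

# A corrupted cue: p1 with its first unit flipped.
cue = [-1, 1, 1, 1, -1, -1, -1, -1]
completed = recall(w, cue)
```

Starting from the corrupted cue, the network settles back onto the stored pattern `p1`, which is the pattern-completion behaviour the exercises introduce.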
Parallel processing for scientific computations
NASA Technical Reports Server (NTRS)
Alkhatib, Hasan S.
1995-01-01
The scope of this project dealt with the investigation of the requirements to support distributed computing of scientific computations over a cluster of cooperative workstations. Various experiments on computations for the solution of simultaneous linear equations were performed in the early phase of the project to gain experience in the general nature and requirements of scientific applications. A specification of a distributed integrated computing environment, DICE, based on a distributed shared memory communication paradigm has been developed and evaluated. The distributed shared memory model facilitates porting existing parallel algorithms that have been designed for shared memory multiprocessor systems to the new environment. The potential of this new environment is to provide supercomputing capability through the utilization of the aggregate power of workstations cooperating in a cluster interconnected via a local area network. Workstations, generally, do not have the computing power to tackle complex scientific applications, making them primarily useful for visualization, data reduction, and filtering as far as complex scientific applications are concerned. There is a tremendous amount of computing power that is left unused in a network of workstations. Very often a workstation is simply sitting idle on a desk. A set of tools can be developed to take advantage of this potential computing power to create a platform suitable for large scientific computations. The integration of several workstations into a logical cluster of distributed, cooperative, computing stations presents an alternative to shared memory multiprocessor systems. In this project we designed and evaluated such a system.
A pervasive parallel framework for visualization: final report for FWP 10-014707
DOE Office of Scientific and Technical Information (OSTI.GOV)
Moreland, Kenneth D.
2014-01-01
We are on the threshold of a transformative change in the basic architecture of high-performance computing. The use of accelerator processors, characterized by large core counts, shared but asymmetrical memory, and heavy thread loading, is quickly becoming the norm in high performance computing. These accelerators represent significant challenges in updating our existing base of software. An intrinsic problem with this transition is a fundamental programming shift from message passing processes to much more fine thread scheduling with memory sharing. Another problem is the lack of stability in accelerator implementation; processor and compiler technology is currently changing rapidly. This report documents the results of our three-year ASCR project to address these challenges. Our project includes the development of the Dax toolkit, which contains the beginnings of new algorithms for a new generation of computers and the underlying infrastructure to rapidly prototype and build further algorithms as necessary.
NASA Technical Reports Server (NTRS)
Harper, Richard E.; Butler, Bryan P.
1990-01-01
The Draper fault-tolerant processor with fault-tolerant shared memory (FTP/FTSM), which is designed to allow application tasks to continue execution during the memory alignment process, is described. Processor performance is not affected by memory alignment. In addition, the FTP/FTSM incorporates a hardware scrubber device to perform the memory alignment quickly during unused memory access cycles. The FTP/FTSM architecture is described, followed by an estimate of the time required for channel reintegration.
Wang, Qi
2006-01-01
The relations of maternal reminiscing style and child self-concept to children's shared and independent autobiographical memories were examined in a sample of 189 three-year-olds and their mothers from Chinese families in China, first-generation Chinese immigrant families in the United States, and European American families. Mothers shared memories with their children and completed questionnaires; children recounted autobiographical events and described themselves with a researcher. Independent of culture, gender, child age, and language skills, maternal elaborations and evaluations were associated with children's shared memory reports, and maternal evaluations and child agentic self-focus were associated with children's independent memory reports. Maternal style and child self-concept further mediated cultural influences on children's memory. The findings provide insight into the social-cultural construction of autobiographical memory.
An enhanced Ada run-time system for real-time embedded processors
NASA Technical Reports Server (NTRS)
Sims, J. T.
1991-01-01
An enhanced Ada run-time system has been developed to support real-time embedded processor applications. The primary focus of this development effort has been on the tasking system and the memory management facilities of the run-time system. The tasking system has been extended to support efficient and precise periodic task execution as required for control applications. Event-driven task execution providing a means of task-asynchronous control and communication among Ada tasks is supported in this system. Inter-task control is even provided among tasks distributed on separate physical processors. The memory management system has been enhanced to provide object allocation and protected access support for memory shared between disjoint processors, each of which is executing a distinct Ada program.
Effect of virtual memory on efficient solution of two model problems
NASA Technical Reports Server (NTRS)
Lambiotte, J. J., Jr.
1977-01-01
Computers with virtual memory architecture allow programs to be written as if they were small enough to be contained in memory. Two types of problems are investigated to show that this luxury can lead to quite inefficient performance if the programmer does not take the characteristics of the operating system into account when developing the program. The two problems considered are the simultaneous solution of a large linear system of equations by Gaussian elimination and a model three-dimensional finite-difference problem. Runs on the Control Data STAR-100 computer demonstrate the inefficiencies of programming the problems in the manner one would naturally use if the problems were indeed small enough to be contained in memory. Program redesigns are presented which achieve large improvements in performance through changes in the computational procedure and the data base arrangement.
SAHAYOG: A Testbed for Load Sharing under Failure,
1987-07-01
messages, shared memory and semaphores. To communicate using messages, processes create message queues using system-provided primitives. The message...The size of the memory that is to be shared is decided by the process when it makes a request for memory allocation. The semaphore option of IPC can be...used to prevent two or more concurrent processes from executing their critical sections at the same time. Semaphores must be used when the processes
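The use of a semaphore to keep concurrent workers out of each other's critical sections can be sketched as follows. The excerpt refers to the IPC facilities between processes; this sketch substitutes Python threads and `threading.Semaphore` purely for brevity, and the shared counter and the `worker` function are hypothetical.

```python
import threading

counter = 0                       # shared state updated by every worker
sem = threading.Semaphore(1)      # binary semaphore guarding the counter

def worker(iterations):
    """Increment the shared counter, one critical section at a time."""
    global counter
    for _ in range(iterations):
        with sem:                 # acquire before, release after
            counter += 1          # critical section

threads = [threading.Thread(target=worker, args=(10000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Because only one worker holds the semaphore at a time, no increment is lost and the final count equals the total number of iterations.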
DOE Office of Scientific and Technical Information (OSTI.GOV)
Buntinas, D.; Mercier, G.; Gropp, W.
2007-09-01
This paper presents the implementation of MPICH2 over the Nemesis communication subsystem and the evaluation of its shared-memory performance. We describe design issues as well as some of the optimization techniques we employed. We conducted a performance evaluation over shared memory using microbenchmarks. The evaluation shows that MPICH2 Nemesis has very low communication overhead, making it suitable for smaller-grained applications.
Buhusi, Catalin V; Meck, Warren H
2009-07-12
Individuals time as if using a stopwatch that can be stopped or reset on command. Here, we review behavioural and neurobiological data supporting the time-sharing hypothesis that perceived time depends on the attentional and memory resources allocated to the timing process. Neuroimaging studies in humans suggest that timekeeping tasks engage brain circuits typically involved in attention and working memory. Behavioural, pharmacological, lesion and electrophysiological studies in lower animals support this time-sharing hypothesis. When subjects attend to a second task, or when intruder events are presented, estimated durations are shorter, presumably due to resources being taken away from timing. Here, we extend the time-sharing hypothesis by proposing that resource reallocation is proportional to the perceived contrast, both in temporal and non-temporal features, between intruders and the timed events. New findings support this extension by showing that the effect of an intruder event is dependent on the relative duration of the intruder to the intertrial interval. The conclusion is that the brain circuits engaged by timekeeping comprise not only those primarily involved in time accumulation, but also those involved in the maintenance of attentional and memory resources for timing, and in the monitoring and reallocation of those resources among tasks.
Morey, Candice Coker; Cowan, Nelson; Morey, Richard D; Rouder, Jeffery N
2011-02-01
Prominent roles for general attention resources are posited in many models of working memory, but the manner in which these can be allocated differs between models or is not sufficiently specified. We varied the payoffs for correct responses in two temporally-overlapping recognition tasks, a visual array comparison task and a tone sequence comparison task. In the critical conditions, an increase in reward for one task corresponded to a decrease in reward for the concurrent task, but memory load remained constant. Our results show patterns of interference consistent with a trade-off between the tasks, suggesting that a shared resource can be flexibly divided, rather than only fully allotted to either of the tasks. Our findings support a role for a domain-general resource in models of working memory, and furthermore suggest that this resource is flexibly divisible.
Modeling of SONOS Memory Cell Erase Cycle
NASA Technical Reports Server (NTRS)
Phillips, Thomas A.; MacLeod, Todd C.; Ho, Fat H.
2011-01-01
Utilization of Silicon-Oxide-Nitride-Oxide-Silicon (SONOS) nonvolatile semiconductor memories as a flash memory has many advantages. These electrically erasable programmable read-only memories (EEPROMs) utilize low programming voltages, have a high erase/write cycle lifetime, are radiation hardened, and are compatible with high-density scaled CMOS for low power, portable electronics. In this paper, the SONOS memory cell erase cycle was investigated using a nonquasi-static (NQS) MOSFET model. Comparisons were made between the model predictions and experimental data.
Incorporating shared savings programs into primary care: from theory to practice.
Hayen, Arthur P; van den Berg, Michael J; Meijboom, Bert R; Struijs, Jeroen N; Westert, Gert P
2015-12-30
In several countries, health care policies gear toward strengthening the position of primary care physicians. Primary care physicians are increasingly expected to take accountability for overall spending and quality. Yet traditional models of paying physicians do not provide adequate incentives for taking on this new role. Under a so-called shared savings program physicians are instead incentivized to take accountability for spending and quality, as the program lets them share in cost savings when quality targets are met. We provide a structured approach to designing a shared savings program for primary care, and apply this approach to the design of a shared savings program for a Dutch chain of primary care providers, which is currently being piloted. Based on the literature, we defined five building blocks of shared savings models that encompass the definition of the scope of the program, the calculation of health care expenditures, the construction of a savings benchmark, the assessment of savings and the rules and conditions under which savings are shared. We apply insights from a variety of literatures to assess the relative merits of alternative design choices within these building blocks. The shared savings program uses an econometric model of provider expenditures as an input to calculating a casemix-corrected benchmark. The minimization of risk and uncertainty for both payer and provider is pertinent to the design of a shared savings program. In that respect, the primary care setting provides a number of unique opportunities for achieving cost and quality targets. Accountability can more readily be assumed due to the relatively long-lasting relationships between primary care physicians and patients. A stable population furthermore improves the confidence with which savings can be attributed to changes in population management. Challenges arise from the institutional context. 
The Dutch health care system has a fragmented structure and providers are typically small in size. Shared savings programs fit the concept of enhanced primary care. Incorporating a shared savings program into existing payment models could therefore contribute to the financial sustainability of this organizational form.
Error recovery in shared memory multiprocessors using private caches
NASA Technical Reports Server (NTRS)
Wu, Kun-Lung; Fuchs, W. Kent; Patel, Janak H.
1990-01-01
The problem of recovering from processor transient faults in shared memory multiprocessor systems is examined. A user-transparent checkpointing and recovery scheme using private caches is presented. Processes can recover from errors due to faulty processors by restarting from the checkpointed computation state. Implementation techniques using checkpoint identifiers and recovery stacks are examined as a means of reducing performance degradation in processor utilization during normal execution. This cache-based checkpointing technique prevents rollback propagation, provides rapid recovery, and can be integrated into standard cache coherence protocols. An analytical model is used to estimate the relative performance of the scheme during normal execution. Extensions to take error latency into account are presented.
Working memory resources are shared across sensory modalities.
Salmela, V R; Moisala, M; Alho, K
2014-10-01
A common assumption in the working memory literature is that the visual and auditory modalities have separate and independent memory stores. Recent evidence on visual working memory has suggested that resources are shared between representations, and that the precision of representations sets the limit for memory performance. We tested whether memory resources are also shared across sensory modalities. Memory precision for two visual (spatial frequency and orientation) and two auditory (pitch and tone duration) features was measured separately for each feature and for all possible feature combinations. Thus, only the memory load was varied, from one to four features, while keeping the stimuli similar. In Experiment 1, two gratings and two tones, both containing two varying features, were presented simultaneously. In Experiment 2, two gratings and two tones, each containing only one varying feature, were presented sequentially. The memory precision (delayed discrimination threshold) for a single feature was close to the perceptual threshold. However, as the number of features to be remembered was increased, the discrimination thresholds increased more than twofold. Importantly, the decrease in memory precision did not depend on the modality of the other feature(s), or on whether the features were in the same or in separate objects. Hence, simultaneously storing one visual and one auditory feature had an effect on memory precision equal to those of simultaneously storing two visual or two auditory features. The results show that working memory is limited by the precision of the stored representations, and that working memory can be described as a resource pool that is shared across modalities.
Quasi-Optimal Elimination Trees for 2D Grids with Singularities
Paszyńska, A.; Paszyński, M.; Jopek, K.; ...
2015-01-01
We construct quasi-optimal elimination trees for 2D finite element meshes with singularities. These trees minimize the complexity of the solution of the discrete system. The computational cost estimates of the elimination process model the execution of the multifrontal algorithms in serial and in parallel shared-memory executions. Since the meshes considered are a subspace of all possible mesh partitions, we call these minimizers quasi-optimal. We minimize the cost functionals using dynamic programming. Finding these minimizers is more computationally expensive than solving the original algebraic system. Nevertheless, from the insights provided by the analysis of the dynamic programming minima, we propose a heuristic construction of the elimination trees that has cost O(N_e log N_e), where N_e is the number of elements in the mesh. We show that this heuristic ordering has similar computational cost to the quasi-optimal elimination trees found with dynamic programming and outperforms state-of-the-art alternatives in our numerical experiments.
Quasi-Optimal Elimination Trees for 2D Grids with Singularities
DOE Office of Scientific and Technical Information (OSTI.GOV)
Paszyńska, A.; Paszyński, M.; Jopek, K.
We construct quasi-optimal elimination trees for 2D finite element meshes with singularities. These trees minimize the complexity of the solution of the discrete system. The computational cost estimates of the elimination process model the execution of the multifrontal algorithms in serial and in parallel shared-memory executions. Since the meshes considered are a subspace of all possible mesh partitions, we call these minimizers quasi-optimal. We minimize the cost functionals using dynamic programming. Finding these minimizers is more computationally expensive than solving the original algebraic system. Nevertheless, from the insights provided by the analysis of the dynamic programming minima, we propose a heuristic construction of the elimination trees that has cost O(N_e log N_e), where N_e is the number of elements in the mesh. We show that this heuristic ordering has similar computational cost to the quasi-optimal elimination trees found with dynamic programming and outperforms state-of-the-art alternatives in our numerical experiments.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Werner, N.E.; Van Matre, S.W.
1985-05-01
This manual describes the CRI Subroutine Library and Utility Package. The CRI library provides Cray multitasking functionality on the four-processor shared memory VAX 11/780-4. Additional functionality has been added for more flexibility. A discussion of the library, utilities, error messages, and example programs is provided.
"Everybody Had a Piece ...": Collaborative Practice and Shared Decision Making at the Open Book
ERIC Educational Resources Information Center
Gordon, John; Ramdeholl, Dianne
2010-01-01
The Open Book, an adult literacy program in Brooklyn, from 1985-2002, remains, for many of the students and staff involved, a defining experience in their lives, a time that allowed them to see different possibilities, for themselves and society. In an attempt to preserve the field's collective historical memory, the authors in this chapter…
Shared Memory Parallelism for 3D Cartesian Discrete Ordinates Solver
NASA Astrophysics Data System (ADS)
Moustafa, Salli; Dutka-Malen, Ivan; Plagne, Laurent; Ponçot, Angélique; Ramet, Pierre
2014-06-01
This paper describes the design and the performance of DOMINO, a 3D Cartesian SN solver that implements two nested levels of parallelism (multicore+SIMD) on shared memory computation nodes. DOMINO is written in C++, a multi-paradigm programming language that enables the use of powerful and generic parallel programming tools such as Intel TBB and Eigen. These two libraries allow us to combine multi-thread parallelism with vector operations in an efficient and yet portable way. As a result, DOMINO can exploit the full power of modern multi-core processors and is able to tackle very large simulations that usually require large HPC clusters, using a single computing node. For example, DOMINO solves a 3D full core PWR eigenvalue problem involving 26 energy groups, 288 angular directions (S16), 46 × 10^6 spatial cells and 1 × 10^12 DoFs within 11 hours on a single 32-core SMP node. This represents a sustained performance of 235 GFlops and 40.74% of the SMP node peak performance for the DOMINO sweep implementation. The very high Flops/Watt ratio of DOMINO makes it a very interesting building block for a future many-nodes nuclear simulation tool.
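The two performance figures quoted above imply a node peak that the abstract does not state directly. The short calculation below derives it, assuming that the 40.74% figure is simply the ratio of the sustained 235 GFlops to the node peak; the variable names are illustrative.

```python
sustained_gflops = 235.0      # sustained sweep performance from the abstract
fraction_of_peak = 0.4074     # 40.74% of node peak, from the abstract
cores = 32                    # cores in the SMP node

# Implied peak of the whole node and of a single core (derived values,
# not figures reported in the abstract).
node_peak_gflops = sustained_gflops / fraction_of_peak
core_peak_gflops = node_peak_gflops / cores

print(f"implied node peak: {node_peak_gflops:.1f} GFlops")
print(f"implied per-core peak: {core_peak_gflops:.2f} GFlops")
```

Under that assumption the node peak comes out at roughly 577 GFlops, or about 18 GFlops per core.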
Multiprocessor shared-memory information exchange
DOE Office of Scientific and Technical Information (OSTI.GOV)
Santoline, L.L.; Bowers, M.D.; Crew, A.W.
1989-02-01
In distributed microprocessor-based instrumentation and control systems, the inter- and intra-subsystem communication requirements ultimately form the basis for the overall system architecture. This paper describes a software protocol which addresses the intra-subsystem communications problem. Specifically the protocol allows for multiple processors to exchange information via a shared-memory interface. The authors' primary goal is to provide a reliable means for information to be exchanged between central application processor boards (masters) and dedicated function processor boards (slaves) in a single computer chassis. The resultant Multiprocessor Shared-Memory Information Exchange (MSMIE) protocol, a standard master-slave shared-memory interface suitable for use in nuclear safety systems, is designed to pass unidirectional buffers of information between the processors while providing a minimum, deterministic cycle time for this data exchange.
Performance analysis and kernel size study of the Lynx real-time operating system
NASA Technical Reports Server (NTRS)
Liu, Yuan-Kwei; Gibson, James S.; Fernquist, Alan R.
1993-01-01
This paper analyzes the Lynx real-time operating system (LynxOS), which has been selected as the operating system for the Space Station Freedom Data Management System (DMS). The features of LynxOS are compared to those of other Unix-based operating systems (OSs). The tools for measuring the performance of LynxOS, which include a high-speed digital timer/counter board, a device driver program, and an application program, are analyzed. The timings for interrupt response, process creation and deletion, threads, semaphores, shared memory, and signals are measured. The memory size of the DMS Embedded Data Processor (EDP) is limited. In addition, virtual memory is not suitable for real-time applications because page swap timing may not be deterministic. Therefore, the DMS software, including LynxOS, has to fit in the main memory of an EDP. To reduce the LynxOS kernel size, the following steps are taken: analyzing the factors that influence the kernel size; identifying the modules of LynxOS that may not be needed in an EDP; adjusting the system parameters of LynxOS; reconfiguring the device drivers used in the LynxOS; and analyzing the symbol table. The reductions in kernel disk size, kernel memory size, and total kernel size from each step mentioned above are listed and analyzed.
A biometric latent curve analysis of memory decline in older men of the NAS-NRC twin registry.
McArdle, John J; Plassman, Brenda L
2009-09-01
Previous research has shown cognitive abilities to have different biometric patterns of age-changes. We examined the variation in episodic memory (word recall task) for over 6,000 twin pairs who were initially aged 59-75, and were subsequently re-assessed up to three more times over 12 years. In cross-sectional analyses, variation in the number of words recalled independent of age was explained largely by non-shared influences (65-72%), with clear additive genetic influences (12-32%), and marginal shared family influences (1-18%). The longitudinal phenotypic analysis of the word recall task showed systematic linear declines over age, but several nonlinear models with more dramatic changes at later ages, improved the overall fit. A two-part spline model for the longitudinal twin data with an optimal turning point at age 74 led to: (a) a separation of non-shared environmental influences and transient measurement error (~50%); (b) strong additive genetic components of this latent curve (~44% at age 60) with increases (over 50%) up to age 74, but with no additional genetic variation after age 74; (c) the smaller influences of shared family environment (~15% at age 74) were constant over all ages; (d) non-shared effects play an important role over most of the life-span but diminish after age 74.
Direct access inter-process shared memory
Brightwell, Ronald B; Pedretti, Kevin; Hudson, Trammell B
2013-10-22
A technique for directly sharing physical memory between processes executing on processor cores is described. The technique includes loading a plurality of processes into the physical memory for execution on a corresponding plurality of processor cores sharing the physical memory. An address space is mapped to each of the processes by populating a first entry in a top level virtual address table for each of the processes. The address space of each of the processes is cross-mapped into each of the processes by populating one or more subsequent entries of the top level virtual address table with the first entry in the top level virtual address table from other processes.
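The cross-mapping step can be modelled with ordinary Python objects. This is only a toy model of the idea in the abstract, not the patented implementation: the `Process` class, the rule that process `r` appears in slot `r + 1` of every other table, and the use of a dictionary as a stand-in for an address space are all assumptions.

```python
class Process:
    """Toy process with a top-level virtual address table."""

    def __init__(self, rank, nprocs):
        self.rank = rank
        self.memory = {}                    # stand-in for this process's address space
        self.top = [None] * (nprocs + 1)    # top-level table (slot layout assumed)
        self.top[0] = self.memory           # first entry maps the process's own space

def cross_map(procs):
    """Copy each process's first entry into a later slot of every other table."""
    for p in procs:
        for q in procs:
            if q is not p:
                p.top[q.rank + 1] = q.top[0]

def load(proc, slot, offset):
    """Read through a top-level slot, as a plain memory load would."""
    return proc.top[slot][offset]

procs = [Process(r, 3) for r in range(3)]
cross_map(procs)

procs[1].memory[0x10] = "written by rank 1"
seen_by_rank0 = load(procs[0], 2, 0x10)     # slot 2 holds rank 1's first entry
```

Because the slots refer to the same underlying object rather than to a copy, a value stored by one process is visible to the others without any message exchange.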
Memory Network For Distributed Data Processors
NASA Technical Reports Server (NTRS)
Bolen, David; Jensen, Dean; Millard, ED; Robinson, Dave; Scanlon, George
1992-01-01
Universal Memory Network (UMN) is modular, digital data-communication system enabling computers with differing bus architectures to share 32-bit-wide data between locations up to 3 km apart with less than one millisecond of latency. Makes it possible to design sophisticated real-time and near-real-time data-processing systems without data-transfer "bottlenecks". This enterprise network permits transmission of volume of data equivalent to an encyclopedia each second. Facilities benefiting from Universal Memory Network include telemetry stations, simulation facilities, power-plants, and large laboratories or any facility sharing very large volumes of data. Main hub of UMN is reflection center including smaller hubs called Shared Memory Interfaces.
Grouping and binding in visual short-term memory.
Quinlan, Philip T; Cohen, Dale J
2012-09-01
Findings of 2 experiments are reported that challenge the current understanding of visual short-term memory (VSTM). In both experiments, a single study display, containing 6 colored shapes, was presented briefly and then probed with a single colored shape. At stake is how VSTM retains a record of different objects that share common features: In the 1st experiment, 2 study items sometimes shared a common feature (either a shape or a color). The data revealed a color sharing effect, in which memory was much better for items that shared a common color than for items that did not. The 2nd experiment showed that the size of the color sharing effect depended on whether a single pair of items shared a common color or whether 2 pairs of items were so defined; memory for all items improved when 2 color groups were presented. In explaining performance, an account is advanced in which items compete for a fixed number of slots, but then memory recall for any given stored item is prone to error. A critical assumption is that items that share a common color are stored together in a slot as a chunk. The evidence provides further support for the idea that principles of perceptual organization may determine the manner in which items are stored in VSTM. PsycINFO Database Record (c) 2012 APA, all rights reserved.
Mind-to-mind heteroclinic coordination: Model of sequential episodic memory initiation.
Afraimovich, V S; Zaks, M A; Rabinovich, M I
2018-05-01
Retrieval of episodic memory is a dynamical process in the large scale brain networks. In social groups, the neural patterns, associated with specific events directly experienced by single members, are encoded, recalled, and shared by all participants. Here, we construct and study the dynamical model for the formation and maintaining of episodic memory in small ensembles of interacting minds. We prove that the unconventional dynamical attractor of this process, the nonsmooth heteroclinic torus, is structurally stable within the Lotka-Volterra-like sets of equations. Dynamics on this torus combines the absence of chaos with asymptotic instability of every separate trajectory; its adequate quantitative characteristics are length-related Lyapunov exponents. Variation of the coupling strength between the participants results in different types of sequential switching between metastable states; we interpret them as stages in formation and modification of the episodic memory.
Mind-to-mind heteroclinic coordination: Model of sequential episodic memory initiation
NASA Astrophysics Data System (ADS)
Afraimovich, V. S.; Zaks, M. A.; Rabinovich, M. I.
2018-05-01
Retrieval of episodic memory is a dynamical process in the large scale brain networks. In social groups, the neural patterns, associated with specific events directly experienced by single members, are encoded, recalled, and shared by all participants. Here, we construct and study the dynamical model for the formation and maintaining of episodic memory in small ensembles of interacting minds. We prove that the unconventional dynamical attractor of this process—the nonsmooth heteroclinic torus—is structurally stable within the Lotka-Volterra-like sets of equations. Dynamics on this torus combines the absence of chaos with asymptotic instability of every separate trajectory; its adequate quantitative characteristics are length-related Lyapunov exponents. Variation of the coupling strength between the participants results in different types of sequential switching between metastable states; we interpret them as stages in formation and modification of the episodic memory.
Static analysis of the hull plate using the finite element method
NASA Astrophysics Data System (ADS)
Ion, A.
2015-11-01
This paper presents the static analysis for two levels of a container ship's construction: the first level is the girder / hull plate, and the second level is the entire strength hull of the vessel. This article describes the work for the static analysis of a hull plate, using the software package ANSYS Mechanical 14.5. The program is run on a computer with four Intel Xeon X5260 processors at 3.33 GHz and 32 GB of installed memory. In terms of software, the shared memory parallel version of ANSYS refers to running ANSYS across multiple cores on an SMP system. The distributed memory parallel version of ANSYS (Distributed ANSYS) refers to running ANSYS across multiple processors on SMP systems or DMP systems.
NASA Technical Reports Server (NTRS)
Burleigh, Scott C.
2011-01-01
Sptrace is a general-purpose space utilization tracing system that is conceptually similar to the commercial Purify product used to detect leaks and other memory usage errors. It is designed to monitor space utilization in any sort of heap, i.e., a region of data storage on some device (nominally memory; possibly shared and possibly persistent) with a flat address space. This software can trace usage of shared and/or non-volatile storage in addition to private RAM (random access memory). Sptrace is implemented as a set of C function calls that are invoked from within the software that is being examined. The function calls fall into two broad classes: (1) functions that are embedded within the heap management software [e.g., JPL's SDR (Simple Data Recorder) and PSM (Personal Space Management) systems] to enable heap usage analysis by populating a virtual time-sequenced log of usage activity, and (2) reporting functions that are embedded within the application program whose behavior is suspect. For ease of use, these functions may be wrapped privately inside public functions offered by the heap management software. Sptrace can be used for VxWorks or RTEMS realtime systems as easily as for Linux or OS/X systems.
ACCESS: A Communicating and Cooperating Expert Systems System.
1988-01-31
therefore more quickly accepted by programmers. This is in part due to the already familiar concepts of multi-processing environments (e.g. semaphores ...Di68] and monitors [Br75]) which can be viewed as a special case of synchronized shared memory models [Di68]. Heterogeneous systems however, are by...locality of nodes is not possible and frequent access of memory is required. Synchronization of processes also suffers from a loss of efficiency in
Shared memories reveal shared structure in neural activity across individuals
Chen, J.; Leong, Y.C.; Honey, C.J.; Yong, C.H.; Norman, K.A.; Hasson, U.
2016-01-01
Our lives revolve around sharing experiences and memories with others. When different people recount the same events, how similar are their underlying neural representations? Participants viewed a fifty-minute movie, then verbally described the events during functional MRI, producing unguided detailed descriptions lasting up to forty minutes. As each person spoke, event-specific spatial patterns were reinstated in default-network, medial-temporal, and high-level visual areas. Individual event patterns were both highly discriminable from one another and similar between people, suggesting consistent spatial organization. In many high-order areas, patterns were more similar between people recalling the same event than between recall and perception, indicating systematic reshaping of percept into memory. These results reveal the existence of a common spatial organization for memories in high-level cortical areas, where encoded information is largely abstracted beyond sensory constraints; and that neural patterns during perception are altered systematically across people into shared memory representations for real-life events. PMID:27918531
Craston, Patrick; Wyble, Brad; Chennu, Srivas; Bowman, Howard
2009-03-01
Observers often miss a second target (T2) if it follows an identified first target item (T1) within half a second in rapid serial visual presentation (RSVP), a finding termed the attentional blink. If two targets are presented in immediate succession, however, accuracy is excellent (Lag 1 sparing). The resource sharing hypothesis proposes a dynamic distribution of resources over a time span of up to 600 msec during the attentional blink. In contrast, the ST(2) model argues that working memory encoding is serial during the attentional blink and that, due to joint consolidation, Lag 1 is the only case where resources are shared. Experiment 1 investigates the P3 ERP component evoked by targets in RSVP. The results suggest that, in this context, P3 amplitude is an indication of bottom-up strength rather than a measure of cognitive resource allocation. Experiment 2, employing a two-target paradigm, suggests that T1 consolidation is not affected by the presentation of T2 during the attentional blink. However, if targets are presented in immediate succession (Lag 1 sparing), they are jointly encoded into working memory. We use the ST(2) model's neural network implementation, which replicates a range of behavioral results related to the attentional blink, to generate "virtual ERPs" by summing across activation traces. We compare virtual to human ERPs and show how the results suggest a serial nature of working memory encoding as implied by the ST(2) model.
Reder, Lynne M.; Park, Heekyeong; Kieffaber, Paul D.
2009-01-01
There is a popular hypothesis that performance on implicit and explicit memory tasks reflects 2 distinct memory systems. Explicit memory is said to store those experiences that can be consciously recollected, and implicit memory is said to store experiences and affect subsequent behavior but to be unavailable to conscious awareness. Although this division based on awareness is a useful taxonomy for memory tasks, the authors review the evidence that the unconscious character of implicit memory does not necessitate that it be treated as a separate system of human memory. They also argue that some implicit and explicit memory tasks share the same memory representations and that the important distinction is whether the task (implicit or explicit) requires the formation of a new association. The authors review and critique dissociations from the behavioral, amnesia, and neuroimaging literatures that have been advanced in support of separate explicit and implicit memory systems by highlighting contradictory evidence and by illustrating how the data can be accounted for using a simple computational memory model that assumes the same memory representation for those disparate tasks. PMID:19210052
Choi, Hae-Yoon; Kensinger, Elizabeth A; Rajaram, Suparna
2017-09-01
Social transmission of memory and its consequence on collective memory have generated enduring interdisciplinary interest because of their widespread significance in interpersonal, sociocultural, and political arenas. We tested the influence of 3 key factors (emotional salience of information, group structure, and information distribution) on mnemonic transmission, social contagion, and collective memory. Participants individually studied emotionally salient (negative or positive) and nonemotional (neutral) picture-word pairs that were completely shared, partially shared, or unshared within participant triads, and then completed 3 consecutive recalls in 1 of 3 conditions: individual-individual-individual (control), collaborative-collaborative (identical group; insular structure)-individual, and collaborative-collaborative (reconfigured group; diverse structure)-individual. Collaboration enhanced negative memories especially in insular group structure and especially for shared information, and promoted collective forgetting of positive memories. Diverse group structure reduced this negativity effect. Unequally distributed information led to social contagion that creates false memories; diverse structure propagated a greater variety of false memories whereas insular structure promoted confidence in false recognition and false collective memory. A simultaneous assessment of network structure, information distribution, and emotional valence breaks new ground to specify how network structure shapes the spread of negative memories and false memories, and the emergence of collective memory. (PsycINFO Database Record (c) 2017 APA, all rights reserved).
NASA Technical Reports Server (NTRS)
Mavriplis, D. J.; Das, Raja; Saltz, Joel; Vermeland, R. E.
1992-01-01
An efficient three dimensional unstructured Euler solver is parallelized on a Cray Y-MP C90 shared memory computer and on an Intel Touchstone Delta distributed memory computer. This paper relates the experiences gained and describes the software tools and hardware used in this study. Performance comparisons between two differing architectures are made.
Execution models for mapping programs onto distributed memory parallel computers
NASA Technical Reports Server (NTRS)
Sussman, Alan
1992-01-01
The problem of exploiting the parallelism available in a program to efficiently employ the resources of the target machine is addressed. The problem is discussed in the context of building a mapping compiler for a distributed memory parallel machine. The paper describes using execution models to drive the process of mapping a program in the most efficient way onto a particular machine. Through analysis of the execution models for several mapping techniques for one class of programs, we show that the selection of the best technique for a particular program instance can make a significant difference in performance. On the other hand, the results of benchmarks from an implementation of a mapping compiler show that our execution models are accurate enough to select the best mapping technique for a given program.
Applications considerations in the system design of highly concurrent multiprocessors
NASA Technical Reports Server (NTRS)
Lundstrom, Stephen F.
1987-01-01
A flow model processor approach to parallel processing is described, using very-high-performance individual processors, high-speed circuit switched interconnection networks, and a high-speed synchronization capability to minimize the effect of the inherently serial portions of applications on performance. Design studies related to the determination of the number of processors, the memory organization, and the structure of the networks used to interconnect the processor and memory resources are discussed. Simulations indicate that applications centered on the large shared data memory should be able to sustain over 500 million floating point operations per second.
Development of a Dynamic Time Sharing Scheduled Environment Final Report CRADA No. TC-824-94E
DOE Office of Scientific and Technical Information (OSTI.GOV)
Jette, M.; Caliga, D.
Massively parallel computers, such as the Cray T3D, have historically supported resource sharing solely with space sharing. In that method, multiple problems are solved by executing them on distinct processors. This project developed a dynamic time- and space-sharing scheduler to achieve greater interactivity and throughput than could be achieved with space-sharing alone. CRI and LLNL worked together on the design, testing, and review aspects of this project. There were separate software deliverables. CRI implemented a general purpose scheduling system as per the design specifications. LLNL ported the local gang scheduler software to the LLNL Cray T3D. In this approach, processors are allocated simultaneously to all components of a parallel program (in a “gang”). Program execution is preempted as needed to provide for interactivity. Programs are also relocated to different processors as needed to efficiently pack the computer’s torus of processors. In phase one, CRI developed an interface specification after discussions with LLNL for system-level software supporting a time- and space-sharing environment on the LLNL T3D. The two parties also discussed interface specifications for external control tools (such as scheduling policy tools, system administration tools) and applications programs. CRI assumed responsibility for the writing and implementation of all the necessary system software in this phase. In phase two, CRI implemented job-rolling on the Cray T3D, a mechanism for preempting a program, saving its state to disk, and later restoring its state to memory for continued execution. LLNL ported its gang scheduler to the LLNL T3D utilizing the CRI interface implemented in phases one and two. During phase three, the functionality and effectiveness of the LLNL gang scheduler were assessed to provide input to CRI time- and space-sharing efforts. CRI will utilize this information in the development of general schedulers suitable for other sites and future architectures.
Wang, Manjie; Saudino, Kimberly J
2013-12-01
This is the first study to explore genetic and environmental contributions to individual differences in emotion regulation in toddlers, and the first to examine the genetic and environmental etiology underlying the association between emotion regulation and working memory. In a sample of 304 same-sex twin pairs (140 MZ, 164 DZ) at age 3, emotion regulation was assessed using the Behavior Rating Scale of the Bayley Scales of Infant Development (BRS; Bayley, 1993), and working memory was measured by the visually cued recall (VCR) task (Zelazo, Jacques, Burack, & Frye, 2002) and several memory tasks from the Mental Scale of the BSID. Based on model-fitting analyses, both emotion regulation and working memory were significantly influenced by genetic and nonshared environmental factors. Shared environmental effects were significant for working memory, but not for emotion regulation. Only genetic factors significantly contributed to the covariation between emotion regulation and working memory.
Wang, Manjie; Saudino, Kimberly J.
2014-01-01
This is the first study to explore genetic and environmental contributions to individual differences in emotion regulation in toddlers, and the first to examine the genetic and environmental etiology underlying the association between emotion regulation and working memory. In a sample of 304 same-sex twin pairs (140 MZ, 164 DZ) at age 3, emotion regulation was assessed using the Behavior Rating Scale of the Bayley Scales of Infant Development (BRS; Bayley, 1993), and working memory was measured by the visually cued recall (VCR) task (Zelazo et al., 2002) and several memory tasks from the Mental Scale of BSID. Based on model-fitting analyses, both emotion regulation and working memory were significantly influenced by genetic and nonshared environmental factors. Shared environmental effects were significant for working memory, but not for emotion regulation. Only genetic factors significantly contributed to the covariation between emotion regulation and working memory. PMID:24098922
Distributed-Memory Fast Maximal Independent Set
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kanewala Appuhamilage, Thejaka Amila J.; Zalewski, Marcin J.; Lumsdaine, Andrew
The Maximal Independent Set (MIS) graph problem arises in many applications such as computer vision, information theory, molecular biology, and process scheduling. The growing scale of MIS problems suggests the use of distributed-memory hardware as a cost-effective approach to providing necessary compute and memory resources. Luby proposed four randomized algorithms to solve the MIS problem. All those algorithms are designed focusing on shared-memory machines and are analyzed using the PRAM model. These algorithms do not have direct efficient distributed-memory implementations. In this paper, we extend two of Luby’s seminal MIS algorithms, “Luby(A)” and “Luby(B),” to distributed-memory execution, and we evaluate their performance. We compare our results with the “Filtered MIS” implementation in the Combinatorial BLAS library for two types of synthetic graph inputs.
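For readers unfamiliar with randomized MIS, the following is a minimal sequential sketch of the random-priority scheme commonly attributed to Luby: in each round every remaining vertex draws a random number, vertices that hold a strict local minimum join the set, and they are removed together with their neighbours. Whether this matches the paper's "Luby(A)" or "Luby(B)" exactly is an assumption, and the function name `luby_mis` and the adjacency-dictionary input are illustrative only; no distributed-memory aspect is modelled here.

```python
import random


def luby_mis(adj, seed=0):
    """Return a maximal independent set of an undirected graph.

    adj maps each vertex to the set of its neighbours (hypothetical
    input format chosen for this sketch).
    """
    rng = random.Random(seed)
    active = set(adj)   # vertices not yet decided
    mis = set()
    while active:
        # Each active vertex draws an independent random priority.
        prio = {v: rng.random() for v in active}
        # A vertex wins if its priority is strictly smaller than that of
        # every neighbour that is still active.
        winners = {
            v for v in active
            if all(prio[v] < prio[u] for u in adj[v] if u in active)
        }
        mis |= winners
        # Winners and their neighbours leave the graph for the next round.
        removed = set(winners)
        for v in winners:
            removed |= adj[v]
        active -= removed
    return mis


if __name__ == "__main__":
    graph = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4}, 4: {3}, 5: set()}
    print(sorted(luby_mis(graph, seed=1)))
```

In a PRAM or distributed setting each round's priority draw and local comparison would run in parallel across vertices; the loop above only serializes those rounds.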
SONOS Nonvolatile Memory Cell Programming Characteristics
NASA Technical Reports Server (NTRS)
MacLeod, Todd C.; Phillips, Thomas A.; Ho, Fat D.
2010-01-01
Silicon-oxide-nitride-oxide-silicon (SONOS) nonvolatile memory is gaining favor over conventional EEPROM FLASH memory technology. This paper characterizes the SONOS write operation using a nonquasi-static MOSFET model. This includes floating gate charge and voltage characteristics as well as tunneling current, voltage threshold and drain current characterization. The characterization of the SONOS memory cell predicted by the model closely agrees with experimental data obtained from actual SONOS memory cells. The tunnel current, drain current, threshold voltage and read drain current all closely agreed with empirical data.
NASA Technical Reports Server (NTRS)
Neece, O.
2000-01-01
Organizational learning is an umbrella term that covers a variety of topics including; learning curves, productivity, organizational memory, organizational forgetting, knowledge transfer, knowledge sharing and knowledge creation. This treatise will review some of these theories in concert with a model of how organizations learn.
Estimating Performance of Single Bus, Shared Memory Multiprocessors
1987-05-01
[Chandy78] K.M. Chandy, C.M. Sauer, "Approximate methods for analyzing queuing network models of computing systems," Computing Surveys, vol. 10, no. 3...[Denning78] P. Denning, J. Buzen, "The operational analysis of queueing network models," Computing Surveys, vol. 10, no. 3, September 1978, pp. 225-261
Schapiro, Anna C; McDevitt, Elizabeth A; Chen, Lang; Norman, Kenneth A; Mednick, Sara C; Rogers, Timothy T
2017-11-01
Semantic memory encompasses knowledge about both the properties that typify concepts (e.g. robins, like all birds, have wings) as well as the properties that individuate conceptually related items (e.g. robins, in particular, have red breasts). We investigate the impact of sleep on new semantic learning using a property inference task in which both kinds of information are initially acquired equally well. Participants learned about three categories of novel objects possessing some properties that were shared among category exemplars and others that were unique to an exemplar, with exposure frequency varying across categories. In Experiment 1, memory for shared properties improved and memory for unique properties was preserved across a night of sleep, while memory for both feature types declined over a day awake. In Experiment 2, memory for shared properties improved across a nap, but only for the lower-frequency category, suggesting a prioritization of weakly learned information early in a sleep period. The increase was significantly correlated with amount of REM, but was also observed in participants who did not enter REM, suggesting involvement of both REM and NREM sleep. The results provide the first evidence that sleep improves memory for the shared structure of object categories, while simultaneously preserving object-unique information.
A Tutorial on Parallel and Concurrent Programming in Haskell
NASA Astrophysics Data System (ADS)
Peyton Jones, Simon; Singh, Satnam
This practical tutorial introduces the features available in Haskell for writing parallel and concurrent programs. We first describe how to write semi-explicit parallel programs by using annotations to express opportunities for parallelism and to help control the granularity of parallelism for effective execution on modern operating systems and processors. We then describe the mechanisms provided by Haskell for writing explicitly parallel programs with a focus on the use of software transactional memory to help share information between threads. Finally, we show how nested data parallelism can be used to write deterministically parallel programs which allows programmers to use rich data types in data parallel programs which are automatically transformed into flat data parallel versions for efficient execution on multi-core processors.
Modeling of Sonos Memory Cell Erase Cycle
NASA Technical Reports Server (NTRS)
Phillips, Thomas A.; MacLeod, Todd C.; Ho, Fat D.
2010-01-01
Silicon-oxide-nitride-oxide-silicon (SONOS) nonvolatile semiconductor memories (NVSMs) have many advantages. These memories are electrically erasable programmable read-only memories (EEPROMs). They utilize low programming voltages, endure extended erase/write cycles, are inherently resistant to radiation, and are compatible with high-density scaled CMOS for low power, portable electronics. The SONOS memory cell erase cycle was investigated using a nonquasi-static (NQS) MOSFET model. The SONOS floating gate charge and voltage, tunneling current, threshold voltage, and drain current were characterized during an erase cycle. Comparisons were made between the model predictions and experimental device data.
Camos, Valérie; Barrouillet, Pierre
2014-01-01
Working memory is the structure devoted to the maintenance of information at short term during concurrent processing activities. In this respect, the question regarding the nature of the mechanisms and systems fulfilling this maintenance function is of particular importance and has received various responses in the recent past. In the time-based resource-sharing (TBRS) model, we suggest that only two systems sustain the maintenance of information at the short term, counteracting the deleterious effect of temporal decay and interference. A non-attentional mechanism of verbal rehearsal, similar to the one described by Baddeley in the phonological loop model, uses language processes to reactivate phonological memory traces. Besides this domain-specific mechanism, an executive loop allows the reconstruction of memory traces through an attention-based mechanism of refreshing. The present paper reviews evidence of the involvement of these two independent systems in the maintenance of verbal memory items. PMID:25426049
ERIC Educational Resources Information Center
Hayes-Roth, Barbara
Two kinds of memory organization are distinguished: segregated versus integrated. In segregated memory organizations, related learned propositions have separate memory representations. In integrated memory organizations, memory representations of related propositions share common subrepresentations. Segregated memory organizations facilitate…
Wiese, Holger; Schweinberger, Stefan R
2015-01-01
The present study examined whether semantic memory for newly learned people is structured by visual co-occurrence, shared semantics, or both. Participants were trained with pairs of simultaneously presented (i.e., co-occurring) preexperimentally unfamiliar faces, which either did or did not share additionally provided semantic information (occupation, place of living, etc.). Semantic information could also be shared between faces that did not co-occur. A subsequent priming experiment revealed faster responses for both co-occurrence/no shared semantics and no co-occurrence/shared semantics conditions, than for an unrelated condition. Strikingly, priming was strongest in the co-occurrence/shared semantics condition, suggesting additive effects of these factors. Additional analysis of event-related brain potentials yielded priming in the N400 component only for combined effects of visual co-occurrence and shared semantics, with more positive amplitudes in this than in the unrelated condition. Overall, these findings suggest that both semantic relatedness and visual co-occurrence are important when novel information is integrated into person-related semantic memory.
Parallel Navier-Stokes computations on shared and distributed memory architectures
NASA Technical Reports Server (NTRS)
Hayder, M. Ehtesham; Jayasimha, D. N.; Pillay, Sasi Kumar
1995-01-01
We study a high order finite difference scheme to solve the time accurate flow field of a jet using the compressible Navier-Stokes equations. As part of our ongoing efforts, we have implemented our numerical model on three parallel computing platforms to study the computational, communication, and scalability characteristics. The platforms chosen for this study are a cluster of workstations connected through fast networks (the LACE experimental testbed at NASA Lewis), a shared memory multiprocessor (the Cray YMP), and a distributed memory multiprocessor (the IBM SP1). Our focus in this study is on the LACE testbed. We present some results for the Cray YMP and the IBM SP1 mainly for comparison purposes. On the LACE testbed, we study: (1) the communication characteristics of Ethernet, FDDI, and the ALLNODE networks and (2) the overheads induced by the PVM message passing library used for parallelizing the application. We demonstrate that clustering of workstations is effective and has the potential to be computationally competitive with supercomputers at a fraction of the cost.
Expressing Parallelism with ROOT
NASA Astrophysics Data System (ADS)
Piparo, D.; Tejedor, E.; Guiraud, E.; Ganis, G.; Mato, P.; Moneta, L.; Valls Pla, X.; Canal, P.
2017-10-01
The need for processing the ever-increasing amount of data generated by the LHC experiments in a more efficient way has motivated ROOT to further develop its support for parallelism. Such support is being tackled both for shared-memory and distributed-memory environments. The incarnations of the aforementioned parallelism are multi-threading, multi-processing and cluster-wide executions. In the area of multi-threading, we discuss the new implicit parallelism and related interfaces, as well as the new building blocks to safely operate with ROOT objects in a multi-threaded environment. Regarding multi-processing, we review the new MultiProc framework, comparing it with similar tools (e.g. multiprocessing module in Python). Finally, as an alternative to PROOF for cluster-wide executions, we introduce the efforts on integrating ROOT with state-of-the-art distributed data processing technologies like Spark, both in terms of programming model and runtime design (with EOS as one of the main components). For all the levels of parallelism, we discuss, based on real-life examples and measurements, how our proposals can increase the productivity of scientists.
Expressing Parallelism with ROOT
DOE Office of Scientific and Technical Information (OSTI.GOV)
Piparo, D.; Tejedor, E.; Guiraud, E.
The need for processing the ever-increasing amount of data generated by the LHC experiments in a more efficient way has motivated ROOT to further develop its support for parallelism. Such support is being tackled both for shared-memory and distributed-memory environments. The incarnations of the aforementioned parallelism are multi-threading, multi-processing and cluster-wide executions. In the area of multi-threading, we discuss the new implicit parallelism and related interfaces, as well as the new building blocks to safely operate with ROOT objects in a multi-threaded environment. Regarding multi-processing, we review the new MultiProc framework, comparing it with similar tools (e.g. multiprocessing module in Python). Finally, as an alternative to PROOF for cluster-wide executions, we introduce the efforts on integrating ROOT with state-of-the-art distributed data processing technologies like Spark, both in terms of programming model and runtime design (with EOS as one of the main components). For all the levels of parallelism, we discuss, based on real-life examples and measurements, how our proposals can increase the productivity of scientists.
System and method for programmable bank selection for banked memory subsystems
Blumrich, Matthias A.; Chen, Dong; Gara, Alan G.; Giampapa, Mark E.; Hoenicke, Dirk; Ohmacht, Martin; Salapura, Valentina; Sugavanam, Krishnan
2010-09-07
A programmable memory system and method for enabling one or more processor devices access to shared memory in a computing environment, the shared memory including one or more memory storage structures having addressable locations for storing data. The system comprises: one or more first logic devices associated with a respective one or more processor devices, each first logic device for receiving physical memory address signals and programmable for generating a respective memory storage structure select signal upon receipt of pre-determined address bit values at selected physical memory address bit locations; and, a second logic device responsive to each of the respective select signal for generating an address signal used for selecting a memory storage structure for processor access. The system thus enables each processor device of a computing environment memory storage access distributed across the one or more memory storage structures.
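As a rough illustration of the idea of deriving a bank select from chosen physical-address bits, the sketch below gathers the bits at a programmable list of positions into a bank index. This is a simplified software analogy under stated assumptions, not the patented logic: the function name `bank_select`, the `bit_positions` parameter, and the 128-byte line size used in the usage example are all hypothetical.

```python
def bank_select(addr, bit_positions):
    """Build a bank index from selected bits of a physical address.

    bit_positions is the programmable part of this sketch: a list of
    address bit locations, least significant select bit first.
    """
    bank = 0
    for i, pos in enumerate(bit_positions):
        # Extract the address bit at 'pos' and place it at select bit 'i'.
        bank |= ((addr >> pos) & 1) << i
    return bank


if __name__ == "__main__":
    # Assumed 128-byte lines and four banks: using address bits 7 and 8
    # spreads consecutive lines across banks 0, 1, 2, 3.
    for line in range(4):
        addr = line * 128
        print(hex(addr), "-> bank", bank_select(addr, [7, 8]))
```

Choosing different bit positions changes how an access stream is spread over the storage structures, which is the property the abstract attributes to the programmable select logic.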
Vera, Javier
2018-01-01
What is the influence of short-term memory enhancement on the emergence of grammatical agreement systems in multi-agent language games? Agreement systems suppose that at least two words share some features with each other, such as gender, number, or case. Previous work within the multi-agent language-game framework has recently proposed models stressing the hypothesis that a grammatical agreement system emerges from the minimization of semantic ambiguity. On the other hand, neurobiological evidence argues for the hypothesis that language evolution has been mainly related to an increase in short-term memory capacity, which has allowed the online manipulation of words and meanings participating particularly in grammatical agreement systems. Here, the main aim is to propose a multi-agent language game for the emergence of a grammatical agreement system, under measurable long-range relations depending on the short-term memory capacity. Computer simulations, based on a parameter that measures the amount of short-term memory capacity, suggest that agreement marker systems arise in a population of agents equipped with at least a critical short-term memory capacity.
Early programming and late-acting checkpoints governing the development of CD4 T cell memory.
Dhume, Kunal; McKinstry, K Kai
2018-04-27
CD4 T cells contribute to protection against pathogens through numerous mechanisms. Incorporating the goal of memory CD4 T cell generation into vaccine strategies thus offers a powerful approach to improve their efficacy, especially in situations where humoral responses alone cannot confer long-term immunity. These threats include viruses such as influenza that mutate coat proteins to avoid neutralizing antibodies, but that are targeted by T cells that recognize more conserved protein epitopes shared by different strains. A major barrier in the design of such vaccines is that the mechanisms controlling the efficiency with which memory cells form remain incompletely understood. Here, we discuss recent insights into fate decisions controlling memory generation. We focus on the importance of three general cues: interleukin-2, antigen, and costimulatory interactions. It is increasingly clear that these signals have a powerful influence on the capacity of CD4 T cells to form memory during two distinct phases of the immune response. First, through 'programming' that occurs during initial priming, and second, through 'checkpoints' that operate later during the effector stage. These findings indicate that novel vaccine strategies must seek to optimize cognate interactions, during which interleukin-2-, antigen-, and costimulation-dependent signals are tightly linked, well beyond initial antigen encounter to induce robust memory CD4 T cells. This article is protected by copyright. All rights reserved.
Initial Performance Results on IBM POWER6
NASA Technical Reports Server (NTRS)
Saini, Subhash; Talcott, Dale; Jespersen, Dennis; Djomehri, Jahed; Jin, Haoqiang; Mehrotra, Piyush
2008-01-01
The POWER5+ processor has a faster memory bus than that of the previous generation POWER5 processor (533 MHz vs. 400 MHz), but the measured per-core memory bandwidth of the latter is better than that of the former (5.7 GB/s vs. 4.3 GB/s). The reason for this is that in the POWER5+, the two cores on the chip share the L2 cache, L3 cache and memory bus. The memory controller is also on the chip and is shared by the two cores. This serializes the path to memory. For consistently good performance on a wide range of applications, the performance of the processor, the memory subsystem, and the interconnects (both latency and bandwidth) should be balanced. Recognizing this, IBM has designed the POWER6 processor so as to avoid the bottlenecks due to the L2 cache, memory controller and buffer chips of the POWER5+. Unlike the POWER5+, each core in the POWER6 has its own L2 cache (4 MB, double that of the POWER5+), memory controller and buffer chips. Each core in the POWER6 runs at 4.7 GHz instead of 1.9 GHz in the POWER5+. In this paper, we evaluate the performance of a dual-core POWER6 based IBM p6-570 system, and we compare its performance with that of a dual-core POWER5+ based IBM p575+ system. In this evaluation, we have used the High-Performance Computing Challenge (HPCC) benchmarks, NAS Parallel Benchmarks (NPB), and four real-world applications: three from computational fluid dynamics and one from climate modeling.
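The figures quoted in this abstract can be set side by side with a few lines of arithmetic. The sketch below only restates the abstract's numbers as ratios; the variable names are illustrative and do not come from the report.

```python
# Figures quoted in the abstract; names are illustrative, not from the report.
bus_mhz = {"POWER5": 400.0, "POWER5+": 533.0}      # memory bus speed
measured_gb_s = {"POWER5": 5.7, "POWER5+": 4.3}    # measured per-core bandwidth
clock_ghz = {"POWER5+": 1.9, "POWER6": 4.7}        # core clock

# Ratios of the newer part to the older part.
bus_ratio = bus_mhz["POWER5+"] / bus_mhz["POWER5"]
bandwidth_ratio = measured_gb_s["POWER5+"] / measured_gb_s["POWER5"]
clock_ratio = clock_ghz["POWER6"] / clock_ghz["POWER5+"]

print(f"POWER5+ bus speed relative to POWER5:       {bus_ratio:.2f}x")
print(f"POWER5+ measured bandwidth relative to it:  {bandwidth_ratio:.2f}x")
print(f"POWER6 core clock relative to POWER5+:      {clock_ratio:.2f}x")
```

Read this way, the POWER5+ bus is about a third faster on paper while the measured per-core bandwidth is about a quarter lower, which is the imbalance the abstract attributes to the shared path to memory.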
Short-term plasticity as a neural mechanism supporting memory and attentional functions.
Jääskeläinen, Iiro P; Ahveninen, Jyrki; Andermann, Mark L; Belliveau, John W; Raij, Tommi; Sams, Mikko
2011-11-08
Based on behavioral studies, several relatively distinct perceptual and cognitive functions have been defined in cognitive psychology such as sensory memory, short-term memory, and selective attention. Here, we review evidence suggesting that some of these functions may be supported by shared underlying neuronal mechanisms. Specifically, we present, based on an integrative review of the literature, a hypothetical model wherein short-term plasticity, in the form of transient center-excitatory and surround-inhibitory modulations, constitutes a generic processing principle that supports sensory memory, short-term memory, involuntary attention, selective attention, and perceptual learning. In our model, the size and complexity of receptive fields/level of abstraction of neural representations, as well as the length of temporal receptive windows, increases as one steps up the cortical hierarchy. Consequently, the type of input (bottom-up vs. top down) and the level of cortical hierarchy that the inputs target, determine whether short-term plasticity supports purely sensory vs. semantic short-term memory or attentional functions. Furthermore, we suggest that rather than discrete memory systems, there are continuums of memory representations from short-lived sensory ones to more abstract longer-duration representations, such as those tapped by behavioral studies of short-term memory. Copyright © 2011 Elsevier B.V. All rights reserved.
A theory of working memory without consciousness or sustained activity
Trübutschek, Darinka; Marti, Sébastien; Ojeda, Andrés; King, Jean-Rémi; Mi, Yuanyuan; Tsodyks, Misha; Dehaene, Stanislas
2017-01-01
Working memory and conscious perception are thought to share similar brain mechanisms, yet recent reports of non-conscious working memory challenge this view. Combining visual masking with magnetoencephalography, we investigate the reality of non-conscious working memory and dissect its neural mechanisms. In a spatial delayed-response task, participants reported the location of a subjectively unseen target above chance-level after several seconds. Conscious perception and conscious working memory were characterized by similar signatures: a sustained desynchronization in the alpha/beta band over frontal cortex, and a decodable representation of target location in posterior sensors. During non-conscious working memory, such activity vanished. Our findings contradict models that identify working memory with sustained neural firing, but are compatible with recent proposals of ‘activity-silent’ working memory. We present a theoretical framework and simulations showing how slowly decaying synaptic changes allow cell assemblies to go dormant during the delay, yet be retrieved above chance-level after several seconds. DOI: http://dx.doi.org/10.7554/eLife.23871.001 PMID:28718763
Lapierre, Mark D; Cropper, Simon J; Howe, Piers D L
2017-01-01
To understand how the visual system represents multiple moving objects and how those representations contribute to tracking, it is essential that we understand how the processes of attention and working memory interact. In the work described here we present an investigation of that interaction via a series of tracking and working memory dual-task experiments. Previously, it has been argued that tracking is resistant to disruption by a concurrent working memory task and that any apparent disruption is in fact due to observers making a response to the working memory task, rather than due to competition for shared resources. Contrary to this, in our experiments we find that when task order and response order confounds are avoided, all participants show a similar decrease in both tracking and working memory performance. However, if task and response order confounds are not adequately controlled for, we find substantial individual differences, which could explain the previous conflicting reports on this topic. Our results provide clear evidence that tracking and working memory tasks share processing resources. PMID:28410383
Vergauwe, Evie; Barrouillet, Pierre; Camos, Valérie
2009-07-01
Examinations of interference between visual and spatial materials in working memory have suggested domain- and process-based fractionations of visuo-spatial working memory. The present study examined the role of central time-based resource sharing in visuo-spatial working memory and assessed its role in obtained interference patterns. Visual and spatial storage were combined with both visual and spatial on-line processing components in computer-paced working memory span tasks (Experiment 1) and in a selective interference paradigm (Experiment 2). The cognitive load of the processing components was manipulated to investigate its impact on concurrent maintenance for both within-domain and between-domain combinations of processing and storage components. In contrast to both domain- and process-based fractionations of visuo-spatial working memory, the results revealed that recall performance was determined by the cognitive load induced by the processing of items, rather than by the domain to which those items pertained. These findings are interpreted as evidence for a time-based resource-sharing mechanism in visuo-spatial working memory.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hull, L.C.
The Prickett and Lonnquist two-dimensional groundwater model has been programmed for the Apple II minicomputer. Both leaky and nonleaky confined aquifers can be simulated. The model was adapted from the FORTRAN version of Prickett and Lonnquist. In the configuration presented here, the program requires 64 K bits of memory. Because of the large number of arrays used in the program, and memory limitations of the Apple II, the maximum grid size that can be used is 20 rows by 20 columns. Input to the program is interactive, with prompting by the computer. Output consists of predicted head values at the row-column intersections (nodes).
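As a rough picture of what a grid-based groundwater model of this kind computes, the sketch below relaxes hydraulic head on a 20 by 20 node grid for a steady-state, homogeneous, confined aquifer. It is a plain Jacobi iteration written for illustration, not the Prickett and Lonnquist algorithm or the Apple II program; the function name, boundary heads, and tolerance are all assumptions.

```python
def solve_heads(rows=20, cols=20, left=100.0, right=90.0,
                tol=1e-6, max_iter=20000):
    """Jacobi relaxation of steady-state head on a rows x cols node grid.

    Assumptions (not from the report): homogeneous transmissivity, no
    pumping or leakage, fixed heads on the left and right columns, and
    no-flow top and bottom edges.
    """
    heads = [[left] * cols for _ in range(rows)]
    for r in range(rows):
        heads[r][cols - 1] = right          # fixed-head boundary column

    for _ in range(max_iter):
        new = [row[:] for row in heads]
        biggest_change = 0.0
        for r in range(rows):
            for c in range(1, cols - 1):    # boundary columns stay fixed
                # Mirror the neighbour across a no-flow edge.
                up = heads[r - 1][c] if r > 0 else heads[r + 1][c]
                down = heads[r + 1][c] if r < rows - 1 else heads[r - 1][c]
                new[r][c] = 0.25 * (up + down
                                    + heads[r][c - 1] + heads[r][c + 1])
                biggest_change = max(biggest_change,
                                     abs(new[r][c] - heads[r][c]))
        heads = new
        if biggest_change < tol:
            break
    return heads


heads = solve_heads()
# Print one row of predicted heads at the nodes, left to right.
print([round(h, 2) for h in heads[0]])
```

With these boundary conditions the head should fall off linearly from the left column to the right, which gives a simple check on the solver. The 20 by 20 default matches the maximum grid size mentioned in the abstract.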
Working Memory and Decision-Making in a Frontoparietal Circuit Model.
Murray, John D; Jaramillo, Jorge; Wang, Xiao-Jing
2017-12-13
Working memory (WM) and decision-making (DM) are fundamental cognitive functions involving a distributed interacting network of brain areas, with the posterior parietal cortex (PPC) and prefrontal cortex (PFC) at the core. However, the shared and distinct roles of these areas and the nature of their coordination in cognitive function remain poorly understood. Biophysically based computational models of cortical circuits have provided insights into the mechanisms supporting these functions, yet they have primarily focused on the local microcircuit level, raising questions about the principles for distributed cognitive computation in multiregional networks. To examine these issues, we developed a distributed circuit model of two reciprocally interacting modules representing PPC and PFC circuits. The circuit architecture includes hierarchical differences in local recurrent structure and implements reciprocal long-range projections. This parsimonious model captures a range of behavioral and neuronal features of frontoparietal circuits across multiple WM and DM paradigms. In the context of WM, both areas exhibit persistent activity, but, in response to intervening distractors, PPC transiently encodes distractors while PFC filters distractors and supports WM robustness. With regard to DM, the PPC module generates graded representations of accumulated evidence supporting target selection, while the PFC module generates more categorical responses related to action or choice. These findings suggest computational principles for distributed, hierarchical processing in cortex during cognitive function and provide a framework for extension to multiregional models. SIGNIFICANCE STATEMENT Working memory and decision-making are fundamental "building blocks" of cognition, and deficits in these functions are associated with neuropsychiatric disorders such as schizophrenia. These cognitive functions engage distributed networks with prefrontal cortex (PFC) and posterior parietal cortex (PPC) at the core. It is not clear, however, what the contributions of PPC and PFC are in light of the computations that subserve working memory and decision-making. We constructed a biophysical model of a reciprocally connected frontoparietal circuit that revealed shared and distinct functions for the PFC and PPC across working memory and decision-making tasks. Our parsimonious model connects circuit-level properties to cognitive functions and suggests novel design principles beyond those of local circuits for cognitive processing in multiregional brain networks. Copyright © 2017 the authors. PMID:29114071
Destination memory impairment in older people.
Gopie, Nigel; Craik, Fergus I M; Hasher, Lynn
2010-12-01
Older adults are assumed to have poor destination memory (knowing to whom they tell particular information), and anecdotes about them repeating stories to the same people are cited as informal evidence for this claim. Experiment 1 assessed young and older adults' destination memory by having participants tell facts (e.g., "A dime has 118 ridges around its edge") to pictures of famous people (e.g., Oprah Winfrey). Surprise recognition memory tests, which also assessed confidence, revealed that older adults, compared to young adults, were disproportionately impaired on destination memory relative to spared memory for the individual components (i.e., facts, faces) of the episode. Older adults also were more confident that they had not told a fact to a particular person when they actually had (i.e., a miss); this presumably causes them to repeat information more often than young adults. When the direction of information transfer was reversed in Experiment 2, such that the famous people shared information with the participants (i.e., a source memory experiment), age-related memory differences disappeared. In contrast to the destination memory experiment, older adults in the source memory experiment were more confident than young adults that someone had shared a fact with them when a different person actually had shared the fact (i.e., a false alarm). Overall, accuracy and confidence jointly influence age-related changes to destination memory, a fundamental component of successful communication. (c) 2010 APA, all rights reserved.
Slot, Esther M; van Viersen, Sietske; de Bree, Elise H; Kroesbergen, Evelyn H
2016-01-01
High comorbidity rates have been reported between mathematical learning disabilities (MD) and reading and spelling disabilities (RSD). Research has identified skills related to math, such as number sense (NS) and visuospatial working memory (visuospatial WM), as well as to literacy, such as phonological awareness (PA), rapid automatized naming (RAN) and verbal short-term memory (Verbal STM). In order to explain the high comorbidity rates between MD and RSD, 7-11-year-old children were assessed on a range of cognitive abilities related to literacy (PA, RAN, Verbal STM) and mathematical ability (visuospatial WM, NS). The group of children consisted of typically developing (TD) children (n = 32), children with MD (n = 26), children with RSD (n = 29), and combined MD and RSD (n = 43). It was hypothesized that, in line with the multiple deficit view on learning disorders, at least one unique predictor for both MD and RSD and a possible shared cognitive risk factor would be found to account for the comorbidity between the symptom dimensions literacy and math. Secondly, our hypotheses were that (a) a probabilistic multi-factorial risk factor model would provide a better fit to the data than a deterministic single risk factor model and (b) that a shared risk factor model would provide a better fit than the specific multi-factorial model. All our hypotheses were confirmed. NS and visuospatial WM were identified as unique cognitive predictors for MD, whereas PA and RAN were both associated with RSD. Also, a shared risk factor model with PA as a cognitive predictor for both RSD and MD fitted the data best, indicating that MD and RSD might co-occur due to a shared underlying deficit in phonological processing. Possible explanations are discussed in the context of sample selection and composition. This study shows that different cognitive factors play a role in mathematics and literacy, and that a phonological processing deficit might play a role in the occurrence of MD and RSD.
Effects of Network Structure, Competition and Memory Time on Social Spreading Phenomena
NASA Astrophysics Data System (ADS)
Gleeson, James P.; O'Sullivan, Kevin P.; Baños, Raquel A.; Moreno, Yamir
2016-04-01
Online social media has greatly affected the way in which we communicate with each other. However, little is known about what fundamental mechanisms drive dynamical information flow in online social systems. Here, we introduce a generative model for online sharing behavior that is analytically tractable and that can reproduce several characteristics of empirical micro-blogging data on hashtag usage, such as (time-dependent) heavy-tailed distributions of meme popularity. The presented framework constitutes a null model for social spreading phenomena that, in contrast to purely empirical studies or simulation-based models, clearly distinguishes the roles of two distinct factors affecting meme popularity: the memory time of users and the connectivity structure of the social network.
Genetic Complexity of Episodic Memory: A Twin Approach to Studies of Aging
Kremen, William S.; Spoon, Kelly M.; Jacobson, Kristen C.; Vasilopoulos, Terrie; McCaffery, Jeanne M.; Panizzon, Matthew S.; Franz, Carol E.; Vuoksimaa, Eero; Xian, Hong; Rana, Brinda K.; Toomey, Rosemary; McKenzie, Ruth; Lyons, Michael J.
2016-01-01
Episodic memory change is a central issue in cognitive aging, and understanding that process will require elucidation of its genetic underpinnings. A key limiting factor in genetically informed research on memory has been lack of attention to genetic and phenotypic complexity, as if “memory is memory” and all well-validated assessments are essentially equivalent. Here we applied multivariate twin models to data from late-middle-aged participants in the Vietnam Era Twin Study of Aging to examine the genetic architecture of 6 measures from 3 standard neuropsychological tests: the California Verbal Learning Test-2, and Wechsler Memory Scale-III Logical Memory (LM) and Visual Reproductions (VR). An advantage of the twin method is that it can estimate the extent to which latent genetic influences are shared or independent across different measures before knowing which specific genes are involved. The best-fitting model was a higher order common pathways model with a heritable higher order general episodic memory factor and three test-specific subfactors. More importantly, substantial genetic variance was accounted for by genetic influences that were specific to the latent LM and VR subfactors (28% and 30%, respectively) and independent of the general factor. Such unique genetic influences could partially account for replication failures. Moreover, if different genes influence different memory phenotypes, they could well have different age-related trajectories. This approach represents an important step toward providing critical information for all types of genetically informative studies of aging and memory. PMID:24956007
NASA Astrophysics Data System (ADS)
Gan, T.; Tarboton, D. G.; Dash, P. K.; Gichamo, T.; Horsburgh, J. S.
2017-12-01
Web based apps, web services and online data and model sharing technology are becoming increasingly available to support research. This promises benefits in terms of collaboration, platform independence, transparency and reproducibility of modeling workflows and results. However, challenges still exist in real application of these capabilities and the programming skills researchers need to use them. In this research we combined hydrologic modeling web services with an online data and model sharing system to develop functionality to support reproducible hydrologic modeling work. We used HydroDS, a system that provides web services for input data preparation and execution of a snowmelt model, and HydroShare, a hydrologic information system that supports the sharing of hydrologic data, model and analysis tools. To make the web services easy to use, we developed a HydroShare app (based on the Tethys platform) to serve as a browser based user interface for HydroDS. In this integration, HydroDS receives web requests from the HydroShare app to process the data and execute the model. HydroShare supports storage and sharing of the results generated by HydroDS web services. The snowmelt modeling example served as a use case to test and evaluate this approach. We show that, after the integration, users can prepare model inputs or execute the model through the web user interface of the HydroShare app without writing program code. The model input/output files and metadata describing the model instance are stored and shared in HydroShare. These files include a Python script that is automatically generated by the HydroShare app to document and reproduce the model input preparation workflow. Once stored in HydroShare, inputs and results can be shared with other users, or published so that other users can directly discover, repeat or modify the modeling work. 
This approach provides a collaborative environment that integrates hydrologic web services with a data and model sharing system to enable model development and execution. The entire system, comprising the HydroShare app, HydroShare, and the HydroDS web services, is open source and contributes to the capability for web-based modeling research.
Shared Processing of Language and Music.
Atherton, Ryan P; Chrobak, Quin M; Rauscher, Frances H; Karst, Aaron T; Hanson, Matt D; Steinert, Steven W; Bowe, Kyra L
2018-01-01
The present study sought to explore whether musical information is processed by the phonological loop component of the working memory model of immediate memory. Original instantiations of this model primarily focused on the processing of linguistic information. However, the model was less clear about how acoustic information lacking phonological qualities is actively processed. Although previous research has generally supported shared processing of phonological and musical information, these studies were limited as a result of a number of methodological concerns (e.g., the use of simple tones as musical stimuli). In order to further investigate this issue, an auditory interference task was employed. Specifically, participants heard an initial stimulus (musical or linguistic) followed by an intervening stimulus (musical, linguistic, or silence) and were then asked to indicate whether a final test stimulus was the same as or different from the initial stimulus. Results indicated that mismatched interference conditions (i.e., musical - linguistic; linguistic - musical) resulted in greater interference than silence conditions, with matched interference conditions producing the greatest interference. Overall, these results suggest that processing of linguistic and musical information draws on at least some of the same cognitive resources.
Participative management and shared leadership: implementing a model.
Noonan, D
1995-01-01
The author identifies the development, implementation and outcomes of a task subgroup model of management that provides a mechanism for shared leadership, planning, decision making, implementation and evaluation by staff, patients and families on a program level. The conceptual model and its operationalization are outlined within the context of the rehabilitation program at the Providence Centre in Scarborough, Ontario.
Creativity and psychopathology: a shared vulnerability model.
Carson, Shelley H
2011-03-01
Creativity is considered a positive personal trait. However, highly creative people have demonstrated elevated risk for certain forms of psychopathology, including mood disorders, schizophrenia spectrum disorders, and alcoholism. A model of shared vulnerability explains the relation between creativity and psychopathology. This model, supported by recent findings from neuroscience and molecular genetics, suggests that the biological determinants conferring risk for psychopathology interact with protective cognitive factors to enhance creative ideation. Elements of shared vulnerability include cognitive disinhibition (which allows more stimuli into conscious awareness), an attentional style driven by novelty salience, and neural hyperconnectivity that may increase associations among disparate stimuli. These vulnerabilities interact with superior meta-cognitive protective factors, such as high IQ, increased working memory capacity, and enhanced cognitive flexibility, to enlarge the range and depth of stimuli available in conscious awareness to be manipulated and combined to form novel and original ideas.
Low working memory capacity is only spuriously related to poor reading comprehension.
Van Dyke, Julie A; Johns, Clinton L; Kukona, Anuenue
2014-06-01
Accounts of comprehension failure, whether in the case of readers with poor skill or when syntactic complexity is high, have overwhelmingly implicated working memory capacity as the key causal factor. However, extant research suggests that this position is not well supported by evidence on the span of active memory during online sentence processing, nor is it well motivated by models that make explicit claims about the memory mechanisms that support language processing. The current study suggests that sensitivity to interference from similar items in memory may provide a better explanation of comprehension failure. Through administration of a comprehensive skill battery, we found that the previously observed association of working memory with comprehension is likely due to the collinearity of working memory with many other reading-related skills, especially IQ. In analyses which removed variance shared with IQ, we found that receptive vocabulary knowledge was the only significant predictor of comprehension performance in our task out of a battery of 24 skill measures. In addition, receptive vocabulary and non-verbal memory for serial order (but not simple verbal memory or working memory) were the only predictors of reading times in the region where interference had its primary effect. We interpret these results in light of a model that emphasizes retrieval interference and the quality of lexical representations as key determinants of successful comprehension. Copyright © 2014 Elsevier B.V. All rights reserved.
El-Zawawy, Mohamed A.
2014-01-01
This paper introduces new approaches for the analysis of frequent statement and dereference elimination for imperative and object-oriented distributed programs running on parallel machines equipped with hierarchical memories. The paper uses languages whose address spaces are globally partitioned. Distributed programs allow defining data layout and threads writing to and reading from other thread memories. Three type systems (for imperative distributed programs) are the tools of the proposed techniques. The first type system defines for every program point a set of calculated (ready) statements and memory accesses. The second type system uses an enriched version of the types of the first type system and determines which of the ready statements and memory accesses are used later in the program. The third type system uses the information gathered so far to eliminate unnecessary statement computations and memory accesses (the analysis of frequent statement and dereference elimination). Extensions to these type systems are also presented to cover object-oriented distributed programs. Two advantages of our work over related work are the following. The hierarchical style of concurrent parallel computers is similar to the memory model used in this paper. In our approach, each analysis result is assigned a type derivation, which serves as a correctness proof. PMID:24892098
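The idea of tracking which computations and memory reads are already "ready" and reusing them can be pictured with a much simpler classical technique. The sketch below performs local common-subexpression elimination over straight-line three-address code using a table of available expressions. It is a stand-in for illustration only, not the paper's type systems: it ignores distribution, control flow, and stores to memory, and the tuple representation and function name are assumptions.

```python
def eliminate_repeated_computations(program):
    """Replace recomputed expressions and repeated loads with copies.

    program: list of (target, op, args) statements in execution order,
             where args is a tuple of variable names (hypothetical
             representation). A "load" op stands for a dereference; the
             sketch assumes no stores happen in between.
    """
    available = {}   # (op, args) -> variable currently holding that value
    result = []
    for target, op, args in program:
        key = (op, args)
        holder = available.get(key)
        if holder == target:
            continue                      # target already holds this value
        if holder is not None:
            result.append((target, "copy", (holder,)))
        else:
            result.append((target, op, args))
        # Writing to target kills expressions that read it or live in it.
        available = {k: v for k, v in available.items()
                     if v != target and target not in k[1]}
        if holder is None and target not in args:
            available[key] = target
    return result


program = [
    ("t1", "add", ("a", "b")),
    ("t2", "load", ("p",)),
    ("t3", "add", ("a", "b")),    # repeats t1
    ("a", "add", ("t2", "t3")),   # overwrites a, so add(a, b) is stale
    ("t4", "add", ("a", "b")),    # must be recomputed
    ("t5", "load", ("p",)),       # repeats the dereference held in t2
]
optimized = eliminate_repeated_computations(program)
for statement in optimized:
    print(statement)
```

In the example, the second `add(a, b)` and the second `load(p)` become copies, while the `add(a, b)` after `a` is reassigned is kept. A real analysis for partitioned address spaces would also have to invalidate loads when another thread may write the same location, which this sketch does not model.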
HydroShare: A Platform for Collaborative Data and Model Sharing in Hydrology
NASA Astrophysics Data System (ADS)
Tarboton, D. G.; Idaszak, R.; Horsburgh, J. S.; Ames, D. P.; Goodall, J. L.; Couch, A.; Hooper, R. P.; Dash, P. K.; Stealey, M.; Yi, H.; Bandaragoda, C.; Castronova, A. M.
2017-12-01
HydroShare is an online, collaboration system for sharing of hydrologic data, analytical tools, and models. It supports the sharing of and collaboration around "resources" which are defined by standardized content types for data formats and models commonly used in hydrology. With HydroShare you can: Share your data and models with colleagues; Manage who has access to the content that you share; Share, access, visualize and manipulate a broad set of hydrologic data types and models; Use the web services application programming interface (API) to program automated and client access; Publish data and models and obtain a citable digital object identifier (DOI); Aggregate your resources into collections; Discover and access data and models published by others; Use web apps to visualize, analyze and run models on data in HydroShare. This presentation will describe the functionality and architecture of HydroShare highlighting its use as a virtual environment supporting education and research. HydroShare has components that support: (1) resource storage, (2) resource exploration, and (3) web apps for actions on resources. The HydroShare data discovery, sharing and publishing functions as well as HydroShare web apps provide the capability to analyze data and execute models completely in the cloud (servers remote from the user) overcoming desktop platform limitations. The HydroShare GIS app provides a basic capability to visualize spatial data. The HydroShare JupyterHub Notebook app provides flexible and documentable execution of Python code snippets for analysis and modeling in a way that results can be shared among HydroShare users and groups to support research collaboration and education. We will discuss how these developments can be used to support different types of educational efforts in Hydrology where being completely web based is of value in an educational setting as students can all have access to the same functionality regardless of their computer.
An Apple II Implementation of Man-Mod Manpower Planning Model.
1982-03-01
next page. It is highly recommended, to prevent the loss of data, that the user save the data at this point. If Choice (1), yes, is selected, the...approximately 30 seconds, but will clear and reload memory preventing any inadvertent memory changes which might cause program interruptions or erroneous cal... program. 70 MAN-MOD/PROGRAM (PROGRAM LISTING) 1000 REM MAN-MOD/PROGRAM PROGRAM: "FOR" IS IN QUOTES IN LINES 1004,10518,10520,10524,10526,10528,1072
Kaji, Tomohiro; Hijikata, Atsushi; Ishige, Akiko; Kitami, Toshimori; Watanabe, Takashi; Ohara, Osamu; Yanaka, Noriyuki; Okada, Mariko; Shimoda, Michiko; Taniguchi, Masaru
2016-01-01
Memory CD4+ T cells promote protective humoral immunity; however, how memory T cells acquire this activity remains unclear. This study demonstrates that CD4+ T cells develop into antigen-specific memory T cells that can promote the terminal differentiation of memory B cells far more effectively than their naive T-cell counterparts. Memory T cell development requires the transcription factor B-cell lymphoma 6 (Bcl6), which is known to direct T-follicular helper (Tfh) cell differentiation. However, unlike Tfh cells, memory T cell development did not require germinal center B cells. Curiously, memory T cells that develop in the absence of cognate B cells cannot promote memory B-cell recall responses and this defect was accompanied by down-regulation of genes associated with homeostasis and activation and up-regulation of genes inhibitory for T-cell responses. Although memory T cells display phenotypic and genetic signatures distinct from Tfh cells, both had in common the expression of a group of genes associated with metabolic pathways. This gene expression profile was not shared to any great extent with naive T cells and was not influenced by the absence of cognate B cells during memory T cell development. These results suggest that memory T cell development is programmed by stepwise expression of gatekeeper genes through serial interactions with different types of antigen-presenting cells, first licensing the memory lineage pathway and subsequently facilitating the functional development of memory T cells. Finally, we identified Gdpd3 as a candidate genetic marker for memory T cells. PMID:26714588
Interference due to shared features between action plans is influenced by working memory span.
Fournier, Lisa R; Behmer, Lawrence P; Stubblefield, Alexandra M
2014-12-01
In this study, we examined the interactions between the action plans that we hold in memory and the actions that we carry out, asking whether the interference due to shared features between action plans is due to selection demands imposed on working memory. Individuals with low and high working memory spans learned arbitrary motor actions in response to two different visual events (A and B), presented in a serial order. They planned a response to the first event (A) and while maintaining this action plan in memory they then executed a speeded response to the second event (B). Afterward, they executed the action plan for the first event (A) maintained in memory. Speeded responses to the second event (B) were delayed when it shared an action feature (feature overlap) with the first event (A), relative to when it did not (no feature overlap). The size of the feature-overlap delay was greater for low-span than for high-span participants. This indicates that interference due to overlapping action plans is greater when fewer working memory resources are available, suggesting that this interference is due to selection demands imposed on working memory. Thus, working memory plays an important role in managing current and upcoming action plans, at least for newly learned tasks. Also, managing multiple action plans is compromised in individuals who have low versus high working memory spans.
Destination Memory Impairment in Older People
Gopie, Nigel; Craik, Fergus I. M.; Hasher, Lynn
2012-01-01
Older adults are assumed to have poor destination memory— knowing to whom they tell particular information—and anecdotes about them repeating stories to the same people are cited as informal evidence for this claim. Experiment 1 assessed young and older adults’ destination memory by having participants tell facts (e.g., “A dime has 118 ridges around its edge”) to pictures of famous people (e.g., Oprah Winfrey). Surprise recognition memory tests, which also assessed confidence, revealed that older adults, compared to young adults, were disproportionately impaired on destination memory relative to spared memory for the individual components (i.e., facts, faces) of the episode. Older adults also were more confident that they had not told a fact to a particular person when they actually had (i.e., a miss); this presumably causes them to repeat information more often than young adults. When the direction of information transfer was reversed in Experiment 2, such that the famous people shared information with the participants (i.e., a source memory experiment), age-related memory differences disappeared. In contrast to the destination memory experiment, older adults in the source memory experiment were more confident than young adults that someone had shared a fact with them when a different person actually had shared the fact (i.e., a false alarm). Overall, accuracy and confidence jointly influence age-related changes to destination memory, a fundamental component of successful communication. PMID:20718537
GPU-accelerated phase-field simulation of dendritic solidification in a binary alloy
NASA Astrophysics Data System (ADS)
Yamanaka, Akinori; Aoki, Takayuki; Ogawa, Satoi; Takaki, Tomohiro
2011-03-01
The phase-field simulation for dendritic solidification of a binary alloy has been accelerated by using a graphics processing unit (GPU). To perform the phase-field simulation of the alloy solidification on GPU, a program code was developed with Compute Unified Device Architecture (CUDA). In this paper, the implementation technique of the phase-field model on GPU is presented. Also, we evaluated the acceleration performance of the three-dimensional solidification simulation by using a single NVIDIA TESLA C1060 GPU and the developed program code. The results showed that the GPU calculation for 576^3 computational grids achieved the performance of 170 GFLOPS by utilizing the shared memory as a software-managed cache. Furthermore, it can be demonstrated that the computation with the GPU is 100 times faster than that with a single CPU core. From the obtained results, we confirmed the feasibility of realizing a real-time full three-dimensional phase-field simulation of microstructure evolution on a personal desktop computer.
Semantic graphs and associative memories
NASA Astrophysics Data System (ADS)
Pomi, Andrés; Mizraji, Eduardo
2004-12-01
Graphs have been increasingly utilized in the characterization of complex networks from diverse origins, including different kinds of semantic networks. Human memories are associative and are known to support complex semantic nets; these nets are represented by graphs. However, it is not known how the brain can sustain these semantic graphs. The vision of cognitive brain activities, shown by modern functional imaging techniques, assigns renewed value to classical distributed associative memory models. Here we show that these neural network models, also known as correlation matrix memories, naturally support a graph representation of the stored semantic structure. We demonstrate that the adjacency matrix of this graph of associations is just the memory coded with the standard basis of the concept vector space, and that the spectrum of the graph is a code invariant of the memory. As long as the assumptions of the model remain valid this result provides a practical method to predict and modify the evolution of the cognitive dynamics. Also, it could provide us with a way to comprehend how individual brains that map the external reality, almost surely with different particular vector representations, are nevertheless able to communicate and share a common knowledge of the world. We finish presenting adaptive association graphs, an extension of the model that makes use of the tensor product, which provides a solution to the known problem of branching in semantic nets.
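The claim that a correlation matrix memory coded in the standard basis is the adjacency matrix of the association graph can be checked on a toy case. The sketch below is only an illustration under stated assumptions: the four concepts, the three associations, and the helper names `basis`, `build_memory`, and `recall` are hypothetical and do not come from the paper. The memory is built as a sum of outer products g f^T, with column = cue and row = associated concept.

```python
def basis(index, size):
    """Standard basis vector e_index (hypothetical coding of one concept)."""
    return [1 if k == index else 0 for k in range(size)]


def build_memory(associations, size):
    """Correlation matrix memory: sum of outer products g f^T per (cue, target)."""
    memory = [[0] * size for _ in range(size)]
    for cue, target in associations:
        f = basis(cue, size)
        g = basis(target, size)
        for row in range(size):
            for col in range(size):
                memory[row][col] += g[row] * f[col]
    return memory


def recall(memory, cue_vector):
    """Matrix-vector product: the output associated with a cue."""
    return [sum(m * c for m, c in zip(row, cue_vector)) for row in memory]


# Illustrative concepts and directed associations (cue -> target).
CONCEPTS = ["canary", "bird", "animal", "yellow"]
ASSOCIATIONS = [(0, 1), (1, 2), (0, 3)]

MEMORY = build_memory(ASSOCIATIONS, len(CONCEPTS))

if __name__ == "__main__":
    for row in MEMORY:
        print(row)
    # Entry [target][cue] is 1 exactly where the graph has an edge cue -> target.
    print(recall(MEMORY, basis(0, len(CONCEPTS))))
```

With this coding, each nonzero entry of `MEMORY` marks one edge of the graph, and recalling from "canary" returns the superposition of "bird" and "yellow", which is the branching situation the abstract mentions.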
NPS Collaborative Technology Testbed for ONR CKM Program
2005-01-11
or have access to the MIT E-Wall hosted by the TOC. The combination of E-Wall and agents lend themselves to the dynamic gathering and display of...display, intuitive icons or menus that is easy to activate and customize, and automatically seeks and connects to other like services/networks/agents...integration creates network-centric memory mechanism for developing shared understanding of SA events Data Base Integration of Sensor-DM Agents and
Verified Separate Compilation for C
2015-06-01
simulations, says that the visible set is closed under reachability. These two conditions, plus (6.2) and monotonicity of the REACH relation, imply...erase to a CompCert memory m. By erasure, we mean the removal of the “juice” that is unnecessary for execution (as in Curry-style type erasure of...simply typed lambda calculus). The “juice” has several components: permission shares controlling access to objects in the program logic; predicates in the
Literacy outcomes of children with early childhood speech sound disorders: impact of endophenotypes.
Lewis, Barbara A; Avrich, Allison A; Freebairn, Lisa A; Hansen, Amy J; Sucheston, Lara E; Kuo, Iris; Taylor, H Gerry; Iyengar, Sudha K; Stein, Catherine M
2011-12-01
To demonstrate that early childhood speech sound disorders (SSD) and later school-age reading, written expression, and spelling skills are influenced by shared endophenotypes that may be in part genetic. Children with SSD and their siblings were assessed at early childhood (ages 4-6 years) and followed at school age (7-12 years). The relationship of shared endophenotypes with early childhood SSD and school-age outcomes and the shared genetic influences on these outcomes were examined. Structural equation modeling demonstrated that oral motor skills, phonological awareness, phonological memory, vocabulary, and speeded naming have varying influences on reading decoding, spelling, spoken language, and written expression at school age. Genetic linkage studies demonstrated linkage for reading, spelling, and written expression measures to regions on chromosomes 1, 3, 6, and 15 that were previously linked to oral motor skills, articulation, phonological memory, and vocabulary at early childhood testing. Endophenotypes predict school-age literacy outcomes over and above that predicted by clinical diagnoses of SSD or language impairment. Findings suggest that these shared endophenotypes and common genetic influences affect early childhood SSD and later school-age reading, spelling, spoken language, and written expression skills.
Literacy Outcomes of Children With Early Childhood Speech Sound Disorders: Impact of Endophenotypes
Lewis, Barbara A.; Avrich, Allison A.; Freebairn, Lisa A.; Hansen, Amy J.; Sucheston, Lara E.; Kuo, Iris; Taylor, H. Gerry; Iyengar, Sudha K.; Stein, Catherine M.
2012-01-01
Purpose: To demonstrate that early childhood speech sound disorders (SSD) and later school-age reading, written expression, and spelling skills are influenced by shared endophenotypes that may be in part genetic. Method: Children with SSD and their siblings were assessed at early childhood (ages 4–6 years) and followed at school age (7–12 years). The relationship of shared endophenotypes with early childhood SSD and school-age outcomes and the shared genetic influences on these outcomes were examined. Results: Structural equation modeling demonstrated that oral motor skills, phonological awareness, phonological memory, vocabulary, and speeded naming have varying influences on reading decoding, spelling, spoken language, and written expression at school age. Genetic linkage studies demonstrated linkage for reading, spelling, and written expression measures to regions on chromosomes 1, 3, 6, and 15 that were previously linked to oral motor skills, articulation, phonological memory, and vocabulary at early childhood testing. Conclusions: Endophenotypes predict school-age literacy outcomes over and above that predicted by clinical diagnoses of SSD or language impairment. Findings suggest that these shared endophenotypes and common genetic influences affect early childhood SSD and later school-age reading, spelling, spoken language, and written expression skills. PMID:21930616
Parallel programming with Easy Java Simulations
NASA Astrophysics Data System (ADS)
Esquembre, F.; Christian, W.; Belloni, M.
2018-01-01
Nearly all of today's processors are multicore, and ideally programming and algorithm development utilizing the entire processor should be introduced early in the computational physics curriculum. Parallel programming is often not introduced because it requires a new programming environment and uses constructs that are unfamiliar to many teachers. We describe how we lower the barrier to parallel programming by using a Java-based programming environment to treat problems in the usual undergraduate curriculum. We use the Easy Java Simulations programming and authoring tool to create the program's graphical user interface together with objects based on those developed by Kaminsky [Building Parallel Programs (Course Technology, Boston, 2010)] to handle common parallel programming tasks. Shared-memory parallel implementations of physics problems, such as time evolution of the Schrödinger equation, are available as source code and as ready-to-run programs from the AAPT-ComPADRE digital library.
NASA Astrophysics Data System (ADS)
Fonseca, R. A.; Vieira, J.; Fiuza, F.; Davidson, A.; Tsung, F. S.; Mori, W. B.; Silva, L. O.
2013-12-01
A new generation of laser wakefield accelerators (LWFA), supported by the extreme accelerating fields generated in the interaction of PW-Class lasers and underdense targets, promises the production of high quality electron beams in short distances for multiple applications. Achieving this goal will rely heavily on numerical modelling to further understand the underlying physics and identify optimal regimes, but large scale modelling of these scenarios is computationally heavy and requires the efficient use of state-of-the-art petascale supercomputing systems. We discuss the main difficulties involved in running these simulations and the new developments implemented in the OSIRIS framework to address these issues, ranging from multi-dimensional dynamic load balancing and hybrid distributed/shared memory parallelism to the vectorization of the PIC algorithm. We present the results of the OASCR Joule Metric program on the issue of large scale modelling of LWFA, demonstrating speedups of over 1 order of magnitude on the same hardware. Finally, scalability to over ~10^6 cores and sustained performance over ~2 PFlops is demonstrated, opening the way for large scale modelling of LWFA scenarios.
Low latency memory access and synchronization
DOE Office of Scientific and Technical Information (OSTI.GOV)
Blumrich, Matthias A.; Chen, Dong; Coteus, Paul W.
A low latency memory system access is provided in association with a weakly-ordered multiprocessor system. Each processor in the multiprocessor shares resources, and each shared resource has an associated lock within a locking device that provides support for synchronization between the multiple processors in the multiprocessor and the orderly sharing of the resources. A processor only has permission to access a resource when it owns the lock associated with that resource, and an attempt by a processor to own a lock requires only a single load operation, rather than a traditional atomic load followed by store, such that the processor only performs a read operation and the hardware locking device performs a subsequent write operation rather than the processor. A simple prefetching for non-contiguous data structures is also disclosed. A memory line is redefined so that in addition to the normal physical memory data, every line includes a pointer that is large enough to point to any other line in the memory, wherein the pointers determine which memory line to prefetch rather than some other predictive algorithm. This enables hardware to effectively prefetch memory access patterns that are non-contiguous, but repetitive.
OSCAR API for Real-Time Low-Power Multicores and Its Performance on Multicores and SMP Servers
NASA Astrophysics Data System (ADS)
Kimura, Keiji; Mase, Masayoshi; Mikami, Hiroki; Miyamoto, Takamichi; Shirako, Jun; Kasahara, Hironori
OSCAR (Optimally Scheduled Advanced Multiprocessor) API has been designed for real-time embedded low-power multicores to generate parallel programs for various multicores from different vendors by using the OSCAR parallelizing compiler. The OSCAR API has been developed by Waseda University in collaboration with Fujitsu Laboratory, Hitachi, NEC, Panasonic, Renesas Technology, and Toshiba in an METI/NEDO project entitled "Multicore Technology for Realtime Consumer Electronics." By using the OSCAR API as an interface between the OSCAR compiler and backend compilers, the OSCAR compiler enables hierarchical multigrain parallel processing with memory optimization under capacity restriction for cache memory, local memory, distributed shared memory, and on-chip/off-chip shared memory; data transfer using a DMA controller; and power reduction control using DVFS (Dynamic Voltage and Frequency Scaling), clock gating, and power gating for various embedded multicores. In addition, a parallelized program automatically generated by the OSCAR compiler with OSCAR API can be compiled by the ordinary OpenMP compilers since the OSCAR API is designed on a subset of the OpenMP. This paper describes the OSCAR API and its compatibility with the OSCAR compiler by showing code examples. Performance evaluations of the OSCAR compiler and the OSCAR API are carried out using an IBM Power5+ workstation, an IBM Power6 high-end SMP server, and a newly developed consumer electronics multicore chip RP2 by Renesas, Hitachi and Waseda. From the results of scalability evaluation, it is found that on an average, the OSCAR compiler with the OSCAR API can exploit 5.8 times speedup over the sequential execution on the Power5+ workstation with eight cores and 2.9 times speedup on RP2 with four cores, respectively. In addition, the OSCAR compiler can accelerate an IBM XL Fortran compiler up to 3.3 times on the Power6 SMP server. 
Due to low-power optimization on RP2, the OSCAR compiler with the OSCAR API achieves a maximum power reduction of 84% in the real-time execution mode.
Location-Unbound Color-Shape Binding Representations in Visual Working Memory.
Saiki, Jun
2016-02-01
The mechanism by which nonspatial features, such as color and shape, are bound in visual working memory, and the role of those features' location in their binding, remains unknown. In the current study, I modified a redundancy-gain paradigm to investigate these issues. A set of features was presented in a two-object memory display, followed by a single object probe. Participants judged whether the probe contained any features of the memory display, regardless of its location. Response time distributions revealed feature coactivation only when both features of a single object in the memory display appeared together in the probe, regardless of the response time benefit from the probe and memory objects sharing the same location. This finding suggests that a shared location is necessary in the formation of bound representations but unnecessary in their maintenance. Electroencephalography data showed that amplitude modulations reflecting location-unbound feature coactivation were different from those reflecting the location-sharing benefit, consistent with the behavioral finding that feature-location binding is unnecessary in the maintenance of color-shape binding. © The Author(s) 2015.
Echterhoff, Gerald; Kopietz, René; Higgins, E Tory
2017-06-01
Communicators typically tune messages to their audience's attitude. Such audience tuning biases communicators' memory for the topic toward the audience's attitude to the extent that they create a shared reality with the audience. To investigate shared reality in intergroup communication, we first established that a reduced memory bias after tuning messages to an out-group (vs. in-group) audience is a subtle index of communicators' denial of shared reality to that out-group audience (Experiments 1a and 1b). We then examined whether the audience-tuning memory bias might emerge when the out-group audience's epistemic authority is enhanced, either by increasing epistemic expertise concerning the communication topic or by creating epistemic consensus among members of a multiperson out-group audience. In Experiment 2, when Germans communicated to a Turkish audience with an attitude about a Turkish (vs. German) target, the audience-tuning memory bias appeared. In Experiment 3, when the audience of German communicators consisted of 3 Turks who all held the same attitude toward the target, the memory bias again appeared. The association between message valence and memory valence was consistently higher when the audience's epistemic authority was high (vs. low). An integrative analysis across all studies also suggested that the memory bias increases with increasing strength of epistemic inputs (epistemic expertise, epistemic consensus, and audience-tuned message production). The findings suggest novel ways of overcoming intergroup biases in intergroup relations. (PsycINFO Database Record (c) 2017 APA, all rights reserved).
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bartlett, Roscoe Ainsworth
2010-05-01
The ubiquitous use of raw pointers in higher-level code is the primary cause of all memory usage problems and memory leaks in C++ programs. This paper describes what might be considered a radical approach to the problem which is to encapsulate the use of all raw pointers and all raw calls to new and delete in higher-level C++ code. Instead, a set of cooperating template classes developed in the Trilinos package Teuchos are used to encapsulate every use of raw C++ pointers in every use case where it appears in high-level code. Included in the set of memory management classes is the typical reference-counted smart pointer class similar to boost::shared_ptr (and therefore C++0x std::shared_ptr). However, what is missing in boost and the new standard library are non-reference counted classes for remaining use cases where raw C++ pointers would need to be used. These classes have a debug build mode where nearly all programmer errors are caught and gracefully reported at runtime. The default optimized build mode strips all runtime checks and allows the code to perform as efficiently as raw C++ pointers with reasonable usage. Also included is a novel approach for dealing with the circular references problem that imparts little extra overhead and is almost completely invisible to most of the code (unlike the boost and therefore C++0x approach). Rather than being a radical approach, encapsulating all raw C++ pointers is simply the logical progression of a trend in the C++ development and standards community that started with std::auto_ptr and is continued (but not finished) with std::shared_ptr in C++0x. Using the Teuchos reference-counted memory management classes allows one to remove unnecessary constraints in the use of objects by removing arbitrary lifetime ordering constraints which are a type of unnecessary coupling [23]. The code one writes with these classes will be more likely to be correct on first writing, will be less likely to contain silent (but deadly) memory usage errors, and will be much more robust to later refactoring and maintenance. The level of debug-mode runtime checking provided by the Teuchos memory management classes is stronger in many respects than what is provided by memory checking tools like Valgrind and Purify while being much less expensive. However, tools like Valgrind and Purify perform a number of types of checks (like usage of uninitialized memory) that make these tools very valuable and therefore complement the Teuchos memory management debug-mode runtime checking. The Teuchos memory management classes and idioms largely address the technical issues in resolving the fragile built-in C++ memory management model (with the exception of circular references which has no easy solution but can be managed as discussed). All that remains is to teach these classes and idioms and expand their usage in C++ codes. The long-term viability of C++ as a usable and productive language depends on it. Otherwise, if C++ is no safer than C, then is the greater complexity of C++ worth what one gets as extra features? Given that C is smaller and easier to learn than C++ and since most programmers don't know object-orientation (or templates or X, Y, and Z features of C++) all that well anyway, then what really are most programmers getting extra out of C++ that would outweigh the extra complexity of C++ over C? C++ zealots will argue this point but the reality is that C++ popularity has peaked and is becoming less popular while the popularity of C has remained fairly stable over the last decade. Idioms like those advocated in this paper can help to avert this trend but it will require wide community buy-in and a change in the way C++ is taught in order to have the greatest impact. To make these programs more secure, compiler vendors or static analysis tools (e.g. Klocwork) could implement a preprocessor-like language similar to OpenMP that would allow the programmer to declare (in comments) that certain blocks of code should be "pointer-free" or allow smaller blocks to be "pointers allowed". This would significantly improve the robustness of code that uses the memory management classes described here.
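The circular references problem mentioned in this abstract is a general property of reference counting, so it can be illustrated outside C++. The sketch below is a loose analogue in Python and is not the Teuchos API or the paper's own technique: the `Node` class and `demo` function are hypothetical, and the behavior shown relies on CPython's immediate reference counting. The parent holds strong references to its children while each child holds only a weak back-reference, so no strong cycle keeps the parent alive.

```python
import weakref


class Node:
    """Tree node: strong references downward, weak reference upward."""

    def __init__(self, name):
        self.name = name
        self.children = []    # strong references: parent owns its children
        self._parent = None   # weak reference: child does not own its parent

    @property
    def parent(self):
        # Dereference the weak reference; yields None once the parent is gone.
        return self._parent() if self._parent is not None else None

    def add_child(self, child):
        self.children.append(child)
        child._parent = weakref.ref(self)


def demo():
    root = Node("root")
    leaf = Node("leaf")
    root.add_child(leaf)
    linked_before = leaf.parent is root
    # Drop the only strong reference to the parent. Under CPython's reference
    # counting the object is reclaimed at once because the back-reference is weak.
    del root
    released_after = leaf.parent is None
    return linked_before, released_after


if __name__ == "__main__":
    print(demo())
```

Had the back-reference been a strong one, the two objects would keep each other's count above zero, which is the situation a pure reference-counted smart pointer cannot resolve on its own.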
Parallel ALLSPD-3D: Speeding Up Combustor Analysis Via Parallel Processing
NASA Technical Reports Server (NTRS)
Fricker, David M.
1997-01-01
The ALLSPD-3D Computational Fluid Dynamics code for reacting flow simulation was run on a set of benchmark test cases to determine its parallel efficiency. These test cases included non-reacting and reacting flow simulations with varying numbers of processors. Also, the tests explored the effects of scaling the simulation with the number of processors in addition to distributing a constant size problem over an increasing number of processors. The test cases were run on a cluster of IBM RS/6000 Model 590 workstations with ethernet and ATM networking plus a shared memory SGI Power Challenge L workstation. The results indicate that the network capabilities significantly influence the parallel efficiency, i.e., a shared memory machine is fastest and ATM networking provides acceptable performance. The limitations of ethernet greatly hamper the rapid calculation of flows using ALLSPD-3D.
The Developmental Influence of Primary Memory Capacity on Working Memory and Academic Achievement
2015-01-01
In this study, we investigate the development of primary memory capacity among children. Children between the ages of 5 and 8 completed 3 novel tasks (split span, interleaved lists, and a modified free-recall task) that measured primary memory by estimating the number of items in the focus of attention that could be spontaneously recalled in serial order. These tasks were calibrated against traditional measures of simple and complex span. Clear age-related changes in these primary memory estimates were observed. There were marked individual differences in primary memory capacity, but each novel measure was predictive of simple span performance. Among older children, each measure shared variance with reading and mathematics performance, whereas for younger children, the interleaved lists task was the strongest single predictor of academic ability. We argue that these novel tasks have considerable potential for the measurement of primary memory capacity and provide new, complementary ways of measuring the transient memory processes that predict academic performance. The interleaved lists task also shared features with interference control tasks, and our findings suggest that young children have a particular difficulty in resisting distraction and that variance in the ability to resist distraction is also shared with measures of educational attainment. PMID:26075630
The developmental influence of primary memory capacity on working memory and academic achievement.
Hall, Debbora; Jarrold, Christopher; Towse, John N; Zarandi, Amy L
2015-08-01
In this study, we investigate the development of primary memory capacity among children. Children between the ages of 5 and 8 completed 3 novel tasks (split span, interleaved lists, and a modified free-recall task) that measured primary memory by estimating the number of items in the focus of attention that could be spontaneously recalled in serial order. These tasks were calibrated against traditional measures of simple and complex span. Clear age-related changes in these primary memory estimates were observed. There were marked individual differences in primary memory capacity, but each novel measure was predictive of simple span performance. Among older children, each measure shared variance with reading and mathematics performance, whereas for younger children, the interleaved lists task was the strongest single predictor of academic ability. We argue that these novel tasks have considerable potential for the measurement of primary memory capacity and provide new, complementary ways of measuring the transient memory processes that predict academic performance. The interleaved lists task also shared features with interference control tasks, and our findings suggest that young children have a particular difficulty in resisting distraction and that variance in the ability to resist distraction is also shared with measures of educational attainment. (c) 2015 APA, all rights reserved).
Measuring Transactive Memory Systems Using Network Analysis
ERIC Educational Resources Information Center
King, Kylie Goodell
2017-01-01
Transactive memory systems (TMSs) describe the structures and processes that teams use to share information, work together, and accomplish shared goals. First introduced over three decades ago, TMSs have been measured in a variety of ways. This dissertation proposes the use of network analysis in measuring TMS. This is accomplished by describing…
Operator Influence of Unexploded Ordnance Sensor Technologies
2007-03-01
chart display ActiveX control Mscomct2.dll – date/time display ActiveX control Pnpscr.dll – Systran SCRAMNet replicated shared memory device...response value database rgm_p2.dll – Phase 2 shared memory API and implementation Commercial components StripM.ocx – strip chart display ActiveX
Runtime support for parallelizing data mining algorithms
NASA Astrophysics Data System (ADS)
Jin, Ruoming; Agrawal, Gagan
2002-03-01
With recent technological advances, shared memory parallel machines have become more scalable, and offer large main memories and high bus bandwidths. They are emerging as good platforms for data warehousing and data mining. In this paper, we focus on shared memory parallelization of data mining algorithms. We have developed a series of techniques for parallelization of data mining algorithms, including full replication, full locking, fixed locking, optimized full locking, and cache-sensitive locking. Unlike previous work on shared memory parallelization of specific data mining algorithms, all of our techniques apply to a large number of common data mining algorithms. In addition, we propose a reduction-object based interface for specifying a data mining algorithm. We show how our runtime system can apply any of the techniques we have developed starting from a common specification of the algorithm.
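As a rough illustration of two of the techniques named in this abstract, the sketch below contrasts full replication (each thread updates a private copy of the reduction object, and the copies are merged at the end) with full locking (one shared object, one lock per element). It is a minimal sketch, not the authors' runtime system: the function names and the item-counting workload are assumptions made for the example.

```python
import threading
from collections import Counter


def count_full_replication(chunks):
    """Full replication: every thread owns a private Counter, merged afterwards."""
    replicas = [Counter() for _ in chunks]

    def work(i):
        for item in chunks[i]:
            replicas[i][item] += 1  # no lock needed: the replica is thread-private

    threads = [threading.Thread(target=work, args=(i,)) for i in range(len(chunks))]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    merged = Counter()
    for replica in replicas:
        merged.update(replica)  # sequential merge step
    return merged


def count_full_locking(chunks, items):
    """Full locking: one shared table, one lock per element of the table."""
    shared = {item: 0 for item in items}
    locks = {item: threading.Lock() for item in items}

    def work(chunk):
        for item in chunk:
            with locks[item]:  # only updates to the same element serialize
                shared[item] += 1

    threads = [threading.Thread(target=work, args=(chunk,)) for chunk in chunks]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return Counter(shared)
```

Replication trades memory (one copy per thread) and a merge step for lock-free updates; locking keeps a single copy but pays for synchronization on every update.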
Low working memory capacity is only spuriously related to poor reading comprehension
Van Dyke, Julie A.; Johns, Clinton L.; Kukona, Anuenue
2014-01-01
Accounts of comprehension failure, whether in the case of readers with poor skill or when syntactic complexity is high, have overwhelmingly implicated working memory capacity as the key causal factor. However, extant research suggests that this position is not well supported by evidence on the span of active memory during online sentence processing, nor is it well motivated by models that make explicit claims about the memory mechanisms that support language processing. The current study suggests that sensitivity to interference from similar items in memory may provide a better explanation of comprehension failure. Through administration of a comprehensive skill battery, we found that the previously observed association of working memory with comprehension is likely due to the collinearity of working memory with many other reading-related skills, especially IQ. In analyses which removed variance shared with IQ, we found that receptive vocabulary knowledge was the only significant predictor of comprehension performance in our task out of a battery of 24 skill measures. In addition, receptive vocabulary and non-verbal memory for serial order, but not simple verbal memory or working memory, were the only predictors of reading times in the region where interference had its primary effect. We interpret these results in light of a model that emphasizes retrieval interference and the quality of lexical representations as key determinants of successful comprehension. PMID:24657820
Concurrent working memory load can facilitate selective attention: evidence for specialized load.
Park, Soojin; Kim, Min-Shik; Chun, Marvin M
2007-10-01
Load theory predicts that concurrent working memory load impairs selective attention and increases distractor interference (N. Lavie, A. Hirst, J. W. de Fockert, & E. Viding). Here, the authors present new evidence that the type of concurrent working memory load determines whether load impairs selective attention or not. Working memory load was paired with a same/different matching task that required focusing on targets while ignoring distractors. When working memory items shared the same limited-capacity processing mechanisms with targets in the matching task, distractor interference increased. However, when working memory items shared processing with distractors in the matching task, distractor interference decreased, facilitating target selection. A specialized load account is proposed to describe the dissociable effects of working memory load on selective processing depending on whether the load overlaps with targets or with distractors. (c) 2007 APA
NASA Astrophysics Data System (ADS)
Buszko, Marian L.; Buszko, Dominik; Wang, Daniel C.
1998-04-01
A custom-written Common Gateway Interface (CGI) program for remote control of an NMR spectrometer using a World Wide Web browser has been described. The program, running on a UNIX workstation, uses multiple processes to handle concurrent tasks of interacting with the user and with the spectrometer. The program's parent process communicates with the browser and sends out commands to the spectrometer; the child process is mainly responsible for data acquisition. Communication between the processes is via the shared memory mechanism. The WWW pages that have been developed for the system make use of the frames feature of web browsers. The CGI program provides an intuitive user interface to the NMR spectrometer, making, in effect, a complex system an easy-to-use Web appliance.
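The shared-memory hand-off between the two processes described in this abstract can be sketched as follows. This is a minimal sketch using Python's standard `multiprocessing.shared_memory` module rather than the UNIX mechanism the original program used; the record layout (a scan counter and one sample value) and the function names are hypothetical. For simplicity both handles are opened in one process here, whereas in the described program the writer would be the acquisition child and the reader the parent.

```python
import struct
from multiprocessing import shared_memory

# Hypothetical record layout: unsigned 32-bit scan counter + 64-bit float sample.
RECORD = struct.Struct("<Id")


def create_block():
    """Create a named shared-memory block large enough for one record."""
    return shared_memory.SharedMemory(create=True, size=RECORD.size)


def write_sample(block, scan, value):
    """Writer side (the acquisition process): store the latest record."""
    RECORD.pack_into(block.buf, 0, scan, value)


def read_sample(block_name):
    """Reader side (the user-facing process): attach by name and read the record."""
    view = shared_memory.SharedMemory(name=block_name)
    try:
        return RECORD.unpack_from(view.buf, 0)
    finally:
        view.close()  # detach only; the creator is responsible for unlink()
```

The creator should call `close()` and then `unlink()` on the block once both sides are done, so the segment is released.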
Prins, Pjotr; Goto, Naohisa; Yates, Andrew; Gautier, Laurent; Willis, Scooter; Fields, Christopher; Katayama, Toshiaki
2012-01-01
Open-source software (OSS) encourages computer programmers to reuse software components written by others. In evolutionary bioinformatics, OSS comes in a broad range of programming languages, including C/C++, Perl, Python, Ruby, Java, and R. To avoid writing the same functionality multiple times for different languages, it is possible to share components by bridging computer languages and Bio* projects, such as BioPerl, Biopython, BioRuby, BioJava, and R/Bioconductor. In this chapter, we compare the two principal approaches for sharing software between different programming languages: either by remote procedure call (RPC) or by sharing a local call stack. RPC provides a language-independent protocol over a network interface; examples are RSOAP and Rserve. The local call stack provides a between-language mapping not over the network interface, but directly in computer memory; examples are R bindings, RPy, and languages sharing the Java Virtual Machine stack. This functionality provides strategies for sharing of software between Bio* projects, which can be exploited more often. Here, we present cross-language examples for sequence translation, and measure throughput of the different options. We compare calling into R through native R, RSOAP, Rserve, and RPy interfaces, with the performance of native BioPerl, Biopython, BioJava, and BioRuby implementations, and with call stack bindings to BioJava and the European Molecular Biology Open Software Suite. In general, call stack approaches outperform native Bio* implementations and these, in turn, outperform RPC-based approaches. To test and compare strategies, we provide a downloadable BioNode image with all examples, tools, and libraries included. The BioNode image can be run on VirtualBox-supported operating systems, including Windows, OSX, and Linux.
Weighted integration of short-term memory and sensory signals in the oculomotor system.
Deravet, Nicolas; Blohm, Gunnar; de Xivry, Jean-Jacques Orban; Lefèvre, Philippe
2018-05-01
Oculomotor behaviors integrate sensory and prior information to overcome sensory-motor delays and noise. After much debate about this process, reliability-based integration has recently been proposed and several models of smooth pursuit now include recurrent Bayesian integration or Kalman filtering. However, there is a lack of behavioral evidence in humans supporting these theoretical predictions. Here, we independently manipulated the reliability of visual and prior information in a smooth pursuit task. Our results show that both smooth pursuit eye velocity and catch-up saccade amplitude were modulated by visual and prior information reliability. We interpret these findings as the continuous reliability-based integration of a short-term memory of target motion with visual information, which support modeling work. Furthermore, we suggest that saccadic and pursuit systems share this short-term memory. We propose that this short-term memory of target motion is quickly built and continuously updated, and constitutes a general building block present in all sensorimotor systems.
Transactive memory systems scale for couples: development and validation
Hewitt, Lauren Y.; Roberts, Lynne D.
2015-01-01
People in romantic relationships can develop shared memory systems by pooling their cognitive resources, allowing each person access to more information but with less cognitive effort. Research examining such memory systems in romantic couples largely focuses on remembering word lists or performing lab-based tasks, but these types of activities do not capture the processes underlying couples’ transactive memory systems, and may not be representative of the ways in which romantic couples use their shared memory systems in everyday life. We adapted an existing measure of transactive memory systems for use with romantic couples (TMSS-C), and conducted an initial validation study. In total, 397 participants who each identified as being a member of a romantic relationship of at least 3 months duration completed the study. The data provided a good fit to the anticipated three-factor structure of the components of couples’ transactive memory systems (specialization, credibility and coordination), and there was reasonable evidence of both convergent and divergent validity, as well as strong evidence of test–retest reliability across a 2-week period. The TMSS-C provides a valuable tool that can quickly and easily capture the underlying components of romantic couples’ transactive memory systems. It has potential to help us better understand this intriguing feature of romantic relationships, and how shared memory systems might be associated with other important features of romantic relationships. PMID:25999873
Implementing Molecular Dynamics for Hybrid High Performance Computers - 1. Short Range Forces
DOE Office of Scientific and Technical Information (OSTI.GOV)
Brown, W Michael; Wang, Peng; Plimpton, Steven J
The use of accelerators such as general-purpose graphics processing units (GPGPUs) has become popular in scientific computing applications due to their low cost, impressive floating-point capabilities, high memory bandwidth, and low electrical power requirements. Hybrid high performance computers, machines with more than one type of floating-point processor, are now becoming more prevalent due to these advantages. In this work, we discuss several important issues in porting a large molecular dynamics code for use on parallel hybrid machines: 1) choosing a hybrid parallel decomposition that works on central processing units (CPUs) with distributed memory and accelerator cores with shared memory, 2) minimizing the amount of code that must be ported for efficient acceleration, 3) utilizing the available processing power from both many-core CPUs and accelerators, and 4) choosing a programming model for acceleration. We present our solution to each of these issues for short-range force calculation in the molecular dynamics package LAMMPS. We describe algorithms for efficient short range force calculation on hybrid high performance machines. We describe a new approach for dynamic load balancing of work between CPU and accelerator cores. We describe the Geryon library that allows a single code to compile with both CUDA and OpenCL for use on a variety of accelerators. Finally, we present results on a parallel test cluster containing 32 Fermi GPGPUs and 180 CPU cores.
The Role of Metarepresentation in the Production and Resolution of Referring Expressions.
Horton, William S; Brennan, Susan E
2016-01-01
In this paper we consider the potential role of metarepresentation-the representation of another representation, or as commonly considered within cognitive science, the mental representation of another individual's knowledge and beliefs-in mediating definite reference and common ground in conversation. Using dialogues from a referential communication study in which speakers conversed in succession with two different addressees, we highlight ways in which interlocutors work together to successfully refer to objects, and achieve shared conceptualizations. We briefly review accounts of how such shared conceptualizations could be represented in memory, from simple associations between label and referent, to "triple co-presence" representations that track interlocutors in an episode of referring, to more elaborate metarepresentations that invoke theory of mind, mutual knowledge, or a model of a conversational partner. We consider how some forms of metarepresentation, once created and activated, could account for definite reference in conversation by appealing to ordinary processes in memory. We conclude that any representations that capture information about others' perspectives are likely to be relatively simple and subject to the same kinds of constraints on attention and memory that influence other kinds of cognitive representations.
NASA Technical Reports Server (NTRS)
Janetzke, David C.; Murthy, Durbha V.
1991-01-01
Aeroelastic analysis is multi-disciplinary and computationally expensive. Hence, it can greatly benefit from parallel processing. As part of an effort to develop an aeroelastic capability on a distributed memory transputer network, a parallel algorithm for the computation of aerodynamic influence coefficients is implemented on a network of 32 transputers. The aerodynamic influence coefficients are calculated using a 3-D unsteady aerodynamic model and a parallel discretization. Efficiencies up to 85 percent were demonstrated using 32 processors. The effects of subtask ordering, problem size, and network topology are presented. A comparison to results on a shared memory computer indicates that higher speedup is achieved on the distributed memory system.
Time and Cognitive Load in Working Memory
ERIC Educational Resources Information Center
Barrouillet, Pierre; Bernardin, Sophie; Portrat, Sophie; Vergauwe, Evie; Camos, Valerie
2007-01-01
According to the time-based resource-sharing model (P. Barrouillet, S. Bernardin, & V. Camos, 2004), the cognitive load a given task involves is a function of the proportion of time during which it captures attention, thus impeding other attention-demanding processes. Accordingly, the present study demonstrates that the disruptive effect on…
Modeling the Coupled Chemo-Thermo-Mechanical Behavior of Amorphous Polymer Networks.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Zimmerman, Jonathan A.; Nguyen, Thao D.; Xiao, Rui
2015-02-01
Amorphous polymers exhibit a rich landscape of time-dependent behavior including viscoelasticity, structural relaxation, and viscoplasticity. These time-dependent mechanisms can be exploited to achieve shape-memory behavior, which allows the material to store a programmed deformed shape indefinitely and to recover entirely the undeformed shape in response to specific environmental stimulus. The shape-memory performance of amorphous polymers depends on the coordination of multiple physical mechanisms, and considerable opportunities exist to tailor the polymer structure and shape-memory programming procedure to achieve the desired performance. The goal of this project was to use a combination of theoretical, numerical and experimental methods to investigate the effect of shape memory programming, thermo-mechanical properties, and physical and environmental aging on the shape memory performance. Physical and environmental aging occurs during storage and through exposure to solvents, such as water, and can significantly alter the viscoelastic behavior and shape memory behavior of amorphous polymers. This project – executed primarily by Professor Thao Nguyen and Graduate Student Rui Xiao at Johns Hopkins University in support of a DOE/NNSA Presidential Early Career Award in Science and Engineering (PECASE) – developed a theoretical framework for chemo-thermo-mechanical behavior of amorphous polymers to model the effects of physical aging and solvent-induced environmental factors on their thermoviscoelastic behavior.
DMA shared byte counters in a parallel computer
Chen, Dong; Gara, Alan G.; Heidelberger, Philip; Vranas, Pavlos
2010-04-06
A parallel computer system is constructed as a network of interconnected compute nodes. Each of the compute nodes includes at least one processor, a memory and a DMA engine. The DMA engine includes a processor interface for interfacing with the at least one processor, DMA logic, a memory interface for interfacing with the memory, a DMA network interface for interfacing with the network, injection and reception byte counters, injection and reception FIFO metadata, and status registers and control registers. The injection FIFOs maintain memory locations of the injection FIFO metadata memory locations including its current head and tail, and the reception FIFOs maintain the reception FIFO metadata memory locations including its current head and tail. The injection byte counters and reception byte counters may be shared between messages.
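One way to picture a byte counter shared between messages is sketched below. This is a toy model under stated assumptions, not the patented hardware: it assumes the counter holds the total number of outstanding bytes for all messages that share it and is decremented as each packet's payload arrives, so that reaching zero signals completion of the whole group. The class and method names are hypothetical.

```python
class SharedByteCounter:
    """Toy model of a reception byte counter shared by several messages."""

    def __init__(self):
        self.remaining = 0  # outstanding bytes across all registered messages

    def add_message(self, n_bytes):
        """Register a message of n_bytes against this counter."""
        self.remaining += n_bytes

    def on_packet(self, payload_bytes):
        """Account for one arrived packet; return True when all messages are done."""
        self.remaining -= payload_bytes
        return self.remaining == 0
```

Under this assumption, software polls one counter instead of one per message, at the cost of not knowing which individual message finished first.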
Division of attention as a function of the number of steps, visual shifts, and memory load
NASA Technical Reports Server (NTRS)
Chechile, R. A.; Butler, K.; Gutowski, W.; Palmer, E. A.
1986-01-01
The effects on divided attention of visual shifts and long-term memory retrieval during a monitoring task are considered. A concurrent vigilance task was standardized under all experimental conditions. The results show that subjects can perform nearly perfectly on all of the time-shared tasks if long-term memory retrieval is not required for monitoring. With the requirement of memory retrieval, however, there was a large decrease in accuracy for all of the time-shared activities. It was concluded that the attentional demand of long-term memory retrieval is appreciable (even for a well-learned motor sequence), and thus memory retrieval results in a sizable reduction in the capability of subjects to divide their attention. A selected bibliography on the divided attention literature is provided.
[Series: Medical Applications of the PHITS Code (2): Acceleration by Parallel Computing].
Furuta, Takuya; Sato, Tatsuhiko
2015-01-01
Time-consuming Monte Carlo dose calculation becomes feasible owing to the development of computer technology. However, the recent development is due to emergence of the multi-core high performance computers. Therefore, parallel computing becomes a key to achieve good performance of software programs. A Monte Carlo simulation code PHITS contains two parallel computing functions, the distributed-memory parallelization using protocols of message passing interface (MPI) and the shared-memory parallelization using open multi-processing (OpenMP) directives. Users can choose the two functions according to their needs. This paper gives the explanation of the two functions with their advantages and disadvantages. Some test applications are also provided to show their performance using a typical multi-core high performance workstation.
Leahy, P.P.
1982-01-01
The Trescott computer program for modeling groundwater flow in three dimensions has been modified to (1) treat aquifer and confining bed pinchouts more realistically and (2) reduce the computer memory requirements needed for the input data. Using the original program, simulation of aquifer systems with nonrectangular external boundaries may result in a large number of nodes that are not involved in the numerical solution of the problem, but require computer storage. (USGS)
Welcoming Nora: a family event.
Walsh, Allison J; Walsh, Paul R; Walsh, Jane M; Walsh, Gavin T
2011-01-01
In this column, Allison and Paul Walsh share the story of the birth of Nora, their third baby and their second child to be born at home. Allison and Paul share their individual memories of labor and birth. But their story is only part of the story of Nora's birth. Nora's birth was a family event, with Allison and Paul's other children very much part of the experience. Jane and Gavin share their own memories of their baby sister's birth.
Experiences using OpenMP based on Compiler Directed Software DSM on a PC Cluster
NASA Technical Reports Server (NTRS)
Hess, Matthias; Jost, Gabriele; Mueller, Matthias; Ruehle, Roland
2003-01-01
In this work we report on our experiences running OpenMP programs on a commodity cluster of PCs running a software distributed shared memory (DSM) system. We describe our test environment and report on the performance of a subset of the NAS Parallel Benchmarks that have been automatically parallelized for OpenMP. We compare the performance of the OpenMP implementations with that of their message passing counterparts and discuss performance differences.
Colouring in the Blanks: Memory Drawings of the 1990 Kuwait Invasion
ERIC Educational Resources Information Center
Pepin-Wakefield, Yvonne
2009-01-01
This study used drawing tasks to examine the similarities and differences between females and males who shared a collective traumatic event in early childhood. Could these childhood memories be recorded, measured, and compared for gender differences in drawings by young adults who had shared a similar experience as children? Exploration of this…
ERIC Educational Resources Information Center
Kulkofsky, Sarah; Wang, Qi; Koh, Jessie Bee Kim
2009-01-01
This study examined maternal beliefs about the functions of memory sharing and the relations between these beliefs and mother-child reminiscing behaviors in a cross-cultural context. Sixty-three European American and 47 Chinese mothers completed an open-ended questionnaire concerning their beliefs about the functions of parent-child memory…
Berninger, Virginia W; Abbott, Robert D; Swanson, H Lee; Lovitt, Dan; Trivedi, Pam; Lin, Shin-Ju Cindy; Gould, Laura; Youngstrom, Marci; Shimada, Shirley; Amtmann, Dagmar
2010-04-01
The purpose of this study was to evaluate the contribution of working memory at the word and sentence levels of language to reading and writing outcomes. Measures of working memory at the word and sentence levels, reading and writing, were administered to 2nd (N = 122), 4th (N = 222), and 6th (N = 105) graders. Structural equation modeling was used to evaluate whether the 2 predictor working memory factors contributed unique variance beyond their shared covariance to each of 5 outcome factors: handwriting, spelling, composing, word reading, and reading comprehension. At each grade level, except for handwriting and composing in 6th grade, the word-level working memory factor contributed unique variance to each reading and writing outcome. The text-level working memory factor contributed unique variance to reading comprehension in 4th and 6th grade. The clinical significance of these findings for assessment and intervention is discussed.
Release from output interference in recognition memory: A test of the attention hypothesis.
Criss, Amy H; Salomão, Cristina; Malmberg, Kenneth J; Aue, William; Kılıç, Aslı; Claridge, MarkAvery
2018-05-01
Retrieval results in both costs and benefits to episodic memory. Output interference (OI) refers to the finding that episodic memory accuracy decreases with increasing test trials. Release from OI is the restoration of original accuracy at some point during the test. For example, a release from OI in recognition memory testing occurs when the semantic similarity between stimuli decreases midway through testing, suggesting that item representations stored on early trials cause interference on tests occurring on later trials to the extent that the earlier items share features with the latter items. In two recognition memory experiments, we demonstrate release from OI for words and faces. We also test whether release from OI is the result of interference or is due to a boost in attention caused by reorienting to a novel stimulus type. A test for the foils presented during the initial test list supports the interference account of OI. Implications for models of memory are discussed.
Stillbirth and stigma: the spoiling and repair of multiple social identities.
Brierley-Jones, Lyn; Crawley, Rosalind; Lomax, Samantha; Ayers, Susan
This study investigated mothers' experiences surrounding stillbirth in the United Kingdom, their memory making and sharing opportunities, and the effect these opportunities had on them. Qualitative data were generated from free text responses to open-ended questions. Thematic content analysis revealed that "stigma" was experienced by most women and Goffman's (1963) work on stigma was subsequently used as an analytical framework. Results suggest that stillbirth can spoil the identities of "patient," "mother," and "full citizen." Stigma was reported as arising from interactions with professionals, family, friends, work colleagues, and even casual acquaintances. Stillbirth produces common learning experiences often requiring "identity work" (Murphy, 2012). Memory making and sharing may be important in this work and further research is needed. Stigma can reduce the memory sharing opportunities for women after stillbirth and this may explain some of the differential mental health effects of memory making after stillbirth that are documented in the literature.
Parallelization of KENO-Va Monte Carlo code
NASA Astrophysics Data System (ADS)
Ramón, Javier; Peña, Jorge
1995-07-01
KENO-Va is a code integrated within the SCALE system developed by Oak Ridge that solves the transport equation through the Monte Carlo method. It is being used at the Consejo de Seguridad Nuclear (CSN) to perform criticality calculations for fuel storage pools and shipping casks. Two parallel versions of the code have been generated: one for shared memory machines and another for distributed memory systems using the message-passing interface PVM. In both versions the neutrons of each generation are tracked in parallel. In order to preserve the reproducibility of the results in both versions, advanced seeds for random numbers were used. The CONVEX C3440 with four processors and shared memory at CSN was used to implement the shared memory version. An FDDI network of 6 HP9000/735 workstations was employed to implement the message-passing version using proprietary PVM. The speedup obtained was 3.6 in both cases.
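The reproducibility idea in this abstract, giving each neutron a random-number stream that does not depend on which processor tracks it, can be sketched as below. This is a minimal sketch and not KENO-Va's scheme: per-neutron seeds derived from a base seed stand in for the "advanced seeds" of the abstract, and the toy collision model, the 30% absorption probability, and all function names are assumptions made for illustration.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def track_neutron(seed):
    """Toy neutron history: count collisions until absorption, with a private generator."""
    rng = random.Random(seed)  # private stream, independent of scheduling
    collisions = 0
    while rng.random() > 0.3:  # assumed 30% absorption probability per collision
        collisions += 1
    return collisions


def run_generation(base_seed, n_neutrons, n_workers):
    """Track one generation in parallel and return the total collision tally."""
    # One pre-assigned seed per neutron, fixed before any work is distributed.
    seeds = [base_seed * 1_000_003 + i for i in range(n_neutrons)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # map() returns results in input order, so the tally is order-stable too.
        return sum(pool.map(track_neutron, seeds))
```

Because every history depends only on its own seed, `run_generation(42, 200, 1)` and `run_generation(42, 200, 4)` return the same tally regardless of how the neutrons are spread over workers.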
The New York State Model for Sharing Successful Programs: A Decade of Implementation and Evaluation.
ERIC Educational Resources Information Center
Egelston, Richard L.
To address educational reform needs in New York State, the State Education Department developed a research-based Sharing Successful Practices (SSP) Dissemination model. Under SSP, a program successful in meeting one district's needs can be adopted by other districts with similar needs. SSP has four components: validation, demonstration,…
Parra, Mario A; Mikulan, Ezequiel; Trujillo, Natalia; Sala, Sergio Della; Lopera, Francisco; Manes, Facundo; Starr, John; Ibanez, Agustin
2017-01-01
Alzheimer's disease (AD) is regarded as a disconnection syndrome that disrupts both brain information sharing and memory binding functions. The extent to which these two phenotypic expressions share pathophysiological mechanisms remains unknown. This study aimed to unveil the electrophysiological correlates of integrative memory impairments in AD towards new memory biomarkers for its prodromal stages. Patients with 100% risk of familial AD (FAD) and healthy controls underwent assessment with the Visual Short-Term Memory binding test (VSTMBT) while we recorded their EEG. We applied a novel brain connectivity method (Weighted Symbolic Mutual Information) to EEG data. Patients showed significant deficits during the VSTMBT. A reduction of brain connectivity was observed during resting as well as during correct VSTM binding, particularly over frontal and posterior regions. An increase of connectivity was found during VSTM binding performance over central regions. While decreased connectivity was found in cases in more advanced stages of FAD, increased brain connectivity appeared in cases in earlier stages. Such altered patterns of task-related connectivity were found in 89% of the assessed patients. VSTM binding deficits in the prodromal stages of FAD are associated with altered patterns of brain connectivity, thus confirming the link between integrative memory deficits and impaired brain information sharing in prodromal FAD. While significant loss of brain connectivity seems to be a feature of the advanced stages of FAD, increased brain connectivity characterizes its earlier stages. These findings are discussed in the light of recent proposals about the earliest pathophysiological mechanisms of AD and their clinical expression. Copyright (c) Bentham Science Publishers.
How To Create and Conduct a Memory Enhancement Program.
ERIC Educational Resources Information Center
Meyer, Genevieve R.; Ober-Reynolds, Sharman
This report describes Memory Enhancement Group workshops which have been conducted at the Senior Health and Peer Counseling Center in Santa Monica, California and gives basic data regarding outcomes of the workshops. It provides a model of memory as a three-step process of registration or becoming aware, consolidation, and retrieval. It presents…
Continuous-Time Random Walk with multi-step memory: an application to market dynamics
NASA Astrophysics Data System (ADS)
Gubiec, Tomasz; Kutner, Ryszard
2017-11-01
An extended version of the Continuous-Time Random Walk (CTRW) model with memory is herein developed. This memory involves the dependence between an arbitrary number of successive jumps of the process, while waiting times between jumps are considered as i.i.d. random variables. This dependence was established by analyzing empirical histograms for the stochastic process of a single share price on a market within the high frequency time scale. Then, it was justified theoretically by considering the bid-ask bounce mechanism containing some delay characteristic for any double-auction market. Our model proved to be exactly analytically solvable. Therefore, it enables a direct comparison of its predictions with their empirical counterparts, for instance, with the empirical velocity autocorrelation function. Thus, the present research significantly extends the capabilities of the CTRW formalism. Contribution to the Topical Issue "Continuous Time Random Walk Still Trendy: Fifty-year History, Current State and Outlook", edited by Ryszard Kutner and Jaume Masoliver.
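The flavor of a CTRW with jump memory can be sketched with a much simpler process than the one in this abstract. The sketch below is a one-step-memory toy, not the authors' multi-step model: it assumes unit jumps, exponentially distributed i.i.d. waiting times, and a hypothetical parameter `eps` giving the probability that a jump reverses the previous one (a crude stand-in for bid-ask bounce).

```python
import random


def simulate_ctrw(n_jumps, eps, seed=0):
    """Simulate a walk with i.i.d. waiting times and one-step jump memory.

    Returns (times, positions, jumps).
    """
    rng = random.Random(seed)
    t, x = 0.0, 0.0
    times, positions, jumps = [], [], []
    prev = rng.choice((-1.0, 1.0))
    for _ in range(n_jumps):
        t += rng.expovariate(1.0)  # i.i.d. waiting time (assumed exponential)
        if rng.random() < eps:
            jump = -prev  # bounce: reverse the previous jump
        else:
            jump = rng.choice((-1.0, 1.0))  # fresh, memoryless jump
        x += jump
        times.append(t)
        positions.append(x)
        jumps.append(jump)
        prev = jump
    return times, positions, jumps


def lag1_autocorrelation(jumps):
    """Sample lag-1 autocorrelation of the jump sequence."""
    n = len(jumps)
    mean = sum(jumps) / n
    var = sum((j - mean) ** 2 for j in jumps) / n
    cov = sum((jumps[i] - mean) * (jumps[i + 1] - mean) for i in range(n - 1)) / (n - 1)
    return cov / var
```

In this toy model the expected lag-1 autocorrelation of the jumps is `-eps`, so a negative empirical value signals bounce-like memory, while `eps = 0` recovers an ordinary memoryless walk.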
The Meeting Point: Where Language Production and Working Memory Share Resources.
Ishkhanyan, Byurakn; Boye, Kasper; Mogensen, Jesper
2018-06-07
The interaction between working memory and language processing is widely discussed in cognitive research. However, those studies often explore the relationship between language comprehension and working memory (WM). The role of WM is rarely considered in language production, despite some evidence suggesting a relationship between the two cognitive systems. This study attempts to fill that gap by using a complex span task during language production. We make our predictions based on the reorganization of elementary functions neurocognitive model, a usage based theory about grammatical status, and language production models. In accordance with these theories, we expect an overlap between language production and WM at one or more levels of language planning. Our results show that WM is involved at the phonological encoding level of language production and that adding WM load facilitates language production, which leads us to suggest that an extra task-specific storage is being created while the task is performed.
Method for prefetching non-contiguous data structures
Blumrich, Matthias A [Ridgefield, CT; Chen, Dong [Croton On Hudson, NY; Coteus, Paul W [Yorktown Heights, NY; Gara, Alan G [Mount Kisco, NY; Giampapa, Mark E [Irvington, NY; Heidelberger, Philip [Cortlandt Manor, NY; Hoenicke, Dirk [Ossining, NY; Ohmacht, Martin [Brewster, NY; Steinmacher-Burow, Burkhard D [Mount Kisco, NY; Takken, Todd E [Mount Kisco, NY; Vranas, Pavlos M [Bedford Hills, NY
2009-05-05
A low latency memory system access is provided in association with a weakly-ordered multiprocessor system. Each processor in the multiprocessor shares resources, and each shared resource has an associated lock within a locking device that provides support for synchronization between the multiple processors in the multiprocessor and the orderly sharing of the resources. A processor only has permission to access a resource when it owns the lock associated with that resource, and an attempt by a processor to own a lock requires only a single load operation, rather than a traditional atomic load followed by store, such that the processor only performs a read operation and the hardware locking device performs a subsequent write operation rather than the processor. A simple prefetching scheme for non-contiguous data structures is also disclosed. A memory line is redefined so that in addition to the normal physical memory data, every line includes a pointer that is large enough to point to any other line in the memory, wherein the pointers determine which memory line to prefetch rather than some other predictive algorithm. This enables hardware to effectively prefetch memory access patterns that are non-contiguous, but repetitive.
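The pointer-directed prefetching described in this abstract can be modeled in a few lines. This is a toy sketch, not the patented design: the abstract does not say how the per-line pointers are set, so the sketch assumes each pointer is updated to the line accessed next, and it assumes a single-entry prefetch buffer. The class and function names are hypothetical.

```python
class PointerPrefetchMemory:
    """Toy model: every memory line carries a pointer to the line to prefetch next."""

    def __init__(self):
        self.next_ptr = {}      # line -> line to prefetch after it (the per-line pointer)
        self.prefetched = None  # single-entry prefetch buffer (assumption)
        self.last = None        # most recently accessed line

    def access(self, line):
        """Access a line; return True on a prefetch hit, False on a miss."""
        hit = line == self.prefetched
        if self.last is not None:
            # Assumed update rule: remember that `line` followed `self.last`.
            self.next_ptr[self.last] = line
        # Follow this line's pointer to decide what to prefetch next.
        self.prefetched = self.next_ptr.get(line)
        self.last = line
        return hit


def count_misses(memory, pattern):
    """Run an access pattern through the model and count the misses."""
    return sum(0 if memory.access(line) else 1 for line in pattern)
```

Running the non-contiguous pattern `[5, 40, 17, 90]` three times through one `PointerPrefetchMemory` gives 4 misses on the first pass (pointers are being learned), 1 on the second (only the wrap-around is unknown), and 0 on the third, which is the "non-contiguous, but repetitive" case the abstract targets.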
Howard, Lauren H; Festa, Cassandra; Lonsdorf, Elizabeth V
2018-05-01
The ability to learn socially is of critical importance across a wide variety of species, as it allows knowledge to be passed quickly among individuals without the need of time-consuming trial-and-error learning. Among primates, social learning research has been particularly focused on foraging tasks, including transmission dynamics and the demonstration characteristics that appear to support social learning. Less work has focused on the attentional salience of the information being viewed, especially in New World monkeys. We used a noninvasive eye-tracking paradigm previously used in human infants and great apes to examine the salience of social modeling for memory in capuchin monkeys. Like human infants and apes, capuchins were significantly more likely to remember an event that included a social model as opposed to a nonsocial model. This article provides some of the first evidence that capuchin memory is altered by the presence of a social model and presents a novel method for assessing cognitive capabilities in this species. Whether this "social memory bias" is shared across the primate order, or is present only in taxa that regularly rely on social information, is an important avenue for future research. (PsycINFO Database Record (c) 2018 APA, all rights reserved).
NASA Technical Reports Server (NTRS)
Chow, Edward T.; Schatzel, Donald V.; Whitaker, William D.; Sterling, Thomas
2008-01-01
A Spaceborne Processor Array in Multifunctional Structure (SPAMS) can lower the total mass of the electronic and structural overhead of spacecraft, resulting in reduced launch costs, while increasing the science return through dynamic onboard computing. SPAMS integrates the multifunctional structure (MFS) and the Gilgamesh Memory, Intelligence, and Network Device (MIND) multi-core in-memory computer architecture into a single-system super-architecture. This transforms every inch of a spacecraft into a sharable, interconnected, smart computing element to increase computing performance while simultaneously reducing mass. The MIND in-memory architecture provides a foundation for high-performance, low-power, and fault-tolerant computing. The MIND chip has an internal structure that includes memory, processing, and communication functionality. The Gilgamesh is a scalable system comprising multiple MIND chips interconnected to operate as a single, tightly coupled, parallel computer. The array of MIND components shares a global, virtual name space for program variables and tasks that are allocated at run time to the distributed physical memory and processing resources. Individual processor-memory nodes can be activated or powered down at run time to provide active power management and to configure around faults. A SPAMS system is comprised of a distributed Gilgamesh array built into MFS, interfaces into instrument and communication subsystems, a mass storage interface, and a radiation-hardened flight computer.
NASA Astrophysics Data System (ADS)
Fang, Juan; Hao, Xiaoting; Fan, Qingwen; Chang, Zeqing; Song, Shuying
2017-05-01
In a heterogeneous multi-core architecture, the CPU and GPU are integrated on the same chip, which poses a new challenge for last-level cache management. In this architecture, CPU and GPU applications execute concurrently and both access the last-level cache (LLC). CPUs and GPUs have different memory access characteristics, so they differ in their sensitivity to LLC capacity. For many CPU applications, a reduced share of the LLC can lead to significant performance degradation. In contrast, GPU applications can tolerate increased memory access latency when there is sufficient thread-level parallelism. Taking this latency tolerance into account, this paper presents a method that lets GPU applications access memory directly, leaving more LLC space for CPU applications; this improves the performance of CPU applications without affecting the performance of GPU applications. When the CPU application is cache sensitive and the GPU application is cache insensitive, the overall performance of the system is improved significantly.
Linking agent-based models and stochastic models of financial markets
Feng, Ling; Li, Baowen; Podobnik, Boris; Preis, Tobias; Stanley, H. Eugene
2012-01-01
It is well-known that financial asset returns exhibit fat-tailed distributions and long-term memory. These empirical features are the main objectives of modeling efforts using (i) stochastic processes to quantitatively reproduce these features and (ii) agent-based simulations to understand the underlying microscopic interactions. After reviewing selected empirical and theoretical evidence documenting the behavior of traders, we construct an agent-based model to quantitatively demonstrate that “fat” tails in return distributions arise when traders share similar technical trading strategies and decisions. Extending our behavioral model to a stochastic model, we derive and explain a set of quantitative scaling relations of long-term memory from the empirical behavior of individual market participants. Our analysis provides a behavioral interpretation of the long-term memory of absolute and squared price returns: They are directly linked to the way investors evaluate their investments by applying technical strategies at different investment horizons, and this quantitative relationship is in agreement with empirical findings. Our approach provides a possible behavioral explanation for stochastic models for financial systems in general and provides a method to parameterize such models from market data rather than from statistical fitting. PMID:22586086
Linking agent-based models and stochastic models of financial markets.
Feng, Ling; Li, Baowen; Podobnik, Boris; Preis, Tobias; Stanley, H Eugene
2012-05-29
It is well-known that financial asset returns exhibit fat-tailed distributions and long-term memory. These empirical features are the main objectives of modeling efforts using (i) stochastic processes to quantitatively reproduce these features and (ii) agent-based simulations to understand the underlying microscopic interactions. After reviewing selected empirical and theoretical evidence documenting the behavior of traders, we construct an agent-based model to quantitatively demonstrate that "fat" tails in return distributions arise when traders share similar technical trading strategies and decisions. Extending our behavioral model to a stochastic model, we derive and explain a set of quantitative scaling relations of long-term memory from the empirical behavior of individual market participants. Our analysis provides a behavioral interpretation of the long-term memory of absolute and squared price returns: They are directly linked to the way investors evaluate their investments by applying technical strategies at different investment horizons, and this quantitative relationship is in agreement with empirical findings. Our approach provides a possible behavioral explanation for stochastic models for financial systems in general and provides a method to parameterize such models from market data rather than from statistical fitting.
A shared resource between declarative memory and motor memory.
Keisler, Aysha; Shadmehr, Reza
2010-11-03
The neural systems that support motor adaptation in humans are thought to be distinct from those that support the declarative system. Yet, during motor adaptation changes in motor commands are supported by a fast adaptive process that has important properties (rapid learning, fast decay) that are usually associated with the declarative system. The fast process can be contrasted to a slow adaptive process that also supports motor memory, but learns gradually and shows resistance to forgetting. Here we show that after people stop performing a motor task, the fast motor memory can be disrupted by a task that engages declarative memory, but the slow motor memory is immune from this interference. Furthermore, we find that the fast/declarative component plays a major role in the consolidation of the slow motor memory. Because of the competitive nature of declarative and nondeclarative memory during consolidation, impairment of the fast/declarative component leads to improvements in the slow/nondeclarative component. Therefore, the fast process that supports formation of motor memory is not only neurally distinct from the slow process, but it shares critical resources with the declarative memory system.
A shared resource between declarative memory and motor memory
Keisler, Aysha; Shadmehr, Reza
2010-01-01
The neural systems that support motor adaptation in humans are thought to be distinct from those that support the declarative system. Yet, during motor adaptation changes in motor commands are supported by a fast adaptive process that has important properties (rapid learning, fast decay) that are usually associated with the declarative system. The fast process can be contrasted to a slow adaptive process that also supports motor memory, but learns gradually and shows resistance to forgetting. Here we show that after people stop performing a motor task, the fast motor memory can be disrupted by a task that engages declarative memory, but the slow motor memory is immune from this interference. Furthermore, we find that the fast/declarative component plays a major role in the consolidation of the slow motor memory. Because of the competitive nature of declarative and non-declarative memory during consolidation, impairment of the fast/declarative component leads to improvements in the slow/non-declarative component. Therefore, the fast process that supports formation of motor memory is not only neurally distinct from the slow process, but it shares critical resources with the declarative memory system. PMID:21048140
Optimizing ROOT’s Performance Using C++ Modules
NASA Astrophysics Data System (ADS)
Vassilev, Vassil
2017-10-01
ROOT comes with a C++-compliant interpreter, cling. Cling needs to understand the content of the libraries in order to interact with them. Exposing the full shared library descriptors to the interpreter at runtime translates into an increased memory footprint. ROOT’s exploratory programming concepts allow implicit and explicit runtime shared library loading, which requires the interpreter to load the library descriptor. Re-parsing of descriptors’ content has a noticeable effect on the runtime performance. The present state-of-the-art lazy parsing technique brings the runtime performance to reasonable levels but proves to be fragile and can introduce correctness issues. An elegant solution is to load information from the descriptor lazily and in a non-recursive way. The LLVM community advances its C++ Modules technology, providing an I/O-efficient, on-disk representation capable of reducing build times and peak memory usage. The feature is standardized as a C++ technical specification. C++ Modules are a flexible concept, which can be employed to match CMS and other experiments’ requirement for ROOT: to optimize both runtime memory usage and performance. Cling technically “inherits” the feature; however, tweaking it to ROOT scale and beyond is a complex endeavor. The paper discusses the status of C++ Modules in the context of ROOT, supported by a few preliminary performance results. It shows a step-by-step migration plan and describes potential challenges which could appear.
VIRTUAL FRAME BUFFER INTERFACE
NASA Technical Reports Server (NTRS)
Wolfe, T. L.
1994-01-01
Large image processing systems use multiple frame buffers with differing architectures and vendor supplied user interfaces. This variety of architectures and interfaces creates software development, maintenance, and portability problems for application programs. The Virtual Frame Buffer Interface program makes all frame buffers appear as a generic frame buffer with a specified set of characteristics, allowing programmers to write code which will run unmodified on all supported hardware. The Virtual Frame Buffer Interface converts generic commands to actual device commands. The virtual frame buffer consists of a definition of capabilities and FORTRAN subroutines that are called by application programs. The virtual frame buffer routines may be treated as subroutines, logical functions, or integer functions by the application program. Routines are included that allocate and manage hardware resources such as frame buffers, monitors, video switches, trackballs, tablets and joysticks; access image memory planes; and perform alphanumeric font or text generation. The subroutines for the various "real" frame buffers are in separate VAX/VMS shared libraries allowing modification, correction or enhancement of the virtual interface without affecting application programs. The Virtual Frame Buffer Interface program was developed in FORTRAN 77 for a DEC VAX 11/780 or a DEC VAX 11/750 under VMS 4.X. It supports ADAGE IK3000, DEANZA IP8500, Low Resolution RAMTEK 9460, and High Resolution RAMTEK 9460 Frame Buffers. It has a central memory requirement of approximately 150K. This program was developed in 1985.
A Comparison of Three Programming Models for Adaptive Applications
NASA Technical Reports Server (NTRS)
Shan, Hong-Zhang; Singh, Jaswinder Pal; Oliker, Leonid; Biswa, Rupak; Kwak, Dochan (Technical Monitor)
2000-01-01
We study the performance and programming effort for two major classes of adaptive applications under three leading parallel programming models. We find that all three models can achieve scalable performance on state-of-the-art multiprocessor machines. The basic parallel algorithms needed for different programming models to deliver their best performance are similar, but the implementations differ greatly, far beyond the fact of using explicit messages versus implicit loads/stores. Compared with MPI and SHMEM, CC-SAS (cache-coherent shared address space) provides substantial ease of programming at the conceptual and program orchestration level, which often leads to a performance gain. However, it may also suffer from the poor spatial locality of physically distributed shared data on a large number of processors. Our CC-SAS implementation of the PARMETIS partitioner itself runs faster than in the other two programming models, and generates a more balanced result for our application.
Multiprogramming performance degradation - Case study on a shared memory multiprocessor
NASA Technical Reports Server (NTRS)
Dimpsey, R. T.; Iyer, R. K.
1989-01-01
The performance degradation due to multiprogramming overhead is quantified for a parallel-processing machine. Measurements of real workloads were taken, and it was found that there is a moderate correlation between the completion time of a program and the amount of system overhead measured during program execution. Experiments in controlled environments were then conducted to calculate a lower bound on the performance degradation of parallel jobs caused by multiprogramming overhead. The results show that the multiprogramming overhead of parallel jobs consumes at least 4 percent of the processor time. When two or more serial jobs are introduced into the system, this amount increases to 5.3 percent.
Parallel computation with the force
NASA Technical Reports Server (NTRS)
Jordan, H. F.
1985-01-01
A methodology, called the force, supports the construction of programs to be executed in parallel by a force of processes. The number of processes in the force is unspecified, but potentially very large. The force idea is embodied in a set of macros which produce multiprocessor FORTRAN code and has been studied on two shared memory multiprocessors of fairly different character. The method has simplified the writing of highly parallel programs within a limited class of parallel algorithms and is being extended to cover a broader class. The individual parallel constructs which comprise the force methodology are discussed. Of central concern are their semantics, implementation on different architectures and performance implications.
Ma, Weiwei; Wu, Mengnan; Zhou, Siyan; Tao, Ye; Xie, Zuolei; Zhong, Yi
2018-05-20
Emerging evidence suggests that neuro-inflammation begins early and drives the pathogenesis of Alzheimer's disease (AD), and anti-inflammatory therapies are under clinical development. However, several anti-inflammatory compounds failed to improve memory in clinical trials, indicating that reducing inflammation alone might not be enough. On the other hand, neuro-inflammation is implicated in a number of mental disorders which share the same therapeutic targets. Based on these observations, we screened a batch of genes related with mental disorder and neuro-inflammation in a classical olfactory conditioning in an amyloid beta (Aβ) overexpression fly model. A Smoothened (SMO) mutant was identified as a genetic modifier of Aβ toxicity in 3-min memory and downregulation of SMO rescued Aβ-induced 3-min and 1-h memory deficiency. Also, Aβ activated innate inflammatory response in fly by increasing the expression of antimicrobial peptides, which were alleviated by downregulating SMO. Furthermore, pharmaceutical administration of a SMO antagonist LDE rescued Aβ-induced upregulation of SMO in astrocytes of mouse hippocampus, improved memory in Morris water maze (MWM), and reduced expression of astrocyte secreting pro-inflammatory factors IL-1β, TNFα and the microglia marker IBA-1 in an APP/PS1 transgenic mouse model. Our study suggests that SMO is an important conserved modulator of Aβ toxicity in both fly and mouse models of AD. Copyright © 2018. Published by Elsevier Ltd.
ERIC Educational Resources Information Center
Schweppe, Judith; Rummer, Ralf
2007-01-01
The general idea of language-based accounts of short-term memory is that retention of linguistic materials is based on representations within the language processing system. In the present sentence recall study, we address the question whether the assumption of shared representations holds for morphosyntactic information (here: grammatical gender…
The Precategorical Nature of Visual Short-Term Memory
ERIC Educational Resources Information Center
Quinlan, Philip T.; Cohen, Dale J.
2016-01-01
We conducted a series of recognition experiments that assessed whether visual short-term memory (VSTM) is sensitive to shared category membership of to-be-remembered (tbr) images of common objects. In Experiment 1 some of the tbr items shared the same basic level category (e.g., hand axe): Such items were no better retained than others. In the…
NASA Technical Reports Server (NTRS)
Shalkhauser, Mary JO; Quintana, Jorge A.; Soni, Nitin J.
1994-01-01
The NASA Lewis Research Center is developing a multichannel communication signal processing satellite (MCSPS) system which will provide low data rate, direct to user, commercial communications services. The focus of current space segment developments is a flexible, high-throughput, fault tolerant onboard information switching processor. This information switching processor (ISP) is a destination-directed packet switch which performs both space and time switching to route user information among numerous user ground terminals. Through both industry study contracts and in-house investigations, several packet switching architectures were examined. A contention-free approach, the shared memory per beam architecture, was selected for implementation. The shared memory per beam architecture, fault tolerance insertion, implementation, and demonstration plans are described.
The performance of disk arrays in shared-memory database machines
NASA Technical Reports Server (NTRS)
Katz, Randy H.; Hong, Wei
1993-01-01
In this paper, we examine how disk arrays and shared memory multiprocessors lead to an effective method for constructing database machines for general-purpose complex query processing. We show that disk arrays can lead to cost-effective storage systems if they are configured from suitably small formfactor disk drives. We introduce the storage system metric data temperature as a way to evaluate how well a disk configuration can sustain its workload, and we show that disk arrays can sustain the same data temperature as a more expensive mirrored-disk configuration. We use the metric to evaluate the performance of disk arrays in XPRS, an operational shared-memory multiprocessor database system being developed at the University of California, Berkeley.
The infamous among us: Enhanced reputational memory for uncooperative ingroup members.
Hechler, Stefanie; Neyer, Franz J; Kessler, Thomas
2016-12-01
People remember uncooperative individuals better than cooperative ones. We hypothesize that this is particularly true when uncooperative individuals belong to one's ingroup, as their behavior violates positive expectations. Two studies examined the effect of minimal group categorization on reputational memory of the social behavior of particular ingroup and outgroup members. We manipulated uncooperative behavior as the unfair sharing of resources with ingroup members (Study 1), or as descriptions of cheating (Study 2). Participants evaluated several uncooperative and cooperative (and neutral) ingroup and outgroup members. In a surprise memory test, they had to recognize target faces and recall their behavior. We disentangled face recognition, reputational memory, and guessing biases with multinomial models of source monitoring. The results show enhanced reputational memory for uncooperative ingroup members, but not uncooperative outgroup members. In contrast, guessing behavior indicated that participants assumed more ingroup cooperation than outgroup cooperation. Our findings integrate prior research on memory for uncooperative person behavior and person memory in group contexts. We suggest that the ability to remember the uncooperative amidst the supposedly cooperative ingroup could stabilize intragroup cooperation. Copyright © 2016 Elsevier B.V. All rights reserved.
MPI, HPF or OpenMP: A Study with the NAS Benchmarks
NASA Technical Reports Server (NTRS)
Jin, Hao-Qiang; Frumkin, Michael; Hribar, Michelle; Waheed, Abdul; Yan, Jerry; Saini, Subhash (Technical Monitor)
1999-01-01
Porting applications to new high performance parallel and distributed platforms is a challenging task. Writing parallel code by hand is time consuming and costly, but the task can be simplified by high level languages and would even better be automated by parallelizing tools and compilers. The definition of HPF (High Performance Fortran, based on data parallel model) and OpenMP (based on shared memory parallel model) standards has offered great opportunity in this respect. Both provide simple and clear interfaces to languages like FORTRAN and simplify many tedious tasks encountered in writing message passing programs. In our study we implemented the parallel versions of the NAS Benchmarks with HPF and OpenMP directives. Comparison of their performance with the MPI implementation and pros and cons of different approaches will be discussed along with experience of using computer-aided tools to help parallelize these benchmarks. Based on the study, potentials of applying some of the techniques to realistic aerospace applications will be presented.
NASA Technical Reports Server (NTRS)
OKeefe, Matthew (Editor); Kerr, Christopher L. (Editor)
1998-01-01
This report contains the abstracts and technical papers from the Second International Workshop on Software Engineering and Code Design in Parallel Meteorological and Oceanographic Applications, held June 15-18, 1998, in Scottsdale, Arizona. The purpose of the workshop is to bring together software developers in meteorology and oceanography to discuss software engineering and code design issues for parallel architectures, including Massively Parallel Processors (MPP's), Parallel Vector Processors (PVP's), Symmetric Multi-Processors (SMP's), Distributed Shared Memory (DSM) multi-processors, and clusters. Issues to be discussed include: (1) code architectures for current parallel models, including basic data structures, storage allocation, variable naming conventions, coding rules and styles, i/o and pre/post-processing of data; (2) designing modular code; (3) load balancing and domain decomposition; (4) techniques that exploit parallelism efficiently yet hide the machine-related details from the programmer; (5) tools for making the programmer more productive; and (6) the proliferation of programming models (F--, OpenMP, MPI, and HPF).
MPI, HPF or OpenMP: A Study with the NAS Benchmarks
NASA Technical Reports Server (NTRS)
Jin, H.; Frumkin, M.; Hribar, M.; Waheed, A.; Yan, J.; Saini, Subhash (Technical Monitor)
1999-01-01
Porting applications to new high performance parallel and distributed platforms is a challenging task. Writing parallel code by hand is time consuming and costly, but this task can be simplified by high level languages and would even better be automated by parallelizing tools and compilers. The definition of HPF (High Performance Fortran, based on data parallel model) and OpenMP (based on shared memory parallel model) standards has offered great opportunity in this respect. Both provide simple and clear interfaces to languages like FORTRAN and simplify many tedious tasks encountered in writing message passing programs. In our study, we implemented the parallel versions of the NAS Benchmarks with HPF and OpenMP directives. Comparison of their performance with the MPI implementation and pros and cons of different approaches will be discussed along with experience of using computer-aided tools to help parallelize these benchmarks. Based on the study, potentials of applying some of the techniques to realistic aerospace applications will be presented.
Performance Characteristics of the Multi-Zone NAS Parallel Benchmarks
NASA Technical Reports Server (NTRS)
Jin, Haoqiang; VanderWijngaart, Rob F.
2003-01-01
We describe a new suite of computational benchmarks that models applications featuring multiple levels of parallelism. Such parallelism is often available in realistic flow computations on systems of grids, but had not previously been captured in benchmarks. The new suite, named NPB Multi-Zone, is extended from the NAS Parallel Benchmarks suite, and involves solving the application benchmarks LU, BT and SP on collections of loosely coupled discretization meshes. The solutions on the meshes are updated independently, but after each time step they exchange boundary value information. This strategy provides relatively easily exploitable coarse-grain parallelism between meshes. Three reference implementations are available: one serial, one hybrid using the Message Passing Interface (MPI) and OpenMP, and another hybrid using a shared memory multi-level programming model (SMP+OpenMP). We examine the effectiveness of hybrid parallelization paradigms in these implementations on three different parallel computers. We also use an empirical formula to investigate the performance characteristics of the multi-zone benchmarks.
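The "update independently, then exchange boundary values" structure described in this abstract can be sketched in a few lines. This is a toy illustration only: the one-dimensional zones, the three-point averaging update, and the function names `step_zone`, `exchange_boundaries` and `time_step` are assumptions made for the example and are not the LU, BT or SP solvers used by the benchmarks.

```python
def step_zone(zone):
    """Update the interior points of one zone on its own (3-point average)."""
    new = zone[:]
    for i in range(1, len(zone) - 1):
        new[i] = (zone[i - 1] + zone[i] + zone[i + 1]) / 3.0
    return new


def exchange_boundaries(zones):
    """After a time step, copy neighbouring interior values into ghost cells."""
    for left, right in zip(zones, zones[1:]):
        left[-1] = right[1]    # right ghost cell of the left zone
        right[0] = left[-2]    # left ghost cell of the right zone


def time_step(zones):
    # Each zone is updated independently; this loop is the coarse-grain
    # parallelism that could be spread over processes or threads.
    zones = [step_zone(z) for z in zones]
    exchange_boundaries(zones)
    return zones
```

The first and last element of each zone act as ghost cells, so the only coupling between zones is the exchange performed once per time step.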
Buszko; Buszko; Wang
1998-04-01
A custom-written Common Gateway Interface (CGI) program for remote control of an NMR spectrometer using a World Wide Web browser has been described. The program, running on a UNIX workstation, uses multiple processes to handle concurrent tasks of interacting with the user and with the spectrometer. The program's parent process communicates with the browser and sends out commands to the spectrometer; the child process is mainly responsible for data acquisition. Communication between the processes is via the shared memory mechanism. The WWW pages that have been developed for the system make use of the frames feature of web browsers. The CGI program provides an intuitive user interface to the NMR spectrometer, making, in effect, a complex system an easy-to-use Web appliance. Copyright 1998 Academic Press.
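The parent/child arrangement with shared memory described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' CGI program: the function names `acquire` and `run_acquisition`, the placeholder signal values, and the choice of Python's `multiprocessing` module with the `fork` start method (which assumes a UNIX-like host) are all assumptions made for the example.

```python
import multiprocessing as mp


def acquire(buffer, count, n_points):
    """Child process: stand-in for data acquisition, writing into shared memory."""
    for i in range(n_points):
        buffer[i] = float(i * i)   # placeholder for one digitized signal point
    count.value = n_points         # publish how many points are valid


def run_acquisition(n_points=8):
    """Parent process: start the child, wait for it, then read shared memory."""
    ctx = mp.get_context("fork")                   # assumption: UNIX-like host
    buffer = ctx.Array("d", n_points, lock=False)  # shared array of doubles
    count = ctx.Value("i", 0, lock=False)          # shared count of valid points
    child = ctx.Process(target=acquire, args=(buffer, count, n_points))
    child.start()
    child.join()   # the parent reads only after the child exits, so no lock is used
    return list(buffer[:count.value])
```

Because the parent here reads only after `join()`, no locking is needed; a parent that polled the buffer while acquisition was still running would need a lock or a similar synchronization mechanism.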
Mapping the developmental constraints on working memory span performance.
Bayliss, Donna M; Jarrold, Christopher; Baddeley, Alan D; Gunn, Deborah M; Leigh, Eleanor
2005-07-01
This study investigated the constraints underlying developmental improvements in complex working memory span performance among 120 children of between 6 and 10 years of age. Independent measures of processing efficiency, storage capacity, rehearsal speed, and basic speed of processing were assessed to determine their contribution to age-related variance in complex span. Results showed that developmental improvements in complex span were driven by 2 age-related but separable factors: 1 associated with general speed of processing and 1 associated with storage ability. In addition, there was an age-related contribution shared between working memory, processing speed, and storage ability that was important for higher level cognition. These results pose a challenge for models of complex span performance that emphasize the importance of processing speed alone.
Optical memories in digital computing
NASA Technical Reports Server (NTRS)
Alford, C. O.; Gaylord, T. K.
1979-01-01
High capacity optical memories with relatively-high data-transfer rate and multiport simultaneous access capability may serve as basis for new computer architectures. Several computer structures that might profitably use memories are: a) simultaneous record-access system, b) simultaneously-shared memory computer system, and c) parallel digital processing structure.
Experiences Using OpenMP Based on Compiler Directed Software DSM on a PC Cluster
NASA Technical Reports Server (NTRS)
Hess, Matthias; Jost, Gabriele; Mueller, Matthias; Ruehle, Roland; Biegel, Bryan (Technical Monitor)
2002-01-01
In this work we report on our experiences running OpenMP programs on a commodity cluster of PCs (personal computers) running a software distributed shared memory (DSM) system. We describe our test environment and report on the performance of a subset of the NAS (NASA Advanced Supercomputing) Parallel Benchmarks that have been automatically parallelized for OpenMP. We compare the performance of the OpenMP implementations with that of their message passing counterparts and discuss performance differences.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Neely, J. R.; Hornung, R.; Black, A.
This document serves as a detailed companion to the PowerPoint slides presented as part of the ASC L2 milestone review for Integrated Codes milestone #4782 titled “Assess Newly Emerging Programming and Memory Models for Advanced Architectures on Integrated Codes”, due on 9/30/2014, and presented for formal program review on 9/12/2014. The program review committee is represented by Mike Zika (A Program Project Lead for Kull), Brian Pudliner (B Program Project Lead for Ares), Scott Futral (DEG Group Lead in LC), and Mike Glass (Sierra Project Lead at Sandia). This document, along with the presentation materials and a letter of completion signed by the review committee, will act as proof of completion for this milestone.
Multiple memory systems as substrates for multiple decision systems
Doll, Bradley B.; Shohamy, Daphna; Daw, Nathaniel D.
2014-01-01
It has recently become widely appreciated that value-based decision making is supported by multiple computational strategies. In particular, animal and human behavior in learning tasks appears to include habitual responses described by prominent model-free reinforcement learning (RL) theories, but also more deliberative or goal-directed actions that can be characterized by a different class of theories, model-based RL. The latter theories evaluate actions by using a representation of the contingencies of the task (as with a learned map of a spatial maze), called an “internal model.” Given the evidence of behavioral and neural dissociations between these approaches, they are often characterized as dissociable learning systems, though they likely interact and share common mechanisms. In many respects, this division parallels a longstanding dissociation in cognitive neuroscience between multiple memory systems, describing, at the broadest level, separate systems for declarative and procedural learning. Procedural learning has notable parallels with model-free RL: both involve learning of habits and both are known to depend on parts of the striatum. Declarative memory, by contrast, supports memory for single events or episodes and depends on the hippocampus. The hippocampus is thought to support declarative memory by encoding temporal and spatial relations among stimuli and thus is often referred to as a relational memory system. Such relational encoding is likely to play an important role in learning an internal model, the representation that is central to model-based RL. Thus, insofar as the memory systems represent more general-purpose cognitive mechanisms that might subserve performance on many sorts of tasks including decision making, these parallels raise the question whether the multiple decision systems are served by multiple memory systems, such that one dissociation is grounded in the other. 
Here we investigated the relationship between model-based RL and relational memory by comparing individual differences across behavioral tasks designed to measure either capacity. Human subjects performed two tasks, a learning and generalization task (acquired equivalence) which involves relational encoding and depends on the hippocampus; and a sequential RL task that could be solved by either a model-based or model-free strategy. We assessed the correlation between subjects’ use of flexible, relational memory, as measured by generalization in the acquired equivalence task, and their differential reliance on either RL strategy in the decision task. We observed a significant positive relationship between generalization and model-based, but not model-free, choice strategies. These results are consistent with the hypothesis that model-based RL, like acquired equivalence, relies on a more general-purpose relational memory system. PMID:24846190
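As a rough illustration of the two strategies contrasted in this abstract, the sketch below sets a model-free update (which caches action values from experienced reward) beside a model-based evaluation (which computes action values from a transition model). The two-stage layout, the state and action names, the transition probabilities, and the learning rate are all assumptions made for the example; they are not taken from the study.

```python
# Hypothetical two-stage task: a first-stage action leads probabilistically
# to one of two second-stage states, each with its own learned value.

ALPHA = 0.5  # learning rate (illustrative value, not from the study)

# Assumed internal model: P(second-stage state | first-stage action).
TRANSITIONS = {
    "left": {"A": 0.7, "B": 0.3},
    "right": {"A": 0.3, "B": 0.7},
}


def model_free_update(q_values, action, reward, alpha=ALPHA):
    """Nudge the cached value of the chosen action toward the reward."""
    updated = dict(q_values)
    updated[action] += alpha * (reward - updated[action])
    return updated


def model_based_values(transitions, state_values):
    """Evaluate each action by averaging state values under the model."""
    return {
        action: sum(p * state_values[s] for s, p in probs.items())
        for action, probs in transitions.items()
    }


q_mf = model_free_update({"left": 0.0, "right": 0.0}, "left", reward=1.0)
q_mb = model_based_values(TRANSITIONS, {"A": 1.0, "B": 0.0})
print(q_mf)
print(q_mb)
```

The model-free learner changes only the value of the action it actually took, whereas the model-based evaluation revalues every action as soon as the state values or the transition model change.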
Black-Scholes model under subordination
NASA Astrophysics Data System (ADS)
Stanislavsky, A. A.
2003-02-01
In this paper, we consider a new mathematical extension of the Black-Scholes (BS) model in which the stochastic time and stock share price evolution is described by two independent random processes. The parent process is Brownian, and the directing process is inverse to the totally skewed, strictly α-stable process. The subordinated process represents the Brownian motion indexed by an independent, continuous and increasing process. This allows us to introduce the long-term memory effects in the classical BS model.
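In symbols, the construction described above can be sketched as follows. The notation (B for the parent Brownian motion, T_α for the stable process, S_α for its inverse, Y for the subordinated process) is assumed here for illustration and may differ from the paper's own.

```latex
% Assumed notation, not taken from the paper:
% T_\alpha(\tau): strictly \alpha-stable, totally skewed (increasing) process, 0 < \alpha < 1
% S_\alpha(t):    its inverse, used as the directing (random time) process
% B:              Brownian motion, independent of S_\alpha
S_\alpha(t) = \inf\{\tau \ge 0 : T_\alpha(\tau) > t\},
\qquad
Y(t) = B\bigl(S_\alpha(t)\bigr).
```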
Reader set encoding for directory of shared cache memory in multiprocessor system
Ahn, Daniel; Ceze, Luis H.; Gara, Alan; Ohmacht, Martin; Xiaotong, Zhuang
2014-06-10
In a parallel processing system with speculative execution, conflict checking occurs in a directory lookup of a cache memory that is shared by all processors. In each case, the same physical memory address will map to the same set of that cache, no matter which processor originated that access. The directory includes a dynamic reader set encoding, indicating what speculative threads have read a particular line. This reader set encoding is used in conflict checking. A bitset encoding is used to specify particular threads that have read the line.
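A bitset reader set of the kind mentioned above can be sketched in a few lines. This is a minimal illustration, not the patented design: the function names are hypothetical, and the conflict rule used here (a write conflicts with every other thread that has already read the line) is an assumption, since the abstract does not spell out the rule.

```python
# Hypothetical directory entry: one integer per cache line whose bit i is set
# when speculative thread i has read the line.


def record_read(reader_set, thread_id):
    """Return the reader set with the bit for thread_id switched on."""
    return reader_set | (1 << thread_id)


def conflicting_readers(reader_set, writer_id):
    """Return IDs of threads, other than the writer, that have read the line."""
    others = reader_set & ~(1 << writer_id)
    return [i for i in range(others.bit_length()) if (others >> i) & 1]


line_readers = 0
for tid in (1, 3):
    line_readers = record_read(line_readers, tid)

print(bin(line_readers))
print(conflicting_readers(line_readers, writer_id=3))
```

Storing one bit per thread keeps both recording a read and checking for conflicts down to a handful of bitwise operations per directory lookup.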
NASA Astrophysics Data System (ADS)
Tabik, S.; Romero, L. F.; Mimica, P.; Plata, O.; Zapata, E. L.
2012-09-01
A broad area in astronomy focuses on simulating extragalactic objects based on Very Long Baseline Interferometry (VLBI) radio-maps. Several algorithms in this scope simulate what would be the observed radio-maps if emitted from a predefined extragalactic object. This work analyzes the performance and scaling of this kind of algorithm on multi-socket, multi-core architectures. In particular, we evaluate a sharing approach, a privatizing approach and a hybrid approach on systems with a complex memory hierarchy that includes a shared Last Level Cache (LLC). In addition, we investigate which manual processes can be systematized and then automated in future work. The experiments show that the data-privatizing model scales efficiently on medium-scale multi-socket, multi-core systems (up to 48 cores), while the sharing approach, regardless of algorithmic and scheduling optimizations, is unable to reach acceptable scalability on more than one socket. However, the hybrid model with a specific level of data-sharing provides the best scalability over all used multi-socket, multi-core systems.
Insights on consciousness from taste memory research.
Gallo, Milagros
2016-01-01
Taste research in rodents supports the relevance of memory in order to determine the content of consciousness by modifying both taste perception and later action. Associated with this issue is the fact that taste and visual modalities share anatomical circuits traditionally related to conscious memory. This challenges the view of taste memory as a type of non-declarative unconscious memory.
Jiang, Michelle Y W; Vartanian, Lenny R
2016-03-01
This study examined the causal relationship between attention and memory bias toward thin-body images, and the indirect effect of attending to thin-body images on women's body dissatisfaction via memory. In a 2 (restrained vs. unrestrained eaters) × 2 (long vs. short exposure) quasi-experimental design, female participants (n = 90) were shown images of thin models for either 7 s or 150 ms, and then completed a measure of body dissatisfaction and a recognition test to assess their memory for the images. Both restrained and unrestrained eaters in the long exposure condition had better recognition memory for images of thin models than did those in the short exposure condition. Better recognition memory for images of thin models was associated with lower body dissatisfaction. Finally, exposure duration to images of thin models had an indirect effect on body dissatisfaction through recognition memory. These findings suggest that memory for body-related information may be more critical in influencing women's body image than merely the exposure itself, and that targeting memory bias might enhance the effectiveness of cognitive bias modification programs.
A constitutive theory for shape memory polymers: coupling of small and large deformation
NASA Astrophysics Data System (ADS)
Tan, Qiao; Liu, Liwu; Liu, Yanju; Leng, Jinsong; Yan, Xiangqiao; Wang, Haifang
2013-04-01
At high temperatures, SMPs share attributes with rubber and exhibit long-range reversibility. In contrast, at low temperatures they become very rigid and are susceptible to plastic deformation, so only small strains are allowable. However, relatively little literature has considered the unique small strain (rubber phase) and large strain (glass phase) coupling in SMPs when developing constitutive models. In this work, we present a 3D constitutive model for shape memory polymers in both the low-temperature small-strain regime and the high-temperature large-strain regime. The theory is based on the work of Liu et al. [15]. The four steps of the SMP thermomechanical loading cycle are fully considered in the constitutive model. The linear elastic and hyperelastic effects of SMPs at different temperatures are also fully accounted for in the proposed model by adopting the neo-Hookean model and the generalized Hooke's law.
Hard Real-Time: C++ Versus RTSJ
NASA Technical Reports Server (NTRS)
Dvorak, Daniel L.; Reinholtz, William K.
2004-01-01
In the domain of hard real-time systems, which language is better: C++ or the Real-Time Specification for Java (RTSJ)? Although ordinary Java provides a more productive programming environment than C++ due to its automatic memory management, that benefit does not apply to RTSJ when using NoHeapRealtimeThread and non-heap memory areas. As a result, RTSJ programmers must manage non-heap memory explicitly. While that's not a deterrent for veteran real-time programmers, for whom explicit memory management is common, the lack of certain language features in RTSJ (and Java) makes that manual memory management harder to accomplish safely than in C++. This paper illustrates the problem for practitioners in the context of moving data and managing memory in a real-time producer/consumer pattern. The relative ease of implementation and safety of the C++ programming model suggests that RTSJ has a struggle ahead in the domain of hard real-time applications, despite its other attractive features.
Bavelier, Daphne; Newport, Elissa L.; Hall, Matt; Supalla, Ted; Boutla, Mrim
2008-01-01
Capacity limits in linguistic short-term memory (STM) are typically measured with forward span tasks in which participants are asked to recall lists of words in the order presented. Using such tasks, native signers of American Sign Language (ASL) exhibit smaller spans than native speakers (Boutla, Supalla, Newport, & Bavelier, 2004). Here, we test the hypothesis that this population difference reflects differences in the way speakers and signers maintain temporal order information in short-term memory. We show that native signers differ from speakers on measures of short-term memory that require maintenance of temporal order of the tested materials, but not on those in which temporal order is not required. In addition, we show that, in a recall task with free order, bilingual subjects are more likely to recall in temporal order when using English than ASL. We conclude that speakers and signers do share common short-term memory processes. However, whereas short-term memory for spoken English is predominantly organized in terms of temporal order, we argue that this dimension does not play as great a role in signers’ short-term memory. Other factors that may affect STM processes in signers are discussed. PMID:18083155
A Measurement and Simulation Based Methodology for Cache Performance Modeling and Tuning
NASA Technical Reports Server (NTRS)
Waheed, Abdul; Yan, Jerry; Saini, Subhash (Technical Monitor)
1998-01-01
We present a cache performance modeling methodology that facilitates the tuning of uniprocessor cache performance for applications executing on shared memory multiprocessors by accurately predicting the effects of source code level modifications. Measurements on a single processor are initially used for identifying parts of code where cache utilization improvements may significantly impact the overall performance. Cache simulation based on trace-driven techniques can be carried out without gathering detailed address traces. Minimal runtime information for modeling cache performance of a selected code block includes: base virtual addresses of arrays, virtual addresses of variables, and loop bounds for that code block. The rest of the information is obtained from the source code. We show that the cache performance predictions are as reliable as those obtained through trace-driven simulations. This technique is particularly helpful for exploring various "what-if" scenarios regarding the cache performance impact of alternative code structures. We explain and validate this methodology using a simple matrix-matrix multiplication program. We then apply this methodology to predict and tune the cache performance of two realistic scientific applications taken from the Computational Fluid Dynamics (CFD) domain.
Kiyonaga, Anastasia; Egner, Tobias
2014-01-01
It is unclear why and under what circumstances working memory (WM) and attention interact. Here, we apply the logic of the time-based resource-sharing (TBRS) model of WM (e.g., Barrouillet et al., 2004) to explore the mixed findings of a separate, but related, literature that studies the guidance of visual attention by WM contents. Specifically, we hypothesize that the linkage between WM representations and visual attention is governed by a time-shared cognitive resource that alternately refreshes internal (WM) and selects external (visual attention) information. If this were the case, WM content should guide visual attention (involuntarily), but only when there is time for it to be refreshed in an internal focus of attention. To provide an initial test for this hypothesis, we examined whether the amount of unoccupied time during a WM delay could impact the magnitude of attentional capture by WM contents. Participants were presented with a series of visual search trials while they maintained a WM cue for a delayed-recognition test. WM cues could coincide with the search target, a distracter, or neither. We varied both the number of searches to be performed, and the amount of available time to perform them. Slowing of visual search by a WM matching distracter—and facilitation by a matching target—were curtailed when the delay was filled with fast-paced (refreshing-preventing) search trials, as was subsequent memory probe accuracy. WM content may, therefore, only capture visual attention when it can be refreshed, suggesting that internal (WM) and external attention demands reciprocally impact one another because they share a limited resource. The TBRS rationale can thus be applied in a novel context to explain why WM contents capture attention, and under what conditions that effect should be observed. PMID:25221499
Efficient Parallelization of a Dynamic Unstructured Application on the Tera MTA
NASA Technical Reports Server (NTRS)
Oliker, Leonid; Biswas, Rupak
1999-01-01
The success of parallel computing in solving real-life computationally-intensive problems relies on their efficient mapping and execution on large-scale multiprocessor architectures. Many important applications are both unstructured and dynamic in nature, making their efficient parallel implementation a daunting task. This paper presents the parallelization of a dynamic unstructured mesh adaptation algorithm using three popular programming paradigms on three leading supercomputers. We examine an MPI message-passing implementation on the Cray T3E and the SGI Origin2000, a shared-memory implementation using cache coherent nonuniform memory access (CC-NUMA) of the Origin2000, and a multi-threaded version on the newly-released Tera Multi-threaded Architecture (MTA). We compare several critical factors of this parallel code development, including runtime, scalability, programmability, and memory overhead. Our overall results demonstrate that multi-threaded systems offer tremendous potential for quickly and efficiently solving some of the most challenging real-life problems on parallel computers.
Transfer after process-based object-location memory training in healthy older adults.
Zimmermann, Kathrin; von Bastian, Claudia C; Röcke, Christina; Martin, Mike; Eschen, Anne
2016-11-01
A substantial part of age-related episodic memory decline has been attributed to the decreasing ability of older adults to encode and retrieve associations among simultaneously processed information units from long-term memory. In addition, this ability seems to share unique variance with reasoning. In this study, we therefore examined whether process-based training of the ability to learn and remember associations has the potential to induce transfer effects to untrained episodic memory and reasoning tasks in healthy older adults (60-75 years). For this purpose, the experimental group (n = 36) completed 30 sessions of process-based object-location memory training, while the active control group (n = 31) practiced visual perception on the same material. Near (spatial episodic memory), intermediate (verbal episodic memory), and far transfer effects (reasoning) were each assessed with multiple tasks at four measurements (before, midway through, immediately after, and 4 months after training). Linear mixed-effects models revealed transfer effects on spatial episodic memory and reasoning that were still observed 4 months after training. These results provide first empirical evidence that process-based training can enhance healthy older adults' associative memory performance and positively affect untrained episodic memory and reasoning abilities. (PsycINFO Database Record (c) 2016 APA, all rights reserved).
Centrally managed unified shared virtual address space
DOE Office of Scientific and Technical Information (OSTI.GOV)
Wilkes, John
Systems, apparatuses, and methods for managing a unified shared virtual address space. A host may execute system software and manage a plurality of nodes coupled to the host. The host may send work tasks to the nodes, and for each node, the host may externally manage the node's view of the system's virtual address space. Each node may have a central processing unit (CPU) style memory management unit (MMU) with an internal translation lookaside buffer (TLB). In one embodiment, the host may be coupled to a given node via an input/output memory management unit (IOMMU) interface, where the IOMMU frontend interface shares the TLB with the given node's MMU. In another embodiment, the host may control the given node's view of virtual address space via memory-mapped control registers.
Attention and Visuospatial Working Memory Share the Same Processing Resources
Feng, Jing; Pratt, Jay; Spence, Ian
2012-01-01
Attention and visuospatial working memory (VWM) share very similar characteristics; both have the same upper bound of about four items in capacity and they recruit overlapping brain regions. We examined whether both attention and VWM share the same processing resources using a novel dual-task costs approach based on a load-varying dual-task technique. With sufficiently large loads on attention and VWM, considerable interference between the two processes was observed. A further load increase on either process produced reciprocal increases in interference on both processes, indicating that attention and VWM share common resources. More critically, comparison among four experiments on the reciprocal interference effects, as measured by the dual-task costs, demonstrates no significant contribution from additional processing other than the shared processes. These results support the notion that attention and VWM share the same processing resources. PMID:22529826
Using Abstraction in Explicitly Parallel Programs.
1991-07-01
However, we only rely on sequential consistency of memory operations, including reads, writes and any synchronization primitives provided by the...explicit synchronization primitives. This demonstrates the practical power of sequentially consistent memory, as opposed to weaker models of memory that...a small set of synchronization primitives, all procedures have non-waiting specifications. This is in contrast to richer process-oriented
NASA Astrophysics Data System (ADS)
Tarboton, D. G.; Idaszak, R.; Horsburgh, J. S.; Ames, D. P.; Goodall, J. L.; Band, L. E.; Merwade, V.; Couch, A.; Hooper, R. P.; Maidment, D. R.; Dash, P. K.; Stealey, M.; Yi, H.; Gan, T.; Castronova, A. M.; Miles, B.; Li, Z.; Morsy, M. M.; Crawley, S.; Ramirez, M.; Sadler, J.; Xue, Z.; Bandaragoda, C.
2016-12-01
How do you share and publish hydrologic data and models for a large collaborative project? HydroShare is a new, web-based system for sharing hydrologic data and models with specific functionality aimed at making collaboration easier. HydroShare has been developed with U.S. National Science Foundation support under the auspices of the Consortium of Universities for the Advancement of Hydrologic Science, Inc. (CUAHSI) to support the collaboration and community cyberinfrastructure needs of the hydrology research community. Within HydroShare, we have developed new functionality for creating datasets, describing them with metadata, and sharing them with collaborators. We cast hydrologic datasets and models as "social objects" that can be shared, collaborated around, annotated, published and discovered. In addition to data and model sharing, HydroShare supports web application programs (apps) that can act on data stored in HydroShare, just as software programs on your PC act on your data locally. This can free you from some of the limitations of local computing capacity and challenges in installing and maintaining software on your own PC. HydroShare's web-based cyberinfrastructure can take work off your desk or laptop computer and onto infrastructure or "cloud" based data and processing servers. This presentation will describe HydroShare's collaboration functionality that enables both public and private sharing with individual users and collaborative user groups, and makes it easier for collaborators to iterate on shared datasets and models, creating multiple versions along the way, and publishing them with a permanent landing page, metadata description, and citable Digital Object Identifier (DOI) when the work is complete. This presentation will also describe the web app architecture that supports interoperability with third party servers functioning as application engines for analysis and processing of big hydrologic datasets. 
While developed to support the cyberinfrastructure needs of the hydrology community, the informatics infrastructure for programmatic interoperability of web resources has a generality beyond the solution of hydrology problems that will be discussed.
Masked Associative/Semantic Priming Effects across Languages with Highly Proficient Bilinguals
ERIC Educational Resources Information Center
Perea, Manuel; Dunabeitia, Jon Andoni; Carreiras, Manuel
2008-01-01
One key issue for models of bilingual memory is to what degree the semantic representation from one of the languages is shared with the other language. In the present paper, we examine whether there is an early, automatic semantic priming effect across languages for noncognates with highly proficient (Basque/Spanish) bilinguals. Experiment 1 was a…
System and method for memory allocation in a multiclass memory system
Loh, Gabriel; Meswani, Mitesh; Ignatowski, Michael; Nutter, Mark
2016-06-28
A system for memory allocation in a multiclass memory system includes a processor coupleable to a plurality of memories sharing a unified memory address space, and a library store to store a library of software functions. The processor identifies a type of a data structure in response to a memory allocation function call to the library for allocating memory to the data structure. Using the library, the processor allocates portions of the data structure among multiple memories of the multiclass memory system based on the type of the data structure.
Learnable Models for Information Diffusion and its Associated User Behavior in Micro-blogosphere
2012-08-30
According to the work of Even-Dar and Shapira (2007), we recall the definition of the basic voter model on network G. In the model, each node of G...reason as follows. We started with the K distinct initial nodes and all the other nodes were neutral in the beginning. Recall that we set the average time...memory, running under Linux. Learning to predict opinion share and detect anti-majority opinionists in social networks 29 7 Conclusion Unlike the popular
Ordering of guarded and unguarded stores for no-sync I/O
Gara, Alan; Ohmacht, Martin
2013-06-25
A parallel computing system processes at least one store instruction. A first processor core issues a store instruction. A first queue, associated with the first processor core, stores the store instruction. A second queue, associated with a first local cache memory device of the first processor core, stores the store instruction. The first processor core updates first data in the first local cache memory device according to the store instruction. A third queue, associated with at least one shared cache memory device, stores the store instruction. The first processor core invalidates second data, associated with the store instruction, in the at least one shared cache memory. The first processor core invalidates third data, associated with the store instruction, in other local cache memory devices of other processor cores. The first processor core flushes only the first queue.
LU Factorization with Partial Pivoting for a Multi-CPU, Multi-GPU Shared Memory System
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kurzak, Jakub; Luszczek, Piotr; Faverge, Mathieu
2012-03-01
LU factorization with partial pivoting is a canonical numerical procedure and the main component of the High Performance LINPACK benchmark. This article presents an implementation of the algorithm for a hybrid shared-memory system with standard CPU cores and GPU accelerators. Performance in excess of one TeraFLOPS is achieved using four AMD Magny Cours CPUs and four NVIDIA Fermi GPUs.
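For readers unfamiliar with the procedure itself, a minimal sequential sketch of LU factorization with partial pivoting follows. It is a textbook, single-threaded version in pure Python, written only to show the algorithm; it is not the hybrid CPU/GPU implementation the article describes, and the function name is hypothetical.

```python
def lu_partial_pivot(a):
    """Factor a square matrix so that P @ A = L @ U.

    Returns (perm, lu): perm[i] is the original row now in position i, and
    lu holds U on and above the diagonal and the multipliers of L (unit
    diagonal implied) below it.
    """
    n = len(a)
    lu = [list(map(float, row)) for row in a]
    perm = list(range(n))
    for k in range(n):
        # Partial pivoting: pick the largest-magnitude entry in column k.
        pivot = max(range(k, n), key=lambda i: abs(lu[i][k]))
        if lu[pivot][k] == 0.0:
            raise ValueError("matrix is singular")
        if pivot != k:
            lu[k], lu[pivot] = lu[pivot], lu[k]
            perm[k], perm[pivot] = perm[pivot], perm[k]
        # Eliminate column k below the diagonal, storing the multipliers.
        for i in range(k + 1, n):
            lu[i][k] /= lu[k][k]
            for j in range(k + 1, n):
                lu[i][j] -= lu[i][k] * lu[k][j]
    return perm, lu


perm, lu = lu_partial_pivot([[1, 2], [3, 4]])
print(perm, lu)
```

In the small example, the second row is swapped to the top because 3 has the larger magnitude in the first column, which is the row exchange that partial pivoting performs to limit growth of rounding errors.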
ERIC Educational Resources Information Center
Burgess, Gregory C.; Gray, Jeremy R.; Conway, Andrew R. A.; Braver, Todd S.
2011-01-01
Fluid intelligence (gF) and working memory (WM) span predict success in demanding cognitive situations. Recent studies show that much of the variance in gF and WM span is shared, suggesting common neural mechanisms. This study provides a direct investigation of the degree to which shared variance in gF and WM span can be explained by neural…
Multi-core processing and scheduling performance in CMS
NASA Astrophysics Data System (ADS)
Hernández, J. M.; Evans, D.; Foulkes, S.
2012-12-01
Commodity hardware is going many-core. We might soon not be able to satisfy the job memory needs per core in the current single-core processing model in High Energy Physics. In addition, an ever increasing number of independent and incoherent jobs running on the same physical hardware without sharing resources might significantly affect processing performance. It will be essential to effectively utilize the multi-core architecture. CMS has incorporated support for multi-core processing in the event processing framework and the workload management system. Multi-core processing jobs share common data in memory, such as the code libraries, detector geometry and conditions data, resulting in a much lower memory usage than standard single-core independent jobs. Exploiting this new processing model requires a new model in computing resource allocation, departing from the standard single-core allocation for a job. The experiment job management system needs to have control over a larger quantum of resource since multi-core aware jobs require the scheduling of multiple cores simultaneously. CMS is exploring the approach of using whole nodes as the unit in the workload management system, where all cores of a node are allocated to a multi-core job. Whole-node scheduling allows for optimization of the data/workflow management (e.g. I/O caching, local merging) but efficient utilization of all scheduled cores is challenging. Dedicated whole-node queues have been set up at all Tier-1 centers for exploring multi-core processing workflows in CMS. We present the evaluation of the performance of scheduling and executing multi-core workflows in whole-node queues compared to the standard single-core processing workflows.
Resonator Memories And Optical Novelty Filters
NASA Astrophysics Data System (ADS)
Anderson, Dana Z.; Erie, Marie C.
1987-05-01
Optical resonators having holographic elements are potential candidates for storing information that can be accessed through content-addressable or associative recall. Closely related to the resonator memory is the optical novelty filter, which can detect the differences between a test object and a set of reference objects. We discuss implementations of these devices using continuous optical media such as photorefractive materials. The discussion is framed in the context of neural network models. There are both formal and qualitative similarities between the resonator memory and optical novelty filter and network models. Mode competition arises in the theory of the resonator memory, much as it does in some network models. We show that the role of the phenomena of "daydreaming" in the real-time programmable optical resonator is very much akin to the role of "unlearning" in neural network memories. The theory of programming the real-time memory for a single mode is given in detail. This leads to a discussion of the optical novelty filter. Experimental results for the resonator memory, the real-time programmable memory, and the optical tracking novelty filter are reviewed. We also point to several issues that need to be addressed in order to implement more formal models of neural networks.
Sawja: Static Analysis Workshop for Java
NASA Astrophysics Data System (ADS)
Hubert, Laurent; Barré, Nicolas; Besson, Frédéric; Demange, Delphine; Jensen, Thomas; Monfort, Vincent; Pichardie, David; Turpin, Tiphaine
Static analysis is a powerful technique for automatic verification of programs but raises major engineering challenges when developing a full-fledged analyzer for a realistic language such as Java. Efficiency and precision of such a tool rely partly on low level components which only depend on the syntactic structure of the language and therefore should not be redesigned for each implementation of a new static analysis. This paper describes the Sawja library: a static analysis workshop fully compliant with Java 6 which provides OCaml modules for efficiently manipulating Java bytecode programs. We present the main features of the library, including i) efficient functional data-structures for representing a program with implicit sharing and lazy parsing, ii) an intermediate stack-less representation, and iii) fast computation and manipulation of complete programs. We provide experimental evaluations of the different features with respect to time, memory and precision.
Projected phase-change memory devices.
Koelmans, Wabe W; Sebastian, Abu; Jonnalagadda, Vara Prasad; Krebs, Daniel; Dellmann, Laurent; Eleftheriou, Evangelos
2015-09-03
Nanoscale memory devices, whose resistance depends on the history of the electric signals applied, could become critical building blocks in new computing paradigms, such as brain-inspired computing and memcomputing. However, there are key challenges to overcome, such as the high programming power required, noise and resistance drift. Here, to address these, we present the concept of a projected memory device, whose distinguishing feature is that the physical mechanism of resistance storage is decoupled from the information-retrieval process. We designed and fabricated projected memory devices based on the phase-change storage mechanism and convincingly demonstrate the concept through detailed experimentation, supported by extensive modelling and finite-element simulations. The projected memory devices exhibit remarkably low drift and excellent noise performance. We also demonstrate active control and customization of the programming characteristics of the device that reliably realize a multitude of resistance states.
NASA Technical Reports Server (NTRS)
Plesea, Lucian
2006-01-01
A computer program automatically builds large, full-resolution mosaics of multispectral images of Earth landmasses from images acquired by Landsat 7, complete with matching of colors and blending between adjacent scenes. While the code has been used extensively for Landsat, it could also be used for other data sources. A single mosaic of as many as 8,000 scenes, represented by more than 5 terabytes of data and the largest set produced in this work, demonstrated what the code could do to provide global coverage. The program first statistically analyzes input images to determine areas of coverage and data-value distributions. It then transforms the input images from their original universal transverse Mercator coordinates to other geographical coordinates, with scaling. It applies a first-order polynomial brightness correction to each band in each scene. It uses a data-mask image for selecting data and blending of input scenes. Under control by a user, the program can be made to operate on small parts of the output image space, with check-point and restart capabilities. The program runs on SGI IRIX computers. It is capable of parallel processing using shared-memory code, large memories, and tens of central processing units. It can retrieve input data and store output data at locations remote from the processors on which it is executed.
ERIC Educational Resources Information Center
Beschorner, Beth
2013-01-01
This study examined the impact of a parent education program on the frequency of shared storybook reading and dialogic reading techniques. Additionally, the contextual factors that influenced the outcomes of the program were explored. Seventeen parents completed a nine-week face-to-face parent education program and fifteen parents completed a…
The effects of voice and manual control mode on dual task performance
NASA Technical Reports Server (NTRS)
Wickens, C. D.; Zenyuh, J.; Culp, V.; Marshak, W.
1986-01-01
Two fundamental principles of human performance, compatibility and resource competition, are combined with two structural dichotomies in the human information processing system, manual versus voice output, and left versus right cerebral hemisphere, in order to predict the optimum combination of voice and manual control with either hand, for time-sharing performance of a discrete and continuous task. Eight right-handed male subjects performed a discrete first-order tracking task, time-shared with an auditorily presented Sternberg Memory Search Task. Each task could be controlled by voice, or by the left or right hand, in all possible combinations except for a dual voice mode. When performance was analyzed in terms of a dual-task decrement from single task control conditions, the following variables influenced time-sharing efficiency in diminishing order of magnitude: (1) the modality of control (discrete manual control of tracking was superior to discrete voice control of tracking, and the converse was true with the memory search task), (2) response competition (performance was degraded when both tasks were responded to manually), (3) hemispheric competition (performance degraded whenever two tasks were controlled by the left hemisphere, i.e., voice or right-handed control). The results confirm the value of predictive models in voice control implementation.
Checkpointing Shared Memory Programs at the Application-level
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bronevetsky, G; Schulz, M; Szwed, P
2004-09-08
Trends in high-performance computing are making it necessary for long-running applications to tolerate hardware faults. The most commonly used approach is checkpoint and restart (CPR): the state of the computation is saved periodically on disk, and when a failure occurs, the computation is restarted from the last saved state. At present, it is the responsibility of the programmer to instrument applications for CPR. Our group is investigating the use of compiler technology to instrument codes to make them self-checkpointing and self-restarting, thereby providing an automatic solution to the problem of making long-running scientific applications resilient to hardware faults. Our previous work focused on message-passing programs. In this paper, we describe such a system for shared-memory programs running on symmetric multiprocessors. The system has two components: (i) a pre-compiler for source-to-source modification of applications, and (ii) a runtime system that implements a protocol for coordinating CPR among the threads of the parallel application. For the sake of concreteness, we focus on a non-trivial subset of OpenMP that includes barriers and locks. One of the advantages of this approach is that the ability to tolerate faults becomes embedded within the application itself, so applications become self-checkpointing and self-restarting on any platform. We demonstrate this by showing that our transformed benchmarks can checkpoint and restart on three different platforms (Windows/x86, Linux/x86, and Tru64/Alpha). Our experiments show that the overhead introduced by this approach is usually quite small; they also suggest ways in which the current implementation can be tuned to reduce overheads further.
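The checkpoint-and-restart idea described in this abstract can be illustrated with a minimal single-threaded sketch. It is not the authors' pre-compiler or runtime protocol, and it omits the thread coordination that the paper addresses; the function names, the pickle-based file format, and the `fail_at` fault injection are all assumptions made for illustration.

```python
import os
import pickle
import tempfile


def save_checkpoint(path, state):
    """Write state atomically: dump to a temp file, then rename over the old one."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "wb") as handle:
        pickle.dump(state, handle)
    os.replace(tmp_path, path)


def load_checkpoint(path):
    """Return the last saved state, or None if no checkpoint exists."""
    if not os.path.exists(path):
        return None
    with open(path, "rb") as handle:
        return pickle.load(handle)


def run_sum(path, n_steps, interval, fail_at=None):
    """Sum 0..n_steps-1, saving a checkpoint every `interval` steps.

    `fail_at` (hypothetical) simulates a hardware fault by raising at that step.
    """
    # Resume from the last saved state if one exists, else start fresh.
    state = load_checkpoint(path) or {"step": 0, "total": 0}
    while state["step"] < n_steps:
        if fail_at is not None and state["step"] == fail_at:
            raise RuntimeError("simulated fault")
        state["total"] += state["step"]
        state["step"] += 1
        if state["step"] % interval == 0:
            save_checkpoint(path, state)
    return state["total"]
```

Calling `run_sum` again with the same path after a simulated fault resumes from the most recent checkpoint rather than from step 0, so only the work done since that checkpoint is repeated.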
Audience-tuning effects on memory: the role of shared reality.
Echterhoff, Gerald; Higgins, E Tory; Groll, Stephan
2005-09-01
After tuning to an audience, communicators' own memories for the topic often reflect the biased view expressed in their messages. Three studies examined explanations for this bias. Memories for a target person were biased when feedback signaled the audience's successful identification of the target but not after failed identification (Experiment 1). Whereas communicators tuning to an in-group audience exhibited the bias, communicators tuning to an out-group audience did not (Experiment 2). These differences did not depend on communicators' mood but were mediated by communicators' trust in their audience's judgment about other people (Experiments 2 and 3). Message and memory were more closely associated for high than for low trusters. Apparently, audience-tuning effects depend on the communicators' experience of a shared reality.
Rapid solution of large-scale systems of equations
NASA Technical Reports Server (NTRS)
Storaasli, Olaf O.
1994-01-01
The analysis and design of complex aerospace structures requires the rapid solution of large systems of linear and nonlinear equations, eigenvalue extraction for buckling, vibration and flutter modes, structural optimization and design sensitivity calculation. Computers with multiple processors and vector capabilities can offer substantial computational advantages over traditional scalar computers for these analyses. These computers fall into two categories: shared memory computers and distributed memory computers. This presentation covers general-purpose, highly efficient algorithms for generation/assembly of element matrices, solution of systems of linear and nonlinear equations, eigenvalue and design sensitivity analysis and optimization. All algorithms are coded in FORTRAN for shared memory computers and many are adapted to distributed memory computers. The capability and numerical performance of these algorithms will be addressed.
NASA Astrophysics Data System (ADS)
Dave, Gaurav P.; Sureshkumar, N.; Blessy Trencia Lincy, S. S.
2017-11-01
The current trend in processor manufacturing focuses on multi-core architectures rather than on increasing clock speed for performance improvement. Graphics processors have become commodity hardware for providing fast co-processing in computer systems. Developments in IoT, social networking web applications, and big data have created a huge demand for data processing, and such throughput-intensive applications inherently contain data-level parallelism, which is well suited to SIMD-based GPUs. This paper reviews the architectural aspects of multi/many-core processors and graphics processors. Different case studies are taken to compare the performance of throughput computing applications using shared memory programming in OpenMP and CUDA API based programming.
The HydroShare Collaborative Repository for the Hydrology Community
NASA Astrophysics Data System (ADS)
Tarboton, D. G.; Idaszak, R.; Horsburgh, J. S.; Ames, D. P.; Goodall, J. L.; Couch, A.; Hooper, R. P.; Dash, P. K.; Stealey, M.; Yi, H.; Bandaragoda, C.; Castronova, A. M.
2017-12-01
HydroShare is an online collaboration system for sharing of hydrologic data, analytical tools, and models. It supports the sharing of, and collaboration around, "resources" which are defined by standardized content types for data formats and models commonly used in hydrology. With HydroShare you can: Share your data and models with colleagues; Manage who has access to the content that you share; Share, access, visualize and manipulate a broad set of hydrologic data types and models; Use the web services application programming interface (API) to program automated and client access; Publish data and models and obtain a citable digital object identifier (DOI); Aggregate your resources into collections; Discover and access data and models published by others; Use web apps to visualize, analyze and run models on data in HydroShare. This presentation will describe the functionality and architecture of HydroShare highlighting our approach to making this system easy to use and serving the needs of the hydrology community represented by the Consortium of Universities for the Advancement of Hydrologic Sciences, Inc. (CUAHSI). Metadata for uploaded files is harvested automatically or captured using easy-to-use web user interfaces. Users are encouraged to add or create resources in HydroShare early in the data life cycle. To encourage this we allow users to share and collaborate on HydroShare resources privately among individual users or groups, entering metadata while doing the work. HydroShare also provides enhanced functionality for users through web apps that provide tools and computational capability for actions on resources. HydroShare's architecture broadly comprises: (1) resource storage, (2) resource exploration website, and (3) web apps for actions on resources. System components are loosely coupled and interact through APIs, which enhances robustness, as components can be upgraded and advanced relatively independently.
The full power of this paradigm is the extensibility it supports. Web apps are hosted on separate servers, which may be 3rd party servers. They are registered in HydroShare using a web app resource that configures the connectivity for them to be discovered and launched directly from resource types they are associated with.
Solutions and debugging for data consistency in multiprocessors with noncoherent caches
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bernstein, D.; Mendelson, B.; Breternitz, M. Jr.
1995-02-01
We analyze two important problems that arise in shared-memory multiprocessor systems. The stale data problem involves ensuring that data items in local memory of individual processors are current, independent of writes done by other processors. False sharing occurs when two processors have copies of the same shared data block but update different portions of the block. The false sharing problem involves guaranteeing that subsequent writes are properly combined. In modern architectures these problems are usually solved in hardware, by exploiting mechanisms for hardware controlled cache consistency. This leads to more expensive and nonscalable designs. Therefore, we are concentrating on software methods for ensuring cache consistency that would allow for affordable and scalable multiprocessing systems. Unfortunately, providing software control is nontrivial, both for the compiler writer and for the application programmer. For this reason we are developing a debugging environment that will facilitate the development of compiler-based techniques and will help the programmer to tune his or her application using explicit cache management mechanisms. We extend the notion of a race condition for IBM Shared Memory System POWER/4, taking into consideration its noncoherent caches, and propose techniques for detection of false sharing problems. Identification of the stale data problem is discussed as well, and solutions are suggested.
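The false sharing problem described above, where two processors update different portions of the same block and the writes must be combined, can be made concrete with a small simulation. This is a sketch under stated assumptions, not the technique proposed in the paper: the `Cache` class, the four-word block, and the per-word dirty flags are hypothetical devices chosen to show one way a lost update arises and one way writes could be merged.

```python
BLOCK_WORDS = 4  # assumed block size, in words


class Cache:
    """A private, noncoherent cache holding a copy of one memory block."""

    def __init__(self, memory_block):
        self.data = list(memory_block)            # local copy, never refreshed
        self.dirty = [False] * len(memory_block)  # which words were written here

    def write(self, index, value):
        self.data[index] = value
        self.dirty[index] = True


def write_back_whole_block(memory_block, cache):
    """Naive write-back: the whole cached block overwrites memory."""
    memory_block[:] = cache.data


def write_back_dirty_words(memory_block, cache):
    """Merging write-back: only words this cache modified reach memory."""
    for i, is_dirty in enumerate(cache.dirty):
        if is_dirty:
            memory_block[i] = cache.data[i]


def demo(write_back):
    """Two processors update different words of the same block."""
    memory = [0] * BLOCK_WORDS
    cache_a, cache_b = Cache(memory), Cache(memory)
    cache_a.write(0, 11)  # processor A updates word 0
    cache_b.write(3, 33)  # processor B updates word 3
    write_back(memory, cache_a)
    write_back(memory, cache_b)
    return memory
```

With `write_back_whole_block`, processor B's stale copy of word 0 overwrites A's update; with `write_back_dirty_words`, both updates survive in memory.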
HTMT-class Latency Tolerant Parallel Architecture for Petaflops Scale Computation
NASA Technical Reports Server (NTRS)
Sterling, Thomas; Bergman, Larry
2000-01-01
Computational Aero Sciences and other numeric intensive computation disciplines demand computing throughputs substantially greater than the Teraflops scale systems only now becoming available. The related fields of fluids, structures, thermal, combustion, and dynamic controls are among the interdisciplinary areas that in combination with sufficient resolution and advanced adaptive techniques may force performance requirements towards Petaflops. This will be especially true for compute intensive models such as Navier-Stokes or when such system models are only part of a larger design optimization computation involving many design points. Yet recent experience with conventional MPP configurations comprising commodity processing and memory components has shown that larger scale frequently results in higher programming difficulty and lower system efficiency. While important advances in system software and algorithmic techniques have had some impact on efficiency and programmability for certain classes of problems, in general it is unlikely that software alone will resolve the challenges to higher scalability. As in the past, future generations of high-end computers may require a combination of hardware architecture and system software advances to enable efficient operation at a Petaflops level. The NASA-led HTMT project has engaged the talents of a broad interdisciplinary team to develop a new strategy in high-end system architecture to deliver petaflops scale computing in the 2004/5 timeframe. The Hybrid-Technology, MultiThreaded parallel computer architecture incorporates several advanced technologies in combination with an innovative dynamic adaptive scheduling mechanism to provide unprecedented performance and efficiency within practical constraints of cost, complexity, and power consumption.
The emerging superconductor Rapid Single Flux Quantum electronics can operate at 100 GHz (the record is 770 GHz) and one percent of the power required by conventional semiconductor logic. Wave Division Multiplexing optical communications can approach a peak per fiber bandwidth of 1 Tbps and the new Data Vortex network topology employing this technology can connect tens of thousands of ports providing a bi-section bandwidth on the order of a Petabyte per second with latencies well below 100 nanoseconds, even under heavy loads. Processor-in-Memory (PIM) technology combines logic and memory on the same chip exposing the internal bandwidth of the memory row buffers at low latency. And holographic photorefractive storage technologies provide high-density memory with access a thousand times faster than conventional disk technologies. Together these technologies enable a new class of shared memory system architecture with a peak performance in the range of a Petaflops but size and power requirements comparable to today's largest Teraflops scale systems. To achieve high-sustained performance, HTMT combines an advanced multithreading processor architecture with a memory-driven coarse-grained latency management strategy called "percolation", yielding high efficiency while reducing much of the parallel programming burden. This paper will present the basic system architecture characteristics made possible through this series of advanced technologies and then give a detailed description of the new percolation approach to runtime latency management.
Static Memory Deduplication for Performance Optimization in Cloud Computing.
Jia, Gangyong; Han, Guangjie; Wang, Hao; Yang, Xuan
2017-04-27
In a cloud computing environment, the number of virtual machines (VMs) on a single physical server and the number of applications running on each VM are continuously growing. This has led to an enormous increase in the demand of memory capacity and subsequent increase in the energy consumption in the cloud. Lack of enough memory has become a major bottleneck for scalability and performance of virtualization interfaces in cloud computing. To address this problem, memory deduplication techniques which reduce memory demand through page sharing are being adopted. However, such techniques suffer from overheads in terms of number of online comparisons required for the memory deduplication. In this paper, we propose a static memory deduplication (SMD) technique which can reduce memory capacity requirement and provide performance optimization in cloud computing. The main innovation of SMD is that the process of page detection is performed offline, thus potentially reducing the performance cost, especially in terms of response time. In SMD, page comparisons are restricted to the code segment, which has the highest shared content. Our experimental results show that SMD efficiently reduces memory capacity requirement and improves performance. We demonstrate that, compared to other approaches, the cost in terms of the response time is negligible.
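Memory deduplication through page sharing, as described above, can be sketched in a few lines. This is an illustrative toy, not the SMD implementation: the byte strings stand in for code-segment images scanned offline, the 4096-byte page size is an assumption, and identifying identical pages by SHA-256 digest is a simplification (a production system would typically confirm a match with a byte-by-byte comparison).

```python
import hashlib

PAGE_SIZE = 4096  # assumed page size in bytes


def split_pages(image, page_size=PAGE_SIZE):
    """Cut a byte string into consecutive pages."""
    return [image[i:i + page_size] for i in range(0, len(image), page_size)]


def deduplicate(vm_images, page_size=PAGE_SIZE):
    """Store each distinct page once and give every VM a table of references.

    Returns (store, page_tables): `store` maps digest -> page bytes and
    `page_tables` maps VM name -> list of digests, one per page.
    """
    store = {}
    page_tables = {}
    for name, image in vm_images.items():
        table = []
        for page in split_pages(image, page_size):
            digest = hashlib.sha256(page).hexdigest()
            store.setdefault(digest, page)  # keep only the first copy
            table.append(digest)
        page_tables[name] = table
    return store, page_tables


def pages_saved(store, page_tables):
    """Number of physical pages avoided by sharing."""
    total = sum(len(table) for table in page_tables.values())
    return total - len(store)
```

For two images that share one identical page, `pages_saved` reports one page reclaimed, and both page tables point at the same stored copy.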
Static Memory Deduplication for Performance Optimization in Cloud Computing
Jia, Gangyong; Han, Guangjie; Wang, Hao; Yang, Xuan
2017-01-01
In a cloud computing environment, the number of virtual machines (VMs) on a single physical server and the number of applications running on each VM are continuously growing. This has led to an enormous increase in the demand of memory capacity and subsequent increase in the energy consumption in the cloud. Lack of enough memory has become a major bottleneck for scalability and performance of virtualization interfaces in cloud computing. To address this problem, memory deduplication techniques which reduce memory demand through page sharing are being adopted. However, such techniques suffer from overheads in terms of number of online comparisons required for the memory deduplication. In this paper, we propose a static memory deduplication (SMD) technique which can reduce memory capacity requirement and provide performance optimization in cloud computing. The main innovation of SMD is that the process of page detection is performed offline, thus potentially reducing the performance cost, especially in terms of response time. In SMD, page comparisons are restricted to the code segment, which has the highest shared content. Our experimental results show that SMD efficiently reduces memory capacity requirement and improves performance. We demonstrate that, compared to other approaches, the cost in terms of the response time is negligible. PMID:28448434
Schad, Daniel J.; Jünger, Elisabeth; Sebold, Miriam; Garbusow, Maria; Bernhardt, Nadine; Javadi, Amir-Homayoun; Zimmermann, Ulrich S.; Smolka, Michael N.; Heinz, Andreas; Rapp, Michael A.; Huys, Quentin J. M.
2014-01-01
Theories of decision-making and its neural substrates have long assumed the existence of two distinct and competing valuation systems, variously described as goal-directed vs. habitual, or, more recently and based on statistical arguments, as model-free vs. model-based reinforcement-learning. Though both have been shown to control choices, the cognitive abilities associated with these systems are under ongoing investigation. Here we examine the link to cognitive abilities, and find that individual differences in processing speed covary with a shift from model-free to model-based choice control in the presence of above-average working memory function. This suggests shared cognitive and neural processes; provides a bridge between literatures on intelligence and valuation; and may guide the development of process models of different valuation components. Furthermore, it provides a rationale for individual differences in the tendency to deploy valuation systems, which may be important for understanding the manifold neuropsychiatric diseases associated with malfunctions of valuation. PMID:25566131
Block-Parallel Data Analysis with DIY2
DOE Office of Scientific and Technical Information (OSTI.GOV)
Morozov, Dmitriy; Peterka, Tom
DIY2 is a programming model and runtime for block-parallel analytics on distributed-memory machines. Its main abstraction is block-structured data parallelism: data are decomposed into blocks; blocks are assigned to processing elements (processes or threads); computation is described as iterations over these blocks, and communication between blocks is defined by reusable patterns. By expressing computation in this general form, the DIY2 runtime is free to optimize the movement of blocks between slow and fast memories (disk and flash vs. DRAM) and to concurrently execute blocks residing in memory with multiple threads. This enables the same program to execute in-core, out-of-core, serial, parallel, single-threaded, multithreaded, or combinations thereof. This paper describes the implementation of the main features of the DIY2 programming model and optimizations to improve performance. DIY2 is evaluated on benchmark test cases to establish baseline performance for several common patterns and on larger complete analysis codes running on large-scale HPC machines.
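The block-structured abstraction in this abstract (decompose data into blocks, assign blocks to processing elements, iterate over blocks) can be sketched as follows. This is not the DIY2 API; the function names, the contiguous one-dimensional decomposition, and the round-robin assignment are assumptions chosen to keep the example small, and the loop runs serially where a real runtime would distribute the blocks.

```python
def decompose(n_items, n_blocks):
    """Split range(n_items) into n_blocks contiguous (start, stop) bounds."""
    base, extra = divmod(n_items, n_blocks)
    bounds, start = [], 0
    for block_id in range(n_blocks):
        # The first `extra` blocks take one additional item each.
        stop = start + base + (1 if block_id < extra else 0)
        bounds.append((start, stop))
        start = stop
    return bounds


def assign_round_robin(n_blocks, n_procs):
    """Map each block id to a processing element id."""
    return {block_id: block_id % n_procs for block_id in range(n_blocks)}


def foreach_block(data, bounds, func):
    """Apply func to each block independently and collect per-block results."""
    return [func(data[start:stop]) for start, stop in bounds]
```

Because each call to `func` sees only its own block, the same description of the computation could in principle be executed serially, with threads, or across processes, which is the property the abstract emphasizes.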
Advanced Development of Certified OS Kernels
2015-06-01
It provides an infrastructure to map a physical page into multiple processes’ page maps in different address spaces. Their ownership mechanism ensures...of their shared memory infrastructure. Trap module The trap module specifies the behaviors of exception handlers and mCertiKOS system calls. In...layers), 1 pm for the shared memory infrastructure (3 layers), 3.5 pm for the thread management (10 layers), 1 pm for the process management (4 layers
A cache-aided multiprocessor rollback recovery scheme
NASA Technical Reports Server (NTRS)
Wu, Kun-Lung; Fuchs, W. Kent
1989-01-01
This paper demonstrates how previous uniprocessor cache-aided recovery schemes can be applied to multiprocessor architectures, for recovering from transient processor failures, utilizing private caches and a global shared memory. As with cache-aided uniprocessor recovery, the multiprocessor cache-aided recovery scheme of this paper can be easily integrated into standard bus-based snoopy cache coherence protocols. A consistent shared memory state is maintained without the necessity of global check-pointing.
Oyarzún, Javiera P; Morís, Joaquín; Luque, David; de Diego-Balaguer, Ruth; Fuentemilla, Lluís
2017-08-09
System memory consolidation is conceptualized as an active process whereby newly encoded memory representations are strengthened through selective memory reactivation during sleep. However, our learning experience is highly overlapping in content (i.e., shares common elements), and memories of these events are organized in an intricate network of overlapping associated events. It remains to be explored whether and how selective memory reactivation during sleep has an impact on these overlapping memories acquired during awake time. Here, we test in a group of adult women and men the prediction that selective memory reactivation during sleep entails the reactivation of associated events and that this may lead the brain to adaptively regulate whether these associated memories are strengthened or pruned from memory networks on the basis of their relative associative strength with the shared element. Our findings demonstrate the existence of efficient regulatory neural mechanisms governing how complex memory networks are shaped during sleep as a function of their associative memory strength. SIGNIFICANCE STATEMENT Numerous studies have demonstrated that system memory consolidation is an active, selective, and sleep-dependent process in which only subsets of new memories become stabilized through their reactivation. However, the learning experience is highly overlapping in content and thus events are encoded in an intricate network of related memories. It remains to be explored whether and how memory reactivation has an impact on overlapping memories acquired during awake time. Here, we show that sleep memory reactivation promotes strengthening and weakening of overlapping memories based on their associative memory strength. These results suggest the existence of an efficient regulatory neural mechanism that avoids the formation of cluttered memory representation of multiple events and promotes stabilization of complex memory networks. 
A High Performance VLSI Computer Architecture For Computer Graphics
NASA Astrophysics Data System (ADS)
Chin, Chi-Yuan; Lin, Wen-Tai
1988-10-01
A VLSI computer architecture, consisting of multiple processors, is presented in this paper to satisfy modern computer graphics demands, e.g. high resolution, realistic animation, real-time display, etc. All processors share a global memory which is partitioned into multiple banks. Through a crossbar network, data from one memory bank can be broadcast to many processors. Processors are physically interconnected through a hyper-crossbar network (a crossbar-like network). By programming the network, the topology of communication links among processors can be reconfigured to satisfy specific dataflows of different applications. Each processor consists of a controller, arithmetic operators, local memory, a local crossbar network, and I/O ports to communicate with other processors, memory banks, and a system controller. Operations in each processor are characterized into two modes, i.e. object domain and space domain, to fully utilize the data-independency characteristics of graphics processing. Special graphics features such as 3D-to-2D conversion, shadow generation, texturing, and reflection can be easily handled. With the current high density interconnection (HDI) technology, it is feasible to implement a 64-processor system to achieve 2.5 billion operations per second, a performance needed in most advanced graphics applications.
Smith, Philip L; Lilburn, Simon D; Corbett, Elaine A; Sewell, David K; Kyllingsbæk, Søren
2016-09-01
We investigated the capacity of visual short-term memory (VSTM) in a phase discrimination task that required judgments about the configural relations between pairs of black and white features. Sewell et al. (2014) previously showed that VSTM capacity in an orientation discrimination task was well described by a sample-size model, which views VSTM as a resource comprised of a finite number of noisy stimulus samples. The model predicts the invariance of [Formula: see text], the sum of squared sensitivities across items, for displays of different sizes. For phase discrimination, the set-size effect significantly exceeded that predicted by the sample-size model for both simultaneously and sequentially presented stimuli. Instead, the set-size effect and the serial position curves with sequential presentation were predicted by an attention-weighted version of the sample-size model, which assumes that one of the items in the display captures attention and receives a disproportionate share of resources. The choice probabilities and response time distributions from the task were well described by a diffusion decision model in which the drift rates embodied the assumptions of the attention-weighted sample-size model.
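The invariance that the sample-size model predicts can be written out as a short worked relation. The notation is an assumption made for illustration, since the original formula is not reproduced in this record: $d'_i$ denotes the sensitivity for item $i$, $m$ the display size, and $C$ a constant.

```latex
\[
  \sum_{i=1}^{m} \left(d'_i\right)^{2} = C
  \quad\text{for every display size } m .
\]
% If resources are divided equally, every item has the same sensitivity d'(m):
\[
  m \, d'(m)^{2} = C
  \;\;\Longrightarrow\;\;
  d'(m) = \frac{d'(1)}{\sqrt{m}},
  \qquad\text{e.g. } d'(4) = \tfrac{1}{2}\, d'(1).
\]
```

Under the equal-division assumption, sensitivity therefore falls with the inverse square root of set size; a set-size effect steeper than this is what the abstract describes as exceeding the model's prediction.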
Cheung, Wing-Yee; Wildschut, Tim; Sedikides, Constantine
2018-02-01
We compared and contrasted nostalgia with rumination and counterfactual thinking in terms of their autobiographical memory functions. Specifically, we assessed individual differences in nostalgia, rumination, and counterfactual thinking, which we then linked to self-reported functions or uses of autobiographical memory (Self-Regard, Boredom Reduction, Death Preparation, Intimacy Maintenance, Conversation, Teach/Inform, and Bitterness Revival). We tested which memory functions are shared and which are uniquely linked to nostalgia. The commonality among nostalgia, rumination, and counterfactual thinking resides in their shared positive associations with all memory functions: individuals who evinced a stronger propensity towards past-oriented thought (as manifested in nostalgia, rumination, and counterfactual thinking) reported greater overall recruitment of memories in the service of present functioning. The uniqueness of nostalgia resides in its comparatively strong positive associations with Intimacy Maintenance, Teach/Inform, and Self-Regard and weak association with Bitterness Revival. In all, nostalgia possesses a more positive functional signature than do rumination and counterfactual thinking.
Mnemonic convergence in social networks: The emergent properties of cognition at a collective level.
Coman, Alin; Momennejad, Ida; Drach, Rae D; Geana, Andra
2016-07-19
The development of shared memories, beliefs, and norms is a fundamental characteristic of human communities. These emergent outcomes are thought to occur owing to a dynamic system of information sharing and memory updating, which fundamentally depends on communication. Here we report results on the formation of collective memories in laboratory-created communities. We manipulated conversational network structure in a series of real-time, computer-mediated interactions in fourteen 10-member communities. The results show that mnemonic convergence, measured as the degree of overlap among community members' memories, is influenced by both individual-level information-processing phenomena and by the conversational social network structure created during conversational recall. By studying laboratory-created social networks, we show how large-scale social phenomena (i.e., collective memory) can emerge out of microlevel local dynamics (i.e., mnemonic reinforcement and suppression effects). The social-interactionist approach proposed herein points to optimal strategies for spreading information in social networks and provides a framework for measuring and forging collective memories in communities of individuals.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Tsugane, Keisuke; Boku, Taisuke; Murai, Hitoshi
Recently, the Partitioned Global Address Space (PGAS) parallel programming model has emerged as a usable distributed memory programming model. XcalableMP (XMP) is a PGAS parallel programming language that extends base languages such as C and Fortran with directives in OpenMP-like style. XMP supports a global-view model that allows programmers to define global data and to map them to a set of processors, which execute the distributed global data as a single thread. In XMP, the concept of a coarray is also employed for local-view programming. In this study, we port Gyrokinetic Toroidal Code - Princeton (GTC-P), which is a three-dimensional gyrokinetic PIC code developed at Princeton University to study the microturbulence phenomenon in magnetically confined fusion plasmas, to XMP as an example of hybrid memory model coding with the global-view and local-view programming models. In local-view programming, the coarray notation is simple and intuitive compared with Message Passing Interface (MPI) programming while the performance is comparable to that of the MPI version. Thus, because the global-view programming model is suitable for expressing the data parallelism for a field of grid space data, we implement a hybrid-view version using a global-view programming model to compute the field and a local-view programming model to compute the movement of particles. Finally, the performance is degraded by 20% compared with the original MPI version, but the hybrid-view version facilitates more natural data expression for static grid space data (in the global-view model) and dynamic particle data (in the local-view model), and it also increases the readability of the code for higher productivity.
Tsugane, Keisuke; Boku, Taisuke; Murai, Hitoshi; ...
2016-06-01
Recently, the Partitioned Global Address Space (PGAS) parallel programming model has emerged as a usable distributed memory programming model. XcalableMP (XMP) is a PGAS parallel programming language that extends base languages such as C and Fortran with directives in OpenMP-like style. XMP supports a global-view model that allows programmers to define global data and to map them to a set of processors, which execute the distributed global data as a single thread. In XMP, the concept of a coarray is also employed for local-view programming. In this study, we port Gyrokinetic Toroidal Code - Princeton (GTC-P), which is a three-dimensional gyrokinetic PIC code developed at Princeton University to study the microturbulence phenomenon in magnetically confined fusion plasmas, to XMP as an example of hybrid memory model coding with the global-view and local-view programming models. In local-view programming, the coarray notation is simple and intuitive compared with Message Passing Interface (MPI) programming while the performance is comparable to that of the MPI version. Thus, because the global-view programming model is suitable for expressing the data parallelism for a field of grid space data, we implement a hybrid-view version using a global-view programming model to compute the field and a local-view programming model to compute the movement of particles. Finally, the performance is degraded by 20% compared with the original MPI version, but the hybrid-view version facilitates more natural data expression for static grid space data (in the global-view model) and dynamic particle data (in the local-view model), and it also increases the readability of the code for higher productivity.
Moreau, Noémie; Viallet, François; Champagne-Lavau, Maud
2013-09-01
Theory of mind (TOM) refers to the ability to infer one's own and others' mental states. Growing evidence has highlighted impairment on the most complex TOM tasks in Alzheimer disease (AD). However, how the TOM deficit is related to other cognitive dysfunctions, and more specifically to episodic memory impairment (the prominent feature of this disease), is still under debate. Recent neuroanatomical findings have shown that remembering past events and inferring others' states of mind share the same cerebral network, suggesting that the two abilities share a common process. This paper reviews emergent evidence of TOM impairment in AD patients and discusses the evidence of a relationship between TOM and episodic memory. We discuss whether AD patients' deficit in TOM may be related to their difficulties in recollecting memories of past social interactions. Copyright © 2013 Elsevier B.V. All rights reserved.
Mental time travel and the shaping of the human mind
Suddendorf, Thomas; Addis, Donna Rose; Corballis, Michael C.
2009-01-01
Episodic memory, enabling conscious recollection of past episodes, can be distinguished from semantic memory, which stores enduring facts about the world. Episodic memory shares a core neural network with the simulation of future episodes, enabling mental time travel into both the past and the future. The notion that there might be something distinctly human about mental time travel has provoked ingenious attempts to demonstrate episodic memory or future simulation in non-human animals, but we argue that they have not yet established a capacity comparable to the human faculty. The evolution of the capacity to simulate possible future events, based on episodic memory, enhanced fitness by enabling action in preparation of different possible scenarios that increased present or future survival and reproduction chances. Human language may have evolved in the first instance for the sharing of past and planned future events, and, indeed, fictional ones, further enhancing fitness in social settings. PMID:19528013
Neural Differentiation of Incorrectly Predicted Memories.
Kim, Ghootae; Norman, Kenneth A; Turk-Browne, Nicholas B
2017-02-22
When an item is predicted in a particular context but the prediction is violated, memory for that item is weakened (Kim et al., 2014). Here, we explore what happens when such previously mispredicted items are later reencountered. According to prior neural network simulations, this sequence of events (misprediction and subsequent restudy) should lead to differentiation of the item's neural representation from the previous context (on which the misprediction was based). Specifically, misprediction weakens connections in the representation to features shared with the previous context and restudy allows new features to be incorporated into the representation that are not shared with the previous context. This cycle of misprediction and restudy should have the net effect of moving the item's neural representation away from the neural representation of the previous context. We tested this hypothesis using human fMRI by tracking changes in item-specific BOLD activity patterns in the hippocampus, a key structure for representing memories and generating predictions. In left CA2/3/DG, we found greater neural differentiation for items that were repeatedly mispredicted and restudied compared with items from a control condition that was identical except without misprediction. We also measured prediction strength in a trial-by-trial fashion and found that greater misprediction for an item led to more differentiation, further supporting our hypothesis. Therefore, the consequences of prediction error go beyond memory weakening. If the mispredicted item is restudied, the brain adaptively differentiates its memory representation to improve the accuracy of subsequent predictions and to shield it from further weakening. SIGNIFICANCE STATEMENT Competition between overlapping memories leads to weakening of nontarget memories over time, making it easier to access target memories. However, a nontarget memory in one context might become a target memory in another context. How do such memories get restrengthened without increasing competition again? Computational models suggest that the brain handles this by reducing neural connections to the previous context and adding connections to new features that were not part of the previous context. The result is neural differentiation away from the previous context. Here, we provide support for this theory, using fMRI to track neural representations of individual memories in the hippocampus and how they change based on learning. Copyright © 2017 the authors.
An Asynchronous Many-Task Implementation of In-Situ Statistical Analysis using Legion.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Pebay, Philippe Pierre; Bennett, Janine Camille
2015-11-01
In this report, we propose a framework for the design and implementation of in-situ analyses using an asynchronous many-task (AMT) model, using the Legion programming model together with the MiniAero mini-application as a surrogate for full-scale parallel scientific computing applications. The bulk of this work consists of converting the Learn/Derive/Assess model, which we had initially developed for parallel statistical analysis using MPI [PTBM11], from an SPMD to an AMT model. To this end, we propose an original use of the concept of Legion logical regions as a replacement for the parallel communication schemes used for the only operation of the statistics engines that requires explicit communication. We then evaluate this proposed scheme in a shared memory environment, using the Legion port of MiniAero as a proxy for a full-scale scientific application, as a means to provide input data sets of variable size for the in-situ statistical analyses in an AMT context. We demonstrate in particular that the approach has merit, and warrants further investigation, in collaboration with ongoing efforts to improve the overall parallel performance of the Legion system.
GoFFish: A Sub-Graph Centric Framework for Large-Scale Graph Analytics
DOE Office of Scientific and Technical Information (OSTI.GOV)
Simmhan, Yogesh; Kumbhare, Alok; Wickramaarachchi, Charith
2014-08-25
Large scale graph processing is a major research area for Big Data exploration. Vertex centric programming models like Pregel are gaining traction due to their simple abstraction that allows for scalable execution on distributed systems naturally. However, there are limitations to this approach which cause vertex centric algorithms to under-perform due to a poor compute to communication overhead ratio and slow convergence of iterative supersteps. In this paper we introduce GoFFish, a scalable sub-graph centric framework co-designed with a distributed persistent graph storage for large scale graph analytics on commodity clusters. We introduce a sub-graph centric programming abstraction that combines the scalability of a vertex centric approach with the flexibility of shared memory sub-graph computation. We map Connected Components, SSSP and PageRank algorithms to this model to illustrate its flexibility. Further, we empirically analyze GoFFish using several real world graphs and demonstrate its significant performance improvement, orders of magnitude in some cases, compared to Apache Giraph, the leading open source vertex centric implementation.
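The sub-graph centric idea in the abstract above can be illustrated with a minimal Python sketch of Connected Components. This is not GoFFish code or its API: the function names, the use of union-find inside a partition, and the "smallest label wins" message rule are all assumptions made for illustration. Each partition first resolves its own sub-graphs with an ordinary shared-memory algorithm, and only then do labels travel along the edges that cross partitions, one superstep at a time.

```python
from collections import defaultdict

def local_subgraphs(vertices, edges):
    """Union-find over edges local to one partition; returns vertex -> sub-graph id.

    The sub-graph id is the smallest vertex id in that sub-graph.
    """
    parent = {v: v for v in vertices}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    for u, w in edges:
        ru, rw = find(u), find(w)
        if ru != rw:
            parent[max(ru, rw)] = min(ru, rw)  # keep the smaller id as root
    return {v: find(v) for v in vertices}

def connected_components(partitions, edges):
    """partitions: list of vertex sets (hypothetical layout); edges: (u, w) pairs."""
    owner = {v: p for p, verts in enumerate(partitions) for v in verts}
    local = [[] for _ in partitions]
    remote = []
    for u, w in edges:
        if owner[u] == owner[w]:
            local[owner[u]].append((u, w))
        else:
            remote.append((u, w))

    # Step 0: every partition resolves its sub-graphs locally, with no messages.
    sub = {}
    for p, verts in enumerate(partitions):
        sub.update(local_subgraphs(verts, local[p]))
    label = {s: s for s in set(sub.values())}  # one label per sub-graph

    changed = True
    while changed:  # one iteration corresponds to one superstep
        changed = False
        inbox = defaultdict(list)
        # Messages are built from the labels as they stood at the start of the step.
        for u, w in remote:
            inbox[sub[u]].append(label[sub[w]])
            inbox[sub[w]].append(label[sub[u]])
        for s, msgs in inbox.items():
            best = min(msgs)
            if best < label[s]:
                label[s] = best
                changed = True
    return {v: label[sub[v]] for v in sub}
```

In this sketch the number of supersteps depends on how many sub-graphs a component spans rather than on its diameter in vertices, which is the kind of saving the abstract attributes to the sub-graph centric abstraction.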
Impact of auditory selective attention on verbal short-term memory and vocabulary development.
Majerus, Steve; Heiligenstein, Lucie; Gautherot, Nathalie; Poncelet, Martine; Van der Linden, Martial
2009-05-01
This study investigated the role of auditory selective attention capacities as a possible mediator of the well-established association between verbal short-term memory (STM) and vocabulary development. A total of 47 6- and 7-year-olds were administered verbal immediate serial recall and auditory attention tasks. Both task types probed processing of item and serial order information because recent studies have shown this distinction to be critical when exploring relations between STM and lexical development. Multiple regression and variance partitioning analyses highlighted two variables as determinants of vocabulary development: (a) a serial order processing variable shared by STM order recall and a selective attention task for sequence information and (b) an attentional variable shared by selective attention measures targeting item or sequence information. The current study highlights the need for integrative STM models, accounting for conjoined influences of attentional capacities and serial order processing capacities on STM performance and the establishment of the lexical language network.
Légaré, France; Moumjid-Ferdjaoui, Nora; Drolet, Renée; Stacey, Dawn; Härter, Martin; Bastian, Hilda; Beaulieu, Marie-Dominique; Borduas, Francine; Charles, Cathy; Coulter, Angela; Desroches, Sophie; Friedrich, Gwendolyn; Gafni, Amiram; Graham, Ian D.; Labrecque, Michel; LeBlanc, Annie; Légaré, Jean; Politi, Mary; Sargeant, Joan; Thomson, Richard
2014-01-01
Shared decision making is now making inroads in health care professionals’ continuing education curriculum, but there is no consensus on what core competencies are required by clinicians for effectively involving patients in health-related decisions. Ready-made programs for training clinicians in shared decision making are in high demand, but existing programs vary widely in their theoretical foundations, length, and content. An international, interdisciplinary group of 25 individuals met in 2012 to discuss theoretical approaches to making health-related decisions, compare notes on existing programs, take stock of stakeholders’ concerns, and deliberate on core competencies. This article summarizes the results of those discussions. Some participants believed that existing models already provide a sufficient conceptual basis for developing and implementing shared decision making competency-based training programs on a wide scale. Others argued that this would be premature as there is still no consensus on the definition of shared decision making or sufficient evidence to recommend specific competencies for implementing shared decision making. However, all participants agreed that there were 2 broad types of competencies that clinicians need for implementing shared decision making: relational competencies and risk communication competencies. Further multidisciplinary research could broaden and deepen our understanding of core competencies for shared decision making training. PMID:24347105
Memories of Crisis: Bohr, Kuhn, and the Quantum Mechanical ``Revolution''
NASA Astrophysics Data System (ADS)
Seth, Suman
2013-04-01
``The history of science, to my knowledge,'' wrote Thomas Kuhn, describing the years just prior to the development of matrix and wave mechanics, ``offers no equally clear, detailed, and cogent example of the creative functions of normal science and crisis.'' By 1924, most quantum theorists shared a sense that there was much wrong with all extant atomic models. Yet not all shared equally in the sense that the failure was either terribly surprising or particularly demoralizing. Not all agreed, that is, that a crisis for Bohr-like models was a crisis for quantum theory. This paper attempts to answer four questions: two about history, two about memory. First, which sub-groups of the quantum theoretical community saw themselves and their field in a state of crisis in the early 1920s? Second, why did they do so, and how was a sense of crisis related to their theoretical practices in physics? Third, do we regard the years before 1925 as a crisis because they were followed by the quantum mechanical revolution? And fourth, to reverse the last question, were we to call into question the existence of a crisis (for some at least), does that make a subsequent revolution less revolutionary?
Tang, Woung-Ru; Chen, Kuan-Yu; Hsu, Sheng-Hui; Juang, Yeong-Yuh; Chiu, Shin-Che; Hsiao, Shu-Chun; Fujimori, Maiko; Fang, Chun-Kai
2014-03-01
Communication skills training (CST) based on the Japanese SHARE model of family-centered truth telling in Asian countries has been adopted in Taiwan. However, its effectiveness in Taiwan has only been preliminarily verified. This study aimed to test the effect of SHARE model-centered CST on Taiwanese healthcare providers' truth-telling preference, to determine the effect size, and to compare the effect of 1-day and 2-day CST programs on participants' truth-telling preference. For this one-group, pretest-posttest study, 10 CST programs were conducted from August 2010 to November 2011 under certified facilitators and with standard patients. Participants (257 healthcare personnel from northern, central, southern, and eastern Taiwan) chose the 1-day (n = 94) or 2-day (n = 163) CST program as convenient. Participants' self-reported truth-telling preference was measured before and immediately after CST programs, with CST program assessment afterward. The CST programs significantly improved healthcare personnel's truth-telling preference (mean pretest and posttest scores ± standard deviation (SD): 263.8 ± 27.0 vs. 281.8 ± 22.9, p < 0.001). The CST programs effected a significant, large (d = 0.91) improvement in overall truth-telling preference and significantly improved method of disclosure, emotional support, and additional information (p < 0.001). Participation in 1-day or 2-day CST programs did not significantly affect participants' truth-telling preference (p > 0.05) except for the setting subscale. Most participants were satisfied with the CST programs (93.8%) and were willing to recommend them to colleagues (98.5%). The SHARE model-centered CST programs significantly improved Taiwanese healthcare personnel's truth-telling preference. Future studies should objectively assess participants' truth-telling preference, for example, by cancer patients, their families, and other medical team personnel and at longer times after CST programs. 
Copyright © 2013 John Wiley & Sons, Ltd.
Shared prefetching to reduce execution skew in multi-threaded systems
DOE Office of Scientific and Technical Information (OSTI.GOV)
Eichenberger, Alexandre E; Gunnels, John A
Mechanisms are provided for optimizing code to perform prefetching of data into a shared memory of a computing device that is shared by a plurality of threads that execute on the computing device. A memory stream of a portion of code that is shared by the plurality of threads is identified. A set of prefetch instructions is distributed across the plurality of threads. Prefetch instructions are inserted into the instruction sequences of the plurality of threads such that each instruction sequence has a separate sub-portion of the set of prefetch instructions, thereby generating optimized code. Executable code is generated based on the optimized code and stored in a storage device. The executable code, when executed, performs the prefetches associated with the distributed set of prefetch instructions in a shared manner across the plurality of threads.
Software Simplifies the Sharing of Numerical Models
NASA Technical Reports Server (NTRS)
2014-01-01
To ease the sharing of climate models with university students, Goddard Space Flight Center awarded SBIR funding to Reston, Virginia-based Parabon Computation Inc., a company that specializes in cloud computing. The firm developed a software program capable of running climate models over the Internet, and also created an online environment for people to collaborate on developing such models.
Cricket: A Mapped, Persistent Object Store
NASA Technical Reports Server (NTRS)
Shekita, Eugene; Zwilling, Michael
1996-01-01
This paper describes Cricket, a new database storage system that is intended to be used as a platform for design environments and persistent programming languages. Cricket uses the memory management primitives of the Mach operating system to provide the abstraction of a shared, transactional single-level store that can be directly accessed by user applications. In this paper, we present the design and motivation for Cricket. We also present some initial performance results which show that, for its intended applications, Cricket can provide better performance than a general-purpose database storage system.
Fractional Steps methods for transient problems on commodity computer architectures
NASA Astrophysics Data System (ADS)
Krotkiewski, M.; Dabrowski, M.; Podladchikov, Y. Y.
2008-12-01
Fractional Steps methods are suitable for modeling transient processes that are central to many geological applications. Low memory requirements and modest computational complexity facilitate calculations on high-resolution three-dimensional models. An efficient implementation of Alternating Direction Implicit/Locally One-Dimensional schemes for an Opteron-based shared memory system is presented. The memory bandwidth usage, the main bottleneck on modern computer architectures, is specifically addressed. High efficiency of above 2 GFlops per CPU is sustained for problems of 1 billion degrees of freedom. The optimized sequential implementation of all 1D sweeps is comparable in execution time to copying the used data in the memory. Scalability of the parallel implementation on up to 8 CPUs is close to perfect. Performing one timestep of the Locally One-Dimensional scheme on a system of 1000^3 unknowns on 8 CPUs takes only 11 s. We validate the LOD scheme using a computational model of an isolated inclusion subject to a constant far field flux. Next, we study numerically the evolution of a diffusion front and the effective thermal conductivity of composites consisting of multiple inclusions and compare the results with predictions based on the differential effective medium approach. Finally, application of the developed parabolic solver is suggested for a real-world problem of fluid transport and reactions inside a reservoir.
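As a rough illustration of what a Locally One-Dimensional timestep involves, the following Python sketch applies one implicit 1D diffusion sweep along each axis of a small 2D grid, solving each line with the Thomas algorithm. It is a toy under stated assumptions, not the authors' optimized solver: the 2D grid, the zero-flux boundaries, the uniform coefficient `r` (diffusivity times timestep over grid spacing squared), and all function names are chosen here for illustration.

```python
def thomas(a, b, c, d):
    """Solve a tridiagonal system; a = sub-, b = main, c = super-diagonal, d = rhs."""
    n = len(d)
    cp = [0.0] * n
    dp = [0.0] * n
    cp[0] = c[0] / b[0]
    dp[0] = d[0] / b[0]
    for i in range(1, n):  # forward elimination
        m = b[i] - a[i] * cp[i - 1]
        cp[i] = c[i] / m
        dp[i] = (d[i] - a[i] * dp[i - 1]) / m
    x = [0.0] * n
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):  # back substitution
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x

def implicit_sweep(line, r):
    """Backward-Euler 1D diffusion step on one grid line with zero-flux ends."""
    n = len(line)
    a = [-r] * n
    c = [-r] * n
    b = [1.0 + 2.0 * r] * n
    b[0] = b[-1] = 1.0 + r  # end nodes have a single neighbour
    a[0] = 0.0
    c[-1] = 0.0
    return thomas(a, b, c, line)

def lod_step(u, r):
    """One LOD timestep on a 2D grid (list of rows): sweep along x, then along y."""
    u = [implicit_sweep(row, r) for row in u]
    cols = [implicit_sweep(list(col), r) for col in zip(*u)]
    return [list(row) for row in zip(*cols)]

# Usage: a unit heat spike in the middle of a 5x5 grid spreads out,
# while the zero-flux boundaries keep the total amount of heat unchanged.
grid = [[0.0] * 5 for _ in range(5)]
grid[2][2] = 1.0
after = lod_step(grid, 0.5)
```

Each sweep touches every grid value a small, fixed number of times, which is consistent with the abstract's remark that the sweeps cost about as much as copying the data in memory.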
A shared neural ensemble links distinct contextual memories encoded close in time
NASA Astrophysics Data System (ADS)
Cai, Denise J.; Aharoni, Daniel; Shuman, Tristan; Shobe, Justin; Biane, Jeremy; Song, Weilin; Wei, Brandon; Veshkini, Michael; La-Vu, Mimi; Lou, Jerry; Flores, Sergio E.; Kim, Isaac; Sano, Yoshitake; Zhou, Miou; Baumgaertel, Karsten; Lavi, Ayal; Kamata, Masakazu; Tuszynski, Mark; Mayford, Mark; Golshani, Peyman; Silva, Alcino J.
2016-06-01
Recent studies suggest that a shared neural ensemble may link distinct memories encoded close in time. According to the memory allocation hypothesis, learning triggers a temporary increase in neuronal excitability that biases the representation of a subsequent memory to the neuronal ensemble encoding the first memory, such that recall of one memory increases the likelihood of recalling the other memory. Here we show in mice that the overlap between the hippocampal CA1 ensembles activated by two distinct contexts acquired within a day is higher than when they are separated by a week. Several findings indicate that this overlap of neuronal ensembles links two contextual memories. First, fear paired with one context is transferred to a neutral context when the two contexts are acquired within a day but not across a week. Second, the first memory strengthens the second memory within a day but not across a week. Older mice, known to have lower CA1 excitability, do not show the overlap between ensembles, the transfer of fear between contexts, or the strengthening of the second memory. Finally, in aged mice, increasing cellular excitability and activating a common ensemble of CA1 neurons during two distinct context exposures rescued the deficit in linking memories. Taken together, these findings demonstrate that contextual memories encoded close in time are linked by directing storage into overlapping ensembles. Alteration of these processes by ageing could affect the temporal structure of memories, thus impairing efficient recall of related information.
Programming Models for Concurrency and Real-Time
NASA Astrophysics Data System (ADS)
Vitek, Jan
Modern real-time applications are increasingly large, complex and concurrent systems which must meet stringent performance and predictability requirements. Programming those systems requires fundamental advances in programming languages and runtime systems. This talk presents our work on Flexotasks, a programming model for concurrent, real-time systems inspired by stream-processing and concurrent active objects. Some of the key innovations in Flexotasks are that it supports both real-time garbage collection and region-based memory with an ownership type system for static safety. Communication between tasks is performed by channels with a linear type discipline to avoid copying messages, and by a non-blocking transactional memory facility. We have evaluated our model empirically within two distinct implementations, one based on Purdue’s Ovm research virtual machine framework and the other on Websphere, IBM’s production real-time virtual machine. We have written a number of small programs, as well as a 30 KLOC avionics collision detector application. We show that Flexotasks are capable of executing periodic threads at 10 kHz with a standard deviation of 1.2 µs and have performance competitive with hand coded C programs.
Vector computer memory bank contention
NASA Technical Reports Server (NTRS)
Bailey, D. H.
1985-01-01
A number of vector supercomputers feature very large memories. Unfortunately the large capacity memory chips that are used in these computers are much slower than the fast central processing unit (CPU) circuitry. As a result, memory bank reservation times (in CPU ticks) are much longer than on previous generations of computers. A consequence of these long reservation times is that memory bank contention is sharply increased, resulting in significantly lowered performance rates. The phenomenon of memory bank contention in vector computers is analyzed using both a Markov chain model and a Monte Carlo simulation program. The results of this analysis indicate that future generations of supercomputers must either employ much faster memory chips or else feature very large numbers of independent memory banks.
Vector computer memory bank contention
NASA Technical Reports Server (NTRS)
Bailey, David H.
1987-01-01
A number of vector supercomputers feature very large memories. Unfortunately the large capacity memory chips that are used in these computers are much slower than the fast central processing unit (CPU) circuitry. As a result, memory bank reservation times (in CPU ticks) are much longer than on previous generations of computers. A consequence of these long reservation times is that memory bank contention is sharply increased, resulting in significantly lowered performance rates. The phenomenon of memory bank contention in vector computers is analyzed using both a Markov chain model and a Monte Carlo simulation program. The results of this analysis indicate that future generations of supercomputers must either employ much faster memory chips or else feature very large numbers of independent memory banks.
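The Monte Carlo side of such an analysis can be sketched in a few lines of Python. The model below is a deliberate simplification and an assumption of this sketch, not the simulation program described in the abstract: a single processor issues one request per tick to a uniformly random bank, a bank stays reserved for `busy_ticks` ticks after each access, and a request to a reserved bank stalls the processor until the bank is free.

```python
import random

def simulate_bank_contention(n_banks, busy_ticks, n_accesses, seed=0):
    """Return the average number of accesses completed per CPU tick."""
    rng = random.Random(seed)
    free_at = [0] * n_banks  # tick at which each bank becomes available again
    tick = 0
    for _ in range(n_accesses):
        bank = rng.randrange(n_banks)
        tick = max(tick, free_at[bank])    # stall while the bank is reserved
        free_at[bank] = tick + busy_ticks  # reserve the bank for this access
        tick += 1                          # issuing the access takes one tick
    return n_accesses / tick

# Usage: same reservation time, few banks versus many banks.
few = simulate_bank_contention(n_banks=8, busy_ticks=8, n_accesses=10000)
many = simulate_bank_contention(n_banks=64, busy_ticks=8, n_accesses=10000)
```

Even in this toy model, lengthening the reservation time lowers the sustained access rate, and adding independent banks recovers it, which mirrors the two remedies named in the abstract.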
Grossberg, Stephen
2015-09-24
This article provides an overview of neural models of synaptic learning and memory whose expression in adaptive behavior depends critically on the circuits and systems in which the synapses are embedded. It reviews Adaptive Resonance Theory, or ART, models that use excitatory matching and match-based learning to achieve fast category learning and whose learned memories are dynamically stabilized by top-down expectations, attentional focusing, and memory search. ART clarifies mechanistic relationships between consciousness, learning, expectation, attention, resonance, and synchrony. ART models are embedded in ARTSCAN architectures that unify processes of invariant object category learning, recognition, spatial and object attention, predictive remapping, and eye movement search, and that clarify how conscious object vision and recognition may fail during perceptual crowding and parietal neglect. The generality of learned categories depends upon a vigilance process that is regulated by acetylcholine via the nucleus basalis. Vigilance can get stuck at too high or too low values, thereby causing learning problems in autism and medial temporal amnesia. Similar synaptic learning laws support qualitatively different behaviors: Invariant object category learning in the inferotemporal cortex; learning of grid cells and place cells in the entorhinal and hippocampal cortices during spatial navigation; and learning of time cells in the entorhinal-hippocampal system during adaptively timed conditioning, including trace conditioning. Spatial and temporal processes through the medial and lateral entorhinal-hippocampal system seem to be carried out with homologous circuit designs. Variations of a shared laminar neocortical circuit design have modeled 3D vision, speech perception, and cognitive working memory and learning. A complementary kind of inhibitory matching and mismatch learning controls movement. This article is part of a Special Issue entitled SI: Brain and Memory. 
Copyright © 2014 Elsevier B.V. All rights reserved.
The declarative/procedural model of lexicon and grammar.
Ullman, M T
2001-01-01
Our use of language depends upon two capacities: a mental lexicon of memorized words and a mental grammar of rules that underlie the sequential and hierarchical composition of lexical forms into predictably structured larger words, phrases, and sentences. The declarative/procedural model posits that the lexicon/grammar distinction in language is tied to the distinction between two well-studied brain memory systems. On this view, the memorization and use of at least simple words (those with noncompositional, that is, arbitrary form-meaning pairings) depends upon an associative memory of distributed representations that is subserved by temporal-lobe circuits previously implicated in the learning and use of fact and event knowledge. This "declarative memory" system appears to be specialized for learning arbitrarily related information (i.e., for associative binding). In contrast, the acquisition and use of grammatical rules that underlie symbol manipulation is subserved by frontal/basal-ganglia circuits previously implicated in the implicit (nonconscious) learning and expression of motor and cognitive "skills" and "habits" (e.g., from simple motor acts to skilled game playing). This "procedural" system may be specialized for computing sequences. This novel view of lexicon and grammar offers an alternative to the two main competing theoretical frameworks. It shares the perspective of traditional dual-mechanism theories in positing that the mental lexicon and a symbol-manipulating mental grammar are subserved by distinct computational components that may be linked to distinct brain structures. However, it diverges from these theories where they assume components dedicated to each of the two language capacities (that is, domain-specific) and in their common assumption that lexical memory is a rote list of items. 
Conversely, while it shares with single-mechanism theories the perspective that the two capacities are subserved by domain-independent computational mechanisms, it diverges from them where they link both capacities to a single associative memory system with broad anatomic distribution. The declarative/procedural model, but neither traditional dual- nor single-mechanism models, predicts double dissociations between lexicon and grammar, with associations among associative memory properties, memorized words and facts, and temporal-lobe structures, and among symbol-manipulation properties, grammatical rule products, motor skills, and frontal/basal-ganglia structures. In order to contrast lexicon and grammar while holding other factors constant, we have focused our investigations of the declarative/procedural model on morphologically complex word forms. Morphological transformations that are (largely) unproductive (e.g., in go-went, solemn-solemnity) are hypothesized to depend upon declarative memory. These have been contrasted with morphological transformations that are fully productive (e.g., in walk-walked, happy-happiness), whose computation is posited to be solely dependent upon grammatical rules subserved by the procedural system. Here evidence is presented from studies that use a range of psycholinguistic and neurolinguistic approaches with children and adults. It is argued that converging evidence from these studies supports the declarative/procedural model of lexicon and grammar.
Parallel performance investigations of an unstructured mesh Navier-Stokes solver
NASA Technical Reports Server (NTRS)
Mavriplis, Dimitri J.
2000-01-01
A Reynolds-averaged Navier-Stokes solver based on unstructured mesh techniques for analysis of high-lift configurations is described. The method makes use of an agglomeration multigrid solver for convergence acceleration. Implicit line-smoothing is employed to relieve the stiffness associated with highly stretched meshes. A GMRES technique is also implemented to speed convergence at the expense of additional memory usage. The solver is cache efficient and fully vectorizable, and is parallelized using a two-level hybrid MPI-OpenMP implementation suitable for shared and/or distributed memory architectures, as well as clusters of shared memory machines. Convergence and scalability results are illustrated for various high-lift cases.
Experimental evaluation of multiprocessor cache-based error recovery
NASA Technical Reports Server (NTRS)
Janssens, Bob; Fuchs, W. K.
1991-01-01
Several variations of cache-based checkpointing for rollback error recovery in shared-memory multiprocessors have been recently developed. By modifying the cache replacement policy, these techniques use the inherent redundancy in the memory hierarchy to periodically checkpoint the computation state. Three schemes, different in the manner in which they avoid rollback propagation, are evaluated. By simulation with address traces from parallel applications running on an Encore Multimax shared-memory multiprocessor, the performance effect of integrating the recovery schemes in the cache coherence protocol is evaluated. The results indicate that the cache-based schemes can provide checkpointing capability with low performance overhead but uncontrollable high variability in the checkpoint interval.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lee, Seyong; Vetter, Jeffrey S
Computer architecture experts expect that non-volatile memory (NVM) hierarchies will play a more significant role in future systems including mobile, enterprise, and HPC architectures. With this expectation in mind, we present NVL-C: a novel programming system that facilitates the efficient and correct programming of NVM main memory systems. The NVL-C programming abstraction extends C with a small set of intuitive language features that target NVM main memory, and can be combined directly with traditional C memory model features for DRAM. We have designed these new features to enable compiler analyses and run-time checks that can improve performance and guard against a number of subtle programming errors, which, when left uncorrected, can corrupt NVM-stored data. Moreover, to enable recovery of data across application or system failures, these NVL-C features include a flexible directive for specifying NVM transactions. So that our implementation might be extended to other compiler front ends and languages, the majority of our compiler analyses are implemented in an extended version of LLVM's intermediate representation (LLVM IR). We evaluate NVL-C on a number of applications to show its flexibility, performance, and correctness.
Performing a local reduction operation on a parallel computer
Blocksome, Michael A; Faraj, Daniel A
2013-06-04
A parallel computer including compute nodes, each including two reduction processing cores, a network write processing core, and a network read processing core, each processing core assigned an input buffer. Copying, in interleaved chunks by the reduction processing cores, contents of the reduction processing cores' input buffers to an interleaved buffer in shared memory; copying, by one of the reduction processing cores, contents of the network write processing core's input buffer to shared memory; copying, by another of the reduction processing cores, contents of the network read processing core's input buffer to shared memory; and locally reducing in parallel by the reduction processing cores: the contents of the reduction processing core's input buffer; every other interleaved chunk of the interleaved buffer; the copied contents of the network write processing core's input buffer; and the copied contents of the network read processing core's input buffer.
Performing a local reduction operation on a parallel computer
Blocksome, Michael A.; Faraj, Daniel A.
2012-12-11
A parallel computer including compute nodes, each including two reduction processing cores, a network write processing core, and a network read processing core, each processing core assigned an input buffer. Copying, in interleaved chunks by the reduction processing cores, contents of the reduction processing cores' input buffers to an interleaved buffer in shared memory; copying, by one of the reduction processing cores, contents of the network write processing core's input buffer to shared memory; copying, by another of the reduction processing cores, contents of the network read processing core's input buffer to shared memory; and locally reducing in parallel by the reduction processing cores: the contents of the reduction processing core's input buffer; every other interleaved chunk of the interleaved buffer; the copied contents of the network write processing core's input buffer; and the copied contents of the network read processing core's input buffer.
Promoting Continuous Quality Improvement in Online Teaching: The META Model
ERIC Educational Resources Information Center
Dittmar, Eileen; McCracken, Holly
2012-01-01
Experienced e-learning faculty members share strategies for implementing a comprehensive postsecondary faculty development program essential to continuous improvement of instructional skills. The high-impact META Model (centered around Mentoring, Engagement, Technology, and Assessment) promotes information sharing and content creation, and fosters…
Enhancing memory self-efficacy during menopause through a group memory strategies program.
Unkenstein, Anne E; Bei, Bei; Bryant, Christina A
2017-05-01
Anxiety about memory during menopause can affect quality of life. We aimed to improve memory self-efficacy during menopause using a group memory strategies program. The program was run five times for a total of 32 peri- and postmenopausal women, aged between 47 and 60 years, recruited from hospital menopause and gynecology clinics. The 4-week intervention consisted of weekly 2-hour sessions, and covered how memory works, memory changes related to ageing, health and lifestyle factors, and specific memory strategies. Memory contentment (CT), reported frequency of forgetting (FF), use of memory strategies, psychological distress, and attitude toward menopause were measured. A double-baseline design was applied, with outcomes measured on two baseline occasions (1 month prior [T1] and in the first session [T2]), immediately postintervention (T3), and 3 months postintervention (T4). To describe changes in each variable between time points, paired sample t tests were conducted. Mixed-effects models comparing the means of random slopes from T2 to T3 with those from T1 to T2 were conducted for each variable to test for treatment effects. Examination of the naturalistic changes in outcome measures from T1 to T2 revealed no significant changes (all Ps > 0.05). CT, reported FF, and use of memory strategies improved significantly more from T2 to T3 than from T1 to T2 (all Ps < 0.05). Neither attitude toward menopause nor psychological distress improved significantly more postintervention than during the double-baseline (all Ps > 0.05). Improvements in reported CT and FF were maintained after 3 months. The use of group interventions to improve memory self-efficacy during menopause warrants continued evaluation.
Finding Services for an Open Architecture: A Review of Existing Applications and Programs in PEO C4I
2011-01-01
2004) Two key SOA success factors listed were as follows: 1. Shared Services Strategy: Existence of a strategy to identify overlapping business and...model Architectural pattern 22 Finding Services for an Open Architecture or eliminating redundancies and overlaps through use of shared services 2...Funding Model: Existence of an IT funding model aligned with and supportive of a shared services strategy. (Sun Microsystems, 2004) Become Data
Neo: an object model for handling electrophysiology data in multiple formats
Garcia, Samuel; Guarino, Domenico; Jaillet, Florent; Jennings, Todd; Pröpper, Robert; Rautenberg, Philipp L.; Rodgers, Chris C.; Sobolev, Andrey; Wachtler, Thomas; Yger, Pierre; Davison, Andrew P.
2014-01-01
Neuroscientists use many different software tools to acquire, analyze and visualize electrophysiological signals. However, incompatible data models and file formats make it difficult to exchange data between these tools. This reduces scientific productivity, renders potentially useful analysis methods inaccessible and impedes collaboration between labs. A common representation of the core data would improve interoperability and facilitate data-sharing. To that end, we propose here a language-independent object model, named “Neo,” suitable for representing data acquired from electroencephalographic, intracellular, or extracellular recordings, or generated from simulations. As a concrete instantiation of this object model we have developed an open source implementation in the Python programming language. In addition to representing electrophysiology data in memory for the purposes of analysis and visualization, the Python implementation provides a set of input/output (IO) modules for reading/writing the data from/to a variety of commonly used file formats. Support is included for formats produced by most of the major manufacturers of electrophysiology recording equipment and also for more generic formats such as MATLAB. Data representation and data analysis are conceptually separate: it is easier to write robust analysis code if it is focused on analysis and relies on an underlying package to handle data representation. For that reason, and also to be as lightweight as possible, the Neo object model and the associated Python package are deliberately limited to representation of data, with no functions for data analysis or visualization. Software for neurophysiology data analysis and visualization built on top of Neo automatically gains the benefits of interoperability, easier data sharing and automatic format conversion; there is already a burgeoning ecosystem of such tools. We intend that Neo should become the standard basis for Python tools in neurophysiology. 
PMID:24600386
Adiabatic quantum optimization for associative memory recall
DOE Office of Scientific and Technical Information (OSTI.GOV)
Seddiqi, Hadayat; Humble, Travis S.
Hopfield networks are a variant of associative memory that recall patterns stored in the couplings of an Ising model. Stored memories are conventionally accessed as fixed points in the network dynamics that correspond to energetic minima of the spin state. We show that memories stored in a Hopfield network may also be recalled by energy minimization using adiabatic quantum optimization (AQO). Numerical simulations of the underlying quantum dynamics allow us to quantify AQO recall accuracy with respect to the number of stored memories and noise in the input key. We investigate AQO performance with respect to how memories are stored in the Ising model according to different learning rules. Our results demonstrate that AQO recall accuracy varies strongly with learning rule, a behavior that is attributed to differences in energy landscapes. Consequently, learning rules offer a family of methods for programming adiabatic quantum optimization that we expect to be useful for characterizing AQO performance.
Adiabatic Quantum Optimization for Associative Memory Recall
NASA Astrophysics Data System (ADS)
Seddiqi, Hadayat; Humble, Travis
2014-12-01
Hopfield networks are a variant of associative memory that recall patterns stored in the couplings of an Ising model. Stored memories are conventionally accessed as fixed points in the network dynamics that correspond to energetic minima of the spin state. We show that memories stored in a Hopfield network may also be recalled by energy minimization using adiabatic quantum optimization (AQO). Numerical simulations of the underlying quantum dynamics allow us to quantify AQO recall accuracy with respect to the number of stored memories and noise in the input key. We investigate AQO performance with respect to how memories are stored in the Ising model according to different learning rules. Our results demonstrate that AQO recall accuracy varies strongly with learning rule, a behavior that is attributed to differences in energy landscapes. Consequently, learning rules offer a family of methods for programming adiabatic quantum optimization that we expect to be useful for characterizing AQO performance.
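The conventional recall described in this abstract (stored patterns as fixed points that are energy minima of an Ising model) can be illustrated with a minimal classical sketch. This is not the AQO procedure the authors study, and it is not their code: the Hebb rule is assumed here as one example of a learning rule, and the function names `hebb_weights`, `energy`, and `recall` are hypothetical.

```python
def hebb_weights(patterns):
    """Assumed Hebb learning rule: W[i][j] = (1/n) * sum over patterns of x_i * x_j,
    with a zero diagonal. Patterns are lists of +1/-1 spins of equal length n."""
    n = len(patterns[0])
    w = [[0.0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:
                    w[i][j] += p[i] * p[j] / n
    return w


def energy(w, s):
    """Ising energy of spin state s: E(s) = -1/2 * sum_ij W[i][j] * s_i * s_j."""
    n = len(s)
    return -0.5 * sum(w[i][j] * s[i] * s[j] for i in range(n) for j in range(n))


def recall(w, key, max_sweeps=10):
    """Classical asynchronous recall: align each spin with its local field
    until a full sweep changes nothing, i.e. a fixed point is reached."""
    s = list(key)
    n = len(s)
    for _ in range(max_sweeps):
        changed = False
        for i in range(n):
            field = sum(w[i][j] * s[j] for j in range(n))
            new_spin = 1 if field >= 0 else -1
            if new_spin != s[i]:
                s[i] = new_spin
                changed = True
        if not changed:
            break
    return s
```

Storing two orthogonal 8-spin patterns and presenting one of them with a single flipped spin as the input key returns the stored pattern, and the recalled state has lower energy than the noisy key. An AQO approach would instead seek the minimum of the same energy function by quantum annealing.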
Adiabatic quantum optimization for associative memory recall
Seddiqi, Hadayat; Humble, Travis S.
2014-12-22
Hopfield networks are a variant of associative memory that recall patterns stored in the couplings of an Ising model. Stored memories are conventionally accessed as fixed points in the network dynamics that correspond to energetic minima of the spin state. We show that memories stored in a Hopfield network may also be recalled by energy minimization using adiabatic quantum optimization (AQO). Numerical simulations of the underlying quantum dynamics allow us to quantify AQO recall accuracy with respect to the number of stored memories and noise in the input key. We investigate AQO performance with respect to how memories are stored in the Ising model according to different learning rules. Our results demonstrate that AQO recall accuracy varies strongly with learning rule, a behavior that is attributed to differences in energy landscapes. Consequently, learning rules offer a family of methods for programming adiabatic quantum optimization that we expect to be useful for characterizing AQO performance.
NASA Astrophysics Data System (ADS)
Vukics, András
2012-06-01
C++QED is a versatile framework for simulating open quantum dynamics. It allows one to build arbitrarily complex quantum systems from elementary free subsystems and interactions, and to simulate their time evolution with the available time-evolution drivers. Through this framework, we introduce a design which should be generic for high-level representations of composite quantum systems. It relies heavily on the object-oriented and generic programming paradigms on the one hand, and on compile-time algorithms, in particular C++ template-metaprogramming techniques, on the other. The core of the design is the data structure which represents the state vectors of composite quantum systems. This data structure models the multi-array concept. Template metaprogramming is not only crucial to the design; it also allows all computations pertaining to the layout of the simulated system to be shifted to compile time, hence cutting down on runtime.
Program summary
Program title: C++QED
Catalogue identifier: AELU_v1_0
Program summary URL: http://cpc.cs.qub.ac.uk/summaries/AELU_v1_0.html
Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland
Licensing provisions: http://cpc.cs.qub.ac.uk/licence/aelu_v1_0.html. The C++QED package contains other software packages, Blitz, Boost and FLENS, all of which may be distributed freely but have individual license requirements. Please see individual packages for license conditions.
No. of lines in distributed program, including test data, etc.: 597 974
No. of bytes in distributed program, including test data, etc.: 4 874 839
Distribution format: tar.gz
Programming language: C++
Computer: i386-i686, x86_64
Operating system: In principle cross-platform, as yet tested only on UNIX-like systems (including Mac OS X).
RAM: The framework itself takes about 60 MB, which is fully shared. The additional memory taken by the program which defines the actual physical system (script) is typically less than 1 MB. The memory storing the actual data scales with the system dimension for state-vector manipulations, and the square of the dimension for density-operator manipulations. This might easily be GBs, and often the memory of the machine limits the size of the simulated system.
Classification: 4.3, 4.13, 6.2, 20
External routines: Boost C++ libraries (http://www.boost.org/), GNU Scientific Library (http://www.gnu.org/software/gsl/), Blitz++ (http://www.oonumerics.org/blitz/), Linear Algebra Package - Flexible Library for Efficient Numerical Solutions (http://flens.sourceforge.net/).
Nature of problem: Definition of (open) composite quantum systems out of elementary building blocks [1]. Manipulation of such systems, with emphasis on dynamical simulations such as Master-equation evolution [2] and Monte Carlo wave-function simulation [3].
Solution method: Master equation, Monte Carlo wave-function method.
Restrictions: Total dimensionality of the system. Master equation - few thousands. Monte Carlo wave-function trajectory - several millions.
Unusual features: Because of the heavy use of compile-time algorithms, compilation of programs written in the framework may take a long time and much memory (up to several GBs).
Additional comments: The framework is not a program, but provides and implements an application-programming interface for developing simulations in the indicated problem domain.
Supplementary information: http://cppqed.sourceforge.net/.
Running time: Depending on the magnitude of the problem, can vary from a few seconds to weeks.
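The memory scaling stated in the program summary (linear in the system dimension for state vectors, quadratic for density operators) can be made concrete with a small back-of-the-envelope sketch. It assumes each amplitude is stored as one double-precision complex number (16 bytes), which the summary does not state; the function names are hypothetical and are not part of the C++QED API.

```python
BYTES_PER_AMPLITUDE = 16  # assumption: one double-precision complex number per entry


def state_vector_bytes(dim):
    """Memory for a state vector: one amplitude per basis state (linear in dim)."""
    return dim * BYTES_PER_AMPLITUDE


def density_operator_bytes(dim):
    """Memory for a density operator: a dim x dim matrix (quadratic in dim)."""
    return dim * dim * BYTES_PER_AMPLITUDE
```

Under this assumption, a state vector of dimension one million needs 16 MB, while a density operator of dimension 3000 already needs 144 MB. This is consistent with the much smaller dimensions quoted for Master-equation evolution than for Monte Carlo wave-function trajectories.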
An anatomy memorial tribute: fostering a humanistic practice of medicine.
Vora, A
1998-01-01
Medical students' first "patients" are the individuals who donate their bodies for laboratory dissection, and these first lessons of medicine serve as a model for the doctor-patient relationship. An Anatomy Memorial Tribute was initiated by students at Mount Sinai School of Medicine to honor these donors. Students and faculty shared music, art, and readings of original poetry and prose. The event facilitated dialogue about attitudes and feelings with regards to death and dying. Controversial issues included anonymity versus identification of donors and the appropriateness of professionals showing emotion in public. The feedback from both students and faculty participants in the event was overwhelmingly positive. Students wrote that the tribute provided a sense of closure for their dissection experience and reinvolved them in shaping their education; faculty indicated that it was appropriate. Memorial tributes are a first step toward fostering the personal growth and emotional preparation required for competent and compassionate patient care. To encourage a humanistic approach to medical education, faculty have the opportunity to participate in such tributes, facilitate sensitive use of language in the anatomy laboratory, and expand the broader medical school curriculum in relation to death and dying. Medical students may expand the concept of memorial tributes and enhance their professional growth in this area by sharing information, ideas, and experiences through national organizations such as the Humanistic Medicine Group of the American Medical Students Association. The capacity of physicians to effectively serve patients facing the end of life is particularly relevant in the setting of palliative medicine.
We Remember, We Forget: Collaborative Remembering in Older Couples
ERIC Educational Resources Information Center
Harris, Celia B.; Keil, Paul G.; Sutton, John; Barnier, Amanda J.; McIlwain, Doris J. F.
2011-01-01
Transactive memory theory describes the processes by which benefits for memory can occur when remembering is shared in dyads or groups. In contrast, cognitive psychology experiments demonstrate that social influences on memory disrupt and inhibit individual recall. However, most research in cognitive psychology has focused on groups of strangers…
76 FR 12821 - 150th Anniversary of the Inauguration of Abraham Lincoln
Federal Register 2010, 2011, 2012, 2013, 2014
2011-03-09
... together by shared memories and common hopes. As we observe the 150th anniversary of his Inauguration, we... his memory enabled America to move beyond a young collection of States to become a free and unified... memory and uphold the principles he so nobly advanced.
Expert Systems on Multiprocessor Architectures. Volume 2. Technical Reports
1991-06-01
Report RC 12936 (#58037). IBM T. J. Watson Research Center. July 1987. Alan Jay Smith. Cache memories. Computing Surveys, 14(3):473-530...basic-shared is an instrument for a shared memory design. The components panels are processor-qload-scrolling-bar-panel, memory-qload-scrolling-bar-panel
Blanket Gate Would Address Blocks Of Memory
NASA Technical Reports Server (NTRS)
Lambe, John; Moopenn, Alexander; Thakoor, Anilkumar P.
1988-01-01
Circuit-chip area used more efficiently. Proposed gate structure selectively allows and restricts access to blocks of memory in electronic neural-type network. By breaking memory into independent blocks, gate greatly simplifies problem of reading from and writing to memory. Since blocks not used simultaneously, share operational amplifiers that prompt and read information stored in memory cells. Fewer operational amplifiers needed, and chip area occupied reduced correspondingly. Cost per bit drops as result.
The potential of multi-port optical memories in digital computing
NASA Technical Reports Server (NTRS)
Alford, C. O.; Gaylord, T. K.
1975-01-01
A high-capacity memory with a relatively high data transfer rate and multi-port simultaneous access capability may serve as the basis for new computer architectures. The implementation of a multi-port optical memory is discussed. Several computer structures are presented that might profitably use such a memory. These structures include (1) a simultaneous record access system, (2) a simultaneously shared memory computer system, and (3) a parallel digital processing structure.
Hutter, Russell R C; Allen, Richard J; Wood, Chantelle
2016-01-01
Recent research (e.g., Hutter, Crisp, Humphreys, Waters, & Moffit; Siebler) has confirmed that combining novel social categories involves two stages (e.g., Hampton; Hastie, Schroeder, & Weber). Furthermore, it is also evident that following stage 1 (constituent additivity), the second stage in these models involves cognitively effortful complex reasoning. However, while current theory and research has addressed how category conjunctions are initially represented to some degree, it is not clear precisely where we first combine or bind existing social constituent categories. For example, how and where do we compose and temporarily store a coherent representation of an individual who shares membership of "female" and "blacksmith" categories? In this article, we consider how the revised multi-component model of working memory (Baddeley) can assist in resolving the representational limitations in the extant two-stage theoretical models. This is a new approach to understanding how novel conjunctions form new bound "composite" representations.
Wiegand, Melanie A; Troyer, Angela K; Gojmerac, Christina; Murphy, Kelly J
2013-01-01
Many older adults are concerned about memory changes with age and consequently seek ways to optimize their memory function. Memory programs are known to be variably effective in improving memory knowledge, other aspects of metamemory, and/or objective memory, but little is known about their impact on implementing and sustaining lifestyle and healthcare-seeking intentions and behaviors. We evaluated a multidimensional, evidence-based intervention, the Memory and Aging Program, that provides education about memory and memory change, training in the use of practical memory strategies, and support for implementation of healthy lifestyle behavior changes. In a randomized controlled trial, 42 healthy older adults participated in a program (n = 21) or a waitlist control (n = 21) group. Relative to the control group, participants in the program implemented more healthy lifestyle behaviors by the end of the program and maintained these changes 1 month later. Similarly, program participants reported a decreased intention to seek unnecessary medical attention for their memory immediately after the program and 1 month later. Findings support the use of multidimensional memory programs to promote healthy lifestyles and influence healthcare-seeking behaviors. Discussion focuses on implications of these changes for maximizing cognitive health and minimizing impact on healthcare resources.
Declarative memory deficits and schizophrenia: problems and prospects.
Stone, William S; Hsi, Xiaolu
2011-11-01
Cognitive deficits are among the most important factors leading to poor functional outcomes in schizophrenia, with deficits in declarative memory among the largest and most robust of these. Thus far, attempts to enhance cognition in schizophrenia have shown only modest success, which underlies increasing efforts to develop effective treatment strategies. This review is divided into three main parts. The first section delineates the nature and extent of the deficits in both patients with schizophrenia and in their adult, non-psychotic relatives. The second part focuses on structural and functional abnormalities in the hippocampus, both in people with schizophrenia and in animal studies that model relevant features of the illness. The third section views problems in declarative memory and hippocampal function from the perspective of elevated rates of common medical disorders in schizophrenia, with a focus on insulin insensitivity/diabetes. The likelihood that poor glucose regulation/availability contribute to declarative memory deficits and hippocampal abnormalities is considered, along with the possibility that schizophrenia and poor glucose regulation share common etiologic elements, and with clinical implications of this perspective for enhancing declarative memory. Copyright © 2011 Elsevier Inc. All rights reserved.
Georgiades, Anna; Rijsdijk, Fruhling; Kane, Fergus; Rebollo-Mesa, Irene; Kalidindi, Sridevi; Schulze, Katja K; Stahl, Daniel; Walshe, Muriel; Sahakian, Barbara J; McDonald, Colm; Hall, Mei-Hua; Murray, Robin M; Kravariti, Eugenia
2016-06-01
Twin studies have lacked statistical power to apply advanced genetic modelling techniques to the search for cognitive endophenotypes for bipolar disorder. To quantify the shared genetic variability between bipolar disorder and cognitive measures. Structural equation modelling was performed on cognitive data collected from 331 twins/siblings of varying genetic relatedness, disease status and concordance for bipolar disorder. Using a parsimonious AE model, verbal episodic and spatial working memory showed statistically significant genetic correlations with bipolar disorder (rg = |0.23|-|0.27|), which lost statistical significance after covarying for affective symptoms. Using an ACE model, IQ and visual-spatial learning showed statistically significant genetic correlations with bipolar disorder (rg = |0.51|-|1.00|), which remained significant after covarying for affective symptoms. Verbal episodic and spatial working memory capture a modest fraction of the bipolar diathesis. IQ and visual-spatial learning may tap into genetic substrates of non-affective symptomatology in bipolar disorder. © The Royal College of Psychiatrists 2016.
Multi-core processing and scheduling performance in CMS
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hernandez, J. M.; Evans, D.; Foulkes, S.
2012-01-01
Commodity hardware is going many-core. We might soon not be able to satisfy the job memory needs per core in the current single-core processing model in High Energy Physics. In addition, an ever increasing number of independent and incoherent jobs running on the same physical hardware without sharing resources might significantly affect processing performance. It will be essential to effectively utilize the multi-core architecture. CMS has incorporated support for multi-core processing in the event processing framework and the workload management system. Multi-core processing jobs share common data in memory, such as the code libraries, detector geometry and conditions data, resulting in a much lower memory usage than standard single-core independent jobs. Exploiting this new processing model requires a new model in computing resource allocation, departing from the standard single-core allocation for a job. The experiment job management system needs to have control over a larger quantum of resource since multi-core aware jobs require the scheduling of multiple cores simultaneously. CMS is exploring the approach of using whole nodes as the unit in the workload management system, where all cores of a node are allocated to a multi-core job. Whole-node scheduling allows for optimization of the data/workflow management (e.g. I/O caching, local merging) but efficient utilization of all scheduled cores is challenging. Dedicated whole-node queues have been set up at all Tier-1 centers for exploring multi-core processing workflows in CMS. We present the evaluation of the performance of scheduling and executing multi-core workflows in whole-node queues compared to the standard single-core processing workflows.
Endocannabinoid signaling and memory dynamics: A synaptic perspective.
Drumond, Ana; Madeira, Natália; Fonseca, Rosalina
2017-02-01
Memory acquisition is a key brain feature on which our human nature relies. Memories evolve over time. Initially after learning, memories are labile and sensitive to disruption by the interference of concurrent events. Later on, after consolidation, memories are resistant to disruption. However, reactivation of previously consolidated memories returns them to an unstable state in which they are again susceptible to perturbation. Additionally, and depending on the characteristics of the stimuli, a parallel process may be initiated which ultimately leads to the extinction of the previously acquired response. This dynamic aspect of memory maintenance opens the possibility for an updating of previously acquired memories but it also creates several conceptual challenges. What is the time window for memory updating? What determines whether reconsolidation or extinction is triggered? In this review, we re-examine the relationship between consolidation, reconsolidation and extinction, aiming for a unifying view of memory dynamics. Since cellular models of memory share common principles, we present the evidence that similar rules apply to the maintenance of synaptic plasticity. Recently, a new function of the endocannabinoid (eCB) signaling system has been described for associative forms of synaptic plasticity in amygdala synapses. The eCB system has emerged as a key modulator of memory dynamics by adjusting the outcome to stimulus intensity. We propose a key function of eCB in discriminative forms of learning by restricting associative plasticity in amygdala synapses. Since many neuropsychiatric disorders are associated with a dysregulation in memory dynamics, understanding the rules underlying memory maintenance paves the path to better clinical interventions. Copyright © 2016 Elsevier Inc. All rights reserved.
Benefits of flexible prioritization in working memory can arise without costs.
Myers, Nicholas E; Chekroud, Sammi R; Stokes, Mark G; Nobre, Anna C
2018-03-01
Most recent models conceptualize working memory (WM) as a continuous resource, divided up according to task demands. When an increasing number of items need to be remembered, each item receives a smaller chunk of the memory resource. These models predict that the allocation of attention to high-priority WM items during the retention interval should be a zero-sum game: improvements in remembering cued items come at the expense of uncued items because resources are dynamically transferred from uncued to cued representations. The current study provides empirical data challenging this model. Four precision retrocueing WM experiments assessed cued and uncued items on every trial. This permitted a test for trade-off of the memory resource. We found no evidence for trade-offs in memory across trials. Moreover, robust improvements in WM performance for cued items came at little or no cost to uncued items that were probed afterward, thereby increasing the net capacity of WM relative to neutral cueing conditions. An alternative mechanism of prioritization proposes that cued items are transferred into a privileged state within a response-gating bottleneck, in which an item uniquely controls upcoming behavior. We found evidence consistent with this alternative. When an uncued item was probed first, report of its orientation was biased away from the cued orientation to be subsequently reported. We interpret this bias as competition for behavioral control in the output-driving bottleneck. Other items in WM did not bias each other, making this result difficult to explain with a shared resource model. (PsycINFO Database Record (c) 2018 APA, all rights reserved).
Methodology for fast detection of false sharing in threaded scientific codes
Chung, I-Hsin; Cong, Guojing; Murata, Hiroki; Negishi, Yasushi; Wen, Hui-Fang
2014-11-25
A profiling tool identifies a code region with a false sharing potential. A static analysis tool classifies variables and arrays in the identified code region. A mapping detection library correlates memory access instructions in the identified code region with variables and arrays in the identified code region while a processor is running the identified code region. The mapping detection library identifies one or more instructions at risk, in the identified code region, which are subject to an analysis by a false sharing detection library. A false sharing detection library performs a run-time analysis of the one or more instructions at risk while the processor is re-running the identified code region. The false sharing detection library determines, based on the performed run-time analysis, whether two different portions of the cache memory line are accessed by the generated binary code.
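The final check described above, whether different portions of one cache line are touched, can be illustrated with a minimal sketch. It is not the patented tool: the 64-byte line size, the `(thread_id, address, is_write)` record format, and the function name `find_false_sharing` are all assumptions made for illustration.

```python
from itertools import combinations

CACHE_LINE_BYTES = 64  # assumed line size; real hardware varies


def find_false_sharing(accesses, line_bytes=CACHE_LINE_BYTES):
    """Return cache-line indices that show a false sharing pattern.

    accesses: iterable of (thread_id, address, is_write) records
    (a hypothetical trace format). A line is flagged when two different
    threads touch two different addresses inside it and at least one
    of the two accesses is a write.
    """
    by_line = {}
    for tid, addr, is_write in accesses:
        by_line.setdefault(addr // line_bytes, set()).add((tid, addr, is_write))

    flagged = []
    for line in sorted(by_line):
        for (t1, a1, w1), (t2, a2, w2) in combinations(sorted(by_line[line]), 2):
            if t1 != t2 and a1 != a2 and (w1 or w2):
                flagged.append(line)
                break
    return flagged


trace = [
    (0, 0, True), (1, 8, True),      # line 0: two threads, different offsets
    (0, 128, True),                  # line 2: one thread only
    (0, 256, True), (1, 256, True),  # line 4: same address, true sharing
]
print(find_false_sharing(trace))
```

Accesses by different threads to the same address are left unflagged here because that is true sharing, which needs a different remedy than padding or realignment.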
Nonvolatile Memory Technology for Space Applications
NASA Technical Reports Server (NTRS)
Oldham, Timothy R.; Irom, Farokh; Friendlich, Mark; Nguyen, Duc; Kim, Hak; Berg, Melanie; LaBel, Kenneth A.
2010-01-01
This slide presentation reviews several forms of nonvolatile memory for use in space applications. The intent is to: (1) Determine inherent radiation tolerance and sensitivities, (2) Identify challenges for future radiation hardening efforts, (3) Investigate new failure modes and effects, and technology modeling programs. Testing includes total dose, single event (proton, laser, heavy ion), and proton damage (where appropriate). Test vehicles are expected to be a variety of non-volatile memory devices as available including Flash (NAND and NOR), Charge Trap, Nanocrystal Flash, Magnetic Memory (MRAM), Phase Change--Chalcogenide, (CRAM), Ferroelectric (FRAM), CNT, and Resistive RAM.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Chang, Yao-Feng, E-mail: yfchang@utexas.edu; Zhou, Fei; Chen, Ying-Chen
2016-01-18
Self-compliance characteristics and reliability optimization are investigated in intrinsic unipolar silicon oxide (SiOx)-based resistive switching (RS) memory using TiW/SiOx/TiW device structures. The program window (difference between SET voltage and RESET voltage) is dependent on external series resistance, demonstrating that the SET process is due to a voltage-triggered mechanism. The program window has been optimized for program/erase disturbance immunity and reliability for circuit-level applications. The SET and RESET transitions have also been characterized using a dynamic conductivity method, which distinguishes the self-compliance behavior due to an internal series resistance effect (filament) in SiOx-based RS memory. By using a conceptual “filament/resistive gap (GAP)” model of the conductive filament and a proton exchange model with appropriate assumptions, the internal filament resistance and GAP resistance can be estimated for high- and low-resistance states (HRS and LRS), and are found to be independent of external series resistance. Our experimental results not only provide insights into potential reliability issues but also help to clarify the switching mechanisms and device operating characteristics of SiOx-based RS memory.
Harding, Ian H; Yücel, Murat; Harrison, Ben J; Pantelis, Christos; Breakspear, Michael
2015-02-01
Cognitive control and working memory rely upon a common fronto-parietal network that includes the inferior frontal junction (IFJ), dorsolateral prefrontal cortex (dlPFC), pre-supplementary motor area/dorsal anterior cingulate cortex (pSMA/dACC), and intraparietal sulcus (IPS). This network is able to flexibly adapt its function in response to changing behavioral goals, mediating a wide range of cognitive demands. Here we apply dynamic causal modeling to functional magnetic resonance imaging data to characterize task-related alterations in the strength of network interactions across distinct cognitive processes. Evidence in favor of task-related connectivity dynamics was accrued across a very large space of possible network structures. Cognitive control and working memory demands were manipulated using a factorial combination of the multi-source interference task and a verbal 2-back working memory task, respectively. Both were found to alter the sensitivity of the IFJ to perceptual information, and to increase IFJ-to-pSMA/dACC connectivity. In contrast, increased connectivity from the pSMA/dACC to the IPS, as well as from the dlPFC to the IFJ, was uniquely driven by cognitive control demands; a task-induced negative influence of the dlPFC on the pSMA/dACC was specific to working memory demands. These results reflect a system of both shared and unique context-dependent dynamics within the fronto-parietal network. Mechanisms supporting cognitive engagement, response selection, and action evaluation may be shared across cognitive domains, while dynamic updating of task and context representations within this network are potentially specific to changing demands on cognitive control. Copyright © 2014 Elsevier Inc. All rights reserved.
An Adaptive Insertion and Promotion Policy for Partitioned Shared Caches
NASA Astrophysics Data System (ADS)
Mahrom, Norfadila; Liebelt, Michael; Raof, Rafikha Aliana A.; Daud, Shuhaizar; Hafizah Ghazali, Nur
2018-03-01
Cache replacement policies in chip multiprocessors (CMP) have been investigated extensively and proven able to enhance shared cache management. However, competition among multiple processors executing different threads that require simultaneous access to a shared memory may cause cache contention and memory coherence problems on the chip. These issues also arise from a drawback of the commonly used Least Recently Used (LRU) policy employed in multiprocessor systems, namely that cache lines reside in the cache longer than required. In image processing analysis of, for example, extrapulmonary tuberculosis (TB), an accurate diagnosis of tissue specimens is required. Therefore, a fast and reliable shared memory management system is needed to execute algorithms that process vast numbers of specimen images. In this paper, the effects of the cache replacement policy in a partitioned shared cache are investigated. The goal is to quantify whether better performance can be achieved by using less complex replacement strategies. This paper proposes a Middle Insertion 2 Positions Promotion (MI2PP) policy to eliminate cache misses that could adversely affect the access patterns and the throughput of the processors in the system. The policy employs a static predefined insertion point, near distance promotion, and the concept of ownership in the eviction policy to effectively reduce cache thrashing and to avoid resource stealing among the processors.
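The insertion and promotion rules named in the abstract can be sketched for a single cache set as follows. This is a rough illustration only, not the authors' implementation: the class name, the list-based recency stack, and the choice of `len(stack) // 2` as the "middle" are assumptions, and the ownership-aware eviction step is omitted entirely.

```python
class MiddleInsertPromoteSet:
    """One cache set as a recency stack (hypothetical model).

    Index 0 is the most protected position; the last index is the
    eviction candidate. New lines enter at the middle of the stack,
    and a hit moves a line up by a fixed number of positions instead
    of jumping straight to the top as plain LRU would.
    """

    def __init__(self, ways, promote_by=2):
        self.ways = ways
        self.promote_by = promote_by
        self.stack = []

    def access(self, tag):
        """Return True on a hit, False on a miss."""
        if tag in self.stack:
            i = self.stack.index(tag)
            self.stack.pop(i)
            self.stack.insert(max(0, i - self.promote_by), tag)
            return True
        if len(self.stack) == self.ways:
            self.stack.pop()  # evict the line at the bottom of the stack
        self.stack.insert(len(self.stack) // 2, tag)
        return False


cache_set = MiddleInsertPromoteSet(ways=4)
for tag in ["a", "b", "c", "d", "e"]:
    cache_set.access(tag)
hit = cache_set.access("c")
print(hit, cache_set.stack)
```

Because a new line starts in the middle and climbs only two positions per hit, a line that is never reused reaches the eviction position sooner than it would under LRU.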
42 CFR § 512.500 - Sharing arrangements under the EPM.
Code of Federal Regulations, 2010 CFR
2017-10-01
... SERVICES (CONTINUED) HEALTH CARE INFRASTRUCTURE AND MODEL PROGRAMS EPISODE PAYMENT MODEL Financial... participant may enter into a sharing arrangement with an EPM collaborator to make a gainsharing payment, or to receive an alignment payment, or both. An EPM participant must not make a gainsharing payment or receive...
Hierarchical Traces for Reduced NSM Memory Requirements
NASA Astrophysics Data System (ADS)
Dahl, Torbjørn S.
This paper presents work on using hierarchical long term memory to reduce the memory requirements of nearest sequence memory (NSM) learning, a previously published, instance-based reinforcement learning algorithm. A hierarchical memory representation reduces the memory requirements by allowing traces to share common sub-sequences. We present moderated mechanisms for estimating discounted future rewards and for dealing with hidden state using hierarchical memory. We also present an experimental analysis of how the sub-sequence length affects the memory compression achieved and show that the reduced memory requirements do not affect the speed of learning. Finally, we analyse and discuss the persistence of the sub-sequences independent of specific trace instances.
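The memory saving from letting traces share common sub-sequences can be illustrated with a small sketch. It is not the NSM algorithm or the paper's data structure: the class `ChunkedTraceStore`, the fixed chunk length, and the step labels are hypothetical, and reward estimation is left out.

```python
class ChunkedTraceStore:
    """Store traces as lists of ids of shared fixed-length chunks."""

    def __init__(self, chunk_len):
        self.chunk_len = chunk_len
        self.chunk_ids = {}   # chunk tuple -> id
        self.chunk_list = []  # id -> chunk tuple
        self.traces = []      # each trace is a list of chunk ids

    def add_trace(self, steps):
        ids = []
        for i in range(0, len(steps), self.chunk_len):
            chunk = tuple(steps[i:i + self.chunk_len])
            if chunk not in self.chunk_ids:
                self.chunk_ids[chunk] = len(self.chunk_list)
                self.chunk_list.append(chunk)
            ids.append(self.chunk_ids[chunk])
        self.traces.append(ids)
        return len(self.traces) - 1

    def get_trace(self, index):
        """Rebuild the full step sequence of a stored trace."""
        return [s for cid in self.traces[index] for s in self.chunk_list[cid]]

    def stored_steps(self):
        """Number of steps actually held in memory across all chunks."""
        return sum(len(c) for c in self.chunk_list)


store = ChunkedTraceStore(chunk_len=2)
t0 = store.add_trace(["s1", "s2", "s3", "s4"])
t1 = store.add_trace(["s1", "s2", "s5", "s6"])
print(store.stored_steps(), "steps stored for 8 steps of raw trace")
```

The chunk length plays the role of the sub-sequence length discussed in the abstract: shorter chunks are shared more often but need more ids per trace.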
NASA Technical Reports Server (NTRS)
1981-01-01
User requirements, guidelines, and standards for interconnecting an Applications Data Service (ADS) program for data sharing are discussed. Methods for effective sharing of information (catalogues, directories, and dictionaries) among member installations are addressed. An ADS Directory/Catalog architectural model is also given.
The impact of storage on processing: how is information maintained in working memory?
Vergauwe, Evie; Camos, Valérie; Barrouillet, Pierre
2014-07-01
Working memory is typically defined as a system devoted to the simultaneous maintenance and processing of information. However, the interplay between these 2 functions is still a matter of debate in the literature, with views ranging from complete independence to complete dependence. The time-based resource-sharing model assumes that a central bottleneck constrains the 2 functions to alternate in such a way that maintenance activities postpone concurrent processing, with each additional piece of information to be maintained resulting in an additional postponement. Using different kinds of memoranda, we examined in a series of 7 experiments the effect of increasing memory load on different processing tasks. The results reveal that, insofar as attention is needed for maintenance, processing times linearly increase at a rate of about 50 ms per verbal or visuospatial memory item, suggesting a very fast refresh rate in working memory. Our results also show an asymmetry between verbal and spatial information, in that spatial information can solely rely on attention for its maintenance while verbal information can also rely on a domain-specific maintenance mechanism independent from attention. The implications for the functioning of working memory are discussed, with a specific focus on how information is maintained in working memory. PsycINFO Database Record (c) 2014 APA, all rights reserved.
The Contribution of Working Memory to Fluid Reasoning: Capacity, Control, or Both?
ERIC Educational Resources Information Center
Chuderski, Adam; Necka, Edward
2012-01-01
Fluid reasoning shares a large part of its variance with working memory capacity (WMC). The literature on working memory (WM) suggests that the capacity of the focus of attention responsible for simultaneous maintenance and integration of information within WM, as well as the effectiveness of executive control exerted over WM, determines…
ERIC Educational Resources Information Center
Olivers, Christian N. L.; Meijer, Frank; Theeuwes, Jan
2006-01-01
In 7 experiments, the authors explored whether visual attention (the ability to select relevant visual information) and visual working memory (the ability to retain relevant visual information) share the same content representations. The presence of singleton distractors interfered more strongly with a visual search task when it was accompanied by…
Cache write generate for parallel image processing on shared memory architectures.
Wittenbrink, C M; Somani, A K; Chen, C H
1996-01-01
We investigate cache write generate, our cache mode invention. We demonstrate that for parallel image processing applications, the new mode improves main memory bandwidth, CPU efficiency, cache hits, and cache latency. We use register level simulations validated by the UW-Proteus system. Many memory, cache, and processor configurations are evaluated.
Mnemonic convergence in social networks: The emergent properties of cognition at a collective level
Coman, Alin; Momennejad, Ida; Drach, Rae D.; Geana, Andra
2016-01-01
The development of shared memories, beliefs, and norms is a fundamental characteristic of human communities. These emergent outcomes are thought to occur owing to a dynamic system of information sharing and memory updating, which fundamentally depends on communication. Here we report results on the formation of collective memories in laboratory-created communities. We manipulated conversational network structure in a series of real-time, computer-mediated interactions in fourteen 10-member communities. The results show that mnemonic convergence, measured as the degree of overlap among community members’ memories, is influenced by both individual-level information-processing phenomena and by the conversational social network structure created during conversational recall. By studying laboratory-created social networks, we show how large-scale social phenomena (i.e., collective memory) can emerge out of microlevel local dynamics (i.e., mnemonic reinforcement and suppression effects). The social-interactionist approach proposed herein points to optimal strategies for spreading information in social networks and provides a framework for measuring and forging collective memories in communities of individuals. PMID:27357678
A sample implementation for parallelizing Divide-and-Conquer algorithms on the GPU.
Mei, Gang; Zhang, Jiayin; Xu, Nengxiong; Zhao, Kunyang
2018-01-01
The strategy of Divide-and-Conquer (D&C) is one of the frequently used programming patterns to design efficient algorithms in computer science, which has been parallelized on shared memory systems and distributed memory systems. Tzeng and Owens specifically developed a generic paradigm for parallelizing D&C algorithms on modern Graphics Processing Units (GPUs). In this paper, by following the generic paradigm proposed by Tzeng and Owens, we provide a new and publicly available GPU implementation of the famous D&C algorithm, QuickHull, to give a sample and guide for parallelizing D&C algorithms on the GPU. The experimental results demonstrate the practicality of our sample GPU implementation. Our research objective in this paper is to present a sample GPU implementation of a classical D&C algorithm to help interested readers to develop their own efficient GPU implementations with less effort.
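For readers unfamiliar with QuickHull, the divide-and-conquer structure can be seen in a short sequential sketch. This is a plain CPU version written for illustration, not the authors' GPU implementation and not the Tzeng and Owens paradigm; the function names are assumptions.

```python
def cross(o, a, b):
    """Twice the signed area of triangle o, a, b (positive if b is left of o->a)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def _hull_side(points, a, b):
    """Hull vertices strictly left of the directed line a->b, ordered from a to b."""
    left = [p for p in points if cross(a, b, p) > 0]
    if not left:
        return []
    far = max(left, key=lambda p: cross(a, b, p))  # farthest point from line a->b
    # Divide: the two sub-problems on either side of the farthest point
    # are independent, which is what makes the algorithm parallelizable.
    return _hull_side(left, a, far) + [far] + _hull_side(left, far, b)


def quickhull(points):
    """Return the convex hull vertices in clockwise order (2D, tuples of numbers)."""
    pts = sorted(set(points))
    if len(pts) < 3:
        return pts
    a, b = pts[0], pts[-1]  # leftmost and rightmost points are always on the hull
    return [a] + _hull_side(pts, a, b) + [b] + _hull_side(pts, b, a)


square_with_extras = [(0, 0), (2, 0), (2, 2), (0, 2), (1, 1), (1, 0)]
print(quickhull(square_with_extras))
```

Collinear boundary points such as `(1, 0)` are dropped because only points strictly to the left of each dividing line are kept.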
Importance of balanced architectures in the design of high-performance imaging systems
NASA Astrophysics Data System (ADS)
Sgro, Joseph A.; Stanton, Paul C.
1999-03-01
Imaging systems employed in demanding military and industrial applications, such as automatic target recognition and computer vision, typically require real-time high-performance computing resources. While high-performance computing systems have traditionally relied on proprietary architectures and custom components, recent advances in high-performance general-purpose microprocessor technology have produced an abundance of low-cost components suitable for use in high-performance computing systems. A common pitfall in the design of high-performance imaging systems, particularly systems employing scalable multiprocessor architectures, is the failure to balance computational and memory bandwidth. The performance of standard cluster designs, for example, in which several processors share a common memory bus, is typically constrained by memory bandwidth. The characteristic symptom of this problem is the failure of system performance to scale as more processors are added. The problem is exacerbated if I/O and memory functions share the same bus. The recent introduction of microprocessors with large internal caches and high-performance external memory interfaces makes it practical to design high-performance imaging systems with balanced computational and memory bandwidth. Real-world examples of such designs will be presented, along with a discussion of adapting algorithm design to best utilize available memory bandwidth.
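The scaling failure described above can be made concrete with a simple bound: sustained throughput is the smaller of the aggregate compute rate and the rate at which the shared bus can feed operands. The sketch below is a back-of-the-envelope model, not taken from the paper, and every number in it is hypothetical.

```python
def sustained_throughput(n_procs, peak_ops_per_proc, bus_bytes_per_s, bytes_per_op):
    """Upper bound on operations per second for processors sharing one memory bus."""
    compute_bound = n_procs * peak_ops_per_proc
    memory_bound = bus_bytes_per_s / bytes_per_op
    return min(compute_bound, memory_bound)


# Hypothetical figures: 100 Mop/s per processor, an 800 MB/s shared bus,
# and 4 bytes of memory traffic per operation.
for n in (1, 2, 4, 8):
    rate = sustained_throughput(n, 100e6, 800e6, 4)
    print(n, "processors:", rate / 1e6, "Mop/s")
```

Under these assumed figures the bus saturates at two processors, so adding a third or fourth gains nothing. Larger caches help by lowering the effective bytes of bus traffic per operation.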
OPSO - The OpenGL based Field Acquisition and Telescope Guiding System
NASA Astrophysics Data System (ADS)
Škoda, P.; Fuchs, J.; Honsa, J.
2006-07-01
We present OPSO, a modular pointing and auto-guiding system for the coudé spectrograph of the Ondřejov observatory 2m telescope. The current field and slit viewing CCD cameras with image intensifiers give only standard TV video output. To allow the acquisition and guiding of very faint targets, we have designed an image enhancing system working in real time on TV frames grabbed by a BT878-based video capture card. Its basic capabilities include the sliding averaging of hundreds of frames with bad pixel masking and removal of outliers, display of the median of a set of frames, quick zooming, contrast and brightness adjustment, plotting of horizontal and vertical cross cuts of the seeing disk within a given intensity range, and many more. From the programmer's point of view, the system consists of three tasks running in parallel on a Linux PC. One C task controls the video capturing over the Video for Linux (v4l2) interface and feeds the frames into a large block of shared memory, where the core image processing is done by another C program calling the OpenGL library. The GUI, however, is dynamically built in Python from an XML description of widgets prepared in Glade. All tasks exchange information by IPC calls using the shared memory segments.
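The sliding averaging of frames with bad pixel masking can be sketched in a few lines. This is only an illustration in plain Python, not the OPSO code (which is described as C with OpenGL): the class name, the list-of-lists frame format, and the choice to zero masked pixels are assumptions, and outlier removal is omitted.

```python
from collections import deque


class SlidingFrameAverage:
    """Average the most recent `window` frames, zeroing known bad pixels."""

    def __init__(self, window, bad_pixels=()):
        self.frames = deque(maxlen=window)  # oldest frame drops out automatically
        self.bad_pixels = set(bad_pixels)   # (row, col) positions to mask

    def add(self, frame):
        self.frames.append(frame)

    def average(self):
        if not self.frames:
            raise ValueError("no frames added yet")
        rows, cols = len(self.frames[0]), len(self.frames[0][0])
        n = len(self.frames)
        out = [[sum(f[r][c] for f in self.frames) / n for c in range(cols)]
               for r in range(rows)]
        for r, c in self.bad_pixels:
            out[r][c] = 0.0  # assumed masking rule: bad pixels are forced to zero
        return out


averager = SlidingFrameAverage(window=2, bad_pixels=[(0, 1)])
for frame in ([[1, 100], [3, 5]], [[3, 100], [5, 7]], [[5, 100], [7, 9]]):
    averager.add(frame)
result = averager.average()
print(result)
```

Averaging N frames reduces uncorrelated noise by roughly a factor of the square root of N, which is why averaging hundreds of frames helps with very faint targets.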
Modeling the glass transition of amorphous networks for shape-memory behavior
NASA Astrophysics Data System (ADS)
Xiao, Rui; Choi, Jinwoo; Lakhera, Nishant; Yakacki, Christopher M.; Frick, Carl P.; Nguyen, Thao D.
2013-07-01
In this paper, a thermomechanical constitutive model was developed for the time-dependent behaviors of the glass transition of amorphous networks. The model used multiple discrete relaxation processes to describe the distribution of relaxation times for stress relaxation, structural relaxation, and stress-activated viscous flow. A non-equilibrium thermodynamic framework based on the fictive temperature was introduced to demonstrate the thermodynamic consistency of the constitutive theory. Experimental and theoretical methods were developed to determine the parameters describing the distribution of stress and structural relaxation times and the dependence of the relaxation times on temperature, structure, and driving stress. The model was applied to study the effects of deformation temperatures and physical aging on the shape-memory behavior of amorphous networks. The model was able to reproduce important features of the partially constrained recovery response observed in experiments. Specifically, the model demonstrated a strain-recovery overshoot for cases programmed below Tg and subjected to a constant mechanical load. This phenomenon was not observed for materials programmed above Tg. Physical aging, in which the material was annealed for an extended period of time below Tg, shifted the activation of strain recovery to higher temperatures and increased significantly the initial recovery rate. For fixed-strain recovery, the model showed a larger overshoot in the stress response for cases programmed below Tg, which was consistent with previous experimental observations. Altogether, this work demonstrates how an understanding of the time-dependent behaviors of the glass transition can be used to tailor the temperature and deformation history of the shape-memory programming process to achieve more complex shape recovery pathways, faster recovery responses, and larger activation stresses.
Widmer, Yves F; Bilican, Adem; Bruggmann, Rémy; Sprecher, Simon G
2018-06-20
Memory formation is achieved by genetically tightly controlled molecular pathways that result in a change of synaptic strength and synapse organization. While for short-term memory traces rapidly acting biochemical pathways are in place, the formation of long-lasting memories requires changes in the transcriptional program of a cell. Although many genes involved in learning and memory formation have been identified, little is known about the genetic mechanisms required for changing the transcriptional program during different phases of long-term memory formation. With Drosophila melanogaster as a model system we profiled transcriptomic changes in the mushroom body, a memory center in the fly brain, at distinct time intervals during appetitive olfactory long-term memory formation using the targeted DamID technique. We describe the gene expression profiles during these phases and tested 33 selected candidate genes for deficits in long-term memory formation using RNAi knockdown. We identified 10 genes that enhance or decrease memory when knocked down in the mushroom body. For vajk-1 and hacd1, the two strongest hits, we gained further support for their crucial role in appetitive learning and forgetting. These findings show that profiling gene expression changes in specific cell types harboring memory traces provides a powerful entry point to identify new genes involved in learning and memory. The presented transcriptomic data may further be used as a resource to study genes acting at different memory phases. Copyright © 2018, Genetics.
Nakahara, Kiyoshi; Adachi, Ken; Kawasaki, Keisuke; Matsuo, Takeshi; Sawahata, Hirohito; Majima, Kei; Takeda, Masaki; Sugiyama, Sayaka; Nakata, Ryota; Iijima, Atsuhiko; Tanigawa, Hisashi; Suzuki, Takafumi; Kamitani, Yukiyasu; Hasegawa, Isao
2016-01-01
Highly localized neuronal spikes in primate temporal cortex can encode associative memory; however, whether memory formation involves area-wide reorganization of ensemble activity, which often accompanies rhythmicity, or just local microcircuit-level plasticity, remains elusive. Using high-density electrocorticography, we capture local-field potentials spanning the monkey temporal lobes, and show that the visual pair-association (PA) memory is encoded in spatial patterns of theta activity in areas TE, 36, and, partially, in the parahippocampal cortex, but not in the entorhinal cortex. The theta patterns elicited by learned paired associates are distinct between pairs, but similar within pairs. This pattern similarity, emerging through novel PA learning, allows a machine-learning decoder trained on theta patterns elicited by a particular visual item to correctly predict the identity of those elicited by its paired associate. Our results suggest that the formation and sharing of widespread cortical theta patterns via learning-induced reorganization are involved in the mechanisms of associative memory representation. PMID:27282247
Spiegel, M A; Koester, D; Weigelt, M; Schack, T
2012-02-16
How much cognitive effort does it take to change a movement plan? In previous studies, it has been shown that humans plan and represent actions in advance, but it remains unclear whether or not action planning and verbal working memory share cognitive resources. Using a novel experimental paradigm, we combined in two experiments a grasp-to-place task with a verbal working memory task. Participants planned a placing movement toward one of two target positions and subsequently encoded and maintained visually presented letters. Both experiments revealed that re-planning the intended action reduced letter recall performance; execution time, however, was not influenced by action modifications. The results of Experiment 2 suggest that the action's interference with verbal working memory arose during the planning rather than the execution phase of the movement. Together, our results strongly suggest that movement planning and verbal working memory share common cognitive resources. Copyright © 2011 Elsevier Ireland Ltd. All rights reserved.
Organ donation and transplantation-the Chennai experience in India.
Shroff, S; Rao, S; Kurian, G; Suresh, S
2007-04-01
Tamil Nadu has been at the forefront of medical care in the country. It was the first state in the country to start a living kidney transplant program. It was also the first state to successfully start a cadaver programme after the passing of the "Transplantation of Human Organ Act" of 1994, and in the last 5 years it has formed a network between hospitals for organ sharing. From 2000 to 2006 an organ sharing network was started in Tamil Nadu, and the facilitator of this programme has been a non-government organization called MOHAN (acronym for Multi Organ Harvesting Aid Network) Foundation. Over 460 organs were shared during the period in two regions (Tamil Nadu and Hyderabad). In Tamil Nadu the shared organs have included 166 kidneys, 24 livers, 6 hearts, and 180 eyes. In 2003 a sharing network was initiated by MOHAN in Hyderabad, where the Tamil Nadu model was duplicated with some success, and 96 cadaver organs have been transplanted in the last 3 years. There are many advantages of organ sharing, including the cost economics. At present there is a large pool of brain dead patients who could become potential organ donors in the major cities in India. Their organs are not being utilized for various logistical reasons. A multi-pronged strategy is required for the long term success of this program. These years in Tamil Nadu have been the years of learning, un-learning and relearning, and the program today has matured slowly into what can perhaps evolve into an Indian model. In all these years there have been various difficulties in its implementation, and one of the key elements for the success of the program is the need to educate our own medical fraternity and seek their cooperation. The program requires trained counselors to be able to work in the intensive care units. The government's support is pivotal if this program is to provide benefit to the common man.
MOHAN Foundation has accumulated considerable experience to be able to evolve a model to take this program to the national level and more so as it recently has been granted 100% tax exemption on all donations to form a countrywide network for organ sharing.
Cycle accurate and cycle reproducible memory for an FPGA based hardware accelerator
Asaad, Sameh W.; Kapur, Mohit
2016-03-15
A method, system and computer program product are disclosed for using a Field Programmable Gate Array (FPGA) to simulate operations of a device under test (DUT). The DUT includes a device memory having a first number of input ports, and the FPGA is associated with a target memory having a second number of input ports, the second number being less than the first number. In one embodiment, a given set of inputs is applied to the device memory at a frequency Fd and in a defined cycle of time, and the given set of inputs is applied to the target memory at a frequency Ft. Ft is greater than Fd and cycle accuracy is maintained between the device memory and the target memory. In an embodiment, a cycle accurate model of the DUT memory is created by separating the DUT memory interface protocol from the target memory storage array.
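One way to read the relation between Fd and Ft is as time multiplexing: if the target memory has fewer ports, the accesses of one device cycle are served over several faster target cycles. The sketch below works through that arithmetic. The abstract states only that Ft is greater than Fd, so the multiplexing scheme, the function names, and the example numbers are assumptions, not the disclosed method.

```python
import math


def min_target_frequency(device_ports, target_ports, fd_hz):
    """Lowest Ft that lets every device port be served once per device cycle."""
    slots_per_device_cycle = math.ceil(device_ports / target_ports)
    return slots_per_device_cycle * fd_hz


def schedule_ports(device_ports, target_ports):
    """Map each device port to a (time slot, target port) pair within one device cycle."""
    return {p: (p // target_ports, p % target_ports) for p in range(device_ports)}


# Hypothetical example: a 4-port device memory emulated on a 1-port
# target memory, with the device clocked at 100 MHz.
print(min_target_frequency(4, 1, 100_000_000))
print(schedule_ports(3, 2))
```

Because all accesses of a device cycle complete before the next device cycle begins, the device sees the same per-cycle behavior as with a true multi-port memory.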
ERIC Educational Resources Information Center
Chindgren, Tina M.
2005-01-01
The communities of practice model for knowledge sharing is examined in this conceptual paper. Key themes reflected in the literature--the linkage between knowledge and activity and the importance of relationships--are explored within the context of programs and practices within the National Aeronautics and Space Administration (NASA) learning…
Synapsin Determines Memory Strength after Punishment- and Relief-Learning
Niewalda, Thomas; Michels, Birgit; Jungnickel, Roswitha; Diegelmann, Sören; Kleber, Jörg; Kähne, Thilo
2015-01-01
Adverse life events can induce two kinds of memory with opposite valence, dependent on timing: “negative” memories for stimuli preceding them and “positive” memories for stimuli experienced at the moment of “relief.” Such punishment memory and relief memory are found in insects, rats, and man. For example, fruit flies (Drosophila melanogaster) avoid an odor after odor-shock training (“forward conditioning” of the odor), whereas after shock-odor training (“backward conditioning” of the odor) they approach it. Do these timing-dependent associative processes share molecular determinants? We focus on the role of Synapsin, a conserved presynaptic phosphoprotein regulating the balance between the reserve pool and the readily releasable pool of synaptic vesicles. We find that a lack of Synapsin leaves task-relevant sensory and motor faculties unaffected. In contrast, both punishment memory and relief memory scores are reduced. These defects reflect a true lessening of associative memory strength, as distortions in nonassociative processing (e.g., susceptibility to handling, adaptation, habituation, sensitization), discrimination ability, and changes in the time course of coincidence detection can be ruled out as alternative explanations. Reductions in punishment- and relief-memory strength are also observed upon an RNAi-mediated knock-down of Synapsin, and are rescued both by acutely restoring Synapsin and by locally restoring it in the mushroom bodies of mutant flies. Thus, both punishment memory and relief memory require the Synapsin protein and in this sense share genetic and molecular determinants. We note that corresponding molecular commonalities between punishment memory and relief memory in humans would constrain pharmacological attempts to selectively interfere with excessive associative punishment memories, e.g., after traumatic experiences. PMID:25972175
Synapsin determines memory strength after punishment- and relief-learning.
Niewalda, Thomas; Michels, Birgit; Jungnickel, Roswitha; Diegelmann, Sören; Kleber, Jörg; Kähne, Thilo; Gerber, Bertram
2015-05-13
Adverse life events can induce two kinds of memory with opposite valence, dependent on timing: "negative" memories for stimuli preceding them and "positive" memories for stimuli experienced at the moment of "relief." Such punishment memory and relief memory are found in insects, rats, and man. For example, fruit flies (Drosophila melanogaster) avoid an odor after odor-shock training ("forward conditioning" of the odor), whereas after shock-odor training ("backward conditioning" of the odor) they approach it. Do these timing-dependent associative processes share molecular determinants? We focus on the role of Synapsin, a conserved presynaptic phosphoprotein regulating the balance between the reserve pool and the readily releasable pool of synaptic vesicles. We find that a lack of Synapsin leaves task-relevant sensory and motor faculties unaffected. In contrast, both punishment memory and relief memory scores are reduced. These defects reflect a true lessening of associative memory strength, as distortions in nonassociative processing (e.g., susceptibility to handling, adaptation, habituation, sensitization), discrimination ability, and changes in the time course of coincidence detection can be ruled out as alternative explanations. Reductions in punishment- and relief-memory strength are also observed upon an RNAi-mediated knock-down of Synapsin, and are rescued both by acutely restoring Synapsin and by locally restoring it in the mushroom bodies of mutant flies. Thus, both punishment memory and relief memory require the Synapsin protein and in this sense share genetic and molecular determinants. We note that corresponding molecular commonalities between punishment memory and relief memory in humans would constrain pharmacological attempts to selectively interfere with excessive associative punishment memories, e.g., after traumatic experiences. Copyright © 2015 Niewalda et al.
Audience tuning effects in the context of situated and embodied processes.
Semin, Gün R
2018-03-05
This review provides an overview of the research on communication and the 'Saying is Believing' paradigm in the context of different perspectives on communication. The process of 'audience tuning' is shaped by a variety of situated factors in contexts that affect the communicators' confidence in their message. The overwhelming common denominator is that the combination of features that create ambiguity yields the optimal condition for the formation of shared realities. I conclude with an argument that the implied invariance of memory processes in shared reality work needs to be more attentive to the regulatory function of memories driving the expression of shared realities. Copyright © 2018 Elsevier Ltd. All rights reserved.
NASA Astrophysics Data System (ADS)
Kajiyama, Shinya; Fujito, Masamichi; Kasai, Hideo; Mizuno, Makoto; Yamaguchi, Takanori; Shinagawa, Yutaka
A novel 300MHz embedded flash memory for dual-core microcontrollers with a shared ROM architecture is proposed. One of its features is a three-stage pipeline read operation, which enables reduced access pitch and therefore reduces performance penalty due to conflict of shared ROM accesses. Another feature is a highly sensitive sense amplifier that achieves efficient pipeline operation with two-cycle latency one-cycle pitch as a result of a shortened sense time of 0.63ns. The combination of the pipeline architecture and proposed sense amplifiers significantly reduces access-conflict penalties with shared ROM and enhances performance of 32-bit RISC dual-core microcontrollers by 30%.
CELLFS: TAKING THE "DMA" OUT OF CELL PROGRAMMING
DOE Office of Scientific and Technical Information (OSTI.GOV)
IONKOV, LATCHESAR A.; MIRTCHOVSKI, ANDREY A.; NYRHINEN, AKI M.
In this paper we present a new programming model for the Cell BE architecture of scalar multiprocessors. We call this programming model CellFS. CellFS aims at simplifying the task of managing I/O between the local store of the processing units and main memory. The CellFS support library provides the means for transferring data via simple file I/O operations between the PPU and the SPU.
Automatic selection of dynamic data partitioning schemes for distributed memory multicomputers
NASA Technical Reports Server (NTRS)
Palermo, Daniel J.; Banerjee, Prithviraj
1995-01-01
For distributed memory multicomputers such as the Intel Paragon, the IBM SP-2, the NCUBE/2, and the Thinking Machines CM-5, the quality of the data partitioning for a given application is crucial to obtaining high performance. This task has traditionally been the user's responsibility, but in recent years much effort has been directed to automating the selection of data partitioning schemes. Several researchers have proposed systems that are able to produce data distributions that remain in effect for the entire execution of an application. For complex programs, however, such static data distributions may be insufficient to obtain acceptable performance. The selection of distributions that dynamically change over the course of a program's execution adds another dimension to the data partitioning problem. In this paper, we present a technique that can be used to automatically determine which partitionings are most beneficial over specific sections of a program while taking into account the added overhead of performing redistribution. This system is being built as part of the PARADIGM (PARAllelizing compiler for DIstributed memory General-purpose Multicomputers) project at the University of Illinois. The complete system will provide a fully automated means to parallelize programs written in a serial programming model obtaining high performance on a wide range of distributed-memory multicomputers.
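The trade-off described in the abstract above (a better partitioning per program section versus the overhead of redistributing data between sections) can be illustrated with a small dynamic program. This is only a sketch under stated assumptions, not the PARADIGM algorithm: the function name, the cost tables, and the two-scheme example are all hypothetical, and a real compiler would estimate the costs from the program rather than take them as input.

```python
def choose_partitionings(exec_cost, redist_cost):
    """Pick one partitioning scheme per program phase (hypothetical sketch).

    exec_cost[p][s]   : estimated cost of running phase p under scheme s
    redist_cost[a][b] : estimated cost of redistributing data from scheme a
                        to scheme b (0 when a == b)
    Returns (total_cost, plan), where plan[p] is the scheme chosen for phase p.
    """
    n_schemes = len(exec_cost[0])
    best = list(exec_cost[0])   # best[s]: cheapest cost so far, ending in scheme s
    back = []                   # back[p-1][s]: scheme used in the previous phase

    for phase_costs in exec_cost[1:]:
        new_best, choices = [], []
        for s in range(n_schemes):
            # Cheapest predecessor scheme, including the redistribution overhead.
            prev = min(range(n_schemes), key=lambda a: best[a] + redist_cost[a][s])
            new_best.append(best[prev] + redist_cost[prev][s] + phase_costs[s])
            choices.append(prev)
        best = new_best
        back.append(choices)

    # Walk the back-pointers from the cheapest final scheme to recover the plan.
    last = min(range(n_schemes), key=lambda s: best[s])
    plan = [last]
    for choices in reversed(back):
        plan.append(choices[plan[-1]])
    plan.reverse()
    return best[last], plan


# Made-up costs: three phases, two schemes (0 = by rows, 1 = by columns).
phase_costs = [[10, 40], [50, 12], [11, 45]]
cheap_cost, cheap_plan = choose_partitionings(phase_costs, [[0, 5], [5, 0]])
dear_cost, dear_plan = choose_partitionings(phase_costs, [[0, 30], [30, 0]])
```

With cheap redistribution the sketch switches schemes between phases; when redistribution is expensive it falls back to a single static distribution.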
DOE Office of Scientific and Technical Information (OSTI.GOV)
Venkata, Manjunath Gorentla; Aderholdt, William F
The pre-exascale systems are expected to have a significant amount of hierarchical and heterogeneous on-node memory, and this trend of system architecture in extreme-scale systems is expected to continue into the exascale era. Along with hierarchical-heterogeneous memory, the system typically has a high-performing network and a compute accelerator. This system architecture is not only effective for running traditional High Performance Computing (HPC) applications (Big-Compute), but also for running data-intensive HPC applications and Big-Data applications. As a consequence, there is a growing desire to have a single system serve the needs of both Big-Compute and Big-Data applications. Though the system architecture supports the convergence of the Big-Compute and Big-Data, the programming models and software layer have yet to evolve to support either hierarchical-heterogeneous memory systems or the convergence. A programming abstraction to address this problem. The programming abstraction is implemented as a software library and runs on pre-exascale and exascale systems supporting current and emerging system architecture. Using distributed data-structures as a central concept, it provides (1) a simple, usable, and portable abstraction for hierarchical-heterogeneous memory and (2) a unified programming abstraction for Big-Compute and Big-Data applications.
Adapting cultural mixture modeling for continuous measures of knowledge and memory fluency.
Tan, Yin-Yin Sarah; Mueller, Shane T
2016-09-01
Previous research (e.g., cultural consensus theory (Romney, Weller, & Batchelder, American Anthropologist, 88, 313-338, 1986); cultural mixture modeling (Mueller & Veinott, 2008)) has used overt response patterns (i.e., responses to questionnaires and surveys) to identify whether a group shares a single coherent attitude or belief set. Yet many domains in social science have focused on implicit attitudes that are not apparent in overt responses but still may be detected via response time patterns. We propose a method for modeling response times as a mixture of Gaussians, adapting the strong-consensus model of cultural mixture modeling to model this implicit measure of knowledge strength. We report the results of two behavioral experiments and one simulation experiment that establish the usefulness of the approach, as well as some of the boundary conditions under which distinct groups of shared agreement might be recovered, even when the group identity is not known. The results reveal that the ability to recover and identify shared-belief groups depends on (1) the level of noise in the measurement, (2) the differential signals for strong versus weak attitudes, and (3) the similarity between group attitudes. Consequently, the method shows promise for identifying latent groups among a population whose overt attitudes do not differ, but whose implicit or covert attitudes or knowledge may differ.
Cox, Gregory E; Hemmer, Pernille; Aue, William R; Criss, Amy H
2018-04-01
The development of memory theory has been constrained by a focus on isolated tasks rather than the processes and information that are common to situations in which memory is engaged. We present results from a study in which 453 participants took part in five different memory tasks: single-item recognition, associative recognition, cued recall, free recall, and lexical decision. Using hierarchical Bayesian techniques, we jointly analyzed the correlations between tasks within individuals (reflecting the degree to which tasks rely on shared cognitive processes) and within items (reflecting the degree to which tasks rely on the same information conveyed by the item). Among other things, we find that (a) the processes involved in lexical access and episodic memory are largely separate and rely on different kinds of information, (b) access to lexical memory is driven primarily by perceptual aspects of a word, (c) all episodic memory tasks rely to an extent on a set of shared processes which make use of semantic features to encode both single words and associations between words, and (d) recall involves additional processes likely related to contextual cuing and response production. These results provide a large-scale picture of memory across different tasks which can serve to drive the development of comprehensive theories of memory. (PsycINFO Database Record (c) 2018 APA, all rights reserved).
Review: Leon N. Cooper’s Science and Human Experience: Values, Culture, and the Mind
Lynch, Gary S.
2015-01-01
Why are we reviewing a book written by someone who shared in the 1972 Nobel Prize in Physics for work on superconductivity? Because shortly after winning the prize, Leon N. Cooper transitioned into brain research—specifically, the biological basis of memory. He became director of the Brown University Institute for Brain and Neural Systems, whose interdisciplinary program allowed him to integrate research on the brain, physics, and even philosophy. His new book tackles a diverse spectrum of topics and questions, including these: Does science have limits? Where does order come from? Can we understand consciousness? PMID:27358665
Parallel Gaussian elimination of a block tridiagonal matrix using multiple microcomputers
NASA Technical Reports Server (NTRS)
Blech, Richard A.
1989-01-01
The solution of a block tridiagonal matrix using parallel processing is demonstrated. The multiprocessor system on which results were obtained and the software environment used to program that system are described. Theoretical partitioning and resource allocation for the Gaussian elimination method used to solve the matrix are discussed. The results obtained from running 1, 2 and 3 processor versions of the block tridiagonal solver are presented. The PASCAL source code for these solvers is given in the appendix, and may be transportable to other shared memory parallel processors provided that the synchronization outlines are reproduced on the target system.
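As a point of reference for the elimination scheme mentioned in the abstract above, the following is a minimal serial sketch of Gaussian elimination on a tridiagonal system (the Thomas algorithm) for the scalar case, i.e. 1x1 blocks. It is an illustration only, not the PASCAL code from the report: the function name and argument layout are assumptions, no pivoting is performed, and in the block version each division would become a solve with a diagonal block. The parallel partitioning discussed in the report is not reproduced here.

```python
def solve_tridiagonal(lower, diag, upper, rhs):
    """Solve a tridiagonal system by forward elimination and back substitution.

    Row i of the system reads
        lower[i] * x[i-1] + diag[i] * x[i] + upper[i] * x[i+1] = rhs[i]
    lower[0] and upper[-1] are ignored. No pivoting is done, so the matrix
    is assumed to be well conditioned (e.g. diagonally dominant).
    """
    n = len(diag)
    c = [0.0] * n   # modified super-diagonal
    d = [0.0] * n   # modified right-hand side

    # Forward elimination: remove the sub-diagonal entry of each row.
    c[0] = upper[0] / diag[0]
    d[0] = rhs[0] / diag[0]
    for i in range(1, n):
        denom = diag[i] - lower[i] * c[i - 1]
        c[i] = upper[i] / denom if i < n - 1 else 0.0
        d[i] = (rhs[i] - lower[i] * d[i - 1]) / denom

    # Back substitution, starting from the last unknown.
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x


# Small check: the matrix [[2,-1,0],[-1,2,-1],[0,-1,2]] times [1,2,3] is [0,0,4].
solution = solve_tridiagonal([0.0, -1.0, -1.0], [2.0, 2.0, 2.0],
                             [-1.0, -1.0, 0.0], [0.0, 0.0, 4.0])
```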
Test program for 4-K memory card, JOLT microprocessor
NASA Technical Reports Server (NTRS)
Lilley, R. W.
1976-01-01
A memory test program is described for use with the JOLT microcomputer 4,096-word memory board used in development of an Omega navigation receiver. The program allows a quick test of the memory board by cycling the memory through all possible bit combinations in all words.
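The test strategy described above (cycling every word through every bit combination) can be sketched as follows. This is a simulation under stated assumptions rather than the original program: the 8-bit word width, the function name, and the faulty-board class are all hypothetical, and a plain Python `bytearray` stands in for the memory board.

```python
def check_memory(memory, word_bits=8):
    """Write every bit pattern to every word and read it back.

    `memory` is any indexable object with a length; `word_bits` is an
    assumed word width. Returns a list of (address, expected, actual)
    tuples, one per mismatch.
    """
    failures = []
    for pattern in range(1 << word_bits):
        # Fill all words with the current pattern...
        for addr in range(len(memory)):
            memory[addr] = pattern
        # ...then verify that every word reads back unchanged.
        for addr in range(len(memory)):
            actual = memory[addr]
            if actual != pattern:
                failures.append((addr, pattern, actual))
    return failures


class StuckBitMemory:
    """Hypothetical faulty board: one bit of one word is stuck at 0."""

    def __init__(self, size, bad_addr, bad_bit):
        self._cells = [0] * size
        self._bad_addr = bad_addr
        self._mask = ~(1 << bad_bit)

    def __len__(self):
        return len(self._cells)

    def __getitem__(self, addr):
        return self._cells[addr]

    def __setitem__(self, addr, value):
        if addr == self._bad_addr:
            value &= self._mask   # the stuck bit never stores a 1
        self._cells[addr] = value


good_failures = check_memory(bytearray(4096))   # healthy 4,096-word memory
bad_failures = check_memory(StuckBitMemory(128, bad_addr=100, bad_bit=3))
```

On the simulated faulty board, every pattern with bit 3 set fails at address 100, which is how an exhaustive pattern test localizes a stuck bit.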
Computer architecture evaluation for structural dynamics computations: Project summary
NASA Technical Reports Server (NTRS)
Standley, Hilda M.
1989-01-01
The intent of the proposed effort is the examination of the impact of the elements of parallel architectures on the performance realized in a parallel computation. To this end, three major projects are developed: a language for the expression of high level parallelism, a statistical technique for the synthesis of multicomputer interconnection networks based upon performance prediction, and a queueing model for the analysis of shared memory hierarchies.
Medical Music Therapy: A Model Program for Clinical Practice, Education, Training and Research
ERIC Educational Resources Information Center
Standley, Jayne
2005-01-01
This monograph evolved from the unique, innovative partnership between the Florida State University Music Therapy Program and Tallahassee Memorial HealthCare. Its purpose is to serve as a model for music therapy educators, students, clinicians, and the hospital administrators who might employ them. This book should prove a valuable resource for…
ERIC Educational Resources Information Center
Olivers, Christian N. L.
2009-01-01
An important question is whether visual attention (the ability to select relevant visual information) and visual working memory (the ability to retain relevant visual information) share the same content representations. Some past research has indicated that they do: Singleton distractors interfered more strongly with a visual search task when they…
Principe, Gabrielle F.; Schindewolf, Erica
2012-01-01
Research on factors that can affect the accuracy of children’s autobiographical remembering has important implications for understanding the abilities of young witnesses to provide legal testimony. In this article, we review our own recent research on one factor that has much potential to induce errors in children’s event recall, namely natural memory sharing conversations with peers and parents. Our studies provide compelling evidence that not only can the content of conversations about the past intrude into later memory but that such exchanges can prompt the generation of entirely false narratives that are more detailed than true accounts of experienced events. Further, our work shows that deeper and more creative participation in memory sharing dialogues can boost the damaging effects of conversationally conveyed misinformation. Implications of this collection of findings for children’s testimony are discussed. PMID:23129880
On nonlinear finite element analysis in single-, multi- and parallel-processors
NASA Technical Reports Server (NTRS)
Utku, S.; Melosh, R.; Islam, M.; Salama, M.
1982-01-01
Numerical solution of nonlinear equilibrium problems of structures by means of Newton-Raphson type iterations is reviewed. Each step of the iteration is shown to correspond to the solution of a linear problem; therefore, the feasibility of the finite element method for nonlinear analysis is established. Organization and flow of data for various types of digital computers, such as single-processor/single-level memory, single-processor/two-level-memory, vector-processor/two-level-memory, and parallel-processors, with and without sub-structuring (i.e. partitioning) are given. The effect of the relative costs of computation, memory and data transfer on substructuring is shown. The idea of assigning comparable size substructures to parallel processors is exploited. Under Cholesky-type factorization schemes, the efficiency of parallel processing is shown to decrease because of occasionally shared data, just as it does because of shared facilities.
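The observation that each Newton-Raphson step reduces to a linear problem can be seen in a one-degree-of-freedom sketch. The hardening-spring model, the function names, and the tolerances below are assumptions chosen for illustration; in a finite element setting the scalar division would be replaced by the solution of a linear system with the tangent stiffness matrix.

```python
def newton_raphson(internal_force, tangent_stiffness, load, u0=0.0,
                   tol=1e-12, max_iter=50):
    """Solve internal_force(u) = load for a single degree of freedom.

    Each iteration solves the linear problem
        tangent_stiffness(u) * du = load - internal_force(u)
    and updates u by du. Returns (u, iterations_used).
    """
    u = u0
    for iteration in range(1, max_iter + 1):
        residual = load - internal_force(u)
        if abs(residual) < tol:
            return u, iteration - 1
        du = residual / tangent_stiffness(u)   # the linear solve of this step
        u += du
    raise RuntimeError("Newton-Raphson did not converge")


# Hypothetical hardening spring: r(u) = k*u + a*u**3 with k = a = 1.
def spring_force(u):
    return u + u ** 3


def spring_tangent(u):
    return 1.0 + 3.0 * u ** 2


# A load of 10 is balanced at u = 2, since 2 + 2**3 = 10.
displacement, steps = newton_raphson(spring_force, spring_tangent, load=10.0)
```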
Blocksome, Michael A.; Mamidala, Amith R.
2013-09-03
Fencing direct memory access ('DMA') data transfers in a parallel active messaging interface ('PAMI') of a parallel computer, the PAMI including data communications endpoints, each endpoint including specifications of a client, a context, and a task, the endpoints coupled for data communications through the PAMI and through DMA controllers operatively coupled to segments of shared random access memory through which the DMA controllers deliver data communications deterministically, including initiating execution through the PAMI of an ordered sequence of active DMA instructions for DMA data transfers between two endpoints, effecting deterministic DMA data transfers through a DMA controller and a segment of shared memory; and executing through the PAMI, with no FENCE accounting for DMA data transfers, an active FENCE instruction, the FENCE instruction completing execution only after completion of all DMA instructions initiated prior to execution of the FENCE instruction for DMA data transfers between the two endpoints.
Blocksome, Michael A; Mamidala, Amith R
2014-02-11
Fencing direct memory access ('DMA') data transfers in a parallel active messaging interface ('PAMI') of a parallel computer, the PAMI including data communications endpoints, each endpoint including specifications of a client, a context, and a task, the endpoints coupled for data communications through the PAMI and through DMA controllers operatively coupled to segments of shared random access memory through which the DMA controllers deliver data communications deterministically, including initiating execution through the PAMI of an ordered sequence of active DMA instructions for DMA data transfers between two endpoints, effecting deterministic DMA data transfers through a DMA controller and a segment of shared memory; and executing through the PAMI, with no FENCE accounting for DMA data transfers, an active FENCE instruction, the FENCE instruction completing execution only after completion of all DMA instructions initiated prior to execution of the FENCE instruction for DMA data transfers between the two endpoints.
Implementation of Parallel Dynamic Simulation on Shared-Memory vs. Distributed-Memory Environments
DOE Office of Scientific and Technical Information (OSTI.GOV)
Jin, Shuangshuang; Chen, Yousu; Wu, Di
2015-12-09
Power system dynamic simulation computes the system response to a sequence of large disturbances, such as sudden changes in generation or load, or a network short circuit followed by protective branch switching operation. It consists of a large set of differential and algebraic equations, which is computationally intensive and challenging to solve using a single-processor-based dynamic simulation solution. High-performance computing (HPC) based parallel computing is a very promising technology to speed up the computation and facilitate the simulation process. This paper presents two different parallel implementations of power grid dynamic simulation using Open Multi-processing (OpenMP) on a shared-memory platform, and Message Passing Interface (MPI) on distributed-memory clusters, respectively. The differences between the parallel simulation algorithms and architectures of the two HPC technologies are illustrated, and their performances for running parallel dynamic simulation are compared and demonstrated.
Decomposing the relation between Rapid Automatized Naming (RAN) and reading ability.
Arnell, Karen M; Joanisse, Marc F; Klein, Raymond M; Busseri, Michael A; Tannock, Rosemary
2009-09-01
The Rapid Automatized Naming (RAN) test involves rapidly naming sequences of items presented in a visual array. RAN has generated considerable interest because RAN performance predicts reading achievement. This study sought to determine what elements of RAN are responsible for the shared variance between RAN and reading performance using a series of cognitive tasks and a latent variable modelling approach. Participants performed RAN measures, a test of reading speed and comprehension, and six tasks, which tapped various hypothesised components of the RAN. RAN shared 10% of the variance with reading comprehension and 17% with reading rate. Together, the decomposition tasks explained 52% and 39% of the variance shared between RAN and reading comprehension and between RAN and reading rate, respectively. Significant predictors suggested that working memory encoding underlies part of the relationship between RAN and reading ability.
Parallel program debugging with flowback analysis
DOE Office of Scientific and Technical Information (OSTI.GOV)
Choi, Jongdeok.
1989-01-01
This thesis describes the design and implementation of an integrated debugging system for parallel programs running on shared memory multi-processors. The goal of the debugging system is to present to the programmer a graphical view of the dynamic program dependences while keeping the execution-time overhead low. The author first describes the use of flowback analysis to provide information on causal relationships between events in a program's execution without re-executing the program for debugging. Execution time overhead is kept low by recording only a small amount of trace during a program's execution. He uses semantic analysis and a technique called incremental tracing to keep the time and space overhead low. As part of the semantic analysis, he uses a static program dependence graph structure that reduces the amount of work done at compile time and takes advantage of the dynamic information produced during execution time. The cornerstone of the incremental tracing concept is to generate a coarse trace during execution and fill incrementally, during the interactive portion of the debugging session, the gap between the information gathered in the coarse trace and the information needed to do the flowback analysis using the coarse trace. Then, he describes how to extend the flowback analysis to parallel programs. The flowback analysis can span process boundaries; i.e., the most recent modification to a shared variable might be traced to a different process than the one that contains the current reference. The static and dynamic program dependence graphs of the individual processes are tied together with synchronization and data dependence information to form complete graphs that represent the entire program.
2017-11-15
This major final rule addresses changes to the Medicare physician fee schedule (PFS) and other Medicare Part B payment policies such as changes to the Medicare Shared Savings Program, to ensure that our payment systems are updated to reflect changes in medical practice and the relative value of services, as well as changes in the statute. In addition, this final rule includes policies necessary to begin offering the expanded Medicare Diabetes Prevention Program model.
Légaré, France; Moumjid-Ferdjaoui, Nora; Drolet, Renée; Stacey, Dawn; Härter, Martin; Bastian, Hilda; Beaulieu, Marie-Dominique; Borduas, Francine; Charles, Cathy; Coulter, Angela; Desroches, Sophie; Friedrich, Gwendolyn; Gafni, Amiram; Graham, Ian D; Labrecque, Michel; LeBlanc, Annie; Légaré, Jean; Politi, Mary; Sargeant, Joan; Thomson, Richard
2013-01-01
Shared decision making is now making inroads in health care professionals' continuing education curriculum, but there is no consensus on what core competencies are required by clinicians for effectively involving patients in health-related decisions. Ready-made programs for training clinicians in shared decision making are in high demand, but existing programs vary widely in their theoretical foundations, length, and content. An international, interdisciplinary group of 25 individuals met in 2012 to discuss theoretical approaches to making health-related decisions, compare notes on existing programs, take stock of stakeholders' concerns, and deliberate on core competencies. This article summarizes the results of those discussions. Some participants believed that existing models already provide a sufficient conceptual basis for developing and implementing shared decision making competency-based training programs on a wide scale. Others argued that this would be premature as there is still no consensus on the definition of shared decision making or sufficient evidence to recommend specific competencies for implementing shared decision making. However, all participants agreed that there were 2 broad types of competencies that clinicians need for implementing shared decision making: relational competencies and risk communication competencies. Further multidisciplinary research could broaden and deepen our understanding of core competencies for shared decision making training. Copyright © 2013 The Alliance for Continuing Education in the Health Professions, the Society for Academic Continuing Medical Education, and the Council on CME, Association for Hospital Medical Education.