Sample records for openmp memory model

  1. Support of Multidimensional Parallelism in the OpenMP Programming Model

    NASA Technical Reports Server (NTRS)

    Jin, Hao-Qiang; Jost, Gabriele

    2003-01-01

    OpenMP is the current standard for shared-memory programming. While providing ease of parallel programming, the OpenMP programming model also has limitations which often effect the scalability of applications. Examples for these limitations are work distribution and point-to-point synchronization among threads. We propose extensions to the OpenMP programming model which allow the user to easily distribute the work in multiple dimensions and synchronize the workflow among the threads. The proposed extensions include four new constructs and the associated runtime library. They do not require changes to the source code and can be implemented based on the existing OpenMP standard. We illustrate the concept in a prototype translator and test with benchmark codes and a cloud modeling code.

  2. A Programming Model Performance Study Using the NAS Parallel Benchmarks

    DOE PAGES

    Shan, Hongzhang; Blagojević, Filip; Min, Seung-Jai; ...

    2010-01-01

    Harnessing the power of multicore platforms is challenging due to the additional levels of parallelism present. In this paper we use the NAS Parallel Benchmarks to study three programming models, MPI, OpenMP and PGAS to understand their performance and memory usage characteristics on current multicore architectures. To understand these characteristics we use the Integrated Performance Monitoring tool and other ways to measure communication versus computation time, as well as the fraction of the run time spent in OpenMP. The benchmarks are run on two different Cray XT5 systems and an Infiniband cluster. Our results show that in general the threemore » programming models exhibit very similar performance characteristics. In a few cases, OpenMP is significantly faster because it explicitly avoids communication. For these particular cases, we were able to re-write the UPC versions and achieve equal performance to OpenMP. Using OpenMP was also the most advantageous in terms of memory usage. Also we compare performance differences between the two Cray systems, which have quad-core and hex-core processors. We show that at scale the performance is almost always slower on the hex-core system because of increased contention for network resources.« less

  3. Early Experiences Writing Performance Portable OpenMP 4 Codes

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Joubert, Wayne; Hernandez, Oscar R

    In this paper, we evaluate the recently available directives in OpenMP 4 to parallelize a computational kernel using both the traditional shared memory approach and the newer accelerator targeting capabilities. In addition, we explore various transformations that attempt to increase application performance portability, and examine the expressiveness and performance implications of using these approaches. For example, we want to understand if the target map directives in OpenMP 4 improve data locality when mapped to a shared memory system, as opposed to the traditional first touch policy approach in traditional OpenMP. To that end, we use recent Cray and Intel compilersmore » to measure the performance variations of a simple application kernel when executed on the OLCF s Titan supercomputer with NVIDIA GPUs and the Beacon system with Intel Xeon Phi accelerators attached. To better understand these trade-offs, we compare our results from traditional OpenMP shared memory implementations to the newer accelerator programming model when it is used to target both the CPU and an attached heterogeneous device. We believe the results and lessons learned as presented in this paper will be useful to the larger user community by providing guidelines that can assist programmers in the development of performance portable code.« less

  4. Benchmarking and Evaluating Unified Memory for OpenMP GPU Offloading

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Mishra, Alok; Li, Lingda; Kong, Martin

    Here, the latest OpenMP standard offers automatic device offloading capabilities which facilitate GPU programming. Despite this, there remain many challenges. One of these is the unified memory feature introduced in recent GPUs. GPUs in current and future HPC systems have enhanced support for unified memory space. In such systems, CPU and GPU can access each other's memory transparently, that is, the data movement is managed automatically by the underlying system software and hardware. Memory over subscription is also possible in these systems. However, there is a significant lack of knowledge about how this mechanism will perform, and how programmers shouldmore » use it. We have modified several benchmarks codes, in the Rodinia benchmark suite, to study the behavior of OpenMP accelerator extensions and have used them to explore the impact of unified memory in an OpenMP context. We moreover modified the open source LLVM compiler to allow OpenMP programs to exploit unified memory. The results of our evaluation reveal that, while the performance of unified memory is comparable with that of normal GPU offloading for benchmarks with little data reuse, it suffers from significant overhead when GPU memory is over subcribed for benchmarks with large amount of data reuse. Based on these results, we provide several guidelines for programmers to achieve better performance with unified memory.« less

  5. A ROSE-based OpenMP 3.0 Research Compiler Supporting Multiple Runtime Libraries

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Liao, C; Quinlan, D; Panas, T

    2010-01-25

    OpenMP is a popular and evolving programming model for shared-memory platforms. It relies on compilers for optimal performance and to target modern hardware architectures. A variety of extensible and robust research compilers are key to OpenMP's sustainable success in the future. In this paper, we present our efforts to build an OpenMP 3.0 research compiler for C, C++, and Fortran; using the ROSE source-to-source compiler framework. Our goal is to support OpenMP research for ourselves and others. We have extended ROSE's internal representation to handle all of the OpenMP 3.0 constructs and facilitate their manipulation. Since OpenMP research is oftenmore » complicated by the tight coupling of the compiler translations and the runtime system, we present a set of rules to define a common OpenMP runtime library (XOMP) on top of multiple runtime libraries. These rules additionally define how to build a set of translations targeting XOMP. Our work demonstrates how to reuse OpenMP translations across different runtime libraries. This work simplifies OpenMP research by decoupling the problematic dependence between the compiler translations and the runtime libraries. We present an evaluation of our work by demonstrating an analysis tool for OpenMP correctness. We also show how XOMP can be defined using both GOMP and Omni and present comparative performance results against other OpenMP compilers.« less

  6. What Multilevel Parallel Programs do when you are not Watching: A Performance Analysis Case Study Comparing MPI/OpenMP, MLP, and Nested OpenMP

    NASA Technical Reports Server (NTRS)

    Jost, Gabriele; Labarta, Jesus; Gimenez, Judit

    2004-01-01

    With the current trend in parallel computer architectures towards clusters of shared memory symmetric multi-processors, parallel programming techniques have evolved that support parallelism beyond a single level. When comparing the performance of applications based on different programming paradigms, it is important to differentiate between the influence of the programming model itself and other factors, such as implementation specific behavior of the operating system (OS) or architectural issues. Rewriting-a large scientific application in order to employ a new programming paradigms is usually a time consuming and error prone task. Before embarking on such an endeavor it is important to determine that there is really a gain that would not be possible with the current implementation. A detailed performance analysis is crucial to clarify these issues. The multilevel programming paradigms considered in this study are hybrid MPI/OpenMP, MLP, and nested OpenMP. The hybrid MPI/OpenMP approach is based on using MPI [7] for the coarse grained parallelization and OpenMP [9] for fine grained loop level parallelism. The MPI programming paradigm assumes a private address space for each process. Data is transferred by explicitly exchanging messages via calls to the MPI library. This model was originally designed for distributed memory architectures but is also suitable for shared memory systems. The second paradigm under consideration is MLP which was developed by Taft. The approach is similar to MPi/OpenMP, using a mix of coarse grain process level parallelization and loop level OpenMP parallelization. As it is the case with MPI, a private address space is assumed for each process. The MLP approach was developed for ccNUMA architectures and explicitly takes advantage of the availability of shared memory. A shared memory arena which is accessible by all processes is required. Communication is done by reading from and writing to the shared memory.

  7. The OpenMP Implementation of NAS Parallel Benchmarks and its Performance

    NASA Technical Reports Server (NTRS)

    Jin, Hao-Qiang; Frumkin, Michael; Yan, Jerry

    1999-01-01

    As the new ccNUMA architecture became popular in recent years, parallel programming with compiler directives on these machines has evolved to accommodate new needs. In this study, we examine the effectiveness of OpenMP directives for parallelizing the NAS Parallel Benchmarks. Implementation details will be discussed and performance will be compared with the MPI implementation. We have demonstrated that OpenMP can achieve very good results for parallelization on a shared memory system, but effective use of memory and cache is very important.

  8. Improvement and speed optimization of numerical tsunami modelling program using OpenMP technology

    NASA Astrophysics Data System (ADS)

    Chernov, A.; Zaytsev, A.; Yalciner, A.; Kurkin, A.

    2009-04-01

    Currently, the basic problem of tsunami modeling is low speed of calculations which is unacceptable for services of the operative notification. Existing algorithms of numerical modeling of hydrodynamic processes of tsunami waves are developed without taking the opportunities of modern computer facilities. There is an opportunity to have considerable acceleration of process of calculations by using parallel algorithms. We discuss here new approach to parallelization tsunami modeling code using OpenMP Technology (for multiprocessing systems with the general memory). Nowadays, multiprocessing systems are easily accessible for everyone. The cost of the use of such systems becomes much lower comparing to the costs of clusters. This opportunity also benefits all programmers to apply multithreading algorithms on desktop computers of researchers. Other important advantage of the given approach is the mechanism of the general memory - there is no necessity to send data on slow networks (for example Ethernet). All memory is the common for all computing processes; it causes almost linear scalability of the program and processes. In the new version of NAMI DANCE using OpenMP technology and multi-threading algorithm provide 80% gain in speed in comparison with the one-thread version for dual-processor unit. The speed increased and 320% gain was attained for four core processor unit of PCs. Thus, it was possible to reduce considerably time of performance of calculations on the scientific workstations (desktops) without complete change of the program and user interfaces. The further modernization of algorithms of preparation of initial data and processing of results using OpenMP looks reasonable. The final version of NAMI DANCE with the increased computational speed can be used not only for research purposes but also in real time Tsunami Warning Systems.

  9. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Barbara Chapman

    OpenMP was not well recognized at the beginning of the project, around year 2003, because of its limited use in DoE production applications and the inmature hardware support for an efficient implementation. Yet in the recent years, it has been graduately adopted both in HPC applications, mostly in the form of MPI+OpenMP hybrid code, and in mid-scale desktop applications for scientific and experimental studies. We have observed this trend and worked deligiently to improve our OpenMP compiler and runtimes, as well as to work with the OpenMP standard organization to make sure OpenMP are evolved in the direction close tomore » DoE missions. In the Center for Programming Models for Scalable Parallel Computing project, the HPCTools team at the University of Houston (UH), directed by Dr. Barbara Chapman, has been working with project partners, external collaborators and hardware vendors to increase the scalability and applicability of OpenMP for multi-core (and future manycore) platforms and for distributed memory systems by exploring different programming models, language extensions, compiler optimizations, as well as runtime library support.« less

  10. Implementing Shared Memory Parallelism in MCBEND

    NASA Astrophysics Data System (ADS)

    Bird, Adam; Long, David; Dobson, Geoff

    2017-09-01

    MCBEND is a general purpose radiation transport Monte Carlo code from AMEC Foster Wheelers's ANSWERS® Software Service. MCBEND is well established in the UK shielding community for radiation shielding and dosimetry assessments. The existing MCBEND parallel capability effectively involves running the same calculation on many processors. This works very well except when the memory requirements of a model restrict the number of instances of a calculation that will fit on a machine. To more effectively utilise parallel hardware OpenMP has been used to implement shared memory parallelism in MCBEND. This paper describes the reasoning behind the choice of OpenMP, notes some of the challenges of multi-threading an established code such as MCBEND and assesses the performance of the parallel method implemented in MCBEND.

  11. Archer

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Atzeni, Simone; Ahn, Dong; Gopalakrishnan, Ganesh

    2017-01-12

    Archer is built on top of the LLVM/Clang compilers that support OpenMP. It applies static and dynamic analysis techniques to detect data races in OpenMP programs generating a very low runtime and memory overhead. Static analyses identify data race free OpenMP regions and exclude them from runtime analysis, which is performed by ThreadSanitizer included in LLVM/Clang.

  12. The Automatic Parallelisation of Scientific Application Codes Using a Computer Aided Parallelisation Toolkit

    NASA Technical Reports Server (NTRS)

    Ierotheou, C.; Johnson, S.; Leggett, P.; Cross, M.; Evans, E.; Jin, Hao-Qiang; Frumkin, M.; Yan, J.; Biegel, Bryan (Technical Monitor)

    2001-01-01

    The shared-memory programming model is a very effective way to achieve parallelism on shared memory parallel computers. Historically, the lack of a programming standard for using directives and the rather limited performance due to scalability have affected the take-up of this programming model approach. Significant progress has been made in hardware and software technologies, as a result the performance of parallel programs with compiler directives has also made improvements. The introduction of an industrial standard for shared-memory programming with directives, OpenMP, has also addressed the issue of portability. In this study, we have extended the computer aided parallelization toolkit (developed at the University of Greenwich), to automatically generate OpenMP based parallel programs with nominal user assistance. We outline the way in which loop types are categorized and how efficient OpenMP directives can be defined and placed using the in-depth interprocedural analysis that is carried out by the toolkit. We also discuss the application of the toolkit on the NAS Parallel Benchmarks and a number of real-world application codes. This work not only demonstrates the great potential of using the toolkit to quickly parallelize serial programs but also the good performance achievable on up to 300 processors for hybrid message passing and directive-based parallelizations.

  13. Experiences using OpenMP based on Computer Directed Software DSM on a PC Cluster

    NASA Technical Reports Server (NTRS)

    Hess, Matthias; Jost, Gabriele; Mueller, Matthias; Ruehle, Roland

    2003-01-01

    In this work we report on our experiences running OpenMP programs on a commodity cluster of PCs running a software distributed shared memory (DSM) system. We describe our test environment and report on the performance of a subset of the NAS Parallel Benchmarks that have been automaticaly parallelized for OpenMP. We compare the performance of the OpenMP implementations with that of their message passing counterparts and discuss performance differences.

  14. Clomp

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Gylenhaal, J.; Bronevetsky, G.

    2007-05-25

    CLOMP is the C version of the Livermore OpenMP benchmark deeloped to measure OpenMP overheads and other performance impacts due to threading (like NUMA memory layouts, memory contention, cache effects, etc.) in order to influence future system design. Current best-in-class implementations of OpenMP have overheads at least ten times larger than is required by many of our applications for effective use of OpenMP. This benchmark shows the significant negative performance impact of these relatively large overheads and of other thread effects. The CLOMP benchmark highly configurable to allow a variety of problem sizes and threading effects to be studied andmore » it carefully checks its results to catch many common threading errors. This benchmark is expected to be included as part of the Sequoia Benchmark suite for the Sequoia procurement.« less

  15. Automatic Generation of Directive-Based Parallel Programs for Shared Memory Parallel Systems

    NASA Technical Reports Server (NTRS)

    Jin, Hao-Qiang; Yan, Jerry; Frumkin, Michael

    2000-01-01

    The shared-memory programming model is a very effective way to achieve parallelism on shared memory parallel computers. As great progress was made in hardware and software technologies, performance of parallel programs with compiler directives has demonstrated large improvement. The introduction of OpenMP directives, the industrial standard for shared-memory programming, has minimized the issue of portability. Due to its ease of programming and its good performance, the technique has become very popular. In this study, we have extended CAPTools, a computer-aided parallelization toolkit, to automatically generate directive-based, OpenMP, parallel programs. We outline techniques used in the implementation of the tool and present test results on the NAS parallel benchmarks and ARC3D, a CFD application. This work demonstrates the great potential of using computer-aided tools to quickly port parallel programs and also achieve good performance.

  16. Experiences Using OpenMP Based on Compiler Directed Software DSM on a PC Cluster

    NASA Technical Reports Server (NTRS)

    Hess, Matthias; Jost, Gabriele; Mueller, Matthias; Ruehle, Roland; Biegel, Bryan (Technical Monitor)

    2002-01-01

    In this work we report on our experiences running OpenMP (message passing) programs on a commodity cluster of PCs (personal computers) running a software distributed shared memory (DSM) system. We describe our test environment and report on the performance of a subset of the NAS (NASA Advanced Supercomputing) Parallel Benchmarks that have been automatically parallelized for OpenMP. We compare the performance of the OpenMP implementations with that of their message passing counterparts and discuss performance differences.

  17. Testing New Programming Paradigms with NAS Parallel Benchmarks

    NASA Technical Reports Server (NTRS)

    Jin, H.; Frumkin, M.; Schultz, M.; Yan, J.

    2000-01-01

    Over the past decade, high performance computing has evolved rapidly, not only in hardware architectures but also with increasing complexity of real applications. Technologies have been developing to aim at scaling up to thousands of processors on both distributed and shared memory systems. Development of parallel programs on these computers is always a challenging task. Today, writing parallel programs with message passing (e.g. MPI) is the most popular way of achieving scalability and high performance. However, writing message passing programs is difficult and error prone. Recent years new effort has been made in defining new parallel programming paradigms. The best examples are: HPF (based on data parallelism) and OpenMP (based on shared memory parallelism). Both provide simple and clear extensions to sequential programs, thus greatly simplify the tedious tasks encountered in writing message passing programs. HPF is independent of memory hierarchy, however, due to the immaturity of compiler technology its performance is still questionable. Although use of parallel compiler directives is not new, OpenMP offers a portable solution in the shared-memory domain. Another important development involves the tremendous progress in the internet and its associated technology. Although still in its infancy, Java promisses portability in a heterogeneous environment and offers possibility to "compile once and run anywhere." In light of testing these new technologies, we implemented new parallel versions of the NAS Parallel Benchmarks (NPBs) with HPF and OpenMP directives, and extended the work with Java and Java-threads. The purpose of this study is to examine the effectiveness of alternative programming paradigms. NPBs consist of five kernels and three simulated applications that mimic the computation and data movement of large scale computational fluid dynamics (CFD) applications. We started with the serial version included in NPB2.3. Optimization of memory and cache usage was applied to several benchmarks, noticeably BT and SP, resulting in better sequential performance. In order to overcome the lack of an HPF performance model and guide the development of the HPF codes, we employed an empirical performance model for several primitives found in the benchmarks. We encountered a few limitations of HPF, such as lack of supporting the "REDISTRIBUTION" directive and no easy way to handle irregular computation. The parallelization with OpenMP directives was done at the outer-most loop level to achieve the largest granularity. The performance of six HPF and OpenMP benchmarks is compared with their MPI counterparts for the Class-A problem size in the figure in next page. These results were obtained on an SGI Origin2000 (195MHz) with MIPSpro-f77 compiler 7.2.1 for OpenMP and MPI codes and PGI pghpf-2.4.3 compiler with MPI interface for HPF programs.

  18. Automatic Generation of OpenMP Directives and Its Application to Computational Fluid Dynamics Codes

    NASA Technical Reports Server (NTRS)

    Yan, Jerry; Jin, Haoqiang; Frumkin, Michael; Yan, Jerry (Technical Monitor)

    2000-01-01

    The shared-memory programming model is a very effective way to achieve parallelism on shared memory parallel computers. As great progress was made in hardware and software technologies, performance of parallel programs with compiler directives has demonstrated large improvement. The introduction of OpenMP directives, the industrial standard for shared-memory programming, has minimized the issue of portability. In this study, we have extended CAPTools, a computer-aided parallelization toolkit, to automatically generate OpenMP-based parallel programs with nominal user assistance. We outline techniques used in the implementation of the tool and discuss the application of this tool on the NAS Parallel Benchmarks and several computational fluid dynamics codes. This work demonstrates the great potential of using the tool to quickly port parallel programs and also achieve good performance that exceeds some of the commercial tools.

  19. Toward Enhancing OpenMP's Work-Sharing Directives

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Chapman, B M; Huang, L; Jin, H

    2006-05-17

    OpenMP provides a portable programming interface for shared memory parallel computers (SMPs). Although this interface has proven successful for small SMPs, it requires greater flexibility in light of the steadily growing size of individual SMPs and the recent advent of multithreaded chips. In this paper, we describe two application development experiences that exposed these expressivity problems in the current OpenMP specification. We then propose mechanisms to overcome these limitations, including thread subteams and thread topologies. Thus, we identify language features that improve OpenMP application performance on emerging and large-scale platforms while preserving ease of programming.

  20. MPI, HPF or OpenMP: A Study with the NAS Benchmarks

    NASA Technical Reports Server (NTRS)

    Jin, Hao-Qiang; Frumkin, Michael; Hribar, Michelle; Waheed, Abdul; Yan, Jerry; Saini, Subhash (Technical Monitor)

    1999-01-01

    Porting applications to new high performance parallel and distributed platforms is a challenging task. Writing parallel code by hand is time consuming and costly, but the task can be simplified by high level languages and would even better be automated by parallelizing tools and compilers. The definition of HPF (High Performance Fortran, based on data parallel model) and OpenMP (based on shared memory parallel model) standards has offered great opportunity in this respect. Both provide simple and clear interfaces to language like FORTRAN and simplify many tedious tasks encountered in writing message passing programs. In our study we implemented the parallel versions of the NAS Benchmarks with HPF and OpenMP directives. Comparison of their performance with the MPI implementation and pros and cons of different approaches will be discussed along with experience of using computer-aided tools to help parallelize these benchmarks. Based on the study,potentials of applying some of the techniques to realistic aerospace applications will be presented

  1. MPI, HPF or OpenMP: A Study with the NAS Benchmarks

    NASA Technical Reports Server (NTRS)

    Jin, H.; Frumkin, M.; Hribar, M.; Waheed, A.; Yan, J.; Saini, Subhash (Technical Monitor)

    1999-01-01

    Porting applications to new high performance parallel and distributed platforms is a challenging task. Writing parallel code by hand is time consuming and costly, but this task can be simplified by high level languages and would even better be automated by parallelizing tools and compilers. The definition of HPF (High Performance Fortran, based on data parallel model) and OpenMP (based on shared memory parallel model) standards has offered great opportunity in this respect. Both provide simple and clear interfaces to language like FORTRAN and simplify many tedious tasks encountered in writing message passing programs. In our study, we implemented the parallel versions of the NAS Benchmarks with HPF and OpenMP directives. Comparison of their performance with the MPI implementation and pros and cons of different approaches will be discussed along with experience of using computer-aided tools to help parallelize these benchmarks. Based on the study, potentials of applying some of the techniques to realistic aerospace applications will be presented.

  2. Characterizing Task-Based OpenMP Programs

    PubMed Central

    Muddukrishna, Ananya; Jonsson, Peter A.; Brorsson, Mats

    2015-01-01

    Programmers struggle to understand performance of task-based OpenMP programs since profiling tools only report thread-based performance. Performance tuning also requires task-based performance in order to balance per-task memory hierarchy utilization against exposed task parallelism. We provide a cost-effective method to extract detailed task-based performance information from OpenMP programs. We demonstrate the utility of our method by quickly diagnosing performance problems and characterizing exposed task parallelism and per-task instruction profiles of benchmarks in the widely-used Barcelona OpenMP Tasks Suite. Programmers can tune performance faster and understand performance tradeoffs more effectively than existing tools by using our method to characterize task-based performance. PMID:25860023

  3. Hierarchical resilience with lightweight threads.

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Wheeler, Kyle Bruce

    2011-10-01

    This paper proposes methodology for providing robustness and resilience for a highly threaded distributed- and shared-memory environment based on well-defined inputs and outputs to lightweight tasks. These inputs and outputs form a failure 'barrier', allowing tasks to be restarted or duplicated as necessary. These barriers must be expanded based on task behavior, such as communication between tasks, but do not prohibit any given behavior. One of the trends in high-performance computing codes seems to be a trend toward self-contained functions that mimic functional programming. Software designers are trending toward a model of software design where their core functions are specifiedmore » in side-effect free or low-side-effect ways, wherein the inputs and outputs of the functions are well-defined. This provides the ability to copy the inputs to wherever they need to be - whether that's the other side of the PCI bus or the other side of the network - do work on that input using local memory, and then copy the outputs back (as needed). This design pattern is popular among new distributed threading environment designs. Such designs include the Barcelona STARS system, distributed OpenMP systems, the Habanero-C and Habanero-Java systems from Vivek Sarkar at Rice University, the HPX/ParalleX model from LSU, as well as our own Scalable Parallel Runtime effort (SPR) and the Trilinos stateless kernels. This design pattern is also shared by CUDA and several OpenMP extensions for GPU-type accelerators (e.g. the PGI OpenMP extensions).« less

  4. CLOMP v1.5

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Gyllenhaal, J.

    CLOMP is the C version of the Livermore OpenMP benchmark developed to measure OpenMP overheads and other performance impacts due to threading. For simplicity, it does not use MPI by default but it is expected to be run on the resources a threaded MPI task would use (e.g., a portion of a shared memory compute node). Compiling with -DWITH_MPI allows packing one or more nodes with CLOMP tasks and having CLOMP report OpenMP performance for the slowest MPI task. On current systems, the strong scaling performance results for 4, 8, or 16 threads are of the most interest. Suggested weakmore » scaling inputs are provided for evaluating future systems. Since MPI is often used to place at least one MPI task per coherence or NUMA domain, it is recommended to focus OpenMP runtime measurements on a subset of node hardware where it is most possible to have low OpenMP overheads (e.g., within one coherence domain or NUMA domain).« less

  5. A portable approach for PIC on emerging architectures

    NASA Astrophysics Data System (ADS)

    Decyk, Viktor

    2016-03-01

    A portable approach for designing Particle-in-Cell (PIC) algorithms on emerging exascale computers, is based on the recognition that 3 distinct programming paradigms are needed. They are: low level vector (SIMD) processing, middle level shared memory parallel programing, and high level distributed memory programming. In addition, there is a memory hierarchy associated with each level. Such algorithms can be initially developed using vectorizing compilers, OpenMP, and MPI. This is the approach recommended by Intel for the Phi processor. These algorithms can then be translated and possibly specialized to other programming models and languages, as needed. For example, the vector processing and shared memory programming might be done with CUDA instead of vectorizing compilers and OpenMP, but generally the algorithm itself is not greatly changed. The UCLA PICKSC web site at http://www.idre.ucla.edu/ contains example open source skeleton codes (mini-apps) illustrating each of these three programming models, individually and in combination. Fortran2003 now supports abstract data types, and design patterns can be used to support a variety of implementations within the same code base. Fortran2003 also supports interoperability with C so that implementations in C languages are also easy to use. Finally, main codes can be translated into dynamic environments such as Python, while still taking advantage of high performing compiled languages. Parallel languages are still evolving with interesting developments in co-Array Fortran, UPC, and OpenACC, among others, and these can also be supported within the same software architecture. Work supported by NSF and DOE Grants.

  6. Lattice QCD simulations using the OpenACC platform

    NASA Astrophysics Data System (ADS)

    Majumdar, Pushan

    2016-10-01

    In this article we will explore the OpenACC platform for programming Graphics Processing Units (GPUs). The OpenACC platform offers a directive based programming model for GPUs which avoids the detailed data flow control and memory management necessary in a CUDA programming environment. In the OpenACC model, programs can be written in high level languages with OpenMP like directives. We present some examples of QCD simulation codes using OpenACC and discuss their performance on the Fermi and Kepler GPUs.

  7. Rambrain - a library for virtually extending physical memory

    NASA Astrophysics Data System (ADS)

    Imgrund, Maximilian; Arth, Alexander

    2017-08-01

    We introduce Rambrain, a user space library that manages memory consumption of your code. Using Rambrain you can overcommit memory over the size of physical memory present in the system. Rambrain takes care of temporarily swapping out data to disk and can handle multiples of the physical memory size present. Rambrain is thread-safe, OpenMP and MPI compatible and supports Asynchronous IO. The library was designed to require minimal changes to existing programs and to be easy to use.

  8. Characterizing and Mitigating Work Time Inflation in Task Parallel Programs

    DOE PAGES

    Olivier, Stephen L.; de Supinski, Bronis R.; Schulz, Martin; ...

    2013-01-01

    Task parallelism raises the level of abstraction in shared memory parallel programming to simplify the development of complex applications. However, task parallel applications can exhibit poor performance due to thread idleness, scheduling overheads, and work time inflation – additional time spent by threads in a multithreaded computation beyond the time required to perform the same work in a sequential computation. We identify the contributions of each factor to lost efficiency in various task parallel OpenMP applications and diagnose the causes of work time inflation in those applications. Increased data access latency can cause significant work time inflation in NUMA systems.more » Our locality framework for task parallel OpenMP programs mitigates this cause of work time inflation. Our extensions to the Qthreads library demonstrate that locality-aware scheduling can improve performance up to 3X compared to the Intel OpenMP task scheduler.« less

  9. On the Performance of an Algebraic MultigridSolver on Multicore Clusters

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Baker, A H; Schulz, M; Yang, U M

    2010-04-29

    Algebraic multigrid (AMG) solvers have proven to be extremely efficient on distributed-memory architectures. However, when executed on modern multicore cluster architectures, we face new challenges that can significantly harm AMG's performance. We discuss our experiences on such an architecture and present a set of techniques that help users to overcome the associated problems, including thread and process pinning and correct memory associations. We have implemented most of the techniques in a MultiCore SUPport library (MCSup), which helps to map OpenMP applications to multicore machines. We present results using both an MPI-only and a hybrid MPI/OpenMP model.

  10. OpenMP Parallelization and Optimization of Graph-Based Machine Learning Algorithms

    DOE PAGES

    Meng, Zhaoyi; Koniges, Alice; He, Yun Helen; ...

    2016-09-21

    In this paper, we investigate the OpenMP parallelization and optimization of two novel data classification algorithms. The new algorithms are based on graph and PDE solution techniques and provide significant accuracy and performance advantages over traditional data classification algorithms in serial mode. The methods leverage the Nystrom extension to calculate eigenvalue/eigenvectors of the graph Laplacian and this is a self-contained module that can be used in conjunction with other graph-Laplacian based methods such as spectral clustering. We use performance tools to collect the hotspots and memory access of the serial codes and use OpenMP as the parallelization language to parallelizemore » the most time-consuming parts. Where possible, we also use library routines. We then optimize the OpenMP implementations and detail the performance on traditional supercomputer nodes (in our case a Cray XC30), and test the optimization steps on emerging testbed systems based on Intel’s Knights Corner and Landing processors. We show both performance improvement and strong scaling behavior. Finally, a large number of optimization techniques and analyses are necessary before the algorithm reaches almost ideal scaling.« less

  11. OpenMP parallelization of a gridded SWAT (SWATG)

    NASA Astrophysics Data System (ADS)

    Zhang, Ying; Hou, Jinliang; Cao, Yongpan; Gu, Juan; Huang, Chunlin

    2017-12-01

    Large-scale, long-term and high spatial resolution simulation is a common issue in environmental modeling. A Gridded Hydrologic Response Unit (HRU)-based Soil and Water Assessment Tool (SWATG) that integrates grid modeling scheme with different spatial representations also presents such problems. The time-consuming problem affects applications of very high resolution large-scale watershed modeling. The OpenMP (Open Multi-Processing) parallel application interface is integrated with SWATG (called SWATGP) to accelerate grid modeling based on the HRU level. Such parallel implementation takes better advantage of the computational power of a shared memory computer system. We conducted two experiments at multiple temporal and spatial scales of hydrological modeling using SWATG and SWATGP on a high-end server. At 500-m resolution, SWATGP was found to be up to nine times faster than SWATG in modeling over a roughly 2000 km2 watershed with 1 CPU and a 15 thread configuration. The study results demonstrate that parallel models save considerable time relative to traditional sequential simulation runs. Parallel computations of environmental models are beneficial for model applications, especially at large spatial and temporal scales and at high resolutions. The proposed SWATGP model is thus a promising tool for large-scale and high-resolution water resources research and management in addition to offering data fusion and model coupling ability.

  12. Optics Program Modified for Multithreaded Parallel Computing

    NASA Technical Reports Server (NTRS)

    Lou, John; Bedding, Dave; Basinger, Scott

    2006-01-01

    A powerful high-performance computer program for simulating and analyzing adaptive and controlled optical systems has been developed by modifying the serial version of the Modeling and Analysis for Controlled Optical Systems (MACOS) program to impart capabilities for multithreaded parallel processing on computing systems ranging from supercomputers down to Symmetric Multiprocessing (SMP) personal computers. The modifications included the incorporation of OpenMP, a portable and widely supported application interface software, that can be used to explicitly add multithreaded parallelism to an application program under a shared-memory programming model. OpenMP was applied to parallelize ray-tracing calculations, one of the major computing components in MACOS. Multithreading is also used in the diffraction propagation of light in MACOS based on pthreads [POSIX Thread, (where "POSIX" signifies a portable operating system for UNIX)]. In tests of the parallelized version of MACOS, the speedup in ray-tracing calculations was found to be linear, or proportional to the number of processors, while the speedup in diffraction calculations ranged from 50 to 60 percent, depending on the type and number of processors. The parallelized version of MACOS is portable, and, to the user, its interface is basically the same as that of the original serial version of MACOS.

  13. Thread-Level Parallelization and Optimization of NWChem for the Intel MIC Architecture

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Shan, Hongzhang; Williams, Samuel; Jong, Wibe de

    In the multicore era it was possible to exploit the increase in on-chip parallelism by simply running multiple MPI processes per chip. Unfortunately, manycore processors' greatly increased thread- and data-level parallelism coupled with a reduced memory capacity demand an altogether different approach. In this paper we explore augmenting two NWChem modules, triples correction of the CCSD(T) and Fock matrix construction, with OpenMP in order that they might run efficiently on future manycore architectures. As the next NERSC machine will be a self-hosted Intel MIC (Xeon Phi) based supercomputer, we leverage an existing MIC testbed at NERSC to evaluate our experiments.more » In order to proxy the fact that future MIC machines will not have a host processor, we run all of our experiments in tt native mode. We found that while straightforward application of OpenMP to the deep loop nests associated with the tensor contractions of CCSD(T) was sufficient in attaining high performance, significant effort was required to safely and efficiently thread the TEXAS integral package when constructing the Fock matrix. Ultimately, our new MPI OpenMP hybrid implementations attain up to 65x better performance for the triples part of the CCSD(T) due in large part to the fact that the limited on-card memory limits the existing MPI implementation to a single process per card. Additionally, we obtain up to 1.6x better performance on Fock matrix constructions when compared with the best MPI implementations running multiple processes per card.« less

  14. Kokkos: Enabling manycore performance portability through polymorphic memory access patterns

    DOE PAGES

    Carter Edwards, H.; Trott, Christian R.; Sunderland, Daniel

    2014-07-22

    The manycore revolution can be characterized by increasing thread counts, decreasing memory per thread, and diversity of continually evolving manycore architectures. High performance computing (HPC) applications and libraries must exploit increasingly finer levels of parallelism within their codes to sustain scalability on these devices. We found that a major obstacle to performance portability is the diverse and conflicting set of constraints on memory access patterns across devices. Contemporary portable programming models address manycore parallelism (e.g., OpenMP, OpenACC, OpenCL) but fail to address memory access patterns. The Kokkos C++ library enables applications and domain libraries to achieve performance portability on diversemore » manycore architectures by unifying abstractions for both fine-grain data parallelism and memory access patterns. In this paper we describe Kokkos’ abstractions, summarize its application programmer interface (API), present performance results for unit-test kernels and mini-applications, and outline an incremental strategy for migrating legacy C++ codes to Kokkos. Furthermore, the Kokkos library is under active research and development to incorporate capabilities from new generations of manycore architectures, and to address a growing list of applications and domain libraries.« less

  15. Computer-Aided Parallelizer and Optimizer

    NASA Technical Reports Server (NTRS)

    Jin, Haoqiang

    2011-01-01

    The Computer-Aided Parallelizer and Optimizer (CAPO) automates the insertion of compiler directives (see figure) to facilitate parallel processing on Shared Memory Parallel (SMP) machines. While CAPO currently is integrated seamlessly into CAPTools (developed at the University of Greenwich, now marketed as ParaWise), CAPO was independently developed at Ames Research Center as one of the components for the Legacy Code Modernization (LCM) project. The current version takes serial FORTRAN programs, performs interprocedural data dependence analysis, and generates OpenMP directives. Due to the widely supported OpenMP standard, the generated OpenMP codes have the potential to run on a wide range of SMP machines. CAPO relies on accurate interprocedural data dependence information currently provided by CAPTools. Compiler directives are generated through identification of parallel loops in the outermost level, construction of parallel regions around parallel loops and optimization of parallel regions, and insertion of directives with automatic identification of private, reduction, induction, and shared variables. Attempts also have been made to identify potential pipeline parallelism (implemented with point-to-point synchronization). Although directives are generated automatically, user interaction with the tool is still important for producing good parallel codes. A comprehensive graphical user interface is included for users to interact with the parallelization process.

  16. What Scientific Applications can Benefit from Hardware Transactional Memory?

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Schindewolf, M; Bihari, B; Gyllenhaal, J

    2012-06-04

    Achieving efficient and correct synchronization of multiple threads is a difficult and error-prone task at small scale and, as we march towards extreme scale computing, will be even more challenging when the resulting application is supposed to utilize millions of cores efficiently. Transactional Memory (TM) is a promising technique to ease the burden on the programmer, but only recently has become available on commercial hardware in the new Blue Gene/Q system and hence the real benefit for realistic applications has not been studied, yet. This paper presents the first performance results of TM embedded into OpenMP on a prototype systemmore » of BG/Q and characterizes code properties that will likely lead to benefits when augmented with TM primitives. We first, study the influence of thread count, environment variables and memory layout on TM performance and identify code properties that will yield performance gains with TM. Second, we evaluate the combination of OpenMP with multiple synchronization primitives on top of MPI to determine suitable task to thread ratios per node. Finally, we condense our findings into a set of best practices. These are applied to a Monte Carlo Benchmark and a Smoothed Particle Hydrodynamics method. In both cases an optimized TM version, executed with 64 threads on one node, outperforms a simple TM implementation. MCB with optimized TM yields a speedup of 27.45 over baseline.« less

  17. Computational performance of a smoothed particle hydrodynamics simulation for shared-memory parallel computing

    NASA Astrophysics Data System (ADS)

    Nishiura, Daisuke; Furuichi, Mikito; Sakaguchi, Hide

    2015-09-01

    The computational performance of a smoothed particle hydrodynamics (SPH) simulation is investigated for three types of current shared-memory parallel computer devices: many integrated core (MIC) processors, graphics processing units (GPUs), and multi-core CPUs. We are especially interested in efficient shared-memory allocation methods for each chipset, because the efficient data access patterns differ between compute unified device architecture (CUDA) programming for GPUs and OpenMP programming for MIC processors and multi-core CPUs. We first introduce several parallel implementation techniques for the SPH code, and then examine these on our target computer architectures to determine the most effective algorithms for each processor unit. In addition, we evaluate the effective computing performance and power efficiency of the SPH simulation on each architecture, as these are critical metrics for overall performance in a multi-device environment. In our benchmark test, the GPU is found to produce the best arithmetic performance as a standalone device unit, and gives the most efficient power consumption. The multi-core CPU obtains the most effective computing performance. The computational speed of the MIC processor on Xeon Phi approached that of two Xeon CPUs. This indicates that using MICs is an attractive choice for existing SPH codes on multi-core CPUs parallelized by OpenMP, as it gains computational acceleration without the need for significant changes to the source code.

  18. Shared Memory Parallelization of an Implicit ADI-type CFD Code

    NASA Technical Reports Server (NTRS)

    Hauser, Th.; Huang, P. G.

    1999-01-01

    A parallelization study designed for ADI-type algorithms is presented using the OpenMP specification for shared-memory multiprocessor programming. Details of optimizations specifically addressed to cache-based computer architectures are described and performance measurements for the single and multiprocessor implementation are summarized. The paper demonstrates that optimization of memory access on a cache-based computer architecture controls the performance of the computational algorithm. A hybrid MPI/OpenMP approach is proposed for clusters of shared memory machines to further enhance the parallel performance. The method is applied to develop a new LES/DNS code, named LESTool. A preliminary DNS calculation of a fully developed channel flow at a Reynolds number of 180, Re(sub tau) = 180, has shown good agreement with existing data.

  19. Performance Study of Monte Carlo Codes on Xeon Phi Coprocessors — Testing MCNP 6.1 and Profiling ARCHER Geometry Module on the FS7ONNi Problem

    NASA Astrophysics Data System (ADS)

    Liu, Tianyu; Wolfe, Noah; Lin, Hui; Zieb, Kris; Ji, Wei; Caracappa, Peter; Carothers, Christopher; Xu, X. George

    2017-09-01

    This paper contains two parts revolving around Monte Carlo transport simulation on Intel Many Integrated Core coprocessors (MIC, also known as Xeon Phi). (1) MCNP 6.1 was recompiled into multithreading (OpenMP) and multiprocessing (MPI) forms respectively without modification to the source code. The new codes were tested on a 60-core 5110P MIC. The test case was FS7ONNi, a radiation shielding problem used in MCNP's verification and validation suite. It was observed that both codes became slower on the MIC than on a 6-core X5650 CPU, by a factor of 4 for the MPI code and, abnormally, 20 for the OpenMP code, and both exhibited limited capability of strong scaling. (2) We have recently added a Constructive Solid Geometry (CSG) module to our ARCHER code to provide better support for geometry modelling in radiation shielding simulation. The functions of this module are frequently called in the particle random walk process. To identify the performance bottleneck we developed a CSG proxy application and profiled the code using the geometry data from FS7ONNi. The profiling data showed that the code was primarily memory latency bound on the MIC. This study suggests that despite low initial porting e_ort, Monte Carlo codes do not naturally lend themselves to the MIC platform — just like to the GPUs, and that the memory latency problem needs to be addressed in order to achieve decent performance gain.

  20. Parallel protein secondary structure prediction based on neural networks.

    PubMed

    Zhong, Wei; Altun, Gulsah; Tian, Xinmin; Harrison, Robert; Tai, Phang C; Pan, Yi

    2004-01-01

    Protein secondary structure prediction has a fundamental influence on today's bioinformatics research. In this work, binary and tertiary classifiers of protein secondary structure prediction are implemented on Denoeux belief neural network (DBNN) architecture. Hydrophobicity matrix, orthogonal matrix, BLOSUM62 and PSSM (position specific scoring matrix) are experimented separately as the encoding schemes for DBNN. The experimental results contribute to the design of new encoding schemes. New binary classifier for Helix versus not Helix ( approximately H) for DBNN produces prediction accuracy of 87% when PSSM is used for the input profile. The performance of DBNN binary classifier is comparable to other best prediction methods. The good test results for binary classifiers open a new approach for protein structure prediction with neural networks. Due to the time consuming task of training the neural networks, Pthread and OpenMP are employed to parallelize DBNN in the hyperthreading enabled Intel architecture. Speedup for 16 Pthreads is 4.9 and speedup for 16 OpenMP threads is 4 in the 4 processors shared memory architecture. Both speedup performance of OpenMP and Pthread is superior to that of other research. With the new parallel training algorithm, thousands of amino acids can be processed in reasonable amount of time. Our research also shows that hyperthreading technology for Intel architecture is efficient for parallel biological algorithms.

  1. Analysis of Parallel Algorithms on SMP Node and Cluster of Workstations Using Parallel Programming Models with New Tile-based Method for Large Biological Datasets.

    PubMed

    Shrimankar, D D; Sathe, S R

    2016-01-01

    Sequence alignment is an important tool for describing the relationships between DNA sequences. Many sequence alignment algorithms exist, differing in efficiency, in their models of the sequences, and in the relationship between sequences. The focus of this study is to obtain an optimal alignment between two sequences of biological data, particularly DNA sequences. The algorithm is discussed with particular emphasis on time, speedup, and efficiency optimizations. Parallel programming presents a number of critical challenges to application developers. Today's supercomputer often consists of clusters of SMP nodes. Programming paradigms such as OpenMP and MPI are used to write parallel codes for such architectures. However, the OpenMP programs cannot be scaled for more than a single SMP node. However, programs written in MPI can have more than single SMP nodes. But such a programming paradigm has an overhead of internode communication. In this work, we explore the tradeoffs between using OpenMP and MPI. We demonstrate that the communication overhead incurs significantly even in OpenMP loop execution and increases with the number of cores participating. We also demonstrate a communication model to approximate the overhead from communication in OpenMP loops. Our results are astonishing and interesting to a large variety of input data files. We have developed our own load balancing and cache optimization technique for message passing model. Our experimental results show that our own developed techniques give optimum performance of our parallel algorithm for various sizes of input parameter, such as sequence size and tile size, on a wide variety of multicore architectures.

  2. Analysis of Parallel Algorithms on SMP Node and Cluster of Workstations Using Parallel Programming Models with New Tile-based Method for Large Biological Datasets

    PubMed Central

    Shrimankar, D. D.; Sathe, S. R.

    2016-01-01

    Sequence alignment is an important tool for describing the relationships between DNA sequences. Many sequence alignment algorithms exist, differing in efficiency, in their models of the sequences, and in the relationship between sequences. The focus of this study is to obtain an optimal alignment between two sequences of biological data, particularly DNA sequences. The algorithm is discussed with particular emphasis on time, speedup, and efficiency optimizations. Parallel programming presents a number of critical challenges to application developers. Today’s supercomputer often consists of clusters of SMP nodes. Programming paradigms such as OpenMP and MPI are used to write parallel codes for such architectures. However, the OpenMP programs cannot be scaled for more than a single SMP node. However, programs written in MPI can have more than single SMP nodes. But such a programming paradigm has an overhead of internode communication. In this work, we explore the tradeoffs between using OpenMP and MPI. We demonstrate that the communication overhead incurs significantly even in OpenMP loop execution and increases with the number of cores participating. We also demonstrate a communication model to approximate the overhead from communication in OpenMP loops. Our results are astonishing and interesting to a large variety of input data files. We have developed our own load balancing and cache optimization technique for message passing model. Our experimental results show that our own developed techniques give optimum performance of our parallel algorithm for various sizes of input parameter, such as sequence size and tile size, on a wide variety of multicore architectures. PMID:27932868

  3. OpenMP 4.5 Validation and Verification Suite

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Pophale, Swaroop S; Bernholdt, David E; Hernandez, Oscar R

    2017-12-15

    OpenMP, a directive-based programming API, introduce directives for accelerator devices that programmers are starting to use more frequently in production codes. To make sure OpenMP directives work correctly across architectures, it is critical to have a mechanism that tests for an implementation's conformance to the OpenMP standard. This testing process can uncover ambiguities in the OpenMP specification, which helps compiler developers and users make a better use of the standard. We fill this gap through our validation and verification test suite that focuses on the offload directives available in OpenMP 4.5.

  4. Roofline model toolkit: A practical tool for architectural and program analysis

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Lo, Yu Jung; Williams, Samuel; Van Straalen, Brian

    We present preliminary results of the Roofline Toolkit for multicore, many core, and accelerated architectures. This paper focuses on the processor architecture characterization engine, a collection of portable instrumented micro benchmarks implemented with Message Passing Interface (MPI), and OpenMP used to express thread-level parallelism. These benchmarks are specialized to quantify the behavior of different architectural features. Compared to previous work on performance characterization, these microbenchmarks focus on capturing the performance of each level of the memory hierarchy, along with thread-level parallelism, instruction-level parallelism and explicit SIMD parallelism, measured in the context of the compilers and run-time environments. We also measuremore » sustained PCIe throughput with four GPU memory managed mechanisms. By combining results from the architecture characterization with the Roofline model based solely on architectural specifications, this work offers insights for performance prediction of current and future architectures and their software systems. To that end, we instrument three applications and plot their resultant performance on the corresponding Roofline model when run on a Blue Gene/Q architecture.« less

  5. Using Intel's Knight Landing Processor to Accelerate Global Nested Air Quality Prediction Modeling System (GNAQPMS) Model

    NASA Astrophysics Data System (ADS)

    Wang, H.; Chen, H.; Chen, X.; Wu, Q.; Wang, Z.

    2016-12-01

    The Global Nested Air Quality Prediction Modeling System for Hg (GNAQPMS-Hg) is a global chemical transport model coupled Hg transport module to investigate the mercury pollution. In this study, we present our work of transplanting the GNAQPMS model on Intel Xeon Phi processor, Knights Landing (KNL) to accelerate the model. KNL is the second-generation product adopting Many Integrated Core Architecture (MIC) architecture. Compared with the first generation Knight Corner (KNC), KNL has more new hardware features, that it can be used as unique processor as well as coprocessor with other CPU. According to the Vtune tool, the high overhead modules in GNAQPMS model have been addressed, including CBMZ gas chemistry, advection and convection module, and wet deposition module. These high overhead modules were accelerated by optimizing code and using new techniques of KNL. The following optimized measures was done: 1) Changing the pure MPI parallel mode to hybrid parallel mode with MPI and OpenMP; 2.Vectorizing the code to using the 512-bit wide vector computation unit. 3. Reducing unnecessary memory access and calculation. 4. Reducing Thread Local Storage (TLS) for common variables with each OpenMP thread in CBMZ. 5. Changing the way of global communication from files writing and reading to MPI functions. After optimization, the performance of GNAQPMS is greatly increased both on CPU and KNL platform, the single-node test showed that optimized version has 2.6x speedup on two sockets CPU platform and 3.3x speedup on one socket KNL platform compared with the baseline version code, which means the KNL has 1.29x speedup when compared with 2 sockets CPU platform.

  6. spammpack, Version 2013-06-18

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    2014-01-17

    This library is an implementation of the Sparse Approximate Matrix Multiplication (SpAMM) algorithm introduced. It provides a matrix data type, and an approximate matrix product, which exhibits linear scaling computational complexity for matrices with decay. The product error and the performance of the multiply can be tuned by choosing an appropriate tolerance. The library can be compiled for serial execution or parallel execution on shared memory systems with an OpenMP capable compiler

  7. Enhancing Application Performance Using Mini-Apps: Comparison of Hybrid Parallel Programming Paradigms

    NASA Technical Reports Server (NTRS)

    Lawson, Gary; Poteat, Michael; Sosonkina, Masha; Baurle, Robert; Hammond, Dana

    2016-01-01

    In this work, several mini-apps have been created to enhance a real-world application performance, namely the VULCAN code for complex flow analysis developed at the NASA Langley Research Center. These mini-apps explore hybrid parallel programming paradigms with Message Passing Interface (MPI) for distributed memory access and either Shared MPI (SMPI) or OpenMP for shared memory accesses. Performance testing shows that MPI+SMPI yields the best execution performance, while requiring the largest number of code changes. A maximum speedup of 23X was measured for MPI+SMPI, but only 10X was measured for MPI+OpenMP.

  8. Parallelization of interpolation, solar radiation and water flow simulation modules in GRASS GIS using OpenMP

    NASA Astrophysics Data System (ADS)

    Hofierka, Jaroslav; Lacko, Michal; Zubal, Stanislav

    2017-10-01

    In this paper, we describe the parallelization of three complex and computationally intensive modules of GRASS GIS using the OpenMP application programming interface for multi-core computers. These include the v.surf.rst module for spatial interpolation, the r.sun module for solar radiation modeling and the r.sim.water module for water flow simulation. We briefly describe the functionality of the modules and parallelization approaches used in the modules. Our approach includes the analysis of the module's functionality, identification of source code segments suitable for parallelization and proper application of OpenMP parallelization code to create efficient threads processing the subtasks. We document the efficiency of the solutions using the airborne laser scanning data representing land surface in the test area and derived high-resolution digital terrain model grids. We discuss the performance speed-up and parallelization efficiency depending on the number of processor threads. The study showed a substantial increase in computation speeds on a standard multi-core computer while maintaining the accuracy of results in comparison to the output from original modules. The presented parallelization approach showed the simplicity and efficiency of the parallelization of open-source GRASS GIS modules using OpenMP, leading to an increased performance of this geospatial software on standard multi-core computers.

  9. Use of general purpose graphics processing units with MODFLOW

    USGS Publications Warehouse

    Hughes, Joseph D.; White, Jeremy T.

    2013-01-01

    To evaluate the use of general-purpose graphics processing units (GPGPUs) to improve the performance of MODFLOW, an unstructured preconditioned conjugate gradient (UPCG) solver has been developed. The UPCG solver uses a compressed sparse row storage scheme and includes Jacobi, zero fill-in incomplete, and modified-incomplete lower-upper (LU) factorization, and generalized least-squares polynomial preconditioners. The UPCG solver also includes options for sequential and parallel solution on the central processing unit (CPU) using OpenMP. For simulations utilizing the GPGPU, all basic linear algebra operations are performed on the GPGPU; memory copies between the central processing unit CPU and GPCPU occur prior to the first iteration of the UPCG solver and after satisfying head and flow criteria or exceeding a maximum number of iterations. The efficiency of the UPCG solver for GPGPU and CPU solutions is benchmarked using simulations of a synthetic, heterogeneous unconfined aquifer with tens of thousands to millions of active grid cells. Testing indicates GPGPU speedups on the order of 2 to 8, relative to the standard MODFLOW preconditioned conjugate gradient (PCG) solver, can be achieved when (1) memory copies between the CPU and GPGPU are optimized, (2) the percentage of time performing memory copies between the CPU and GPGPU is small relative to the calculation time, (3) high-performance GPGPU cards are utilized, and (4) CPU-GPGPU combinations are used to execute sequential operations that are difficult to parallelize. Furthermore, UPCG solver testing indicates GPGPU speedups exceed parallel CPU speedups achieved using OpenMP on multicore CPUs for preconditioners that can be easily parallelized.

  10. Automatic Multilevel Parallelization Using OpenMP

    NASA Technical Reports Server (NTRS)

    Jin, Hao-Qiang; Jost, Gabriele; Yan, Jerry; Ayguade, Eduard; Gonzalez, Marc; Martorell, Xavier; Biegel, Bryan (Technical Monitor)

    2002-01-01

    In this paper we describe the extension of the CAPO (CAPtools (Computer Aided Parallelization Toolkit) OpenMP) parallelization support tool to support multilevel parallelism based on OpenMP directives. CAPO generates OpenMP directives with extensions supported by the NanosCompiler to allow for directive nesting and definition of thread groups. We report some results for several benchmark codes and one full application that have been parallelized using our system.

  11. On a model of three-dimensional bursting and its parallel implementation

    NASA Astrophysics Data System (ADS)

    Tabik, S.; Romero, L. F.; Garzón, E. M.; Ramos, J. I.

    2008-04-01

    A mathematical model for the simulation of three-dimensional bursting phenomena and its parallel implementation are presented. The model consists of four nonlinearly coupled partial differential equations that include fast and slow variables, and exhibits bursting in the absence of diffusion. The differential equations have been discretized by means of a second-order accurate in both space and time, linearly-implicit finite difference method in equally-spaced grids. The resulting system of linear algebraic equations at each time level has been solved by means of the Preconditioned Conjugate Gradient (PCG) method. Three different parallel implementations of the proposed mathematical model have been developed; two of these implementations, i.e., the MPI and the PETSc codes, are based on a message passing paradigm, while the third one, i.e., the OpenMP code, is based on a shared space address paradigm. These three implementations are evaluated on two current high performance parallel architectures, i.e., a dual-processor cluster and a Shared Distributed Memory (SDM) system. A novel representation of the results that emphasizes the most relevant factors that affect the performance of the paralled implementations, is proposed. The comparative analysis of the computational results shows that the MPI and the OpenMP implementations are about twice more efficient than the PETSc code on the SDM system. It is also shown that, for the conditions reported here, the nonlinear dynamics of the three-dimensional bursting phenomena exhibits three stages characterized by asynchronous, synchronous and then asynchronous oscillations, before a quiescent state is reached. It is also shown that the fast system reaches steady state in much less time than the slow variables.

  12. Implementation of Parallel Dynamic Simulation on Shared-Memory vs. Distributed-Memory Environments

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Jin, Shuangshuang; Chen, Yousu; Wu, Di

    2015-12-09

    Power system dynamic simulation computes the system response to a sequence of large disturbance, such as sudden changes in generation or load, or a network short circuit followed by protective branch switching operation. It consists of a large set of differential and algebraic equations, which is computational intensive and challenging to solve using single-processor based dynamic simulation solution. High-performance computing (HPC) based parallel computing is a very promising technology to speed up the computation and facilitate the simulation process. This paper presents two different parallel implementations of power grid dynamic simulation using Open Multi-processing (OpenMP) on shared-memory platform, and Messagemore » Passing Interface (MPI) on distributed-memory clusters, respectively. The difference of the parallel simulation algorithms and architectures of the two HPC technologies are illustrated, and their performances for running parallel dynamic simulation are compared and demonstrated.« less

  13. GPU acceleration of a petascale application for turbulent mixing at high Schmidt number using OpenMP 4.5

    NASA Astrophysics Data System (ADS)

    Clay, M. P.; Buaria, D.; Yeung, P. K.; Gotoh, T.

    2018-07-01

    This paper reports on the successful implementation of a massively parallel GPU-accelerated algorithm for the direct numerical simulation of turbulent mixing at high Schmidt number. The work stems from a recent development (Comput. Phys. Commun., vol. 219, 2017, 313-328), in which a low-communication algorithm was shown to attain high degrees of scalability on the Cray XE6 architecture when overlapping communication and computation via dedicated communication threads. An even higher level of performance has now been achieved using OpenMP 4.5 on the Cray XK7 architecture, where on each node the 16 integer cores of an AMD Interlagos processor share a single Nvidia K20X GPU accelerator. In the new algorithm, data movements are minimized by performing virtually all of the intensive scalar field computations in the form of combined compact finite difference (CCD) operations on the GPUs. A memory layout in departure from usual practices is found to provide much better performance for a specific kernel required to apply the CCD scheme. Asynchronous execution enabled by adding the OpenMP 4.5 NOWAIT clause to TARGET constructs improves scalability when used to overlap computation on the GPUs with computation and communication on the CPUs. On the 27-petaflops supercomputer Titan at Oak Ridge National Laboratory, USA, a GPU-to-CPU speedup factor of approximately 5 is consistently observed at the largest problem size of 81923 grid points for the scalar field computed with 8192 XK7 nodes.

  14. Real-time implementations of image segmentation algorithms on shared memory multicore architecture: a survey (Conference Presentation)

    NASA Astrophysics Data System (ADS)

    Akil, Mohamed

    2017-05-01

    The real-time processing is getting more and more important in many image processing applications. Image segmentation is one of the most fundamental tasks image analysis. As a consequence, many different approaches for image segmentation have been proposed. The watershed transform is a well-known image segmentation tool. The watershed transform is a very data intensive task. To achieve acceleration and obtain real-time processing of watershed algorithms, parallel architectures and programming models for multicore computing have been developed. This paper focuses on the survey of the approaches for parallel implementation of sequential watershed algorithms on multicore general purpose CPUs: homogeneous multicore processor with shared memory. To achieve an efficient parallel implementation, it's necessary to explore different strategies (parallelization/distribution/distributed scheduling) combined with different acceleration and optimization techniques to enhance parallelism. In this paper, we give a comparison of various parallelization of sequential watershed algorithms on shared memory multicore architecture. We analyze the performance measurements of each parallel implementation and the impact of the different sources of overhead on the performance of the parallel implementations. In this comparison study, we also discuss the advantages and disadvantages of the parallel programming models. Thus, we compare the OpenMP (an application programming interface for multi-Processing) with Ptheads (POSIX Threads) to illustrate the impact of each parallel programming model on the performance of the parallel implementations.

  15. KITTEN Lightweight Kernel 0.1 Beta

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Pedretti, Kevin; Levenhagen, Michael; Kelly, Suzanne

    2007-12-12

    The Kitten Lightweight Kernel is a simplified OS (operating system) kernel that is intended to manage a compute node's hardware resources. It provides a set of mechanisms to user-level applications for utilizing hardware resources (e.g., allocating memory, creating processes, accessing the network). Kitten is much simpler than general-purpose OS kernels, such as Linux or Windows, but includes all of the esssential functionality needed to support HPC (high-performance computing) MPI, PGAS and OpenMP applications. Kitten provides unique capabilities such as physically contiguous application memory, transparent large page support, and noise-free tick-less operation, which enable HPC applications to obtain greater efficiency andmore » scalability than with general purpose OS kernels.« less

  16. The Research of the Parallel Computing Development from the Angle of Cloud Computing

    NASA Astrophysics Data System (ADS)

    Peng, Zhensheng; Gong, Qingge; Duan, Yanyu; Wang, Yun

    2017-10-01

    Cloud computing is the development of parallel computing, distributed computing and grid computing. The development of cloud computing makes parallel computing come into people’s lives. Firstly, this paper expounds the concept of cloud computing and introduces two several traditional parallel programming model. Secondly, it analyzes and studies the principles, advantages and disadvantages of OpenMP, MPI and Map Reduce respectively. Finally, it takes MPI, OpenMP models compared to Map Reduce from the angle of cloud computing. The results of this paper are intended to provide a reference for the development of parallel computing.

  17. OpenMP performance for benchmark 2D shallow water equations using LBM

    NASA Astrophysics Data System (ADS)

    Sabri, Khairul; Rabbani, Hasbi; Gunawan, Putu Harry

    2018-03-01

    Shallow water equations or commonly referred as Saint-Venant equations are used to model fluid phenomena. These equations can be solved numerically using several methods, like Lattice Boltzmann method (LBM), SIMPLE-like Method, Finite Difference Method, Godunov-type Method, and Finite Volume Method. In this paper, the shallow water equation will be approximated using LBM or known as LABSWE and will be simulated in performance of parallel programming using OpenMP. To evaluate the performance between 2 and 4 threads parallel algorithm, ten various number of grids Lx and Ly are elaborated. The results show that using OpenMP platform, the computational time for solving LABSWE can be decreased. For instance using grid sizes 1000 × 500, the speedup of 2 and 4 threads is observed 93.54 s and 333.243 s respectively.

  18. Argobots: A Lightweight Low-Level Threading and Tasking Framework

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Seo, Sangmin; Amer, Abdelhalim; Balaji, Pavan

    In the past few decades, a number of user-level threading and tasking models have been proposed in the literature to address the shortcomings of OS-level threads, primarily with respect to cost and flexibility. Current state-of-the-art user-level threading and tasking models, however, are either too specific to applications or architectures or are not as powerful or flexible. In this paper, we present Argobots, a lightweight, low-level threading and tasking framework that is designed as a portable and performant substrate for high-level programming models or runtime systems. Argobots offers a carefully designed execution model that balances generality of functionality with providing amore » rich set of controls to allow specialization by the user or high-level programming model. We describe the design, implementation, and optimization of Argobots and present integrations with three example high-level models: OpenMP, MPI, and co-located I/O service. Evaluations show that (1) Argobots outperforms existing generic threading runtimes; (2) our OpenMP runtime offers more efficient interoperability capabilities than production OpenMP runtimes do; (3) when MPI interoperates with Argobots instead of Pthreads, it enjoys reduced synchronization costs and better latency hiding capabilities; and (4) I/O service with Argobots reduces interference with co-located applications, achieving performance competitive with that of the Pthreads version.« less

  19. OpenMP Performance on the Columbia Supercomputer

    NASA Technical Reports Server (NTRS)

    Haoqiang, Jin; Hood, Robert

    2005-01-01

    This presentation discusses Columbia World Class Supercomputer which is one of the world's fastest supercomputers providing 61 TFLOPs (10/20/04). Conceived, designed, built, and deployed in just 120 days. A 20-node supercomputer built on proven 512-processor nodes. The largest SGI system in the world with over 10,000 Intel Itanium 2 processors and provides the largest node size incorporating commodity parts (512) and the largest shared-memory environment (2048) with 88% efficiency tops the scalar systems on the Top500 list.

  20. Kernel optimization for short-range molecular dynamics

    NASA Astrophysics Data System (ADS)

    Hu, Changjun; Wang, Xianmeng; Li, Jianjiang; He, Xinfu; Li, Shigang; Feng, Yangde; Yang, Shaofeng; Bai, He

    2017-02-01

    To optimize short-range force computations in Molecular Dynamics (MD) simulations, multi-threading and SIMD optimizations are presented in this paper. With respect to multi-threading optimization, a Partition-and-Separate-Calculation (PSC) method is designed to avoid write conflicts caused by using Newton's third law. Serial bottlenecks are eliminated with no additional memory usage. The method is implemented by using the OpenMP model. Furthermore, the PSC method is employed on Intel Xeon Phi coprocessors in both native and offload models. We also evaluate the performance of the PSC method under different thread affinities on the MIC architecture. In the SIMD execution, we explain the performance influence in the PSC method, considering the "if-clause" of the cutoff radius check. The experiment results show that our PSC method is relatively more efficient compared to some traditional methods. In double precision, our 256-bit SIMD implementation is about 3 times faster than the scalar version.

  1. A Locality-Based Threading Algorithm for the Configuration-Interaction Method

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Shan, Hongzhang; Williams, Samuel; Johnson, Calvin

    The Configuration Interaction (CI) method has been widely used to solve the non-relativistic many-body Schrodinger equation. One great challenge to implementing it efficiently on manycore architectures is its immense memory and data movement requirements. To address this issue, within each node, we exploit a hybrid MPI+OpenMP programming model in lieu of the traditional flat MPI programming model. Here in this paper, we develop optimizations that partition the workloads among OpenMP threads based on data locality,-which is essential in ensuring applications with complex data access patterns scale well on manycore architectures. The new algorithm scales to 256 threadson the 64-core Intelmore » Knights Landing (KNL) manycore processor and 24 threads on dual-socket Ivy Bridge (Xeon) nodes. Compared with the original implementation, the performance has been improved by up to 7× on theKnights Landing processor and 3× on the dual-socket Ivy Bridge node.« less

  2. A Locality-Based Threading Algorithm for the Configuration-Interaction Method

    DOE PAGES

    Shan, Hongzhang; Williams, Samuel; Johnson, Calvin; ...

    2017-07-03

    The Configuration Interaction (CI) method has been widely used to solve the non-relativistic many-body Schrodinger equation. One great challenge to implementing it efficiently on manycore architectures is its immense memory and data movement requirements. To address this issue, within each node, we exploit a hybrid MPI+OpenMP programming model in lieu of the traditional flat MPI programming model. Here in this paper, we develop optimizations that partition the workloads among OpenMP threads based on data locality,-which is essential in ensuring applications with complex data access patterns scale well on manycore architectures. The new algorithm scales to 256 threadson the 64-core Intelmore » Knights Landing (KNL) manycore processor and 24 threads on dual-socket Ivy Bridge (Xeon) nodes. Compared with the original implementation, the performance has been improved by up to 7× on theKnights Landing processor and 3× on the dual-socket Ivy Bridge node.« less

  3. Employing Nested OpenMP for the Parallelization of Multi-Zone Computational Fluid Dynamics Applications

    NASA Technical Reports Server (NTRS)

    Ayguade, Eduard; Gonzalez, Marc; Martorell, Xavier; Jost, Gabriele

    2004-01-01

    In this paper we describe the parallelization of the multi-zone code versions of the NAS Parallel Benchmarks employing multi-level OpenMP parallelism. For our study we use the NanosCompiler, which supports nesting of OpenMP directives and provides clauses to control the grouping of threads, load balancing, and synchronization. We report the benchmark results, compare the timings with those of different hybrid parallelization paradigms and discuss OpenMP implementation issues which effect the performance of multi-level parallel applications.

  4. PARALLELISATION OF THE MODEL-BASED ITERATIVE RECONSTRUCTION ALGORITHM DIRA.

    PubMed

    Örtenberg, A; Magnusson, M; Sandborg, M; Alm Carlsson, G; Malusek, A

    2016-06-01

    New paradigms for parallel programming have been devised to simplify software development on multi-core processors and many-core graphical processing units (GPU). Despite their obvious benefits, the parallelisation of existing computer programs is not an easy task. In this work, the use of the Open Multiprocessing (OpenMP) and Open Computing Language (OpenCL) frameworks is considered for the parallelisation of the model-based iterative reconstruction algorithm DIRA with the aim to significantly shorten the code's execution time. Selected routines were parallelised using OpenMP and OpenCL libraries; some routines were converted from MATLAB to C and optimised. Parallelisation of the code with the OpenMP was easy and resulted in an overall speedup of 15 on a 16-core computer. Parallelisation with OpenCL was more difficult owing to differences between the central processing unit and GPU architectures. The resulting speedup was substantially lower than the theoretical peak performance of the GPU; the cause was explained. © The Author 2015. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com.

  5. [Series: Medical Applications of the PHITS Code (2): Acceleration by Parallel Computing].

    PubMed

    Furuta, Takuya; Sato, Tatsuhiko

    2015-01-01

    Time-consuming Monte Carlo dose calculation becomes feasible owing to the development of computer technology. However, the recent development is due to emergence of the multi-core high performance computers. Therefore, parallel computing becomes a key to achieve good performance of software programs. A Monte Carlo simulation code PHITS contains two parallel computing functions, the distributed-memory parallelization using protocols of message passing interface (MPI) and the shared-memory parallelization using open multi-processing (OpenMP) directives. Users can choose the two functions according to their needs. This paper gives the explanation of the two functions with their advantages and disadvantages. Some test applications are also provided to show their performance using a typical multi-core high performance workstation.

  6. OSCAR API for Real-Time Low-Power Multicores and Its Performance on Multicores and SMP Servers

    NASA Astrophysics Data System (ADS)

    Kimura, Keiji; Mase, Masayoshi; Mikami, Hiroki; Miyamoto, Takamichi; Shirako, Jun; Kasahara, Hironori

    OSCAR (Optimally Scheduled Advanced Multiprocessor) API has been designed for real-time embedded low-power multicores to generate parallel programs for various multicores from different vendors by using the OSCAR parallelizing compiler. The OSCAR API has been developed by Waseda University in collaboration with Fujitsu Laboratory, Hitachi, NEC, Panasonic, Renesas Technology, and Toshiba in an METI/NEDO project entitled "Multicore Technology for Realtime Consumer Electronics." By using the OSCAR API as an interface between the OSCAR compiler and backend compilers, the OSCAR compiler enables hierarchical multigrain parallel processing with memory optimization under capacity restriction for cache memory, local memory, distributed shared memory, and on-chip/off-chip shared memory; data transfer using a DMA controller; and power reduction control using DVFS (Dynamic Voltage and Frequency Scaling), clock gating, and power gating for various embedded multicores. In addition, a parallelized program automatically generated by the OSCAR compiler with OSCAR API can be compiled by the ordinary OpenMP compilers since the OSCAR API is designed on a subset of the OpenMP. This paper describes the OSCAR API and its compatibility with the OSCAR compiler by showing code examples. Performance evaluations of the OSCAR compiler and the OSCAR API are carried out using an IBM Power5+ workstation, an IBM Power6 high-end SMP server, and a newly developed consumer electronics multicore chip RP2 by Renesas, Hitachi and Waseda. From the results of scalability evaluation, it is found that on an average, the OSCAR compiler with the OSCAR API can exploit 5.8 times speedup over the sequential execution on the Power5+ workstation with eight cores and 2.9 times speedup on RP2 with four cores, respectively. In addition, the OSCAR compiler can accelerate an IBM XL Fortran compiler up to 3.3 times on the Power6 SMP server. Due to low-power optimization on RP2, the OSCAR compiler with the OSCAR API achieves a maximum power reduction of 84% in the real-time execution mode.

  7. GNAQPMS v1.1: accelerating the Global Nested Air Quality Prediction Modeling System (GNAQPMS) on Intel Xeon Phi processors

    NASA Astrophysics Data System (ADS)

    Wang, Hui; Chen, Huansheng; Wu, Qizhong; Lin, Junmin; Chen, Xueshun; Xie, Xinwei; Wang, Rongrong; Tang, Xiao; Wang, Zifa

    2017-08-01

    The Global Nested Air Quality Prediction Modeling System (GNAQPMS) is the global version of the Nested Air Quality Prediction Modeling System (NAQPMS), which is a multi-scale chemical transport model used for air quality forecast and atmospheric environmental research. In this study, we present the porting and optimisation of GNAQPMS on a second-generation Intel Xeon Phi processor, codenamed Knights Landing (KNL). Compared with the first-generation Xeon Phi coprocessor (codenamed Knights Corner, KNC), KNL has many new hardware features such as a bootable processor, high-performance in-package memory and ISA compatibility with Intel Xeon processors. In particular, we describe the five optimisations we applied to the key modules of GNAQPMS, including the CBM-Z gas-phase chemistry, advection, convection and wet deposition modules. These optimisations work well on both the KNL 7250 processor and the Intel Xeon E5-2697 V4 processor. They include (1) updating the pure Message Passing Interface (MPI) parallel mode to the hybrid parallel mode with MPI and OpenMP in the emission, advection, convection and gas-phase chemistry modules; (2) fully employing the 512 bit wide vector processing units (VPUs) on the KNL platform; (3) reducing unnecessary memory access to improve cache efficiency; (4) reducing the thread local storage (TLS) in the CBM-Z gas-phase chemistry module to improve its OpenMP performance; and (5) changing the global communication from writing/reading interface files to MPI functions to improve the performance and the parallel scalability. These optimisations greatly improved the GNAQPMS performance. The same optimisations also work well for the Intel Xeon Broadwell processor, specifically E5-2697 v4. Compared with the baseline version of GNAQPMS, the optimised version was 3.51 × faster on KNL and 2.77 × faster on the CPU. Moreover, the optimised version ran at 26 % lower average power on KNL than on the CPU. With the combined performance and energy improvement, the KNL platform was 37.5 % more efficient on power consumption compared with the CPU platform. The optimisations also enabled much further parallel scalability on both the CPU cluster and the KNL cluster scaled to 40 CPU nodes and 30 KNL nodes, with a parallel efficiency of 70.4 and 42.2 %, respectively.

  8. A numerical differentiation library exploiting parallel architectures

    NASA Astrophysics Data System (ADS)

    Voglis, C.; Hadjidoukas, P. E.; Lagaris, I. E.; Papageorgiou, D. G.

    2009-08-01

    We present a software library for numerically estimating first and second order partial derivatives of a function by finite differencing. Various truncation schemes are offered resulting in corresponding formulas that are accurate to order O(h), O(h), and O(h), h being the differencing step. The derivatives are calculated via forward, backward and central differences. Care has been taken that only feasible points are used in the case where bound constraints are imposed on the variables. The Hessian may be approximated either from function or from gradient values. There are three versions of the software: a sequential version, an OpenMP version for shared memory architectures and an MPI version for distributed systems (clusters). The parallel versions exploit the multiprocessing capability offered by computer clusters, as well as modern multi-core systems and due to the independent character of the derivative computation, the speedup scales almost linearly with the number of available processors/cores. Program summaryProgram title: NDL (Numerical Differentiation Library) Catalogue identifier: AEDG_v1_0 Program summary URL:http://cpc.cs.qub.ac.uk/summaries/AEDG_v1_0.html Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland Licensing provisions: Standard CPC licence, http://cpc.cs.qub.ac.uk/licence/licence.html No. of lines in distributed program, including test data, etc.: 73 030 No. of bytes in distributed program, including test data, etc.: 630 876 Distribution format: tar.gz Programming language: ANSI FORTRAN-77, ANSI C, MPI, OPENMP Computer: Distributed systems (clusters), shared memory systems Operating system: Linux, Solaris Has the code been vectorised or parallelized?: Yes RAM: The library uses O(N) internal storage, N being the dimension of the problem Classification: 4.9, 4.14, 6.5 Nature of problem: The numerical estimation of derivatives at several accuracy levels is a common requirement in many computational tasks, such as optimization, solution of nonlinear systems, etc. The parallel implementation that exploits systems with multiple CPUs is very important for large scale and computationally expensive problems. Solution method: Finite differencing is used with carefully chosen step that minimizes the sum of the truncation and round-off errors. The parallel versions employ both OpenMP and MPI libraries. Restrictions: The library uses only double precision arithmetic. Unusual features: The software takes into account bound constraints, in the sense that only feasible points are used to evaluate the derivatives, and given the level of the desired accuracy, the proper formula is automatically employed. Running time: Running time depends on the function's complexity. The test run took 15 ms for the serial distribution, 0.6 s for the OpenMP and 4.2 s for the MPI parallel distribution on 2 processors.

  9. Argobots: A Lightweight Low-Level Threading and Tasking Framework

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Seo, Sangmin; Amer, Abdelhalim; Balaji, Pavan

    In the past few decades, a number of user-level threading and tasking models have been proposed in the literature to address the shortcomings of OS-level threads, primarily with respect to cost and flexibility. Current state-of-the-art user-level threading and tasking models, however, either are too specific to applications or architectures or are not as powerful or flexible. In this paper, we present Argobots, a lightweight, low-level threading and tasking framework that is designed as a portable and performant substrate for high-level programming models or runtime systems. Argobots offers a carefully designed execution model that balances generality of functionality with providing amore » rich set of controls to allow specialization by end users or high-level programming models. We describe the design, implementation, and performance characterization of Argobots and present integrations with three high-level models: OpenMP, MPI, and colocated I/O services. Evaluations show that (1) Argobots, while providing richer capabilities, is competitive with existing simpler generic threading runtimes; (2) our OpenMP runtime offers more efficient interoperability capabilities than production OpenMP runtimes do; (3) when MPI interoperates with Argobots instead of Pthreads, it enjoys reduced synchronization costs and better latency-hiding capabilities; and (4) I/O services with Argobots reduce interference with colocated applications while achieving performance competitive with that of a Pthreads approach.« less

  10. Argobots: A Lightweight Low-Level Threading and Tasking Framework

    DOE PAGES

    Seo, Sangmin; Amer, Abdelhalim; Balaji, Pavan; ...

    2017-10-24

    In the past few decades, a number of user-level threading and tasking models have been proposed in the literature to address the shortcomings of OS-level threads, primarily with respect to cost and flexibility. Current state-of-the-art user-level threading and tasking models, however, are either too specific to applications or architectures or are not as powerful or flexible. In this article, we present Argobots, a lightweight, low-level threading and tasking framework that is designed as a portable and performant substrate for high-level programming models or runtime systems. Argobots offers a carefully designed execution model that balances generality of functionality with providing amore » rich set of controls to allow specialization by the user or high-level programming model. Here, we describe the design, implementation, and optimization of Argobots and present integrations with three example high-level models: OpenMP, MPI, and co-located I/O service. Evaluations show that (1) Argobots outperforms existing generic threading runtimes; (2) our OpenMP runtime offers more efficient interoperability capabilities than production OpenMP runtimes do; (3) when MPI interoperates with Argobots instead of Pthreads, it enjoys reduced synchronization costs and better latency hiding capabilities; and (4) I/O service with Argobots reduces interference with co-located applications, achieving performance competitive with that of the Pthreads version.« less

  11. Argobots: A Lightweight Low-Level Threading and Tasking Framework

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Seo, Sangmin; Amer, Abdelhalim; Balaji, Pavan

    In the past few decades, a number of user-level threading and tasking models have been proposed in the literature to address the shortcomings of OS-level threads, primarily with respect to cost and flexibility. Current state-of-the-art user-level threading and tasking models, however, are either too specific to applications or architectures or are not as powerful or flexible. In this article, we present Argobots, a lightweight, low-level threading and tasking framework that is designed as a portable and performant substrate for high-level programming models or runtime systems. Argobots offers a carefully designed execution model that balances generality of functionality with providing amore » rich set of controls to allow specialization by the user or high-level programming model. Here, we describe the design, implementation, and optimization of Argobots and present integrations with three example high-level models: OpenMP, MPI, and co-located I/O service. Evaluations show that (1) Argobots outperforms existing generic threading runtimes; (2) our OpenMP runtime offers more efficient interoperability capabilities than production OpenMP runtimes do; (3) when MPI interoperates with Argobots instead of Pthreads, it enjoys reduced synchronization costs and better latency hiding capabilities; and (4) I/O service with Argobots reduces interference with co-located applications, achieving performance competitive with that of the Pthreads version.« less

  12. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Langer, Steven H.; Karlin, Ian; Marinak, Marty M.

    HYDRA is used to simulate a variety of experiments carried out at the National Ignition Facility (NIF) [4] and other high energy density physics facilities. HYDRA has packages to simulate radiation transfer, atomic physics, hydrodynamics, laser propagation, and a number of other physics effects. HYDRA has over one million lines of code and includes both MPI and thread-level (OpenMP and pthreads) parallelism. This paper measures the performance characteristics of HYDRA using hardware counters on an IBM BlueGene/Q system. We report key ratios such as bytes/instruction and memory bandwidth for several different physics packages. The total number of bytes read andmore » written per time step is also reported. We show that none of the packages which use significant time are memory bandwidth limited on a Blue Gene/Q. HYDRA currently issues very few SIMD instructions. The pressure on memory bandwidth will increase if high levels of SIMD instructions can be achieved.« less

  13. Automatic Multilevel Parallelization Using OpenMP

    NASA Technical Reports Server (NTRS)

    Jin, Hao-Qiang; Jost, Gabriele; Yan, Jerry; Ayguade, Eduard; Gonzalez, Marc; Martorell, Xavier; Biegel, Bryan (Technical Monitor)

    2002-01-01

    In this paper we describe the extension of the CAPO parallelization support tool to support multilevel parallelism based on OpenMP directives. CAPO generates OpenMP directives with extensions supported by the NanosCompiler to allow for directive nesting and definition of thread groups. We report first results for several benchmark codes and one full application that have been parallelized using our system.

  14. Use Computer-Aided Tools to Parallelize Large CFD Applications

    NASA Technical Reports Server (NTRS)

    Jin, H.; Frumkin, M.; Yan, J.

    2000-01-01

    Porting applications to high performance parallel computers is always a challenging task. It is time consuming and costly. With rapid progressing in hardware architectures and increasing complexity of real applications in recent years, the problem becomes even more sever. Today, scalability and high performance are mostly involving handwritten parallel programs using message-passing libraries (e.g. MPI). However, this process is very difficult and often error-prone. The recent reemergence of shared memory parallel (SMP) architectures, such as the cache coherent Non-Uniform Memory Access (ccNUMA) architecture used in the SGI Origin 2000, show good prospects for scaling beyond hundreds of processors. Programming on an SMP is simplified by working in a globally accessible address space. The user can supply compiler directives, such as OpenMP, to parallelize the code. As an industry standard for portable implementation of parallel programs for SMPs, OpenMP is a set of compiler directives and callable runtime library routines that extend Fortran, C and C++ to express shared memory parallelism. It promises an incremental path for parallel conversion of existing software, as well as scalability and performance for a complete rewrite or an entirely new development. Perhaps the main disadvantage of programming with directives is that inserted directives may not necessarily enhance performance. In the worst cases, it can create erroneous results. While vendors have provided tools to perform error-checking and profiling, automation in directive insertion is very limited and often failed on large programs, primarily due to the lack of a thorough enough data dependence analysis. To overcome the deficiency, we have developed a toolkit, CAPO, to automatically insert OpenMP directives in Fortran programs and apply certain degrees of optimization. CAPO is aimed at taking advantage of detailed inter-procedural dependence analysis provided by CAPTools, developed by the University of Greenwich, to reduce potential errors made by users. Earlier tests on NAS Benchmarks and ARC3D have demonstrated good success of this tool. In this study, we have applied CAPO to parallelize three large applications in the area of computational fluid dynamics (CFD): OVERFLOW, TLNS3D and INS3D. These codes are widely used for solving Navier-Stokes equations with complicated boundary conditions and turbulence model in multiple zones. Each one comprises of from 50K to 1,00k lines of FORTRAN77. As an example, CAPO took 77 hours to complete the data dependence analysis of OVERFLOW on a workstation (SGI, 175MHz, R10K processor). A fair amount of effort was spent on correcting false dependencies due to lack of necessary knowledge during the analysis. Even so, CAPO provides an easy way for user to interact with the parallelization process. The OpenMP version was generated within a day after the analysis was completed. Due to sequential algorithms involved, code sections in TLNS3D and INS3D need to be restructured by hand to produce more efficient parallel codes. An included figure shows preliminary test results of the generated OVERFLOW with several test cases in single zone. The MPI data points for the small test case were taken from a handcoded MPI version. As we can see, CAPO's version has achieved 18 fold speed up on 32 nodes of the SGI O2K. For the small test case, it outperformed the MPI version. These results are very encouraging, but further work is needed. For example, although CAPO attempts to place directives on the outer- most parallel loops in an interprocedural framework, it does not insert directives based on the best manual strategy. In particular, it lacks the support of parallelization at the multi-zone level. Future work will emphasize on the development of methodology to work in a multi-zone level and with a hybrid approach. Development of tools to perform more complicated code transformation is also needed.

  15. BLESS 2: accurate, memory-efficient and fast error correction method.

    PubMed

    Heo, Yun; Ramachandran, Anand; Hwu, Wen-Mei; Ma, Jian; Chen, Deming

    2016-08-01

    The most important features of error correction tools for sequencing data are accuracy, memory efficiency and fast runtime. The previous version of BLESS was highly memory-efficient and accurate, but it was too slow to handle reads from large genomes. We have developed a new version of BLESS to improve runtime and accuracy while maintaining a small memory usage. The new version, called BLESS 2, has an error correction algorithm that is more accurate than BLESS, and the algorithm has been parallelized using hybrid MPI and OpenMP programming. BLESS 2 was compared with five top-performing tools, and it was found to be the fastest when it was executed on two computing nodes using MPI, with each node containing twelve cores. Also, BLESS 2 showed at least 11% higher gain while retaining the memory efficiency of the previous version for large genomes. Freely available at https://sourceforge.net/projects/bless-ec dchen@illinois.edu Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.

  16. Thread-level parallelization and optimization of NWChem for the Intel MIC architecture

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Shan, Hongzhang; Williams, Samuel; de Jong, Wibe

    In the multicore era it was possible to exploit the increase in on-chip parallelism by simply running multiple MPI processes per chip. Unfortunately, manycore processors' greatly increased thread- and data-level parallelism coupled with a reduced memory capacity demand an altogether different approach. In this paper we explore augmenting two NWChem modules, triples correction of the CCSD(T) and Fock matrix construction, with OpenMP in order that they might run efficiently on future manycore architectures. As the next NERSC machine will be a self-hosted Intel MIC (Xeon Phi) based supercomputer, we leverage an existing MIC testbed at NERSC to evaluate our experiments.more » In order to proxy the fact that future MIC machines will not have a host processor, we run all of our experiments in native mode. We found that while straightforward application of OpenMP to the deep loop nests associated with the tensor contractions of CCSD(T) was sufficient in attaining high performance, significant e ort was required to safely and efeciently thread the TEXAS integral package when constructing the Fock matrix. Ultimately, our new MPI+OpenMP hybrid implementations attain up to 65× better performance for the triples part of the CCSD(T) due in large part to the fact that the limited on-card memory limits the existing MPI implementation to a single process per card. Additionally, we obtain up to 1.6× better performance on Fock matrix constructions when compared with the best MPI implementations running multiple processes per card.« less

  17. Extending Automatic Parallelization to Optimize High-Level Abstractions for Multicore

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Liao, C; Quinlan, D J; Willcock, J J

    2008-12-12

    Automatic introduction of OpenMP for sequential applications has attracted significant attention recently because of the proliferation of multicore processors and the simplicity of using OpenMP to express parallelism for shared-memory systems. However, most previous research has only focused on C and Fortran applications operating on primitive data types. C++ applications using high-level abstractions, such as STL containers and complex user-defined types, are largely ignored due to the lack of research compilers that are readily able to recognize high-level object-oriented abstractions and leverage their associated semantics. In this paper, we automatically parallelize C++ applications using ROSE, a multiple-language source-to-source compiler infrastructuremore » which preserves the high-level abstractions and gives us access to their semantics. Several representative parallelization candidate kernels are used to explore semantic-aware parallelization strategies for high-level abstractions, combined with extended compiler analyses. Those kernels include an array-base computation loop, a loop with task-level parallelism, and a domain-specific tree traversal. Our work extends the applicability of automatic parallelization to modern applications using high-level abstractions and exposes more opportunities to take advantage of multicore processors.« less

  18. Data preprocessing for determining outer/inner parallelization in the nested loop problem using OpenMP

    NASA Astrophysics Data System (ADS)

    Handhika, T.; Bustamam, A.; Ernastuti, Kerami, D.

    2017-07-01

    Multi-thread programming using OpenMP on the shared-memory architecture with hyperthreading technology allows the resource to be accessed by multiple processors simultaneously. Each processor can execute more than one thread for a certain period of time. However, its speedup depends on the ability of the processor to execute threads in limited quantities, especially the sequential algorithm which contains a nested loop. The number of the outer loop iterations is greater than the maximum number of threads that can be executed by a processor. The thread distribution technique that had been found previously only be applied by the high-level programmer. This paper generates a parallelization procedure for low-level programmer in dealing with 2-level nested loop problems with the maximum number of threads that can be executed by a processor is smaller than the number of the outer loop iterations. Data preprocessing which is related to the number of the outer loop and the inner loop iterations, the computational time required to execute each iteration and the maximum number of threads that can be executed by a processor are used as a strategy to determine which parallel region that will produce optimal speedup.

  19. Code Parallelization with CAPO: A User Manual

    NASA Technical Reports Server (NTRS)

    Jin, Hao-Qiang; Frumkin, Michael; Yan, Jerry; Biegel, Bryan (Technical Monitor)

    2001-01-01

    A software tool has been developed to assist the parallelization of scientific codes. This tool, CAPO, extends an existing parallelization toolkit, CAPTools developed at the University of Greenwich, to generate OpenMP parallel codes for shared memory architectures. This is an interactive toolkit to transform a serial Fortran application code to an equivalent parallel version of the software - in a small fraction of the time normally required for a manual parallelization. We first discuss the way in which loop types are categorized and how efficient OpenMP directives can be defined and inserted into the existing code using the in-depth interprocedural analysis. The use of the toolkit on a number of application codes ranging from benchmark to real-world application codes is presented. This will demonstrate the great potential of using the toolkit to quickly parallelize serial programs as well as the good performance achievable on a large number of toolkit to quickly parallelize serial programs as well as the good performance achievable on a large number of processors. The second part of the document gives references to the parameters and the graphic user interface implemented in the toolkit. Finally a set of tutorials is included for hands-on experiences with this toolkit.

  20. Effective Vectorization with OpenMP 4.5

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Huber, Joseph N.; Hernandez, Oscar R.; Lopez, Matthew Graham

    This paper describes how the Single Instruction Multiple Data (SIMD) model and its extensions in OpenMP work, and how these are implemented in different compilers. Modern processors are highly parallel computational machines which often include multiple processors capable of executing several instructions in parallel. Understanding SIMD and executing instructions in parallel allows the processor to achieve higher performance without increasing the power required to run it. SIMD instructions can significantly reduce the runtime of code by executing a single operation on large groups of data. The SIMD model is so integral to the processor s potential performance that, if SIMDmore » is not utilized, less than half of the processor is ever actually used. Unfortunately, using SIMD instructions is a challenge in higher level languages because most programming languages do not have a way to describe them. Most compilers are capable of vectorizing code by using the SIMD instructions, but there are many code features important for SIMD vectorization that the compiler cannot determine at compile time. OpenMP attempts to solve this by extending the C++/C and Fortran programming languages with compiler directives that express SIMD parallelism. OpenMP is used to pass hints to the compiler about the code to be executed in SIMD. This is a key resource for making optimized code, but it does not change whether or not the code can use SIMD operations. However, in many cases critical functions are limited by a poor understanding of how SIMD instructions are actually implemented, as SIMD can be implemented through vector instructions or simultaneous multi-threading (SMT). We have found that it is often the case that code cannot be vectorized, or is vectorized poorly, because the programmer does not have sufficient knowledge of how SIMD instructions work.« less

  1. Second International Workshop on Software Engineering and Code Design in Parallel Meteorological and Oceanographic Applications

    NASA Technical Reports Server (NTRS)

    OKeefe, Matthew (Editor); Kerr, Christopher L. (Editor)

    1998-01-01

    This report contains the abstracts and technical papers from the Second International Workshop on Software Engineering and Code Design in Parallel Meteorological and Oceanographic Applications, held June 15-18, 1998, in Scottsdale, Arizona. The purpose of the workshop is to bring together software developers in meteorology and oceanography to discuss software engineering and code design issues for parallel architectures, including Massively Parallel Processors (MPP's), Parallel Vector Processors (PVP's), Symmetric Multi-Processors (SMP's), Distributed Shared Memory (DSM) multi-processors, and clusters. Issues to be discussed include: (1) code architectures for current parallel models, including basic data structures, storage allocation, variable naming conventions, coding rules and styles, i/o and pre/post-processing of data; (2) designing modular code; (3) load balancing and domain decomposition; (4) techniques that exploit parallelism efficiently yet hide the machine-related details from the programmer; (5) tools for making the programmer more productive; and (6) the proliferation of programming models (F--, OpenMP, MPI, and HPF).

  2. Performance Characteristics of the Multi-Zone NAS Parallel Benchmarks

    NASA Technical Reports Server (NTRS)

    Jin, Haoqiang; VanderWijngaart, Rob F.

    2003-01-01

    We describe a new suite of computational benchmarks that models applications featuring multiple levels of parallelism. Such parallelism is often available in realistic flow computations on systems of grids, but had not previously been captured in bench-marks. The new suite, named NPB Multi-Zone, is extended from the NAS Parallel Benchmarks suite, and involves solving the application benchmarks LU, BT and SP on collections of loosely coupled discretization meshes. The solutions on the meshes are updated independently, but after each time step they exchange boundary value information. This strategy provides relatively easily exploitable coarse-grain parallelism between meshes. Three reference implementations are available: one serial, one hybrid using the Message Passing Interface (MPI) and OpenMP, and another hybrid using a shared memory multi-level programming model (SMP+OpenMP). We examine the effectiveness of hybrid parallelization paradigms in these implementations on three different parallel computers. We also use an empirical formula to investigate the performance characteristics of the multi-zone benchmarks.

  3. Development of full wave code for modeling RF fields in hot non-uniform plasmas

    NASA Astrophysics Data System (ADS)

    Zhao, Liangji; Svidzinski, Vladimir; Spencer, Andrew; Kim, Jin-Soo

    2016-10-01

    FAR-TECH, Inc. is developing a full wave RF modeling code to model RF fields in fusion devices and in general plasma applications. As an important component of the code, an adaptive meshless technique is introduced to solve the wave equations, which allows resolving plasma resonances efficiently and adapting to the complexity of antenna geometry and device boundary. The computational points are generated using either a point elimination method or a force balancing method based on the monitor function, which is calculated by solving the cold plasma dispersion equation locally. Another part of the code is the conductivity kernel calculation, used for modeling the nonlocal hot plasma dielectric response. The conductivity kernel is calculated on a coarse grid of test points and then interpolated linearly onto the computational points. All the components of the code are parallelized using MPI and OpenMP libraries to optimize the execution speed and memory. The algorithm and the results of our numerical approach to solving 2-D wave equations in a tokamak geometry will be presented. Work is supported by the U.S. DOE SBIR program.

  4. Parallelization of elliptic solver for solving 1D Boussinesq model

    NASA Astrophysics Data System (ADS)

    Tarwidi, D.; Adytia, D.

    2018-03-01

    In this paper, a parallel implementation of an elliptic solver in solving 1D Boussinesq model is presented. Numerical solution of Boussinesq model is obtained by implementing a staggered grid scheme to continuity, momentum, and elliptic equation of Boussinesq model. Tridiagonal system emerging from numerical scheme of elliptic equation is solved by cyclic reduction algorithm. The parallel implementation of cyclic reduction is executed on multicore processors with shared memory architectures using OpenMP. To measure the performance of parallel program, large number of grids is varied from 28 to 214. Two test cases of numerical experiment, i.e. propagation of solitary and standing wave, are proposed to evaluate the parallel program. The numerical results are verified with analytical solution of solitary and standing wave. The best speedup of solitary and standing wave test cases is about 2.07 with 214 of grids and 1.86 with 213 of grids, respectively, which are executed by using 8 threads. Moreover, the best efficiency of parallel program is 76.2% and 73.5% for solitary and standing wave test cases, respectively.

  5. A hybrid parallel framework for the cellular Potts model simulations

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Jiang, Yi; He, Kejing; Dong, Shoubin

    2009-01-01

    The Cellular Potts Model (CPM) has been widely used for biological simulations. However, most current implementations are either sequential or approximated, which can't be used for large scale complex 3D simulation. In this paper we present a hybrid parallel framework for CPM simulations. The time-consuming POE solving, cell division, and cell reaction operation are distributed to clusters using the Message Passing Interface (MPI). The Monte Carlo lattice update is parallelized on shared-memory SMP system using OpenMP. Because the Monte Carlo lattice update is much faster than the POE solving and SMP systems are more and more common, this hybrid approachmore » achieves good performance and high accuracy at the same time. Based on the parallel Cellular Potts Model, we studied the avascular tumor growth using a multiscale model. The application and performance analysis show that the hybrid parallel framework is quite efficient. The hybrid parallel CPM can be used for the large scale simulation ({approx}10{sup 8} sites) of complex collective behavior of numerous cells ({approx}10{sup 6}).« less

  6. HPC Profiling with the Sun Studio™ Performance Tools

    NASA Astrophysics Data System (ADS)

    Itzkowitz, Marty; Maruyama, Yukon

    In this paper, we describe how to use the Sun Studio Performance Tools to understand the nature and causes of application performance problems. We first explore CPU and memory performance problems for single-threaded applications, giving some simple examples. Then, we discuss multi-threaded performance issues, such as locking and false-sharing of cache lines, in each case showing how the tools can help. We go on to describe OpenMP applications and the support for them in the performance tools. Then we discuss MPI applications, and the techniques used to profile them. Finally, we present our conclusions.

  7. Using OpenMP vs. Threading Building Blocks for Medical Imaging on Multi-cores

    NASA Astrophysics Data System (ADS)

    Kegel, Philipp; Schellmann, Maraike; Gorlatch, Sergei

    We compare two parallel programming approaches for multi-core systems: the well-known OpenMP and the recently introduced Threading Building Blocks (TBB) library by Intel®. The comparison is made using the parallelization of a real-world numerical algorithm for medical imaging. We develop several parallel implementations, and compare them w.r.t. programming effort, programming style and abstraction, and runtime performance. We show that TBB requires a considerable program re-design, whereas with OpenMP simple compiler directives are sufficient. While TBB appears to be less appropriate for parallelizing existing implementations, it fosters a good programming style and higher abstraction level for newly developed parallel programs. Our experimental measurements on a dual quad-core system demonstrate that OpenMP slightly outperforms TBB in our implementation.

  8. Power and Performance Trade-offs for Space Time Adaptive Processing

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Gawande, Nitin A.; Manzano Franco, Joseph B.; Tumeo, Antonino

    Computational efficiency – performance relative to power or energy – is one of the most important concerns when designing RADAR processing systems. This paper analyzes power and performance trade-offs for a typical Space Time Adaptive Processing (STAP) application. We study STAP implementations for CUDA and OpenMP on two computationally efficient architectures, Intel Haswell Core I7-4770TE and NVIDIA Kayla with a GK208 GPU. We analyze the power and performance of STAP’s computationally intensive kernels across the two hardware testbeds. We also show the impact and trade-offs of GPU optimization techniques. We show that data parallelism can be exploited for efficient implementationmore » on the Haswell CPU architecture. The GPU architecture is able to process large size data sets without increase in power requirement. The use of shared memory has a significant impact on the power requirement for the GPU. A balance between the use of shared memory and main memory access leads to an improved performance in a typical STAP application.« less

  9. Composing Data Parallel Code for a SPARQL Graph Engine

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Castellana, Vito G.; Tumeo, Antonino; Villa, Oreste

    Big data analytics process large amount of data to extract knowledge from them. Semantic databases are big data applications that adopt the Resource Description Framework (RDF) to structure metadata through a graph-based representation. The graph based representation provides several benefits, such as the possibility to perform in memory processing with large amounts of parallelism. SPARQL is a language used to perform queries on RDF-structured data through graph matching. In this paper we present a tool that automatically translates SPARQL queries to parallel graph crawling and graph matching operations. The tool also supports complex SPARQL constructs, which requires more than basicmore » graph matching for their implementation. The tool generates parallel code annotated with OpenMP pragmas for x86 Shared-memory Multiprocessors (SMPs). With respect to commercial database systems such as Virtuoso, our approach reduces memory occupation due to join operations and provides higher performance. We show the scaling of the automatically generated graph-matching code on a 48-core SMP.« less

  10. Efficient diagonalization of the sparse matrices produced within the framework of the UK R-matrix molecular codes

    NASA Astrophysics Data System (ADS)

    Galiatsatos, P. G.; Tennyson, J.

    2012-11-01

    The most time consuming step within the framework of the UK R-matrix molecular codes is that of the diagonalization of the inner region Hamiltonian matrix (IRHM). Here we present the method that we follow to speed up this step. We use shared memory machines (SMM), distributed memory machines (DMM), the OpenMP directive based parallel language, the MPI function based parallel language, the sparse matrix diagonalizers ARPACK and PARPACK, a variation for real symmetric matrices of the official coordinate sparse matrix format and finally a parallel sparse matrix-vector product (PSMV). The efficient application of the previous techniques rely on two important facts: the sparsity of the matrix is large enough (more than 98%) and in order to get back converged results we need a small only part of the matrix spectrum.

  11. A feasibility study on porting the community land model onto accelerators using OpenACC

    DOE PAGES

    Wang, Dali; Wu, Wei; Winkler, Frank; ...

    2014-01-01

    As environmental models (such as Accelerated Climate Model for Energy (ACME), Parallel Reactive Flow and Transport Model (PFLOTRAN), Arctic Terrestrial Simulator (ATS), etc.) became more and more complicated, we are facing enormous challenges regarding to porting those applications onto hybrid computing architecture. OpenACC appears as a very promising technology, therefore, we have conducted a feasibility analysis on porting the Community Land Model (CLM), a terrestrial ecosystem model within the Community Earth System Models (CESM)). Specifically, we used automatic function testing platform to extract a small computing kernel out of CLM, then we apply this kernel into the actually CLM dataflowmore » procedure, and investigate the strategy of data parallelization and the benefit of data movement provided by current implementation of OpenACC. Even it is a non-intensive kernel, on a single 16-core computing node, the performance (based on the actual computation time using one GPU) of OpenACC implementation is 2.3 time faster than that of OpenMP implementation using single OpenMP thread, but it is 2.8 times slower than the performance of OpenMP implementation using 16 threads. On multiple nodes, MPI_OpenACC implementation demonstrated very good scalability on up to 128 GPUs on 128 computing nodes. This study also provides useful information for us to look into the potential benefits of “deep copy” capability and “routine” feature of OpenACC standards. In conclusion, we believe that our experience on the environmental model, CLM, can be beneficial to many other scientific research programs who are interested to porting their large scale scientific code using OpenACC onto high-end computers, empowered by hybrid computing architecture.« less

  12. A multi-threaded version of MCFM

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Campbell, John M.; Ellis, R. Keith; Giele, Walter T.

    We report on our findings modifying MCFM using OpenMP to implement multi-threading. By using OpenMP, the modified MCFM will execute on any processor, automatically adjusting to the number of available threads. We then modified the integration routine VEGAS to distribute the event evaluation over the threads, while combining all events at the end of every iteration to optimize the numerical integration. Furthermore, we took special care so that the results of the Monte Carlo integration were independent of the number of threads used, to facilitate the validation of the OpenMP version of MCFM.

  13. A multithreaded and GPU-optimized compact finite difference algorithm for turbulent mixing at high Schmidt number using petascale computing

    NASA Astrophysics Data System (ADS)

    Clay, M. P.; Yeung, P. K.; Buaria, D.; Gotoh, T.

    2017-11-01

    Turbulent mixing at high Schmidt number is a multiscale problem which places demanding requirements on direct numerical simulations to resolve fluctuations down the to Batchelor scale. We use a dual-grid, dual-scheme and dual-communicator approach where velocity and scalar fields are computed by separate groups of parallel processes, the latter using a combined compact finite difference (CCD) scheme on finer grid with a static 3-D domain decomposition free of the communication overhead of memory transposes. A high degree of scalability is achieved for a 81923 scalar field at Schmidt number 512 in turbulence with a modest inertial range, by overlapping communication with computation whenever possible. On the Cray XE6 partition of Blue Waters, use of a dedicated thread for communication combined with OpenMP locks and nested parallelism reduces CCD timings by 34% compared to an MPI baseline. The code has been further optimized for the 27-petaflops Cray XK7 machine Titan using GPUs as accelerators with the latest OpenMP 4.5 directives, giving 2.7X speedup compared to CPU-only execution at the largest problem size. Supported by NSF Grant ACI-1036170, the NCSA Blue Waters Project with subaward via UIUC, and a DOE INCITE allocation at ORNL.

  14. Comparison of Origin 2000 and Origin 3000 Using NAS Parallel Benchmarks

    NASA Technical Reports Server (NTRS)

    Turney, Raymond D.

    2001-01-01

    This report describes results of benchmark tests on the Origin 3000 system currently being installed at the NASA Ames National Advanced Supercomputing facility. This machine will ultimately contain 1024 R14K processors. The first part of the system, installed in November, 2000 and named mendel, is an Origin 3000 with 128 R12K processors. For comparison purposes, the tests were also run on lomax, an Origin 2000 with R12K processors. The BT, LU, and SP application benchmarks in the NAS Parallel Benchmark Suite and the kernel benchmark FT were chosen to determine system performance and measure the impact of changes on the machine as it evolves. Having been written to measure performance on Computational Fluid Dynamics applications, these benchmarks are assumed appropriate to represent the NAS workload. Since the NAS runs both message passing (MPI) and shared-memory, compiler directive type codes, both MPI and OpenMP versions of the benchmarks were used. The MPI versions used were the latest official release of the NAS Parallel Benchmarks, version 2.3. The OpenMP versiqns used were PBN3b2, a beta version that is in the process of being released. NPB 2.3 and PBN 3b2 are technically different benchmarks, and NPB results are not directly comparable to PBN results.

  15. fast_protein_cluster: parallel and optimized clustering of large-scale protein modeling data.

    PubMed

    Hung, Ling-Hong; Samudrala, Ram

    2014-06-15

    fast_protein_cluster is a fast, parallel and memory efficient package used to cluster 60 000 sets of protein models (with up to 550 000 models per set) generated by the Nutritious Rice for the World project. fast_protein_cluster is an optimized and extensible toolkit that supports Root Mean Square Deviation after optimal superposition (RMSD) and Template Modeling score (TM-score) as metrics. RMSD calculations using a laptop CPU are 60× faster than qcprot and 3× faster than current graphics processing unit (GPU) implementations. New GPU code further increases the speed of RMSD and TM-score calculations. fast_protein_cluster provides novel k-means and hierarchical clustering methods that are up to 250× and 2000× faster, respectively, than Clusco, and identify significantly more accurate models than Spicker and Clusco. fast_protein_cluster is written in C++ using OpenMP for multi-threading support. Custom streaming Single Instruction Multiple Data (SIMD) extensions and advanced vector extension intrinsics code accelerate CPU calculations, and OpenCL kernels support AMD and Nvidia GPUs. fast_protein_cluster is available under the M.I.T. license. (http://software.compbio.washington.edu/fast_protein_cluster) © The Author 2014. Published by Oxford University Press.

  16. Fast 2D FWI on a multi and many-cores workstation.

    NASA Astrophysics Data System (ADS)

    Thierry, Philippe; Donno, Daniela; Noble, Mark

    2014-05-01

    Following the introduction of x86 co-processors (Xeon Phi) and the performance increase of standard 2-socket workstations using the latest 12 cores E5-v2 x86-64 CPU, we present here a MPI + OpenMP implementation of an acoustic 2D FWI (full waveform inversion) code which simultaneously runs on the CPUs and on the co-processors installed in a workstation. The main advantage of running a 2D FWI on a workstation is to be able to quickly evaluate new features such as more complicated wave equations, new cost functions, finite-difference stencils or boundary conditions. Since the co-processor is made of 61 in-order x86 cores, each of them having up to 4 threads, this many-core can be seen as a shared memory SMP (symmetric multiprocessing) machine with its own IP address. Depending on the vendor, a single workstation can handle several co-processors making the workstation as a personal cluster under the desk. The original Fortran 90 CPU version of the 2D FWI code is just recompiled to get a Xeon Phi x86 binary. This multi and many-core configuration uses standard compilers and associated MPI as well as math libraries under Linux; therefore, the cost of code development remains constant, while improving computation time. We choose to implement the code with the so-called symmetric mode to fully use the capacity of the workstation, but we also evaluate the scalability of the code in native mode (i.e running only on the co-processor) thanks to the Linux ssh and NFS capabilities. Usual care of optimization and SIMD vectorization is used to ensure optimal performances, and to analyze the application performances and bottlenecks on both platforms. The 2D FWI implementation uses finite-difference time-domain forward modeling and a quasi-Newton (with L-BFGS algorithm) optimization scheme for the model parameters update. Parallelization is achieved through standard MPI shot gathers distribution and OpenMP for domain decomposition within the co-processor. Taking advantage of the 16 GB of memory available on the co-processor we are able to keep wavefields in memory to achieve the gradient computation by cross-correlation of forward and back-propagated wavefields needed by our time-domain FWI scheme, without heavy traffic on the i/o subsystem and PCIe bus. In this presentation we will also review some simple methodologies to determine performance expectation compared to real performances in order to get optimization effort estimation before starting any huge modification or rewriting of research codes. The key message is the ease of use and development of this hybrid configuration to reach not the absolute peak performance value but the optimal one that ensures the best balance between geophysical and computer developments.

  17. Parameters analysis of a porous medium model for treatment with hyperthermia using OpenMP

    NASA Astrophysics Data System (ADS)

    Freitas Reis, Ruy; dos Santos Loureiro, Felipe; Lobosco, Marcelo

    2015-09-01

    Cancer is the second cause of death in the world so treatments have been developed trying to work around this world health problem. Hyperthermia is not a new technique, but its use in cancer treatment is still at early stage of development. This treatment is based on overheat the target area to a threshold temperature that causes cancerous cell necrosis and apoptosis. To simulate this phenomenon using magnetic nanoparticles in an under skin cancer treatment, a three-dimensional porous medium model was adopted. This study presents a sensibility analysis of the model parameters such as the porosity and blood velocity. To ensure a second-order solution approach, a 7-points centered finite difference method was used for space discretization while a predictor-corrector method was used to time evolution. Due to the massive computations required to find the solution of a three-dimensional model, this paper also presents a first attempt to improve performance using OpenMP, a parallel programming API.

  18. Performance Analysis of and Tool Support for Transactional Memory on BG/Q

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Schindewolf, M

    2011-12-08

    Martin Schindewolf worked during his internship at the Lawrence Livermore National Laboratory (LLNL) under the guidance of Martin Schulz at the Computer Science Group of the Center for Applied Scientific Computing. We studied the performance of the TM subsystem of BG/Q as well as researched the possibilities for tool support for TM. To study the performance, we run CLOMP-TM. CLOMP-TM is a benchmark designed for the purpose to quantify the overhead of OpenMP and compare different synchronization primitives. To advance CLOMP-TM, we added Message Passing Interface (MPI) routines for a hybrid parallelization. This enables to run multiple MPI tasks, eachmore » running OpenMP, on one node. With these enhancements, a beneficial MPI task to OpenMP thread ratio is determined. Further, the synchronization primitives are ranked as a function of the application characteristics. To demonstrate the usefulness of these results, we investigate a real Monte Carlo simulation called Monte Carlo Benchmark (MCB). Applying the lessons learned yields the best task to thread ratio. Further, we were able to tune the synchronization by transactifying the MCB. Further, we develop tools that capture the performance of the TM run time system and present it to the application's developer. The performance of the TM run time system relies on the built-in statistics. These tools use the Blue Gene Performance Monitoring (BGPM) interface to correlate the statistics from the TM run time system with performance counter values. This combination provides detailed insights in the run time behavior of the application and enables to track down the cause of degraded performance. Further, one tool has been implemented that separates the performance counters in three categories: Successful Speculation, Unsuccessful Speculation and No Speculation. All of the tools are crafted around IBM's xlc compiler for C and C++ and have been run and tested on a Q32 early access system.« less

  19. Acoustic 3D modeling by the method of integral equations

    NASA Astrophysics Data System (ADS)

    Malovichko, M.; Khokhlov, N.; Yavich, N.; Zhdanov, M.

    2018-02-01

    This paper presents a parallel algorithm for frequency-domain acoustic modeling by the method of integral equations (IE). The algorithm is applied to seismic simulation. The IE method reduces the size of the problem but leads to a dense system matrix. A tolerable memory consumption and numerical complexity were achieved by applying an iterative solver, accompanied by an effective matrix-vector multiplication operation, based on the fast Fourier transform (FFT). We demonstrate that, the IE system matrix is better conditioned than that of the finite-difference (FD) method, and discuss its relation to a specially preconditioned FD matrix. We considered several methods of matrix-vector multiplication for the free-space and layered host models. The developed algorithm and computer code were benchmarked against the FD time-domain solution. It was demonstrated that, the method could accurately calculate the seismic field for the models with sharp material boundaries and a point source and receiver located close to the free surface. We used OpenMP to speed up the matrix-vector multiplication, while MPI was used to speed up the solution of the system equations, and also for parallelizing across multiple sources. The practical examples and efficiency tests are presented as well.

  20. Facilitating arrhythmia simulation: the method of quantitative cellular automata modeling and parallel running

    PubMed Central

    Zhu, Hao; Sun, Yan; Rajagopal, Gunaretnam; Mondry, Adrian; Dhar, Pawan

    2004-01-01

    Background Many arrhythmias are triggered by abnormal electrical activity at the ionic channel and cell level, and then evolve spatio-temporally within the heart. To understand arrhythmias better and to diagnose them more precisely by their ECG waveforms, a whole-heart model is required to explore the association between the massively parallel activities at the channel/cell level and the integrative electrophysiological phenomena at organ level. Methods We have developed a method to build large-scale electrophysiological models by using extended cellular automata, and to run such models on a cluster of shared memory machines. We describe here the method, including the extension of a language-based cellular automaton to implement quantitative computing, the building of a whole-heart model with Visible Human Project data, the parallelization of the model on a cluster of shared memory computers with OpenMP and MPI hybrid programming, and a simulation algorithm that links cellular activity with the ECG. Results We demonstrate that electrical activities at channel, cell, and organ levels can be traced and captured conveniently in our extended cellular automaton system. Examples of some ECG waveforms simulated with a 2-D slice are given to support the ECG simulation algorithm. A performance evaluation of the 3-D model on a four-node cluster is also given. Conclusions Quantitative multicellular modeling with extended cellular automata is a highly efficient and widely applicable method to weave experimental data at different levels into computational models. This process can be used to investigate complex and collective biological activities that can be described neither by their governing differentiation equations nor by discrete parallel computation. Transparent cluster computing is a convenient and effective method to make time-consuming simulation feasible. Arrhythmias, as a typical case, can be effectively simulated with the methods described. PMID:15339335

  1. Load Balancing Strategies for Multiphase Flows on Structured Grids

    NASA Astrophysics Data System (ADS)

    Olshefski, Kristopher; Owkes, Mark

    2017-11-01

    The computation time required to perform large simulations of complex systems is currently one of the leading bottlenecks of computational research. Parallelization allows multiple processing cores to perform calculations simultaneously and reduces computational times. However, load imbalances between processors waste computing resources as processors wait for others to complete imbalanced tasks. In multiphase flows, these imbalances arise due to the additional computational effort required at the gas-liquid interface. However, many current load balancing schemes are only designed for unstructured grid applications. The purpose of this research is to develop a load balancing strategy while maintaining the simplicity of a structured grid. Several approaches are investigated including brute force oversubscription, node oversubscription through Message Passing Interface (MPI) commands, and shared memory load balancing using OpenMP. Each of these strategies are tested with a simple one-dimensional model prior to implementation into the three-dimensional NGA code. Current results show load balancing will reduce computational time by at least 30%.

  2. The MOLDY short-range molecular dynamics package

    NASA Astrophysics Data System (ADS)

    Ackland, G. J.; D'Mellow, K.; Daraszewicz, S. L.; Hepburn, D. J.; Uhrin, M.; Stratford, K.

    2011-12-01

    We describe a parallelised version of the MOLDY molecular dynamics program. This Fortran code is aimed at systems which may be described by short-range potentials and specifically those which may be addressed with the embedded atom method. This includes a wide range of transition metals and alloys. MOLDY provides a range of options in terms of the molecular dynamics ensemble used and the boundary conditions which may be applied. A number of standard potentials are provided, and the modular structure of the code allows new potentials to be added easily. The code is parallelised using OpenMP and can therefore be run on shared memory systems, including modern multicore processors. Particular attention is paid to the updates required in the main force loop, where synchronisation is often required in OpenMP implementations of molecular dynamics. We examine the performance of the parallel code in detail and give some examples of applications to realistic problems, including the dynamic compression of copper and carbon migration in an iron-carbon alloy. Program summaryProgram title: MOLDY Catalogue identifier: AEJU_v1_0 Program summary URL:http://cpc.cs.qub.ac.uk/summaries/AEJU_v1_0.html Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland Licensing provisions: GNU General Public License version 2 No. of lines in distributed program, including test data, etc.: 382 881 No. of bytes in distributed program, including test data, etc.: 6 705 242 Distribution format: tar.gz Programming language: Fortran 95/OpenMP Computer: Any Operating system: Any Has the code been vectorised or parallelized?: Yes. OpenMP is required for parallel execution RAM: 100 MB or more Classification: 7.7 Nature of problem: Moldy addresses the problem of many atoms (of order 10 6) interacting via a classical interatomic potential on a timescale of microseconds. It is designed for problems where statistics must be gathered over a number of equivalent runs, such as measuring thermodynamic properities, diffusion, radiation damage, fracture, twinning deformation, nucleation and growth of phase transitions, sputtering etc. In the vast majority of materials, the interactions are non-pairwise, and the code must be able to deal with many-body forces. Solution method: Molecular dynamics involves integrating Newton's equations of motion. MOLDY uses verlet (for good energy conservation) or predictor-corrector (for accurate trajectories) algorithms. It is parallelised using open MP. It also includes a static minimisation routine to find the lowest energy structure. Boundary conditions for surfaces, clusters, grain boundaries, thermostat (Nose), barostat (Parrinello-Rahman), and externally applied strain are provided. The initial configuration can be either a repeated unit cell or have all atoms given explictly. Initial velocities are generated internally, but it is also possible to specify the velocity of a particular atom. A wide range of interatomic force models are implemented, including embedded atom, Morse or Lennard-Jones. Thus the program is especially well suited to calculations of metals. Restrictions: The code is designed for short-ranged potentials, and there is no Ewald sum. Thus for long range interactions where all particles interact with all others, the order- N scaling will fail. Different interatomic potential forms require recompilation of the code. Additional comments: There is a set of associated open-source analysis software for postprocessing and visualisation. This includes local crystal structure recognition and identification of topological defects. Running time: A set of test modules for running time are provided. The code scales as order N. The parallelisation shows near-linear scaling with number of processors in a shared memory environment. A typical run of a few tens of nanometers for a few nanoseconds will run on a timescale of days on a multiprocessor desktop.

  3. Acceleration for 2D time-domain elastic full waveform inversion using a single GPU card

    NASA Astrophysics Data System (ADS)

    Jiang, Jinpeng; Zhu, Peimin

    2018-05-01

    Full waveform inversion (FWI) is a challenging procedure due to the high computational cost related to the modeling, especially for the elastic case. The graphics processing unit (GPU) has become a popular device for the high-performance computing (HPC). To reduce the long computation time, we design and implement the GPU-based 2D elastic FWI (EFWI) in time domain using a single GPU card. We parallelize the forward modeling and gradient calculations using the CUDA programming language. To overcome the limitation of relatively small global memory on GPU, the boundary saving strategy is exploited to reconstruct the forward wavefield. Moreover, the L-BFGS optimization method used in the inversion increases the convergence of the misfit function. A multiscale inversion strategy is performed in the workflow to obtain the accurate inversion results. In our tests, the GPU-based implementations using a single GPU device achieve >15 times speedup in forward modeling, and about 12 times speedup in gradient calculation, compared with the eight-core CPU implementations optimized by OpenMP. The test results from the GPU implementations are verified to have enough accuracy by comparing the results obtained from the CPU implementations.

  4. MILC Code Performance on High End CPU and GPU Supercomputer Clusters

    NASA Astrophysics Data System (ADS)

    DeTar, Carleton; Gottlieb, Steven; Li, Ruizi; Toussaint, Doug

    2018-03-01

    With recent developments in parallel supercomputing architecture, many core, multi-core, and GPU processors are now commonplace, resulting in more levels of parallelism, memory hierarchy, and programming complexity. It has been necessary to adapt the MILC code to these new processors starting with NVIDIA GPUs, and more recently, the Intel Xeon Phi processors. We report on our efforts to port and optimize our code for the Intel Knights Landing architecture. We consider performance of the MILC code with MPI and OpenMP, and optimizations with QOPQDP and QPhiX. For the latter approach, we concentrate on the staggered conjugate gradient and gauge force. We also consider performance on recent NVIDIA GPUs using the QUDA library.

  5. Performance evaluation of throughput computing workloads using multi-core processors and graphics processors

    NASA Astrophysics Data System (ADS)

    Dave, Gaurav P.; Sureshkumar, N.; Blessy Trencia Lincy, S. S.

    2017-11-01

    Current trend in processor manufacturing focuses on multi-core architectures rather than increasing the clock speed for performance improvement. Graphic processors have become as commodity hardware for providing fast co-processing in computer systems. Developments in IoT, social networking web applications, big data created huge demand for data processing activities and such kind of throughput intensive applications inherently contains data level parallelism which is more suited for SIMD architecture based GPU. This paper reviews the architectural aspects of multi/many core processors and graphics processors. Different case studies are taken to compare performance of throughput computing applications using shared memory programming in OpenMP and CUDA API based programming.

  6. PhreeqcRM: A reaction module for transport simulators based on the geochemical model PHREEQC

    USGS Publications Warehouse

    Parkhurst, David L.; Wissmeier, Laurin

    2015-01-01

    PhreeqcRM is a geochemical reaction module designed specifically to perform equilibrium and kinetic reaction calculations for reactive transport simulators that use an operator-splitting approach. The basic function of the reaction module is to take component concentrations from the model cells of the transport simulator, run geochemical reactions, and return updated component concentrations to the transport simulator. If multicomponent diffusion is modeled (e.g., Nernst–Planck equation), then aqueous species concentrations can be used instead of component concentrations. The reaction capabilities are a complete implementation of the reaction capabilities of PHREEQC. In each cell, the reaction module maintains the composition of all of the reactants, which may include minerals, exchangers, surface complexers, gas phases, solid solutions, and user-defined kinetic reactants.PhreeqcRM assigns initial and boundary conditions for model cells based on standard PHREEQC input definitions (files or strings) of chemical compositions of solutions and reactants. Additional PhreeqcRM capabilities include methods to eliminate reaction calculations for inactive parts of a model domain, transfer concentrations and other model properties, and retrieve selected results. The module demonstrates good scalability for parallel processing by using multiprocessing with MPI (message passing interface) on distributed memory systems, and limited scalability using multithreading with OpenMP on shared memory systems. PhreeqcRM is written in C++, but interfaces allow methods to be called from C or Fortran. By using the PhreeqcRM reaction module, an existing multicomponent transport simulator can be extended to simulate a wide range of geochemical reactions. Results of the implementation of PhreeqcRM as the reaction engine for transport simulators PHAST and FEFLOW are shown by using an analytical solution and the reactive transport benchmark of MoMaS.

  7. Comparing the OpenMP, MPI, and Hybrid Programming Paradigm on an SMP Cluster

    NASA Technical Reports Server (NTRS)

    Jost, Gabriele; Jin, Hao-Qiang; anMey, Dieter; Hatay, Ferhat F.

    2003-01-01

    Clusters of SMP (Symmetric Multi-Processors) nodes provide support for a wide range of parallel programming paradigms. The shared address space within each node is suitable for OpenMP parallelization. Message passing can be employed within and across the nodes of a cluster. Multiple levels of parallelism can be achieved by combining message passing and OpenMP parallelization. Which programming paradigm is the best will depend on the nature of the given problem, the hardware components of the cluster, the network, and the available software. In this study we compare the performance of different implementations of the same CFD benchmark application, using the same numerical algorithm but employing different programming paradigms.

  8. Performance of hybrid programming models for multiscale cardiac simulations: preparing for petascale computation.

    PubMed

    Pope, Bernard J; Fitch, Blake G; Pitman, Michael C; Rice, John J; Reumann, Matthias

    2011-10-01

    Future multiscale and multiphysics models that support research into human disease, translational medical science, and treatment can utilize the power of high-performance computing (HPC) systems. We anticipate that computationally efficient multiscale models will require the use of sophisticated hybrid programming models, mixing distributed message-passing processes [e.g., the message-passing interface (MPI)] with multithreading (e.g., OpenMP, Pthreads). The objective of this study is to compare the performance of such hybrid programming models when applied to the simulation of a realistic physiological multiscale model of the heart. Our results show that the hybrid models perform favorably when compared to an implementation using only the MPI and, furthermore, that OpenMP in combination with the MPI provides a satisfactory compromise between performance and code complexity. Having the ability to use threads within MPI processes enables the sophisticated use of all processor cores for both computation and communication phases. Considering that HPC systems in 2012 will have two orders of magnitude more cores than what was used in this study, we believe that faster than real-time multiscale cardiac simulations can be achieved on these systems.

  9. A fast algorithm for forward-modeling of gravitational fields in spherical coordinates with 3D Gauss-Legendre quadrature

    NASA Astrophysics Data System (ADS)

    Zhao, G.; Liu, J.; Chen, B.; Guo, R.; Chen, L.

    2017-12-01

    Forward modeling of gravitational fields at large-scale requires to consider the curvature of the Earth and to evaluate the Newton's volume integral in spherical coordinates. To acquire fast and accurate gravitational effects for subsurface structures, subsurface mass distribution is usually discretized into small spherical prisms (called tesseroids). The gravity fields of tesseroids are generally calculated numerically. One of the commonly used numerical methods is the 3D Gauss-Legendre quadrature (GLQ). However, the traditional GLQ integration suffers from low computational efficiency and relatively poor accuracy when the observation surface is close to the source region. We developed a fast and high accuracy 3D GLQ integration based on the equivalence of kernel matrix, adaptive discretization and parallelization using OpenMP. The equivalence of kernel matrix strategy increases efficiency and reduces memory consumption by calculating and storing the same matrix elements in each kernel matrix just one time. In this method, the adaptive discretization strategy is used to improve the accuracy. The numerical investigations show that the executing time of the proposed method is reduced by two orders of magnitude compared with the traditional method that without these optimized strategies. High accuracy results can also be guaranteed no matter how close the computation points to the source region. In addition, the algorithm dramatically reduces the memory requirement by N times compared with the traditional method, where N is the number of discretization of the source region in the longitudinal direction. It makes the large-scale gravity forward modeling and inversion with a fine discretization possible.

  10. Enhancing Application Performance Using Mini-Apps: Comparison of Hybrid Parallel Programming Paradigms

    NASA Technical Reports Server (NTRS)

    Lawson, Gary; Sosonkina, Masha; Baurle, Robert; Hammond, Dana

    2017-01-01

    In many fields, real-world applications for High Performance Computing have already been developed. For these applications to stay up-to-date, new parallel strategies must be explored to yield the best performance; however, restructuring or modifying a real-world application may be daunting depending on the size of the code. In this case, a mini-app may be employed to quickly explore such options without modifying the entire code. In this work, several mini-apps have been created to enhance a real-world application performance, namely the VULCAN code for complex flow analysis developed at the NASA Langley Research Center. These mini-apps explore hybrid parallel programming paradigms with Message Passing Interface (MPI) for distributed memory access and either Shared MPI (SMPI) or OpenMP for shared memory accesses. Performance testing shows that MPI+SMPI yields the best execution performance, while requiring the largest number of code changes. A maximum speedup of 23 was measured for MPI+SMPI, but only 11 was measured for MPI+OpenMP.

  11. Innovative Language-Based & Object-Oriented Structured AMR Using Fortran 90 and OpenMP

    NASA Technical Reports Server (NTRS)

    Norton, C.; Balsara, D.

    1999-01-01

    Parallel adaptive mesh refinement (AMR) is an important numerical technique that leads to the efficient solution of many physical and engineering problems. In this paper, we describe how AMR programing can be performed in an object-oreinted way using the modern aspects of Fortran 90 combined with the parallelization features of OpenMP.

  12. Parallelization of TWOPORFLOW, a Cartesian Grid based Two-phase Porous Media Code for Transient Thermo-hydraulic Simulations

    NASA Astrophysics Data System (ADS)

    Trost, Nico; Jiménez, Javier; Imke, Uwe; Sanchez, Victor

    2014-06-01

    TWOPORFLOW is a thermo-hydraulic code based on a porous media approach to simulate single- and two-phase flow including boiling. It is under development at the Institute for Neutron Physics and Reactor Technology (INR) at KIT. The code features a 3D transient solution of the mass, momentum and energy conservation equations for two inter-penetrating fluids with a semi-implicit continuous Eulerian type solver. The application domain of TWOPORFLOW includes the flow in standard porous media and in structured porous media such as micro-channels and cores of nuclear power plants. In the latter case, the fluid domain is coupled to a fuel rod model, describing the heat flow inside the solid structure. In this work, detailed profiling tools have been utilized to determine the optimization potential of TWOPORFLOW. As a result, bottle-necks were identified and reduced in the most feasible way, leading for instance to an optimization of the water-steam property computation. Furthermore, an OpenMP implementation addressing the routines in charge of inter-phase momentum-, energy- and mass-coupling delivered good performance together with a high scalability on shared memory architectures. In contrast to that, the approach for distributed memory systems was to solve sub-problems resulting by the decomposition of the initial Cartesian geometry. Thread communication for the sub-problem boundary updates was accomplished by the Message Passing Interface (MPI) standard.

  13. A High Order, Locally-Adaptive Method for the Navier-Stokes Equations

    NASA Astrophysics Data System (ADS)

    Chan, Daniel

    1998-11-01

    I have extended the FOSLS method of Cai, Manteuffel and McCormick (1997) and implemented it within the framework of a spectral element formulation using the Legendre polynomial basis function. The FOSLS method solves the Navier-Stokes equations as a system of coupled first-order equations and provides the ellipticity that is needed for fast iterative matrix solvers like multigrid to operate efficiently. Each element is treated as an object and its properties are self-contained. Only C^0 continuity is imposed across element interfaces; this design allows local grid refinement and coarsening without the burden of having an elaborate data structure, since only information along element boundaries is needed. With the FORTRAN 90 programming environment, I can maintain a high computational efficiency by employing a hybrid parallel processing model. The OpenMP directives provides parallelism in the loop level which is executed in a shared-memory SMP and the MPI protocol allows the distribution of elements to a cluster of SMP's connected via a commodity network. This talk will provide timing results and a comparison with a second order finite difference method.

  14. Comparing the OpenMP, MPI, and Hybrid Programming Paradigm on an SMP Cluster

    NASA Technical Reports Server (NTRS)

    Jost, Gabriele; Jin, Haoqiang; anMey, Dieter; Hatay, Ferhat F.

    2003-01-01

    With the advent of parallel hardware and software technologies users are faced with the challenge to choose a programming paradigm best suited for the underlying computer architecture. With the current trend in parallel computer architectures towards clusters of shared memory symmetric multi-processors (SMP), parallel programming techniques have evolved to support parallelism beyond a single level. Which programming paradigm is the best will depend on the nature of the given problem, the hardware architecture, and the available software. In this study we will compare different programming paradigms for the parallelization of a selected benchmark application on a cluster of SMP nodes. We compare the timings of different implementations of the same CFD benchmark application employing the same numerical algorithm on a cluster of Sun Fire SMP nodes. The rest of the paper is structured as follows: In section 2 we briefly discuss the programming models under consideration. We describe our compute platform in section 3. The different implementations of our benchmark code are described in section 4 and the performance results are presented in section 5. We conclude our study in section 6.

  15. Optimization of computations for adjoint field and Jacobian needed in 3D CSEM inversion

    NASA Astrophysics Data System (ADS)

    Dehiya, Rahul; Singh, Arun; Gupta, Pravin K.; Israil, M.

    2017-01-01

    We present the features and results of a newly developed code, based on Gauss-Newton optimization technique, for solving three-dimensional Controlled-Source Electromagnetic inverse problem. In this code a special emphasis has been put on representing the operations by block matrices for conjugate gradient iteration. We show how in the computation of Jacobian, the matrix formed by differentiation of system matrix can be made independent of frequency to optimize the operations at conjugate gradient step. The coarse level parallel computing, using OpenMP framework, is used primarily due to its simplicity in implementation and accessibility of shared memory multi-core computing machine to almost anyone. We demonstrate how the coarseness of modeling grid in comparison to source (comp`utational receivers) spacing can be exploited for efficient computing, without compromising the quality of the inverted model, by reducing the number of adjoint calls. It is also demonstrated that the adjoint field can even be computed on a grid coarser than the modeling grid without affecting the inversion outcome. These observations were reconfirmed using an experiment design where the deviation of source from straight tow line is considered. Finally, a real field data inversion experiment is presented to demonstrate robustness of the code.

  16. A Non-Equilibrium Sediment Transport Model for Coastal Inlets and Navigation Channels

    DTIC Science & Technology

    2011-01-01

    exchange of water , sediment, and nutrients between estuaries and the ocean. Because of the multiple interacting forces (waves, wind, tide, river...in parallel using OpenMP. The CMS takes advantage of the Surface- water Modeling System (SMS) interface for grid generation and model setup, as well...as for plotting and post- processing (Zundel, 2000). The circulation model in the CMS (called CMS-Flow) computes the unsteady water level and

  17. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Hornung, Richard D.; Hones, Holger E.

    The RAJA Performance Suite is designed to evaluate performance of the RAJA performance portability library on a wide variety of important high performance computing (HPC) algorithmic lulmels. These kernels assess compiler optimizations and various parallel programming model backends accessible through RAJA, such as OpenMP, CUDA, etc. The Initial version of the suite contains 25 computational kernels, each of which appears in 6 variants: Baseline SequcntiaJ, RAJA SequentiaJ, Baseline OpenMP, RAJA OpenMP, Baseline CUDA, RAJA CUDA. All variants of each kernel perform essentially the same mathematical operations and the loop body code for each kernel is identical across all variants. Theremore » are a few kernels, such as those that contain reduction operations, that require CUDA-specific coding for their CUDA variants. ActuaJ computer instructions executed and how they run in parallel differs depending on the parallel programming model backend used and which optimizations are perfonned by the compiler used to build the Perfonnance Suite executable. The Suite will be used primarily by RAJA developers to perform regular assessments of RAJA performance across a range of hardware platforms and compilers as RAJA features are being developed. It will also be used by LLNL hardware and software vendor panners for new defining requirements for future computing platform procurements and acceptance testing. In particular, the RAJA Performance Suite will be used for compiler acceptance testing of the upcoming CORAUSierra machine {initial LLNL delivery expected in late-2017/early 2018) and the CORAL-2 procurement. The Suite will aJso be used to generate concise source code reproducers of compiler and runtime issues we uncover so that we may provide them to relevant vendors to be fixed.« less

  18. NDL-v2.0: A new version of the numerical differentiation library for parallel architectures

    NASA Astrophysics Data System (ADS)

    Hadjidoukas, P. E.; Angelikopoulos, P.; Voglis, C.; Papageorgiou, D. G.; Lagaris, I. E.

    2014-07-01

    We present a new version of the numerical differentiation library (NDL) used for the numerical estimation of first and second order partial derivatives of a function by finite differencing. In this version we have restructured the serial implementation of the code so as to achieve optimal task-based parallelization. The pure shared-memory parallelization of the library has been based on the lightweight OpenMP tasking model allowing for the full extraction of the available parallelism and efficient scheduling of multiple concurrent library calls. On multicore clusters, parallelism is exploited by means of TORC, an MPI-based multi-threaded tasking library. The new MPI implementation of NDL provides optimal performance in terms of function calls and, furthermore, supports asynchronous execution of multiple library calls within legacy MPI programs. In addition, a Python interface has been implemented for all cases, exporting the functionality of our library to sequential Python codes. Catalog identifier: AEDG_v2_0 Program summary URL:http://cpc.cs.qub.ac.uk/summaries/AEDG_v2_0.html Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland Licensing provisions: Standard CPC licence, http://cpc.cs.qub.ac.uk/licence/licence.html No. of lines in distributed program, including test data, etc.: 63036 No. of bytes in distributed program, including test data, etc.: 801872 Distribution format: tar.gz Programming language: ANSI Fortran-77, ANSI C, Python. Computer: Distributed systems (clusters), shared memory systems. Operating system: Linux, Unix. Has the code been vectorized or parallelized?: Yes. RAM: The library uses O(N) internal storage, N being the dimension of the problem. It can use up to O(N2) internal storage for Hessian calculations, if a task throttling factor has not been set by the user. Classification: 4.9, 4.14, 6.5. Catalog identifier of previous version: AEDG_v1_0 Journal reference of previous version: Comput. Phys. Comm. 180(2009)1404 Does the new version supersede the previous version?: Yes Nature of problem: The numerical estimation of derivatives at several accuracy levels is a common requirement in many computational tasks, such as optimization, solution of nonlinear systems, and sensitivity analysis. For a large number of scientific and engineering applications, the underlying functions correspond to simulation codes for which analytical estimation of derivatives is difficult or almost impossible. A parallel implementation that exploits systems with multiple CPUs is very important for large scale and computationally expensive problems. Solution method: Finite differencing is used with a carefully chosen step that minimizes the sum of the truncation and round-off errors. The parallel versions employ both OpenMP and MPI libraries. Reasons for new version: The updated version was motivated by our endeavors to extend a parallel Bayesian uncertainty quantification framework [1], by incorporating higher order derivative information as in most state-of-the-art stochastic simulation methods such as Stochastic Newton MCMC [2] and Riemannian Manifold Hamiltonian MC [3]. The function evaluations are simulations with significant time-to-solution, which also varies with the input parameters such as in [1, 4]. The runtime of the N-body-type of problem changes considerably with the introduction of a longer cut-off between the bodies. In the first version of the library, the OpenMP-parallel subroutines spawn a new team of threads and distribute the function evaluations with a PARALLEL DO directive. This limits the functionality of the library as multiple concurrent calls require nested parallelism support from the OpenMP environment. Therefore, either their function evaluations will be serialized or processor oversubscription is likely to occur due to the increased number of OpenMP threads. In addition, the Hessian calculations include two explicit parallel regions that compute first the diagonal and then the off-diagonal elements of the array. Due to the barrier between the two regions, the parallelism of the calculations is not fully exploited. These issues have been addressed in the new version by first restructuring the serial code and then running the function evaluations in parallel using OpenMP tasks. Although the MPI-parallel implementation of the first version is capable of fully exploiting the task parallelism of the PNDL routines, it does not utilize the caching mechanism of the serial code and, therefore, performs some redundant function evaluations in the Hessian and Jacobian calculations. This can lead to: (a) higher execution times if the number of available processors is lower than the total number of tasks, and (b) significant energy consumption due to wasted processor cycles. Overcoming these drawbacks, which become critical as the time of a single function evaluation increases, was the primary goal of this new version. Due to the code restructure, the MPI-parallel implementation (and the OpenMP-parallel in accordance) avoids redundant calls, providing optimal performance in terms of the number of function evaluations. Another limitation of the library was that the library subroutines were collective and synchronous calls. In the new version, each MPI process can issue any number of subroutines for asynchronous execution. We introduce two library calls that provide global and local task synchronizations, similarly to the BARRIER and TASKWAIT directives of OpenMP. The new MPI-implementation is based on TORC, a new tasking library for multicore clusters [5-7]. TORC improves the portability of the software, as it relies exclusively on the POSIX-Threads and MPI programming interfaces. It allows MPI processes to utilize multiple worker threads, offering a hybrid programming and execution environment similar to MPI+OpenMP, in a completely transparent way. Finally, to further improve the usability of our software, a Python interface has been implemented on top of both the OpenMP and MPI versions of the library. This allows sequential Python codes to exploit shared and distributed memory systems. Summary of revisions: The revised code improves the performance of both parallel (OpenMP and MPI) implementations. The functionality and the user-interface of the MPI-parallel version have been extended to support the asynchronous execution of multiple PNDL calls, issued by one or multiple MPI processes. A new underlying tasking library increases portability and allows MPI processes to have multiple worker threads. For both implementations, an interface to the Python programming language has been added. Restrictions: The library uses only double precision arithmetic. The MPI implementation assumes the homogeneity of the execution environment provided by the operating system. Specifically, the processes of a single MPI application must have identical address space and a user function resides at the same virtual address. In addition, address space layout randomization should not be used for the application. Unusual features: The software takes into account bound constraints, in the sense that only feasible points are used to evaluate the derivatives, and given the level of the desired accuracy, the proper formula is automatically employed. Running time: Running time depends on the function's complexity. The test run took 23 ms for the serial distribution, 25 ms for the OpenMP with 2 threads, 53 ms and 1.01 s for the MPI parallel distribution using 2 threads and 2 processes respectively and yield-time for idle workers equal to 10 ms. References: [1] P. Angelikopoulos, C. Paradimitriou, P. Koumoutsakos, Bayesian uncertainty quantification and propagation in molecular dynamics simulations: a high performance computing framework, J. Chem. Phys 137 (14). [2] H.P. Flath, L.C. Wilcox, V. Akcelik, J. Hill, B. van Bloemen Waanders, O. Ghattas, Fast algorithms for Bayesian uncertainty quantification in large-scale linear inverse problems based on low-rank partial Hessian approximations, SIAM J. Sci. Comput. 33 (1) (2011) 407-432. [3] M. Girolami, B. Calderhead, Riemann manifold Langevin and Hamiltonian Monte Carlo methods, J. R. Stat. Soc. Ser. B (Stat. Methodol.) 73 (2) (2011) 123-214. [4] P. Angelikopoulos, C. Paradimitriou, P. Koumoutsakos, Data driven, predictive molecular dynamics for nanoscale flow simulations under uncertainty, J. Phys. Chem. B 117 (47) (2013) 14808-14816. [5] P.E. Hadjidoukas, E. Lappas, V.V. Dimakopoulos, A runtime library for platform-independent task parallelism, in: PDP, IEEE, 2012, pp. 229-236. [6] C. Voglis, P.E. Hadjidoukas, D.G. Papageorgiou, I. Lagaris, A parallel hybrid optimization algorithm for fitting interatomic potentials, Appl. Soft Comput. 13 (12) (2013) 4481-4492. [7] P.E. Hadjidoukas, C. Voglis, V.V. Dimakopoulos, I. Lagaris, D.G. Papageorgiou, Supporting adaptive and irregular parallelism for non-linear numerical optimization, Appl. Math. Comput. 231 (2014) 544-559.

  19. OpenMP-accelerated SWAT simulation using Intel C and FORTRAN compilers: Development and benchmark

    NASA Astrophysics Data System (ADS)

    Ki, Seo Jin; Sugimura, Tak; Kim, Albert S.

    2015-02-01

    We developed a practical method to accelerate execution of Soil and Water Assessment Tool (SWAT) using open (free) computational resources. The SWAT source code (rev 622) was recompiled using a non-commercial Intel FORTRAN compiler in Ubuntu 12.04 LTS Linux platform, and newly named iOMP-SWAT in this study. GNU utilities of make, gprof, and diff were used to develop the iOMP-SWAT package, profile memory usage, and check identicalness of parallel and serial simulations. Among 302 SWAT subroutines, the slowest routines were identified using GNU gprof, and later modified using Open Multiple Processing (OpenMP) library in an 8-core shared memory system. In addition, a C wrapping function was used to rapidly set large arrays to zero by cross compiling with the original SWAT FORTRAN package. A universal speedup ratio of 2.3 was achieved using input data sets of a large number of hydrological response units. As we specifically focus on acceleration of a single SWAT run, the use of iOMP-SWAT for parameter calibrations will significantly improve the performance of SWAT optimization.

  20. Parallelization strategies for continuum-generalized method of moments on the multi-thread systems

    NASA Astrophysics Data System (ADS)

    Bustamam, A.; Handhika, T.; Ernastuti, Kerami, D.

    2017-07-01

    Continuum-Generalized Method of Moments (C-GMM) covers the Generalized Method of Moments (GMM) shortfall which is not as efficient as Maximum Likelihood estimator by using the continuum set of moment conditions in a GMM framework. However, this computation would take a very long time since optimizing regularization parameter. Unfortunately, these calculations are processed sequentially whereas in fact all modern computers are now supported by hierarchical memory systems and hyperthreading technology, which allowing for parallel computing. This paper aims to speed up the calculation process of C-GMM by designing a parallel algorithm for C-GMM on the multi-thread systems. First, parallel regions are detected for the original C-GMM algorithm. There are two parallel regions in the original C-GMM algorithm, that are contributed significantly to the reduction of computational time: the outer-loop and the inner-loop. Furthermore, this parallel algorithm will be implemented with standard shared-memory application programming interface, i.e. Open Multi-Processing (OpenMP). The experiment shows that the outer-loop parallelization is the best strategy for any number of observations.

  1. MLP: A Parallel Programming Alternative to MPI for New Shared Memory Parallel Systems

    NASA Technical Reports Server (NTRS)

    Taft, James R.

    1999-01-01

    Recent developments at the NASA AMES Research Center's NAS Division have demonstrated that the new generation of NUMA based Symmetric Multi-Processing systems (SMPs), such as the Silicon Graphics Origin 2000, can successfully execute legacy vector oriented CFD production codes at sustained rates far exceeding processing rates possible on dedicated 16 CPU Cray C90 systems. This high level of performance is achieved via shared memory based Multi-Level Parallelism (MLP). This programming approach, developed at NAS and outlined below, is distinct from the message passing paradigm of MPI. It offers parallelism at both the fine and coarse grained level, with communication latencies that are approximately 50-100 times lower than typical MPI implementations on the same platform. Such latency reductions offer the promise of performance scaling to very large CPU counts. The method draws on, but is also distinct from, the newly defined OpenMP specification, which uses compiler directives to support a limited subset of multi-level parallel operations. The NAS MLP method is general, and applicable to a large class of NASA CFD codes.

  2. Fast Acceleration of 2D Wave Propagation Simulations Using Modern Computational Accelerators

    PubMed Central

    Wang, Wei; Xu, Lifan; Cavazos, John; Huang, Howie H.; Kay, Matthew

    2014-01-01

    Recent developments in modern computational accelerators like Graphics Processing Units (GPUs) and coprocessors provide great opportunities for making scientific applications run faster than ever before. However, efficient parallelization of scientific code using new programming tools like CUDA requires a high level of expertise that is not available to many scientists. This, plus the fact that parallelized code is usually not portable to different architectures, creates major challenges for exploiting the full capabilities of modern computational accelerators. In this work, we sought to overcome these challenges by studying how to achieve both automated parallelization using OpenACC and enhanced portability using OpenCL. We applied our parallelization schemes using GPUs as well as Intel Many Integrated Core (MIC) coprocessor to reduce the run time of wave propagation simulations. We used a well-established 2D cardiac action potential model as a specific case-study. To the best of our knowledge, we are the first to study auto-parallelization of 2D cardiac wave propagation simulations using OpenACC. Our results identify several approaches that provide substantial speedups. The OpenACC-generated GPU code achieved more than speedup above the sequential implementation and required the addition of only a few OpenACC pragmas to the code. An OpenCL implementation provided speedups on GPUs of at least faster than the sequential implementation and faster than a parallelized OpenMP implementation. An implementation of OpenMP on Intel MIC coprocessor provided speedups of with only a few code changes to the sequential implementation. We highlight that OpenACC provides an automatic, efficient, and portable approach to achieve parallelization of 2D cardiac wave simulations on GPUs. Our approach of using OpenACC, OpenCL, and OpenMP to parallelize this particular model on modern computational accelerators should be applicable to other computational models of wave propagation in multi-dimensional media. PMID:24497950

  3. HipMCL: a high-performance parallel implementation of the Markov clustering algorithm for large-scale networks

    PubMed Central

    Azad, Ariful; Ouzounis, Christos A; Kyrpides, Nikos C; Buluç, Aydin

    2018-01-01

    Abstract Biological networks capture structural or functional properties of relevant entities such as molecules, proteins or genes. Characteristic examples are gene expression networks or protein–protein interaction networks, which hold information about functional affinities or structural similarities. Such networks have been expanding in size due to increasing scale and abundance of biological data. While various clustering algorithms have been proposed to find highly connected regions, Markov Clustering (MCL) has been one of the most successful approaches to cluster sequence similarity or expression networks. Despite its popularity, MCL’s scalability to cluster large datasets still remains a bottleneck due to high running times and memory demands. Here, we present High-performance MCL (HipMCL), a parallel implementation of the original MCL algorithm that can run on distributed-memory computers. We show that HipMCL can efficiently utilize 2000 compute nodes and cluster a network of ∼70 million nodes with ∼68 billion edges in ∼2.4 h. By exploiting distributed-memory environments, HipMCL clusters large-scale networks several orders of magnitude faster than MCL and enables clustering of even bigger networks. HipMCL is based on MPI and OpenMP and is freely available under a modified BSD license. PMID:29315405

  4. HipMCL: a high-performance parallel implementation of the Markov clustering algorithm for large-scale networks

    DOE PAGES

    Azad, Ariful; Pavlopoulos, Georgios A.; Ouzounis, Christos A.; ...

    2018-01-05

    Biological networks capture structural or functional properties of relevant entities such as molecules, proteins or genes. Characteristic examples are gene expression networks or protein–protein interaction networks, which hold information about functional affinities or structural similarities. Such networks have been expanding in size due to increasing scale and abundance of biological data. While various clustering algorithms have been proposed to find highly connected regions, Markov Clustering (MCL) has been one of the most successful approaches to cluster sequence similarity or expression networks. Despite its popularity, MCL’s scalability to cluster large datasets still remains a bottleneck due to high running times andmore » memory demands. In this paper, we present High-performance MCL (HipMCL), a parallel implementation of the original MCL algorithm that can run on distributed-memory computers. We show that HipMCL can efficiently utilize 2000 compute nodes and cluster a network of ~70 million nodes with ~68 billion edges in ~2.4 h. By exploiting distributed-memory environments, HipMCL clusters large-scale networks several orders of magnitude faster than MCL and enables clustering of even bigger networks. Finally, HipMCL is based on MPI and OpenMP and is freely available under a modified BSD license.« less

  5. HipMCL: a high-performance parallel implementation of the Markov clustering algorithm for large-scale networks

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Azad, Ariful; Pavlopoulos, Georgios A.; Ouzounis, Christos A.

    Biological networks capture structural or functional properties of relevant entities such as molecules, proteins or genes. Characteristic examples are gene expression networks or protein–protein interaction networks, which hold information about functional affinities or structural similarities. Such networks have been expanding in size due to increasing scale and abundance of biological data. While various clustering algorithms have been proposed to find highly connected regions, Markov Clustering (MCL) has been one of the most successful approaches to cluster sequence similarity or expression networks. Despite its popularity, MCL’s scalability to cluster large datasets still remains a bottleneck due to high running times andmore » memory demands. In this paper, we present High-performance MCL (HipMCL), a parallel implementation of the original MCL algorithm that can run on distributed-memory computers. We show that HipMCL can efficiently utilize 2000 compute nodes and cluster a network of ~70 million nodes with ~68 billion edges in ~2.4 h. By exploiting distributed-memory environments, HipMCL clusters large-scale networks several orders of magnitude faster than MCL and enables clustering of even bigger networks. Finally, HipMCL is based on MPI and OpenMP and is freely available under a modified BSD license.« less

  6. Multilevel Parallelization of AutoDock 4.2.

    PubMed

    Norgan, Andrew P; Coffman, Paul K; Kocher, Jean-Pierre A; Katzmann, David J; Sosa, Carlos P

    2011-04-28

    Virtual (computational) screening is an increasingly important tool for drug discovery. AutoDock is a popular open-source application for performing molecular docking, the prediction of ligand-receptor interactions. AutoDock is a serial application, though several previous efforts have parallelized various aspects of the program. In this paper, we report on a multi-level parallelization of AutoDock 4.2 (mpAD4). Using MPI and OpenMP, AutoDock 4.2 was parallelized for use on MPI-enabled systems and to multithread the execution of individual docking jobs. In addition, code was implemented to reduce input/output (I/O) traffic by reusing grid maps at each node from docking to docking. Performance of mpAD4 was examined on two multiprocessor computers. Using MPI with OpenMP multithreading, mpAD4 scales with near linearity on the multiprocessor systems tested. In situations where I/O is limiting, reuse of grid maps reduces both system I/O and overall screening time. Multithreading of AutoDock's Lamarkian Genetic Algorithm with OpenMP increases the speed of execution of individual docking jobs, and when combined with MPI parallelization can significantly reduce the execution time of virtual screens. This work is significant in that mpAD4 speeds the execution of certain molecular docking workloads and allows the user to optimize the degree of system-level (MPI) and node-level (OpenMP) parallelization to best fit both workloads and computational resources.

  7. Development of an Implicit, Charge and Energy Conserving 2D Electromagnetic PIC Code on Advanced Architectures

    NASA Astrophysics Data System (ADS)

    Payne, Joshua; Taitano, William; Knoll, Dana; Liebs, Chris; Murthy, Karthik; Feltman, Nicolas; Wang, Yijie; McCarthy, Colleen; Cieren, Emanuel

    2012-10-01

    In order to solve problems such as the ion coalescence and slow MHD shocks fully kinetically we developed a fully implicit 2D energy and charge conserving electromagnetic PIC code, PlasmaApp2D. PlasmaApp2D differs from previous implicit PIC implementations in that it will utilize advanced architectures such as GPUs and shared memory CPU systems, with problems too large to fit into cache. PlasmaApp2D will be a hybrid CPU-GPU code developed primarily to run on the DARWIN cluster at LANL utilizing four 12-core AMD Opteron CPUs and two NVIDIA Tesla GPUs per node. MPI will be used for cross-node communication, OpenMP will be used for on-node parallelism, and CUDA will be used for the GPUs. Development progress and initial results will be presented.

  8. Extension of the AMBER molecular dynamics software to Intel's Many Integrated Core (MIC) architecture

    NASA Astrophysics Data System (ADS)

    Needham, Perri J.; Bhuiyan, Ashraf; Walker, Ross C.

    2016-04-01

    We present an implementation of explicit solvent particle mesh Ewald (PME) classical molecular dynamics (MD) within the PMEMD molecular dynamics engine, that forms part of the AMBER v14 MD software package, that makes use of Intel Xeon Phi coprocessors by offloading portions of the PME direct summation and neighbor list build to the coprocessor. We refer to this implementation as pmemd MIC offload and in this paper present the technical details of the algorithm, including basic models for MPI and OpenMP configuration, and analyze the resultant performance. The algorithm provides the best performance improvement for large systems (>400,000 atoms), achieving a ∼35% performance improvement for satellite tobacco mosaic virus (1,067,095 atoms) when 2 Intel E5-2697 v2 processors (2 ×12 cores, 30M cache, 2.7 GHz) are coupled to an Intel Xeon Phi coprocessor (Model 7120P-1.238/1.333 GHz, 61 cores). The implementation utilizes a two-fold decomposition strategy: spatial decomposition using an MPI library and thread-based decomposition using OpenMP. We also present compiler optimization settings that improve the performance on Intel Xeon processors, while retaining simulation accuracy.

  9. Nonlinear Wave Simulation on the Xeon Phi Knights Landing Processor

    NASA Astrophysics Data System (ADS)

    Hristov, Ivan; Goranov, Goran; Hristova, Radoslava

    2018-02-01

    We consider an interesting from computational point of view standing wave simulation by solving coupled 2D perturbed Sine-Gordon equations. We make an OpenMP realization which explores both thread and SIMD levels of parallelism. We test the OpenMP program on two different energy equivalent Intel architectures: 2× Xeon E5-2695 v2 processors, (code-named "Ivy Bridge-EP") in the Hybrilit cluster, and Xeon Phi 7250 processor (code-named "Knights Landing" (KNL). The results show 2 times better performance on KNL processor.

  10. Bayer image parallel decoding based on GPU

    NASA Astrophysics Data System (ADS)

    Hu, Rihui; Xu, Zhiyong; Wei, Yuxing; Sun, Shaohua

    2012-11-01

    In the photoelectrical tracking system, Bayer image is decompressed in traditional method, which is CPU-based. However, it is too slow when the images become large, for example, 2K×2K×16bit. In order to accelerate the Bayer image decoding, this paper introduces a parallel speedup method for NVIDA's Graphics Processor Unit (GPU) which supports CUDA architecture. The decoding procedure can be divided into three parts: the first is serial part, the second is task-parallelism part, and the last is data-parallelism part including inverse quantization, inverse discrete wavelet transform (IDWT) as well as image post-processing part. For reducing the execution time, the task-parallelism part is optimized by OpenMP techniques. The data-parallelism part could advance its efficiency through executing on the GPU as CUDA parallel program. The optimization techniques include instruction optimization, shared memory access optimization, the access memory coalesced optimization and texture memory optimization. In particular, it can significantly speed up the IDWT by rewriting the 2D (Tow-dimensional) serial IDWT into 1D parallel IDWT. Through experimenting with 1K×1K×16bit Bayer image, data-parallelism part is 10 more times faster than CPU-based implementation. Finally, a CPU+GPU heterogeneous decompression system was designed. The experimental result shows that it could achieve 3 to 5 times speed increase compared to the CPU serial method.

  11. A Wideband Fast Multipole Method for the two-dimensional complex Helmholtz equation

    NASA Astrophysics Data System (ADS)

    Cho, Min Hyung; Cai, Wei

    2010-12-01

    A Wideband Fast Multipole Method (FMM) for the 2D Helmholtz equation is presented. It can evaluate the interactions between N particles governed by the fundamental solution of 2D complex Helmholtz equation in a fast manner for a wide range of complex wave number k, which was not easy with the original FMM due to the instability of the diagonalized conversion operator. This paper includes the description of theoretical backgrounds, the FMM algorithm, software structures, and some test runs. Program summaryProgram title: 2D-WFMM Catalogue identifier: AEHI_v1_0 Program summary URL:http://cpc.cs.qub.ac.uk/summaries/AEHI_v1_0.html Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland Licensing provisions: Standard CPC licence, http://cpc.cs.qub.ac.uk/licence/licence.html No. of lines in distributed program, including test data, etc.: 4636 No. of bytes in distributed program, including test data, etc.: 82 582 Distribution format: tar.gz Programming language: C Computer: Any Operating system: Any operating system with gcc version 4.2 or newer Has the code been vectorized or parallelized?: Multi-core processors with shared memory RAM: Depending on the number of particles N and the wave number k Classification: 4.8, 4.12 External routines: OpenMP ( http://openmp.org/wp/) Nature of problem: Evaluate interaction between N particles governed by the fundamental solution of 2D Helmholtz equation with complex k. Solution method: Multilevel Fast Multipole Algorithm in a hierarchical quad-tree structure with cutoff level which combines low frequency method and high frequency method. Running time: Depending on the number of particles N, wave number k, and number of cores in CPU. CPU time increases as N log N.

  12. Massively parallel sparse matrix function calculations with NTPoly

    NASA Astrophysics Data System (ADS)

    Dawson, William; Nakajima, Takahito

    2018-04-01

    We present NTPoly, a massively parallel library for computing the functions of sparse, symmetric matrices. The theory of matrix functions is a well developed framework with a wide range of applications including differential equations, graph theory, and electronic structure calculations. One particularly important application area is diagonalization free methods in quantum chemistry. When the input and output of the matrix function are sparse, methods based on polynomial expansions can be used to compute matrix functions in linear time. We present a library based on these methods that can compute a variety of matrix functions. Distributed memory parallelization is based on a communication avoiding sparse matrix multiplication algorithm. OpenMP task parallellization is utilized to implement hybrid parallelization. We describe NTPoly's interface and show how it can be integrated with programs written in many different programming languages. We demonstrate the merits of NTPoly by performing large scale calculations on the K computer.

  13. Characterization of UMT2013 Performance on Advanced Architectures

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Howell, Louis

    2014-12-31

    This paper presents part of a larger effort to make detailed assessments of several proxy applications on various advanced architectures, with the eventual goal of extending these assessments to codes of programmatic interest running more realistic simulations. The focus here is on UMT2013, a proxy implementation of deterministic transport for unstructured meshes. I present weak and strong MPI scaling results and studies of OpenMP efficiency on the Sequoia BG/Q system at LLNL, with comparison against similar tests on an Intel Sandy Bridge TLCC2 system. The hardware counters on BG/Q provide detailed information on many aspects of on-node performance, while informationmore » from the mpiP tool gives insight into the reasons for the differing scaling behavior on these two different architectures. Preliminary tests that exploit NVRAM as extended memory on an Ivy Bridge machine designed for “Big Data” applications are also included.« less

  14. Automation of Data Traffic Control on DSM Architecture

    NASA Technical Reports Server (NTRS)

    Frumkin, Michael; Jin, Hao-Qiang; Yan, Jerry

    2001-01-01

    The design of distributed shared memory (DSM) computers liberates users from the duty to distribute data across processors and allows for the incremental development of parallel programs using, for example, OpenMP or Java threads. DSM architecture greatly simplifies the development of parallel programs having good performance on a few processors. However, to achieve a good program scalability on DSM computers requires that the user understand data flow in the application and use various techniques to avoid data traffic congestions. In this paper we discuss a number of such techniques, including data blocking, data placement, data transposition and page size control and evaluate their efficiency on the NAS (NASA Advanced Supercomputing) Parallel Benchmarks. We also present a tool which automates the detection of constructs causing data congestions in Fortran array oriented codes and advises the user on code transformations for improving data traffic in the application.

  15. Gregarious Data Re-structuring in a Many Core Architecture

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Shrestha, Sunil; Manzano Franco, Joseph B.; Marquez, Andres

    this paper, we have developed a new methodology that takes in consideration the access patterns from a single parallel actor (e.g. a thread), as well as, the access patterns of “grouped” parallel actors that share a resource (e.g. a distributed Level 3 cache). We start with a hierarchical tile code for our target machine and apply a series of transformations at the tile level to improve data residence in a given memory hierarchy level. The contribution of this paper includes (a) collaborative data restructuring for group reuse and (b) low overhead transformation technique to improve access pattern and bring closelymore » connected data elements together. Preliminary results in a many core architecture, Tilera TileGX, shows promising improvements over optimized OpenMP code (up to 31% increase in GFLOPS) and over our own previous work on fine grained runtimes (up to 16%) for selected kernels« less

  16. MOIL-opt: Energy-Conserving Molecular Dynamics on a GPU/CPU system

    PubMed Central

    Ruymgaart, A. Peter; Cardenas, Alfredo E.; Elber, Ron

    2011-01-01

    We report an optimized version of the molecular dynamics program MOIL that runs on a shared memory system with OpenMP and exploits the power of a Graphics Processing Unit (GPU). The model is of heterogeneous computing system on a single node with several cores sharing the same memory and a GPU. This is a typical laboratory tool, which provides excellent performance at minimal cost. Besides performance, emphasis is made on accuracy and stability of the algorithm probed by energy conservation for explicit-solvent atomically-detailed-models. Especially for long simulations energy conservation is critical due to the phenomenon known as “energy drift” in which energy errors accumulate linearly as a function of simulation time. To achieve long time dynamics with acceptable accuracy the drift must be particularly small. We identify several means of controlling long-time numerical accuracy while maintaining excellent speedup. To maintain a high level of energy conservation SHAKE and the Ewald reciprocal summation are run in double precision. Double precision summation of real-space non-bonded interactions improves energy conservation. In our best option, the energy drift using 1fs for a time step while constraining the distances of all bonds, is undetectable in 10ns simulation of solvated DHFR (Dihydrofolate reductase). Faster options, shaking only bonds with hydrogen atoms, are also very well behaved and have drifts of less than 1kcal/mol per nanosecond of the same system. CPU/GPU implementations require changes in programming models. We consider the use of a list of neighbors and quadratic versus linear interpolation in lookup tables of different sizes. Quadratic interpolation with a smaller number of grid points is faster than linear lookup tables (with finer representation) without loss of accuracy. Atomic neighbor lists were found most efficient. Typical speedups are about a factor of 10 compared to a single-core single-precision code. PMID:22328867

  17. A heterogeneous computing accelerated SCE-UA global optimization method using OpenMP, OpenCL, CUDA, and OpenACC.

    PubMed

    Kan, Guangyuan; He, Xiaoyan; Ding, Liuqian; Li, Jiren; Liang, Ke; Hong, Yang

    2017-10-01

    The shuffled complex evolution optimization developed at the University of Arizona (SCE-UA) has been successfully applied in various kinds of scientific and engineering optimization applications, such as hydrological model parameter calibration, for many years. The algorithm possesses good global optimality, convergence stability and robustness. However, benchmark and real-world applications reveal the poor computational efficiency of the SCE-UA. This research aims at the parallelization and acceleration of the SCE-UA method based on powerful heterogeneous computing technology. The parallel SCE-UA is implemented on Intel Xeon multi-core CPU (by using OpenMP and OpenCL) and NVIDIA Tesla many-core GPU (by using OpenCL, CUDA, and OpenACC). The serial and parallel SCE-UA were tested based on the Griewank benchmark function. Comparison results indicate the parallel SCE-UA significantly improves computational efficiency compared to the original serial version. The OpenCL implementation obtains the best overall acceleration results however, with the most complex source code. The parallel SCE-UA has bright prospects to be applied in real-world applications.

  18. OpenRBC: Redefining the Frontier of Red Blood Cell Simulations at Protein Resolution

    NASA Astrophysics Data System (ADS)

    Tang, Yu-Hang; Lu, Lu; Li, He; Grinberg, Leopold; Sachdeva, Vipin; Evangelinos, Constantinos; Karniadakis, George

    We present a from-scratch development of OpenRBC, a coarse-grained molecular dynamics code, which is capable of performing an unprecedented in silico experiment - simulating an entire mammal red blood cell lipid bilayer and cytoskeleton modeled by 4 million mesoscopic particles - on a single shared memory node. To achieve this, we invented an adaptive spatial searching algorithm to accelerate the computation of short-range pairwise interactions in an extremely sparse 3D space. The algorithm is based on a Voronoi partitioning of the point cloud of coarse-grained particles, and is continuously updated over the course of the simulation. The algorithm enables the construction of a lattice-free cell list, i.e. the key spatial searching data structure in our code, in O (N) time and space space with cells whose position and shape adapts automatically to the local density and curvature. The code implements NUMA/NUCA-aware OpenMP parallelization and achieves perfect scaling with up to hundreds of hardware threads. The code outperforms a legacy solver by more than 8 times in time-to-solution and more than 20 times in problem size, thus providing a new venue for probing the cytomechanics of red blood cells. This work was supported by the Department of Energy (DOE) Collaboratory on Mathematics for Mesoscopic Model- ing of Materials (CM4). YHT acknowledges partial financial support from an IBM Ph.D. Scholarship Award.

  19. Variational data assimilation system "INM RAS - Black Sea"

    NASA Astrophysics Data System (ADS)

    Parmuzin, Eugene; Agoshkov, Valery; Assovskiy, Maksim; Giniatulin, Sergey; Zakharova, Natalia; Kuimov, Grigory; Fomin, Vladimir

    2013-04-01

    Development of Informational-Computational Systems (ICS) for Data Assimilation Procedures is one of multidisciplinary problems. To study and solve these problems one needs to apply modern results from different disciplines and recent developments in: mathematical modeling; theory of adjoint equations and optimal control; inverse problems; numerical methods theory; numerical algebra and scientific computing. The problems discussed above are studied in the Institute of Numerical Mathematics of the Russian Academy of Science (INM RAS) in ICS for Personal Computers (PC). Special problems and questions arise while effective ICS versions for PC are being developed. These problems and questions can be solved with applying modern methods of numerical mathematics and by solving "parallelism problem" using OpenMP technology and special linear algebra packages. In this work the results on the ICS development for PC-ICS "INM RAS - Black Sea" are presented. In the work the following problems and questions are discussed: practical problems that can be studied by ICS; parallelism problems and their solutions with applying of OpenMP technology and the linear algebra packages used in ICS "INM - Black Sea"; Interface of ICS. The results of ICS "INM RAS - Black Sea" testing are presented. Efficiency of technologies and methods applied are discussed. The work was supported by RFBR, grants No. 13-01-00753, 13-05-00715 and by The Ministry of education and science of Russian Federation, project 8291, project 11.519.11.1005 References: [1] V.I. Agoshkov, M.V. Assovskii, S.A. Lebedev, Numerical simulation of Black Sea hydrothermodynamics taking into account tide-forming forces. Russ. J. Numer. Anal. Math. Modelling (2012) 27, No.1, 5-31 [2] E.I. Parmuzin, V.I. Agoshkov, Numerical solution of the variational assimilation problem for sea surface temperature in the model of the Black Sea dynamics. Russ. J. Numer. Anal. Math. Modelling (2012) 27, No.1, 69-94 [3] V.B. Zalesny, N.A. Diansky, V.V. Fomin, S.N. Moshonkin, S.G. Demyshev, Numerical model of the circulation of Black Sea and Sea of Azov. Russ. J. Numer. Anal. Math. Modelling (2012) 27, No.1, 95-111 [4] V.I. Agoshkov, S.V. Giniatulin, G.V. Kuimov. OpenMP technology and linear algebra packages in the variation data assimilation systems. - Abstracts of the 1-st China-Russia Conference on Numerical Algebra with Applications in Radiactive Hydrodynamics, Beijing, China, October 16-18, 2012. [5] Zakharova N.B., Agoshkov V.I., Parmuzin E.I., The new method of ARGO buoys system observation data interpolation. Russian Journal of Numerical Analysis and Mathematical Modelling. Vol. 28, Issue 1, 2013.

  20. Accelerating 3D Elastic Wave Equations on Knights Landing based Intel Xeon Phi processors

    NASA Astrophysics Data System (ADS)

    Sourouri, Mohammed; Birger Raknes, Espen

    2017-04-01

    In advanced imaging methods like reverse-time migration (RTM) and full waveform inversion (FWI) the elastic wave equation (EWE) is numerically solved many times to create the seismic image or the elastic parameter model update. Thus, it is essential to optimize the solution time for solving the EWE as this will have a major impact on the total computational cost in running RTM or FWI. From a computational point of view applications implementing EWEs are associated with two major challenges. The first challenge is the amount of memory-bound computations involved, while the second challenge is the execution of such computations over very large datasets. So far, multi-core processors have not been able to tackle these two challenges, which eventually led to the adoption of accelerators such as Graphics Processing Units (GPUs). Compared to conventional CPUs, GPUs are densely populated with many floating-point units and fast memory, a type of architecture that has proven to map well to many scientific computations. Despite its architectural advantages, full-scale adoption of accelerators has yet to materialize. First, accelerators require a significant programming effort imposed by programming models such as CUDA or OpenCL. Second, accelerators come with a limited amount of memory, which also require explicit data transfers between the CPU and the accelerator over the slow PCI bus. The second generation of the Xeon Phi processor based on the Knights Landing (KNL) architecture, promises the computational capabilities of an accelerator but require the same programming effort as traditional multi-core processors. The high computational performance is realized through many integrated cores (number of cores and tiles and memory varies with the model) organized in tiles that are connected via a 2D mesh based interconnect. In contrary to accelerators, KNL is a self-hosted system, meaning explicit data transfers over the PCI bus are no longer required. However, like most accelerators, KNL sports a memory subsystem consisting of low-level caches and 16GB of high-bandwidth MCDRAM memory. For capacity computing, up to 400GB of conventional DDR4 memory is provided. Such a strict hierarchical memory layout means that data locality is imperative if the true potential of this product is to be harnessed. In this work, we study a series of optimizations specifically targeting KNL for our EWE based application to reduce the time-to-solution time for the following 3D model sizes in grid points: 1283, 2563 and 5123. We compare the results with an optimized version for multi-core CPUs running on a dual-socket Xeon E5 2680v3 system using OpenMP. Our initial naive implementation on the KNL is roughly 20% faster than the multi-core version, but by using only one thread per core and careful memory placement using the memkind library, we could achieve higher speedups. Additionally, by using the MCDRAM as cache for problem sizes that are smaller than 16 GB further performance improvements were unlocked. Depending on the problem size, our overall results indicate that the KNL based system is approximately 2.2x faster than the 24-core Xeon E5 2680v3 system, with only modest changes to the code.

  1. ATDM LANL FleCSI: Topology and Execution Framework

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Bergen, Benjamin Karl

    FleCSI is a compile-time configurable C++ framework designed to support multi-physics application development. As such, FleCSI attempts to provide a very general set of infrastructure design patterns that can be specialized and extended to suit the needs of a broad variety of solver and data requirements. This means that FleCSI is potentially useful to many different ECP projects. Current support includes multidimensional mesh topology, mesh geometry, and mesh adjacency information, n-dimensional hashed-tree data structures, graph partitioning interfaces, and dependency closures (to identify data dependencies between distributed-memory address spaces). FleCSI introduces a functional programming model with control, execution, and data abstractionsmore » that are consistent with state-of-the-art task-based runtimes such as Legion and Charm++. The model also provides support for fine-grained, data-parallel execution with backend support for runtimes such as OpenMP and C++17. The FleCSI abstraction layer provides the developer with insulation from the underlying runtimes, while allowing support for multiple runtime systems, including conventional models like asynchronous MPI. The intent is to give developers a concrete set of user-friendly programming tools that can be used now, while allowing flexibility in choosing runtime implementations and optimizations that can be applied to architectures and runtimes that arise in the future. This project is essential to the ECP Ristra Next-Generation Code project, part of ASC ATDM, because it provides a hierarchically parallel programming model that is consistent with the design of modern system architectures, but which allows for the straightforward expression of algorithmic parallelism in a portably performant manner.« less

  2. GENESIS: a hybrid-parallel and multi-scale molecular dynamics simulator with enhanced sampling algorithms for biomolecular and cellular simulations.

    PubMed

    Jung, Jaewoon; Mori, Takaharu; Kobayashi, Chigusa; Matsunaga, Yasuhiro; Yoda, Takao; Feig, Michael; Sugita, Yuji

    2015-07-01

    GENESIS (Generalized-Ensemble Simulation System) is a new software package for molecular dynamics (MD) simulations of macromolecules. It has two MD simulators, called ATDYN and SPDYN. ATDYN is parallelized based on an atomic decomposition algorithm for the simulations of all-atom force-field models as well as coarse-grained Go-like models. SPDYN is highly parallelized based on a domain decomposition scheme, allowing large-scale MD simulations on supercomputers. Hybrid schemes combining OpenMP and MPI are used in both simulators to target modern multicore computer architectures. Key advantages of GENESIS are (1) the highly parallel performance of SPDYN for very large biological systems consisting of more than one million atoms and (2) the availability of various REMD algorithms (T-REMD, REUS, multi-dimensional REMD for both all-atom and Go-like models under the NVT, NPT, NPAT, and NPγT ensembles). The former is achieved by a combination of the midpoint cell method and the efficient three-dimensional Fast Fourier Transform algorithm, where the domain decomposition space is shared in real-space and reciprocal-space calculations. Other features in SPDYN, such as avoiding concurrent memory access, reducing communication times, and usage of parallel input/output files, also contribute to the performance. We show the REMD simulation results of a mixed (POPC/DMPC) lipid bilayer as a real application using GENESIS. GENESIS is released as free software under the GPLv2 licence and can be easily modified for the development of new algorithms and molecular models. WIREs Comput Mol Sci 2015, 5:310-323. doi: 10.1002/wcms.1220.

  3. Parallel implementation of approximate atomistic models of the AMOEBA polarizable model

    NASA Astrophysics Data System (ADS)

    Demerdash, Omar; Head-Gordon, Teresa

    2016-11-01

    In this work we present a replicated data hybrid OpenMP/MPI implementation of a hierarchical progression of approximate classical polarizable models that yields speedups of up to ∼10 compared to the standard OpenMP implementation of the exact parent AMOEBA polarizable model. In addition, our parallel implementation exhibits reasonable weak and strong scaling. The resulting parallel software will prove useful for those who are interested in how molecular properties converge in the condensed phase with respect to the MBE, it provides a fruitful test bed for exploring different electrostatic embedding schemes, and offers an interesting possibility for future exascale computing paradigms.

  4. MIST: An Open Source Environmental Modelling Programming Language Incorporating Easy to Use Data Parallelism.

    NASA Astrophysics Data System (ADS)

    Bellerby, Tim

    2014-05-01

    Model Integration System (MIST) is open-source environmental modelling programming language that directly incorporates data parallelism. The language is designed to enable straightforward programming structures, such as nested loops and conditional statements to be directly translated into sequences of whole-array (or more generally whole data-structure) operations. MIST thus enables the programmer to use well-understood constructs, directly relating to the mathematical structure of the model, without having to explicitly vectorize code or worry about details of parallelization. A range of common modelling operations are supported by dedicated language structures operating on cell neighbourhoods rather than individual cells (e.g.: the 3x3 local neighbourhood needed to implement an averaging image filter can be simply accessed from within a simple loop traversing all image pixels). This facility hides details of inter-process communication behind more mathematically relevant descriptions of model dynamics. The MIST automatic vectorization/parallelization process serves both to distribute work among available nodes and separately to control storage requirements for intermediate expressions - enabling operations on very large domains for which memory availability may be an issue. MIST is designed to facilitate efficient interpreter based implementations. A prototype open source interpreter is available, coded in standard FORTRAN 95, with tools to rapidly integrate existing FORTRAN 77 or 95 code libraries. The language is formally specified and thus not limited to FORTRAN implementation or to an interpreter-based approach. A MIST to FORTRAN compiler is under development and volunteers are sought to create an ANSI-C implementation. Parallel processing is currently implemented using OpenMP. However, parallelization code is fully modularised and could be replaced with implementations using other libraries. GPU implementation is potentially possible.

  5. Parallelization of GeoClaw code for modeling geophysical flows with adaptive mesh refinement on many-core systems

    USGS Publications Warehouse

    Zhang, S.; Yuen, D.A.; Zhu, A.; Song, S.; George, D.L.

    2011-01-01

    We parallelized the GeoClaw code on one-level grid using OpenMP in March, 2011 to meet the urgent need of simulating tsunami waves at near-shore from Tohoku 2011 and achieved over 75% of the potential speed-up on an eight core Dell Precision T7500 workstation [1]. After submitting that work to SC11 - the International Conference for High Performance Computing, we obtained an unreleased OpenMP version of GeoClaw from David George, who developed the GeoClaw code as part of his PH.D thesis. In this paper, we will show the complementary characteristics of the two approaches used in parallelizing GeoClaw and the speed-up obtained by combining the advantage of each of the two individual approaches with adaptive mesh refinement (AMR), demonstrating the capabilities of running GeoClaw efficiently on many-core systems. We will also show a novel simulation of the Tohoku 2011 Tsunami waves inundating the Sendai airport and Fukushima Nuclear Power Plants, over which the finest grid distance of 20 meters is achieved through a 4-level AMR. This simulation yields quite good predictions about the wave-heights and travel time of the tsunami waves. ?? 2011 IEEE.

  6. BioFVM: an efficient, parallelized diffusive transport solver for 3-D biological simulations

    PubMed Central

    Ghaffarizadeh, Ahmadreza; Friedman, Samuel H.; Macklin, Paul

    2016-01-01

    Motivation: Computational models of multicellular systems require solving systems of PDEs for release, uptake, decay and diffusion of multiple substrates in 3D, particularly when incorporating the impact of drugs, growth substrates and signaling factors on cell receptors and subcellular systems biology. Results: We introduce BioFVM, a diffusive transport solver tailored to biological problems. BioFVM can simulate release and uptake of many substrates by cell and bulk sources, diffusion and decay in large 3D domains. It has been parallelized with OpenMP, allowing efficient simulations on desktop workstations or single supercomputer nodes. The code is stable even for large time steps, with linear computational cost scalings. Solutions are first-order accurate in time and second-order accurate in space. The code can be run by itself or as part of a larger simulator. Availability and implementation: BioFVM is written in C ++ with parallelization in OpenMP. It is maintained and available for download at http://BioFVM.MathCancer.org and http://BioFVM.sf.net under the Apache License (v2.0). Contact: paul.macklin@usc.edu. Supplementary information: Supplementary data are available at Bioinformatics online. PMID:26656933

  7. Parallel processing implementation for the coupled transport of photons and electrons using OpenMP

    NASA Astrophysics Data System (ADS)

    Doerner, Edgardo

    2016-05-01

    In this work the use of OpenMP to implement the parallel processing of the Monte Carlo (MC) simulation of the coupled transport for photons and electrons is presented. This implementation was carried out using a modified EGSnrc platform which enables the use of the Microsoft Visual Studio 2013 (VS2013) environment, together with the developing tools available in the Intel Parallel Studio XE 2015 (XE2015). The performance study of this new implementation was carried out in a desktop PC with a multi-core CPU, taking as a reference the performance of the original platform. The results were satisfactory, both in terms of scalability as parallelization efficiency.

  8. Issues Identified During September 2016 IBM OpenMP 4.5 Hackathon

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Richards, David F.

    In September, 2016 IBM hosted an OpenMP 4.5 Hackathon at the TJ Watson Research Center. Teams from LLNL, ORNL, SNL, LANL, and LBNL attended the event. As with the 2015 hackathon, IBM produced an extremely useful and successful event with unmatched support from compiler team, applications staff, and facilities. Approximately 24 IBM staff supported 4-day hackathon and spent significant time 4-6 weeks out to prepare environment and become familiar with apps. This hackathon was also the first event to feature LLVM & XL C/C++ and Fortran compilers. This report records many of the issues encountered by the LLNL teams duringmore » the hackathon.« less

  9. Roofline Analysis in the Intel® Advisor to Deliver Optimized Performance for applications on Intel® Xeon Phi™ Processor

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Koskela, Tuomas S.; Lobet, Mathieu; Deslippe, Jack

    In this session we show, in two case studies, how the roofline feature of Intel Advisor has been utilized to optimize the performance of kernels of the XGC1 and PICSAR codes in preparation for Intel Knights Landing architecture. The impact of the implemented optimizations and the benefits of using the automatic roofline feature of Intel Advisor to study performance of large applications will be presented. This demonstrates an effective optimization strategy that has enabled these science applications to achieve up to 4.6 times speed-up and prepare for future exascale architectures. # Goal/Relevance of Session The roofline model [1,2] is amore » powerful tool for analyzing the performance of applications with respect to the theoretical peak achievable on a given computer architecture. It allows one to graphically represent the performance of an application in terms of operational intensity, i.e. the ratio of flops performed and bytes moved from memory in order to guide optimization efforts. Given the scale and complexity of modern science applications, it can often be a tedious task for the user to perform the analysis on the level of functions or loops to identify where performance gains can be made. With new Intel tools, it is now possible to automate this task, as well as base the estimates of peak performance on measurements rather than vendor specifications. The goal of this session is to demonstrate how the roofline feature of Intel Advisor can be used to balance memory vs. computation related optimization efforts and effectively identify performance bottlenecks. A series of typical optimization techniques: cache blocking, structure refactoring, data alignment, and vectorization illustrated by the kernel cases will be addressed. # Description of the codes ## XGC1 The XGC1 code [3] is a magnetic fusion Particle-In-Cell code that uses an unstructured mesh for its Poisson solver that allows it to accurately resolve the edge plasma of a magnetic fusion device. After recent optimizations to its collision kernel [4], most of the computing time is spent in the electron push (pushe) kernel, where these optimization efforts have been focused. The kernel code scaled well with MPI+OpenMP but had almost no automatic compiler vectorization, in part due to indirect memory addresses and in part due to low trip counts of low-level loops that would be candidates for vectorization. Particle blocking and sorting have been implemented to increase trip counts of low-level loops and improve memory locality, and OpenMP directives have been added to vectorize compute-intensive loops that were identified by Advisor. The optimizations have improved the performance of the pushe kernel 2x on Haswell processors and 1.7x on KNL. The KNL node-for-node performance has been brought to within 30% of a NERSC Cori phase I Haswell node and we expect to bridge this gap by reducing the memory footprint of compute intensive routines to improve cache reuse. ## PICSAR is a Fortran/Python high-performance Particle-In-Cell library targeting at MIC architectures first designed to be coupled with the PIC code WARP for the simulation of laser-matter interaction and particle accelerators. PICSAR also contains a FORTRAN stand-alone kernel for performance studies and benchmarks. A MPI domain decomposition is used between NUMA domains and a tile decomposition (cache-blocking) handled by OpenMP has been added for shared-memory parallelism and better cache management. The so-called current deposition and field gathering steps that compose the PIC time loop constitute major hotspots that have been rewritten to enable more efficient vectorization. Particle communications between tiles and MPI domain has been merged and parallelized. All considered, these improvements provide speedups of 3.1 for order 1 and 4.6 for order 3 interpolation shape factors on KNL configured in SNC4 quadrant flat mode. Performance is similar between a node of cori phase 1 and KNL at order 1 and better on KNL by a factor 1.6 at order 3 with the considered test case (homogeneous thermal plasma).« less

  10. Reconstruction for time-domain in vivo EPR 3D multigradient oximetric imaging--a parallel processing perspective.

    PubMed

    Dharmaraj, Christopher D; Thadikonda, Kishan; Fletcher, Anthony R; Doan, Phuc N; Devasahayam, Nallathamby; Matsumoto, Shingo; Johnson, Calvin A; Cook, John A; Mitchell, James B; Subramanian, Sankaran; Krishna, Murali C

    2009-01-01

    Three-dimensional Oximetric Electron Paramagnetic Resonance Imaging using the Single Point Imaging modality generates unpaired spin density and oxygen images that can readily distinguish between normal and tumor tissues in small animals. It is also possible with fast imaging to track the changes in tissue oxygenation in response to the oxygen content in the breathing air. However, this involves dealing with gigabytes of data for each 3D oximetric imaging experiment involving digital band pass filtering and background noise subtraction, followed by 3D Fourier reconstruction. This process is rather slow in a conventional uniprocessor system. This paper presents a parallelization framework using OpenMP runtime support and parallel MATLAB to execute such computationally intensive programs. The Intel compiler is used to develop a parallel C++ code based on OpenMP. The code is executed on four Dual-Core AMD Opteron shared memory processors, to reduce the computational burden of the filtration task significantly. The results show that the parallel code for filtration has achieved a speed up factor of 46.66 as against the equivalent serial MATLAB code. In addition, a parallel MATLAB code has been developed to perform 3D Fourier reconstruction. Speedup factors of 4.57 and 4.25 have been achieved during the reconstruction process and oximetry computation, for a data set with 23 x 23 x 23 gradient steps. The execution time has been computed for both the serial and parallel implementations using different dimensions of the data and presented for comparison. The reported system has been designed to be easily accessible even from low-cost personal computers through local internet (NIHnet). The experimental results demonstrate that the parallel computing provides a source of high computational power to obtain biophysical parameters from 3D EPR oximetric imaging, almost in real-time.

  11. GENESIS: a hybrid-parallel and multi-scale molecular dynamics simulator with enhanced sampling algorithms for biomolecular and cellular simulations

    PubMed Central

    Jung, Jaewoon; Mori, Takaharu; Kobayashi, Chigusa; Matsunaga, Yasuhiro; Yoda, Takao; Feig, Michael; Sugita, Yuji

    2015-01-01

    GENESIS (Generalized-Ensemble Simulation System) is a new software package for molecular dynamics (MD) simulations of macromolecules. It has two MD simulators, called ATDYN and SPDYN. ATDYN is parallelized based on an atomic decomposition algorithm for the simulations of all-atom force-field models as well as coarse-grained Go-like models. SPDYN is highly parallelized based on a domain decomposition scheme, allowing large-scale MD simulations on supercomputers. Hybrid schemes combining OpenMP and MPI are used in both simulators to target modern multicore computer architectures. Key advantages of GENESIS are (1) the highly parallel performance of SPDYN for very large biological systems consisting of more than one million atoms and (2) the availability of various REMD algorithms (T-REMD, REUS, multi-dimensional REMD for both all-atom and Go-like models under the NVT, NPT, NPAT, and NPγT ensembles). The former is achieved by a combination of the midpoint cell method and the efficient three-dimensional Fast Fourier Transform algorithm, where the domain decomposition space is shared in real-space and reciprocal-space calculations. Other features in SPDYN, such as avoiding concurrent memory access, reducing communication times, and usage of parallel input/output files, also contribute to the performance. We show the REMD simulation results of a mixed (POPC/DMPC) lipid bilayer as a real application using GENESIS. GENESIS is released as free software under the GPLv2 licence and can be easily modified for the development of new algorithms and molecular models. WIREs Comput Mol Sci 2015, 5:310–323. doi: 10.1002/wcms.1220 PMID:26753008

  12. Cpu/gpu Computing for AN Implicit Multi-Block Compressible Navier-Stokes Solver on Heterogeneous Platform

    NASA Astrophysics Data System (ADS)

    Deng, Liang; Bai, Hanli; Wang, Fang; Xu, Qingxin

    2016-06-01

    CPU/GPU computing allows scientists to tremendously accelerate their numerical codes. In this paper, we port and optimize a double precision alternating direction implicit (ADI) solver for three-dimensional compressible Navier-Stokes equations from our in-house Computational Fluid Dynamics (CFD) software on heterogeneous platform. First, we implement a full GPU version of the ADI solver to remove a lot of redundant data transfers between CPU and GPU, and then design two fine-grain schemes, namely “one-thread-one-point” and “one-thread-one-line”, to maximize the performance. Second, we present a dual-level parallelization scheme using the CPU/GPU collaborative model to exploit the computational resources of both multi-core CPUs and many-core GPUs within the heterogeneous platform. Finally, considering the fact that memory on a single node becomes inadequate when the simulation size grows, we present a tri-level hybrid programming pattern MPI-OpenMP-CUDA that merges fine-grain parallelism using OpenMP and CUDA threads with coarse-grain parallelism using MPI for inter-node communication. We also propose a strategy to overlap the computation with communication using the advanced features of CUDA and MPI programming. We obtain speedups of 6.0 for the ADI solver on one Tesla M2050 GPU in contrast to two Xeon X5670 CPUs. Scalability tests show that our implementation can offer significant performance improvement on heterogeneous platform.

  13. Parallel, Distributed Scripting with Python

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Miller, P J

    2002-05-24

    Parallel computers used to be, for the most part, one-of-a-kind systems which were extremely difficult to program portably. With SMP architectures, the advent of the POSIX thread API and OpenMP gave developers ways to portably exploit on-the-box shared memory parallelism. Since these architectures didn't scale cost-effectively, distributed memory clusters were developed. The associated MPI message passing libraries gave these systems a portable paradigm too. Having programmers effectively use this paradigm is a somewhat different question. Distributed data has to be explicitly transported via the messaging system in order for it to be useful. In high level languages, the MPI librarymore » gives access to data distribution routines in C, C++, and FORTRAN. But we need more than that. Many reasonable and common tasks are best done in (or as extensions to) scripting languages. Consider sysadm tools such as password crackers, file purgers, etc ... These are simple to write in a scripting language such as Python (an open source, portable, and freely available interpreter). But these tasks beg to be done in parallel. Consider the a password checker that checks an encrypted password against a 25,000 word dictionary. This can take around 10 seconds in Python (6 seconds in C). It is trivial to parallelize if you can distribute the information and co-ordinate the work.« less

  14. Application of a hybrid MPI/OpenMP approach for parallel groundwater model calibration using multi-core computers

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Tang, Guoping; D'Azevedo, Ed F; Zhang, Fan

    2010-01-01

    Calibration of groundwater models involves hundreds to thousands of forward solutions, each of which may solve many transient coupled nonlinear partial differential equations, resulting in a computationally intensive problem. We describe a hybrid MPI/OpenMP approach to exploit two levels of parallelisms in software and hardware to reduce calibration time on multi-core computers. HydroGeoChem 5.0 (HGC5) is parallelized using OpenMP for direct solutions for a reactive transport model application, and a field-scale coupled flow and transport model application. In the reactive transport model, a single parallelizable loop is identified to account for over 97% of the total computational time using GPROF.more » Addition of a few lines of OpenMP compiler directives to the loop yields a speedup of about 10 on a 16-core compute node. For the field-scale model, parallelizable loops in 14 of 174 HGC5 subroutines that require 99% of the execution time are identified. As these loops are parallelized incrementally, the scalability is found to be limited by a loop where Cray PAT detects over 90% cache missing rates. With this loop rewritten, similar speedup as the first application is achieved. The OpenMP-parallelized code can be run efficiently on multiple workstations in a network or multiple compute nodes on a cluster as slaves using parallel PEST to speedup model calibration. To run calibration on clusters as a single task, the Levenberg Marquardt algorithm is added to HGC5 with the Jacobian calculation and lambda search parallelized using MPI. With this hybrid approach, 100 200 compute cores are used to reduce the calibration time from weeks to a few hours for these two applications. This approach is applicable to most of the existing groundwater model codes for many applications.« less

  15. Numerical modeling of exciton-polariton Bose-Einstein condensate in a microcavity

    NASA Astrophysics Data System (ADS)

    Voronych, Oksana; Buraczewski, Adam; Matuszewski, Michał; Stobińska, Magdalena

    2017-06-01

    A novel, optimized numerical method of modeling of an exciton-polariton superfluid in a semiconductor microcavity was proposed. Exciton-polaritons are spin-carrying quasiparticles formed from photons strongly coupled to excitons. They possess unique properties, interesting from the point of view of fundamental research as well as numerous potential applications. However, their numerical modeling is challenging due to the structure of nonlinear differential equations describing their evolution. In this paper, we propose to solve the equations with a modified Runge-Kutta method of 4th order, further optimized for efficient computations. The algorithms were implemented in form of C++ programs fitted for parallel environments and utilizing vector instructions. The programs form the EPCGP suite which has been used for theoretical investigation of exciton-polaritons. Catalogue identifier: AFBQ_v1_0 Program summary URL:http://cpc.cs.qub.ac.uk/summaries/AFBQ_v1_0.html Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland Licensing provisions: BSD-3 No. of lines in distributed program, including test data, etc.: 2157 No. of bytes in distributed program, including test data, etc.: 498994 Distribution format: tar.gz Programming language: C++ with OpenMP extensions (main numerical program), Python (helper scripts). Computer: Modern PC (tested on AMD and Intel processors), HP BL2x220. Operating system: Unix/Linux and Windows. Has the code been vectorized or parallelized?: Yes (OpenMP) RAM: 200 MB for single run Classification: 7, 7.7. Nature of problem: An exciton-polariton superfluid is a novel, interesting physical system allowing investigation of high temperature Bose-Einstein condensation of exciton-polaritons-quasiparticles carrying spin. They have brought a lot of attention due to their unique properties and potential applications in polariton-based optoelectronic integrated circuits. This is an out-of-equilibrium quantum system confined within a semiconductor microcavity. It is described by a set of nonlinear differential equations similar in spirit to the Gross-Pitaevskii (GP) equation, but their unique properties do not allow standard GP solving frameworks to be utilized. Finding an accurate and efficient numerical algorithm as well as development of optimized numerical software is necessary for effective theoretical investigation of exciton-polaritons. Solution method: A Runge-Kutta method of 4th order was employed to solve the set of differential equations describing exciton-polariton superfluids. The method was fitted for the exciton-polariton equations and further optimized. The C++ programs utilize OpenMP extensions and vector operations in order to fully utilize the computer hardware. Running time: 6h for 100 ps evolution, depending on the values of parameters

  16. Analysis OpenMP performance of AMD and Intel architecture for breaking waves simulation using MPS

    NASA Astrophysics Data System (ADS)

    Alamsyah, M. N. A.; Utomo, A.; Gunawan, P. H.

    2018-03-01

    Simulation of breaking waves by using Navier-Stokes equation via moving particle semi-implicit method (MPS) over close domain is given. The results show the parallel computing on multicore architecture using OpenMP platform can reduce the computational time almost half of the serial time. Here, the comparison using two computer architectures (AMD and Intel) are performed. The results using Intel architecture is shown better than AMD architecture in CPU time. However, in efficiency, the computer with AMD architecture gives slightly higher than the Intel. For the simulation by 1512 number of particles, the CPU time using Intel and AMD are 12662.47 and 28282.30 respectively. Moreover, the efficiency using similar number of particles, AMD obtains 50.09 % and Intel up to 49.42 %.

  17. Parallelizing a peanut butter sandwich

    NASA Astrophysics Data System (ADS)

    Quenette, S. M.

    2005-12-01

    This poster aims to demonstrate, in a novel way, why contemporary computational code development is seemingly hard to a geodynamics modeler (i.e. a non-computer-scientist). For example, to utilise comtemporary computer hardware, parallelisation is required. But why do we chose the explicit approach (MPI) over an implicit (OpenMP) one? How does this relate to the typical geodynamics codes. And do we face this same style of problems in every day life? We aim to demonstrate that the little bit of complexity, fore-thought and effort is worth its while.

  18. Exploring performance and energy tradeoffs for irregular applications: A case study on the Tilera many-core architecture

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Panyala, Ajay; Chavarría-Miranda, Daniel; Manzano, Joseph B.

    High performance, parallel applications with irregular data accesses are becoming a critical workload class for modern systems. In particular, the execution of such workloads on emerging many-core systems is expected to be a significant component of applications in data mining, machine learning, scientific computing and graph analytics. However, power and energy constraints limit the capabilities of individual cores, memory hierarchy and on-chip interconnect of such systems, thus leading to architectural and software trade-os that must be understood in the context of the intended application’s behavior. Irregular applications are notoriously hard to optimize given their data-dependent access patterns, lack of structuredmore » locality and complex data structures and code patterns. We have ported two irregular applications, graph community detection using the Louvain method (Grappolo) and high-performance conjugate gradient (HPCCG), to the Tilera many-core system and have conducted a detailed study of platform-independent and platform-specific optimizations that improve their performance as well as reduce their overall energy consumption. To conduct this study, we employ an auto-tuning based approach that explores the optimization design space along three dimensions - memory layout schemes, GCC compiler flag choices and OpenMP loop scheduling options. We leverage MIT’s OpenTuner auto-tuning framework to explore and recommend energy optimal choices for different combinations of parameters. We then conduct an in-depth architectural characterization to understand the memory behavior of the selected workloads. Finally, we perform a correlation study to demonstrate the interplay between the hardware behavior and application characteristics. Using auto-tuning, we demonstrate whole-node energy savings and performance improvements of up to 49:6% and 60% relative to a baseline instantiation, and up to 31% and 45:4% relative to manually optimized variants.« less

  19. 3D Elastic Wavefield Tomography

    NASA Astrophysics Data System (ADS)

    Guasch, L.; Warner, M.; Stekl, I.; Umpleby, A.; Shah, N.

    2010-12-01

    Wavefield tomography, or waveform inversion, aims to extract the maximum information from seismic data by matching trace by trace the response of the solid earth to seismic waves using numerical modelling tools. Its first formulation dates from the early 80's, when Albert Tarantola developed a solid theoretical basis that is still used today with little change. Due to computational limitations, the application of the method to 3D problems has been unaffordable until a few years ago, and then only under the acoustic approximation. Although acoustic wavefield tomography is widely used, a complete solution of the seismic inversion problem requires that we account properly for the physics of wave propagation, and so must include elastic effects. We have developed a 3D tomographic wavefield inversion code that incorporates the full elastic wave equation. The bottle neck of the different implementations is the forward modelling algorithm that generates the synthetic data to be compared with the field seismograms as well as the backpropagation of the residuals needed to form the direction update of the model parameters. Furthermore, one or two extra modelling runs are needed in order to calculate the step-length. Our approach uses a FD scheme explicit time-stepping by finite differences that are 4th order in space and 2nd order in time, which is a 3D version of the one developed by Jean Virieux in 1986. We chose the time domain because an explicit time scheme is much less demanding in terms of memory than its frequency domain analogue, although the discussion of wich domain is more efficient still remains open. We calculate the parameter gradients for Vp and Vs by correlating the normal and shear stress wavefields respectively. A straightforward application would lead to the storage of the wavefield at all grid points at each time-step. We tackled this problem using two different approaches. The first one makes better use of resources for small models of dimension equal or less than 300x300x300 nodes, and it under-samples the wavefield reducing the number of stored time-steps by an order of magnitude. For bigger models the wavefield is stored only at the boundaries of the model and then re-injected while the residuals are backpropagated allowing to compute the correlation 'on the fly'. In terms of computational resource, the elastic code is an order of magnitude more demanding than the equivalent acoustic code. We have combined shared memory with distributed memory parallelisation using OpenMP and MPI respectively. Thus, we take advantage of the increasingly common multi-core architecture processors. We have successfully applied our inversion algorithm to different realistic complex 3D models. The models had non-linear relations between pressure and shear wave velocities. The shorter wavelengths of the shear waves improve the resolution of the images obtained with respect to a purely acoustic approach.

  20. Hybrid MPI/OpenMP Implementation of the ORAC Molecular Dynamics Program for Generalized Ensemble and Fast Switching Alchemical Simulations.

    PubMed

    Procacci, Piero

    2016-06-27

    We present a new release (6.0β) of the ORAC program [Marsili et al. J. Comput. Chem. 2010, 31, 1106-1116] with a hybrid OpenMP/MPI (open multiprocessing message passing interface) multilevel parallelism tailored for generalized ensemble (GE) and fast switching double annihilation (FS-DAM) nonequilibrium technology aimed at evaluating the binding free energy in drug-receptor system on high performance computing platforms. The production of the GE or FS-DAM trajectories is handled using a weak scaling parallel approach on the MPI level only, while a strong scaling force decomposition scheme is implemented for intranode computations with shared memory access at the OpenMP level. The efficiency, simplicity, and inherent parallel nature of the ORAC implementation of the FS-DAM algorithm, project the code as a possible effective tool for a second generation high throughput virtual screening in drug discovery and design. The code, along with documentation, testing, and ancillary tools, is distributed under the provisions of the General Public License and can be freely downloaded at www.chim.unifi.it/orac .

  1. a Modeling Method of Fluttering Leaves Based on Point Cloud

    NASA Astrophysics Data System (ADS)

    Tang, J.; Wang, Y.; Zhao, Y.; Hao, W.; Ning, X.; Lv, K.; Shi, Z.; Zhao, M.

    2017-09-01

    Leaves falling gently or fluttering are common phenomenon in nature scenes. The authenticity of leaves falling plays an important part in the dynamic modeling of natural scenes. The leaves falling model has a widely applications in the field of animation and virtual reality. We propose a novel modeling method of fluttering leaves based on point cloud in this paper. According to the shape, the weight of leaves and the wind speed, three basic trajectories of leaves falling are defined, which are the rotation falling, the roll falling and the screw roll falling. At the same time, a parallel algorithm based on OpenMP is implemented to satisfy the needs of real-time in practical applications. Experimental results demonstrate that the proposed method is amenable to the incorporation of a variety of desirable effects.

  2. GPU-accelerated algorithms for many-particle continuous-time quantum walks

    NASA Astrophysics Data System (ADS)

    Piccinini, Enrico; Benedetti, Claudia; Siloi, Ilaria; Paris, Matteo G. A.; Bordone, Paolo

    2017-06-01

    Many-particle continuous-time quantum walks (CTQWs) represent a resource for several tasks in quantum technology, including quantum search algorithms and universal quantum computation. In order to design and implement CTQWs in a realistic scenario, one needs effective simulation tools for Hamiltonians that take into account static noise and fluctuations in the lattice, i.e. Hamiltonians containing stochastic terms. To this aim, we suggest a parallel algorithm based on the Taylor series expansion of the evolution operator, and compare its performances with those of algorithms based on the exact diagonalization of the Hamiltonian or a 4th order Runge-Kutta integration. We prove that both Taylor-series expansion and Runge-Kutta algorithms are reliable and have a low computational cost, the Taylor-series expansion showing the additional advantage of a memory allocation not depending on the precision of calculation. Both algorithms are also highly parallelizable within the SIMT paradigm, and are thus suitable for GPGPU computing. In turn, we have benchmarked 4 NVIDIA GPUs and 3 quad-core Intel CPUs for a 2-particle system over lattices of increasing dimension, showing that the speedup provided by GPU computing, with respect to the OPENMP parallelization, lies in the range between 8x and (more than) 20x, depending on the frequency of post-processing. GPU-accelerated codes thus allow one to overcome concerns about the execution time, and make it possible simulations with many interacting particles on large lattices, with the only limit of the memory available on the device.

  3. GPU Accelerated Browser for Neuroimaging Genomics.

    PubMed

    Zigon, Bob; Li, Huang; Yao, Xiaohui; Fang, Shiaofen; Hasan, Mohammad Al; Yan, Jingwen; Moore, Jason H; Saykin, Andrew J; Shen, Li

    2018-04-25

    Neuroimaging genomics is an emerging field that provides exciting opportunities to understand the genetic basis of brain structure and function. The unprecedented scale and complexity of the imaging and genomics data, however, have presented critical computational bottlenecks. In this work we present our initial efforts towards building an interactive visual exploratory system for mining big data in neuroimaging genomics. A GPU accelerated browsing tool for neuroimaging genomics is created that implements the ANOVA algorithm for single nucleotide polymorphism (SNP) based analysis and the VEGAS algorithm for gene-based analysis, and executes them at interactive rates. The ANOVA algorithm is 110 times faster than the 4-core OpenMP version, while the VEGAS algorithm is 375 times faster than its 4-core OpenMP counter part. This approach lays a solid foundation for researchers to address the challenges of mining large-scale imaging genomics datasets via interactive visual exploration.

  4. A Hybrid MPI/OpenMP Approach for Parallel Groundwater Model Calibration on Multicore Computers

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Tang, Guoping; D'Azevedo, Ed F; Zhang, Fan

    2010-01-01

    Groundwater model calibration is becoming increasingly computationally time intensive. We describe a hybrid MPI/OpenMP approach to exploit two levels of parallelism in software and hardware to reduce calibration time on multicore computers with minimal parallelization effort. At first, HydroGeoChem 5.0 (HGC5) is parallelized using OpenMP for a uranium transport model with over a hundred species involving nearly a hundred reactions, and a field scale coupled flow and transport model. In the first application, a single parallelizable loop is identified to consume over 97% of the total computational time. With a few lines of OpenMP compiler directives inserted into the code,more » the computational time reduces about ten times on a compute node with 16 cores. The performance is further improved by selectively parallelizing a few more loops. For the field scale application, parallelizable loops in 15 of the 174 subroutines in HGC5 are identified to take more than 99% of the execution time. By adding the preconditioned conjugate gradient solver and BICGSTAB, and using a coloring scheme to separate the elements, nodes, and boundary sides, the subroutines for finite element assembly, soil property update, and boundary condition application are parallelized, resulting in a speedup of about 10 on a 16-core compute node. The Levenberg-Marquardt (LM) algorithm is added into HGC5 with the Jacobian calculation and lambda search parallelized using MPI. With this hybrid approach, compute nodes at the number of adjustable parameters (when the forward difference is used for Jacobian approximation), or twice that number (if the center difference is used), are used to reduce the calibration time from days and weeks to a few hours for the two applications. This approach can be extended to global optimization scheme and Monte Carol analysis where thousands of compute nodes can be efficiently utilized.« less

  5. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Shipman, Galen M.

    These are the slides for a presentation on programming models in HPC, at the Los Alamos National Laboratory's Parallel Computing Summer School. The following topics are covered: Flynn's Taxonomy of computer architectures; single instruction single data; single instruction multiple data; multiple instruction multiple data; address space organization; definition of Trinity (Intel Xeon-Phi is a MIMD architecture); single program multiple data; multiple program multiple data; ExMatEx workflow overview; definition of a programming model, programming languages, runtime systems; programming model and environments; MPI (Message Passing Interface); OpenMP; Kokkos (Performance Portable Thread-Parallel Programming Model); Kokkos abstractions, patterns, policies, and spaces; RAJA, a systematicmore » approach to node-level portability and tuning; overview of the Legion Programming Model; mapping tasks and data to hardware resources; interoperability: supporting task-level models; Legion S3D execution and performance details; workflow, integration of external resources into the programming model.« less

  6. Coding for parallel execution of hardware-in-the-loop millimeter-wave scene generation models on multicore SIMD processor architectures

    NASA Astrophysics Data System (ADS)

    Olson, Richard F.

    2013-05-01

    Rendering of point scatterer based radar scenes for millimeter wave (mmW) seeker tests in real-time hardware-in-the-loop (HWIL) scene generation requires efficient algorithms and vector-friendly computer architectures for complex signal synthesis. New processor technology from Intel implements an extended 256-bit vector SIMD instruction set (AVX, AVX2) in a multi-core CPU design providing peak execution rates of hundreds of GigaFLOPS (GFLOPS) on one chip. Real world mmW scene generation code can approach peak SIMD execution rates only after careful algorithm and source code design. An effective software design will maintain high computing intensity emphasizing register-to-register SIMD arithmetic operations over data movement between CPU caches or off-chip memories. Engineers at the U.S. Army Aviation and Missile Research, Development and Engineering Center (AMRDEC) applied two basic parallel coding methods to assess new 256-bit SIMD multi-core architectures for mmW scene generation in HWIL. These include use of POSIX threads built on vector library functions and more portable, highlevel parallel code based on compiler technology (e.g. OpenMP pragmas and SIMD autovectorization). Since CPU technology is rapidly advancing toward high processor core counts and TeraFLOPS peak SIMD execution rates, it is imperative that coding methods be identified which produce efficient and maintainable parallel code. This paper describes the algorithms used in point scatterer target model rendering, the parallelization of those algorithms, and the execution performance achieved on an AVX multi-core machine using the two basic parallel coding methods. The paper concludes with estimates for scale-up performance on upcoming multi-core technology.

  7. Fast hydrological model calibration based on the heterogeneous parallel computing accelerated shuffled complex evolution method

    NASA Astrophysics Data System (ADS)

    Kan, Guangyuan; He, Xiaoyan; Ding, Liuqian; Li, Jiren; Hong, Yang; Zuo, Depeng; Ren, Minglei; Lei, Tianjie; Liang, Ke

    2018-01-01

    Hydrological model calibration has been a hot issue for decades. The shuffled complex evolution method developed at the University of Arizona (SCE-UA) has been proved to be an effective and robust optimization approach. However, its computational efficiency deteriorates significantly when the amount of hydrometeorological data increases. In recent years, the rise of heterogeneous parallel computing has brought hope for the acceleration of hydrological model calibration. This study proposed a parallel SCE-UA method and applied it to the calibration of a watershed rainfall-runoff model, the Xinanjiang model. The parallel method was implemented on heterogeneous computing systems using OpenMP and CUDA. Performance testing and sensitivity analysis were carried out to verify its correctness and efficiency. Comparison results indicated that heterogeneous parallel computing-accelerated SCE-UA converged much more quickly than the original serial version and possessed satisfactory accuracy and stability for the task of fast hydrological model calibration.

  8. Data Race Benchmark Collection

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Liao, Chunhua; Lin, Pei-Hung; Asplund, Joshua

    2017-03-21

    This project is a benchmark suite of Open-MP parallel codes that have been checked for data races. The programs are marked to show which do and do not have races. This allows them to be leveraged while testing and developing race detection tools.

  9. Exact diagonalization of quantum lattice models on coprocessors

    NASA Astrophysics Data System (ADS)

    Siro, T.; Harju, A.

    2016-10-01

    We implement the Lanczos algorithm on an Intel Xeon Phi coprocessor and compare its performance to a multi-core Intel Xeon CPU and an NVIDIA graphics processor. The Xeon and the Xeon Phi are parallelized with OpenMP and the graphics processor is programmed with CUDA. The performance is evaluated by measuring the execution time of a single step in the Lanczos algorithm. We study two quantum lattice models with different particle numbers, and conclude that for small systems, the multi-core CPU is the fastest platform, while for large systems, the graphics processor is the clear winner, reaching speedups of up to 7.6 compared to the CPU. The Xeon Phi outperforms the CPU with sufficiently large particle number, reaching a speedup of 2.5.

  10. A Coarse-to-Fine Geometric Scale-Invariant Feature Transform for Large Size High Resolution Satellite Image Registration

    PubMed Central

    Chang, Xueli; Du, Siliang; Li, Yingying; Fang, Shenghui

    2018-01-01

    Large size high resolution (HR) satellite image matching is a challenging task due to local distortion, repetitive structures, intensity changes and low efficiency. In this paper, a novel matching approach is proposed for the large size HR satellite image registration, which is based on coarse-to-fine strategy and geometric scale-invariant feature transform (SIFT). In the coarse matching step, a robust matching method scale restrict (SR) SIFT is implemented at low resolution level. The matching results provide geometric constraints which are then used to guide block division and geometric SIFT in the fine matching step. The block matching method can overcome the memory problem. In geometric SIFT, with area constraints, it is beneficial for validating the candidate matches and decreasing searching complexity. To further improve the matching efficiency, the proposed matching method is parallelized using OpenMP. Finally, the sensing image is rectified to the coordinate of reference image via Triangulated Irregular Network (TIN) transformation. Experiments are designed to test the performance of the proposed matching method. The experimental results show that the proposed method can decrease the matching time and increase the number of matching points while maintaining high registration accuracy. PMID:29702589

  11. Characterization of Proxy Application Performance on Advanced Architectures. UMT2013, MCB, AMG2013

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Howell, Louis H.; Gunney, Brian T.; Bhatele, Abhinav

    2015-10-09

    Three codes were tested at LLNL as part of a Tri-Lab effort to make detailed assessments of several proxy applications on various advanced architectures, with the eventual goal of extending these assessments to codes of programmatic interest running more realistic simulations. Teams from Sandia and Los Alamos tested proxy apps of their own. The focus in this report is on the LLNL codes UMT2013, MCB, and AMG2013. We present weak and strong MPI scaling results and studies of OpenMP efficiency on a large BG/Q system at LLNL, with comparison against similar tests on an Intel Sandy Bridge TLCC2 system. Themore » hardware counters on BG/Q provide detailed information on many aspects of on-node performance, while information from the mpiP tool gives insight into the reasons for the differing scaling behavior on these two different architectures. Results from three more speculative tests are also included: one that exploits NVRAM as extended memory, one that studies performance under a power bound, and one that illustrates the effects of changing the torus network mapping on BG/Q.« less

  12. Parallel heuristics for scalable community detection

    DOE PAGES

    Lu, Hao; Halappanavar, Mahantesh; Kalyanaraman, Ananth

    2015-08-14

    Community detection has become a fundamental operation in numerous graph-theoretic applications. Despite its potential for application, there is only limited support for community detection on large-scale parallel computers, largely owing to the irregular and inherently sequential nature of the underlying heuristics. In this paper, we present parallelization heuristics for fast community detection using the Louvain method as the serial template. The Louvain method is an iterative heuristic for modularity optimization. Originally developed in 2008, the method has become increasingly popular owing to its ability to detect high modularity community partitions in a fast and memory-efficient manner. However, the method ismore » also inherently sequential, thereby limiting its scalability. Here, we observe certain key properties of this method that present challenges for its parallelization, and consequently propose heuristics that are designed to break the sequential barrier. For evaluation purposes, we implemented our heuristics using OpenMP multithreading, and tested them over real world graphs derived from multiple application domains. Compared to the serial Louvain implementation, our parallel implementation is able to produce community outputs with a higher modularity for most of the inputs tested, in comparable number or fewer iterations, while providing real speedups of up to 16x using 32 threads.« less

  13. Hybrid multicore/vectorisation technique applied to the elastic wave equation on a staggered grid

    NASA Astrophysics Data System (ADS)

    Titarenko, Sofya; Hildyard, Mark

    2017-07-01

    In modern physics it has become common to find the solution of a problem by solving numerically a set of PDEs. Whether solving them on a finite difference grid or by a finite element approach, the main calculations are often applied to a stencil structure. In the last decade it has become usual to work with so called big data problems where calculations are very heavy and accelerators and modern architectures are widely used. Although CPU and GPU clusters are often used to solve such problems, parallelisation of any calculation ideally starts from a single processor optimisation. Unfortunately, it is impossible to vectorise a stencil structured loop with high level instructions. In this paper we suggest a new approach to rearranging the data structure which makes it possible to apply high level vectorisation instructions to a stencil loop and which results in significant acceleration. The suggested method allows further acceleration if shared memory APIs are used. We show the effectiveness of the method by applying it to an elastic wave propagation problem on a finite difference grid. We have chosen Intel architecture for the test problem and OpenMP (Open Multi-Processing) since they are extensively used in many applications.

  14. A novel heterogeneous algorithm to simulate multiphase flow in porous media on multicore CPU-GPU systems

    NASA Astrophysics Data System (ADS)

    McClure, J. E.; Prins, J. F.; Miller, C. T.

    2014-07-01

    Multiphase flow implementations of the lattice Boltzmann method (LBM) are widely applied to the study of porous medium systems. In this work, we construct a new variant of the popular "color" LBM for two-phase flow in which a three-dimensional, 19-velocity (D3Q19) lattice is used to compute the momentum transport solution while a three-dimensional, seven velocity (D3Q7) lattice is used to compute the mass transport solution. Based on this formulation, we implement a novel heterogeneous GPU-accelerated algorithm in which the mass transport solution is computed by multiple shared memory CPU cores programmed using OpenMP while a concurrent solution of the momentum transport is performed using a GPU. The heterogeneous solution is demonstrated to provide speedup of 2.6 × as compared to multi-core CPU solution and 1.8 × compared to GPU solution due to concurrent utilization of both CPU and GPU bandwidths. Furthermore, we verify that the proposed formulation provides an accurate physical representation of multiphase flow processes and demonstrate that the approach can be applied to perform heterogeneous simulations of two-phase flow in porous media using a typical GPU-accelerated workstation.

  15. Performance evaluation of canny edge detection on a tiled multicore architecture

    NASA Astrophysics Data System (ADS)

    Brethorst, Andrew Z.; Desai, Nehal; Enright, Douglas P.; Scrofano, Ronald

    2011-01-01

    In the last few years, a variety of multicore architectures have been used to parallelize image processing applications. In this paper, we focus on assessing the parallel speed-ups of different Canny edge detection parallelization strategies on the Tile64, a tiled multicore architecture developed by the Tilera Corporation. Included in these strategies are different ways Canny edge detection can be parallelized, as well as differences in data management. The two parallelization strategies examined were loop-level parallelism and domain decomposition. Loop-level parallelism is achieved through the use of OpenMP,1 and it is capable of parallelization across the range of values over which a loop iterates. Domain decomposition is the process of breaking down an image into subimages, where each subimage is processed independently, in parallel. The results of the two strategies show that for the same number of threads, programmer implemented, domain decomposition exhibits higher speed-ups than the compiler managed, loop-level parallelism implemented with OpenMP.

  16. Ice-sheet modelling accelerated by graphics cards

    NASA Astrophysics Data System (ADS)

    Brædstrup, Christian Fredborg; Damsgaard, Anders; Egholm, David Lundbek

    2014-11-01

    Studies of glaciers and ice sheets have increased the demand for high performance numerical ice flow models over the past decades. When exploring the highly non-linear dynamics of fast flowing glaciers and ice streams, or when coupling multiple flow processes for ice, water, and sediment, researchers are often forced to use super-computing clusters. As an alternative to conventional high-performance computing hardware, the Graphical Processing Unit (GPU) is capable of massively parallel computing while retaining a compact design and low cost. In this study, we present a strategy for accelerating a higher-order ice flow model using a GPU. By applying the newest GPU hardware, we achieve up to 180× speedup compared to a similar but serial CPU implementation. Our results suggest that GPU acceleration is a competitive option for ice-flow modelling when compared to CPU-optimised algorithms parallelised by the OpenMP or Message Passing Interface (MPI) protocols.

  17. How to Build MCNP 6.2

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Bull, Jeffrey S.

    This presentation describes how to build MCNP 6.2. MCNP®* 6.2 can be compiled on Macs, PCs, and most Linux systems. It can also be built for parallel execution using both OpenMP and Messing Passing Interface (MPI) methods. MCNP6 requires Fortran, C, and C++ compilers to build the code.

  18. batman: BAsic Transit Model cAlculatioN in Python

    NASA Astrophysics Data System (ADS)

    Kreidberg, Laura

    2015-11-01

    I introduce batman, a Python package for modeling exoplanet transit light curves. The batman package supports calculation of light curves for any radially symmetric stellar limb darkening law, using a new integration algorithm for models that cannot be quickly calculated analytically. The code uses C extension modules to speed up model calculation and is parallelized with OpenMP. For a typical light curve with 100 data points in transit, batman can calculate one million quadratic limb-darkened models in 30 seconds with a single 1.7 GHz Intel Core i5 processor. The same calculation takes seven minutes using the four-parameter nonlinear limb darkening model (computed to 1 ppm accuracy). Maximum truncation error for integrated models is an input parameter that can be set as low as 0.001 ppm, ensuring that the community is prepared for the precise transit light curves we anticipate measuring with upcoming facilities. The batman package is open source and publicly available at https://github.com/lkreidberg/batman .

  19. Numerical Aspects of Nonhydrostatic Implementations Applied to a Parallel Finite Element Tsunami Model

    NASA Astrophysics Data System (ADS)

    Fuchs, A.; Androsov, A.; Harig, S.; Hiller, W.; Rakowsky, N.

    2012-04-01

    Based on the jeopardy of devastating tsunamis and the unpredictability of such events, tsunami modelling as part of warning systems is still a contemporary topic. The tsunami group of Alfred Wegener Institute developed the simulation tool TsunAWI as contribution to the Early Warning System in Indonesia. Although the precomputed scenarios for this purpose qualify for satisfying deliverables, the study of further improvements continues. While TsunAWI is governed by the Shallow Water Equations, an extension of the model is based on a nonhydrostatic approach. At the arrival of a tsunami wave in coastal regions with rough bathymetry, the term containing the nonhydrostatic part of pressure, that is neglected in the original hydrostatic model, gains in importance. In consideration of this term, a better approximation of the wave is expected. Differences of hydrostatic and nonhydrostatic model results are contrasted in the standard benchmark problem of a solitary wave runup on a plane beach. The observation data provided by Titov and Synolakis (1995) serves as reference. The nonhydrostatic approach implies a set of equations that are similar to the Shallow Water Equations, so the variation of the code can be implemented on top. However, this additional routines cause a lot of issues you have to cope with. So far the computations of the model were purely explicit. In the nonhydrostatic version the determination of an additional unknown and the solution of a large sparse system of linear equations is necessary. The latter constitutes the lion's share of computing time and memory requirement. Since the corresponding matrix is only symmetric in structure and not in values, an iterative Krylov Subspace Method is used, in particular the restarted Generalized Minimal Residual Algorithm GMRES(m). With regard to optimization, we present a comparison of several combinations of sequential and parallel preconditioning techniques respective number of iterations and setup/application time. Since the used software package pARMS 3.2, that provides solving and preconditioning techniques, works via MPI parallelism, in an auxiliary branch we adapted TsunAWI and switched from OpenMP to MPI with attached importance to internal partition management.

  20. Random sphere packing model of heterogeneous propellants

    NASA Astrophysics Data System (ADS)

    Kochevets, Sergei Victorovich

    It is well recognized that combustion of heterogeneous propellants is strongly dependent on the propellant morphology. Recent developments in computing systems make it possible to start three-dimensional modeling of heterogeneous propellant combustion. A key component of such large scale computations is a realistic model of industrial propellants which retains the true morphology---a goal never achieved before. The research presented develops the Random Sphere Packing Model of heterogeneous propellants and generates numerical samples of actual industrial propellants. This is done by developing a sphere packing algorithm which randomly packs a large number of spheres with a polydisperse size distribution within a rectangular domain. First, the packing code is developed, optimized for performance, and parallelized using the OpenMP shared memory architecture. Second, the morphology and packing fraction of two simple cases of unimodal and bimodal packs are investigated computationally and analytically. It is shown that both the Loose Random Packing and Dense Random Packing limits are not well defined and the growth rate of the spheres is identified as the key parameter controlling the efficiency of the packing. For a properly chosen growth rate, computational results are found to be in excellent agreement with experimental data. Third, two strategies are developed to define numerical samples of polydisperse heterogeneous propellants: the Deterministic Strategy and the Random Selection Strategy. Using these strategies, numerical samples of industrial propellants are generated. The packing fraction is investigated and it is shown that the experimental values of the packing fraction can be achieved computationally. It is strongly believed that this Random Sphere Packing Model of propellants is a major step forward in the realistic computational modeling of heterogeneous propellant of combustion. In addition, a method of analysis of the morphology of heterogeneous propellants is developed which uses the concept of multi-point correlation functions. A set of intrinsic length scales of local density fluctuations in random heterogeneous propellants is identified by performing a Monte-Carlo study of the correlation functions. This method of analysis shows great promise for understanding the origins of the combustion instability of heterogeneous propellants, and is believed to become a valuable tool for the development of safe and reliable rocket engines.

  1. SToRM: A Model for 2D environmental hydraulics

    USGS Publications Warehouse

    Simões, Francisco J. M.

    2017-01-01

    A two-dimensional (depth-averaged) finite volume Godunov-type shallow water model developed for flow over complex topography is presented. The model, SToRM, is based on an unstructured cell-centered finite volume formulation and on nonlinear strong stability preserving Runge-Kutta time stepping schemes. The numerical discretization is founded on the classical and well established shallow water equations in hyperbolic conservative form, but the convective fluxes are calculated using auto-switching Riemann and diffusive numerical fluxes. Computational efficiency is achieved through a parallel implementation based on the OpenMP standard and the Fortran programming language. SToRM’s implementation within a graphical user interface is discussed. Field application of SToRM is illustrated by utilizing it to estimate peak flow discharges in a flooding event of the St. Vrain Creek in Colorado, U.S.A., in 2013, which reached 850 m3/s (~30,000 f3 /s) at the location of this study.

  2. Model-independent partial wave analysis using a massively-parallel fitting framework

    NASA Astrophysics Data System (ADS)

    Sun, L.; Aoude, R.; dos Reis, A. C.; Sokoloff, M.

    2017-10-01

    The functionality of GooFit, a GPU-friendly framework for doing maximum-likelihood fits, has been extended to extract model-independent {\\mathscr{S}}-wave amplitudes in three-body decays such as D + → h + h + h -. A full amplitude analysis is done where the magnitudes and phases of the {\\mathscr{S}}-wave amplitudes are anchored at a finite number of m 2(h + h -) control points, and a cubic spline is used to interpolate between these points. The amplitudes for {\\mathscr{P}}-wave and {\\mathscr{D}}-wave intermediate states are modeled as spin-dependent Breit-Wigner resonances. GooFit uses the Thrust library, with a CUDA backend for NVIDIA GPUs and an OpenMP backend for threads with conventional CPUs. Performance on a variety of platforms is compared. Executing on systems with GPUs is typically a few hundred times faster than executing the same algorithm on a single CPU.

  3. Parallelization of Lower-Upper Symmetric Gauss-Seidel Method for Chemically Reacting Flow

    NASA Technical Reports Server (NTRS)

    Yoon, Seokkwan; Jost, Gabriele; Chang, Sherry

    2005-01-01

    Development of technologies for exploration of the solar system has revived an interest in computational simulation of chemically reacting flows since planetary probe vehicles exhibit non-equilibrium phenomena during the atmospheric entry of a planet or a moon as well as the reentry to the Earth. Stability in combustion is essential for new propulsion systems. Numerical solution of real-gas flows often increases computational work by an order-of-magnitude compared to perfect gas flow partly because of the increased complexity of equations to solve. Recently, as part of Project Columbia, NASA has integrated a cluster of interconnected SGI Altix systems to provide a ten-fold increase in current supercomputing capacity that includes an SGI Origin system. Both the new and existing machines are based on cache coherent non-uniform memory access architecture. Lower-Upper Symmetric Gauss-Seidel (LU-SGS) relaxation method has been implemented into both perfect and real gas flow codes including Real-Gas Aerodynamic Simulator (RGAS). However, the vectorized RGAS code runs inefficiently on cache-based shared-memory machines such as SGI system. Parallelization of a Gauss-Seidel method is nontrivial due to its sequential nature. The LU-SGS method has been vectorized on an oblique plane in INS3D-LU code that has been one of the base codes for NAS Parallel benchmarks. The oblique plane has been called a hyperplane by computer scientists. It is straightforward to parallelize a Gauss-Seidel method by partitioning the hyperplanes once they are formed. Another way of parallelization is to schedule processors like a pipeline using software. Both hyperplane and pipeline methods have been implemented using openMP directives. The present paper reports the performance of the parallelized RGAS code on SGI Origin and Altix systems.

  4. Checkpointing Shared Memory Programs at the Application-level

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Bronevetsky, G; Schulz, M; Szwed, P

    2004-09-08

    Trends in high-performance computing are making it necessary for long-running applications to tolerate hardware faults. The most commonly used approach is checkpoint and restart(CPR)-the state of the computation is saved periodically on disk, and when a failure occurs, the computation is restarted from the last saved state. At present, it is the responsibility of the programmer to instrument applications for CPR. Our group is investigating the use of compiler technology to instrument codes to make them self-checkpointing and self-restarting, thereby providing an automatic solution to the problem of making long-running scientific applications resilient to hardware faults. Our previous work focusedmore » on message-passing programs. In this paper, we describe such a system for shared-memory programs running on symmetric multiprocessors. The system has two components: (i)a pre-compiler for source-to-source modification of applications, and (ii) a runtime system that implements a protocol for coordinating CPR among the threads of the parallel application. For the sake of concreteness, we focus on a non-trivial subset of OpenMP that includes barriers and locks. One of the advantages of this approach is that the ability to tolerate faults becomes embedded within the application itself, so applications become self-checkpointing and self-restarting on any platform. We demonstrate this by showing that our transformed benchmarks can checkpoint and restart on three different platforms (Windows/x86, Linux/x86, and Tru64/Alpha). Our experiments show that the overhead introduced by this approach is usually quite small; they also suggest ways in which the current implementation can be tuned to reduced overheads further.« less

  5. Performance Analysis of GFDL's GCM Line-By-Line Radiative Transfer Model on GPU and MIC Architectures

    NASA Astrophysics Data System (ADS)

    Menzel, R.; Paynter, D.; Jones, A. L.

    2017-12-01

    Due to their relatively low computational cost, radiative transfer models in global climate models (GCMs) run on traditional CPU architectures generally consist of shortwave and longwave parameterizations over a small number of wavelength bands. With the rise of newer GPU and MIC architectures, however, the performance of high resolution line-by-line radiative transfer models may soon approach those of the physical parameterizations currently employed in GCMs. Here we present an analysis of the current performance of a new line-by-line radiative transfer model currently under development at GFDL. Although originally designed to specifically exploit GPU architectures through the use of CUDA, the radiative transfer model has recently been extended to include OpenMP in an effort to also effectively target MIC architectures such as Intel's Xeon Phi. Using input data provided by the upcoming Radiative Forcing Model Intercomparison Project (RFMIP, as part of CMIP 6), we compare model results and performance data for various model configurations and spectral resolutions run on both GPU and Intel Knights Landing architectures to analogous runs of the standard Oxford Reference Forward Model on traditional CPUs.

  6. Hierarchical parallelisation of functional renormalisation group calculations - hp-fRG

    NASA Astrophysics Data System (ADS)

    Rohe, Daniel

    2016-10-01

    The functional renormalisation group (fRG) has evolved into a versatile tool in condensed matter theory for studying important aspects of correlated electron systems. Practical applications of the method often involve a high numerical effort, motivating the question in how far High Performance Computing (HPC) can leverage the approach. In this work we report on a multi-level parallelisation of the underlying computational machinery and show that this can speed up the code by several orders of magnitude. This in turn can extend the applicability of the method to otherwise inaccessible cases. We exploit three levels of parallelisation: Distributed computing by means of Message Passing (MPI), shared-memory computing using OpenMP, and vectorisation by means of SIMD units (single-instruction-multiple-data). Results are provided for two distinct High Performance Computing (HPC) platforms, namely the IBM-based BlueGene/Q system JUQUEEN and an Intel Sandy-Bridge-based development cluster. We discuss how certain issues and obstacles were overcome in the course of adapting the code. Most importantly, we conclude that this vast improvement can actually be accomplished by introducing only moderate changes to the code, such that this strategy may serve as a guideline for other researcher to likewise improve the efficiency of their codes.

  7. Multi-phase SPH modelling of violent hydrodynamics on GPUs

    NASA Astrophysics Data System (ADS)

    Mokos, Athanasios; Rogers, Benedict D.; Stansby, Peter K.; Domínguez, José M.

    2015-11-01

    This paper presents the acceleration of multi-phase smoothed particle hydrodynamics (SPH) using a graphics processing unit (GPU) enabling large numbers of particles (10-20 million) to be simulated on just a single GPU card. With novel hardware architectures such as a GPU, the optimum approach to implement a multi-phase scheme presents some new challenges. Many more particles must be included in the calculation and there are very different speeds of sound in each phase with the largest speed of sound determining the time step. This requires efficient computation. To take full advantage of the hardware acceleration provided by a single GPU for a multi-phase simulation, four different algorithms are investigated: conditional statements, binary operators, separate particle lists and an intermediate global function. Runtime results show that the optimum approach needs to employ separate cell and neighbour lists for each phase. The profiler shows that this approach leads to a reduction in both memory transactions and arithmetic operations giving significant runtime gains. The four different algorithms are compared to the efficiency of the optimised single-phase GPU code, DualSPHysics, for 2-D and 3-D simulations which indicate that the multi-phase functionality has a significant computational overhead. A comparison with an optimised CPU code shows a speed up of an order of magnitude over an OpenMP simulation with 8 threads and two orders of magnitude over a single thread simulation. A demonstration of the multi-phase SPH GPU code is provided by a 3-D dam break case impacting an obstacle. This shows better agreement with experimental results than an equivalent single-phase code. The multi-phase GPU code enables a convergence study to be undertaken on a single GPU with a large number of particles that otherwise would have required large high performance computing resources.

  8. Acceleration of Semiempirical QM/MM Methods through Message Passage Interface (MPI), Hybrid MPI/Open Multiprocessing, and Self-Consistent Field Accelerator Implementations.

    PubMed

    Ojeda-May, Pedro; Nam, Kwangho

    2017-08-08

    The strategy and implementation of scalable and efficient semiempirical (SE) QM/MM methods in CHARMM are described. The serial version of the code was first profiled to identify routines that required parallelization. Afterward, the code was parallelized and accelerated with three approaches. The first approach was the parallelization of the entire QM/MM routines, including the Fock matrix diagonalization routines, using the CHARMM message passage interface (MPI) machinery. In the second approach, two different self-consistent field (SCF) energy convergence accelerators were implemented using density and Fock matrices as targets for their extrapolations in the SCF procedure. In the third approach, the entire QM/MM and MM energy routines were accelerated by implementing the hybrid MPI/open multiprocessing (OpenMP) model in which both the task- and loop-level parallelization strategies were adopted to balance loads between different OpenMP threads. The present implementation was tested on two solvated enzyme systems (including <100 QM atoms) and an S N 2 symmetric reaction in water. The MPI version exceeded existing SE QM methods in CHARMM, which include the SCC-DFTB and SQUANTUM methods, by at least 4-fold. The use of SCF convergence accelerators further accelerated the code by ∼12-35% depending on the size of the QM region and the number of CPU cores used. Although the MPI version displayed good scalability, the performance was diminished for large numbers of MPI processes due to the overhead associated with MPI communications between nodes. This issue was partially overcome by the hybrid MPI/OpenMP approach which displayed a better scalability for a larger number of CPU cores (up to 64 CPUs in the tested systems).

  9. Project APhiD: A Lorenz-gauged A-Φ decomposition for parallelized computation of ultra-broadband electromagnetic induction in a fully heterogeneous Earth

    NASA Astrophysics Data System (ADS)

    Weiss, Chester J.

    2013-08-01

    An essential element for computational hypothesis testing, data inversion and experiment design for electromagnetic geophysics is a robust forward solver, capable of easily and quickly evaluating the electromagnetic response of arbitrary geologic structure. The usefulness of such a solver hinges on the balance among competing desires like ease of use, speed of forward calculation, scalability to large problems or compute clusters, parsimonious use of memory access, accuracy and by necessity, the ability to faithfully accommodate a broad range of geologic scenarios over extremes in length scale and frequency content. This is indeed a tall order. The present study addresses recent progress toward the development of a forward solver with these properties. Based on the Lorenz-gauged Helmholtz decomposition, a new finite volume solution over Cartesian model domains endowed with complex-valued electrical properties is shown to be stable over the frequency range 10-2-1010 Hz and range 10-3-105 m in length scale. Benchmark examples are drawn from magnetotellurics, exploration geophysics, geotechnical mapping and laboratory-scale analysis, showing excellent agreement with reference analytic solutions. Computational efficiency is achieved through use of a matrix-free implementation of the quasi-minimum-residual (QMR) iterative solver, which eliminates explicit storage of finite volume matrix elements in favor of "on the fly" computation as needed by the iterative Krylov sequence. Further efficiency is achieved through sparse coupling matrices between the vector and scalar potentials whose non-zero elements arise only in those parts of the model domain where the conductivity gradient is non-zero. Multi-thread parallelization in the QMR solver through OpenMP pragmas is used to reduce the computational cost of its most expensive step: the single matrix-vector product at each iteration. High-level MPI communicators farm independent processes to available compute nodes for simultaneous computation of multi-frequency or multi-transmitter responses.

  10. Petascale computation performance of lightweight multiscale cardiac models using hybrid programming models.

    PubMed

    Pope, Bernard J; Fitch, Blake G; Pitman, Michael C; Rice, John J; Reumann, Matthias

    2011-01-01

    Future multiscale and multiphysics models must use the power of high performance computing (HPC) systems to enable research into human disease, translational medical science, and treatment. Previously we showed that computationally efficient multiscale models will require the use of sophisticated hybrid programming models, mixing distributed message passing processes (e.g. the message passing interface (MPI)) with multithreading (e.g. OpenMP, POSIX pthreads). The objective of this work is to compare the performance of such hybrid programming models when applied to the simulation of a lightweight multiscale cardiac model. Our results show that the hybrid models do not perform favourably when compared to an implementation using only MPI which is in contrast to our results using complex physiological models. Thus, with regards to lightweight multiscale cardiac models, the user may not need to increase programming complexity by using a hybrid programming approach. However, considering that model complexity will increase as well as the HPC system size in both node count and number of cores per node, it is still foreseeable that we will achieve faster than real time multiscale cardiac simulations on these systems using hybrid programming models.

  11. Teuchos C++ memory management classes, idioms, and related topics, the complete reference : a comprehensive strategy for safe and efficient memory management in C++ for high performance computing.

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Bartlett, Roscoe Ainsworth

    2010-05-01

    The ubiquitous use of raw pointers in higher-level code is the primary cause of all memory usage problems and memory leaks in C++ programs. This paper describes what might be considered a radical approach to the problem which is to encapsulate the use of all raw pointers and all raw calls to new and delete in higher-level C++ code. Instead, a set of cooperating template classes developed in the Trilinos package Teuchos are used to encapsulate every use of raw C++ pointers in every use case where it appears in high-level code. Included in the set of memory management classesmore » is the typical reference-counted smart pointer class similar to boost::shared ptr (and therefore C++0x std::shared ptr). However, what is missing in boost and the new standard library are non-reference counted classes for remaining use cases where raw C++ pointers would need to be used. These classes have a debug build mode where nearly all programmer errors are caught and gracefully reported at runtime. The default optimized build mode strips all runtime checks and allows the code to perform as efficiently as raw C++ pointers with reasonable usage. Also included is a novel approach for dealing with the circular references problem that imparts little extra overhead and is almost completely invisible to most of the code (unlike the boost and therefore C++0x approach). Rather than being a radical approach, encapsulating all raw C++ pointers is simply the logical progression of a trend in the C++ development and standards community that started with std::auto ptr and is continued (but not finished) with std::shared ptr in C++0x. Using the Teuchos reference-counted memory management classes allows one to remove unnecessary constraints in the use of objects by removing arbitrary lifetime ordering constraints which are a type of unnecessary coupling [23]. The code one writes with these classes will be more likely to be correct on first writing, will be less likely to contain silent (but deadly) memory usage errors, and will be much more robust to later refactoring and maintenance. The level of debug-mode runtime checking provided by the Teuchos memory management classes is stronger in many respects than what is provided by memory checking tools like Valgrind and Purify while being much less expensive. However, tools like Valgrind and Purify perform a number of types of checks (like usage of uninitialized memory) that makes these tools very valuable and therefore complement the Teuchos memory management debug-mode runtime checking. The Teuchos memory management classes and idioms largely address the technical issues in resolving the fragile built-in C++ memory management model (with the exception of circular references which has no easy solution but can be managed as discussed). All that remains is to teach these classes and idioms and expand their usage in C++ codes. The long-term viability of C++ as a usable and productive language depends on it. Otherwise, if C++ is no safer than C, then is the greater complexity of C++ worth what one gets as extra features? Given that C is smaller and easier to learn than C++ and since most programmers don't know object-orientation (or templates or X, Y, and Z features of C++) all that well anyway, then what really are most programmers getting extra out of C++ that would outweigh the extra complexity of C++ over C? C++ zealots will argue this point but the reality is that C++ popularity has peaked and is becoming less popular while the popularity of C has remained fairly stable over the last decade22. Idioms like are advocated in this paper can help to avert this trend but it will require wide community buy-in and a change in the way C++ is taught in order to have the greatest impact. To make these programs more secure, compiler vendors or static analysis tools (e.g. klocwork23) could implement a preprocessor-like language similar to OpenMP24 that would allow the programmer to declare (in comments) that certain blocks of code should be ''pointer-free'' or allow smaller blocks to be 'pointers allowed'. This would significantly improve the robustness of code that uses the memory management classes described here.« less

  12. FY17 CSSE L2 Milestone Report: Analyzing Power Usage Characteristics of Workloads Running on Trinity.

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Pedretti, Kevin

    This report summarizes the work performed as part of a FY17 CSSE L2 milestone to in- vestigate the power usage behavior of ASC workloads running on the ATS-1 Trinity plat- form. Techniques were developed to instrument application code regions of interest using the Power API together with the Kokkos profiling interface and Caliper annotation library. Experiments were performed to understand the power usage behavior of mini-applications and the SNL/ATDM SPARC application running on ATS-1 Trinity Haswell and Knights Landing compute nodes. A taxonomy of power measurement approaches was identified and presented, providing a guide for application developers to follow. Controlledmore » scaling study experiments were performed on up to 2048 nodes of Trinity along with smaller scale ex- periments on Trinity testbed systems. Additionally, power and energy system monitoring information from Trinity was collected and archived for post analysis of "in-the-wild" work- loads. Results were analyzed to assess the sensitivity of the workloads to ATS-1 compute node type (Haswell vs. Knights Landing), CPU frequency control, node-level power capping control, OpenMP configuration, Knights Landing on-package memory configuration, and algorithm/solver configuration. Overall, this milestone lays groundwork for addressing the long-term goal of determining how to best use and operate future ASC platforms to achieve the greatest benefit subject to a constrained power budget.« less

  13. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Trędak, Przemysław, E-mail: przemyslaw.tredak@fuw.edu.pl; Rudnicki, Witold R.; Interdisciplinary Centre for Mathematical and Computational Modelling, University of Warsaw, ul. Pawińskiego 5a, 02-106 Warsaw

    The second generation Reactive Bond Order (REBO) empirical potential is commonly used to accurately model a wide range hydrocarbon materials. It is also extensible to other atom types and interactions. REBO potential assumes complex multi-body interaction model, that is difficult to represent efficiently in the SIMD or SIMT programming model. Hence, despite its importance, no efficient GPGPU implementation has been developed for this potential. Here we present a detailed description of a highly efficient GPGPU implementation of molecular dynamics algorithm using REBO potential. The presented algorithm takes advantage of rarely used properties of the SIMT architecture of a modern GPUmore » to solve difficult synchronizations issues that arise in computations of multi-body potential. Techniques developed for this problem may be also used to achieve efficient solutions of different problems. The performance of proposed algorithm is assessed using a range of model systems. It is compared to highly optimized CPU implementation (both single core and OpenMP) available in LAMMPS package. These experiments show up to 6x improvement in forces computation time using single processor of the NVIDIA Tesla K80 compared to high end 16-core Intel Xeon processor.« less

  14. Scaling Up Coordinate Descent Algorithms for Large ℓ1 Regularization Problems

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Scherrer, Chad; Halappanavar, Mahantesh; Tewari, Ambuj

    2012-07-03

    We present a generic framework for parallel coordinate descent (CD) algorithms that has as special cases the original sequential algorithms of Cyclic CD and Stochastic CD, as well as the recent parallel Shotgun algorithm of Bradley et al. We introduce two novel parallel algorithms that are also special cases---Thread-Greedy CD and Coloring-Based CD---and give performance measurements for an OpenMP implementation of these.

  15. GAMERA - The New Magnetospheric Code

    NASA Astrophysics Data System (ADS)

    Lyon, J.; Sorathia, K.; Zhang, B.; Merkin, V. G.; Wiltberger, M. J.; Daldorff, L. K. S.

    2017-12-01

    The Lyon-Fedder-Mobarry (LFM) code has been a main-line magnetospheric simulation code for 30 years. The code base, designed in the age of memory to memory vector ma- chines,is still in wide use for science production but needs upgrading to ensure the long term sustainability. In this presentation, we will discuss our recent efforts to update and improve that code base and also highlight some recent results. The new project GAM- ERA, Grid Agnostic MHD for Extended Research Applications, has kept the original design characteristics of the LFM and made significant improvements. The original de- sign included high order numerical differencing with very aggressive limiting, the ability to use arbitrary, but logically rectangular, grids, and maintenance of div B = 0 through the use of the Yee grid. Significant improvements include high-order upwinding and a non-clipping limiter. One other improvement with wider applicability is an im- proved averaging technique for the singularities in polar and spherical grids. The new code adopts a hybrid structure - multi-threaded OpenMP with an overarching MPI layer for large scale and coupled applications. The MPI layer uses a combination of standard MPI and the Global Array Toolkit from PNL to provide a lightweight mechanism for coupling codes together concurrently. The single processor code is highly efficient and can run magnetospheric simulations at the default CCMC resolution faster than real time on a MacBook pro. We have run the new code through the Athena suite of tests, and the results compare favorably with the codes available to the astrophysics community. LFM/GAMERA has been applied to many different situations ranging from the inner and outer heliosphere and magnetospheres of Venus, the Earth, Jupiter and Saturn. We present example results the Earth's magnetosphere including a coupled ring current (RCM), the magnetospheres of Jupiter and Saturn, and the inner heliosphere.

  16. Neptune: An astrophysical smooth particle hydrodynamics code for massively parallel computer architectures

    NASA Astrophysics Data System (ADS)

    Sandalski, Stou

    Smooth particle hydrodynamics is an efficient method for modeling the dynamics of fluids. It is commonly used to simulate astrophysical processes such as binary mergers. We present a newly developed GPU accelerated smooth particle hydrodynamics code for astrophysical simulations. The code is named neptune after the Roman god of water. It is written in OpenMP parallelized C++ and OpenCL and includes octree based hydrodynamic and gravitational acceleration. The design relies on object-oriented methodologies in order to provide a flexible and modular framework that can be easily extended and modified by the user. Several pre-built scenarios for simulating collisions of polytropes and black-hole accretion are provided. The code is released under the MIT Open Source license and publicly available at http://code.google.com/p/neptune-sph/.

  17. Cellular automata-based modelling and simulation of biofilm structure on multi-core computers.

    PubMed

    Skoneczny, Szymon

    2015-01-01

    The article presents a mathematical model of biofilm growth for aerobic biodegradation of a toxic carbonaceous substrate. Modelling of biofilm growth has fundamental significance in numerous processes of biotechnology and mathematical modelling of bioreactors. The process following double-substrate kinetics with substrate inhibition proceeding in a biofilm has not been modelled so far by means of cellular automata. Each process in the model proposed, i.e. diffusion of substrates, uptake of substrates, growth and decay of microorganisms and biofilm detachment, is simulated in a discrete manner. It was shown that for flat biofilm of constant thickness, the results of the presented model agree with those of a continuous model. The primary outcome of the study was to propose a mathematical model of biofilm growth; however a considerable amount of focus was also placed on the development of efficient algorithms for its solution. Two parallel algorithms were created, differing in the way computations are distributed. Computer programs were created using OpenMP Application Programming Interface for C++ programming language. Simulations of biofilm growth were performed on three high-performance computers. Speed-up coefficients of computer programs were compared. Both algorithms enabled a significant reduction of computation time. It is important, inter alia, in modelling and simulation of bioreactor dynamics.

  18. Final Report from The University of Texas at Austin for DEGAS: Dynamic Global Address Space programming environments

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Erez, Mattan; Yelick, Katherine; Sarkar, Vivek

    The Dynamic, Exascale Global Address Space programming environment (DEGAS) project will develop the next generation of programming models and runtime systems to meet the challenges of Exascale computing. Our approach is to provide an efficient and scalable programming model that can be adapted to application needs through the use of dynamic runtime features and domain-specific languages for computational kernels. We address the following technical challenges: Programmability: Rich set of programming constructs based on a Hierarchical Partitioned Global Address Space (HPGAS) model, demonstrated in UPC++. Scalability: Hierarchical locality control, lightweight communication (extended GASNet), and ef- ficient synchronization mechanisms (Phasers). Performance Portability:more » Just-in-time specialization (SEJITS) for generating hardware-specific code and scheduling libraries for domain-specific adaptive runtimes (Habanero). Energy Efficiency: Communication-optimal code generation to optimize energy efficiency by re- ducing data movement. Resilience: Containment Domains for flexible, domain-specific resilience, using state capture mechanisms and lightweight, asynchronous recovery mechanisms. Interoperability: Runtime and language interoperability with MPI and OpenMP to encourage broad adoption.« less

  19. VINE-A NUMERICAL CODE FOR SIMULATING ASTROPHYSICAL SYSTEMS USING PARTICLES. II. IMPLEMENTATION AND PERFORMANCE CHARACTERISTICS

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Nelson, Andrew F.; Wetzstein, M.; Naab, T.

    2009-10-01

    We continue our presentation of VINE. In this paper, we begin with a description of relevant architectural properties of the serial and shared memory parallel computers on which VINE is intended to run, and describe their influences on the design of the code itself. We continue with a detailed description of a number of optimizations made to the layout of the particle data in memory and to our implementation of a binary tree used to access that data for use in gravitational force calculations and searches for smoothed particle hydrodynamics (SPH) neighbor particles. We describe the modifications to the codemore » necessary to obtain forces efficiently from special purpose 'GRAPE' hardware, the interfaces required to allow transparent substitution of those forces in the code instead of those obtained from the tree, and the modifications necessary to use both tree and GRAPE together as a fused GRAPE/tree combination. We conclude with an extensive series of performance tests, which demonstrate that the code can be run efficiently and without modification in serial on small workstations or in parallel using the OpenMP compiler directives on large-scale, shared memory parallel machines. We analyze the effects of the code optimizations and estimate that they improve its overall performance by more than an order of magnitude over that obtained by many other tree codes. Scaled parallel performance of the gravity and SPH calculations, together the most costly components of most simulations, is nearly linear up to at least 120 processors on moderate sized test problems using the Origin 3000 architecture, and to the maximum machine sizes available to us on several other architectures. At similar accuracy, performance of VINE, used in GRAPE-tree mode, is approximately a factor 2 slower than that of VINE, used in host-only mode. Further optimizations of the GRAPE/host communications could improve the speed by as much as a factor of 3, but have not yet been implemented in VINE. Finally, we find that although parallel performance on small problems may reach a plateau beyond which more processors bring no additional speedup, performance never decreases, a factor important for running large simulations on many processors with individual time steps, where only a small fraction of the total particles require updates at any given moment.« less

  20. Study on Development of 1D-2D Coupled Real-time Urban Inundation Prediction model

    NASA Astrophysics Data System (ADS)

    Lee, Seungsoo

    2017-04-01

    In recent years, we are suffering abnormal weather condition due to climate change around the world. Therefore, countermeasures for flood defense are urgent task. In this research, study on development of 1D-2D coupled real-time urban inundation prediction model using predicted precipitation data based on remote sensing technology is conducted. 1 dimensional (1D) sewerage system analysis model which was introduced by Lee et al. (2015) is used to simulate inlet and overflow phenomena by interacting with surface flown as well as flows in conduits. 2 dimensional (2D) grid mesh refinement method is applied to depict road networks for effective calculation time. 2D surface model is coupled with 1D sewerage analysis model in order to consider bi-directional flow between both. Also parallel computing method, OpenMP, is applied to reduce calculation time. The model is estimated by applying to 25 August 2014 extreme rainfall event which caused severe inundation damages in Busan, Korea. Oncheoncheon basin is selected for study basin and observed radar data are assumed as predicted rainfall data. The model shows acceptable calculation speed with accuracy. Therefore it is expected that the model can be used for real-time urban inundation forecasting system to minimize damages.

  1. From Physics Model to Results: An Optimizing Framework for Cross-Architecture Code Generation

    DOE PAGES

    Blazewicz, Marek; Hinder, Ian; Koppelman, David M.; ...

    2013-01-01

    Starting from a high-level problem description in terms of partial differential equations using abstract tensor notation, the Chemora framework discretizes, optimizes, and generates complete high performance codes for a wide range of compute architectures. Chemora extends the capabilities of Cactus, facilitating the usage of large-scale CPU/GPU systems in an efficient manner for complex applications, without low-level code tuning. Chemora achieves parallelism through MPI and multi-threading, combining OpenMP and CUDA. Optimizations include high-level code transformations, efficient loop traversal strategies, dynamically selected data and instruction cache usage strategies, and JIT compilation of GPU code tailored to the problem characteristics. The discretization ismore » based on higher-order finite differences on multi-block domains. Chemora's capabilities are demonstrated by simulations of black hole collisions. This problem provides an acid test of the framework, as the Einstein equations contain hundreds of variables and thousands of terms.« less

  2. Parallel transformation of K-SVD solar image denoising algorithm

    NASA Astrophysics Data System (ADS)

    Liang, Youwen; Tian, Yu; Li, Mei

    2017-02-01

    The images obtained by observing the sun through a large telescope always suffered with noise due to the low SNR. K-SVD denoising algorithm can effectively remove Gauss white noise. Training dictionaries for sparse representations is a time consuming task, due to the large size of the data involved and to the complexity of the training algorithms. In this paper, an OpenMP parallel programming language is proposed to transform the serial algorithm to the parallel version. Data parallelism model is used to transform the algorithm. Not one atom but multiple atoms updated simultaneously is the biggest change. The denoising effect and acceleration performance are tested after completion of the parallel algorithm. Speedup of the program is 13.563 in condition of using 16 cores. This parallel version can fully utilize the multi-core CPU hardware resources, greatly reduce running time and easily to transplant in multi-core platform.

  3. GPU accelerated implementation of NCI calculations using promolecular density.

    PubMed

    Rubez, Gaëtan; Etancelin, Jean-Matthieu; Vigouroux, Xavier; Krajecki, Michael; Boisson, Jean-Charles; Hénon, Eric

    2017-05-30

    The NCI approach is a modern tool to reveal chemical noncovalent interactions. It is particularly attractive to describe ligand-protein binding. A custom implementation for NCI using promolecular density is presented. It is designed to leverage the computational power of NVIDIA graphics processing unit (GPU) accelerators through the CUDA programming model. The code performances of three versions are examined on a test set of 144 systems. NCI calculations are particularly well suited to the GPU architecture, which reduces drastically the computational time. On a single compute node, the dual-GPU version leads to a 39-fold improvement for the biggest instance compared to the optimal OpenMP parallel run (C code, icc compiler) with 16 CPU cores. Energy consumption measurements carried out on both CPU and GPU NCI tests show that the GPU approach provides substantial energy savings. © 2017 Wiley Periodicals, Inc. © 2017 Wiley Periodicals, Inc.

  4. Efficient implementation of the many-body Reactive Bond Order (REBO) potential on GPU

    NASA Astrophysics Data System (ADS)

    Trędak, Przemysław; Rudnicki, Witold R.; Majewski, Jacek A.

    2016-09-01

    The second generation Reactive Bond Order (REBO) empirical potential is commonly used to accurately model a wide range hydrocarbon materials. It is also extensible to other atom types and interactions. REBO potential assumes complex multi-body interaction model, that is difficult to represent efficiently in the SIMD or SIMT programming model. Hence, despite its importance, no efficient GPGPU implementation has been developed for this potential. Here we present a detailed description of a highly efficient GPGPU implementation of molecular dynamics algorithm using REBO potential. The presented algorithm takes advantage of rarely used properties of the SIMT architecture of a modern GPU to solve difficult synchronizations issues that arise in computations of multi-body potential. Techniques developed for this problem may be also used to achieve efficient solutions of different problems. The performance of proposed algorithm is assessed using a range of model systems. It is compared to highly optimized CPU implementation (both single core and OpenMP) available in LAMMPS package. These experiments show up to 6x improvement in forces computation time using single processor of the NVIDIA Tesla K80 compared to high end 16-core Intel Xeon processor.

  5. Fast and Accurate Support Vector Machines on Large Scale Systems

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Vishnu, Abhinav; Narasimhan, Jayenthi; Holder, Larry

    Support Vector Machines (SVM) is a supervised Machine Learning and Data Mining (MLDM) algorithm, which has become ubiquitous largely due to its high accuracy and obliviousness to dimensionality. The objective of SVM is to find an optimal boundary --- also known as hyperplane --- which separates the samples (examples in a dataset) of different classes by a maximum margin. Usually, very few samples contribute to the definition of the boundary. However, existing parallel algorithms use the entire dataset for finding the boundary, which is sub-optimal for performance reasons. In this paper, we propose a novel distributed memory algorithm to eliminatemore » the samples which do not contribute to the boundary definition in SVM. We propose several heuristics, which range from early (aggressive) to late (conservative) elimination of the samples, such that the overall time for generating the boundary is reduced considerably. In a few cases, a sample may be eliminated (shrunk) pre-emptively --- potentially resulting in an incorrect boundary. We propose a scalable approach to synchronize the necessary data structures such that the proposed algorithm maintains its accuracy. We consider the necessary trade-offs of single/multiple synchronization using in-depth time-space complexity analysis. We implement the proposed algorithm using MPI and compare it with libsvm--- de facto sequential SVM software --- which we enhance with OpenMP for multi-core/many-core parallelism. Our proposed approach shows excellent efficiency using up to 4096 processes on several large datasets such as UCI HIGGS Boson dataset and Offending URL dataset.« less

  6. PAREMD: A parallel program for the evaluation of momentum space properties of atoms and molecules

    NASA Astrophysics Data System (ADS)

    Meena, Deep Raj; Gadre, Shridhar R.; Balanarayan, P.

    2018-03-01

    The present work describes a code for evaluating the electron momentum density (EMD), its moments and the associated Shannon information entropy for a multi-electron molecular system. The code works specifically for electronic wave functions obtained from traditional electronic structure packages such as GAMESS and GAUSSIAN. For the momentum space orbitals, the general expression for Gaussian basis sets in position space is analytically Fourier transformed to momentum space Gaussian basis functions. The molecular orbital coefficients of the wave function are taken as an input from the output file of the electronic structure calculation. The analytic expressions of EMD are evaluated over a fine grid and the accuracy of the code is verified by a normalization check and a numerical kinetic energy evaluation which is compared with the analytic kinetic energy given by the electronic structure package. Apart from electron momentum density, electron density in position space has also been integrated into this package. The program is written in C++ and is executed through a Shell script. It is also tuned for multicore machines with shared memory through OpenMP. The program has been tested for a variety of molecules and correlated methods such as CISD, Møller-Plesset second order (MP2) theory and density functional methods. For correlated methods, the PAREMD program uses natural spin orbitals as an input. The program has been benchmarked for a variety of Gaussian basis sets for different molecules showing a linear speedup on a parallel architecture.

  7. CCC7-119 Reactive Molecular Dynamics Simulations of Hot Spot Growth in Shocked Energetic Materials

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Thompson, Aidan P.

    2015-03-01

    The purpose of this work is to understand how defects control initiation in energetic materials used in stockpile components; Sequoia gives us the core-count to run very large-scale simulations of up to 10 million atoms and; Using an OpenMP threaded implementation of the ReaxFF package in LAMMPS, we have been able to get good parallel efficiency running on 16k nodes of Sequoia, with 1 hardware thread per core.

  8. OpenMP Parallelization and Optimization of Graph-based Machine Learning Algorithms

    DTIC Science & Technology

    2016-05-01

    composed of hyper - spectral video sequences recording the release of chemical plumes at the Dugway Proving Ground. We use the 329 frames of the...video. Each frame is a hyper - spectral image with dimension 128 × 320 × 129, where 129 is the dimension of the channel of each pixel. The total number of...j=1 . Then we use the nested for- loop to calculate the values of WXY by the formula (1). We then put the corresponding value in an array which

  9. Compiled MPI: Cost-Effective Exascale Applications Development

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Bronevetsky, G; Quinlan, D; Lumsdaine, A

    2012-04-10

    The complexity of petascale and exascale machines makes it increasingly difficult to develop applications that can take advantage of them. Future systems are expected to feature billion-way parallelism, complex heterogeneous compute nodes and poor availability of memory (Peter Kogge, 2008). This new challenge for application development is motivating a significant amount of research and development on new programming models and runtime systems designed to simplify large-scale application development. Unfortunately, DoE has significant multi-decadal investment in a large family of mission-critical scientific applications. Scaling these applications to exascale machines will require a significant investment that will dwarf the costs of hardwaremore » procurement. A key reason for the difficulty in transitioning today's applications to exascale hardware is their reliance on explicit programming techniques, such as the Message Passing Interface (MPI) programming model to enable parallelism. MPI provides a portable and high performance message-passing system that enables scalable performance on a wide variety of platforms. However, it also forces developers to lock the details of parallelization together with application logic, making it very difficult to adapt the application to significant changes in the underlying system. Further, MPI's explicit interface makes it difficult to separate the application's synchronization and communication structure, reducing the amount of support that can be provided by compiler and run-time tools. This is in contrast to the recent research on more implicit parallel programming models such as Chapel, OpenMP and OpenCL, which promise to provide significantly more flexibility at the cost of reimplementing significant portions of the application. We are developing CoMPI, a novel compiler-driven approach to enable existing MPI applications to scale to exascale systems with minimal modifications that can be made incrementally over the application's lifetime. It includes: (1) New set of source code annotations, inserted either manually or automatically, that will clarify the application's use of MPI to the compiler infrastructure, enabling greater accuracy where needed; (2) A compiler transformation framework that leverages these annotations to transform the original MPI source code to improve its performance and scalability; (3) Novel MPI runtime implementation techniques that will provide a rich set of functionality extensions to be used by applications that have been transformed by our compiler; and (4) A novel compiler analysis that leverages simple user annotations to automatically extract the application's communication structure and synthesize most complex code annotations.« less

  10. Optimization of groundwater artificial recharge systems using a genetic algorithm: a case study in Beijing, China

    NASA Astrophysics Data System (ADS)

    Hao, Qichen; Shao, Jingli; Cui, Yali; Zhang, Qiulan; Huang, Linxian

    2018-05-01

    An optimization approach is used for the operation of groundwater artificial recharge systems in an alluvial fan in Beijing, China. The optimization model incorporates a transient groundwater flow model, which allows for simulation of the groundwater response to artificial recharge. The facilities' operation with regard to recharge rates is formulated as a nonlinear programming problem to maximize the volume of surface water recharged into the aquifers under specific constraints. This optimization problem is solved by the parallel genetic algorithm (PGA) based on OpenMP, which could substantially reduce the computation time. To solve the PGA with constraints, the multiplicative penalty method is applied. In addition, the facilities' locations are implicitly determined on the basis of the results of the recharge-rate optimizations. Two scenarios are optimized and the optimal results indicate that the amount of water recharged into the aquifers will increase without exceeding the upper limits of the groundwater levels. Optimal operation of this artificial recharge system can also contribute to the more effective recovery of the groundwater storage capacity.

  11. Parallel Execution of Functional Mock-up Units in Buildings Modeling

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Ozmen, Ozgur; Nutaro, James J.; New, Joshua Ryan

    2016-06-30

    A Functional Mock-up Interface (FMI) defines a standardized interface to be used in computer simulations to develop complex cyber-physical systems. FMI implementation by a software modeling tool enables the creation of a simulation model that can be interconnected, or the creation of a software library called a Functional Mock-up Unit (FMU). This report describes an FMU wrapper implementation that imports FMUs into a C++ environment and uses an Euler solver that executes FMUs in parallel using Open Multi-Processing (OpenMP). The purpose of this report is to elucidate the runtime performance of the solver when a multi-component system is imported asmore » a single FMU (for the whole system) or as multiple FMUs (for different groups of components as sub-systems). This performance comparison is conducted using two test cases: (1) a simple, multi-tank problem; and (2) a more realistic use case based on the Modelica Buildings Library. In both test cases, the performance gains are promising when each FMU consists of a large number of states and state events that are wrapped in a single FMU. Load balancing is demonstrated to be a critical factor in speeding up parallel execution of multiple FMUs.« less

  12. PDF modeling of turbulent flows on unstructured grids

    NASA Astrophysics Data System (ADS)

    Bakosi, Jozsef

    In probability density function (PDF) methods of turbulent flows, the joint PDF of several flow variables is computed by numerically integrating a system of stochastic differential equations for Lagrangian particles. Because the technique solves a transport equation for the PDF of the velocity and scalars, a mathematically exact treatment of advection, viscous effects and arbitrarily complex chemical reactions is possible; these processes are treated without closure assumptions. A set of algorithms is proposed to provide an efficient solution of the PDF transport equation modeling the joint PDF of turbulent velocity, frequency and concentration of a passive scalar in geometrically complex configurations. An unstructured Eulerian grid is employed to extract Eulerian statistics, to solve for quantities represented at fixed locations of the domain and to track particles. All three aspects regarding the grid make use of the finite element method. Compared to hybrid methods, the current methodology is stand-alone, therefore it is consistent both numerically and at the level of turbulence closure without the use of consistency conditions. Since both the turbulent velocity and scalar concentration fields are represented in a stochastic way, the method allows for a direct and close interaction between these fields, which is beneficial in computing accurate scalar statistics. Boundary conditions implemented along solid bodies are of the free-slip and no-slip type without the need for ghost elements. Boundary layers at no-slip boundaries are either fully resolved down to the viscous sublayer, explicitly modeling the high anisotropy and inhomogeneity of the low-Reynolds-number wall region without damping or wall-functions or specified via logarithmic wall-functions. As in moment closures and large eddy simulation, these wall-treatments provide the usual trade-off between resolution and computational cost as required by the given application. Particular attention is focused on modeling the dispersion of passive scalars in inhomogeneous turbulent flows. Two different micromixing models are investigated that incorporate the effect of small scale mixing on the transported scalar: the widely used interaction by exchange with the mean and the interaction by exchange with the conditional mean model. An adaptive algorithm to compute the velocity-conditioned scalar mean is proposed that homogenizes the statistical error over the sample space with no assumption on the shape of the underlying velocity PDF. The development also concentrates on a generally applicable micromixing timescale for complex flow domains. Several newly developed algorithms are described in detail that facilitate a stable numerical solution in arbitrarily complex flow geometries, including a stabilized mean-pressure projection scheme, the estimation of conditional and unconditional Eulerian statistics and their derivatives from stochastic particle fields employing finite element shapefunctions, particle tracking through unstructured grids, an efficient particle redistribution procedure and techniques related to efficient random number generation. The algorithm is validated and tested by computing three different turbulent flows: the fully developed turbulent channel flow, a street canyon (or cavity) flow and the turbulent wake behind a circular cylinder at a sub-critical Reynolds number. The solver has been parallelized and optimized for shared memory and multi-core architectures using the OpenMP standard. Relevant aspects of performance and parallelism on cache-based shared memory machines are discussed and presented in detail. The methodology shows great promise in the simulation of high-Reynolds-number incompressible inert or reactive turbulent flows in realistic configurations.

  13. Implementing the PM Programming Language using MPI and OpenMP - a New Tool for Programming Geophysical Models on Parallel Systems

    NASA Astrophysics Data System (ADS)

    Bellerby, Tim

    2015-04-01

    PM (Parallel Models) is a new parallel programming language specifically designed for writing environmental and geophysical models. The language is intended to enable implementers to concentrate on the science behind the model rather than the details of running on parallel hardware. At the same time PM leaves the programmer in control - all parallelisation is explicit and the parallel structure of any given program may be deduced directly from the code. This paper describes a PM implementation based on the Message Passing Interface (MPI) and Open Multi-Processing (OpenMP) standards, looking at issues involved with translating the PM parallelisation model to MPI/OpenMP protocols and considering performance in terms of the competing factors of finer-grained parallelisation and increased communication overhead. In order to maximise portability, the implementation stays within the MPI 1.3 standard as much as possible, with MPI-2 MPI-IO file handling the only significant exception. Moreover, it does not assume a thread-safe implementation of MPI. PM adopts a two-tier abstract representation of parallel hardware. A PM processor is a conceptual unit capable of efficiently executing a set of language tasks, with a complete parallel system consisting of an abstract N-dimensional array of such processors. PM processors may map to single cores executing tasks using cooperative multi-tasking, to multiple cores or even to separate processing nodes, efficiently sharing tasks using algorithms such as work stealing. While tasks may move between hardware elements within a PM processor, they may not move between processors without specific programmer intervention. Tasks are assigned to processors using a nested parallelism approach, building on ideas from Reyes et al. (2009). The main program owns all available processors. When the program enters a parallel statement then either processors are divided out among the newly generated tasks (number of new tasks < number of processors) or tasks are divided out among the available processors (number of tasks > number of processors). Nested parallel statements may further subdivide the processor set owned by a given task. Tasks or processors are distributed evenly by default, but uneven distributions are possible under programmer control. It is also possible to explicitly enable child tasks to migrate within the processor set owned by their parent task, reducing load unbalancing at the potential cost of increased inter-processor message traffic. PM incorporates some programming structures from the earlier MIST language presented at a previous EGU General Assembly, while adopting a significantly different underlying parallelisation model and type system. PM code is available at www.pm-lang.org under an unrestrictive MIT license. Reference Ruymán Reyes, Antonio J. Dorta, Francisco Almeida, Francisco de Sande, 2009. Automatic Hybrid MPI+OpenMP Code Generation with llc, Recent Advances in Parallel Virtual Machine and Message Passing Interface, Lecture Notes in Computer Science Volume 5759, 185-195

  14. Parallel 3D Mortar Element Method for Adaptive Nonconforming Meshes

    NASA Technical Reports Server (NTRS)

    Feng, Huiyu; Mavriplis, Catherine; VanderWijngaart, Rob; Biswas, Rupak

    2004-01-01

    High order methods are frequently used in computational simulation for their high accuracy. An efficient way to avoid unnecessary computation in smooth regions of the solution is to use adaptive meshes which employ fine grids only in areas where they are needed. Nonconforming spectral elements allow the grid to be flexibly adjusted to satisfy the computational accuracy requirements. The method is suitable for computational simulations of unsteady problems with very disparate length scales or unsteady moving features, such as heat transfer, fluid dynamics or flame combustion. In this work, we select the Mark Element Method (MEM) to handle the non-conforming interfaces between elements. A new technique is introduced to efficiently implement MEM in 3-D nonconforming meshes. By introducing an "intermediate mortar", the proposed method decomposes the projection between 3-D elements and mortars into two steps. In each step, projection matrices derived in 2-D are used. The two-step method avoids explicitly forming/deriving large projection matrices for 3-D meshes, and also helps to simplify the implementation. This new technique can be used for both h- and p-type adaptation. This method is applied to an unsteady 3-D moving heat source problem. With our new MEM implementation, mesh adaptation is able to efficiently refine the grid near the heat source and coarsen the grid once the heat source passes. The savings in computational work resulting from the dynamic mesh adaptation is demonstrated by the reduction of the the number of elements used and CPU time spent. MEM and mesh adaptation, respectively, bring irregularity and dynamics to the computer memory access pattern. Hence, they provide a good way to gauge the performance of computer systems when running scientific applications whose memory access patterns are irregular and unpredictable. We select a 3-D moving heat source problem as the Unstructured Adaptive (UA) grid benchmark, a new component of the NAS Parallel Benchmarks (NPB). In this paper, we present some interesting performance results of ow OpenMP parallel implementation on different architectures such as the SGI Origin2000, SGI Altix, and Cray MTA-2.

  15. Hybrid MPI-OpenMP Parallelism in the ONETEP Linear-Scaling Electronic Structure Code: Application to the Delamination of Cellulose Nanofibrils.

    PubMed

    Wilkinson, Karl A; Hine, Nicholas D M; Skylaris, Chris-Kriton

    2014-11-11

    We present a hybrid MPI-OpenMP implementation of Linear-Scaling Density Functional Theory within the ONETEP code. We illustrate its performance on a range of high performance computing (HPC) platforms comprising shared-memory nodes with fast interconnect. Our work has focused on applying OpenMP parallelism to the routines which dominate the computational load, attempting where possible to parallelize different loops from those already parallelized within MPI. This includes 3D FFT box operations, sparse matrix algebra operations, calculation of integrals, and Ewald summation. While the underlying numerical methods are unchanged, these developments represent significant changes to the algorithms used within ONETEP to distribute the workload across CPU cores. The new hybrid code exhibits much-improved strong scaling relative to the MPI-only code and permits calculations with a much higher ratio of cores to atoms. These developments result in a significantly shorter time to solution than was possible using MPI alone and facilitate the application of the ONETEP code to systems larger than previously feasible. We illustrate this with benchmark calculations from an amyloid fibril trimer containing 41,907 atoms. We use the code to study the mechanism of delamination of cellulose nanofibrils when undergoing sonification, a process which is controlled by a large number of interactions that collectively determine the structural properties of the fibrils. Many energy evaluations were needed for these simulations, and as these systems comprise up to 21,276 atoms this would not have been feasible without the developments described here.

  16. Tycho 2: A Proxy Application for Kinetic Transport Sweeps

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Garrett, Charles Kristopher; Warsa, James S.

    2016-09-14

    Tycho 2 is a proxy application that implements discrete ordinates (SN) kinetic transport sweeps on unstructured, 3D, tetrahedral meshes. It has been designed to be small and require minimal dependencies to make collaboration and experimentation as easy as possible. Tycho 2 has been released as open source software. The software is currently in a beta release with plans for a stable release (version 1.0) before the end of the year. The code is parallelized via MPI across spatial cells and OpenMP across angles. Currently, several parallelization algorithms are implemented.

  17. OFF, Open source Finite volume Fluid dynamics code: A free, high-order solver based on parallel, modular, object-oriented Fortran API

    NASA Astrophysics Data System (ADS)

    Zaghi, S.

    2014-07-01

    OFF, an open source (free software) code for performing fluid dynamics simulations, is presented. The aim of OFF is to solve, numerically, the unsteady (and steady) compressible Navier-Stokes equations of fluid dynamics by means of finite volume techniques: the research background is mainly focused on high-order (WENO) schemes for multi-fluids, multi-phase flows over complex geometries. To this purpose a highly modular, object-oriented application program interface (API) has been developed. In particular, the concepts of data encapsulation and inheritance available within Fortran language (from standard 2003) have been stressed in order to represent each fluid dynamics "entity" (e.g. the conservative variables of a finite volume, its geometry, etc…) by a single object so that a large variety of computational libraries can be easily (and efficiently) developed upon these objects. The main features of OFF can be summarized as follows: Programming LanguageOFF is written in standard (compliant) Fortran 2003; its design is highly modular in order to enhance simplicity of use and maintenance without compromising the efficiency; Parallel Frameworks Supported the development of OFF has been also targeted to maximize the computational efficiency: the code is designed to run on shared-memory multi-cores workstations and distributed-memory clusters of shared-memory nodes (supercomputers); the code's parallelization is based on Open Multiprocessing (OpenMP) and Message Passing Interface (MPI) paradigms; Usability, Maintenance and Enhancement in order to improve the usability, maintenance and enhancement of the code also the documentation has been carefully taken into account; the documentation is built upon comprehensive comments placed directly into the source files (no external documentation files needed): these comments are parsed by means of doxygen free software producing high quality html and latex documentation pages; the distributed versioning system referred as git has been adopted in order to facilitate the collaborative maintenance and improvement of the code; CopyrightsOFF is a free software that anyone can use, copy, distribute, study, change and improve under the GNU Public License version 3. The present paper is a manifesto of OFF code and presents the currently implemented features and ongoing developments. This work is focused on the computational techniques adopted and a detailed description of the main API characteristics is reported. OFF capabilities are demonstrated by means of one and two dimensional examples and a three dimensional real application.

  18. LFRic: Building a new Unified Model

    NASA Astrophysics Data System (ADS)

    Melvin, Thomas; Mullerworth, Steve; Ford, Rupert; Maynard, Chris; Hobson, Mike

    2017-04-01

    The LFRic project, named for Lewis Fry Richardson, aims to develop a replacement for the Met Office Unified Model in order to meet the challenges which will be presented by the next generation of exascale supercomputers. This project, a collaboration between the Met Office, STFC Daresbury and the University of Manchester, builds on the earlier GungHo project to redesign the dynamical core, in partnership with NERC. The new atmospheric model aims to retain the performance of the current ENDGame dynamical core and associated subgrid physics, while also enabling a far greater scalability and flexibility to accommodate future supercomputer architectures. Design of the model revolves around a principle of a 'separation of concerns', whereby the natural science aspects of the code can be developed without worrying about the underlying architecture, while machine dependent optimisations can be carried out at a high level. These principles are put into practice through the development of an autogenerated Parallel Systems software layer (known as the PSy layer) using a domain-specific compiler called PSyclone. The prototype model includes a re-write of the dynamical core using a mixed finite element method, in which different function spaces are used to represent the various fields. It is able to run in parallel with MPI and OpenMP and has been tested on over 200,000 cores. In this talk an overview of the both the natural science and computational science implementations of the model will be presented.

  19. A dual-trace model for visual sensory memory.

    PubMed

    Cappiello, Marcus; Zhang, Weiwei

    2016-11-01

    Visual sensory memory refers to a transient memory lingering briefly after the stimulus offset. Although previous literature suggests that visual sensory memory is supported by a fine-grained trace for continuous representation and a coarse-grained trace of categorical information, simultaneous separation and assessment of these traces can be difficult without a quantitative model. The present study used a continuous estimation procedure to test a novel mathematical model of the dual-trace hypothesis of visual sensory memory according to which visual sensory memory could be modeled as a mixture of 2 von Mises (2VM) distributions differing in standard deviation. When visual sensory memory and working memory (WM) for colors were distinguished using different experimental manipulations in the first 3 experiments, the 2VM model outperformed Zhang and Luck (2008) standard mixture model (SM) representing a mixture of a single memory trace and random guesses, even though SM outperformed 2VM for WM. Experiment 4 generalized 2VM's advantages of fitting visual sensory memory data over SM from color to orientation. Furthermore, a single trace model and 4 other alternative models were ruled out, suggesting the necessity and sufficiency of dual traces for visual sensory memory. Together these results support the dual-trace model of visual sensory memory and provide a preliminary inquiry into the nature of information loss from visual sensory memory to WM. (PsycINFO Database Record (c) 2016 APA, all rights reserved).

  20. Attention, Working Memory, and Long-Term Memory in Multimedia Learning: An Integrated Perspective Based on Process Models of Working Memory

    ERIC Educational Resources Information Center

    Schweppe, Judith; Rummer, Ralf

    2014-01-01

    Cognitive models of multimedia learning such as the Cognitive Theory of Multimedia Learning (Mayer 2009) or the Cognitive Load Theory (Sweller 1999) are based on different cognitive models of working memory (e.g., Baddeley 1986) and long-term memory. The current paper describes a working memory model that has recently gained popularity in basic…

  1. Evaluating the networking characteristics of the Cray XC-40 Intel Knights Landing-based Cori supercomputer at NERSC

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Doerfler, Douglas; Austin, Brian; Cook, Brandon

    There are many potential issues associated with deploying the Intel Xeon Phi™ (code named Knights Landing [KNL]) manycore processor in a large-scale supercomputer. One in particular is the ability to fully utilize the high-speed communications network, given that the serial performance of a Xeon Phi TM core is a fraction of a Xeon®core. In this paper, we take a look at the trade-offs associated with allocating enough cores to fully utilize the Aries high-speed network versus cores dedicated to computation, e.g., the trade-off between MPI and OpenMP. In addition, we evaluate new features of Cray MPI in support of KNL,more » such as internode optimizations. We also evaluate one-sided programming models such as Unified Parallel C. We quantify the impact of the above trade-offs and features using a suite of National Energy Research Scientific Computing Center applications.« less

  2. spMC: an R-package for 3D lithological reconstructions based on spatial Markov chains

    NASA Astrophysics Data System (ADS)

    Sartore, Luca; Fabbri, Paolo; Gaetan, Carlo

    2016-09-01

    The paper presents the spatial Markov Chains (spMC) R-package and a case study of subsoil simulation/prediction located in a plain site of Northeastern Italy. spMC is a quite complete collection of advanced methods for data inspection, besides spMC implements Markov Chain models to estimate experimental transition probabilities of categorical lithological data. Furthermore, simulation methods based on most known prediction methods (as indicator Kriging and CoKriging) were implemented in spMC package. Moreover, other more advanced methods are available for simulations, e.g. path methods and Bayesian procedures, that exploit the maximum entropy. Since the spMC package was developed for intensive geostatistical computations, part of the code is implemented for parallel computations via the OpenMP constructs. A final analysis of this computational efficiency compares the simulation/prediction algorithms by using different numbers of CPU cores, and considering the example data set of the case study included in the package.

  3. A Parallel 2D Numerical Simulation of Tumor Cells Necrosis by Local Hyperthermia

    NASA Astrophysics Data System (ADS)

    Reis, R. F.; Loureiro, F. S.; Lobosco, M.

    2014-03-01

    Hyperthermia has been widely used in cancer treatment to destroy tumors. The main idea of the hyperthermia is to heat a specific region like a tumor so that above a threshold temperature the tumor cells are destroyed. This can be accomplished by many heat supply techniques and the use of magnetic nanoparticles that generate heat when an alternating magnetic field is applied has emerged as a promise technique. In the present paper, the Pennes bioheat transfer equation is adopted to model the thermal tumor ablation in the context of magnetic nanoparticles. Numerical simulations are carried out considering different injection sites for the nanoparticles in an attempt to achieve better hyperthermia conditions. Explicit finite difference method is employed to solve the equations. However, a large amount of computation is required for this purpose. Therefore, this work also presents an initial attempt to improve performance using OpenMP, a parallel programming API. Experimental results were quite encouraging: speedups around 35 were obtained on a 64-core machine.

  4. GPU accelerated particle visualization with Splotch

    NASA Astrophysics Data System (ADS)

    Rivi, M.; Gheller, C.; Dykes, T.; Krokos, M.; Dolag, K.

    2014-07-01

    Splotch is a rendering algorithm for exploration and visual discovery in particle-based datasets coming from astronomical observations or numerical simulations. The strengths of the approach are production of high quality imagery and support for very large-scale datasets through an effective mix of the OpenMP and MPI parallel programming paradigms. This article reports our experiences in re-designing Splotch for exploiting emerging HPC architectures nowadays increasingly populated with GPUs. A performance model is introduced to guide our re-factoring of Splotch. A number of parallelization issues are discussed, in particular relating to race conditions and workload balancing, towards achieving optimal performances. Our implementation was accomplished by using the CUDA programming paradigm. Our strategy is founded on novel schemes achieving optimized data organization and classification of particles. We deploy a reference cosmological simulation to present performance results on acceleration gains and scalability. We finally outline our vision for future work developments including possibilities for further optimizations and exploitation of hybrid systems and emerging accelerators.

  5. Order-memory and association-memory.

    PubMed

    Caplan, Jeremy B

    2015-09-01

    Two highly studied memory functions are memory for associations (items presented in pairs, such as SALT-PEPPER) and memory for order (a list of items whose order matters, such as a telephone number). Order- and association-memory are at the root of many forms of behaviour, from wayfinding, to language, to remembering people's names. Most researchers have investigated memory for order separately from memory for associations. Exceptions to this, associative-chaining models build an ordered list from associations between pairs of items, quite literally understanding association- and order-memory together. Alternatively, positional-coding models have been used to explain order-memory as a completely distinct function from association-memory. Both classes of model have found empirical support and both have faced serious challenges. I argue that models that combine both associative chaining and positional coding are needed. One such hybrid model, which relies on brain-activity rhythms, is promising, but remains to be tested rigourously. I consider two relatively understudied memory behaviours that demand a combination of order- and association-information: memory for the order of items within associations (is it William James or James William?) and judgments of relative order (who left the party earlier, Hermann or William?). Findings from these underexplored procedures are already difficult to reconcile with existing association-memory and order-memory models. Further work with such intermediate experimental paradigms has the potential to provide powerful findings to constrain and guide models into the future, with the aim of explaining a large range of memory functions, encompassing both association- and order-memory. (c) 2015 APA, all rights reserved).

  6. A New Conceptualization of Human Visual Sensory-Memory.

    PubMed

    Öğmen, Haluk; Herzog, Michael H

    2016-01-01

    Memory is an essential component of cognition and disorders of memory have significant individual and societal costs. The Atkinson-Shiffrin "modal model" forms the foundation of our understanding of human memory. It consists of three stores: Sensory Memory (SM), whose visual component is called iconic memory, Short-Term Memory (STM; also called working memory, WM), and Long-Term Memory (LTM). Since its inception, shortcomings of all three components of the modal model have been identified. While the theories of STM and LTM underwent significant modifications to address these shortcomings, models of the iconic memory remained largely unchanged: A high capacity but rapidly decaying store whose contents are encoded in retinotopic coordinates, i.e., according to how the stimulus is projected on the retina. The fundamental shortcoming of iconic memory models is that, because contents are encoded in retinotopic coordinates, the iconic memory cannot hold any useful information under normal viewing conditions when objects or the subject are in motion. Hence, half-century after its formulation, it remains an unresolved problem whether and how the first stage of the modal model serves any useful function and how subsequent stages of the modal model receive inputs from the environment. Here, we propose a new conceptualization of human visual sensory memory by introducing an additional component whose reference-frame consists of motion-grouping based coordinates rather than retinotopic coordinates. We review data supporting this new model and discuss how it offers solutions to the paradoxes of the traditional model of sensory memory.

  7. A simplified computational memory model from information processing.

    PubMed

    Zhang, Lanhua; Zhang, Dongsheng; Deng, Yuqin; Ding, Xiaoqian; Wang, Yan; Tang, Yiyuan; Sun, Baoliang

    2016-11-23

    This paper is intended to propose a computational model for memory from the view of information processing. The model, called simplified memory information retrieval network (SMIRN), is a bi-modular hierarchical functional memory network by abstracting memory function and simulating memory information processing. At first meta-memory is defined to express the neuron or brain cortices based on the biology and graph theories, and we develop an intra-modular network with the modeling algorithm by mapping the node and edge, and then the bi-modular network is delineated with intra-modular and inter-modular. At last a polynomial retrieval algorithm is introduced. In this paper we simulate the memory phenomena and functions of memorization and strengthening by information processing algorithms. The theoretical analysis and the simulation results show that the model is in accordance with the memory phenomena from information processing view.

  8. A New Conceptualization of Human Visual Sensory-Memory

    PubMed Central

    Öğmen, Haluk; Herzog, Michael H.

    2016-01-01

    Memory is an essential component of cognition and disorders of memory have significant individual and societal costs. The Atkinson–Shiffrin “modal model” forms the foundation of our understanding of human memory. It consists of three stores: Sensory Memory (SM), whose visual component is called iconic memory, Short-Term Memory (STM; also called working memory, WM), and Long-Term Memory (LTM). Since its inception, shortcomings of all three components of the modal model have been identified. While the theories of STM and LTM underwent significant modifications to address these shortcomings, models of the iconic memory remained largely unchanged: A high capacity but rapidly decaying store whose contents are encoded in retinotopic coordinates, i.e., according to how the stimulus is projected on the retina. The fundamental shortcoming of iconic memory models is that, because contents are encoded in retinotopic coordinates, the iconic memory cannot hold any useful information under normal viewing conditions when objects or the subject are in motion. Hence, half-century after its formulation, it remains an unresolved problem whether and how the first stage of the modal model serves any useful function and how subsequent stages of the modal model receive inputs from the environment. Here, we propose a new conceptualization of human visual sensory memory by introducing an additional component whose reference-frame consists of motion-grouping based coordinates rather than retinotopic coordinates. We review data supporting this new model and discuss how it offers solutions to the paradoxes of the traditional model of sensory memory. PMID:27375519

  9. A Temporal Ratio Model of Memory

    ERIC Educational Resources Information Center

    Brown, Gordon D. A.; Neath, Ian; Chater, Nick

    2007-01-01

    A model of memory retrieval is described. The model embodies four main claims: (a) temporal memory--traces of items are represented in memory partly in terms of their temporal distance from the present; (b) scale-similarity--similar mechanisms govern retrieval from memory over many different timescales; (c) local distinctiveness--performance on a…

  10. The reduction of adult neurogenesis in depression impairs the retrieval of new as well as remote episodic memory

    PubMed Central

    Fang, Jing; Demic, Selver; Cheng, Sen

    2018-01-01

    Major depressive disorder (MDD) is associated with an impairment of episodic memory, but the mechanisms underlying this deficit remain unclear. Animal models of MDD find impaired adult neurogenesis (AN) in the dentate gyrus (DG), and AN in DG has been suggested to play a critical role in reducing the interference between overlapping memories through pattern separation. Here, we study the effect of reduced AN in MDD on the accuracy of episodic memory using computational modeling. We focus on how memory is affected when periods with a normal rate of AN (asymptomatic states) alternate with periods with a low rate (depressive episodes), which has never been studied before. Also, unlike previous models of adult neurogenesis, which consider memories as static patterns, we model episodic memory as sequences of neural activity patterns. In our model, AN adds additional random components to the memory patterns, which results in the decorrelation of similar patterns. Consistent with previous studies, higher rates of AN lead to higher memory accuracy in our model, which implies that memories stored in the depressive state are impaired. Intriguingly, our model makes the novel prediction that memories stored in an earlier asymptomatic state are also impaired by a later depressive episode. This retrograde effect exacerbates with increased duration of the depressive episode. Finally, pattern separation at the sensory processing stage does not improve, but rather worsens, the accuracy of episodic memory retrieval, suggesting an explanation for why AN is found in brain areas serving memory rather than sensory function. In conclusion, while cognitive retrieval biases might contribute to episodic memory deficits in MDD, our model suggests a mechanistic explanation that affects all episodic memories, regardless of emotional relevance. PMID:29879169

  11. The reduction of adult neurogenesis in depression impairs the retrieval of new as well as remote episodic memory.

    PubMed

    Fang, Jing; Demic, Selver; Cheng, Sen

    2018-01-01

    Major depressive disorder (MDD) is associated with an impairment of episodic memory, but the mechanisms underlying this deficit remain unclear. Animal models of MDD find impaired adult neurogenesis (AN) in the dentate gyrus (DG), and AN in DG has been suggested to play a critical role in reducing the interference between overlapping memories through pattern separation. Here, we study the effect of reduced AN in MDD on the accuracy of episodic memory using computational modeling. We focus on how memory is affected when periods with a normal rate of AN (asymptomatic states) alternate with periods with a low rate (depressive episodes), which has never been studied before. Also, unlike previous models of adult neurogenesis, which consider memories as static patterns, we model episodic memory as sequences of neural activity patterns. In our model, AN adds additional random components to the memory patterns, which results in the decorrelation of similar patterns. Consistent with previous studies, higher rates of AN lead to higher memory accuracy in our model, which implies that memories stored in the depressive state are impaired. Intriguingly, our model makes the novel prediction that memories stored in an earlier asymptomatic state are also impaired by a later depressive episode. This retrograde effect exacerbates with increased duration of the depressive episode. Finally, pattern separation at the sensory processing stage does not improve, but rather worsens, the accuracy of episodic memory retrieval, suggesting an explanation for why AN is found in brain areas serving memory rather than sensory function. In conclusion, while cognitive retrieval biases might contribute to episodic memory deficits in MDD, our model suggests a mechanistic explanation that affects all episodic memories, regardless of emotional relevance.

  12. Graphics Processing Unit Acceleration of Gyrokinetic Turbulence Simulations

    NASA Astrophysics Data System (ADS)

    Hause, Benjamin; Parker, Scott; Chen, Yang

    2013-10-01

    We find a substantial increase in on-node performance using Graphics Processing Unit (GPU) acceleration in gyrokinetic delta-f particle-in-cell simulation. Optimization is performed on a two-dimensional slab gyrokinetic particle simulation using the Portland Group Fortran compiler with the OpenACC compiler directives and Fortran CUDA. Mixed implementation of both Open-ACC and CUDA is demonstrated. CUDA is required for optimizing the particle deposition algorithm. We have implemented the GPU acceleration on a third generation Core I7 gaming PC with two NVIDIA GTX 680 GPUs. We find comparable, or better, acceleration relative to the NERSC DIRAC cluster with the NVIDIA Tesla C2050 computing processor. The Tesla C 2050 is about 2.6 times more expensive than the GTX 580 gaming GPU. We also see enormous speedups (10 or more) on the Titan supercomputer at Oak Ridge with Kepler K20 GPUs. Results show speed-ups comparable or better than that of OpenMP models utilizing multiple cores. The use of hybrid OpenACC, CUDA Fortran, and MPI models across many nodes will also be discussed. Optimization strategies will be presented. We will discuss progress on optimizing the comprehensive three dimensional general geometry GEM code.

  13. A Parallel Vector Machine for the PM Programming Language

    NASA Astrophysics Data System (ADS)

    Bellerby, Tim

    2016-04-01

    PM is a new programming language which aims to make the writing of computational geoscience models on parallel hardware accessible to scientists who are not themselves expert parallel programmers. It is based around the concept of communicating operators: language constructs that enable variables local to a single invocation of a parallelised loop to be viewed as if they were arrays spanning the entire loop domain. This mechanism enables different loop invocations (which may or may not be executing on different processors) to exchange information in a manner that extends the successful Communicating Sequential Processes idiom from single messages to collective communication. Communicating operators avoid the additional synchronisation mechanisms, such as atomic variables, required when programming using the Partitioned Global Address Space (PGAS) paradigm. Using a single loop invocation as the fundamental unit of concurrency enables PM to uniformly represent different levels of parallelism from vector operations through shared memory systems to distributed grids. This paper describes an implementation of PM based on a vectorised virtual machine. On a single processor node, concurrent operations are implemented using masked vector operations. Virtual machine instructions operate on vectors of values and may be unmasked, masked using a Boolean field, or masked using an array of active vector cell locations. Conditional structures (such as if-then-else or while statement implementations) calculate and apply masks to the operations they control. A shift in mask representation from Boolean to location-list occurs when active locations become sufficiently sparse. Parallel loops unfold data structures (or vectors of data structures for nested loops) into vectors of values that may additionally be distributed over multiple computational nodes and then split into micro-threads compatible with the size of the local cache. Inter-node communication is accomplished using standard OpenMP and MPI. Performance analyses of the PM vector machine, demonstrating its scaling properties with respect to domain size and the number of processor nodes will be presented for a range of hardware configurations. The PM software and language definition are being made available under unrestrictive MIT and Creative Commons Attribution licenses respectively: www.pm-lang.org.

  14. Contrasting single and multi-component working-memory systems in dual tasking.

    PubMed

    Nijboer, Menno; Borst, Jelmer; van Rijn, Hedderik; Taatgen, Niels

    2016-05-01

    Working memory can be a major source of interference in dual tasking. However, there is no consensus on whether this interference is the result of a single working memory bottleneck, or of interactions between different working memory components that together form a complete working-memory system. We report a behavioral and an fMRI dataset in which working memory requirements are manipulated during multitasking. We show that a computational cognitive model that assumes a distributed version of working memory accounts for both behavioral and neuroimaging data better than a model that takes a more centralized approach. The model's working memory consists of an attentional focus, declarative memory, and a subvocalized rehearsal mechanism. Thus, the data and model favor an account where working memory interference in dual tasking is the result of interactions between different resources that together form a working-memory system. Copyright © 2016 Elsevier Inc. All rights reserved.

  15. The Development of High-speed Full-function Storm Surge Model and the Case Study of 2013 Typhoon Haiyan

    NASA Astrophysics Data System (ADS)

    Tsai, Y. L.; Wu, T. R.; Lin, C. Y.; Chuang, M. H.; Lin, C. W.

    2016-02-01

    An ideal storm surge operational model should feature as: 1. Large computational domain which covers the complete typhoon life cycle. 2. Supporting both parametric and atmospheric models. 3. Capable of calculating inundation area for risk assessment. 4. Tides are included for accurate inundation simulation. Literature review shows that not many operational models reach the goals for the fast calculation, and most of the models have limited functions. In this paper, a well-developed COMCOT (COrnell Multi-grid Coupled of Tsunami Model) tsunami model is chosen as the kernel to establish a storm surge model which solves the nonlinear shallow water equations on both spherical and Cartesian coordinates directly. The complete evolution of storm surge including large-scale propagation and small-scale offshore run-up can be simulated by nested-grid scheme. The global tide model TPXO 7.2 established by Oregon State University is coupled to provide astronomical boundary conditions. The atmospheric model named WRF (Weather Research and Forecasting Model) is also coupled to provide metrological fields. The high-efficiency thin-film method is adopted to evaluate the storm surge inundation. Our in-house model has been optimized by OpenMp (Open Multi-Processing) with the performance which is 10 times faster than the original version and makes it an early-warning storm surge model. In this study, the thorough simulation of 2013 Typhoon Haiyan is performed. The detailed results will be presented in Oceanic Science Meeting of 2016 in terms of surge propagation and high-resolution inundation areas.

  16. A simplified computational memory model from information processing

    PubMed Central

    Zhang, Lanhua; Zhang, Dongsheng; Deng, Yuqin; Ding, Xiaoqian; Wang, Yan; Tang, Yiyuan; Sun, Baoliang

    2016-01-01

    This paper is intended to propose a computational model for memory from the view of information processing. The model, called simplified memory information retrieval network (SMIRN), is a bi-modular hierarchical functional memory network by abstracting memory function and simulating memory information processing. At first meta-memory is defined to express the neuron or brain cortices based on the biology and graph theories, and we develop an intra-modular network with the modeling algorithm by mapping the node and edge, and then the bi-modular network is delineated with intra-modular and inter-modular. At last a polynomial retrieval algorithm is introduced. In this paper we simulate the memory phenomena and functions of memorization and strengthening by information processing algorithms. The theoretical analysis and the simulation results show that the model is in accordance with the memory phenomena from information processing view. PMID:27876847

  17. DOE Office of Scientific and Technical Information (OSTI.GOV)

    D'Azevedo, Eduardo; Abbott, Stephen; Koskela, Tuomas

    The XGC fusion gyrokinetic code combines state-of-the-art, portable computational and algorithmic technologies to enable complicated multiscale simulations of turbulence and transport dynamics in ITER edge plasma on the largest US open-science computer, the CRAY XK7 Titan, at its maximal heterogeneous capability, which have not been possible before due to a factor of over 10 shortage in the time-to-solution for less than 5 days of wall-clock time for one physics case. Frontier techniques such as nested OpenMP parallelism, adaptive parallel I/O, staging I/O and data reduction using dynamic and asynchronous applications interactions, dynamic repartitioning.

  18. Analysis and optimization of gyrokinetic toroidal simulations on homogenous and heterogenous platforms

    DOE PAGES

    Ibrahim, Khaled Z.; Madduri, Kamesh; Williams, Samuel; ...

    2013-07-18

    The Gyrokinetic Toroidal Code (GTC) uses the particle-in-cell method to efficiently simulate plasma microturbulence. This paper presents novel analysis and optimization techniques to enhance the performance of GTC on large-scale machines. We introduce cell access analysis to better manage locality vs. synchronization tradeoffs on CPU and GPU-based architectures. Finally, our optimized hybrid parallel implementation of GTC uses MPI, OpenMP, and NVIDIA CUDA, achieves up to a 2× speedup over the reference Fortran version on multiple parallel systems, and scales efficiently to tens of thousands of cores.

  19. Effects of Working Memory Capacity and Domain Knowledge on Recall for Grocery Prices.

    PubMed

    Bermingham, Douglas; Gardner, Michael K; Woltz, Dan J

    2016-01-01

    Hambrick and Engle (2002) proposed 3 models of how domain knowledge and working memory capacity may work together to influence episodic memory: a "rich-get-richer" model, a "building blocks" model, and a "compensatory" model. Their results supported the rich-get-richer model, although later work by Hambrick and Oswald (2005) found support for a building blocks model. We investigated the effects of domain knowledge and working memory on recall of studied grocery prices. Working memory was measured with 3 simple span tasks. A contrast of realistic versus fictitious foods in the episodic memory task served as our manipulation of domain knowledge, because participants could not have domain knowledge of fictitious food prices. There was a strong effect for domain knowledge (realistic food-price pairs were easier to remember) and a moderate effect for working memory capacity (higher working memory capacity produced better recall). Furthermore, the interaction between domain knowledge and working memory produced a small but significant interaction in 1 measure of price recall. This supported the compensatory model and stands in contrast to previous research.

  20. Elements of episodic-like memory in animal models.

    PubMed

    Crystal, Jonathon D

    2009-03-01

    Representations of unique events from one's past constitute the content of episodic memories. A number of studies with non-human animals have revealed that animals remember specific episodes from their past (referred to as episodic-like memory). The development of animal models of memory holds enormous potential for gaining insight into the biological bases of human memory. Specifically, given the extensive knowledge of the rodent brain, the development of rodent models of episodic memory would open new opportunities to explore the neuroanatomical, neurochemical, neurophysiological, and molecular mechanisms of memory. Development of such animal models holds enormous potential for studying functional changes in episodic memory in animal models of Alzheimer's disease, amnesia, and other human memory pathologies. This article reviews several approaches that have been used to assess episodic-like memory in animals. The approaches reviewed include the discrimination of what, where, and when in a radial arm maze, dissociation of recollection and familiarity, object recognition, binding, unexpected questions, and anticipation of a reproductive state. The diversity of approaches may promote the development of converging lines of evidence on the difficult problem of assessing episodic-like memory in animals.

  1. Modeling individual differences in working memory performance: a source activation account

    PubMed Central

    Daily, Larry Z.; Lovett, Marsha C.; Reder, Lynne M.

    2008-01-01

    Working memory resources are needed for processing and maintenance of information during cognitive tasks. Many models have been developed to capture the effects of limited working memory resources on performance. However, most of these models do not account for the finding that different individuals show different sensitivities to working memory demands, and none of the models predicts individual subjects' patterns of performance. We propose a computational model that accounts for differences in working memory capacity in terms of a quantity called source activation, which is used to maintain goal-relevant information in an available state. We apply this model to capture the working memory effects of individual subjects at a fine level of detail across two experiments. This, we argue, strengthens the interpretation of source activation as working memory capacity. PMID:19079561

  2. Decoding memory features from hippocampal spiking activities using sparse classification models.

    PubMed

    Dong Song; Hampson, Robert E; Robinson, Brian S; Marmarelis, Vasilis Z; Deadwyler, Sam A; Berger, Theodore W

    2016-08-01

    To understand how memory information is encoded in the hippocampus, we build classification models to decode memory features from hippocampal CA3 and CA1 spatio-temporal patterns of spikes recorded from epilepsy patients performing a memory-dependent delayed match-to-sample task. The classification model consists of a set of B-spline basis functions for extracting memory features from the spike patterns, and a sparse logistic regression classifier for generating binary categorical output of memory features. Results show that classification models can extract significant amount of memory information with respects to types of memory tasks and categories of sample images used in the task, despite the high level of variability in prediction accuracy due to the small sample size. These results support the hypothesis that memories are encoded in the hippocampal activities and have important implication to the development of hippocampal memory prostheses.

  3. Time Constraints and Resource Sharing in Adults' Working Memory Spans

    ERIC Educational Resources Information Center

    Barrouillet, Pierre; Bernardin, Sophie; Camos, Valerie

    2004-01-01

    This article presents a new model that accounts for working memory spans in adults, the time-based resource-sharing model. The model assumes that both components (i.e., processing and maintenance) of the main working memory tasks require attention and that memory traces decay as soon as attention is switched away. Because memory retrievals are…

  4. Categorical Working Memory Representations are used in Delayed Estimation of Continuous Colors

    PubMed Central

    Hardman, Kyle O; Vergauwe, Evie; Ricker, Timothy J

    2016-01-01

    In the last decade, major strides have been made in understanding visual working memory through mathematical modeling of color production responses. In the delayed color estimation task (Wilken & Ma, 2004), participants are given a set of colored squares to remember and a few seconds later asked to reproduce those colors by clicking on a color wheel. The degree of error in these responses is characterized with mathematical models that estimate working memory precision and the proportion of items remembered by participants. A standard mathematical model of color memory assumes that items maintained in memory are remembered through memory for precise details about the particular studied shade of color. We contend that this model is incomplete in its present form because no mechanism is provided for remembering the coarse category of a studied color. In the present work we remedy this omission and present a model of visual working memory that includes both continuous and categorical memory representations. In two experiments we show that our new model outperforms this standard modeling approach, which demonstrates that categorical representations should be accounted for by mathematical models of visual working memory. PMID:27797548

  5. Categorical working memory representations are used in delayed estimation of continuous colors.

    PubMed

    Hardman, Kyle O; Vergauwe, Evie; Ricker, Timothy J

    2017-01-01

    In the last decade, major strides have been made in understanding visual working memory through mathematical modeling of color production responses. In the delayed color estimation task (Wilken & Ma, 2004), participants are given a set of colored squares to remember, and a few seconds later asked to reproduce those colors by clicking on a color wheel. The degree of error in these responses is characterized with mathematical models that estimate working memory precision and the proportion of items remembered by participants. A standard mathematical model of color memory assumes that items maintained in memory are remembered through memory for precise details about the particular studied shade of color. We contend that this model is incomplete in its present form because no mechanism is provided for remembering the coarse category of a studied color. In the present work, we remedy this omission and present a model of visual working memory that includes both continuous and categorical memory representations. In 2 experiments, we show that our new model outperforms this standard modeling approach, which demonstrates that categorical representations should be accounted for by mathematical models of visual working memory. (PsycINFO Database Record (c) 2016 APA, all rights reserved).

  6. Sleep on your memory traces: How sleep effects can be explained by Act-In, a functional memory model.

    PubMed

    Cherdieu, Mélaine; Versace, Rémy; Rey, Amandine E; Vallet, Guillaume T; Mazza, Stéphanie

    2018-06-01

    Numerous studies have explored the effect of sleep on memory. It is well known that a period of sleep, compared to a similar period of wakefulness, protects memories from interference, improves performance, and might also reorganize memory traces in a way that encourages creativity and rule extraction. It is assumed that these benefits come from the reactivation of brain networks, mainly involving the hippocampal structure, as well as from their synchronization with neocortical networks during sleep, thereby underpinning sleep-dependent memory consolidation and reorganization. However, this memory reorganization is difficult to explain within classical memory models. The present paper aims to describe whether the influence of sleep on memory could be explained using a multiple trace memory model that is consistent with the concept of embodied cognition: the Act-In (activation-integration) memory model. We propose an original approach to the results observed in sleep research on the basis of two simple mechanisms, namely activation and integration. Copyright © 2017 Elsevier Ltd. All rights reserved.

  7. Attempting to model dissociations of memory.

    PubMed

    Reber, Paul J.

    2002-05-01

    Kinder and Shanks report simulations aimed at describing a single-system model of the dissociation between declarative and non-declarative memory. This model attempts to capture both Artificial Grammar Learning (AGL) and recognition memory with a single underlying representation. However, the model fails to reflect an essential feature of recognition memory - that it occurs after a single exposure - and the simulations may instead describe a potentially interesting property of over-training non-declarative memory.

  8. Models for Total-Dose Radiation Effects in Non-Volatile Memory

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Campbell, Philip Montgomery; Wix, Steven D.

    The objective of this work is to develop models to predict radiation effects in non- volatile memory: flash memory and ferroelectric RAM. In flash memory experiments have found that the internal high-voltage generators (charge pumps) are the most sensitive to radiation damage. Models are presented for radiation effects in charge pumps that demonstrate the experimental results. Floating gate models are developed for the memory cell in two types of flash memory devices by Intel and Samsung. These models utilize Fowler-Nordheim tunneling and hot electron injection to charge and erase the floating gate. Erase times are calculated from the models andmore » compared with experimental results for different radiation doses. FRAM is less sensitive to radiation than flash memory, but measurements show that above 100 Krad FRAM suffers from a large increase in leakage current. A model for this effect is developed which compares closely with the measurements.« less

  9. Implications of the Declarative/Procedural Model for Improving Second Language Learning: The Role of Memory Enhancement Techniques

    ERIC Educational Resources Information Center

    Ullman, Michael T.; Lovelett, Jarrett T.

    2018-01-01

    The declarative/procedural (DP) model posits that the learning, storage, and use of language critically depend on two learning and memory systems in the brain: declarative memory and procedural memory. Thus, on the basis of independent research on the memory systems, the model can generate specific and often novel predictions for language. Till…

  10. Reliability of Memories Protected by Multibit Error Correction Codes Against MBUs

    NASA Astrophysics Data System (ADS)

    Ming, Zhu; Yi, Xiao Li; Chang, Liu; Wei, Zhang Jian

    2011-02-01

    As technology scales, more and more memory cells can be placed in a die. Therefore, the probability that a single event induces multiple bit upsets (MBUs) in adjacent memory cells gets greater. Generally, multibit error correction codes (MECCs) are effective approaches to mitigate MBUs in memories. In order to evaluate the robustness of protected memories, reliability models have been widely studied nowadays. Instead of irradiation experiments, the models can be used to quickly evaluate the reliability of memories in the early design. To build an accurate model, some situations should be considered. Firstly, when MBUs are presented in memories, the errors induced by several events may overlap each other, which is more frequent than single event upset (SEU) case. Furthermore, radiation experiments show that the probability of MBUs strongly depends on angles of the radiation event. However, reliability models which consider the overlap of multiple bit errors and angles of radiation event have not been proposed in the present literature. In this paper, a more accurate model of memories with MECCs is presented. Both the overlap of multiple bit errors and angles of event are considered in the model, which produces a more precise analysis in the calculation of mean time to failure (MTTF) for memory systems under MBUs. In addition, memories with scrubbing and nonscrubbing are analyzed in the proposed model. Finally, we evaluate the reliability of memories under MBUs in Matlab. The simulation results verify the validity of the proposed model.

  11. Short-Term Memory and Aphasia: From Theory to Treatment.

    PubMed

    Minkina, Irene; Rosenberg, Samantha; Kalinyak-Fliszar, Michelene; Martin, Nadine

    2017-02-01

    This article reviews existing research on the interactions between verbal short-term memory and language processing impairments in aphasia. Theoretical models of short-term memory are reviewed, starting with a model assuming a separation between short-term memory and language, and progressing to models that view verbal short-term memory as a cognitive requirement of language processing. The review highlights a verbal short-term memory model derived from an interactive activation model of word retrieval. This model holds that verbal short-term memory encompasses the temporary activation of linguistic knowledge (e.g., semantic, lexical, and phonological features) during language production and comprehension tasks. Empirical evidence supporting this model, which views short-term memory in the context of the processes it subserves, is outlined. Studies that use a classic measure of verbal short-term memory (i.e., number of words/digits correctly recalled in immediate serial recall) as well as those that use more intricate measures (e.g., serial position effects in immediate serial recall) are discussed. Treatment research that uses verbal short-term memory tasks in an attempt to improve language processing is then summarized, with a particular focus on word retrieval. A discussion of the limitations of current research and possible future directions concludes the review. Thieme Medical Publishers 333 Seventh Avenue, New York, NY 10001, USA.

  12. Short-Term Memory and Aphasia: From Theory to Treatment

    PubMed Central

    Minkina, Irene; Rosenberg, Samantha; Kalinyak-Fliszar, Michelene; Martin, Nadine

    2018-01-01

    This article reviews existing research on the interactions between verbal short-term memory and language processing impairments in aphasia. Theoretical models of short-term memory are reviewed, starting with a model assuming a separation between short-term memory and language, and progressing to models that view verbal short-term memory as a cognitive requirement of language processing. The review highlights a verbal short-term memory model derived from an interactive activation model of word retrieval. This model holds that verbal short-term memory encompasses the temporary activation of linguistic knowledge (e.g., semantic, lexical, and phonological features) during language production and comprehension tasks. Empirical evidence supporting this model, which views short-term memory in the context of the processes it subserves, is outlined. Studies that use a classic measure of verbal short-term memory (i.e., number of words/digits correctly recalled in immediate serial recall) as well as those that use more intricate measures (e.g., serial position effects in immediate serial recall) are discussed. Treatment research that uses verbal short-term memory tasks in an attempt to improve language processing is then summarized, with a particular focus on word retrieval. A discussion of the limitations of current research and possible future directions concludes the review. PMID:28201834

  13. Performance analysis of the FDTD method applied to holographic volume gratings: Multi-core CPU versus GPU computing

    NASA Astrophysics Data System (ADS)

    Francés, J.; Bleda, S.; Neipp, C.; Márquez, A.; Pascual, I.; Beléndez, A.

    2013-03-01

    The finite-difference time-domain method (FDTD) allows electromagnetic field distribution analysis as a function of time and space. The method is applied to analyze holographic volume gratings (HVGs) for the near-field distribution at optical wavelengths. Usually, this application requires the simulation of wide areas, which implies more memory and time processing. In this work, we propose a specific implementation of the FDTD method including several add-ons for a precise simulation of optical diffractive elements. Values in the near-field region are computed considering the illumination of the grating by means of a plane wave for different angles of incidence and including absorbing boundaries as well. We compare the results obtained by FDTD with those obtained using a matrix method (MM) applied to diffraction gratings. In addition, we have developed two optimized versions of the algorithm, for both CPU and GPU, in order to analyze the improvement of using the new NVIDIA Fermi GPU architecture versus highly tuned multi-core CPU as a function of the size simulation. In particular, the optimized CPU implementation takes advantage of the arithmetic and data transfer streaming SIMD (single instruction multiple data) extensions (SSE) included explicitly in the code and also of multi-threading by means of OpenMP directives. A good agreement between the results obtained using both FDTD and MM methods is obtained, thus validating our methodology. Moreover, the performance of the GPU is compared to the SSE+OpenMP CPU implementation, and it is quantitatively determined that a highly optimized CPU program can be competitive for a wider range of simulation sizes, whereas GPU computing becomes more powerful for large-scale simulations.

  14. A model of memory impairment in schizophrenia: cognitive and clinical factors associated with memory efficiency and memory errors.

    PubMed

    Brébion, Gildas; Bressan, Rodrigo A; Ohlsen, Ruth I; David, Anthony S

    2013-12-01

    Memory impairments in patients with schizophrenia have been associated with various cognitive and clinical factors. Hallucinations have been more specifically associated with errors stemming from source monitoring failure. We conducted a broad investigation of verbal memory and visual memory as well as source memory functioning in a sample of patients with schizophrenia. Various memory measures were tallied, and we studied their associations with processing speed, working memory span, and positive, negative, and depressive symptoms. Superficial and deep memory processes were differentially associated with processing speed, working memory span, avolition, depression, and attention disorders. Auditory/verbal and visual hallucinations were differentially associated with specific types of source memory error. We integrated all the results into a revised version of a previously published model of memory functioning in schizophrenia. The model describes the factors that affect memory efficiency, as well as the cognitive underpinnings of hallucinations within the source monitoring framework. © 2013.

  15. A general model for memory interference in a multiprocessor system with memory hierarchy

    NASA Technical Reports Server (NTRS)

    Taha, Badie A.; Standley, Hilda M.

    1989-01-01

    The problem of memory interference in a multiprocessor system with a hierarchy of shared buses and memories is addressed. The behavior of the processors is represented by a sequence of memory requests with each followed by a determined amount of processing time. A statistical queuing network model for determining the extent of memory interference in multiprocessor systems with clusters of memory hierarchies is presented. The performance of the system is measured by the expected number of busy memory clusters. The results of the analytic model are compared with simulation results, and the correlation between them is found to be very high.

  16. Using visual lateralization to model learning and memory in zebrafish larvae

    PubMed Central

    Andersson, Madelene Åberg; Ek, Fredrik; Olsson, Roger

    2015-01-01

    Impaired learning and memory are common symptoms of neurodegenerative and neuropsychiatric diseases. Present, there are several behavioural test employed to assess cognitive functions in animal models, including the frequently used novel object recognition (NOR) test. However, although atypical functional brain lateralization has been associated with neuropsychiatric conditions, spanning from schizophrenia to autism, few animal models are available to study this phenomenon in learning and memory deficits. Here we present a visual lateralization NOR model (VLNOR) in zebrafish larvae as an assay that combines brain lateralization and NOR. In zebrafish larvae, learning and memory are generally assessed by habituation, sensitization, or conditioning paradigms, which are all representatives of nondeclarative memory. The VLNOR is the first model for zebrafish larvae that studies a memory similar to the declarative memory described for mammals. We demonstrate that VLNOR can be used to study memory formation, storage, and recall of novel objects, both short and long term, in 10-day-old zebrafish. Furthermore we show that the VLNOR model can be used to study chemical modulation of memory formation and maintenance using dizocilpine (MK-801), a frequently used non-competitive antagonist of the NMDA receptor, used to test putative antipsychotics in animal models. PMID:25727677

  17. Cerebellar models of associative memory: Three papers from IEEE COMPCON spring 1989

    NASA Technical Reports Server (NTRS)

    Raugh, Michael R. (Editor)

    1989-01-01

    Three papers are presented on the following topics: (1) a cerebellar-model associative memory as a generalized random-access memory; (2) theories of the cerebellum - two early models of associative memory; and (3) intelligent network management and functional cerebellum synthesis.

  18. Latent change models of adult cognition: are changes in processing speed and working memory associated with changes in episodic memory?

    PubMed

    Hertzog, Christopher; Dixon, Roger A; Hultsch, David F; MacDonald, Stuart W S

    2003-12-01

    The authors used 6-year longitudinal data from the Victoria Longitudinal Study (VLS) to investigate individual differences in amount of episodic memory change. Latent change models revealed reliable individual differences in cognitive change. Changes in episodic memory were significantly correlated with changes in other cognitive variables, including speed and working memory. A structural equation model for the latent change scores showed that changes in speed and working memory predicted changes in episodic memory, as expected by processing resource theory. However, these effects were best modeled as being mediated by changes in induction and fact retrieval. Dissociations were detected between cross-sectional ability correlations and longitudinal changes. Shuffling the tasks used to define the Working Memory latent variable altered patterns of change correlations.

  19. Working Memory in the Classroom: An Inside Look at the Central Executive.

    PubMed

    Barker, Lauren A

    2016-01-01

    This article provides a review of working memory and its application to educational settings. A discussion of the varying definitions of working memory is presented. Special attention is given to the various multidisciplinary professionals who work with students with working memory deficits, and their unique understanding of the construct. Definitions and theories of working memory are briefly summarized and provide the foundation for understanding practical applications of working memory to assessment and intervention. Although definitions and models of working memory abound, there is limited consensus regarding universally accepted definitions and models. Current research indicates that developing new models of working memory may be an appropriate paradigm shift at this time. The integration of individual practitioner's knowledge regarding academic achievement, working memory and processing speed could provide a foundation for the future development of new working memory models. Future directions for research should aim to explain how tasks and behaviors are supported by the substrates of the cortico-striatal and the cerebro-cerebellar systems. Translation of neurobiological information into educational contexts will be helpful to inform all practitioners' knowledge of working memory constructs. It will also allow for universally accepted definitions and models of working memory to arise and facilitate more effective collaboration between disciplines working in educational setting.

  20. User Preference-Based Dual-Memory Neural Model With Memory Consolidation Approach.

    PubMed

    Nasir, Jauwairia; Yoo, Yong-Ho; Kim, Deok-Hwa; Kim, Jong-Hwan; Nasir, Jauwairia; Yong-Ho Yoo; Deok-Hwa Kim; Jong-Hwan Kim; Nasir, Jauwairia; Yoo, Yong-Ho; Kim, Deok-Hwa; Kim, Jong-Hwan

    2018-06-01

    Memory modeling has been a popular topic of research for improving the performance of autonomous agents in cognition related problems. Apart from learning distinct experiences correctly, significant or recurring experiences are expected to be learned better and be retrieved easier. In order to achieve this objective, this paper proposes a user preference-based dual-memory adaptive resonance theory network model, which makes use of a user preference to encode memories with various strengths and to learn and forget at various rates. Over a period of time, memories undergo a consolidation-like process at a rate proportional to the user preference at the time of encoding and the frequency of recall of a particular memory. Consolidated memories are easier to recall and are more stable. This dual-memory neural model generates distinct episodic memories and a flexible semantic-like memory component. This leads to an enhanced retrieval mechanism of experiences through two routes. The simulation results are presented to evaluate the proposed memory model based on various kinds of cues over a number of trials. The experimental results on Mybot are also presented. The results verify that not only are distinct experiences learned correctly but also that experiences associated with higher user preference and recall frequency are consolidated earlier. Thus, these experiences are recalled more easily relative to the unconsolidated experiences.

  1. GPU accelerated dynamic functional connectivity analysis for functional MRI data.

    PubMed

    Akgün, Devrim; Sakoğlu, Ünal; Esquivel, Johnny; Adinoff, Bryon; Mete, Mutlu

    2015-07-01

    Recent advances in multi-core processors and graphics card based computational technologies have paved the way for an improved and dynamic utilization of parallel computing techniques. Numerous applications have been implemented for the acceleration of computationally-intensive problems in various computational science fields including bioinformatics, in which big data problems are prevalent. In neuroimaging, dynamic functional connectivity (DFC) analysis is a computationally demanding method used to investigate dynamic functional interactions among different brain regions or networks identified with functional magnetic resonance imaging (fMRI) data. In this study, we implemented and analyzed a parallel DFC algorithm based on thread-based and block-based approaches. The thread-based approach was designed to parallelize DFC computations and was implemented in both Open Multi-Processing (OpenMP) and Compute Unified Device Architecture (CUDA) programming platforms. Another approach developed in this study to better utilize CUDA architecture is the block-based approach, where parallelization involves smaller parts of fMRI time-courses obtained by sliding-windows. Experimental results showed that the proposed parallel design solutions enabled by the GPUs significantly reduce the computation time for DFC analysis. Multicore implementation using OpenMP on 8-core processor provides up to 7.7× speed-up. GPU implementation using CUDA yielded substantial accelerations ranging from 18.5× to 157× speed-up once thread-based and block-based approaches were combined in the analysis. Proposed parallel programming solutions showed that multi-core processor and CUDA-supported GPU implementations accelerated the DFC analyses significantly. Developed algorithms make the DFC analyses more practical for multi-subject studies with more dynamic analyses. Copyright © 2015 Elsevier Ltd. All rights reserved.

  2. Revisiting Molecular Dynamics on a CPU/GPU system: Water Kernel and SHAKE Parallelization.

    PubMed

    Ruymgaart, A Peter; Elber, Ron

    2012-11-13

    We report Graphics Processing Unit (GPU) and Open-MP parallel implementations of water-specific force calculations and of bond constraints for use in Molecular Dynamics simulations. We focus on a typical laboratory computing-environment in which a CPU with a few cores is attached to a GPU. We discuss in detail the design of the code and we illustrate performance comparable to highly optimized codes such as GROMACS. Beside speed our code shows excellent energy conservation. Utilization of water-specific lists allows the efficient calculations of non-bonded interactions that include water molecules and results in a speed-up factor of more than 40 on the GPU compared to code optimized on a single CPU core for systems larger than 20,000 atoms. This is up four-fold from a factor of 10 reported in our initial GPU implementation that did not include a water-specific code. Another optimization is the implementation of constrained dynamics entirely on the GPU. The routine, which enforces constraints of all bonds, runs in parallel on multiple Open-MP cores or entirely on the GPU. It is based on Conjugate Gradient solution of the Lagrange multipliers (CG SHAKE). The GPU implementation is partially in double precision and requires no communication with the CPU during the execution of the SHAKE algorithm. The (parallel) implementation of SHAKE allows an increase of the time step to 2.0fs while maintaining excellent energy conservation. Interestingly, CG SHAKE is faster than the usual bond relaxation algorithm even on a single core if high accuracy is expected. The significant speedup of the optimized components transfers the computational bottleneck of the MD calculation to the reciprocal part of Particle Mesh Ewald (PME).

  3. Model-Driven Study of Visual Memory

    DTIC Science & Technology

    2004-12-01

    dimensional stimuli (synthetic human faces ) afford important insights into episodic recognition memory. The results were well accommodated by a summed...the unusual properties of the z-transformed ROCS. 15. SUBJECT TERMS Memory, visual memory, computational model, human memory, faces , identity 16...3 Accomplishments/New Findings 3 Work on Objective One: Recognition Memory for Synthetic Faces . 3 Experim ent 1

  4. Evaluation of the Intel Xeon Phi Co-processor to accelerate the sensitivity map calculation for PET imaging

    NASA Astrophysics Data System (ADS)

    Dey, T.; Rodrigue, P.

    2015-07-01

    We aim to evaluate the Intel Xeon Phi coprocessor for acceleration of 3D Positron Emission Tomography (PET) image reconstruction. We focus on the sensitivity map calculation as one computational intensive part of PET image reconstruction, since it is a promising candidate for acceleration with the Many Integrated Core (MIC) architecture of the Xeon Phi. The computation of the voxels in the field of view (FoV) can be done in parallel and the 103 to 104 samples needed to calculate the detection probability of each voxel can take advantage of vectorization. We use the ray tracing kernels of the Embree project to calculate the hit points of the sample rays with the detector and in a second step the sum of the radiological path taking into account attenuation is determined. The core components are implemented using the Intel single instruction multiple data compiler (ISPC) to enable a portable implementation showing efficient vectorization either on the Xeon Phi and the Host platform. On the Xeon Phi, the calculation of the radiological path is also implemented in hardware specific intrinsic instructions (so-called `intrinsics') to allow manually-optimized vectorization. For parallelization either OpenMP and ISPC tasking (based on pthreads) are evaluated.Our implementation achieved a scalability factor of 0.90 on the Xeon Phi coprocessor (model 5110P) with 60 cores at 1 GHz. Only minor differences were found between parallelization with OpenMP and the ISPC tasking feature. The implementation using intrinsics was found to be about 12% faster than the portable ISPC version. With this version, a speedup of 1.43 was achieved on the Xeon Phi coprocessor compared to the host system (HP SL250s Gen8) equipped with two Xeon (E5-2670) CPUs, with 8 cores at 2.6 to 3.3 GHz each. Using a second Xeon Phi card the speedup could be further increased to 2.77. No significant differences were found between the results of the different Xeon Phi and the Host implementations. The examination showed that a reasonable speedup of sensitivity map calculation could be achieved on the Xeon Phi either by a portable or a hardware specific implementation.

  5. Forgetfulness can help you win games.

    PubMed

    Burridge, James; Gao, Yu; Mao, Yong

    2015-09-01

    We present a simple game model where agents with different memory lengths compete for finite resources. We show by simulation and analytically that an instability exists at a critical memory length, and as a result, different memory lengths can compete and coexist in a dynamical equilibrium. Our analytical formulation makes a connection to statistical urn models, and we show that temperature is mirrored by the agent's memory. Our simple model of memory may be incorporated into other game models with implications that we briefly discuss.

  6. Conceptual design and feasibility evaluation model of a 10 to the 8th power bit oligatomic mass memory. Volume 2: Feasibility evaluation model

    NASA Technical Reports Server (NTRS)

    Horst, R. L.; Nordstrom, M. J.

    1972-01-01

    The partially populated oligatomic mass memory feasibility model is described and evaluated. A system was desired to verify the feasibility of the oligatomic (mirror) memory approach as applicable to large scale solid state mass memories.

  7. Oscillations in Spurious States of the Associative Memory Model with Synaptic Depression

    NASA Astrophysics Data System (ADS)

    Murata, Shin; Otsubo, Yosuke; Nagata, Kenji; Okada, Masato

    2014-12-01

    The associative memory model is a typical neural network model that can store discretely distributed fixed-point attractors as memory patterns. When the network stores the memory patterns extensively, however, the model has other attractors besides the memory patterns. These attractors are called spurious memories. Both spurious states and memory states are in equilibrium, so there is little difference between their dynamics. Recent physiological experiments have shown that the short-term dynamic synapse called synaptic depression decreases its efficacy of transmission to postsynaptic neurons according to the activities of presynaptic neurons. Previous studies revealed that synaptic depression destabilizes the memory states when the number of memory patterns is finite. However, it is very difficult to study the dynamical properties of the spurious states if the number of memory patterns is proportional to the number of neurons. We investigate the effect of synaptic depression on spurious states by Monte Carlo simulation. The results demonstrate that synaptic depression does not affect the memory states but mainly destabilizes the spurious states and induces periodic oscillations.

  8. On Productive Knowledge and Levels of Questions.

    ERIC Educational Resources Information Center

    Andre, Thomas

    A model is proposed for memory that stresses a distinction between episodic memory for encoded personal experience and semantic memory for abstractors and generalizations. Basically, the model holds that questions influence the nature of memory representations formed during instruction, and that memory representation controls the way in which…

  9. Computational Model-Based Prediction of Human Episodic Memory Performance Based on Eye Movements

    NASA Astrophysics Data System (ADS)

    Sato, Naoyuki; Yamaguchi, Yoko

    Subjects' episodic memory performance is not simply reflected by eye movements. We use a ‘theta phase coding’ model of the hippocampus to predict subjects' memory performance from their eye movements. Results demonstrate the ability of the model to predict subjects' memory performance. These studies provide a novel approach to computational modeling in the human-machine interface.

  10. A Bayesian Model of the Memory Colour Effect.

    PubMed

    Witzel, Christoph; Olkkonen, Maria; Gegenfurtner, Karl R

    2018-01-01

    According to the memory colour effect, the colour of a colour-diagnostic object is not perceived independently of the object itself. Instead, it has been shown through an achromatic adjustment method that colour-diagnostic objects still appear slightly in their typical colour, even when they are colourimetrically grey. Bayesian models provide a promising approach to capture the effect of prior knowledge on colour perception and to link these effects to more general effects of cue integration. Here, we model memory colour effects using prior knowledge about typical colours as priors for the grey adjustments in a Bayesian model. This simple model does not involve any fitting of free parameters. The Bayesian model roughly captured the magnitude of the measured memory colour effect for photographs of objects. To some extent, the model predicted observed differences in memory colour effects across objects. The model could not account for the differences in memory colour effects across different levels of realism in the object images. The Bayesian model provides a particularly simple account of memory colour effects, capturing some of the multiple sources of variation of these effects.

  11. A Bayesian Model of the Memory Colour Effect

    PubMed Central

    Olkkonen, Maria; Gegenfurtner, Karl R.

    2018-01-01

    According to the memory colour effect, the colour of a colour-diagnostic object is not perceived independently of the object itself. Instead, it has been shown through an achromatic adjustment method that colour-diagnostic objects still appear slightly in their typical colour, even when they are colourimetrically grey. Bayesian models provide a promising approach to capture the effect of prior knowledge on colour perception and to link these effects to more general effects of cue integration. Here, we model memory colour effects using prior knowledge about typical colours as priors for the grey adjustments in a Bayesian model. This simple model does not involve any fitting of free parameters. The Bayesian model roughly captured the magnitude of the measured memory colour effect for photographs of objects. To some extent, the model predicted observed differences in memory colour effects across objects. The model could not account for the differences in memory colour effects across different levels of realism in the object images. The Bayesian model provides a particularly simple account of memory colour effects, capturing some of the multiple sources of variation of these effects. PMID:29760874

  12. The AIP Model of EMDR Therapy and Pathogenic Memories

    PubMed Central

    Hase, Michael; Balmaceda, Ute M.; Ostacoli, Luca; Liebermann, Peter; Hofmann, Arne

    2017-01-01

    Eye Movement Desensitization and Reprocessing (EMDR) therapy has been widely recognized as an efficacious treatment for post-traumatic stress disorder (PTSD). In the last years more insight has been gained regarding the efficacy of EMDR therapy in a broad field of mental disorders beyond PTSD. The cornerstone of EMDR therapy is its unique model of pathogenesis and change: the adaptive information processing (AIP) model. The AIP model developed by F. Shapiro has found support and differentiation in recent studies on the importance of memories in the pathogenesis of a range of mental disorders beside PTSD. However, theoretical publications or research on the application of the AIP model are still rare. The increasing acceptance of ideas that relate the origin of many mental disorders to the formation and consolidation of implicit dysfunctional memory lead to formation of the theory of pathogenic memories. Within the theory of pathogenic memories these implicit dysfunctional memories are considered to form basis of a variety of mental disorders. The theory of pathogenic memories seems compatible to the AIP model of EMDR therapy, which offers strategies to effectively access and transmute these memories leading to amelioration or resolution of symptoms. Merging the AIP model with the theory of pathogenic memories may initiate research. In consequence, patients suffering from such memory-based disorders may be earlier diagnosed and treated more effectively. PMID:28983265

  13. A Mathematical Model for the Hippocampus: Towards the Understanding of Episodic Memory and Imagination

    NASA Astrophysics Data System (ADS)

    Tsuda, I.; Yamaguti, Y.; Kuroda, S.; Fukushima, Y.; Tsukada, M.

    How does the brain encode episode? Based on the fact that the hippocampus is responsible for the formation of episodic memory, we have proposed a mathematical model for the hippocampus. Because episodic memory includes a time series of events, an underlying dynamics for the formation of episodic memory is considered to employ an association of memories. David Marr correctly pointed out in his theory of archecortex for a simple memory that the hippocampal CA3 is responsible for the formation of associative memories. However, a conventional mathematical model of associative memory simply guarantees a single association of memory unless a rule for an order of successive association of memories is given. The recent clinical studies in Maguire's group for the patients with the hippocampal lesion show that the patients cannot make a new story, because of the lack of ability of imagining new things. Both episodic memory and imagining things include various common characteristics: imagery, the sense of now, retrieval of semantic information, and narrative structures. Taking into account these findings, we propose a mathematical model of the hippocampus in order to understand the common mechanism of episodic memory and imagination.

  14. A bio-inspired memory model for structural health monitoring

    NASA Astrophysics Data System (ADS)

    Zheng, Wei; Zhu, Yong

    2009-04-01

    Long-term structural health monitoring (SHM) systems need intelligent management of the monitoring data. By analogy with the way the human brain processes memories, we present a bio-inspired memory model (BIMM) that does not require prior knowledge of the structure parameters. The model contains three time-domain areas: a sensory memory area, a short-term memory area and a long-term memory area. First, the initial parameters of the structural state are specified to establish safety criteria. Then the large amount of monitoring data that falls within the safety limits is filtered while the data outside the safety limits are captured instantly in the sensory memory area. Second, disturbance signals are distinguished from danger signals in the short-term memory area. Finally, the stable data of the structural balance state are preserved in the long-term memory area. A strategy for priority scheduling via fuzzy c-means for the proposed model is then introduced. An experiment on bridge tower deformation demonstrates that the proposed model can be applied for real-time acquisition, limited-space storage and intelligent mining of the monitoring data in a long-term SHM system.

  15. Chemical Memory Reactions Induced Bursting Dynamics in Gene Expression

    PubMed Central

    Tian, Tianhai

    2013-01-01

    Memory is a ubiquitous phenomenon in biological systems in which the present system state is not entirely determined by the current conditions but also depends on the time evolutionary path of the system. Specifically, many memorial phenomena are characterized by chemical memory reactions that may fire under particular system conditions. These conditional chemical reactions contradict to the extant stochastic approaches for modeling chemical kinetics and have increasingly posed significant challenges to mathematical modeling and computer simulation. To tackle the challenge, I proposed a novel theory consisting of the memory chemical master equations and memory stochastic simulation algorithm. A stochastic model for single-gene expression was proposed to illustrate the key function of memory reactions in inducing bursting dynamics of gene expression that has been observed in experiments recently. The importance of memory reactions has been further validated by the stochastic model of the p53-MDM2 core module. Simulations showed that memory reactions is a major mechanism for realizing both sustained oscillations of p53 protein numbers in single cells and damped oscillations over a population of cells. These successful applications of the memory modeling framework suggested that this innovative theory is an effective and powerful tool to study memory process and conditional chemical reactions in a wide range of complex biological systems. PMID:23349679

  16. Chemical memory reactions induced bursting dynamics in gene expression.

    PubMed

    Tian, Tianhai

    2013-01-01

    Memory is a ubiquitous phenomenon in biological systems in which the present system state is not entirely determined by the current conditions but also depends on the time evolutionary path of the system. Specifically, many memorial phenomena are characterized by chemical memory reactions that may fire under particular system conditions. These conditional chemical reactions contradict to the extant stochastic approaches for modeling chemical kinetics and have increasingly posed significant challenges to mathematical modeling and computer simulation. To tackle the challenge, I proposed a novel theory consisting of the memory chemical master equations and memory stochastic simulation algorithm. A stochastic model for single-gene expression was proposed to illustrate the key function of memory reactions in inducing bursting dynamics of gene expression that has been observed in experiments recently. The importance of memory reactions has been further validated by the stochastic model of the p53-MDM2 core module. Simulations showed that memory reactions is a major mechanism for realizing both sustained oscillations of p53 protein numbers in single cells and damped oscillations over a population of cells. These successful applications of the memory modeling framework suggested that this innovative theory is an effective and powerful tool to study memory process and conditional chemical reactions in a wide range of complex biological systems.

  17. [Anterograde declarative memory and its models].

    PubMed

    Barbeau, E-J; Puel, M; Pariente, J

    2010-01-01

    Patient H.M.'s recent death provides the opportunity to highlight the importance of his contribution to a better understanding of the anterograde amnesic syndrome. The thorough study of this patient over five decades largely contributed to shape the unitary model of declarative memory. This model holds that declarative memory is a single system that cannot be fractionated into subcomponents. As a system, it depends mainly on medial temporal lobes structures. The objective of this review is to present the main characteristics of different modular models that have been proposed as alternatives to the unitary model. It is also an opportunity to present different patients, who, although less famous than H.M., helped make signification contribution to the field of memory. The characteristics of the five main modular models are presented, including the most recent one (the perceptual-mnemonic model). The differences as well as how these models converge are highlighted. Different possibilities that could help reconcile unitary and modular approaches are considered. Although modular models differ significantly in many aspects, all converge to the notion that memory for single items and semantic memory could be dissociated from memory for complex material and context-rich episodes. In addition, these models converge concerning the involvement of critical brain structures for these stages: Item and semantic memory, as well as familiarity, are thought to largely depend on anterior subhippocampal areas, while relational, context-rich memory and recollective experiences are thought to largely depend on the hippocampal formation. Copyright © 2010 Elsevier Masson SAS. All rights reserved.

  18. The Neuroanatomical, Neurophysiological and Psychological Basis of Memory: Current Models and Their Origins

    PubMed Central

    Camina, Eduardo; Güell, Francisco

    2017-01-01

    This review aims to classify and clarify, from a neuroanatomical, neurophysiological, and psychological perspective, different memory models that are currently widespread in the literature as well as to describe their origins. We believe it is important to consider previous developments without which one cannot adequately understand the kinds of models that are now current in the scientific literature. This article intends to provide a comprehensive and rigorous overview for understanding and ordering the latest scientific advances related to this subject. The main forms of memory presented include sensory memory, short-term memory, and long-term memory. Information from the world around us is first stored by sensory memory, thus enabling the storage and future use of such information. Short-term memory (or memory) refers to information processed in a short period of time. Long-term memory allows us to store information for long periods of time, including information that can be retrieved consciously (explicit memory) or unconsciously (implicit memory). PMID:28713278

  19. The Neuroanatomical, Neurophysiological and Psychological Basis of Memory: Current Models and Their Origins.

    PubMed

    Camina, Eduardo; Güell, Francisco

    2017-01-01

    This review aims to classify and clarify, from a neuroanatomical, neurophysiological, and psychological perspective, different memory models that are currently widespread in the literature as well as to describe their origins. We believe it is important to consider previous developments without which one cannot adequately understand the kinds of models that are now current in the scientific literature. This article intends to provide a comprehensive and rigorous overview for understanding and ordering the latest scientific advances related to this subject. The main forms of memory presented include sensory memory, short-term memory, and long-term memory. Information from the world around us is first stored by sensory memory, thus enabling the storage and future use of such information. Short-term memory (or memory) refers to information processed in a short period of time. Long-term memory allows us to store information for long periods of time, including information that can be retrieved consciously (explicit memory) or unconsciously (implicit memory).

  20. Failure of self-consistency in the discrete resource model of visual working memory.

    PubMed

    Bays, Paul M

    2018-06-03

    The discrete resource model of working memory proposes that each individual has a fixed upper limit on the number of items they can store at one time, due to division of memory into a few independent "slots". According to this model, responses on short-term memory tasks consist of a mixture of noisy recall (when the tested item is in memory) and random guessing (when the item is not in memory). This provides two opportunities to estimate capacity for each observer: first, based on their frequency of random guesses, and second, based on the set size at which the variability of stored items reaches a plateau. The discrete resource model makes the simple prediction that these two estimates will coincide. Data from eight published visual working memory experiments provide strong evidence against such a correspondence. These results present a challenge for discrete models of working memory that impose a fixed capacity limit. Copyright © 2018 The Author. Published by Elsevier Inc. All rights reserved.

  1. Remembering the Past and Imagining the Future: A Neural Model of Spatial Memory and Imagery

    ERIC Educational Resources Information Center

    Byrne, Patrick; Becker, Suzanna; Burgess, Neil

    2007-01-01

    The authors model the neural mechanisms underlying spatial cognition, integrating neuronal systems and behavioral data, and address the relationships between long-term memory, short-term memory, and imagery, and between egocentric and allocentric and visual and ideothetic representations. Long-term spatial memory is modeled as attractor dynamics…

  2. A Formal Model of Capacity Limits in Working Memory

    ERIC Educational Resources Information Center

    Oberauer, Klaus; Kliegl, Reinhold

    2006-01-01

    A mathematical model of working-memory capacity limits is proposed on the key assumption of mutual interference between items in working memory. Interference is assumed to arise from overwriting of features shared by these items. The model was fit to time-accuracy data of memory-updating tasks from four experiments using nonlinear mixed effect…

  3. Foreign Language Methods and an Information Processing Model of Memory.

    ERIC Educational Resources Information Center

    Willebrand, Julia

    The major approaches to language teaching (audiolingual method, generative grammar, Community Language Learning and Silent Way) are investigated to discover whether or not they are compatible in structure with an information-processing model of memory (IPM). The model of memory used was described by Roberta Klatzky in "Human Memory:…

  4. A Multinomial Model of Event-Based Prospective Memory

    ERIC Educational Resources Information Center

    Smith, Rebekah E.; Bayen, Ute J.

    2004-01-01

    Prospective memory is remembering to perform an action in the future. The authors introduce the 1st formal model of event-based prospective memory, namely, a multinomial model that includes 2 separate parameters related to prospective memory processes. The 1st measures preparatory attentional processes, and the 2nd measures retrospective memory…

  5. Performance Analysis of a Hybrid Overset Multi-Block Application on Multiple Architectures

    NASA Technical Reports Server (NTRS)

    Djomehri, M. Jahed; Biswas, Rupak

    2003-01-01

    This paper presents a detailed performance analysis of a multi-block overset grid compu- tational fluid dynamics app!ication on multiple state-of-the-art computer architectures. The application is implemented using a hybrid MPI+OpenMP programming paradigm that exploits both coarse and fine-grain parallelism; the former via MPI message passing and the latter via OpenMP directives. The hybrid model also extends the applicability of multi-block programs to large clusters of SNIP nodes by overcoming the restriction that the number of processors be less than the number of grid blocks. A key kernel of the application, namely the LU-SGS linear solver, had to be modified to enhance the performance of the hybrid approach on the target machines. Investigations were conducted on cacheless Cray SX6 vector processors, cache-based IBM Power3 and Power4 architectures, and single system image SGI Origin3000 platforms. Overall results for complex vortex dynamics simulations demonstrate that the SX6 achieves the highest performance and outperforms the RISC-based architectures; however, the best scaling performance was achieved on the Power3.

  6. Multi-GPU parallel algorithm design and analysis for improved inversion of probability tomography with gravity gradiometry data

    NASA Astrophysics Data System (ADS)

    Hou, Zhenlong; Huang, Danian

    2017-09-01

    In this paper, we make a study on the inversion of probability tomography (IPT) with gravity gradiometry data at first. The space resolution of the results is improved by multi-tensor joint inversion, depth weighting matrix and the other methods. Aiming at solving the problems brought by the big data in the exploration, we present the parallel algorithm and the performance analysis combining Compute Unified Device Architecture (CUDA) with Open Multi-Processing (OpenMP) based on Graphics Processing Unit (GPU) accelerating. In the test of the synthetic model and real data from Vinton Dome, we get the improved results. It is also proved that the improved inversion algorithm is effective and feasible. The performance of parallel algorithm we designed is better than the other ones with CUDA. The maximum speedup could be more than 200. In the performance analysis, multi-GPU speedup and multi-GPU efficiency are applied to analyze the scalability of the multi-GPU programs. The designed parallel algorithm is demonstrated to be able to process larger scale of data and the new analysis method is practical.

  7. Global magnetohydrodynamic simulations on multiple GPUs

    NASA Astrophysics Data System (ADS)

    Wong, Un-Hong; Wong, Hon-Cheng; Ma, Yonghui

    2014-01-01

    Global magnetohydrodynamic (MHD) models play the major role in investigating the solar wind-magnetosphere interaction. However, the huge computation requirement in global MHD simulations is also the main problem that needs to be solved. With the recent development of modern graphics processing units (GPUs) and the Compute Unified Device Architecture (CUDA), it is possible to perform global MHD simulations in a more efficient manner. In this paper, we present a global magnetohydrodynamic (MHD) simulator on multiple GPUs using CUDA 4.0 with GPUDirect 2.0. Our implementation is based on the modified leapfrog scheme, which is a combination of the leapfrog scheme and the two-step Lax-Wendroff scheme. GPUDirect 2.0 is used in our implementation to drive multiple GPUs. All data transferring and kernel processing are managed with CUDA 4.0 API instead of using MPI or OpenMP. Performance measurements are made on a multi-GPU system with eight NVIDIA Tesla M2050 (Fermi architecture) graphics cards. These measurements show that our multi-GPU implementation achieves a peak performance of 97.36 GFLOPS in double precision.

  8. The Effects of Emotion on Episodic Memory for TV Commercials.

    ERIC Educational Resources Information Center

    Thorson, Esther; Friestad, Marian

    Based on the associational nature of memory, the distinction between episodic and semantic memory, and the notion of memory strength, a model was developed of the role of emotion in the memory of television commercials. The model generated the following hypotheses: (1) emotional commercials will more likely be recalled than nonemotional…

  9. Heterogeneous Hardware Parallelism Review of the IN2P3 2016 Computing School

    NASA Astrophysics Data System (ADS)

    Lafage, Vincent

    2017-11-01

    Parallel and hybrid Monte Carlo computation. The Monte Carlo method is the main workhorse for computation of particle physics observables. This paper provides an overview of various HPC technologies that can be used today: multicore (OpenMP, HPX), manycore (OpenCL). The rewrite of a twenty years old Fortran 77 Monte Carlo will illustrate the various programming paradigms in use beyond language implementation. The problem of parallel random number generator will be addressed. We will give a short report of the one week school dedicated to these recent approaches, that took place in École Polytechnique in May 2016.

  10. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Messer, Bronson; Harris, James A; Parete-Koon, Suzanne T

    We describe recent development work on the core-collapse supernova code CHIMERA. CHIMERA has consumed more than 100 million cpu-hours on Oak Ridge Leadership Computing Facility (OLCF) platforms in the past 3 years, ranking it among the most important applications at the OLCF. Most of the work described has been focused on exploiting the multicore nature of the current platform (Jaguar) via, e.g., multithreading using OpenMP. In addition, we have begun a major effort to marshal the computational power of GPUs with CHIMERA. The impending upgrade of Jaguar to Titan a 20+ PF machine with an NVIDIA GPU on many nodesmore » makes this work essential.« less

  11. Performance Portability Strategies for Grid C++ Expression Templates

    NASA Astrophysics Data System (ADS)

    Boyle, Peter A.; Clark, M. A.; DeTar, Carleton; Lin, Meifeng; Rana, Verinder; Vaquero Avilés-Casco, Alejandro

    2018-03-01

    One of the key requirements for the Lattice QCD Application Development as part of the US Exascale Computing Project is performance portability across multiple architectures. Using the Grid C++ expression template as a starting point, we report on the progress made with regards to the Grid GPU offloading strategies. We present both the successes and issues encountered in using CUDA, OpenACC and Just-In-Time compilation. Experimentation and performance on GPUs with a SU(3)×SU(3) streaming test will be reported. We will also report on the challenges of using current OpenMP 4.x for GPU offloading in the same code.

  12. 3D-PDR: Three-dimensional photodissociation region code

    NASA Astrophysics Data System (ADS)

    Bisbas, T. G.; Bell, T. A.; Viti, S.; Yates, J.; Barlow, M. J.

    2018-03-01

    3D-PDR is a three-dimensional photodissociation region code written in Fortran. It uses the Sundials package (written in C) to solve the set of ordinary differential equations and it is the successor of the one-dimensional PDR code UCL_PDR (ascl:1303.004). Using the HEALpix ray-tracing scheme (ascl:1107.018), 3D-PDR solves a three-dimensional escape probability routine and evaluates the attenuation of the far-ultraviolet radiation in the PDR and the propagation of FIR/submm emission lines out of the PDR. The code is parallelized (OpenMP) and can be applied to 1D and 3D problems.

  13. XaNSoNS: GPU-accelerated simulator of diffraction patterns of nanoparticles

    NASA Astrophysics Data System (ADS)

    Neverov, V. S.

    XaNSoNS is an open source software with GPU support, which simulates X-ray and neutron 1D (or 2D) diffraction patterns and pair-distribution functions (PDF) for amorphous or crystalline nanoparticles (up to ∼107 atoms) of heterogeneous structural content. Among the multiple parameters of the structure the user may specify atomic displacements, site occupancies, molecular displacements and molecular rotations. The software uses general equations nonspecific to crystalline structures to calculate the scattering intensity. It supports four major standards of parallel computing: MPI, OpenMP, Nvidia CUDA and OpenCL, enabling it to run on various architectures, from CPU-based HPCs to consumer-level GPUs.

  14. Working memory, situation models, and synesthesia

    DOE PAGES

    Radvansky, Gabriel A.; Gibson, Bradley S.; McNerney, M. Windy

    2013-03-04

    Research on language comprehension suggests a strong relationship between working memory span measures and language comprehension. However, there is also evidence that this relationship weakens at higher levels of comprehension, such as the situation model level. The current study explored this relationship by comparing 10 grapheme–color synesthetes who have additional color experiences when they read words that begin with different letters and 48 normal controls on a number of tests of complex working memory capacity and processing at the situation model level. On all tests of working memory capacity, the synesthetes outperformed the controls. Importantly, there was no carryover benefitmore » for the synesthetes for processing at the situation model level. This reinforces the idea that although some aspects of language comprehension are related to working memory span scores, this applies less directly to situation model levels. As a result, this suggests that theories of working memory must take into account this limitation, and the working memory processes that are involved in situation model construction and processing must be derived.« less

  15. Working memory, situation models, and synesthesia

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Radvansky, Gabriel A.; Gibson, Bradley S.; McNerney, M. Windy

    Research on language comprehension suggests a strong relationship between working memory span measures and language comprehension. However, there is also evidence that this relationship weakens at higher levels of comprehension, such as the situation model level. The current study explored this relationship by comparing 10 grapheme–color synesthetes who have additional color experiences when they read words that begin with different letters and 48 normal controls on a number of tests of complex working memory capacity and processing at the situation model level. On all tests of working memory capacity, the synesthetes outperformed the controls. Importantly, there was no carryover benefitmore » for the synesthetes for processing at the situation model level. This reinforces the idea that although some aspects of language comprehension are related to working memory span scores, this applies less directly to situation model levels. As a result, this suggests that theories of working memory must take into account this limitation, and the working memory processes that are involved in situation model construction and processing must be derived.« less

  16. Memory complaints in epilepsy: An examination of the role of mood and illness perceptions.

    PubMed

    Tinson, Deborah; Crockford, Christopher; Gharooni, Sara; Russell, Helen; Zoeller, Sophie; Leavy, Yvonne; Lloyd, Rachel; Duncan, Susan

    2018-03-01

    The study examined the role of mood and illness perceptions in explaining the variance in the memory complaints of patients with epilepsy. Forty-four patients from an outpatient tertiary care center and 43 volunteer controls completed a formal assessment of memory and a verbal fluency test, as well as validated self-report questionnaires on memory complaints, mood, and illness perceptions. In hierarchical multiple regression analyses, objective memory test performance and verbal fluency did not contribute significantly to the variance in memory complaints for either patients or controls. In patients, illness perceptions and mood were highly correlated. Illness perceptions correlated more highly with memory complaints than mood and were therefore added to the multiple regression analysis. This accounted for an additional 25% of the variance, after controlling for objective memory test performance and verbal fluency, and the model was significant (model B). In order to compare with other studies, mood was added to a second model, instead of illness perceptions. This accounted for an additional 24% of the variance, which was again significant (model C). In controls, low mood accounted for 11% of the variance in memory complaints (model C2). A measure of illness perceptions was more highly correlated with the memory complaints of patients with epilepsy than with a measure of mood. In a hierarchical multiple regression model, illness perceptions accounted for 25% of the variance in memory complaints. Illness perceptions could provide useful information in a clinical investigation into the self-reported memory complaints of patients with epilepsy, alongside the assessment of mood and formal memory testing. Copyright © 2017 Elsevier Inc. All rights reserved.

  17. Synaptic tagging, evaluation of memories, and the distal reward problem.

    PubMed

    Päpper, Marc; Kempter, Richard; Leibold, Christian

    2011-01-01

    Long-term synaptic plasticity exhibits distinct phases. The synaptic tagging hypothesis suggests an early phase in which synapses are prepared, or "tagged," for protein capture, and a late phase in which those proteins are integrated into the synapses to achieve memory consolidation. The synapse specificity of the tags is consistent with conventional neural network models of associative memory. Memory consolidation through protein synthesis, however, is neuron specific, and its functional role in those models has not been assessed. Here, using a theoretical network model, we test the tagging hypothesis on its potential to prolong memory lifetimes in an online-learning paradigm. We find that protein synthesis, though not synapse specific, prolongs memory lifetimes if it is used to evaluate memory items on a cellular level. In our model we assume that only "important" memory items evoke protein synthesis such that these become more stable than "unimportant" items, which do not evoke protein synthesis. The network model comprises an equilibrium distribution of synaptic states that is very susceptible to the storage of new items: Most synapses are in a state in which they are plastic and can be changed easily, whereas only those synapses that are essential for the retrieval of the important memory items are in the stable late phase. The model can solve the distal reward problem, where the initial exposure of a memory item and its evaluation are temporally separated. Synaptic tagging hence provides a viable mechanism to consolidate and evaluate memories on a synaptic basis.

  18. Global Neural Pattern Similarity as a Common Basis for Categorization and Recognition Memory

    PubMed Central

    Xue, Gui; Love, Bradley C.; Preston, Alison R.; Poldrack, Russell A.

    2014-01-01

    Familiarity, or memory strength, is a central construct in models of cognition. In previous categorization and long-term memory research, correlations have been found between psychological measures of memory strength and activation in the medial temporal lobes (MTLs), which suggests a common neural locus for memory strength. However, activation alone is insufficient for determining whether the same mechanisms underlie neural function across domains. Guided by mathematical models of categorization and long-term memory, we develop a theory and a method to test whether memory strength arises from the global similarity among neural representations. In human subjects, we find significant correlations between global similarity among activation patterns in the MTLs and both subsequent memory confidence in a recognition memory task and model-based measures of memory strength in a category learning task. Our work bridges formal cognitive theories and neuroscientific models by illustrating that the same global similarity computations underlie processing in multiple cognitive domains. Moreover, by establishing a link between neural similarity and psychological memory strength, our findings suggest that there may be an isomorphism between psychological and neural representational spaces that can be exploited to test cognitive theories at both the neural and behavioral levels. PMID:24872552

  19. Activation and Binding in Verbal Working Memory: A Dual-Process Model for the Recognition of Nonwords

    ERIC Educational Resources Information Center

    Oberauer, Klauss; Lange, Elke B.

    2009-01-01

    The article presents a mathematical model of short-term recognition based on dual-process models and the three-component theory of working memory [Oberauer, K. (2002). Access to information in working memory: Exploring the focus of attention. "Journal of Experimental Psychology: Learning, Memory, and Cognition, 28", 411-421]. Familiarity arises…

  20. A new model for CD8+ T cell memory inflation based upon a recombinant adenoviral vector1

    PubMed Central

    Bolinger, Beatrice; Sims, Stuart; O’Hara, Geraldine; de Lara, Catherine; Tchilian, Elma; Firner, Sonja; Engeler, Daniel; Ludewig, Burkhard; Klenerman, Paul

    2013-01-01

    CD8+ T cell memory inflation, first described in murine cytomegalovirus (MCMV) infection, is characterized by the accumulation of high-frequency, functional antigen-specific CD8+ T cell pools with an effector-memory phenotype and enrichment in peripheral organs. Although persistence of antigen is considered essential, the rules underpinning memory inflation are still unclear. The MCMV model is, however, complicated by the virus’s low-level persistence, and stochastic reactivation. We developed a new model of memory inflation based upon a βgal-recombinant adenovirus vector (Ad-LacZ). After i.v. administration in C57BL/6 mice we observe marked memory inflation in the βgal96 epitope, while a second epitope, βgal497, undergoes classical memory formation. The inflationary T cell responses show kinetics, distribution, phenotype and functions similar to those seen in MCMV and are reproduced using alternative routes of administration. Memory inflation in this model is dependent on MHC Class II. As in MCMV, only the inflating epitope showed immunoproteasome-independence. These data define a new model for memory inflation, which is fully replication-independent, internally controlled and reproduces the key immunologic features of the CD8+ T cell response. This model provides insight into the mechanisms responsible for memory inflation, and since it is based on a vaccine vector, also is relevant to novel T cell-inducing vaccines in humans. PMID:23509359

  1. Generalized memory associativity in a network model for the neuroses

    NASA Astrophysics Data System (ADS)

    Wedemann, Roseli S.; Donangelo, Raul; de Carvalho, Luís A. V.

    2009-03-01

    We review concepts introduced in earlier work, where a neural network mechanism describes some mental processes in neurotic pathology and psychoanalytic working-through, as associative memory functioning, according to the findings of Freud. We developed a complex network model, where modules corresponding to sensorial and symbolic memories interact, representing unconscious and conscious mental processes. The model illustrates Freud's idea that consciousness is related to symbolic and linguistic memory activity in the brain. We have introduced a generalization of the Boltzmann machine to model memory associativity. Model behavior is illustrated with simulations and some of its properties are analyzed with methods from statistical mechanics.

  2. Goal-Driven Autonomy and Robust Architecture for Long-Duration Missions (Year 1: 1 July 2013 - 31 July 2014)

    DTIC Science & Technology

    2014-09-30

    Mental Domain = Ω Goal Management goal change goal input World =Ψ Memory Mission & Goals( ) World Model (-Ψ) Episodic Memory Semantic Memory ...Activations Trace Meta-Level Control Introspective Monitoring Memory Reasoning Trace ( ) Strategies Episodic Memory Metaknowledge Self Model...it is from incorrect or missing memory associations (i.e., indices). Similarly, correct information may exist in the input stream, but may not be

  3. Research on bathymetry estimation by Worldview-2 based with the semi-analytical model

    NASA Astrophysics Data System (ADS)

    Sheng, L.; Bai, J.; Zhou, G.-W.; Zhao, Y.; Li, Y.-C.

    2015-04-01

    South Sea Islands of China are far away from the mainland, the reefs takes more than 95% of south sea, and most reefs scatter over interested dispute sensitive area. Thus, the methods of obtaining the reefs bathymetry accurately are urgent to be developed. Common used method, including sonar, airborne laser and remote sensing estimation, are limited by the long distance, large area and sensitive location. Remote sensing data provides an effective way for bathymetry estimation without touching over large area, by the relationship between spectrum information and bathymetry. Aimed at the water quality of the south sea of China, our paper develops a bathymetry estimation method without measured water depth. Firstly the semi-analytical optimization model of the theoretical interpretation models has been studied based on the genetic algorithm to optimize the model. Meanwhile, OpenMP parallel computing algorithm has been introduced to greatly increase the speed of the semi-analytical optimization model. One island of south sea in China is selected as our study area, the measured water depth are used to evaluate the accuracy of bathymetry estimation from Worldview-2 multispectral images. The results show that: the semi-analytical optimization model based on genetic algorithm has good results in our study area;the accuracy of estimated bathymetry in the 0-20 meters shallow water area is accepted.Semi-analytical optimization model based on genetic algorithm solves the problem of the bathymetry estimation without water depth measurement. Generally, our paper provides a new bathymetry estimation method for the sensitive reefs far away from mainland.

  4. Long-term memory and volatility clustering in high-frequency price changes

    NASA Astrophysics Data System (ADS)

    oh, Gabjin; Kim, Seunghwan; Eom, Cheoljun

    2008-02-01

    We studied the long-term memory in diverse stock market indices and foreign exchange rates using Detrended Fluctuation Analysis (DFA). For all high-frequency market data studied, no significant long-term memory property was detected in the return series, while a strong long-term memory property was found in the volatility time series. The possible causes of the long-term memory property were investigated using the return data filtered by the AR(1) model, reflecting the short-term memory property, the GARCH(1,1) model, reflecting the volatility clustering property, and the FIGARCH model, reflecting the long-term memory property of the volatility time series. The memory effect in the AR(1) filtered return and volatility time series remained unchanged, while the long-term memory property diminished significantly in the volatility series of the GARCH(1,1) filtered data. Notably, there is no long-term memory property, when we eliminate the long-term memory property of volatility by the FIGARCH model. For all data used, although the Hurst exponents of the volatility time series changed considerably over time, those of the time series with the volatility clustering effect removed diminish significantly. Our results imply that the long-term memory property of the volatility time series can be attributed to the volatility clustering observed in the financial time series.

  5. Combining Distributed and Shared Memory Models: Approach and Evolution of the Global Arrays Toolkit

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Nieplocha, Jarek; Harrison, Robert J.; Kumar, Mukul

    2002-07-29

    Both shared memory and distributed memory models have advantages and shortcomings. Shared memory model is much easier to use but it ignores data locality/placement. Given the hierarchical nature of the memory subsystems in the modern computers this characteristic might have a negative impact on performance and scalability. Various techniques, such as code restructuring to increase data reuse and introducing blocking in data accesses, can address the problem and yield performance competitive with message passing[Singh], however at the cost of compromising the ease of use feature. Distributed memory models such as message passing or one-sided communication offer performance and scalability butmore » they compromise the ease-of-use. In this context, the message-passing model is sometimes referred to as?assembly programming for the scientific computing?. The Global Arrays toolkit[GA1, GA2] attempts to offer the best features of both models. It implements a shared-memory programming model in which data locality is managed explicitly by the programmer. This management is achieved by explicit calls to functions that transfer data between a global address space (a distributed array) and local storage. In this respect, the GA model has similarities to the distributed shared-memory models that provide an explicit acquire/release protocol. However, the GA model acknowledges that remote data is slower to access than local data and allows data locality to be explicitly specified and hence managed. The GA model exposes to the programmer the hierarchical memory of modern high-performance computer systems, and by recognizing the communication overhead for remote data transfer, it promotes data reuse and locality of reference. This paper describes the characteristics of the Global Arrays programming model, capabilities of the toolkit, and discusses its evolution.« less

  6. Formation of model-free motor memories during motor adaptation depends on perturbation schedule.

    PubMed

    Orban de Xivry, Jean-Jacques; Lefèvre, Philippe

    2015-04-01

    Motor adaptation to an external perturbation relies on several mechanisms such as model-based, model-free, strategic, or repetition-dependent learning. Depending on the experimental conditions, each of these mechanisms has more or less weight in the final adaptation state. Here we focused on the conditions that lead to the formation of a model-free motor memory (Huang VS, Haith AM, Mazzoni P, Krakauer JW. Neuron 70: 787-801, 2011), i.e., a memory that does not depend on an internal model or on the size or direction of the errors experienced during the learning. The formation of such model-free motor memory was hypothesized to depend on the schedule of the perturbation (Orban de Xivry JJ, Ahmadi-Pajouh MA, Harran MD, Salimpour Y, Shadmehr R. J Neurophysiol 109: 124-136, 2013). Here we built on this observation by directly testing the nature of the motor memory after abrupt or gradual introduction of a visuomotor rotation, in an experimental paradigm where the presence of model-free motor memory can be identified (Huang VS, Haith AM, Mazzoni P, Krakauer JW. Neuron 70: 787-801, 2011). We found that relearning was faster after abrupt than gradual perturbation, which suggests that model-free learning is reduced during gradual adaptation to a visuomotor rotation. In addition, the presence of savings after abrupt introduction of the perturbation but gradual extinction of the motor memory suggests that unexpected errors are necessary to induce a model-free motor memory. Overall, these data support the hypothesis that different perturbation schedules do not lead to a more or less stabilized motor memory but to distinct motor memories with different attributes and neural representations. Copyright © 2015 the American Physiological Society.

  7. Formation of model-free motor memories during motor adaptation depends on perturbation schedule

    PubMed Central

    Lefèvre, Philippe

    2015-01-01

    Motor adaptation to an external perturbation relies on several mechanisms such as model-based, model-free, strategic, or repetition-dependent learning. Depending on the experimental conditions, each of these mechanisms has more or less weight in the final adaptation state. Here we focused on the conditions that lead to the formation of a model-free motor memory (Huang VS, Haith AM, Mazzoni P, Krakauer JW. Neuron 70: 787–801, 2011), i.e., a memory that does not depend on an internal model or on the size or direction of the errors experienced during the learning. The formation of such model-free motor memory was hypothesized to depend on the schedule of the perturbation (Orban de Xivry JJ, Ahmadi-Pajouh MA, Harran MD, Salimpour Y, Shadmehr R. J Neurophysiol 109: 124–136, 2013). Here we built on this observation by directly testing the nature of the motor memory after abrupt or gradual introduction of a visuomotor rotation, in an experimental paradigm where the presence of model-free motor memory can be identified (Huang VS, Haith AM, Mazzoni P, Krakauer JW. Neuron 70: 787–801, 2011). We found that relearning was faster after abrupt than gradual perturbation, which suggests that model-free learning is reduced during gradual adaptation to a visuomotor rotation. In addition, the presence of savings after abrupt introduction of the perturbation but gradual extinction of the motor memory suggests that unexpected errors are necessary to induce a model-free motor memory. Overall, these data support the hypothesis that different perturbation schedules do not lead to a more or less stabilized motor memory but to distinct motor memories with different attributes and neural representations. PMID:25673736

  8. A Framework for Cognitive Interventions Targeting Everyday Memory Performance and Memory Self-efficacy

    PubMed Central

    McDougall, Graham J.

    2009-01-01

    The human brain has the potential for self-renewal through adult neurogenesis, which is the birth of new neurons. Neural plasticity implies that the nervous system can change and grow. This understanding has created new possibilities for cognitive enhancement and rehabilitation. However, as individuals age, they have decreased confidence, or memory self-efficacy, which is directly related to their everyday memory performance. In this article, a developmental account of studies about memory self-efficacy and nonpharmacologic cognitive intervention models is presented and a cognitive intervention model, called the cognitive behavioral model of everyday memory, is proposed. PMID:19065089

  9. A smooth particle hydrodynamics code to model collisions between solid, self-gravitating objects

    NASA Astrophysics Data System (ADS)

    Schäfer, C.; Riecker, S.; Maindl, T. I.; Speith, R.; Scherrer, S.; Kley, W.

    2016-05-01

    Context. Modern graphics processing units (GPUs) lead to a major increase in the performance of the computation of astrophysical simulations. Owing to the different nature of GPU architecture compared to traditional central processing units (CPUs) such as x86 architecture, existing numerical codes cannot be easily migrated to run on GPU. Here, we present a new implementation of the numerical method smooth particle hydrodynamics (SPH) using CUDA and the first astrophysical application of the new code: the collision between Ceres-sized objects. Aims: The new code allows for a tremendous increase in speed of astrophysical simulations with SPH and self-gravity at low costs for new hardware. Methods: We have implemented the SPH equations to model gas, liquids and elastic, and plastic solid bodies and added a fragmentation model for brittle materials. Self-gravity may be optionally included in the simulations and is treated by the use of a Barnes-Hut tree. Results: We find an impressive performance gain using NVIDIA consumer devices compared to our existing OpenMP code. The new code is freely available to the community upon request. If you are interested in our CUDA SPH code miluphCUDA, please write an email to Christoph Schäfer. miluphCUDA is the CUDA port of miluph. miluph is pronounced [maßl2v]. We do not support the use of the code for military purposes.

  10. Retrieval-induced NMDA receptor-dependent Arc expression in two models of cocaine-cue memory.

    PubMed

    Alaghband, Yasaman; O'Dell, Steven J; Azarnia, Siavash; Khalaj, Anna J; Guzowski, John F; Marshall, John F

    2014-12-01

    The association of environmental cues with drugs of abuse results in persistent drug-cue memories. These memories contribute significantly to relapse among addicts. While conditioned place preference (CPP) is a well-established paradigm frequently used to examine the modulation of drug-cue memories, very few studies have used the non-preference-based model conditioned activity (CA) for this purpose. Here, we used both experimental approaches to investigate the neural substrates of cocaine-cue memories. First, we directly compared, in a consistent setting, the involvement of cortical and subcortical brain regions in cocaine-cue memory retrieval by quantifying activity-regulated cytoskeletal-associated (Arc) protein expression in both the CPP and CA models. Second, because NMDA receptor activation is required for Arc expression, we investigated the NMDA receptor dependency of memory persistence using the CA model. In both the CPP and CA models, drug-paired animals showed significant increases in Arc immunoreactivity in regions of the frontal cortex and amygdala compared to unpaired controls. Additionally, administration of a NMDA receptor antagonist (MK-801 or memantine) immediately after cocaine-CA memory reactivation impaired the subsequent conditioned locomotion associated with the cocaine-paired environment. The enhanced Arc expression evident in a subset of corticolimbic regions after retrieval of a cocaine-context memory, observed in both the CPP and CA paradigms, likely signifies that these regions: (i) are activated during retrieval of these memories irrespective of preference-based decisions, and (ii) undergo neuroplasticity in order to update information about cues previously associated with cocaine. This study also establishes the involvement of NMDA receptors in maintaining memories established using the CA model, a characteristic previously demonstrated using CPP. Overall, these results demonstrate the utility of the CA model for studies of cocaine-context memory and suggest the involvement of an NMDA receptor-dependent Arc induction pathway in drug-cue memory interference. Copyright © 2014 Elsevier Inc. All rights reserved.

  11. Modeling soil moisture memory in savanna ecosystems

    NASA Astrophysics Data System (ADS)

    Gou, S.; Miller, G. R.

    2011-12-01

    Antecedent soil conditions create an ecosystem's "memory" of past rainfall events. Such soil moisture memory effects may be observed over a range of timescales, from daily to yearly, and lead to feedbacks between hydrological and ecosystem processes. In this study, we modeled the soil moisture memory effect on savanna ecosystems in California, Arizona, and Africa, using a system dynamics model created to simulate the ecohydrological processes at the plot-scale. The model was carefully calibrated using soil moisture and evapotranspiration data collected at three study sites. The model was then used to simulate scenarios with various initial soil moisture conditions and antecedent precipitation regimes, in order to study the soil moisture memory effects on the evapotranspiration of understory and overstory species. Based on the model results, soil texture and antecedent precipitation regime impact the redistribution of water within soil layers, potentially causing deeper soil layers to influence the ecosystem for a longer time. Of all the study areas modeled, soil moisture memory of California savanna ecosystem site is replenished and dries out most rapidly. Thus soil moisture memory could not maintain the high rate evapotranspiration for more than a few days without incoming rainfall event. On the contrary, soil moisture memory of Arizona savanna ecosystem site lasts the longest time. The plants with different root depths respond to different memory effects; shallow-rooted species mainly respond to the soil moisture memory in the shallow soil. The growing season of grass is largely depended on the soil moisture memory of the top 25cm soil layer. Grass transpiration is sensitive to the antecedent precipitation events within daily to weekly timescale. Deep-rooted plants have different responses since these species can access to the deeper soil moisture memory with longer time duration Soil moisture memory does not have obvious impacts on the phenology of woody plants, as these can maintain transpiration for a longer time even through the top soil layer dries out.

  12. Retrieval-induced NMDA receptor-dependent Arc expression in two models of cocaine-cue memory

    PubMed Central

    Alaghband, Yasaman; O'Dell, Steven J.; Azarnia, Siavash; Khalaj, Anna J.; Guzowski, John F.; Marshall, John F.

    2014-01-01

    The association of environmental cues with drugs of abuse results in persistent drug-cue memories. These memories contribute significantly to relapse among addicts. While conditioned place preference (CPP) is a well-established paradigm frequently used to examine the modulation of drug-cue memories, very few studies have used the non-preference-based model conditioned activity (CA) for this purpose. Here, we used both experimental approaches to investigate the neural substrates of cocaine-cue memories. First, we directly compared, in a consistent setting, the involvement of cortical and subcortical brain regions in cocaine-cue memory retrieval by quantifying activity-regulated cytoskeletal associated gene (Arc) protein expression in both the CPP and CA models. Second, because NMDA receptor activation is required for Arc expression, we investigated the NMDA receptor dependency of memory persistence using the CA model. In both the CPP and CA models, drug-paired animals showed significant increases in Arc immunoreactivity in regions of the frontal cortex and amygdala compared to unpaired controls. Additionally, administration of a NMDA receptor antagonist (MK-801 or memantine) immediately after cocaine-CA memory reactivation impaired the subsequent conditioned locomotion associated with the cocaine-paired environment. The enhanced Arc expression evident in a subset of corticolimbic regions after retrieval of a cocaine-context memory, observed in both the CPP and CA paradigms, likely signifies that these regions: (i) are activated during retrieval of these memories irrespective of preference-based decisions, and (ii) undergo neuroplasticity in order to update information about cues previously associated with cocaine. This study also establishes the involvement of NMDA receptors in maintaining memories established using the CA model, a characteristic previously demonstrated using CPP. Overall, these results demonstrate the utility of the CA model for studies of cocaine-context memory and suggest the involvement of an NMDA receptor-dependent Arc induction pathway in drug-cue memory interference. PMID:25225165

  13. Modeling of SONOS Memory Cell Erase Cycle

    NASA Technical Reports Server (NTRS)

    Phillips, Thomas A.; MacLeod, Todd C.; Ho, Fat H.

    2011-01-01

    Utilization of Silicon-Oxide-Nitride-Oxide-Silicon (SONOS) nonvolatile semiconductor memories as a flash memory has many advantages. These electrically erasable programmable read-only memories (EEPROMs) utilize low programming voltages, have a high erase/write cycle lifetime, are radiation hardened, and are compatible with high-density scaled CMOS for low power, portable electronics. In this paper, the SONOS memory cell erase cycle was investigated using a nonquasi-static (NQS) MOSFET model. Comparisons were made between the model predictions and experimental data.

  14. Pareto Joint Inversion of Love and Quasi Rayleigh's waves - synthetic study

    NASA Astrophysics Data System (ADS)

    Bogacz, Adrian; Dalton, David; Danek, Tomasz; Miernik, Katarzyna; Slawinski, Michael A.

    2017-04-01

    In this contribution the specific application of Pareto joint inversion in solving geophysical problem is presented. Pareto criterion combine with Particle Swarm Optimization were used to solve geophysical inverse problems for Love and Quasi Rayleigh's waves. Basic theory of forward problem calculation for chosen surface waves is described. To avoid computational problems some simplification were made. This operation allowed foster and more straightforward calculation without lost of solution generality. According to the solving scheme restrictions, considered model must have exact two layers, elastic isotropic surface layer and elastic isotropic half space with infinite thickness. The aim of the inversion is to obain elastic parameters and model geometry using dispersion data. In calculations different case were considered, such as different number of modes for different wave types and different frequencies. Created solutions are using OpenMP standard for parallel computing, which help in reduction of computational times. The results of experimental computations are presented and commented. This research was performed in the context of The Geomechanics Project supported by Husky Energy. Also, this research was partially supported by the Natural Sciences and Engineering Research Council of Canada, grant 238416-2013, and by the Polish National Science Center under contract No. DEC-2013/11/B/ST10/0472.

  15. Parallelized computation for computer simulation of electrocardiograms using personal computers with multi-core CPU and general-purpose GPU.

    PubMed

    Shen, Wenfeng; Wei, Daming; Xu, Weimin; Zhu, Xin; Yuan, Shizhong

    2010-10-01

    Biological computations like electrocardiological modelling and simulation usually require high-performance computing environments. This paper introduces an implementation of parallel computation for computer simulation of electrocardiograms (ECGs) in a personal computer environment with an Intel CPU of Core (TM) 2 Quad Q6600 and a GPU of Geforce 8800GT, with software support by OpenMP and CUDA. It was tested in three parallelization device setups: (a) a four-core CPU without a general-purpose GPU, (b) a general-purpose GPU plus 1 core of CPU, and (c) a four-core CPU plus a general-purpose GPU. To effectively take advantage of a multi-core CPU and a general-purpose GPU, an algorithm based on load-prediction dynamic scheduling was developed and applied to setting (c). In the simulation with 1600 time steps, the speedup of the parallel computation as compared to the serial computation was 3.9 in setting (a), 16.8 in setting (b), and 20.0 in setting (c). This study demonstrates that a current PC with a multi-core CPU and a general-purpose GPU provides a good environment for parallel computations in biological modelling and simulation studies. Copyright 2010 Elsevier Ireland Ltd. All rights reserved.

  16. Concept of dynamic memory in economics

    NASA Astrophysics Data System (ADS)

    Tarasova, Valentina V.; Tarasov, Vasily E.

    2018-02-01

    In this paper we discuss a concept of dynamic memory and an application of fractional calculus to describe the dynamic memory. The concept of memory is considered from the standpoint of economic models in the framework of continuous time approach based on fractional calculus. We also describe some general restrictions that can be imposed on the structure and properties of dynamic memory. These restrictions include the following three principles: (a) the principle of fading memory; (b) the principle of memory homogeneity on time (the principle of non-aging memory); (c) the principle of memory reversibility (the principle of memory recovery). Examples of different memory functions are suggested by using the fractional calculus. To illustrate an application of the concept of dynamic memory in economics we consider a generalization of the Harrod-Domar model, where the power-law memory is taken into account.

  17. Prospective memory: A comparative perspective

    PubMed Central

    Crystal, Jonathon D.; Wilson, A. George

    2014-01-01

    Prospective memory consists of forming a representation of a future action, temporarily storing that representation in memory, and retrieving it at a future time point. Here we review the recent development of animal models of prospective memory. We review experiments using rats that focus on the development of time-based and event-based prospective memory. Next, we review a number of prospective-memory approaches that have been used with a variety of non-human primates. Finally, we review selected approaches from the human literature on prospective memory to identify targets for development of animal models of prospective memory. PMID:25101562

  18. A model for memory systems based on processing modes rather than consciousness.

    PubMed

    Henke, Katharina

    2010-07-01

    Prominent models of human long-term memory distinguish between memory systems on the basis of whether learning and retrieval occur consciously or unconsciously. Episodic memory formation requires the rapid encoding of associations between different aspects of an event which, according to these models, depends on the hippocampus and on consciousness. However, recent evidence indicates that the hippocampus mediates rapid associative learning with and without consciousness in humans and animals, for long-term and short-term retention. Consciousness seems to be a poor criterion for differentiating between declarative (or explicit) and non declarative (or implicit) types of memory. A new model is therefore required in which memory systems are distinguished based on the processing operations involved rather than by consciousness.

  19. Thermodynamic Model of Spatial Memory

    NASA Astrophysics Data System (ADS)

    Kaufman, Miron; Allen, P.

    1998-03-01

    We develop and test a thermodynamic model of spatial memory. Our model is an application of statistical thermodynamics to cognitive science. It is related to applications of the statistical mechanics framework in parallel distributed processes research. Our macroscopic model allows us to evaluate an entropy associated with spatial memory tasks. We find that older adults exhibit higher levels of entropy than younger adults. Thurstone's Law of Categorical Judgment, according to which the discriminal processes along the psychological continuum produced by presentations of a single stimulus are normally distributed, is explained by using a Hooke spring model of spatial memory. We have also analyzed a nonlinear modification of the ideal spring model of spatial memory. This work is supported by NIH/NIA grant AG09282-06.

  20. An extended continuum model considering optimal velocity change with memory and numerical tests

    NASA Astrophysics Data System (ADS)

    Qingtao, Zhai; Hongxia, Ge; Rongjun, Cheng

    2018-01-01

    In this paper, an extended continuum model of traffic flow is proposed with the consideration of optimal velocity changes with memory. The new model's stability condition and KdV-Burgers equation considering the optimal velocities change with memory are deduced through linear stability theory and nonlinear analysis, respectively. Numerical simulation is carried out to study the extended continuum model, which explores how optimal velocity changes with memory affected velocity, density and energy consumption. Numerical results show that when considering the effects of optimal velocity changes with memory, the traffic jams can be suppressed efficiently. Both the memory step and sensitivity parameters of optimal velocity changes with memory will enhance the stability of traffic flow efficiently. Furthermore, numerical results demonstrates that the effect of optimal velocity changes with memory can avoid the disadvantage of historical information, which increases the stability of traffic flow on road, and so it improve the traffic flow stability and minimize cars' energy consumptions.

  1. Latent constructs model explaining the attachment-linked variation in autobiographical remembering.

    PubMed

    Öner, Sezin; Gülgöz, Sami

    2016-01-01

    In the current study, we proposed a latent constructs model to characterise the qualitative aspects of autobiographical remembering and investigated the structural relations in the model that may vary across individuals. Primarily, we focused on the memories of romantic relationships and argued that attachment anxiety and avoidance would be reflected in the ways that individuals encode, rehearse, or remember autobiographical memories in close relationships. Participants reported two positive and two negative relationship-specific memories and rated the characteristics for each memory. As predicted, the basic memory model yielded appropriate fit, indicating that event characteristics (EC) predicted the frequency of rehearsal (RC) and phenomenology at retrieval (PC). When attachment variables were integrated, the model showed that rehearsal mediated the link between anxiety and PC, especially for negative memories. On the other hand, for avoidance EC was the key factor mediating the link between avoidance and RC, as well as PC. Findings were discussed with respect to autobiographical memory functions emphasising a systematically, integrated framework.

  2. Global neural pattern similarity as a common basis for categorization and recognition memory.

    PubMed

    Davis, Tyler; Xue, Gui; Love, Bradley C; Preston, Alison R; Poldrack, Russell A

    2014-05-28

    Familiarity, or memory strength, is a central construct in models of cognition. In previous categorization and long-term memory research, correlations have been found between psychological measures of memory strength and activation in the medial temporal lobes (MTLs), which suggests a common neural locus for memory strength. However, activation alone is insufficient for determining whether the same mechanisms underlie neural function across domains. Guided by mathematical models of categorization and long-term memory, we develop a theory and a method to test whether memory strength arises from the global similarity among neural representations. In human subjects, we find significant correlations between global similarity among activation patterns in the MTLs and both subsequent memory confidence in a recognition memory task and model-based measures of memory strength in a category learning task. Our work bridges formal cognitive theories and neuroscientific models by illustrating that the same global similarity computations underlie processing in multiple cognitive domains. Moreover, by establishing a link between neural similarity and psychological memory strength, our findings suggest that there may be an isomorphism between psychological and neural representational spaces that can be exploited to test cognitive theories at both the neural and behavioral levels. Copyright © 2014 the authors 0270-6474/14/347472-13$15.00/0.

  3. The role of visual imagery in the retention of information from sentences.

    PubMed

    Drose, G S; Allen, G L

    1994-01-01

    We conducted two experiments to evaluate a multiple-code model for sentence memory that posits both propositional and visual representational systems. Both sentences involved recognition memory. The results of Experiment 1 indicated that subjects' recognition memory for concrete sentences was superior to their recognition memory for abstract sentences. Instructions to use visual imagery to enhance recognition performance yielded no effects. Experiment 2 tested the prediction that interference by a visual task would differentially affect recognition memory for concrete sentences. Results showed the interference task to have had a detrimental effect on recognition memory for both concrete and abstract sentences. Overall, the evidence provided partial support for both a multiple-code model and a semantic integration model of sentence memory.

  4. Modelling memory colour region for preference colour reproduction

    NASA Astrophysics Data System (ADS)

    Zeng, Huanzhao; Luo, Ronnier

    2010-01-01

    Colour preference adjustment is an essential step for colour image enhancement and perceptual gamut mapping. In colour reproduction for pictorial images, properly shifting colours away from their colorimetric originals may produce more preferred colour reproduction result. Memory colours, as a portion of the colour regions for colour preference adjustment, are especially important for preference colour reproduction. Identifying memory colours or modelling the memory colour region is a basic step to study preferred memory colour enhancement. In this study, we first created gamut for each memory colour region represented as a convex hull, and then used the convex hull to guide mathematical modelling to formulate the colour region for colour enhancement.

  5. Optimized multiple quantum MAS lineshape simulations in solid state NMR

    NASA Astrophysics Data System (ADS)

    Brouwer, William J.; Davis, Michael C.; Mueller, Karl T.

    2009-10-01

    The majority of nuclei available for study in solid state Nuclear Magnetic Resonance have half-integer spin I>1/2, with corresponding electric quadrupole moment. As such, they may couple with a surrounding electric field gradient. This effect introduces anisotropic line broadening to spectra, arising from distinct chemical species within polycrystalline solids. In Multiple Quantum Magic Angle Spinning (MQMAS) experiments, a second frequency dimension is created, devoid of quadrupolar anisotropy. As a result, the center of gravity of peaks in the high resolution dimension is a function of isotropic second order quadrupole and chemical shift alone. However, for complex materials, these parameters take on a stochastic nature due in turn to structural and chemical disorder. Lineshapes may still overlap in the isotropic dimension, complicating the task of assignment and interpretation. A distributed computational approach is presented here which permits simulation of the two-dimensional MQMAS spectrum, generated by random variates from model distributions of isotropic chemical and quadrupole shifts. Owing to the non-convex nature of the residual sum of squares (RSS) function between experimental and simulated spectra, simulated annealing is used to optimize the simulation parameters. In this manner, local chemical environments for disordered materials may be characterized, and via a re-sampling approach, error estimates for parameters produced. Program summaryProgram title: mqmasOPT Catalogue identifier: AEEC_v1_0 Program summary URL:http://cpc.cs.qub.ac.uk/summaries/AEEC_v1_0.html Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland Licensing provisions: Standard CPC licence, http://cpc.cs.qub.ac.uk/licence/licence.html No. of lines in distributed program, including test data, etc.: 3650 No. of bytes in distributed program, including test data, etc.: 73 853 Distribution format: tar.gz Programming language: C, OCTAVE Computer: UNIX/Linux Operating system: UNIX/Linux Has the code been vectorised or parallelized?: Yes RAM: Example: (1597 powder angles) × (200 Samples) × (81 F2 frequency pts) × (31 F1 frequency points) = 3.5M, SMP AMD opteron Classification: 2.3 External routines: OCTAVE ( http://www.gnu.org/software/octave/), GNU Scientific Library ( http://www.gnu.org/software/gsl/), OPENMP ( http://openmp.org/wp/) Nature of problem: The optimal simulation and modeling of multiple quantum magic angle spinning NMR spectra, for general systems, especially those with mild to significant disorder. The approach outlined and implemented in C and OCTAVE also produces model parameter error estimates. Solution method: A model for each distinct chemical site is first proposed, for the individual contribution of crystallite orientations to the spectrum. This model is averaged over all powder angles [1], as well as the (stochastic) parameters; isotropic chemical shift and quadrupole coupling constant. The latter is accomplished via sampling from a bi-variate Gaussian distribution, using the Box-Muller algorithm to transform Sobol (quasi) random numbers [2]. A simulated annealing optimization is performed, and finally the non-linear jackknife [3] is applied in developing model parameter error estimates. Additional comments: The distribution contains a script, mqmasOpt.m, which runs in the OCTAVE language workspace. Running time: Example: (1597 powder angles) × (200 Samples) × (81 F2 frequency pts) × (31 F1 frequency points) = 58.35 seconds, SMP AMD opteron. References:S.K. Zaremba, Annali di Matematica Pura ed Applicata 73 (1966) 293. H. Niederreiter, Random Number Generation and Quasi-Monte Carlo Methods, SIAM, 1992. T. Fox, D. Hinkley, K. Larntz, Technometrics 22 (1980) 29.

  6. Performance Evaluation and Improvement of Ferroelectric Field-Effect Transistor Memory

    NASA Astrophysics Data System (ADS)

    Yu, Hyung Suk

    Flash memory is reaching scaling limitations rapidly due to reduction of charge in floating gates, charge leakage and capacitive coupling between cells which cause threshold voltage fluctuations, short retention times, and interference. Many new memory technologies are being considered as alternatives to flash memory in an effort to overcome these limitations. Ferroelectric Field-Effect Transistor (FeFET) is one of the main emerging candidates because of its structural similarity to conventional FETs and fast switching speed. Nevertheless, the performance of FeFETs have not been systematically compared and analyzed against other competing technologies. In this work, we first benchmark the intrinsic performance of FeFETs and other memories by simulations in order to identify the strengths and weaknesses of FeFETs. To simulate realistic memory applications, we compare memories on an array structure. For the comparisons, we construct an accurate delay model and verify it by benchmarking against exact HSPICE simulations. Second, we propose an accurate model for FeFET memory window since the existing model has limitations. The existing model assumes symmetric operation voltages but it is not valid for the practical asymmetric operation voltages. In this modeling, we consider practical operation voltages and device dimensions. Also, we investigate realistic changes of memory window over time and retention time of FeFETs. Last, to improve memory window and subthreshold swing, we suggest nonplanar junctionless structures for FeFETs. Using the suggested structures, we study the dimensional dependences of crucial parameters like memory window and subthreshold swing and also analyze key interference mechanisms.

  7. A Hamiltonian driven quantum-like model for overdistribution in episodic memory recollection.

    NASA Astrophysics Data System (ADS)

    Broekaert, Jan B.; Busemeyer, Jerome R.

    2017-06-01

    While people famously forget genuine memories over time, they also tend to mistakenly over-recall equivalent memories concerning a given event. The memory phenomenon is known by the name of episodic overdistribution and occurs both in memories of disjunctions and partitions of mutually exclusive events and has been tested, modeled and documented in the literature. The total classical probability of recalling exclusive sub-events most often exceeds the probability of recalling the composed event, i.e. a subadditive total. We present a Hamiltonian driven propagation for the Quantum Episodic Memory model developed by Brainerd (et al., 2015) for the episodic memory overdistribution in the experimental immediate item false memory paradigm (Brainerd and Reyna, 2008, 2010, 2015). Following the Hamiltonian method of Busemeyer and Bruza (2012) our model adds time-evolution of the perceived memory state through the stages of the experimental process based on psychologically interpretable parameters - γ_c for recollection capability of cues, κ_p for bias or description-dependence by probes and β for the average gist component in the memory state at start. With seven parameters the Hamiltonian model shows good accuracy of predictions both in the EOD-disjunction and in the EOD-subadditivity paradigm. We noticed either an outspoken preponderance of the gist over verbatim trace, or the opposite, in the initial memory state when β is real. Only for complex β a mix of both traces is present in the initial state for the EOD-subadditivity paradigm.

  8. Ising formulation of associative memory models and quantum annealing recall

    NASA Astrophysics Data System (ADS)

    Santra, Siddhartha; Shehab, Omar; Balu, Radhakrishnan

    2017-12-01

    Associative memory models, in theoretical neuro- and computer sciences, can generally store at most a linear number of memories. Recalling memories in these models can be understood as retrieval of the energy minimizing configuration of classical Ising spins, closest in Hamming distance to an imperfect input memory, where the energy landscape is determined by the set of stored memories. We present an Ising formulation for associative memory models and consider the problem of memory recall using quantum annealing. We show that allowing for input-dependent energy landscapes allows storage of up to an exponential number of memories (in terms of the number of neurons). Further, we show how quantum annealing may naturally be used for recall tasks in such input-dependent energy landscapes, although the recall time may increase with the number of stored memories. Theoretically, we obtain the radius of attractor basins R (N ) and the capacity C (N ) of such a scheme and their tradeoffs. Our calculations establish that for randomly chosen memories the capacity of our model using the Hebbian learning rule as a function of problem size can be expressed as C (N ) =O (eC1N) , C1≥0 , and succeeds on randomly chosen memory sets with a probability of (1 -e-C2N) , C2≥0 with C1+C2=(0.5-f ) 2/(1 -f ) , where f =R (N )/N , 0 ≤f ≤0.5 , is the radius of attraction in terms of the Hamming distance of an input probe from a stored memory as a fraction of the problem size. We demonstrate the application of this scheme on a programmable quantum annealing device, the D-wave processor.

  9. Memory mechanisms supporting syntactic comprehension.

    PubMed

    Caplan, David; Waters, Gloria

    2013-04-01

    Efforts to characterize the memory system that supports sentence comprehension have historically drawn extensively on short-term memory as a source of mechanisms that might apply to sentences. The focus of these efforts has changed significantly in the past decade. As a result of changes in models of short-term working memory (ST-WM) and developments in models of sentence comprehension, the effort to relate entire components of an ST-WM system, such as those in the model developed by Baddeley (Nature Reviews Neuroscience 4: 829-839, 2003) to sentence comprehension has largely been replaced by an effort to relate more specific mechanisms found in modern models of ST-WM to memory processes that support one aspect of sentence comprehension--the assignment of syntactic structure (parsing) and its use in determining sentence meaning (interpretation) during sentence comprehension. In this article, we present the historical background to recent studies of the memory mechanisms that support parsing and interpretation and review recent research into this relation. We argue that the results of this research do not converge on a set of mechanisms derived from ST-WM that apply to parsing and interpretation. We argue that the memory mechanisms supporting parsing and interpretation have features that characterize another memory system that has been postulated to account for skilled performance-long-term working memory. We propose a model of the relation of different aspects of parsing and interpretation to ST-WM and long-term working memory.

  10. The role of cognitive reserve and memory self-efficacy in compensatory strategy use: A structural equation approach.

    PubMed

    Simon, Christa; Schmitter-Edgecombe, Maureen

    2016-08-01

    The use of compensatory strategies plays an important role in the ability of older adults to adapt to late-life memory changes. Even with the benefits associated with compensatory strategy use, little research has explored specific mechanisms associated with memory performance and compensatory strategies. Rather than an individual's objective memory performance directly predicting their use of compensatory strategies, it is possible that some other variables are indirectly influencing that relationship. The purpose of this study was to: (a) examine the moderating effects of cognitive reserve (CR) and (b) evaluate the potential mediating effects of memory self-efficacy on the relationship between objective memory performance and compensatory strategy use. Two structural equation models (SEM) were used to evaluate CR (latent moderator model) and memory self-efficacy (mediator model) in a sample of 155 community-dwelling older adults over the age of 55. The latent variable moderator model indicated that CR was not substantiated as a moderator variable in this sample (p = .861). However, memory self-efficacy significantly mediated the association between objective memory performance and compensatory strategy use (β = .22, 95% confidence interval, CI [.002, .437]). More specifically, better objective memory was associated with lower compensatory strategy use because of its relation to higher memory self-efficacy. These findings provide initial support for an explanatory framework of the relation between objective memory and compensatory strategy use in a healthy older adult population by identifying the importance of an individual's memory perceptions.

  11. The Generalized Quantum Episodic Memory Model.

    PubMed

    Trueblood, Jennifer S; Hemmer, Pernille

    2017-11-01

    Recent evidence suggests that experienced events are often mapped to too many episodic states, including those that are logically or experimentally incompatible with one another. For example, episodic over-distribution patterns show that the probability of accepting an item under different mutually exclusive conditions violates the disjunction rule. A related example, called subadditivity, occurs when the probability of accepting an item under mutually exclusive and exhaustive instruction conditions sums to a number >1. Both the over-distribution effect and subadditivity have been widely observed in item and source-memory paradigms. These phenomena are difficult to explain using standard memory frameworks, such as signal-detection theory. A dual-trace model called the over-distribution (OD) model (Brainerd & Reyna, 2008) can explain the episodic over-distribution effect, but not subadditivity. Our goal is to develop a model that can explain both effects. In this paper, we propose the Generalized Quantum Episodic Memory (GQEM) model, which extends the Quantum Episodic Memory (QEM) model developed by Brainerd, Wang, and Reyna (2013). We test GQEM by comparing it to the OD model using data from a novel item-memory experiment and a previously published source-memory experiment (Kellen, Singmann, & Klauer, 2014) examining the over-distribution effect. Using the best-fit parameters from the over-distribution experiments, we conclude by showing that the GQEM model can also account for subadditivity. Overall these results add to a growing body of evidence suggesting that quantum probability theory is a valuable tool in modeling recognition memory. Copyright © 2016 Cognitive Science Society, Inc.

  12. How Fuzzy-Trace Theory Predicts True and False Memories for Words, Sentences, and Narratives

    PubMed Central

    Reyna, Valerie F.; Corbin, Jonathan C.; Weldon, Rebecca B.; Brainerd, Charles J.

    2016-01-01

    Fuzzy-trace theory posits independent verbatim and gist memory processes, a distinction that has implications for such applied topics as eyewitness testimony. This distinction between precise, literal verbatim memory and meaning-based, intuitive gist accounts for memory paradoxes including dissociations between true and false memory, false memories outlasting true memories, and developmental increases in false memory. We provide an overview of fuzzy-trace theory, and, using mathematical modeling, also present results demonstrating verbatim and gist memory in true and false recognition of narrative sentences and inferences. Results supported fuzzy-trace theory's dual-process view of memory: verbatim memory was relied on to reject meaning-consistent, but unpresented, sentences (via recollection rejection). However, verbatim memory was often not retrieved, and gist memory supported acceptance of these sentences (via similarity judgment and phantom recollection). Thus, mathematical models of words can be extended to explain memory for complex stimuli, such as narratives, the kind of memory interrogated in law. PMID:27042402

  13. Memory-induced nonlinear dynamics of excitation in cardiac diseases.

    PubMed

    Landaw, Julian; Qu, Zhilin

    2018-04-01

    Excitable cells, such as cardiac myocytes, exhibit short-term memory, i.e., the state of the cell depends on its history of excitation. Memory can originate from slow recovery of membrane ion channels or from accumulation of intracellular ion concentrations, such as calcium ion or sodium ion concentration accumulation. Here we examine the effects of memory on excitation dynamics in cardiac myocytes under two diseased conditions, early repolarization and reduced repolarization reserve, each with memory from two different sources: slow recovery of a potassium ion channel and slow accumulation of the intracellular calcium ion concentration. We first carry out computer simulations of action potential models described by differential equations to demonstrate complex excitation dynamics, such as chaos. We then develop iterated map models that incorporate memory, which accurately capture the complex excitation dynamics and bifurcations of the action potential models. Finally, we carry out theoretical analyses of the iterated map models to reveal the underlying mechanisms of memory-induced nonlinear dynamics. Our study demonstrates that the memory effect can be unmasked or greatly exacerbated under certain diseased conditions, which promotes complex excitation dynamics, such as chaos. The iterated map models reveal that memory converts a monotonic iterated map function into a nonmonotonic one to promote the bifurcations leading to high periodicity and chaos.

  14. Memory-induced nonlinear dynamics of excitation in cardiac diseases

    NASA Astrophysics Data System (ADS)

    Landaw, Julian; Qu, Zhilin

    2018-04-01

    Excitable cells, such as cardiac myocytes, exhibit short-term memory, i.e., the state of the cell depends on its history of excitation. Memory can originate from slow recovery of membrane ion channels or from accumulation of intracellular ion concentrations, such as calcium ion or sodium ion concentration accumulation. Here we examine the effects of memory on excitation dynamics in cardiac myocytes under two diseased conditions, early repolarization and reduced repolarization reserve, each with memory from two different sources: slow recovery of a potassium ion channel and slow accumulation of the intracellular calcium ion concentration. We first carry out computer simulations of action potential models described by differential equations to demonstrate complex excitation dynamics, such as chaos. We then develop iterated map models that incorporate memory, which accurately capture the complex excitation dynamics and bifurcations of the action potential models. Finally, we carry out theoretical analyses of the iterated map models to reveal the underlying mechanisms of memory-induced nonlinear dynamics. Our study demonstrates that the memory effect can be unmasked or greatly exacerbated under certain diseased conditions, which promotes complex excitation dynamics, such as chaos. The iterated map models reveal that memory converts a monotonic iterated map function into a nonmonotonic one to promote the bifurcations leading to high periodicity and chaos.

  15. Cholinergic modulation of cognitive processing: insights drawn from computational models

    PubMed Central

    Newman, Ehren L.; Gupta, Kishan; Climer, Jason R.; Monaghan, Caitlin K.; Hasselmo, Michael E.

    2012-01-01

    Acetylcholine plays an important role in cognitive function, as shown by pharmacological manipulations that impact working memory, attention, episodic memory, and spatial memory function. Acetylcholine also shows striking modulatory influences on the cellular physiology of hippocampal and cortical neurons. Modeling of neural circuits provides a framework for understanding how the cognitive functions may arise from the influence of acetylcholine on neural and network dynamics. We review the influences of cholinergic manipulations on behavioral performance in working memory, attention, episodic memory, and spatial memory tasks, the physiological effects of acetylcholine on neural and circuit dynamics, and the computational models that provide insight into the functional relationships between the physiology and behavior. Specifically, we discuss the important role of acetylcholine in governing mechanisms of active maintenance in working memory tasks and in regulating network dynamics important for effective processing of stimuli in attention and episodic memory tasks. We also propose that theta rhythm plays a crucial role as an intermediary between the physiological influences of acetylcholine and behavior in episodic and spatial memory tasks. We conclude with a synthesis of the existing modeling work and highlight future directions that are likely to be rewarding given the existing state of the literature for both empiricists and modelers. PMID:22707936

  16. An interference model of visual working memory.

    PubMed

    Oberauer, Klaus; Lin, Hsuan-Yu

    2017-01-01

    The article introduces an interference model of working memory for information in a continuous similarity space, such as the features of visual objects. The model incorporates the following assumptions: (a) Probability of retrieval is determined by the relative activation of each retrieval candidate at the time of retrieval; (b) activation comes from 3 sources in memory: cue-based retrieval using context cues, context-independent memory for relevant contents, and noise; (c) 1 memory object and its context can be held in the focus of attention, where it is represented with higher precision, and partly shielded against interference. The model was fit to data from 4 continuous-reproduction experiments testing working memory for colors or orientations. The experiments involved variations of set size, kind of context cues, precueing, and retro-cueing of the to-be-tested item. The interference model fit the data better than 2 competing models, the Slot-Averaging model and the Variable-Precision resource model. The interference model also fared well in comparison to several new models incorporating alternative theoretical assumptions. The experiments confirm 3 novel predictions of the interference model: (a) Nontargets intrude in recall to the extent that they are close to the target in context space; (b) similarity between target and nontarget features improves recall, and (c) precueing-but not retro-cueing-the target substantially reduces the set-size effect. The success of the interference model shows that working memory for continuous visual information works according to the same principles as working memory for more discrete (e.g., verbal) contents. Data and model codes are available at https://osf.io/wgqd5/. (PsycINFO Database Record (c) 2016 APA, all rights reserved).

  17. Differentiating the Differentiation Models: A Comparison of the Retrieving Effectively from Memory Model (REM) and the Subjective Likelihood Model (SLiM)

    ERIC Educational Resources Information Center

    Criss, Amy H.; McClelland, James L.

    2006-01-01

    The subjective likelihood model [SLiM; McClelland, J. L., & Chappell, M. (1998). Familiarity breeds differentiation: a subjective-likelihood approach to the effects of experience in recognition memory. "Psychological Review," 105(4), 734-760.] and the retrieving effectively from memory model [REM; Shiffrin, R. M., & Steyvers, M. (1997). A model…

  18. Memory consolidation from seconds to weeks: a three-stage neural network model with autonomous reinstatement dynamics

    PubMed Central

    Fiebig, Florian; Lansner, Anders

    2014-01-01

    Declarative long-term memories are not created in an instant. Gradual stabilization and temporally shifting dependence of acquired declarative memories in different brain regions—called systems consolidation—can be tracked in time by lesion experiments. The observation of temporally graded retrograde amnesia (RA) following hippocampal lesions points to a gradual transfer of memory from hippocampus to neocortical long-term memory. Spontaneous reactivations of hippocampal memories, as observed in place cell reactivations during slow-wave-sleep, are supposed to drive neocortical reinstatements and facilitate this process. We propose a functional neural network implementation of these ideas and furthermore suggest an extended three-state framework that includes the prefrontal cortex (PFC). It bridges the temporal chasm between working memory percepts on the scale of seconds and consolidated long-term memory on the scale of weeks or months. We show that our three-stage model can autonomously produce the necessary stochastic reactivation dynamics for successful episodic memory consolidation. The resulting learning system is shown to exhibit classical memory effects seen in experimental studies, such as retrograde and anterograde amnesia (AA) after simulated hippocampal lesioning; furthermore the model reproduces peculiar biological findings on memory modulation, such as retrograde facilitation of memory after suppressed acquisition of new long-term memories—similar to the effects of benzodiazepines on memory. PMID:25071536

  19. Through the Immune Looking Glass: A Model for Brain Memory Strategies

    PubMed Central

    Sánchez-Ramón, Silvia; Faure, Florence

    2016-01-01

    The immune system (IS) and the central nervous system (CNS) are complex cognitive networks involved in defining the identity (self) of the individual through recognition and memory processes that enable one to anticipate responses to stimuli. Brain memory has traditionally been classified as either implicit or explicit on psychological and anatomical grounds, with reminiscences of the evolutionarily-based innate-adaptive IS responses. Beyond the multineuronal networks of the CNS, we propose a theoretical model of brain memory integrating the CNS as a whole. This is achieved by analogical reasoning between the operational rules of recognition and memory processes in both systems, coupled to an evolutionary analysis. In this new model, the hippocampus is no longer specifically ascribed to explicit memory but rather it both becomes part of the innate (implicit) memory system and tightly controls the explicit memory system. Alike the antigen presenting cells for the IS, the hippocampus would integrate transient and pseudo-specific (i.e., danger-fear) memories and would drive the formation of long-term and highly specific or explicit memories (i.e., the taste of the Proust’s madeleine cake) by the more complex and recent, evolutionarily speaking, neocortex. Experimental and clinical evidence is provided to support the model. We believe that the singularity of this model’s approximation could help to gain a better understanding of the mechanisms operating in brain memory strategies from a large-scale network perspective. PMID:26869886

  20. Memory Effects on Movement Behavior in Animal Foraging

    PubMed Central

    Bracis, Chloe; Gurarie, Eliezer; Van Moorter, Bram; Goodwin, R. Andrew

    2015-01-01

    An individual’s choices are shaped by its experience, a fundamental property of behavior important to understanding complex processes. Learning and memory are observed across many taxa and can drive behaviors, including foraging behavior. To explore the conditions under which memory provides an advantage, we present a continuous-space, continuous-time model of animal movement that incorporates learning and memory. Using simulation models, we evaluate the benefit memory provides across several types of landscapes with variable-quality resources and compare the memory model within a nested hierarchy of simpler models (behavioral switching and random walk). We find that memory almost always leads to improved foraging success, but that this effect is most marked in landscapes containing sparse, contiguous patches of high-value resources that regenerate relatively fast and are located in an otherwise devoid landscape. In these cases, there is a large payoff for finding a resource patch, due to size, value, or locational difficulty. While memory-informed search is difficult to differentiate from other factors using solely movement data, our results suggest that disproportionate spatial use of higher value areas, higher consumption rates, and consumption variability all point to memory influencing the movement direction of animals in certain ecosystems. PMID:26288228

  1. The role of memory in the relationship between attention toward thin-ideal media and body dissatisfaction.

    PubMed

    Jiang, Michelle Y W; Vartanian, Lenny R

    2016-03-01

    This study examined the causal relationship between attention and memory bias toward thin-body images, and the indirect effect of attending to thin-body images on women's body dissatisfaction via memory. In a 2 (restrained vs. unrestrained eaters) × 2 (long vs. short exposure) quasi-experimental design, female participants (n = 90) were shown images of thin models for either 7 s or 150 ms, and then completed a measure of body dissatisfaction and a recognition test to assess their memory for the images. Both restrained and unrestrained eaters in the long exposure condition had better recognition memory for images of thin models than did those in the short exposure condition. Better recognition memory for images of thin models was associated with lower body dissatisfaction. Finally, exposure duration to images of thin models had an indirect effect on body dissatisfaction through recognition memory. These findings suggest that memory for body-related information may be more critical in influencing women's body image than merely the exposure itself, and that targeting memory bias might enhance the effectiveness of cognitive bias modification programs.

  2. Memory Effects on Movement Behavior in Animal Foraging.

    PubMed

    Bracis, Chloe; Gurarie, Eliezer; Van Moorter, Bram; Goodwin, R Andrew

    2015-01-01

    An individual's choices are shaped by its experience, a fundamental property of behavior important to understanding complex processes. Learning and memory are observed across many taxa and can drive behaviors, including foraging behavior. To explore the conditions under which memory provides an advantage, we present a continuous-space, continuous-time model of animal movement that incorporates learning and memory. Using simulation models, we evaluate the benefit memory provides across several types of landscapes with variable-quality resources and compare the memory model within a nested hierarchy of simpler models (behavioral switching and random walk). We find that memory almost always leads to improved foraging success, but that this effect is most marked in landscapes containing sparse, contiguous patches of high-value resources that regenerate relatively fast and are located in an otherwise devoid landscape. In these cases, there is a large payoff for finding a resource patch, due to size, value, or locational difficulty. While memory-informed search is difficult to differentiate from other factors using solely movement data, our results suggest that disproportionate spatial use of higher value areas, higher consumption rates, and consumption variability all point to memory influencing the movement direction of animals in certain ecosystems.

  3. Acute effects of alcohol on intrusive memory development and viewpoint dependence in spatial memory support a dual representation model.

    PubMed

    Bisby, James A; King, John A; Brewin, Chris R; Burgess, Neil; Curran, H Valerie

    2010-08-01

    A dual representation model of intrusive memory proposes that personally experienced events give rise to two types of representation: an image-based, egocentric representation based on sensory-perceptual features; and a more abstract, allocentric representation that incorporates spatiotemporal context. The model proposes that intrusions reflect involuntary reactivation of egocentric representations in the absence of a corresponding allocentric representation. We tested the model by investigating the effect of alcohol on intrusive memories and, concurrently, on egocentric and allocentric spatial memory. With a double-blind independent group design participants were administered alcohol (.4 or .8 g/kg) or placebo. A virtual environment was used to present objects and test recognition memory from the same viewpoint as presentation (tapping egocentric memory) or a shifted viewpoint (tapping allocentric memory). Participants were also exposed to a trauma video and required to detail intrusive memories for 7 days, after which explicit memory was assessed. There was a selective impairment of shifted-view recognition after the low dose of alcohol, whereas the high dose induced a global impairment in same-view and shifted-view conditions. Alcohol showed a dose-dependent inverted "U"-shaped effect on intrusions, with only the low dose increasing the number of intrusions, replicating previous work. When same-view recognition was intact, decrements in shifted-view recognition were associated with increases in intrusions. The differential effect of alcohol on intrusive memories and on same/shifted-view recognition support a dual representation model in which intrusions might reflect an imbalance between two types of memory representation. These findings highlight important clinical implications, given alcohol's involvement in real-life trauma. Copyright 2010 Society of Biological Psychiatry. Published by Elsevier Inc. All rights reserved.

  4. `Unlearning' has a stabilizing effect in collective memories

    NASA Astrophysics Data System (ADS)

    Hopfield, J. J.; Feinstein, D. I.; Palmer, R. G.

    1983-07-01

    Crick and Mitchison1 have presented a hypothesis for the functional role of dream sleep involving an `unlearning' process. We have independently carried out mathematical and computer modelling of learning and `unlearning' in a collective neural network of 30-1,000 neurones. The model network has a content-addressable memory or `associative memory' which allows it to learn and store many memories. A particular memory can be evoked in its entirety when the network is stimulated by any adequate-sized subpart of the information of that memory2. But different memories of the same size are not equally easy to recall. Also, when memories are learned, spurious memories are also created and can also be evoked. Applying an `unlearning' process, similar to the learning processes but with a reversed sign and starting from a noise input, enhances the performance of the network in accessing real memories and in minimizing spurious ones. Although our model was not motivated by higher nervous function, our system displays behaviours which are strikingly parallel to those needed for the hypothesized role of `unlearning' in rapid eye movement (REM) sleep.

  5. Memory as the "whole brain work": a large-scale model based on "oscillations in super-synergy".

    PubMed

    Başar, Erol

    2005-01-01

    According to recent trends, memory depends on several brain structures working in concert across many levels of neural organization; "memory is a constant work-in progress." The proposition of a brain theory based on super-synergy in neural populations is most pertinent for the understanding of this constant work in progress. This report introduces a new model on memory basing on the processes of EEG oscillations and Brain Dynamics. This model is shaped by the following conceptual and experimental steps: 1. The machineries of super-synergy in the whole brain are responsible for formation of sensory-cognitive percepts. 2. The expression "dynamic memory" is used for memory processes that evoke relevant changes in alpha, gamma, theta and delta activities. The concerted action of distributed multiple oscillatory processes provides a major key for understanding of distributed memory. It comprehends also the phyletic memory and reflexes. 3. The evolving memory, which incorporates reciprocal actions or reverberations in the APLR alliance and during working memory processes, is especially emphasized. 4. A new model related to "hierarchy of memories as a continuum" is introduced. 5. The notions of "longer activated memory" and "persistent memory" are proposed instead of long-term memory. 6. The new analysis to recognize faces emphasizes the importance of EEG oscillations in neurophysiology and Gestalt analysis. 7. The proposed basic framework called "Memory in the Whole Brain Work" emphasizes that memory and all brain functions are inseparable and are acting as a "whole" in the whole brain. 8. The role of genetic factors is fundamental in living system settings and oscillations and accordingly in memory, according to recent publications. 9. A link from the "whole brain" to "whole body," and incorporation of vegetative and neurological system, is proposed, EEG oscillations and ultraslow oscillations being a control parameter.

  6. Sequence memory based on coherent spin-interaction neural networks.

    PubMed

    Xia, Min; Wong, W K; Wang, Zhijie

    2014-12-01

    Sequence information processing, for instance, the sequence memory, plays an important role on many functions of brain. In the workings of the human brain, the steady-state period is alterable. However, in the existing sequence memory models using heteroassociations, the steady-state period cannot be changed in the sequence recall. In this work, a novel neural network model for sequence memory with controllable steady-state period based on coherent spininteraction is proposed. In the proposed model, neurons fire collectively in a phase-coherent manner, which lets a neuron group respond differently to different patterns and also lets different neuron groups respond differently to one pattern. The simulation results demonstrating the performance of the sequence memory are presented. By introducing a new coherent spin-interaction sequence memory model, the steady-state period can be controlled by dimension parameters and the overlap between the input pattern and the stored patterns. The sequence storage capacity is enlarged by coherent spin interaction compared with the existing sequence memory models. Furthermore, the sequence storage capacity has an exponential relationship to the dimension of the neural network.

  7. Competitive Trace Theory: A Role for the Hippocampus in Contextual Interference during Retrieval.

    PubMed

    Yassa, Michael A; Reagh, Zachariah M

    2013-01-01

    Much controversy exists regarding the role of the hippocampus in retrieval. The two dominant and competing accounts have been the Standard Model of Systems Consolidation (SMSC) and Multiple Trace Theory (MTT), which specifically make opposing predictions as to the necessity of the hippocampus for retrieval of remote memories. Under SMSC, memories eventually become independent of the hippocampus as they become more reliant on cortical connectivity, and thus the hippocampus is not required for retrieval of remote memories, only recent ones. MTT on the other hand claims that the hippocampus is always required no matter the age of the memory. We argue that this dissociation may be too simplistic, and a continuum model may be better suited to address the role of the hippocampus in retrieval of remote memories. Such a model is presented here with the main function of the hippocampus during retrieval being "recontextualization," or the reconstruction of memory using overlapping traces. As memories get older, they are decontextualized due to competition among partially overlapping traces and become more semantic and reliant on neocortical storage. In this framework dubbed the Competitive Trace Theory (CTT), consolidation events that lead to the strengthening of memories enhance conceptual knowledge (semantic memory) at the expense of contextual details (episodic memory). As a result, remote memories are more likely to have a stronger semantic representation. At the same time, remote memories are also more likely to include illusory details. The CTT is a novel candidate model that may provide some resolution to the memory consolidation debate.

  8. Competitive Trace Theory: A Role for the Hippocampus in Contextual Interference during Retrieval

    PubMed Central

    Yassa, Michael A.; Reagh, Zachariah M.

    2013-01-01

    Much controversy exists regarding the role of the hippocampus in retrieval. The two dominant and competing accounts have been the Standard Model of Systems Consolidation (SMSC) and Multiple Trace Theory (MTT), which specifically make opposing predictions as to the necessity of the hippocampus for retrieval of remote memories. Under SMSC, memories eventually become independent of the hippocampus as they become more reliant on cortical connectivity, and thus the hippocampus is not required for retrieval of remote memories, only recent ones. MTT on the other hand claims that the hippocampus is always required no matter the age of the memory. We argue that this dissociation may be too simplistic, and a continuum model may be better suited to address the role of the hippocampus in retrieval of remote memories. Such a model is presented here with the main function of the hippocampus during retrieval being “recontextualization,” or the reconstruction of memory using overlapping traces. As memories get older, they are decontextualized due to competition among partially overlapping traces and become more semantic and reliant on neocortical storage. In this framework dubbed the Competitive Trace Theory (CTT), consolidation events that lead to the strengthening of memories enhance conceptual knowledge (semantic memory) at the expense of contextual details (episodic memory). As a result, remote memories are more likely to have a stronger semantic representation. At the same time, remote memories are also more likely to include illusory details. The CTT is a novel candidate model that may provide some resolution to the memory consolidation debate. PMID:23964216

  9. Mobile, Virtual Enhancements for Rehabilitation (MOVER)

    DTIC Science & Technology

    2013-11-28

    Modeling Autobiographical Memory for Believable Agents, AIIDE, Boston, MA. 2013. From the abstract: “We present a multi-layer hierarchical...connectionist network model for simulating human autobiographical memory in believable agents. Grounded in psychological theory, this model improves on...previous attempts to model agents’ event knowledge by providing a more dynamic and nondeterministic representation of autobiographical memories .” This

  10. Testing Signal-Detection Models of Yes/No and Two-Alternative Forced-Choice Recognition Memory

    ERIC Educational Resources Information Center

    Jang, Yoonhee; Wixted, John T.; Huber, David E.

    2009-01-01

    The current study compared 3 models of recognition memory in their ability to generalize across yes/no and 2-alternative forced-choice (2AFC) testing. The unequal-variance signal-detection model assumes a continuous memory strength process. The dual-process signal-detection model adds a thresholdlike recollection process to a continuous…

  11. Statistical Mechanics Model of the Speed - Accuracy Tradeoff in Spatial and Lexical Memory

    NASA Astrophysics Data System (ADS)

    Kaufman, Miron; Allen, Philip

    2000-03-01

    The molar neural network model of P. Allen, M. Kaufman, A. F. Smith, R. E. Popper, Psychology and Aging 13, 501 (1998) and Experimental Aging Research, 24, 307 (1998) is extended to incorporate reaction times. In our model the entropy associated with a particular task determines the reaction time. We use this molar neural model to directly analyze experimental data on episodic (spatial) memory and semantic (lexical) memory tasks. In particular we are interested in the effect of aging on the two types of memory. We find that there is no difference in performance levels for lexical memory tasks between younger and older adults. In the case spatial memory tasks we find that aging has a detrimental effect on the performance level. This work is supported by NIH/NIA grant AG09282-06.

  12. Recognition and source memory as multivariate decision processes.

    PubMed

    Banks, W P

    2000-07-01

    Recognition memory, source memory, and exclusion performance are three important domains of study in memory, each with its own findings, it specific theoretical developments, and its separate research literature. It is proposed here that results from all three domains can be treated with a single analytic model. This article shows how to generate a comprehensive memory representation based on multidimensional signal detection theory and how to make predictions for each of these paradigms using decision axes drawn through the space. The detection model is simpler than the comparable multinomial model, it is more easily generalizable, and it does not make threshold assumptions. An experiment using the same memory set for all three tasks demonstrates the analysis and tests the model. The results show that some seemingly complex relations between the paradigms derive from an underlying simplicity of structure.

  13. Why some colors appear more memorable than others: A model combining categories and particulars in color working memory.

    PubMed

    Bae, Gi-Yeul; Olkkonen, Maria; Allred, Sarah R; Flombaum, Jonathan I

    2015-08-01

    Categorization with basic color terms is an intuitive and universal aspect of color perception. Yet research on visual working memory capacity has largely assumed that only continuous estimates within color space are relevant to memory. As a result, the influence of color categories on working memory remains unknown. We propose a dual content model of color representation in which color matches to objects that are either present (perception) or absent (memory) integrate category representations along with estimates of specific values on a continuous scale ("particulars"). We develop and test the model through 4 experiments. In a first experiment pair, participants reproduce a color target, both with and without a delay, using a recently influential estimation paradigm. In a second experiment pair, we use standard methods in color perception to identify boundary and focal colors in the stimulus set. The main results are that responses drawn from working memory are significantly biased away from category boundaries and toward category centers. Importantly, the same pattern of results is present without a memory delay. The proposed dual content model parsimoniously explains these results, and it should replace prevailing single content models in studies of visual working memory. More broadly, the model and the results demonstrate how the main consequence of visual working memory maintenance is the amplification of category related biases and stimulus-specific variability that originate in perception. (c) 2015 APA, all rights reserved).

  14. Robert Hooke's model of memory.

    PubMed

    Hintzman, Douglas L

    2003-03-01

    In 1682 the scientist and inventor Robert Hooke read a lecture to the Royal Society of London, in which he described a mechanistic model of human memory. Yet few psychologists today seem to have heard of Hooke's memory model. The lecture addressed questions of encoding, memory capacity, repetition, retrieval, and forgetting--some of these in a surprisingly modern way. Hooke's model shares several characteristics with the theory of Richard Semon, which came more than 200 years later, but it is more complete. Among the model's interesting properties are that (1) it allows for attention and other top-down influences on encoding; (2) it uses resonance to implement parallel, cue-dependent retrieval; (3) it explains memory for recency; (4) it offers a single-system account of repetition priming; and (5) the power law of forgetting can be derived from the model's assumptions in a straightforward way.

  15. Ghosts in the machine: memory interference from the previous trial.

    PubMed

    Papadimitriou, Charalampos; Ferdoash, Afreen; Snyder, Lawrence H

    2015-01-15

    Previous memoranda can interfere with the memorization or storage of new information, a concept known as proactive interference. Studies of proactive interference typically use categorical memoranda and match-to-sample tasks with categorical measures such as the proportion of correct to incorrect responses. In this study we instead train five macaques in a spatial memory task with continuous memoranda and responses, allowing us to more finely probe working memory circuits. We first ask whether the memoranda from the previous trial result in proactive interference in an oculomotor delayed response task. We then characterize the spatial and temporal profile of this interference and ask whether this profile can be predicted by an attractor network model of working memory. We find that memory in the current trial shows a bias toward the location of the memorandum of the previous trial. The magnitude of this bias increases with the duration of the memory period within which it is measured. Our simulations using standard attractor network models of working memory show that these models easily replicate the spatial profile of the bias. However, unlike the behavioral findings, these attractor models show an increase in bias with the duration of the previous rather than the current memory period. To model a bias that increases with current trial duration we posit two separate memory stores, a rapidly decaying visual store that resists proactive interference effects and a sustained memory store that is susceptible to proactive interference. Copyright © 2015 the American Physiological Society.

  16. Ghosts in the machine: memory interference from the previous trial

    PubMed Central

    Ferdoash, Afreen; Snyder, Lawrence H.

    2014-01-01

    Previous memoranda can interfere with the memorization or storage of new information, a concept known as proactive interference. Studies of proactive interference typically use categorical memoranda and match-to-sample tasks with categorical measures such as the proportion of correct to incorrect responses. In this study we instead train five macaques in a spatial memory task with continuous memoranda and responses, allowing us to more finely probe working memory circuits. We first ask whether the memoranda from the previous trial result in proactive interference in an oculomotor delayed response task. We then characterize the spatial and temporal profile of this interference and ask whether this profile can be predicted by an attractor network model of working memory. We find that memory in the current trial shows a bias toward the location of the memorandum of the previous trial. The magnitude of this bias increases with the duration of the memory period within which it is measured. Our simulations using standard attractor network models of working memory show that these models easily replicate the spatial profile of the bias. However, unlike the behavioral findings, these attractor models show an increase in bias with the duration of the previous rather than the current memory period. To model a bias that increases with current trial duration we posit two separate memory stores, a rapidly decaying visual store that resists proactive interference effects and a sustained memory store that is susceptible to proactive interference. PMID:25376781

  17. BAIAP2 is related to emotional modulation of human memory strength.

    PubMed

    Luksys, Gediminas; Ackermann, Sandra; Coynel, David; Fastenrath, Matthias; Gschwind, Leo; Heck, Angela; Rasch, Bjoern; Spalek, Klara; Vogler, Christian; Papassotiropoulos, Andreas; de Quervain, Dominique

    2014-01-01

    Memory performance is the result of many distinct mental processes, such as memory encoding, forgetting, and modulation of memory strength by emotional arousal. These processes, which are subserved by partly distinct molecular profiles, are not always amenable to direct observation. Therefore, computational models can be used to make inferences about specific mental processes and to study their genetic underpinnings. Here we combined a computational model-based analysis of memory-related processes with high density genetic information derived from a genome-wide study in healthy young adults. After identifying the best-fitting model for a verbal memory task and estimating the best-fitting individual cognitive parameters, we found a common variant in the gene encoding the brain-specific angiogenesis inhibitor 1-associated protein 2 (BAIAP2) that was related to the model parameter reflecting modulation of verbal memory strength by negative valence. We also observed an association between the same genetic variant and a similar emotional modulation phenotype in a different population performing a picture memory task. Furthermore, using functional neuroimaging we found robust genotype-dependent differences in activity of the parahippocampal cortex that were specifically related to successful memory encoding of negative versus neutral information. Finally, we analyzed cortical gene expression data of 193 deceased subjects and detected significant BAIAP2 genotype-dependent differences in BAIAP2 mRNA levels. Our findings suggest that model-based dissociation of specific cognitive parameters can improve the understanding of genetic underpinnings of human learning and memory.

  18. Memory mechanisms supporting syntactic comprehension

    PubMed Central

    Waters, Gloria

    2013-01-01

    Efforts to characterize the memory system that supports sentence comprehension have historically drawn extensively on short-term memory as a source of mechanisms that might apply to sentences. The focus of these efforts has changed significantly in the past decade. As a result of changes in models of short-term working memory (ST-WM) and developments in models of sentence comprehension, the effort to relate entire components of an ST-WM system, such as those in the model developed by Baddeley (Nature Reviews Neuroscience 4: 829–839, 2003) to sentence comprehension has largely been replaced by an effort to relate more specific mechanisms found in modern models of ST-WM to memory processes that support one aspect of sentence comprehension—the assignment of syntactic structure (parsing) and its use in determining sentence meaning (interpretation) during sentence comprehension. In this article, we present the historical background to recent studies of the memory mechanisms that support parsing and interpretation and review recent research into this relation. We argue that the results of this research do not converge on a set of mechanisms derived from ST-WM that apply to parsing and interpretation. We argue that the memory mechanisms supporting parsing and interpretation have features that characterize another memory system that has been postulated to account for skilled performance—long-term working memory. We propose a model of the relation of different aspects of parsing and interpretation to ST-WM and long-term working memory. PMID:23319178

  19. Modeling of Sonos Memory Cell Erase Cycle

    NASA Technical Reports Server (NTRS)

    Phillips, Thomas A.; MacLeond, Todd C.; Ho, Fat D.

    2010-01-01

    Silicon-oxide-nitride-oxide-silicon (SONOS) nonvolatile semiconductor memories (NVSMS) have many advantages. These memories are electrically erasable programmable read-only memories (EEPROMs). They utilize low programming voltages, endure extended erase/write cycles, are inherently resistant to radiation, and are compatible with high-density scaled CMOS for low power, portable electronics. The SONOS memory cell erase cycle was investigated using a nonquasi-static (NQS) MOSFET model. The SONOS floating gate charge and voltage, tunneling current, threshold voltage, and drain current were characterized during an erase cycle. Comparisons were made between the model predictions and experimental device data.

  20. Changing Behavior by Memory Aids: A Social Psychological Model of Prospective Memory and Habit Development Tested with Dynamic Field Data

    ERIC Educational Resources Information Center

    Tobias, Robert

    2009-01-01

    This article presents a social psychological model of prospective memory and habit development. The model is based on relevant research literature, and its dynamics were investigated by computer simulations. Time-series data from a behavior-change campaign in Cuba were used for calibration and validation of the model. The model scored well in…

  1. Content-Addressable Memory Storage by Neural Networks: A General Model and Global Liapunov Method,

    DTIC Science & Technology

    1988-03-01

    point ex- ists. Liapunov functions were also described for Volterra -Lotka systems whose off-diagonal terms are relatively small (Kilmer, 1972...field, bidirectional associative memory, Volterra -Lotka, Gilpin-Ayala, and Eigen- Schuster models. The Cohen-Grossberg model thus defines a general...masking field, bidirectional associative memory. Volterra -Lotka, Gilpin-Ayala. and Eigen-Schuster models. The Cohen-Grossberg model thus defines a

  2. Spatial working memory capacity predicts bias in estimates of location.

    PubMed

    Crawford, L Elizabeth; Landy, David; Salthouse, Timothy A

    2016-09-01

    Spatial memory research has attributed systematic bias in location estimates to a combination of a noisy memory trace with a prior structure that people impose on the space. Little is known about intraindividual stability and interindividual variation in these patterns of bias. In the current work, we align recent empirical and theoretical work on working memory capacity limits and spatial memory bias to generate the prediction that those with lower working memory capacity will show greater bias in memory of the location of a single item. Reanalyzing data from a large study of cognitive aging, we find support for this prediction. Fitting separate models to individuals' data revealed a surprising variety of strategies. Some were consistent with Bayesian models of spatial category use, however roughly half of participants biased estimates outward in a way not predicted by current models and others seemed to combine these strategies. These analyses highlight the importance of studying individuals when developing general models of cognition. (PsycINFO Database Record (c) 2016 APA, all rights reserved).

  3. Spatial Working Memory Capacity Predicts Bias in Estimates of Location

    PubMed Central

    Crawford, L. Elizabeth; Landy, David H.; Salthouse, Timothy A.

    2016-01-01

    Spatial memory research has attributed systematic bias in location estimates to a combination of a noisy memory trace with a prior structure that people impose on the space. Little is known about intra-individual stability and inter-individual variation in these patterns of bias. In the current work we align recent empirical and theoretical work on working memory capacity limits and spatial memory bias to generate the prediction that those with lower working memory capacity will show greater bias in memory of the location of a single item. Reanalyzing data from a large study of cognitive aging, we find support for this prediction. Fitting separate models to individuals’ data revealed a surprising variety of strategies. Some were consistent with Bayesian models of spatial category use, however roughly half of participants biased estimates outward in a way not predicted by current models and others seemed to combine these strategies. These analyses highlight the importance of studying individuals when developing general models of cognition. PMID:26900708

  4. A Memory-Based Model of Hick's Law

    ERIC Educational Resources Information Center

    Schneider, Darryl W.; Anderson, John R.

    2011-01-01

    We propose and evaluate a memory-based model of Hick's law, the approximately linear increase in choice reaction time with the logarithm of set size (the number of stimulus-response alternatives). According to the model, Hick's law reflects a combination of associative interference during retrieval from declarative memory and occasional savings…

  5. A multiprocessor computer simulation model employing a feedback scheduler/allocator for memory space and bandwidth matching and TMR processing

    NASA Technical Reports Server (NTRS)

    Bradley, D. B.; Irwin, J. D.

    1974-01-01

    A computer simulation model for a multiprocessor computer is developed that is useful for studying the problem of matching multiprocessor's memory space, memory bandwidth and numbers and speeds of processors with aggregate job set characteristics. The model assumes an input work load of a set of recurrent jobs. The model includes a feedback scheduler/allocator which attempts to improve system performance through higher memory bandwidth utilization by matching individual job requirements for space and bandwidth with space availability and estimates of bandwidth availability at the times of memory allocation. The simulation model includes provisions for specifying precedence relations among the jobs in a job set, and provisions for specifying precedence execution of TMR (Triple Modular Redundant and SIMPLEX (non redundant) jobs.

  6. A Single-System Model Predicts Recognition Memory and Repetition Priming in Amnesia

    PubMed Central

    Kessels, Roy P.C.; Wester, Arie J.; Shanks, David R.

    2014-01-01

    We challenge the claim that there are distinct neural systems for explicit and implicit memory by demonstrating that a formal single-system model predicts the pattern of recognition memory (explicit) and repetition priming (implicit) in amnesia. In the current investigation, human participants with amnesia categorized pictures of objects at study and then, at test, identified fragmented versions of studied (old) and nonstudied (new) objects (providing a measure of priming), and made a recognition memory judgment (old vs new) for each object. Numerous results in the amnesic patients were predicted in advance by the single-system model, as follows: (1) deficits in recognition memory and priming were evident relative to a control group; (2) items judged as old were identified at greater levels of fragmentation than items judged new, regardless of whether the items were actually old or new; and (3) the magnitude of the priming effect (the identification advantage for old vs new items) overall was greater than that of items judged new. Model evidence measures also favored the single-system model over two formal multiple-systems models. The findings support the single-system model, which explains the pattern of recognition and priming in amnesia primarily as a reduction in the strength of a single dimension of memory strength, rather than a selective explicit memory system deficit. PMID:25122896

  7. OpenSWPC: an open-source integrated parallel simulation code for modeling seismic wave propagation in 3D heterogeneous viscoelastic media

    NASA Astrophysics Data System (ADS)

    Maeda, Takuto; Takemura, Shunsuke; Furumura, Takashi

    2017-07-01

    We have developed an open-source software package, Open-source Seismic Wave Propagation Code (OpenSWPC), for parallel numerical simulations of seismic wave propagation in 3D and 2D (P-SV and SH) viscoelastic media based on the finite difference method in local-to-regional scales. This code is equipped with a frequency-independent attenuation model based on the generalized Zener body and an efficient perfectly matched layer for absorbing boundary condition. A hybrid-style programming using OpenMP and the Message Passing Interface (MPI) is adopted for efficient parallel computation. OpenSWPC has wide applicability for seismological studies and great portability to allowing excellent performance from PC clusters to supercomputers. Without modifying the code, users can conduct seismic wave propagation simulations using their own velocity structure models and the necessary source representations by specifying them in an input parameter file. The code has various modes for different types of velocity structure model input and different source representations such as single force, moment tensor and plane-wave incidence, which can easily be selected via the input parameters. Widely used binary data formats, the Network Common Data Form (NetCDF) and the Seismic Analysis Code (SAC) are adopted for the input of the heterogeneous structure model and the outputs of the simulation results, so users can easily handle the input/output datasets. All codes are written in Fortran 2003 and are available with detailed documents in a public repository.[Figure not available: see fulltext.

  8. Reversible Shape Memory Polymers and Composites: Synthesis, Modeling and Design

    DTIC Science & Technology

    2013-03-01

    Polymer; and (iii) Development of a Shape Memory Assisted Self - Healing Polymer. Page 3 of 19 Mather/FA9550-09-1-0195 IV(i) Modeling and Model...0195 IV(iii) Development of a Shape Memory Assisted Self - Healing Polymer Erika D. Rodriguez, X. Luo, and P.T. Mather, “Linear and Crosslinked...Poly (ε- Caprolactone) Polymers for Shape Memory Assisted Self - Healing (SMASH),” ACS Applied Materials and Interfaces 3 152-161 (2011). Self

  9. A Probabilistic Model of Visual Working Memory: Incorporating Higher Order Regularities into Working Memory Capacity Estimates

    ERIC Educational Resources Information Center

    Brady, Timothy F.; Tenenbaum, Joshua B.

    2013-01-01

    When remembering a real-world scene, people encode both detailed information about specific objects and higher order information like the overall gist of the scene. However, formal models of change detection, like those used to estimate visual working memory capacity, assume observers encode only a simple memory representation that includes no…

  10. Investigating the Contribution of Procedural and Declarative Memory to the Acquisition of Past Tense Morphology: Evidence from Finnish

    ERIC Educational Resources Information Center

    Kidd, Evan; Kirjavainen, Minna

    2011-01-01

    The present paper reports on a study that investigated the role of procedural and declarative memory in the acquisition of Finnish past tense morphology. Two competing models were tested. Ullman's (2004) declarative/procedural model predicts that procedural memory supports the acquisition of regular morphology, whereas declarative memory supports…

  11. Working Memory and Strategy Use Contribute to Gender Differences in Spatial Ability

    ERIC Educational Resources Information Center

    Wang, Lu; Carr, Martha

    2014-01-01

    In this review, a new model that is grounded in information-processing theory is proposed to account for gender differences in spatial ability. The proposed model assumes that the relative strength of working memory, as expressed by the ratio of visuospatial working memory to verbal working memory, influences the type of strategies used on spatial…

  12. The Source of Adult Age Differences in Event-Based Prospective Memory: A Multinomial Modeling Approach

    ERIC Educational Resources Information Center

    Smith, Rebekah E.; Bayen, Ute J.

    2006-01-01

    Event-based prospective memory involves remembering to perform an action in response to a particular future event. Normal younger and older adults performed event-based prospective memory tasks in 2 experiments. The authors applied a formal multinomial processing tree model of prospective memory (Smith & Bayen, 2004) to disentangle age differences…

  13. Dynamic intersectoral models with power-law memory

    NASA Astrophysics Data System (ADS)

    Tarasova, Valentina V.; Tarasov, Vasily E.

    2018-01-01

    Intersectoral dynamic models with power-law memory are proposed. The equations of open and closed intersectoral models, in which the memory effects are described by the Caputo derivatives of non-integer orders, are derived. We suggest solutions of these equations, which have the form of linear combinations of the Mittag-Leffler functions and which are characterized by different effective growth rates. Examples of intersectoral dynamics with power-law memory are suggested for two sectoral cases. We formulate two principles of intersectoral dynamics with memory: the principle of changing of technological growth rates and the principle of domination change. It has been shown that in the input-output economic dynamics the effects of fading memory can change the economic growth rate and dominant behavior of economic sectors.

  14. Autobiographical memory development from an attachment perspective: the special role of negative events.

    PubMed

    Chae, Yoojin; Goodman, Gail S; Edelstein, Robin S

    2011-01-01

    The authors propose a novel model of autobiographical memory development that features the fundamental role of attachment orientations and negative life events. In the model, it is proposed that early autobiographical memory derives in part from the need to express and remember negative experiences, a need that has adaptive value, and that attachment orientations create individual differences in children's recollections of negative experiences. Specifically, the role of attachment in the processing of negative information is discussed in regard to the mnemonic stages of encoding, storage, and retrieval. This model sheds light on several areas of contradictory data in the memory development literature, such as concerning earliest memories and children's and adults' memory/suggestibility for stressful events.

  15. Can the hippocampus tell time? The temporo-septal engram shift model.

    PubMed

    Lytton, W W; Lipton, P

    1999-08-02

    An essential feature of episodic memory, the type of memory dependent on hippocampus, is that individual memories belong to particular moments in time. Recent PET studies suggest that memory encoding and recall occur at different locations in human hippocampus. Coupled with other attributes of hippocampus, this suggested to us that the septo-temporal hippocampal axis may play an important role in time perception. We propose a temporo-septal engram shift model of hippocampal memory. The model posits that memories gradually move along the hippocampus from a temporal encoding site to ever more septal sites from which they are recalled. We propose that the sense of time is encoded by the location of the engram along the temporo-septal axis.

  16. Modeling age differences in effects of pair repetition and proactive interference using a single parameter.

    PubMed

    Stephens, Joseph D W; Overman, Amy A

    2018-02-01

    In this article, we apply the REM model (Shiffrin & Steyvers, 1997) to age differences in associative memory. Using Criss and Shiffrin's (2005) associative version of REM, we show that in a task with pairs repeated across 2 study lists, older adults' reduced benefit of pair repetition can be produced by a general reduction in the diagnosticity of information stored in memory. This reduction can be modeled similarly well by reducing the overall distinctiveness of memory features, or by reducing the accuracy of memory encoding. We report a new experiment in which pairs are repeated across 3 study lists and extend the model accordingly. Finally, we extend the model to previously reported data using the same task paradigm, in which the use of a high-association strategy introduced proactive interference effects in young adults but not older adults. Reducing the diagnosticity of information in memory also reduces the proactive interference effect. Taken together, the modeling and empirical results reported here are consistent with the claim that some age differences that appear to be specific to associative information can be produced via general degradation of information stored in memory. The REM model provides a useful framework for examining age differences in memory as well as harmonizing seemingly conflicting prior modeling approaches for the associative deficit. (PsycINFO Database Record (c) 2018 APA, all rights reserved).

  17. An improved car-following model considering headway changes with memory

    NASA Astrophysics Data System (ADS)

    Yu, Shaowei; Shi, Zhongke

    2015-03-01

    To describe car-following behaviors in complex situations better, increase roadway traffic mobility and minimize cars' fuel consumptions, the linkage between headway changes with memory and car-following behaviors was explored with the field car-following data by using the gray correlation analysis method, and then an improved car-following model considering headway changes with memory on a single lane was proposed based on the full velocity difference model. Some numerical simulations were carried out by employing the improved car-following model to explore how headway changes with memory affected each car's velocity, acceleration, headway and fuel consumptions. The research results show that headway changes with memory have significant effects on car-following behaviors and fuel consumptions and that considering headway changes with memory in designing the adaptive cruise control strategy can improve the traffic flow stability and minimize cars' fuel consumptions.

  18. Working memory and intraindividual variability as neurocognitive indicators in ADHD: examining competing model predictions.

    PubMed

    Kofler, Michael J; Alderson, R Matt; Raiker, Joseph S; Bolden, Jennifer; Sarver, Dustin E; Rapport, Mark D

    2014-05-01

    The current study examined competing predictions of the default mode, cognitive neuroenergetic, and functional working memory models of attention-deficit/hyperactivity disorder (ADHD) regarding the relation between neurocognitive impairments in working memory and intraindividual variability. Twenty-two children with ADHD and 15 typically developing children were assessed on multiple tasks measuring intraindividual reaction time (RT) variability (ex-Gaussian: tau, sigma) and central executive (CE) working memory. Latent factor scores based on multiple, counterbalanced tasks were created for each construct of interest (CE, tau, sigma) to reflect reliable variance associated with each construct and remove task-specific, test-retest, and random error. Bias-corrected, bootstrapped mediation analyses revealed that CE working memory accounted for 88% to 100% of ADHD-related RT variability across models, and between-group differences in RT variability were no longer detectable after accounting for the mediating role of CE working memory. In contrast, RT variability accounted for 10% to 29% of between-group differences in CE working memory, and large magnitude CE working memory deficits remained after accounting for this partial mediation. Statistical comparison of effect size estimates across models suggests directionality of effects, such that the mediation effects of CE working memory on RT variability were significantly greater than the mediation effects of RT variability on CE working memory. The current findings question the role of RT variability as a primary neurocognitive indicator in ADHD and suggest that ADHD-related RT variability may be secondary to underlying deficits in CE working memory.

  19. Why are you telling me that? A conceptual model of the social function of autobiographical memory.

    PubMed

    Alea, Nicole; Bluck, Susan

    2003-03-01

    In an effort to stimulate and guide empirical work within a functional framework, this paper provides a conceptual model of the social functions of autobiographical memory (AM) across the lifespan. The model delineates the processes and variables involved when AMs are shared to serve social functions. Components of the model include: lifespan contextual influences, the qualitative characteristics of memory (emotionality and level of detail recalled), the speaker's characteristics (age, gender, and personality), the familiarity and similarity of the listener to the speaker, the level of responsiveness during the memory-sharing process, and the nature of the social relationship in which the memory sharing occurs (valence and length of the relationship). These components are shown to influence the type of social function served and/or, the extent to which social functions are served. Directions for future empirical work to substantiate the model and hypotheses derived from the model are provided.

  20. Measuring memory with the order of fractional derivative

    NASA Astrophysics Data System (ADS)

    Du, Maolin; Wang, Zaihua; Hu, Haiyan

    2013-12-01

    Fractional derivative has a history as long as that of classical calculus, but it is much less popular than it should be. What is the physical meaning of fractional derivative? This is still an open problem. In modeling various memory phenomena, we observe that a memory process usually consists of two stages. One is short with permanent retention, and the other is governed by a simple model of fractional derivative. With the numerical least square method, we show that the fractional model perfectly fits the test data of memory phenomena in different disciplines, not only in mechanics, but also in biology and psychology. Based on this model, we find that a physical meaning of the fractional order is an index of memory.

  1. Analysis of Memory Formation during General Anesthesia (Propofol/Remifentanil) for Elective Surgery Using the Process-dissociation Procedure.

    PubMed

    Hadzidiakos, Daniel; Horn, Nadja; Degener, Roland; Buchner, Axel; Rehberg, Benno

    2009-08-01

    There have been reports of memory formation during general anesthesia. The process-dissociation procedure has been used to determine if these are controlled (explicit/conscious) or automatic (implicit/unconscious) memories. This study used the process-dissociation procedure with the original measurement model and one which corrected for guessing to determine if more accurate results were obtained in this setting. A total of 160 patients scheduled for elective surgery were enrolled. Memory for words presented during propofol and remifentanil general anesthesia was tested postoperatively by using a word-stem completion task in a process-dissociation procedure. To assign possible memory effects to different levels of anesthetic depth, the authors measured depth of anesthesia using the BIS XP monitor (Aspect Medical Systems, Norwood, MA). Word-stem completion performance showed no evidence of memory for intraoperatively presented words. Nevertheless, an evaluation of these data using the original measurement model for process-dissociation data suggested an evidence of controlled (C = 0.05; 95% confidence interval [CI] 0.02-0.08) and automatic (A = 0.11; 95% CI 0.09-0.12) memory processes (P < 0.01). However, when the data were evaluated with an extended measurement model taking base rates into account adequately, no evidence for controlled (C = 0.00; 95% CI -0.04 to 0.04) or automatic (A = 0.00; 95% CI -0.02 to 0.02) memory processes was obtained. The authors report and discuss parallel findings for published data sets that were generated by using the process-dissociation procedure. Patients had no memories for auditory information presented during propofol/remifentanil anesthesia after midazolam premedication. The use of the process-dissociation procedure with the original measurement model erroneously detected memories, whereas the extended model, corrected for guessing, correctly revealed no memory.

  2. The New ISD: Applying Cognitive Strategies to Instructional Design.

    ERIC Educational Resources Information Center

    Clark, Ruth Colvin

    2002-01-01

    Discusses cognitive models of instruction that can help develop new models of Instructional Systems Design (ISD) that include cognitive task analysis to identify mental models; constructive assumptions of learning; working memory and long-term memory; retrieval of new knowledge and skills from long-term memory; and support of metacognitive skills.…

  3. Modeling Confidence and Response Time in Recognition Memory

    ERIC Educational Resources Information Center

    Ratcliff, Roger; Starns, Jeffrey J.

    2009-01-01

    A new model for confidence judgments in recognition memory is presented. In the model, the match between a single test item and memory produces a distribution of evidence, with better matches corresponding to distributions with higher means. On this match dimension, confidence criteria are placed, and the areas between the criteria under the…

  4. Neural circuit basis of visuo-spatial working memory precision: a computational and behavioral study.

    PubMed

    Almeida, Rita; Barbosa, João; Compte, Albert

    2015-09-01

    The amount of information that can be retained in working memory (WM) is limited. Limitations of WM capacity have been the subject of intense research, especially in trying to specify algorithmic models for WM. Comparatively, neural circuit perspectives have barely been used to test WM limitations in behavioral experiments. Here we used a neuronal microcircuit model for visuo-spatial WM (vsWM) to investigate memory of several items. The model assumes that there is a topographic organization of the circuit responsible for spatial memory retention. This assumption leads to specific predictions, which we tested in behavioral experiments. According to the model, nearby locations should be recalled with a bias, as if the two memory traces showed attraction or repulsion during the delay period depending on distance. Another prediction is that the previously reported loss of memory precision for an increasing number of memory items (memory load) should vanish when the distances between items are controlled for. Both predictions were confirmed experimentally. Taken together, our findings provide support for a topographic neural circuit organization of vsWM, they suggest that interference between similar memories underlies some WM limitations, and they put forward a circuit-based explanation that reconciles previous conflicting results on the dependence of WM precision with load. Copyright © 2015 the American Physiological Society.

  5. Age effects on explicit and implicit memory

    PubMed Central

    Ward, Emma V.; Berry, Christopher J.; Shanks, David R.

    2013-01-01

    It is well-documented that explicit memory (e.g., recognition) declines with age. In contrast, many argue that implicit memory (e.g., priming) is preserved in healthy aging. For example, priming on tasks such as perceptual identification is often not statistically different in groups of young and older adults. Such observations are commonly taken as evidence for distinct explicit and implicit learning/memory systems. In this article we discuss several lines of evidence that challenge this view. We describe how patterns of differential age-related decline may arise from differences in the ways in which the two forms of memory are commonly measured, and review recent research suggesting that under improved measurement methods, implicit memory is not age-invariant. Formal computational models are of considerable utility in revealing the nature of underlying systems. We report the results of applying single and multiple-systems models to data on age effects in implicit and explicit memory. Model comparison clearly favors the single-system view. Implications for the memory systems debate are discussed. PMID:24065942

  6. Identification of Nascent Memory CD8 T Cells and Modeling of Their Ontogeny.

    PubMed

    Crauste, Fabien; Mafille, Julien; Boucinha, Lilia; Djebali, Sophia; Gandrillon, Olivier; Marvel, Jacqueline; Arpin, Christophe

    2017-03-22

    Primary immune responses generate short-term effectors and long-term protective memory cells. The delineation of the genealogy linking naive, effector, and memory cells has been complicated by the lack of phenotypes discriminating effector from memory differentiation stages. Using transcriptomics and phenotypic analyses, we identify Bcl2 and Mki67 as a marker combination that enables the tracking of nascent memory cells within the effector phase. We then use a formal approach based on mathematical models describing the dynamics of population size evolution to test potential progeny links and demonstrate that most cells follow a linear naive→early effector→late effector→memory pathway. Moreover, our mathematical model allows long-term prediction of memory cell numbers from a few early experimental measurements. Our work thus provides a phenotypic means to identify effector and memory cells, as well as a mathematical framework to investigate their genealogy and to predict the outcome of immunization regimens in terms of memory cell numbers generated. Copyright © 2017 The Authors. Published by Elsevier Inc. All rights reserved.

  7. Multiple Memory Systems Are Unnecessary to Account for Infant Memory Development: An Ecological Model

    PubMed Central

    Rovee-Collier, Carolyn; Cuevas, Kimberly

    2009-01-01

    How the memory of adults evolves from the memory abilities of infants is a central problem in cognitive development. The popular solution holds that the multiple memory systems of adults mature at different rates during infancy. The early-maturing system (implicit or nondeclarative memory) functions automatically from birth, whereas the late-maturing system (explicit or declarative memory) functions intentionally, with awareness, from late in the first year. Data are presented from research on deferred imitation, sensory preconditioning, potentiation, and context for which this solution cannot account and present an alternative model that eschews the need for multiple memory systems. The ecological model of infant memory development (N. E. Spear, 1984) holds that members of all species are perfectly adapted to their niche at each point in ontogeny and exhibit effective, evolutionarily selected solutions to whatever challenges each new niche poses. Because adults and infants occupy different niches, what they perceive, learn, and remember about the same event differs, but their raw capacity to learn and remember does not. PMID:19209999

  8. A critical evaluation of monkey models of amnesia and dementia.

    PubMed

    Ridley, R M; Baker, H F

    1991-01-01

    In this review we consider various models of amnesia and dementia in monkeys and examine the validity of such models. In Section 2 we describe the various types of memory tests (tasks) available for use with monkeys and discuss the extent to which these tasks assess different facets of memory according to present theories of human memory. We argue that the rules which govern correct task performance are best regarded as a form of semantic rather than procedural memory, and that when information about stimulus attributes or reward associations is stored long-term then that knowledge is semantic. The demonstration of episodic memory in monkeys is problematic and the term recognition memory has been used too loosely. In particular, it is difficult to dissociate episodic memory for stimulus events from the use of semantic memory for the rule of the task, since dysfunction of either can produce impairment on performance of the same task. Tasks can also be divided into those which assess memory for stimulus-reward associations (evaluative memory) and those which tax stimulus-response associations including spatial and conditional responding (non-evaluative memory). This dissociation cuts across the distinction between semantic and episodic memory. In Section 3 we examine the usefulness of the classification of tasks described in Section 2 in clarifying our understanding of the contribution of the temporal lobes and the cholinergic system to memory. We conclude that evaluative and non-evaluative memory are mediated by separate parallel systems involving the amygdala and hippocampus, respectively.

  9. Dynamic interactions between visual working memory and saccade target selection

    PubMed Central

    Schneegans, Sebastian; Spencer, John P.; Schöner, Gregor; Hwang, Seongmin; Hollingworth, Andrew

    2014-01-01

    Recent psychophysical experiments have shown that working memory for visual surface features interacts with saccadic motor planning, even in tasks where the saccade target is unambiguously specified by spatial cues. Specifically, a match between a memorized color and the color of either the designated target or a distractor stimulus influences saccade target selection, saccade amplitudes, and latencies in a systematic fashion. To elucidate these effects, we present a dynamic neural field model in combination with new experimental data. The model captures the neural processes underlying visual perception, working memory, and saccade planning relevant to the psychophysical experiment. It consists of a low-level visual sensory representation that interacts with two separate pathways: a spatial pathway implementing spatial attention and saccade generation, and a surface feature pathway implementing color working memory and feature attention. Due to bidirectional coupling between visual working memory and feature attention in the model, the working memory content can indirectly exert an effect on perceptual processing in the low-level sensory representation. This in turn biases saccadic movement planning in the spatial pathway, allowing the model to quantitatively reproduce the observed interaction effects. The continuous coupling between representations in the model also implies that modulation should be bidirectional, and model simulations provide specific predictions for complementary effects of saccade target selection on visual working memory. These predictions were empirically confirmed in a new experiment: Memory for a sample color was biased toward the color of a task-irrelevant saccade target object, demonstrating the bidirectional coupling between visual working memory and perceptual processing. PMID:25228628

  10. Properties of a memory network in psychology

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Wedemann, Roseli S.; Donangelo, Raul; Carvalho, Luis A. V. de

    We have previously described neurotic psychopathology and psychoanalytic working-through by an associative memory mechanism, based on a neural network model, where memory was modelled by a Boltzmann machine (BM). Since brain neural topology is selectively structured, we simulated known microscopic mechanisms that control synaptic properties, showing that the network self-organizes to a hierarchical, clustered structure. Here, we show some statistical mechanical properties of the complex networks which result from this self-organization. They indicate that a generalization of the BM may be necessary to model memory.

  11. Properties of a memory network in psychology

    NASA Astrophysics Data System (ADS)

    Wedemann, Roseli S.; Donangelo, Raul; de Carvalho, Luís A. V.

    2007-12-01

    We have previously described neurotic psychopathology and psychoanalytic working-through by an associative memory mechanism, based on a neural network model, where memory was modelled by a Boltzmann machine (BM). Since brain neural topology is selectively structured, we simulated known microscopic mechanisms that control synaptic properties, showing that the network self-organizes to a hierarchical, clustered structure. Here, we show some statistical mechanical properties of the complex networks which result from this self-organization. They indicate that a generalization of the BM may be necessary to model memory.

  12. A direct comparison of popular models of normal memory loss and Alzheimer's disease in samples of African Americans, Mexican Americans, and refugees and immigrants from the former Soviet Union.

    PubMed

    Schrauf, Robert W; Iris, Madelyn

    2011-04-01

    To understand how people differentiate normal memory loss from Alzheimer's disease (AD) by investigating cultural models of these conditions. Ethnographic interviews followed by a survey. Cultural consensus analysis was used to test for the presence of group models, derive the "culturally correct" set of beliefs, and compare models of normal memory loss and AD. Chicago, Illinois. One hundred eight individuals from local neighborhoods: African Americans, Mexican Americans, and refugees and immigrants from the former Soviet Union. Participants responded to yes-or-no questions about the nature and causes of normal memory loss and AD and provided information on ethnicity, age, sex, acculturation, and experience with AD. Groups held a common model of AD as a brain-based disease reflecting irreversible cognitive decline. Higher levels of acculturation predicted greater knowledge of AD. Russian speakers favored biological over psychological models of the disease. Groups also held a common model of normal memory loss, including the important belief that "normal" forgetting involves eventual recall of the forgotten material. Popular models of memory loss and AD confirm that patients and clinicians are speaking the same "language" in their discussions of memory loss and AD. Nevertheless, the presence of coherent models of memory loss and AD, and the unequal distribution of that knowledge across groups, suggests that clinicians should include wider circles of patients' families and friends in their consultations. These results frame knowledge as distributed across social groups rather than simply the possession of individual minds. © 2011, Copyright the Authors. Journal compilation © 2011, The American Geriatrics Society.

  13. Spatial memory tasks in rodents: what do they model?

    PubMed

    Morellini, Fabio

    2013-10-01

    The analysis of spatial learning and memory in rodents is commonly used to investigate the mechanisms underlying certain forms of human cognition and to model their dysfunction in neuropsychiatric and neurodegenerative diseases. Proper interpretation of rodent behavior in terms of spatial memory and as a model of human cognitive functions is only possible if various navigation strategies and factors controlling the performance of the animal in a spatial task are taken into consideration. The aim of this review is to describe the experimental approaches that are being used for the study of spatial memory in rats and mice and the way that they can be interpreted in terms of general memory functions. After an introduction to the classification of memory into various categories and respective underlying neuroanatomical substrates, I explain the concept of spatial memory and its measurement in rats and mice by analysis of their navigation strategies. Subsequently, I describe the most common paradigms for spatial memory assessment with specific focus on methodological issues relevant for the correct interpretation of the results in terms of cognitive function. Finally, I present recent advances in the use of spatial memory tasks to investigate episodic-like memory in mice.

  14. Some Surprising Findings on the Involvement of the Parietal Lobe in Human Memory

    PubMed Central

    Olson, Ingrid R.; Berryhill, Marian

    2009-01-01

    The posterior parietal lobe is known to play some role in a far-flung list of mental processes: linking vision to action (saccadic eye movements, reaching, grasping), attending to visual space, numerical calculation, and mental rotation. Here we review findings from humans and monkeys that illuminate an untraditional function of this region: memory. Our review draws on neuroimaging findings that have repeatedly identified parietal lobe activations associated with short-term or working memory and episodic memory. We also discuss recent neuropsychological findings showing that individuals with parietal lobe damage exhibit both working memory and long-term memory deficits. These deficits are not ubiquitous; they are only evident under certain retrieval demands. Our review elaborates on these findings and evaluates various theories about the mechanistic role of the posterior parietal lobe in memory. The available data point towards the conclusion that the posterior parietal lobe plays an important role in memory retrieval irrespective of elapsed time. The two models that are best supported by existing data are the Attention to Memory Model and the Subjective Memory Model. We conclude by formalizing several open questions that are intended to encourage future research. PMID:18848635

  15. Dissociation of item and source memory in rhesus monkeys.

    PubMed

    Basile, Benjamin M; Hampton, Robert R

    2017-09-01

    Source memory, or memory for the context in which a memory was formed, is a defining characteristic of human episodic memory and source memory errors are a debilitating symptom of memory dysfunction. Evidence for source memory in nonhuman primates is sparse despite considerable evidence for other types of sophisticated memory and the practical need for good models of episodic memory in nonhuman primates. A previous study showed that rhesus monkeys confused the identity of a monkey they saw with a monkey they heard, but only after an extended memory delay. This suggests that they initially remembered the source - visual or auditory - of the information but forgot the source as time passed. Here, we present a monkey model of source memory that is based on this previous study. In each trial, monkeys studied two images, one that they simply viewed and touched and the other that they classified as a bird, fish, flower, or person. In a subsequent memory test, they were required to select the image from one source but avoid the other. With training, monkeys learned to suppress responding to images from the to-be-avoided source. After longer memory intervals, monkeys continued to show reliable item memory, discriminating studied images from distractors, but made many source memory errors. Monkeys discriminated source based on study method, not study order, providing preliminary evidence that our manipulation of retention interval caused errors due to source forgetting instead of source confusion. Finally, some monkeys learned to select remembered images from either source on cue, showing that they did indeed remember both items and both sources. This paradigm potentially provides a new model to study a critical aspect of episodic memory in nonhuman primates. Copyright © 2017 Elsevier B.V. All rights reserved.

  16. Resonator memories and optical novelty filters

    NASA Astrophysics Data System (ADS)

    Anderson, Dana Z.; Erle, Marie C.

    Optical resonators having holographic elements are potential candidates for storing information that can be accessed through content addressable or associative recall. Closely related to the resonator memory is the optical novelty filter, which can detect the differences between a test object and a set of reference objects. We discuss implementations of these devices using continuous optical media such as photorefractive materials. The discussion is framed in the context of neural network models. There are both formal and qualitative similarities between the resonator memory and optical novelty filter and network models. Mode competition arises in the theory of the resonator memory, much as it does in some network models. We show that the role of the phenomena of "daydreaming" in the real-time programmable optical resonator is very much akin to the role of "unlearning" in neural network memories. The theory of programming the real-time memory for a single mode is given in detail. This leads to a discussion of the optical novelty filter. Experimental results for the resonator memory, the real-time programmable memory, and the optical tracking novelty filter are reviewed. We also point to several issues that need to be addressed in order to implement more formal models of neural networks.

  17. Resonator Memories And Optical Novelty Filters

    NASA Astrophysics Data System (ADS)

    Anderson, Dana Z.; Erie, Marie C.

    1987-05-01

    Optical resonators having holographic elements are potential candidates for storing information that can be accessed through content-addressable or associative recall. Closely related to the resonator memory is the optical novelty filter, which can detect the differences between a test object and a set of reference objects. We discuss implementations of these devices using continuous optical media such as photorefractive ma-terials. The discussion is framed in the context of neural network models. There are both formal and qualitative similarities between the resonator memory and optical novelty filter and network models. Mode competition arises in the theory of the resonator memory, much as it does in some network models. We show that the role of the phenomena of "daydream-ing" in the real-time programmable optical resonator is very much akin to the role of "unlearning" in neural network memories. The theory of programming the real-time memory for a single mode is given in detail. This leads to a discussion of the optical novelty filter. Experimental results for the resonator memory, the real-time programmable memory, and the optical tracking novelty filter are reviewed. We also point to several issues that need to be addressed in order to implement more formal models of neural networks.

  18. Constitutive modeling of glassy shape memory polymers

    NASA Astrophysics Data System (ADS)

    Khanolkar, Mahesh

    The aim of this research is to develop constitutive models for non-linear materials. Here, issues related for developing constitutive model for glassy shape memory polymers are addressed in detail. Shape memory polymers are novel material that can be easily formed into complex shapes, retaining memory of their original shape even after undergoing large deformations. The temporary shape is stable and return to the original shape is triggered by a suitable mechanism such heating the polymer above a transition temperature. Glassy shape memory polymers are called glassy because the temporary shape is fixed by the formation of a glassy solid, while return to the original shape is due to the melting of this glassy phase. The constitutive model has been developed to capture the thermo-mechanical behavior of glassy shape memory polymers using elements of nonlinear mechanics and polymer physics. The key feature of this framework is that a body can exist stress free in numerous natural configurations, the underlying natural configuration of the body changing during the process, with the response of the body being elastic from these evolving natural configurations. The aim of this research is to formulate a constitutive model for glassy shape memory polymers (GSMP) which takes in to account the fact that the stress-strain response depends on thermal expansion of polymers. The model developed is for the original amorphous phase, the temporary glassy phase and transition between these phases. The glass transition process has been modeled using a framework that was developed recently for studying crystallization in polymers and is based on the theory of multiple natural configurations. Using the same frame work, the melting of the glassy phase to capture the return of the polymer to its original shape is also modeled. The effect of nanoreinforcement on the response of shape memory polymers (GSMP) is studied and a model is developed. In addition to modeling and solving boundary value problems for GSMP's, problems of importance for CSMP, specifically a shape memory cycle (Torsion of a Cylinder) is solved using the developed crystallizable shape memory polymer model. To solve complex boundary value problems in realistic geometries a user material subroutine (UMAT) for GSMP model has been developed for use in conjunction with the commercial finite element software ABAQUS. The accuracy of the UMAT has been verified by testing it against problems for which the results are known.

  19. Multiple memory systems as substrates for multiple decision systems

    PubMed Central

    Doll, Bradley B.; Shohamy, Daphna; Daw, Nathaniel D.

    2014-01-01

    It has recently become widely appreciated that value-based decision making is supported by multiple computational strategies. In particular, animal and human behavior in learning tasks appears to include habitual responses described by prominent model-free reinforcement learning (RL) theories, but also more deliberative or goal-directed actions that can be characterized by a different class of theories, model-based RL. The latter theories evaluate actions by using a representation of the contingencies of the task (as with a learned map of a spatial maze), called an “internal model.” Given the evidence of behavioral and neural dissociations between these approaches, they are often characterized as dissociable learning systems, though they likely interact and share common mechanisms. In many respects, this division parallels a longstanding dissociation in cognitive neuroscience between multiple memory systems, describing, at the broadest level, separate systems for declarative and procedural learning. Procedural learning has notable parallels with model-free RL: both involve learning of habits and both are known to depend on parts of the striatum. Declarative memory, by contrast, supports memory for single events or episodes and depends on the hippocampus. The hippocampus is thought to support declarative memory by encoding temporal and spatial relations among stimuli and thus is often referred to as a relational memory system. Such relational encoding is likely to play an important role in learning an internal model, the representation that is central to model-based RL. Thus, insofar as the memory systems represent more general-purpose cognitive mechanisms that might subserve performance on many sorts of tasks including decision making, these parallels raise the question whether the multiple decision systems are served by multiple memory systems, such that one dissociation is grounded in the other. Here we investigated the relationship between model-based RL and relational memory by comparing individual differences across behavioral tasks designed to measure either capacity. Human subjects performed two tasks, a learning and generalization task (acquired equivalence) which involves relational encoding and depends on the hippocampus; and a sequential RL task that could be solved by either a model-based or model-free strategy. We assessed the correlation between subjects’ use of flexible, relational memory, as measured by generalization in the acquired equivalence task, and their differential reliance on either RL strategy in the decision task. We observed a significant positive relationship between generalization and model-based, but not model-free, choice strategies. These results are consistent with the hypothesis that model-based RL, like acquired equivalence, relies on a more general-purpose relational memory system. PMID:24846190

  20. The Role of Short-Term Memory in Operator Workload

    DTIC Science & Technology

    1988-04-01

    Craik , F.I. and Lockhart , R.S., 1972, Levels of processing : A framework for memory research...examples of postcategorical selction models. Precategorical Selection Models Sperling’s Model. Unlike Craik and Lockhart’s levels -of- processing model...subdivided to contain a primary memory unit and a mechanism for the direction of conscious attention. Levels -of- Processing . Craik and Lockhart’s levels

  1. Fast associative memory + slow neural circuitry = the computational model of the brain.

    NASA Astrophysics Data System (ADS)

    Berkovich, Simon; Berkovich, Efraim; Lapir, Gennady

    1997-08-01

    We propose a computational model of the brain based on a fast associative memory and relatively slow neural processors. In this model, processing time is expensive but memory access is not, and therefore most algorithmic tasks would be accomplished by using large look-up tables as opposed to calculating. The essential feature of an associative memory in this context (characteristic for a holographic type memory) is that it works without an explicit mechanism for resolution of multiple responses. As a result, the slow neuronal processing elements, overwhelmed by the flow of information, operate as a set of templates for ranking of the retrieved information. This structure addresses the primary controversy in the brain architecture: distributed organization of memory vs. localization of processing centers. This computational model offers an intriguing explanation of many of the paradoxical features in the brain architecture, such as integration of sensors (through DMA mechanism), subliminal perception, universality of software, interrupts, fault-tolerance, certain bizarre possibilities for rapid arithmetics etc. In conventional computer science the presented type of a computational model did not attract attention as it goes against the technological grain by using a working memory faster than processing elements.

  2. Comparing single- and dual-process models of memory development.

    PubMed

    Hayes, Brett K; Dunn, John C; Joubert, Amy; Taylor, Robert

    2017-11-01

    This experiment examined single-process and dual-process accounts of the development of visual recognition memory. The participants, 6-7-year-olds, 9-10-year-olds and adults, were presented with a list of pictures which they encoded under shallow or deep conditions. They then made recognition and confidence judgments about a list containing old and new items. We replicated the main trends reported by Ghetti and Angelini () in that recognition hit rates increased from 6 to 9 years of age, with larger age changes following deep than shallow encoding. Formal versions of the dual-process high threshold signal detection model and several single-process models (equal variance signal detection, unequal variance signal detection, mixture signal detection) were fit to the developmental data. The unequal variance and mixture signal detection models gave a better account of the data than either of the other models. A state-trace analysis found evidence for only one underlying memory process across the age range tested. These results suggest that single-process memory models based on memory strength are a viable alternative to dual-process models for explaining memory development. © 2016 John Wiley & Sons Ltd.

  3. Two Unipolar Terminal-Attractor-Based Associative Memories

    NASA Technical Reports Server (NTRS)

    Liu, Hua-Kuang; Wu, Chwan-Hwa

    1995-01-01

    Two unipolar mathematical models of electronic neural network functioning as terminal-attractor-based associative memory (TABAM) developed. Models comprise sets of equations describing interactions between time-varying inputs and outputs of neural-network memory, regarded as dynamical system. Simplifies design and operation of optoelectronic processor to implement TABAM performing associative recall of images. TABAM concept described in "Optoelectronic Terminal-Attractor-Based Associative Memory" (NPO-18790). Experimental optoelectronic apparatus that performed associative recall of binary images described in "Optoelectronic Inner-Product Neural Associative Memory" (NPO-18491).

  4. SONOS Nonvolatile Memory Cell Programming Characteristics

    NASA Technical Reports Server (NTRS)

    MacLeod, Todd C.; Phillips, Thomas A.; Ho, Fat D.

    2010-01-01

    Silicon-oxide-nitride-oxide-silicon (SONOS) nonvolatile memory is gaining favor over conventional EEPROM FLASH memory technology. This paper characterizes the SONOS write operation using a nonquasi-static MOSFET model. This includes floating gate charge and voltage characteristics as well as tunneling current, voltage threshold and drain current characterization. The characterization of the SONOS memory cell predicted by the model closely agrees with experimental data obtained from actual SONOS memory cells. The tunnel current, drain current, threshold voltage and read drain current all closely agreed with empirical data.

  5. Working memory load modulation of parieto-frontal connections: evidence from dynamic causal modeling

    PubMed Central

    Ma, Liangsuo; Steinberg, Joel L.; Hasan, Khader M.; Narayana, Ponnada A.; Kramer, Larry A.; Moeller, F. Gerard

    2011-01-01

    Previous neuroimaging studies have shown that working memory load has marked effects on regional neural activation. However, the mechanism through which working memory load modulates brain connectivity is still unclear. In this study, this issue was addressed using dynamic causal modeling (DCM) based on functional magnetic resonance imaging (fMRI) data. Eighteen normal healthy subjects were scanned while they performed a working memory task with variable memory load, as parameterized by two levels of memory delay and three levels of digit load (number of digits presented in each visual stimulus). Eight regions of interest, i.e., bilateral middle frontal gyrus (MFG), anterior cingulate cortex (ACC), inferior frontal cortex (IFC), and posterior parietal cortex (PPC), were chosen for DCM analyses. Analysis of the behavioral data during the fMRI scan revealed that accuracy decreased as digit load increased. Bayesian inference on model structure indicated that a bilinear DCM in which memory delay was the driving input to bilateral PPC and in which digit load modulated several parieto-frontal connections was the optimal model. Analysis of model parameters showed that higher digit load enhanced connection from L PPC to L IFC, and lower digit load inhibited connection from R PPC to L ACC. These findings suggest that working memory load modulates brain connectivity in a parieto-frontal network, and may reflect altered neuronal processes, e.g., information processing or error monitoring, with the change in working memory load. PMID:21692148

  6. Cognitive and behavioral evaluation of nutritional interventions in rodent models of brain aging and dementia

    PubMed Central

    Wahl, Devin; Coogan, Sean CP; Solon-Biet, Samantha M; de Cabo, Rafael; Haran, James B; Raubenheimer, David; Cogger, Victoria C; Mattson, Mark P; Simpson, Stephen J; Le Couteur, David G

    2017-01-01

    Evaluation of behavior and cognition in rodent models underpins mechanistic and interventional studies of brain aging and neurodegenerative diseases, especially dementia. Commonly used tests include Morris water maze, Barnes maze, object recognition, fear conditioning, radial arm water maze, and Y maze. Each of these tests reflects some aspects of human memory including episodic memory, recognition memory, semantic memory, spatial memory, and emotional memory. Although most interventional studies in rodent models of dementia have focused on pharmacological agents, there are an increasing number of studies that have evaluated nutritional interventions including caloric restriction, intermittent fasting, and manipulation of macronutrients. Dietary interventions have been shown to influence various cognitive and behavioral tests in rodents indicating that nutrition can influence brain aging and possibly neurodegeneration. PMID:28932108

  7. The ASC Sequoia Programming Model

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Seager, M

    2008-08-06

    In the late 1980's and early 1990's, Lawrence Livermore National Laboratory was deeply engrossed in determining the next generation programming model for the Integrated Design Codes (IDC) beyond vectorization for the Cray 1s series of computers. The vector model, developed in mid 1970's first for the CDC 7600 and later extended from stack based vector operation to memory to memory operations for the Cray 1s, lasted approximately 20 years (See Slide 5). The Cray vector era was deemed an extremely long lived era as it allowed vector codes to be developed over time (the Cray 1s were faster in scalarmore » mode than the CDC 7600) with vector unit utilization increasing incrementally over time. The other attributes of the Cray vector era at LLNL were that we developed, supported and maintained the Operating System (LTSS and later NLTSS), communications protocols (LINCS), Compilers (Civic Fortran77 and Model), operating system tools (e.g., batch system, job control scripting, loaders, debuggers, editors, graphics utilities, you name it) and math and highly machine optimized libraries (e.g., SLATEC, and STACKLIB). Although LTSS was adopted by Cray for early system generations, they later developed COS and UNICOS operating systems and environment on their own. In the late 1970s and early 1980s two trends appeared that made the Cray vector programming model (described above including both the hardware and system software aspects) seem potentially dated and slated for major revision. These trends were the appearance of low cost CMOS microprocessors and their attendant, departmental and mini-computers and later workstations and personal computers. With the wide spread adoption of Unix in the early 1980s, it appeared that LLNL (and the other DOE Labs) would be left out of the mainstream of computing without a rapid transition to these 'Killer Micros' and modern OS and tools environments. The other interesting advance in the period is that systems were being developed with multiple 'cores' in them and called Symmetric Multi-Processor or Shared Memory Processor (SMP) systems. The parallel revolution had begun. The Laboratory started a small 'parallel processing project' in 1983 to study the new technology and its application to scientific computing with four people: Tim Axelrod, Pete Eltgroth, Paul Dubois and Mark Seager. Two years later, Eugene Brooks joined the team. This team focused on Unix and 'killer micro' SMPs. Indeed, Eugene Brooks was credited with coming up with the 'Killer Micro' term. After several generations of SMP platforms (e.g., Sequent Balance 8000 with 8 33MHz MC32032s, Allian FX8 with 8 MC68020 and FPGA based Vector Units and finally the BB&N Butterfly with 128 cores), it became apparent to us that the killer micro revolution would indeed take over Crays and that we definitely needed a new programming and systems model. The model developed by Mark Seager and Dale Nielsen focused on both the system aspects (Slide 3) and the code development aspects (Slide 4). Although now succinctly captured in two attached slides, at the time there was tremendous ferment in the research community as to what parallel programming model would emerge, dominate and survive. In addition, we wanted a model that would provide portability between platforms of a single generation but also longevity over multiple--and hopefully--many generations. Only after we developed the 'Livermore Model' and worked it out in considerable detail did it become obvious that what we came up with was the right approach. In a nutshell, the applications programming model of the Livermore Model posited that SMP parallelism would ultimately not scale indefinitely and one would have to bite the bullet and implement MPI parallelism within the Integrated Design Code (IDC). We also had a major emphasis on doing everything in a completely standards based, portable methodology with POSIX/Unix as the target environment. We decided against specialized libraries like STACKLIB for performance, but kept as many general purpose, portable math libraries as were needed by the codes. Third, we assumed that the SMPs in clusters would evolve in time to become more powerful, feature rich and, in particular, offer more cores. Thus, we focused on OpenMP, and POSIX PThreads for programming SMP parallelism. These code porting efforts were lead by Dale Nielsen, A-Division code group leader, and Randy Christensen, B-Division code group leader. Most of the porting effort revolved removing 'Crayisms' in the codes: artifacts of LTSS/NLTSS, Civic compiler extensions beyond Fortran77, IO libraries and dealing with new code control languages (we switched to Perl and later to Python). Adding MPI to the codes was initially problematic and error prone because the programmers used MPI directly and sprinkled the calls throughout the code.« less

  8. OpenMP GNU and Intel Fortran programs for solving the time-dependent Gross-Pitaevskii equation

    NASA Astrophysics Data System (ADS)

    Young-S., Luis E.; Muruganandam, Paulsamy; Adhikari, Sadhan K.; Lončar, Vladimir; Vudragović, Dušan; Balaž, Antun

    2017-11-01

    We present Open Multi-Processing (OpenMP) version of Fortran 90 programs for solving the Gross-Pitaevskii (GP) equation for a Bose-Einstein condensate in one, two, and three spatial dimensions, optimized for use with GNU and Intel compilers. We use the split-step Crank-Nicolson algorithm for imaginary- and real-time propagation, which enables efficient calculation of stationary and non-stationary solutions, respectively. The present OpenMP programs are designed for computers with multi-core processors and optimized for compiling with both commercially-licensed Intel Fortran and popular free open-source GNU Fortran compiler. The programs are easy to use and are elaborated with helpful comments for the users. All input parameters are listed at the beginning of each program. Different output files provide physical quantities such as energy, chemical potential, root-mean-square sizes, densities, etc. We also present speedup test results for new versions of the programs. Program files doi:http://dx.doi.org/10.17632/y8zk3jgn84.2 Licensing provisions: Apache License 2.0 Programming language: OpenMP GNU and Intel Fortran 90. Computer: Any multi-core personal computer or workstation with the appropriate OpenMP-capable Fortran compiler installed. Number of processors used: All available CPU cores on the executing computer. Journal reference of previous version: Comput. Phys. Commun. 180 (2009) 1888; ibid.204 (2016) 209. Does the new version supersede the previous version?: Not completely. It does supersede previous Fortran programs from both references above, but not OpenMP C programs from Comput. Phys. Commun. 204 (2016) 209. Nature of problem: The present Open Multi-Processing (OpenMP) Fortran programs, optimized for use with commercially-licensed Intel Fortran and free open-source GNU Fortran compilers, solve the time-dependent nonlinear partial differential (GP) equation for a trapped Bose-Einstein condensate in one (1d), two (2d), and three (3d) spatial dimensions for six different trap symmetries: axially and radially symmetric traps in 3d, circularly symmetric traps in 2d, fully isotropic (spherically symmetric) and fully anisotropic traps in 2d and 3d, as well as 1d traps, where no spatial symmetry is considered. Solution method: We employ the split-step Crank-Nicolson algorithm to discretize the time-dependent GP equation in space and time. The discretized equation is then solved by imaginary- or real-time propagation, employing adequately small space and time steps, to yield the solution of stationary and non-stationary problems, respectively. Reasons for the new version: Previously published Fortran programs [1,2] have now become popular tools [3] for solving the GP equation. These programs have been translated to the C programming language [4] and later extended to the more complex scenario of dipolar atoms [5]. Now virtually all computers have multi-core processors and some have motherboards with more than one physical computer processing unit (CPU), which may increase the number of available CPU cores on a single computer to several tens. The C programs have been adopted to be very fast on such multi-core modern computers using general-purpose graphic processing units (GPGPU) with Nvidia CUDA and computer clusters using Message Passing Interface (MPI) [6]. Nevertheless, previously developed Fortran programs are also commonly used for scientific computation and most of them use a single CPU core at a time in modern multi-core laptops, desktops, and workstations. Unless the Fortran programs are made aware and capable of making efficient use of the available CPU cores, the solution of even a realistic dynamical 1d problem, not to mention the more complicated 2d and 3d problems, could be time consuming using the Fortran programs. Previously, we published auto-parallel Fortran programs [2] suitable for Intel (but not GNU) compiler for solving the GP equation. Hence, a need for the full OpenMP version of the Fortran programs to reduce the execution time cannot be overemphasized. To address this issue, we provide here such OpenMP Fortran programs, optimized for both Intel and GNU Fortran compilers and capable of using all available CPU cores, which can significantly reduce the execution time. Summary of revisions: Previous Fortran programs [1] for solving the time-dependent GP equation in 1d, 2d, and 3d with different trap symmetries have been parallelized using the OpenMP interface to reduce the execution time on multi-core processors. There are six different trap symmetries considered, resulting in six programs for imaginary-time propagation and six for real-time propagation, totaling to 12 programs included in BEC-GP-OMP-FOR software package. All input data (number of atoms, scattering length, harmonic oscillator trap length, trap anisotropy, etc.) are conveniently placed at the beginning of each program, as before [2]. Present programs introduce a new input parameter, which is designated by Number_of_Threads and defines the number of CPU cores of the processor to be used in the calculation. If one sets the value 0 for this parameter, all available CPU cores will be used. For the most efficient calculation it is advisable to leave one CPU core unused for the background system's jobs. For example, on a machine with 20 CPU cores such that we used for testing, it is advisable to use up to 19 CPU cores. However, the total number of used CPU cores can be divided into more than one job. For instance, one can run three simulations simultaneously using 10, 4, and 5 CPU cores, respectively, thus totaling to 19 used CPU cores on a 20-core computer. The Fortran source programs are located in the directory src, and can be compiled by the make command using the makefile in the root directory BEC-GP-OMP-FOR of the software package. The examples of produced output files can be found in the directory output, although some large density files are omitted, to save space. The programs calculate the values of actually used dimensionless nonlinearities from the physical input parameters, where the input parameters correspond to the identical nonlinearity values as in the previously published programs [1], so that the output files of the old and new programs can be directly compared. The output files are conveniently named such that their contents can be easily identified, following the naming convention introduced in Ref. [2]. For example, a file named -out.txt, where is a name of the individual program, represents the general output file containing input data, time and space steps, nonlinearity, energy and chemical potential, and was named fort.7 in the old Fortran version of programs [1]. A file named -den.txt is the output file with the condensate density, which had the names fort.3 and fort.4 in the old Fortran version [1] for imaginary- and real-time propagation programs, respectively. Other possible density outputs, such as the initial density, are commented out in the programs to have a simpler set of output files, but users can uncomment and re-enable them, if needed. In addition, there are output files for reduced (integrated) 1d and 2d densities for different programs. In the real-time programs there is also an output file reporting the dynamics of evolution of root-mean-square sizes after a perturbation is introduced. The supplied real-time programs solve the stationary GP equation, and then calculate the dynamics. As the imaginary-time programs are more accurate than the real-time programs for the solution of a stationary problem, one can first solve the stationary problem using the imaginary-time programs, adapt the real-time programs to read the pre-calculated wave function and then study the dynamics. In that case the parameter NSTP in the real-time programs should be set to zero and the space mesh and nonlinearity parameters should be identical in both programs. The reader is advised to consult our previous publication where a complete description of the output files is given [2]. A readme.txt file, included in the root directory, explains the procedure to compile and run the programs. We tested our programs on a workstation with two 10-core Intel Xeon E5-2650 v3 CPUs. The parameters used for testing are given in sample input files, provided in the corresponding directory together with the programs. In Table 1 we present wall-clock execution times for runs on 1, 6, and 19 CPU cores for programs compiled using Intel and GNU Fortran compilers. The corresponding columns "Intel speedup" and "GNU speedup" give the ratio of wall-clock execution times of runs on 1 and 19 CPU cores, and denote the actual measured speedup for 19 CPU cores. In all cases and for all numbers of CPU cores, although the GNU Fortran compiler gives excellent results, the Intel Fortran compiler turns out to be slightly faster. Note that during these tests we always ran only a single simulation on a workstation at a time, to avoid any possible interference issues. Therefore, the obtained wall-clock times are more reliable than the ones that could be measured with two or more jobs running simultaneously. We also studied the speedup of the programs as a function of the number of CPU cores used. The performance of the Intel and GNU Fortran compilers is illustrated in Fig. 1, where we plot the speedup and actual wall-clock times as functions of the number of CPU cores for 2d and 3d programs. We see that the speedup increases monotonically with the number of CPU cores in all cases and has large values (between 10 and 14 for 3d programs) for the maximal number of cores. This fully justifies the development of OpenMP programs, which enable much faster and more efficient solving of the GP equation. However, a slow saturation in the speedup with the further increase in the number of CPU cores is observed in all cases, as expected. The speedup tends to increase for programs in higher dimensions, as they become more complex and have to process more data. This is why the speedups of the supplied 2d and 3d programs are larger than those of 1d programs. Also, for a single program the speedup increases with the size of the spatial grid, i.e., with the number of spatial discretization points, since this increases the amount of calculations performed by the program. To demonstrate this, we tested the supplied real2d-th program and varied the number of spatial discretization points NX=NY from 20 to 1000. The measured speedup obtained when running this program on 19 CPU cores as a function of the number of discretization points is shown in Fig. 2. The speedup first increases rapidly with the number of discretization points and eventually saturates. Additional comments: Example inputs provided with the programs take less than 30 minutes to run on a workstation with two Intel Xeon E5-2650 v3 processors (2 QPI links, 10 CPU cores, 25 MB cache, 2.3 GHz).

  9. Superdiffusion in a non-Markovian random walk model with a Gaussian memory profile

    NASA Astrophysics Data System (ADS)

    Borges, G. M.; Ferreira, A. S.; da Silva, M. A. A.; Cressoni, J. C.; Viswanathan, G. M.; Mariz, A. M.

    2012-09-01

    Most superdiffusive Non-Markovian random walk models assume that correlations are maintained at all time scales, e.g., fractional Brownian motion, Lévy walks, the Elephant walk and Alzheimer walk models. In the latter two models the random walker can always "remember" the initial times near t = 0. Assuming jump size distributions with finite variance, the question naturally arises: is superdiffusion possible if the walker is unable to recall the initial times? We give a conclusive answer to this general question, by studying a non-Markovian model in which the walker's memory of the past is weighted by a Gaussian centered at time t/2, at which time the walker had one half the present age, and with a standard deviation σt which grows linearly as the walker ages. For large widths we find that the model behaves similarly to the Elephant model, but for small widths this Gaussian memory profile model behaves like the Alzheimer walk model. We also report that the phenomenon of amnestically induced persistence, known to occur in the Alzheimer walk model, arises in the Gaussian memory profile model. We conclude that memory of the initial times is not a necessary condition for generating (log-periodic) superdiffusion. We show that the phenomenon of amnestically induced persistence extends to the case of a Gaussian memory profile.

  10. On climate prediction: how much can we expect from climate memory?

    NASA Astrophysics Data System (ADS)

    Yuan, Naiming; Huang, Yan; Duan, Jianping; Zhu, Congwen; Xoplaki, Elena; Luterbacher, Jürg

    2018-03-01

    Slowing variability in climate system is an important source of climate predictability. However, it is still challenging for current dynamical models to fully capture the variability as well as its impacts on future climate. In this study, instead of simulating the internal multi-scale oscillations in dynamical models, we discussed the effects of internal variability in terms of climate memory. By decomposing climate state x(t) at a certain time point t into memory part M(t) and non-memory part ɛ (t) , climate memory effects from the past 30 years on climate prediction are quantified. For variables with strong climate memory, high variance (over 20% ) in x(t) is explained by the memory part M(t), and the effects of climate memory are non-negligible for most climate variables, but the precipitation. Regarding of multi-steps climate prediction, a power law decay of the explained variance was found, indicating long-lasting climate memory effects. The explained variances by climate memory can remain to be higher than 10% for more than 10 time steps. Accordingly, past climate conditions can affect both short (monthly) and long-term (interannual, decadal, or even multidecadal) climate predictions. With the memory part M(t) precisely calculated from Fractional Integral Statistical Model, one only needs to focus on the non-memory part ɛ (t) , which is an important quantity that determines climate predictive skills.

  11. Strategies To Enhance Memory Based on Brain-Research.

    ERIC Educational Resources Information Center

    Banikowski, Alison K.; Mehring, Teresa A.

    1999-01-01

    This article reviews the literature on three aspects of memory: (1) an information processing model of memory (including the sensory register, attention, short-term memory, and long-term memory); (2) instructional strategies designed to enhance memory (which stress gaining students' attention and active involvement); and (3) reasons why…

  12. Working memory capacity and overgeneral autobiographical memory in young and older adults.

    PubMed

    Ros, Laura; Latorre, José Miguel; Serrano, Juan Pedro

    2010-01-01

    The objectives of this study are to compare the Autobiographical Memory Test (AMT) performance of two healthy samples of younger and older adults and to analyse the relationship between overgeneral memory (OGM) and working memory executive processes (WMEP) using a structural equation modelling with latent variables. The AMT and sustained attention, short-term memory and working memory tasks were administered to a group of young adults (N = 50) and a group of older adults (N = 46). On the AMT, the older adults recalled a greater number of categorical memories (p = .000) and fewer specific memories (p = .000) than the young adults, confirming that OGM occurs in the normal population and increases with age. WMEP was measured by reading span and a working memory with sustained attention load task. Structural equation modelling reflects that WMEP shows a strong relationship with OGM: lower scores on WMEP reflect an OGM phenomenon characterized by higher categorical and lower specific memories.

  13. A Neural Network Model of Retrieval-Induced Forgetting

    ERIC Educational Resources Information Center

    Norman, Kenneth A.; Newman, Ehren L.; Detre, Greg

    2007-01-01

    Retrieval-induced forgetting (RIF) refers to the finding that retrieving a memory can impair subsequent recall of related memories. Here, the authors present a new model of how the brain gives rise to RIF in both semantic and episodic memory. The core of the model is a recently developed neural network learning algorithm that leverages regular…

  14. Optical Associative Memory Model With Threshold Modification Using Complementary Vector

    NASA Astrophysics Data System (ADS)

    Bian, Shaoping; Xu, Kebin; Hong, Jing

    1989-02-01

    A new criterion to evaluate the similarity between two vectors in associative memory is presented. According to it, an experimental research about optical associative memory model with threshold modification using complementary vector is carried out. This model is capable of eliminating the posibility to recall erroneously. Therefore the accuracy of reading out is improved.

  15. A Revised Model of Short-Term Memory and Long-Term Learning of Verbal Sequences

    ERIC Educational Resources Information Center

    Burgess, Neil; Hitch, Graham J.

    2006-01-01

    The interaction between short- and long-term memory is studied within a model in which phonemic and (temporal) contextual information have separate influences on immediate verbal serial recall via connections with short- and long-term plasticity [Burgess, N., & Hitch, G.J. (1999). Memory for serial order: a network model of the phonological loop…

  16. The memory state heuristic: A formal model based on repeated recognition judgments.

    PubMed

    Castela, Marta; Erdfelder, Edgar

    2017-02-01

    The recognition heuristic (RH) theory predicts that, in comparative judgment tasks, if one object is recognized and the other is not, the recognized one is chosen. The memory-state heuristic (MSH) extends the RH by assuming that choices are not affected by recognition judgments per se, but by the memory states underlying these judgments (i.e., recognition certainty, uncertainty, or rejection certainty). Specifically, the larger the discrepancy between memory states, the larger the probability of choosing the object in the higher state. The typical RH paradigm does not allow estimation of the underlying memory states because it is unknown whether the objects were previously experienced or not. Therefore, we extended the paradigm by repeating the recognition task twice. In line with high threshold models of recognition, we assumed that inconsistent recognition judgments result from uncertainty whereas consistent judgments most likely result from memory certainty. In Experiment 1, we fitted 2 nested multinomial models to the data: an MSH model that formalizes the relation between memory states and binary choices explicitly and an approximate model that ignores the (unlikely) possibility of consistent guesses. Both models provided converging results. As predicted, reliance on recognition increased with the discrepancy in the underlying memory states. In Experiment 2, we replicated these results and found support for choice consistency predictions of the MSH. Additionally, recognition and choice latencies were in agreement with the MSH in both experiments. Finally, we validated critical parameters of our MSH model through a cross-validation method and a third experiment. (PsycINFO Database Record (c) 2017 APA, all rights reserved).

  17. Using chaotic artificial neural networks to model memory in the brain

    NASA Astrophysics Data System (ADS)

    Aram, Zainab; Jafari, Sajad; Ma, Jun; Sprott, Julien C.; Zendehrouh, Sareh; Pham, Viet-Thanh

    2017-03-01

    In the current study, a novel model for human memory is proposed based on the chaotic dynamics of artificial neural networks. This new model explains a biological fact about memory which is not yet explained by any other model: There are theories that the brain normally works in a chaotic mode, while during attention it shows ordered behavior. This model uses the periodic windows observed in a previously proposed model for the brain to store and then recollect the information.

  18. Testing episodic memory in animals: a new approach.

    PubMed

    Griffiths, D P; Clayton, N S

    2001-08-01

    Episodic memory involves the encoding and storage of memories concerned with unique personal experiences and their subsequent recall, and it has long been the subject of intensive investigation in humans. According to Tulving's classical definition, episodic memory "receives and stores information about temporally dated episodes or events and temporal-spatial relations among these events." Thus, episodic memory provides information about the 'what' and 'when' of events ('temporally dated experiences') and about 'where' they happened ('temporal-spatial relations'). The storage and subsequent recall of this episodic information was thought to be beyond the memory capabilities of nonhuman animals. Although there are many laboratory procedures for investigating memory for discrete past episodes, until recently there were no previous studies that fully satisfied the criteria of Tulving's definition: they can all be explained in much simpler terms than episodic memory. However, current studies of memory for cache sites in food-storing jays provide an ethologically valid model for testing episodic-like memory in animals, thereby bridging the gap between human and animal studies memory. There is now a pressing need to adapt these experimental tests of episodic memory for other animals. Given the potential power of transgenic and knock-out procedures for investigating the genetic and molecular bases of learning and memory in laboratory rodents, not to mention the wealth of knowledge about the neuroanatomy and neurophysiology of the rodent hippocampus (a brain area heavily implicated in episodic memory), an obvious next step is to develop a rodent model of episodic-like memory based on the food-storing bird paradigm. The development of a rodent model system could make an important contribution to our understanding of the neural, molecular, and behavioral mechanisms of mammalian episodic memory.

  19. Personal semantics: at the crossroads of semantic and episodic memory.

    PubMed

    Renoult, Louis; Davidson, Patrick S R; Palombo, Daniela J; Moscovitch, Morris; Levine, Brian

    2012-11-01

    Declarative memory is usually described as consisting of two systems: semantic and episodic memory. Between these two poles, however, may lie a third entity: personal semantics (PS). PS concerns knowledge of one's past. Although typically assumed to be an aspect of semantic memory, it is essentially absent from existing models of knowledge. Furthermore, like episodic memory (EM), PS is idiosyncratically personal (i.e., not culturally-shared). We show that, depending on how it is operationalized, the neural correlates of PS can look more similar to semantic memory, more similar to EM, or dissimilar to both. We consider three different perspectives to better integrate PS into existing models of declarative memory and suggest experimental strategies for disentangling PS from semantic and episodic memory. Copyright © 2012 Elsevier Ltd. All rights reserved.

  20. The Role of Life-Space, Social Activity, and Depression on the Subjective Memory Complaints of Community-Dwelling Filipino Elderly: A Structural Equation Model

    ERIC Educational Resources Information Center

    de Guzman, Allan B.; Lagdaan, Lovely France M.; Lagoy, Marie Lauren V.

    2015-01-01

    Subjective memory complaints are one of the major concerns of the elderly and remain a challenging area in gerontology. There are previous studies that identify different factors affecting subjective memory complaints. However, an extended model that correlates life-space on subjective memory complaints remains a blank spot. The objective of this…

  1. No Evidence for a Fixed Object Limit in Working Memory: Spatial Ensemble Representations Inflate Estimates of Working Memory Capacity for Complex Objects

    ERIC Educational Resources Information Center

    Brady, Timothy F.; Alvarez, George A.

    2015-01-01

    A central question for models of visual working memory is whether the number of objects people can remember depends on object complexity. Some influential "slot" models of working memory capacity suggest that people always represent 3-4 objects and that only the fidelity with which these objects are represented is affected by object…

  2. Histone Deacetylase Inhibition Induces Odor Preference Memory Extension and Maintains Enhanced AMPA Receptor Expression in the Rat Pup Model

    ERIC Educational Resources Information Center

    Bhattacharya, Sriya; Mukherjee, Bandhan; Doré, Jules J. E.; Yuan, Qi; Harley, Carolyn W.; McLean, John H.

    2017-01-01

    Histone deacetylase (HDAC) plays a role in synaptic plasticity and long-term memory formation. We hypothesized that trichostatin-A (TSA), an HDAC inhibitor, would promote long-term odor preference memory and maintain enhanced GluA1 receptor levels that have been hypothesized to support memory. We used an early odor preference learning model in…

  3. Working memory for braille is shaped by experience.

    PubMed

    Cohen, Henri; Scherzer, Peter; Viau, Robert; Voss, Patrice; Lepore, Franco

    2011-03-01

    Tactile working memory was found to be more developed in completely blind (congenital and acquired) than in semi-sighted subjects, indicating that experience plays a crucial role in shaping working memory. A model of working memory, adapted from the classical model proposed by Baddeley and Hitch1 and Baddeley2 is presented where the connection strengths of a highly cross-modal network are altered through experience.

  4. The contribution of attentional lapses to individual differences in visual working memory capacity.

    PubMed

    Adam, Kirsten C S; Mance, Irida; Fukuda, Keisuke; Vogel, Edward K

    2015-08-01

    Attentional control and working memory capacity are important cognitive abilities that substantially vary between individuals. Although much is known about how attentional control and working memory capacity relate to each other and to constructs like fluid intelligence, little is known about how trial-by-trial fluctuations in attentional engagement impact trial-by-trial working memory performance. Here, we employ a novel whole-report memory task that allowed us to distinguish between varying levels of attentional engagement in humans performing a working memory task. By characterizing low-performance trials, we can distinguish between models in which working memory performance failures are caused by either (1) complete lapses of attention or (2) variations in attentional control. We found that performance failures increase with set-size and strongly predict working memory capacity. Performance variability was best modeled by an attentional control model of attention, not a lapse model. We examined neural signatures of performance failures by measuring EEG activity while participants performed the whole-report task. The number of items correctly recalled in the memory task was predicted by frontal theta power, with decreased frontal theta power associated with poor performance on the task. In addition, we found that poor performance was not explained by failures of sensory encoding; the P1/N1 response and ocular artifact rates were equivalent for high- and low-performance trials. In all, we propose that attentional lapses alone cannot explain individual differences in working memory performance. Instead, we find that graded fluctuations in attentional control better explain the trial-by-trial differences in working memory that we observe.

  5. Memory in a fractional-order cardiomyocyte model alters properties of alternans and spontaneous activity

    NASA Astrophysics Data System (ADS)

    Comlekoglu, T.; Weinberg, S. H.

    2017-09-01

    Cardiac memory is the dependence of electrical activity on the prior history of one or more system state variables, including transmembrane potential (Vm), ionic current gating, and ion concentrations. While prior work has represented memory either phenomenologically or with biophysical detail, in this study, we consider an intermediate approach of a minimal three-variable cardiomyocyte model, modified with fractional-order dynamics, i.e., a differential equation of order between 0 and 1, to account for history-dependence. Memory is represented via both capacitive memory, due to fractional-order Vm dynamics, that arises due to non-ideal behavior of membrane capacitance; and ionic current gating memory, due to fractional-order gating variable dynamics, that arises due to gating history-dependence. We perform simulations for varying Vm and gating variable fractional-orders and pacing cycle length and measure action potential duration (APD) and incidence of alternans, loss of capture, and spontaneous activity. In the absence of ionic current gating memory, we find that capacitive memory, i.e., decreased Vm fractional-order, typically shortens APD, suppresses alternans, and decreases the minimum cycle length (MCL) for loss of capture. However, in the presence of ionic current gating memory, capacitive memory can prolong APD, promote alternans, and increase MCL. Further, we find that reduced Vm fractional order (typically less than 0.75) can drive phase 4 depolarizations that promote spontaneous activity. Collectively, our results demonstrate that memory reproduced by a fractional-order model can play a role in alternans formation and pacemaking, and in general, can greatly increase the range of electrophysiological characteristics exhibited by a minimal model.

  6. The Contribution of Attentional Lapses to Individual Differences in Visual Working Memory Capacity

    PubMed Central

    Adam, Kirsten C. S.; Mance, Irida; Fukuda, Keisuke; Vogel, Edward K.

    2015-01-01

    Attentional control and working memory capacity are important cognitive abilities that substantially vary between individuals. Although much is known about how attentional control and working memory capacity relate to each other and to constructs like fluid intelligence, little is known about how trial-by-trial fluctuations in attentional engagement impact trial-by-trial working memory performance. Here, we employ a novel whole-report memory task that allowed us to distinguish between varying levels of attentional engagement in humans performing a working memory task. By characterizing low-performance trials, we can distinguish between models in which working memory performance failures are caused by either (1) complete lapses of attention or (2) variations in attentional control. We found that performance failures increase with set-size and strongly predict working memory capacity. Performance variability was best modeled by an attentional control model of attention, not a lapse model. We examined neural signatures of performance failures by measuring EEG activity while participants performed the whole-report task. The number of items correctly recalled in the memory task was predicted by frontal theta power, with decreased frontal theta power associated with poor performance on the task. In addition, we found that poor performance was not explained by failures of sensory encoding; the P1/N1 response and ocular artifact rates were equivalent for high- and low-performance trials. In all, we propose that attentional lapses alone cannot explain individual differences in working memory performance. Instead, we find that graded fluctuations in attentional control better explain the trial-by-trial differences in working memory that we observe. PMID:25811710

  7. Toxin-Induced Experimental Models of Learning and Memory Impairment

    PubMed Central

    More, Sandeep Vasant; Kumar, Hemant; Cho, Duk-Yeon; Yun, Yo-Sep; Choi, Dong-Kug

    2016-01-01

    Animal models for learning and memory have significantly contributed to novel strategies for drug development and hence are an imperative part in the assessment of therapeutics. Learning and memory involve different stages including acquisition, consolidation, and retrieval and each stage can be characterized using specific toxin. Recent studies have postulated the molecular basis of these processes and have also demonstrated many signaling molecules that are involved in several stages of memory. Most insights into learning and memory impairment and to develop a novel compound stems from the investigations performed in experimental models, especially those produced by neurotoxins models. Several toxins have been utilized based on their mechanism of action for learning and memory impairment such as scopolamine, streptozotocin, quinolinic acid, and domoic acid. Further, some toxins like 6-hydroxy dopamine (6-OHDA), 1-methyl-4-phenyl-1,2,3,6-tetrahydropyridine (MPTP) and amyloid-β are known to cause specific learning and memory impairment which imitate the disease pathology of Parkinson’s disease dementia and Alzheimer’s disease dementia. Apart from these toxins, several other toxins come under a miscellaneous category like an environmental pollutant, snake venoms, botulinum, and lipopolysaccharide. This review will focus on the various classes of neurotoxin models for learning and memory impairment with their specific mechanism of action that could assist the process of drug discovery and development for dementia and cognitive disorders. PMID:27598124

  8. Toxin-Induced Experimental Models of Learning and Memory Impairment.

    PubMed

    More, Sandeep Vasant; Kumar, Hemant; Cho, Duk-Yeon; Yun, Yo-Sep; Choi, Dong-Kug

    2016-09-01

    Animal models for learning and memory have significantly contributed to novel strategies for drug development and hence are an imperative part in the assessment of therapeutics. Learning and memory involve different stages including acquisition, consolidation, and retrieval and each stage can be characterized using specific toxin. Recent studies have postulated the molecular basis of these processes and have also demonstrated many signaling molecules that are involved in several stages of memory. Most insights into learning and memory impairment and to develop a novel compound stems from the investigations performed in experimental models, especially those produced by neurotoxins models. Several toxins have been utilized based on their mechanism of action for learning and memory impairment such as scopolamine, streptozotocin, quinolinic acid, and domoic acid. Further, some toxins like 6-hydroxy dopamine (6-OHDA), 1-methyl-4-phenyl-1,2,3,6-tetrahydropyridine (MPTP) and amyloid-β are known to cause specific learning and memory impairment which imitate the disease pathology of Parkinson's disease dementia and Alzheimer's disease dementia. Apart from these toxins, several other toxins come under a miscellaneous category like an environmental pollutant, snake venoms, botulinum, and lipopolysaccharide. This review will focus on the various classes of neurotoxin models for learning and memory impairment with their specific mechanism of action that could assist the process of drug discovery and development for dementia and cognitive disorders.

  9. A Comparative Study of the Effects of the Neurocognitive-Based Model and the Conventional Model on Learner Attention, Working Memory and Mood

    ERIC Educational Resources Information Center

    Srikoon, Sanit; Bunterm, Tassanee; Nethanomsak, Teerachai; Ngang, Tang Keow

    2017-01-01

    Purpose: The attention, working memory, and mood of learners are the most important abilities in the learning process. This study was concerned with the comparison of contextualized attention, working memory, and mood through a neurocognitive-based model (5P) and a conventional model (5E). It sought to examine the significant change in attention,…

  10. The construction of meaning.

    PubMed

    Kintsch, Walter; Mangalath, Praful

    2011-04-01

    We argue that word meanings are not stored in a mental lexicon but are generated in the context of working memory from long-term memory traces that record our experience with words. Current statistical models of semantics, such as latent semantic analysis and the Topic model, describe what is stored in long-term memory. The CI-2 model describes how this information is used to construct sentence meanings. This model is a dual-memory model, in that it distinguishes between a gist level and an explicit level. It also incorporates syntactic information about how words are used, derived from dependency grammar. The construction of meaning is conceptualized as feature sampling from the explicit memory traces, with the constraint that the sampling must be contextually relevant both semantically and syntactically. Semantic relevance is achieved by sampling topically relevant features; local syntactic constraints as expressed by dependency relations ensure syntactic relevance. Copyright © 2010 Cognitive Science Society, Inc.

  11. Overlapping parietal activity in memory and perception: evidence for the attention to memory model.

    PubMed

    Cabeza, Roberto; Mazuz, Yonatan S; Stokes, Jared; Kragel, James E; Woldorff, Marty G; Ciaramelli, Elisa; Olson, Ingrid R; Moscovitch, Morris

    2011-11-01

    The specific role of different parietal regions to episodic retrieval is a topic of intense debate. According to the Attention to Memory (AtoM) model, dorsal parietal cortex (DPC) mediates top-down attention processes guided by retrieval goals, whereas ventral parietal cortex (VPC) mediates bottom-up attention processes captured by the retrieval output or the retrieval cue. This model also hypothesizes that the attentional functions of DPC and VPC are similar for memory and perception. To investigate this last hypothesis, we scanned participants with event-related fMRI whereas they performed memory and perception tasks, each comprising an orienting phase (top-down attention) and a detection phase (bottom-up attention). The study yielded two main findings. First, consistent with the AtoM model, orienting-related activity for memory and perception overlapped in DPC, whereas detection-related activity for memory and perception overlapped in VPC. The DPC overlap was greater in the left intraparietal sulcus, and the VPC overlap in the left TPJ. Around overlapping areas, there were differences in the spatial distribution of memory and perception activations, which were consistent with trends reported in the literature. Second, both DPC and VPC showed stronger connectivity with medial-temporal lobe during the memory task and with visual cortex during the perception task. These findings suggest that, during memory tasks, some parietal regions mediate similar attentional control processes to those involved in perception tasks (orienting in DPC vs. detection in VPC), although on different types of information (mnemonic vs. sensory).

  12. Leveling up the analysis of the reminiscence bump in autobiographical memory: A new approach based on multilevel multinomial models.

    PubMed

    Zimprich, Daniel; Wolf, Tabea

    2018-06-20

    In many studies of autobiographical memory, participants are asked to generate more than one autobiographical memory. The resulting data then have a hierarchical or multilevel structure, in the sense that the autobiographical memories (Level 1) generated by the same person (Level 2) tend to be more similar. Transferred to an analysis of the reminiscence bump in autobiographical memory, at Level 1 the prediction of whether an autobiographical memory will fall within the reminiscence bump is based on the characteristics of that memory. At Level 2, the prediction of whether an individual will report more autobiographical memories that fall in the reminiscence bump is based on the characteristics of the individual. We suggest a multilevel multinomial model that allows for analyzing whether an autobiographical memory falls in the reminiscence bump at both levels of analysis simultaneously. The data come from 100 older participants who reported up to 33 autobiographical memories. Our results showed that about 12% of the total variance was between persons (Level 2). Moreover, at Level 1, memories of first-time experiences were more likely to fall in the reminiscence bump than were emotionally more positive memories. At Level 2, persons who reported more emotionally positive memories tended to report fewer memories from the life period after the reminiscence bump. In addition, cross-level interactions showed that the effects at Level 1 partly depended on the Level 2 effects. We discuss possible extensions of the model we present and the meaning of our findings for two prominent explanatory approaches to the reminiscence bump, as well as future directions.

  13. Developing a hippocampal neural prosthetic to facilitate human memory encoding and recall.

    PubMed

    Hampson, Robert E; Song, Dong; Robinson, Brian S; Fetterhoff, Dustin; Dakos, Alexander S; Roeder, Brent M; She, Xiwei; Wicks, Robert T; Witcher, Mark R; Couture, Daniel E; Laxton, Adrian W; Munger-Clary, Heidi; Popli, Gautam; Sollman, Myriam J; Whitlow, Christopher T; Marmarelis, Vasilis Z; Berger, Theodore W; Deadwyler, Sam A

    2018-06-01

    We demonstrate here the first successful implementation in humans of a proof-of-concept system for restoring and improving memory function via facilitation of memory encoding using the patient's own hippocampal spatiotemporal neural codes for memory. Memory in humans is subject to disruption by drugs, disease and brain injury, yet previous attempts to restore or rescue memory function in humans typically involved only nonspecific, modulation of brain areas and neural systems related to memory retrieval. We have constructed a model of processes by which the hippocampus encodes memory items via spatiotemporal firing of neural ensembles that underlie the successful encoding of short-term memory. A nonlinear multi-input, multi-output (MIMO) model of hippocampal CA3 and CA1 neural firing is computed that predicts activation patterns of CA1 neurons during the encoding (sample) phase of a delayed match-to-sample (DMS) human short-term memory task. MIMO model-derived electrical stimulation delivered to the same CA1 locations during the sample phase of DMS trials facilitated short-term/working memory by 37% during the task. Longer term memory retention was also tested in the same human subjects with a delayed recognition (DR) task that utilized images from the DMS task, along with images that were not from the task. Across the subjects, the stimulated trials exhibited significant improvement (35%) in both short-term and long-term retention of visual information. These results demonstrate the facilitation of memory encoding which is an important feature for the construction of an implantable neural prosthetic to improve human memory.

  14. Developing a hippocampal neural prosthetic to facilitate human memory encoding and recall

    NASA Astrophysics Data System (ADS)

    Hampson, Robert E.; Song, Dong; Robinson, Brian S.; Fetterhoff, Dustin; Dakos, Alexander S.; Roeder, Brent M.; She, Xiwei; Wicks, Robert T.; Witcher, Mark R.; Couture, Daniel E.; Laxton, Adrian W.; Munger-Clary, Heidi; Popli, Gautam; Sollman, Myriam J.; Whitlow, Christopher T.; Marmarelis, Vasilis Z.; Berger, Theodore W.; Deadwyler, Sam A.

    2018-06-01

    Objective. We demonstrate here the first successful implementation in humans of a proof-of-concept system for restoring and improving memory function via facilitation of memory encoding using the patient’s own hippocampal spatiotemporal neural codes for memory. Memory in humans is subject to disruption by drugs, disease and brain injury, yet previous attempts to restore or rescue memory function in humans typically involved only nonspecific, modulation of brain areas and neural systems related to memory retrieval. Approach. We have constructed a model of processes by which the hippocampus encodes memory items via spatiotemporal firing of neural ensembles that underlie the successful encoding of short-term memory. A nonlinear multi-input, multi-output (MIMO) model of hippocampal CA3 and CA1 neural firing is computed that predicts activation patterns of CA1 neurons during the encoding (sample) phase of a delayed match-to-sample (DMS) human short-term memory task. Main results. MIMO model-derived electrical stimulation delivered to the same CA1 locations during the sample phase of DMS trials facilitated short-term/working memory by 37% during the task. Longer term memory retention was also tested in the same human subjects with a delayed recognition (DR) task that utilized images from the DMS task, along with images that were not from the task. Across the subjects, the stimulated trials exhibited significant improvement (35%) in both short-term and long-term retention of visual information. Significance. These results demonstrate the facilitation of memory encoding which is an important feature for the construction of an implantable neural prosthetic to improve human memory.

  15. A simplified conjoint recognition paradigm for the measurement of gist and verbatim memory.

    PubMed

    Stahl, Christoph; Klauer, Karl Christoph

    2008-05-01

    The distinction between verbatim and gist memory traces has furthered the understanding of numerous phenomena in various fields, such as false memory research, research on reasoning and decision making, and cognitive development. To measure verbatim and gist memory empirically, an experimental paradigm and multinomial measurement model has been proposed but rarely applied. In the present article, a simplified conjoint recognition paradigm and multinomial model is introduced and validated as a measurement tool for the separate assessment of verbatim and gist memory processes. A Bayesian metacognitive framework is applied to validate guessing processes. Extensions of the model toward incorporating the processes of phantom recollection and erroneous recollection rejection are discussed.

  16. Statistical prediction with Kanerva's sparse distributed memory

    NASA Technical Reports Server (NTRS)

    Rogers, David

    1989-01-01

    A new viewpoint of the processing performed by Kanerva's sparse distributed memory (SDM) is presented. In conditions of near- or over-capacity, where the associative-memory behavior of the model breaks down, the processing performed by the model can be interpreted as that of a statistical predictor. Mathematical results are presented which serve as the framework for a new statistical viewpoint of sparse distributed memory and for which the standard formulation of SDM is a special case. This viewpoint suggests possible enhancements to the SDM model, including a procedure for improving the predictiveness of the system based on Holland's work with genetic algorithms, and a method for improving the capacity of SDM even when used as an associative memory.

  17. Performance analysis and comparison of a minimum interconnections direct storage model with traditional neural bidirectional memories.

    PubMed

    Bhatti, A Aziz

    2009-12-01

    This study proposes an efficient and improved model of a direct storage bidirectional memory, improved bidirectional associative memory (IBAM), and emphasises the use of nanotechnology for efficient implementation of such large-scale neural network structures at a considerable lower cost reduced complexity, and less area required for implementation. This memory model directly stores the X and Y associated sets of M bipolar binary vectors in the form of (MxN(x)) and (MxN(y)) memory matrices, requires O(N) or about 30% of interconnections with weight strength ranging between +/-1, and is computationally very efficient as compared to sequential, intraconnected and other bidirectional associative memory (BAM) models of outer-product type that require O(N(2)) complex interconnections with weight strength ranging between +/-M. It is shown that it is functionally equivalent to and possesses all attributes of a BAM of outer-product type, and yet it is simple and robust in structure, very large scale integration (VLSI), optical and nanotechnology realisable, modular and expandable neural network bidirectional associative memory model in which the addition or deletion of a pair of vectors does not require changes in the strength of interconnections of the entire memory matrix. The analysis of retrieval process, signal-to-noise ratio, storage capacity and stability of the proposed model as well as of the traditional BAM has been carried out. Constraints on and characteristics of unipolar and bipolar binaries for improved storage and retrieval are discussed. The simulation results show that it has log(e) N times higher storage capacity, superior performance, faster convergence and retrieval time, when compared to traditional sequential and intraconnected bidirectional memories.

  18. NAAG Peptidase Inhibitors Act via mGluR3: Animal Models of Memory, Alzheimer's, and Ethanol Intoxication.

    PubMed

    Olszewski, Rafal T; Janczura, Karolina J; Bzdega, Tomasz; Der, Elise K; Venzor, Faustino; O'Rourke, Brennen; Hark, Timothy J; Craddock, Kirsten E; Balasubramanian, Shankar; Moussa, Charbel; Neale, Joseph H

    2017-09-01

    Glutamate carboxypeptidase II (GCPII) inactivates the peptide neurotransmitter N-acetylaspartylglutamate (NAAG) following synaptic release. Inhibitors of GCPII increase extracellular NAAG levels and are efficacious in animal models of clinical disorders via NAAG activation of a group II metabotropic glutamate receptor. mGluR2 and mGluR3 knock-out (ko) mice were used to test the hypothesis that mGluR3 mediates the activity of GCPII inhibitors ZJ43 and 2-PMPA in animal models of memory and memory loss. Short- (1.5 h) and long- (24 h) term novel object recognition tests were used to assess memory. Treatment with ZJ43 or 2-PMPA prior to acquisition trials increased long-term memory in mGluR2, but not mGluR3, ko mice. Nine month-old triple transgenic Alzheimer's disease model mice exhibited impaired short-term novel object recognition memory that was rescued by treatment with a NAAG peptidase inhibitor. NAAG peptidase inhibitors and the group II mGluR agonist, LY354740, reversed the short-term memory deficit induced by acute ethanol administration in wild type mice. 2-PMPA also moderated the effect of ethanol on short-term memory in mGluR2 ko mice but failed to do so in mGluR3 ko mice. LY354740 and ZJ43 blocked ethanol-induced motor activation. Both GCPII inhibitors and LY354740 also significantly moderated the loss of motor coordination induced by 2.1 g/kg ethanol treatment. These data support the conclusion that inhibitors of glutamate carboxypeptidase II are efficacious in object recognition models of normal memory and memory deficits via an mGluR3 mediated process, actions that could have widespread clinical applications.

  19. Self-defining memories, scripts, and the life story: narrative identity in personality and psychotherapy.

    PubMed

    Singer, Jefferson A; Blagov, Pavel; Berry, Meredith; Oost, Kathryn M

    2013-12-01

    An integrative model of narrative identity builds on a dual memory system that draws on episodic memory and a long-term self to generate autobiographical memories. Autobiographical memories related to critical goals in a lifetime period lead to life-story memories, which in turn become self-defining memories when linked to an individual's enduring concerns. Self-defining memories that share repetitive emotion-outcome sequences yield narrative scripts, abstracted templates that filter cognitive-affective processing. The life story is the individual's overarching narrative that provides unity and purpose over the life course. Healthy narrative identity combines memory specificity with adaptive meaning-making to achieve insight and well-being, as demonstrated through a literature review of personality and clinical research, as well as new findings from our own research program. A clinical case study drawing on this narrative identity model is also presented with implications for treatment and research. © 2012 Wiley Periodicals, Inc.

  20. A Cerebellar-model Associative Memory as a Generalized Random-access Memory

    NASA Technical Reports Server (NTRS)

    Kanerva, Pentti

    1989-01-01

    A versatile neural-net model is explained in terms familiar to computer scientists and engineers. It is called the sparse distributed memory, and it is a random-access memory for very long words (for patterns with thousands of bits). Its potential utility is the result of several factors: (1) a large pattern representing an object or a scene or a moment can encode a large amount of information about what it represents; (2) this information can serve as an address to the memory, and it can also serve as data; (3) the memory is noise tolerant--the information need not be exact; (4) the memory can be made arbitrarily large and hence an arbitrary amount of information can be stored in it; and (5) the architecture is inherently parallel, allowing large memories to be fast. Such memories can become important components of future computers.

  1. Memory recall and spike-frequency adaptation

    NASA Astrophysics Data System (ADS)

    Roach, James P.; Sander, Leonard M.; Zochowski, Michal R.

    2016-05-01

    The brain can reproduce memories from partial data; this ability is critical for memory recall. The process of memory recall has been studied using autoassociative networks such as the Hopfield model. This kind of model reliably converges to stored patterns that contain the memory. However, it is unclear how the behavior is controlled by the brain so that after convergence to one configuration, it can proceed with recognition of another one. In the Hopfield model, this happens only through unrealistic changes of an effective global temperature that destabilizes all stored configurations. Here we show that spike-frequency adaptation (SFA), a common mechanism affecting neuron activation in the brain, can provide state-dependent control of pattern retrieval. We demonstrate this in a Hopfield network modified to include SFA, and also in a model network of biophysical neurons. In both cases, SFA allows for selective stabilization of attractors with different basins of attraction, and also for temporal dynamics of attractor switching that is not possible in standard autoassociative schemes. The dynamics of our models give a plausible account of different sorts of memory retrieval.

  2. Monte Carlo based investigation of berry phase for depth resolved characterization of biomedical scattering samples

    NASA Astrophysics Data System (ADS)

    Baba, J. S.; Koju, V.; John, D.

    2015-03-01

    The propagation of light in turbid media is an active area of research with relevance to numerous investigational fields, e.g., biomedical diagnostics and therapeutics. The statistical random-walk nature of photon propagation through turbid media is ideal for computational based modeling and simulation. Ready access to super computing resources provide a means for attaining brute force solutions to stochastic light-matter interactions entailing scattering by facilitating timely propagation of sufficient (>107) photons while tracking characteristic parameters based on the incorporated physics of the problem. One such model that works well for isotropic but fails for anisotropic scatter, which is the case for many biomedical sample scattering problems, is the diffusion approximation. In this report, we address this by utilizing Berry phase (BP) evolution as a means for capturing anisotropic scattering characteristics of samples in the preceding depth where the diffusion approximation fails. We extend the polarization sensitive Monte Carlo method of Ramella-Roman, et al., to include the computationally intensive tracking of photon trajectory in addition to polarization state at every scattering event. To speed-up the computations, which entail the appropriate rotations of reference frames, the code was parallelized using OpenMP. The results presented reveal that BP is strongly correlated to the photon penetration depth, thus potentiating the possibility of polarimetric depth resolved characterization of highly scattering samples, e.g., biological tissues.

  3. Monte Carlo based investigation of Berry phase for depth resolved characterization of biomedical scattering samples

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Baba, Justin S; John, Dwayne O; Koju, Vijay

    The propagation of light in turbid media is an active area of research with relevance to numerous investigational fields, e.g., biomedical diagnostics and therapeutics. The statistical random-walk nature of photon propagation through turbid media is ideal for computational based modeling and simulation. Ready access to super computing resources provide a means for attaining brute force solutions to stochastic light-matter interactions entailing scattering by facilitating timely propagation of sufficient (>10million) photons while tracking characteristic parameters based on the incorporated physics of the problem. One such model that works well for isotropic but fails for anisotropic scatter, which is the case formore » many biomedical sample scattering problems, is the diffusion approximation. In this report, we address this by utilizing Berry phase (BP) evolution as a means for capturing anisotropic scattering characteristics of samples in the preceding depth where the diffusion approximation fails. We extend the polarization sensitive Monte Carlo method of Ramella-Roman, et al.,1 to include the computationally intensive tracking of photon trajectory in addition to polarization state at every scattering event. To speed-up the computations, which entail the appropriate rotations of reference frames, the code was parallelized using OpenMP. The results presented reveal that BP is strongly correlated to the photon penetration depth, thus potentiating the possibility of polarimetric depth resolved characterization of highly scattering samples, e.g., biological tissues.« less

  4. Investigation of obstacle effect to improve conjugate heat transfer in backward facing step channel using fast simulation of incompressible flow

    NASA Astrophysics Data System (ADS)

    Nouri-Borujerdi, Ali; Moazezi, Arash

    2018-01-01

    The current study investigates the conjugate heat transfer characteristics for laminar flow in backward facing step channel. All of the channel walls are insulated except the lower thick wall under a constant temperature. The upper wall includes a insulated obstacle perpendicular to flow direction. The effect of obstacle height and location on the fluid flow and heat transfer are numerically explored for the Reynolds number in the range of 10 ≤ Re ≤ 300. Incompressible Navier-Stokes and thermal energy equations are solved simultaneously in fluid region by the upwind compact finite difference scheme based on flux-difference splitting in conjunction with artificial compressibility method. In the thick wall, the energy equation is obtained by Laplace equation. A multi-block approach is used to perform parallel computing to reduce the CPU time. Each block is modeled separately by sharing boundary conditions with neighbors. The developed program for modeling was written in FORTRAN language with OpenMP API. The obtained results showed that using of the multi-block parallel computing method is a simple robust scheme with high performance and high-order accurate. Moreover, the obtained results demonstrated that the increment of Reynolds number and obstacle height as well as decrement of horizontal distance between the obstacle and the step improve the heat transfer.

  5. Massively parallel and linear-scaling algorithm for second-order Møller-Plesset perturbation theory applied to the study of supramolecular wires

    NASA Astrophysics Data System (ADS)

    Kjærgaard, Thomas; Baudin, Pablo; Bykov, Dmytro; Eriksen, Janus Juul; Ettenhuber, Patrick; Kristensen, Kasper; Larkin, Jeff; Liakh, Dmitry; Pawłowski, Filip; Vose, Aaron; Wang, Yang Min; Jørgensen, Poul

    2017-03-01

    We present a scalable cross-platform hybrid MPI/OpenMP/OpenACC implementation of the Divide-Expand-Consolidate (DEC) formalism with portable performance on heterogeneous HPC architectures. The Divide-Expand-Consolidate formalism is designed to reduce the steep computational scaling of conventional many-body methods employed in electronic structure theory to linear scaling, while providing a simple mechanism for controlling the error introduced by this approximation. Our massively parallel implementation of this general scheme has three levels of parallelism, being a hybrid of the loosely coupled task-based parallelization approach and the conventional MPI +X programming model, where X is either OpenMP or OpenACC. We demonstrate strong and weak scalability of this implementation on heterogeneous HPC systems, namely on the GPU-based Cray XK7 Titan supercomputer at the Oak Ridge National Laboratory. Using the "resolution of the identity second-order Møller-Plesset perturbation theory" (RI-MP2) as the physical model for simulating correlated electron motion, the linear-scaling DEC implementation is applied to 1-aza-adamantane-trione (AAT) supramolecular wires containing up to 40 monomers (2440 atoms, 6800 correlated electrons, 24 440 basis functions and 91 280 auxiliary functions). This represents the largest molecular system treated at the MP2 level of theory, demonstrating an efficient removal of the scaling wall pertinent to conventional quantum many-body methods.

  6. A Buffer Model of Memory Encoding and Temporal Correlations in Retrieval

    ERIC Educational Resources Information Center

    Lehman, Melissa; Malmberg, Kenneth J.

    2013-01-01

    Atkinson and Shiffrin's (1968) dual-store model of memory includes structural aspects of memory along with control processes. The rehearsal buffer is a process by which items are kept in mind and long-term episodic traces are formed. The model has been both influential and controversial. Here, we describe a novel variant of Atkinson and Shiffrin's…

  7. Slave to the Rhythm: Experimental Tests of a Model for Verbal Short-Term Memory and Long-Term Sequence Learning

    ERIC Educational Resources Information Center

    Hitch, Graham J.; Flude, Brenda; Burgess, Neil

    2009-01-01

    Three experiments tested predictions of a neural network model of phonological short-term memory that assumes separate representations for order and item information, order being coded via a context-timing signal [Burgess, N., & Hitch, G. J. (1999). Memory for serial order: A network model of the phonological loop and its timing. "Psychological…

  8. Avalanches and generalized memory associativity in a network model for conscious and unconscious mental functioning

    NASA Astrophysics Data System (ADS)

    Siddiqui, Maheen; Wedemann, Roseli S.; Jensen, Henrik Jeldtoft

    2018-01-01

    We explore statistical characteristics of avalanches associated with the dynamics of a complex-network model, where two modules corresponding to sensorial and symbolic memories interact, representing unconscious and conscious mental processes. The model illustrates Freud's ideas regarding the neuroses and that consciousness is related with symbolic and linguistic memory activity in the brain. It incorporates the Stariolo-Tsallis generalization of the Boltzmann Machine in order to model memory retrieval and associativity. In the present work, we define and measure avalanche size distributions during memory retrieval, in order to gain insight regarding basic aspects of the functioning of these complex networks. The avalanche sizes defined for our model should be related to the time consumed and also to the size of the neuronal region which is activated, during memory retrieval. This allows the qualitative comparison of the behaviour of the distribution of cluster sizes, obtained during fMRI measurements of the propagation of signals in the brain, with the distribution of avalanche sizes obtained in our simulation experiments. This comparison corroborates the indication that the Nonextensive Statistical Mechanics formalism may indeed be more well suited to model the complex networks which constitute brain and mental structure.

  9. Comparing soil moisture memory in satellite observations and models

    NASA Astrophysics Data System (ADS)

    Stacke, Tobias; Hagemann, Stefan; Loew, Alexander

    2013-04-01

    A major obstacle to a correct parametrization of soil processes in large scale global land surface models is the lack of long term soil moisture observations for large parts of the globe. Currently, a compilation of soil moisture data derived from a range of satellites is released by the ESA Climate Change Initiative (ECV_SM). Comprising the period from 1978 until 2010, it provides the opportunity to compute climatological relevant statistics on a quasi-global scale and to compare these to the output of climate models. Our study is focused on the investigation of soil moisture memory in satellite observations and models. As a proxy for memory we compute the autocorrelation length (ACL) of the available satellite data and the uppermost soil layer of the models. Additional to the ECV_SM data, AMSR-E soil moisture is used as observational estimate. Simulated soil moisture fields are taken from ERA-Interim reanalysis and generated with the land surface model JSBACH, which was driven with quasi-observational meteorological forcing data. The satellite data show ACLs between one week and one month for the greater part of the land surface while the models simulate a longer memory of up to two months. Some pattern are similar in models and observations, e.g. a longer memory in the Sahel Zone and the Arabian Peninsula, but the models are not able to reproduce regions with a very short ACL of just a few days. If the long term seasonality is subtracted from the data the memory is strongly shortened, indicating the importance of seasonal variations for the memory in most regions. Furthermore, we analyze the change of soil moisture memory in the different soil layers of the models to investigate to which extent the surface soil moisture includes information about the whole soil column. A first analysis reveals that the ACL is increasing for deeper layers. However, its increase is stronger in the soil moisture anomaly than in its absolute values and the first even exceeds the latter in the deepest layer. From this we conclude that the seasonal soil moisture variations dominate the memory close to the surface but these are dampened in lower layers where the memory is mainly affected by longer term variations.

  10. Are the neural substrates of memory the final common pathway in posttraumatic stress disorder (PTSD)?

    PubMed

    Elzinga, B M; Bremner, J D

    2002-06-01

    A model for the posttraumatic stress disorder (PTSD) as a disorder of memory is presented drawing both on psychological and neurobiological data. Evidence on intrusive memories and deficits in declarative memory function in PTSD-patients is reviewed in relation to three brain areas that are involved in memory functioning and the stress response: the hippocampus, amygdala, and the prefrontal cortex. Neurobiological studies have shown that the noradrenergic stress-system is involved in enhanced encoding of emotional memories, sensitization, and fear conditioning, by way of its effects on the amygdala. Chronic stress also affects the hippocampus, a brain area involved in declarative memories, suggesting that hippocampal dysfunction may partly account for the deficits in declarative memory in PTSD-patients. Deficits in the medial prefrontal cortex, a structure that normally inhibits the amygdala, may further enhance the effects of the amygdala, thereby increasing the frequency and intensity of the traumatic memories. Thus, by way of its influence on these brain structures, exposure to severe stress may simultaneously result in strong emotional reactions and in difficulties to recall the emotional event. This model is also relevant for understanding the distinction between declarative and non-declarative memory-functions in processing trauma-related information in PTSD. Implications of our model are reviewed.

  11. Are the neural substrates of memory the final common pathway in posttraumatic stress disorder (PTSD)?

    PubMed Central

    Elzinga, B.M.; Bremner, J.D.

    2017-01-01

    A model for the posttraumatic stress disorder (PTSD) as a disorder of memory is presented drawing both on psychological and neurobiological data. Evidence on intrusive memories and deficits in declarative memory function in PTSD-patients is reviewed in relation to three brain areas that are involved in memory functioning and the stress response: the hippocampus, amygdala, and the prefrontal cortex. Neurobiological studies have shown that the noradrenergic stress-system is involved in enhanced encoding of emotional memories, sensitization, and fear conditioning, by way of its effects on the amygdala. Chronic stress also affects the hippocampus, a brain area involved in declarative memories, suggesting that hippocampal dysfunction may partly account for the deficits in declarative memory in PTSD-patients. Deficits in the medial prefrontal cortex, a structure that normally inhibits the amygdala, may further enhance the effects of the amygdala, thereby increasing the frequency and intensity of the traumatic memories. Thus, by way of its influence on these brain structures, exposure to severe stress may simultaneously result in strong emotional reactions and in difficulties to recall the emotional event. This model is also relevant for understanding the distinction between declarative and non-declarative memory-functions in processing trauma-related information in PTSD. Implications of our model are reviewed. PMID:12113915

  12. Working memory for braille is shaped by experience

    PubMed Central

    Scherzer, Peter; Viau, Robert; Voss, Patrice; Lepore, Franco

    2011-01-01

    Tactile working memory was found to be more developed in completely blind (congenital and acquired) than in semi-sighted subjects, indicating that experience plays a crucial role in shaping working memory. A model of working memory, adapted from the classical model proposed by Baddeley and Hitch1 and Baddeley2 is presented where the connection strengths of a highly cross-modal network are altered through experience. PMID:21655448

  13. Recollection is a continuous process: Evidence from plurality memory receiver operating characteristics.

    PubMed

    Slotnick, Scott D; Jeye, Brittany M; Dodson, Chad S

    2016-01-01

    Is recollection a continuous/graded process or a threshold/all-or-none process? Receiver operating characteristic (ROC) analysis can answer this question as the continuous model and the threshold model predict curved and linear recollection ROCs, respectively. As memory for plurality, an item's previous singular or plural form, is assumed to rely on recollection, the nature of recollection can be investigated by evaluating plurality memory ROCs. The present study consisted of four experiments. During encoding, words (singular or plural) or objects (single/singular or duplicate/plural) were presented. During retrieval, old items with the same plurality or different plurality were presented. For each item, participants made a confidence rating ranging from "very sure old", which was correct for same plurality items, to "very sure new", which was correct for different plurality items. Each plurality memory ROC was the proportion of same versus different plurality items classified as "old" (i.e., hits versus false alarms). Chi-squared analysis revealed that all of the plurality memory ROCs were adequately fit by the continuous unequal variance model, whereas none of the ROCs were adequately fit by the two-high threshold model. These plurality memory ROC results indicate recollection is a continuous process, which complements previous source memory and associative memory ROC findings.

  14. Sleep and memory in the making. Are current concepts sufficient in children?

    PubMed

    Peigneux, P

    2014-01-01

    Memory consolidation is an active process wired in brain plasticity. How plasticity mechanisms develop and are modulated after learning is at the core of current models of sleep-dependent memory consolidation. Nowadays, two main classes of sleep-related memory consolidation theories are proposed, namely system consolidation and synaptic homeostasis. However, novel models of consolidation emerge, that might better account for the highly dynamic and interactive processes of encoding and memory consolidation. Processing steps can take place at various temporal phases and be modulated by interactions with prior experiences and ongoing events. In this perspective, sleep might support (or not) memory consolidation processes under specific neurophysiological and environmental circumstances leading to enduring representations in long-term memory stores. We outline here a discussion about how current and emergent models account for the complexity and apparent inconsistency of empirical data. Additionally, models aimed at understanding neurophysiological and/or cognitive processes should not only provide a satisfactory explanation for the phenomena at stake, but also account for their ontogeny and the conditions that disrupt their organisation. Looking at the available literature, this developmental condition appears to remain unfulfilled when trying to understand the relationships between sleep, learning and memory consolidation processes, both in healthy children and in children with pathological conditions.

  15. Time Series Model Identification and Prediction Variance Horizon.

    DTIC Science & Technology

    1980-06-01

    stationary time series Y(t). -6- In terms of p(v), the definition of the three time series memory types is: No Memory Short Memory Long Memory X IP (v)I 0 0...X lp(v)l < - I IP (v) = v=1 v=l v=l Within short memory time series there are three types whose classification in terms of correlation functions is...1974) "Some Recent Advances in Time Series Modeling", TEEE Transactions on Automatic ControZ, VoZ . AC-19, No. 6, December, 723-730. Parzen, E. (1976) "An

  16. Human learning and memory.

    PubMed

    Johnson, M K; Hasher, L

    1987-01-01

    There have been several notable recent trends in the area of learning and memory. Problems with the episodic/semantic distinction have become more apparent, and new efforts have been made (exemplar models, distributed-memory models) to represent general knowledge without assuming a separate semantic system. Less emphasis is being placed on stable, prestored prototypes and more emphasis on a flexible memory system that provides the basis for a multitude of categories or frames of reference, derived on the spot as tasks demand. There is increasing acceptance of the idea that mental models are constructed and stored in memory in addition to, rather than instead of, memorial representations that are more closely tied to perceptions. This gives rise to questions concerning the conditions that permit inferences to be drawn and mental models to be constructed, and to questions concerning the similarities and differences in the nature of the representations in memory of perceived and generated information and in their functions. There has also been a swing from interest in deliberate strategies to interest in automatic, unconscious (even mechanistic!) processes, reflecting an appreciation that certain situations (e.g. recognition, frequency judgements, savings in indirect tasks, aspects of skill acquisition, etc) seem not to depend much on the products of strategic, effortful or reflective processes. There is a lively interest in relations among memory measures and attempts to characterize memory representations and/or processes that could give rise to dissociations among measures. Whether the pattern of results reflects the operation of functional subsystems of memory and, if so, what the "modules" are is far from clear. This issue has been fueled by work with amnesics and has contributed to a revival of interaction between researchers studying learning and memory in humans and those studying learning and memory in animals. Thus, neuroscience rivals computer science as a source of interdisciplinary stimulation. Research on topics such as memory for spatial location, the relation between memory and affect, and autobiographical memory reminds us that general theories of memory based on studies of verbal materials alone are limited. Investigating how people remember complex natural events should provide us with a larger set of memory phenomena to explain and consequently insight into a wider range of memory principles or a deeper understanding of the ones we already accept (e.g. the role of repetition, encoding specificity), including their functional significance for human behavior.(ABSTRACT TRUNCATED AT 400 WORDS)

  17. Social Models Enhance Apes' Memory for Novel Events.

    PubMed

    Howard, Lauren H; Wagner, Katherine E; Woodward, Amanda L; Ross, Stephen R; Hopper, Lydia M

    2017-01-20

    Nonhuman primates are more likely to learn from the actions of a social model than a non-social "ghost display", however the mechanism underlying this effect is still unknown. One possibility is that live models are more engaging, drawing increased attention to social stimuli. However, recent research with humans has suggested that live models fundamentally alter memory, not low-level attention. In the current study, we developed a novel eye-tracking paradigm to disentangle the influence of social context on attention and memory in apes. Tested in two conditions, zoo-housed apes (2 gorillas, 5 chimpanzees) were familiarized to videos of a human hand (social condition) and mechanical claw (non-social condition) constructing a three-block tower. During the memory test, subjects viewed side-by-side pictures of the previously-constructed block tower and a novel block tower. In accordance with looking-time paradigms, increased looking time to the novel block tower was used to measure event memory. Apes evidenced memory for the event featuring a social model, though not for the non-social condition. This effect was not dependent on attention differences to the videos. These findings provide the first evidence that, like humans, social stimuli increase nonhuman primates' event memory, which may aid in information transmission via social learning.

  18. Social Models Enhance Apes’ Memory for Novel Events

    PubMed Central

    Howard, Lauren H.; Wagner, Katherine E.; Woodward, Amanda L.; Ross, Stephen R.; Hopper, Lydia M.

    2017-01-01

    Nonhuman primates are more likely to learn from the actions of a social model than a non-social “ghost display”, however the mechanism underlying this effect is still unknown. One possibility is that live models are more engaging, drawing increased attention to social stimuli. However, recent research with humans has suggested that live models fundamentally alter memory, not low-level attention. In the current study, we developed a novel eye-tracking paradigm to disentangle the influence of social context on attention and memory in apes. Tested in two conditions, zoo-housed apes (2 gorillas, 5 chimpanzees) were familiarized to videos of a human hand (social condition) and mechanical claw (non-social condition) constructing a three-block tower. During the memory test, subjects viewed side-by-side pictures of the previously-constructed block tower and a novel block tower. In accordance with looking-time paradigms, increased looking time to the novel block tower was used to measure event memory. Apes evidenced memory for the event featuring a social model, though not for the non-social condition. This effect was not dependent on attention differences to the videos. These findings provide the first evidence that, like humans, social stimuli increase nonhuman primates’ event memory, which may aid in information transmission via social learning. PMID:28106098

  19. A single-trace dual-process model of episodic memory: a novel computational account of familiarity and recollection.

    PubMed

    Greve, Andrea; Donaldson, David I; van Rossum, Mark C W

    2010-02-01

    Dual-process theories of episodic memory state that retrieval is contingent on two independent processes: familiarity (providing a sense of oldness) and recollection (recovering events and their context). A variety of studies have reported distinct neural signatures for familiarity and recollection, supporting dual-process theory. One outstanding question is whether these signatures reflect the activation of distinct memory traces or the operation of different retrieval mechanisms on a single memory trace. We present a computational model that uses a single neuronal network to store memory traces, but two distinct and independent retrieval processes access the memory. The model is capable of performing familiarity and recollection-based discrimination between old and new patterns, demonstrating that dual-process models need not to rely on multiple independent memory traces, but can use a single trace. Importantly, our putative familiarity and recollection processes exhibit distinct characteristics analogous to those found in empirical data; they diverge in capacity and sensitivity to sparse and correlated patterns, exhibit distinct ROC curves, and account for performance on both item and associative recognition tests. The demonstration that a single-trace, dual-process model can account for a range of empirical findings highlights the importance of distinguishing between neuronal processes and the neuronal representations on which they operate.

  20. From Brown-Peterson to continual distractor via operation span: A SIMPLE account of complex span.

    PubMed

    Neath, Ian; VanWormer, Lisa A; Bireta, Tamra J; Surprenant, Aimée M

    2014-09-01

    Three memory tasks-Brown-Peterson, complex span, and continual distractor-all alternate presentation of a to-be-remembered item and a distractor activity, but each task is associated with a different memory system, short-term memory, working memory, and long-term memory, respectively. SIMPLE, a relative local distinctiveness model, has previously been fit to data from both the Brown-Peterson and continual distractor tasks; here we use the same version of the model to fit data from a complex span task. Despite the many differences between the tasks, including unpredictable list length, SIMPLE fit the data well. Because SIMPLE posits a single memory system, these results constitute yet another demonstration that performance on tasks originally thought to tap different memory systems can be explained without invoking multiple memory systems.

  1. Median Hetero-Associative Memories Applied to the Categorization of True-Color Patterns

    NASA Astrophysics Data System (ADS)

    Vázquez, Roberto A.; Sossa, Humberto

    Median associative memories (MED-AMs) are a special type of associative memory based on the median operator. This type of associative model has been applied to the restoration of gray scale images and provides better performance than other models, such as morphological associative memories, when the patterns are altered with mixed noise. Despite of his power, MED-AMs have not been applied in problems involving true-color patterns. In this paper we describe how a median hetero-associative memory (MED-HAM) could be applied in problems that involve true-color patterns. A complete study of the behavior of this associative model in the restoration of true-color images is performed using a benchmark of 14400 images altered by different type of noises. Furthermore, we describe how this model can be applied to an image categorization problem.

  2. An Investigation of Unified Memory Access Performance in CUDA

    PubMed Central

    Landaverde, Raphael; Zhang, Tiansheng; Coskun, Ayse K.; Herbordt, Martin

    2015-01-01

    Managing memory between the CPU and GPU is a major challenge in GPU computing. A programming model, Unified Memory Access (UMA), has been recently introduced by Nvidia to simplify the complexities of memory management while claiming good overall performance. In this paper, we investigate this programming model and evaluate its performance and programming model simplifications based on our experimental results. We find that beyond on-demand data transfers to the CPU, the GPU is also able to request subsets of data it requires on demand. This feature allows UMA to outperform full data transfer methods for certain parallel applications and small data sizes. We also find, however, that for the majority of applications and memory access patterns, the performance overheads associated with UMA are significant, while the simplifications to the programming model restrict flexibility for adding future optimizations. PMID:26594668

  3. Colored noise and memory effects on formal spiking neuron models

    NASA Astrophysics Data System (ADS)

    da Silva, L. A.; Vilela, R. D.

    2015-06-01

    Simplified neuronal models capture the essence of the electrical activity of a generic neuron, besides being more interesting from the computational point of view when compared to higher-dimensional models such as the Hodgkin-Huxley one. In this work, we propose a generalized resonate-and-fire model described by a generalized Langevin equation that takes into account memory effects and colored noise. We perform a comprehensive numerical analysis to study the dynamics and the point process statistics of the proposed model, highlighting interesting new features such as (i) nonmonotonic behavior (emergence of peak structures, enhanced by the choice of colored noise characteristic time scale) of the coefficient of variation (CV) as a function of memory characteristic time scale, (ii) colored noise-induced shift in the CV, and (iii) emergence and suppression of multimodality in the interspike interval (ISI) distribution due to memory-induced subthreshold oscillations. Moreover, in the noise-induced spike regime, we study how memory and colored noise affect the coherence resonance (CR) phenomenon. We found that for sufficiently long memory, not only is CR suppressed but also the minimum of the CV-versus-noise intensity curve that characterizes the presence of CR may be replaced by a maximum. The aforementioned features allow to interpret the interplay between memory and colored noise as an effective control mechanism to neuronal variability. Since both variability and nontrivial temporal patterns in the ISI distribution are ubiquitous in biological cells, we hope the present model can be useful in modeling real aspects of neurons.

  4. The cognitive processes underlying event-based prospective memory in school-age children and young adults: a formal model-based study.

    PubMed

    Smith, Rebekah E; Bayen, Ute J; Martin, Claudia

    2010-01-01

    Fifty children 7 years of age (29 girls, 21 boys), 53 children 10 years of age (29 girls, 24 boys), and 36 young adults (19 women, 17 men) performed a computerized event-based prospective memory task. All 3 groups differed significantly in prospective memory performance, with adults showing the best performance and with 7-year-olds showing the poorest performance. We used a formal multinomial process tree model of event-based prospective memory to decompose age differences in cognitive processes that jointly contribute to prospective memory performance. The formal modeling results demonstrate that adults differed significantly from the 7-year-olds and the 10-year-olds on both the prospective component and the retrospective component of the task. The 7-year-olds and the 10-year-olds differed only in the ability to recognize prospective memory target events. The prospective memory task imposed a cost to ongoing activities in all 3 age groups. Copyright 2009 APA, all rights reserved.

  5. Olfactory memory in the old and very old: relations to episodic and semantic memory and APOE genotype.

    PubMed

    Larsson, Maria; Hedner, Margareta; Papenberg, Goran; Seubert, Janina; Bäckman, Lars; Laukka, Erika J

    2016-02-01

    The neuroanatomical organization that underlies olfactory memory is different from that of other memory types. The present work examines olfactory memory in an elderly population-based sample (Swedish National Study on Aging and Care in Kungsholmen) aged 60-100 years (n = 2280). We used structural equation modeling to investigate whether olfactory memory in old age is best conceptualized as a distinct category, differentiated from episodic and semantic memory. Further, potential olfactory dedifferentiation and genetic associations (APOE) to olfactory function in late senescence were investigated. Results are in support of a 3-factor solution where olfactory memory, as indexed by episodic odor recognition and odor identification, is modeled separately from episodic and semantic memory for visual and verbal information. Increasing age was associated with poorer olfactory memory performance, and observed age-related deficits were further exacerbated for carriers of the APOE ε4 allele; these effects tended to be larger for olfactory memory compared to episodic and semantic memory pertaining to other sensory systems (vision, auditory). Finally, stronger correlations between olfactory and episodic memory, indicating dedifferentiation, were observed in the older age groups. Copyright © 2016 Elsevier Inc. All rights reserved.

  6. Parental influences on memories of parents and friends.

    PubMed

    Tani, Franca; Bonechi, Alice; Peterson, Carole; Smorti, Andrea

    2010-01-01

    The authors evaluated the role parent-child relationship quality has on two types of memories, those of parents and those of friends. Participants were 198 Italian university students who recalled memories during 4 separate timed memory-fluency tasks about their preschool, elementary school, middle school, high school and university years. Half were instructed to recall memories involving parents and the remainder memories involving friends. Moreover, parent-child relationships were assessed by the Network of Relationships Inventory (NRI; W. Furman & D. Buhrmester, 1985) and Adolescents' Report of Parental Monitoring (D. M. Capaldi & G. R. Patterson, 1989). Results showed that men with positive parent-son relationships had more memories of parents and more affectively positive memories of friends, supporting a consistency model positing similarity between parent-child relationships and memories of friends. Women with positive parental relationship quality had more affectively positive memories of parents but for friends, positive relationship quality only predicted positive memories when young. At older ages, especially middle school-aged children, negative parent-daughter relationships predicted more positive memories of friends, supporting a compensatory model. The gender of parent also mattered, with fathers having a more influential role on affect for memories of friends.

  7. Limited capacity of working memory in unihemispheric random walks implies conceivable slow dispersal.

    PubMed

    Wei, Kun; Zhong, Suchuan

    2017-08-01

    Phenomenologically inspired by dolphins' unihemispheric sleep, we introduce a minimal model for random walks with physiological memory. The physiological memory consists of long-term memory which includes unconscious implicit memory and conscious explicit memory, and working memory which serves as a multi-component system for integrating, manipulating and managing short-term storage. The model assumes that the sleeping state allows retrievals of episodic objects merely from the episodic buffer where these memory objects are invoked corresponding to the ambient objects and are thus object-oriented, together with intermittent but increasing use of implicit memory in which decisions are unconsciously picked up from historical time series. The process of memory decay and forgetting is constructed in the episodic buffer. The walker's risk attitude, as a product of physiological heuristics according to the performance of objected-oriented decisions, is imposed on implicit memory. The analytical results of unihemispheric random walks with the mixture of object-oriented and time-oriented memory, as well as the long-time behavior which tends to the use of implicit memory, are provided, indicating the common sense that a conservative risk attitude is inclinable to slow movement.

  8. Reasoning=working Memory<>attention

    ERIC Educational Resources Information Center

    Buehner, M.; Krumm, S.; Pick, M.

    2005-01-01

    The purpose of this study was to clarify the relationship between attention, components of working memory, and reasoning. Therefore, twenty working memory tests, two attention tests, and nine intelligence subtests were administered to 135 students. Using structural equation modeling, we were able to replicate a functional model of working memory…

  9. Fast 3D Surface Extraction 2 pages (including abstract)

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Sewell, Christopher Meyer; Patchett, John M.; Ahrens, James P.

    Ocean scientists searching for isosurfaces and/or thresholds of interest in high resolution 3D datasets required a tedious and time-consuming interactive exploration experience. PISTON research and development activities are enabling ocean scientists to rapidly and interactively explore isosurfaces and thresholds in their large data sets using a simple slider with real time calculation and visualization of these features. Ocean Scientists can now visualize more features in less time, helping them gain a better understanding of the high resolution data sets they work with on a daily basis. Isosurface timings (512{sup 3} grid): VTK 7.7 s, Parallel VTK (48-core) 1.3 s, PISTONmore » OpenMP (48-core) 0.2 s, PISTON CUDA (Quadro 6000) 0.1 s.« less

  10. Seeing the Wood for the Trees: Applying the dual-memory system model to investigate expert teachers' observational skills in natural ecological learning environments

    NASA Astrophysics Data System (ADS)

    Stolpe, Karin; Björklund, Lars

    2012-01-01

    This study aims to investigate two expert ecology teachers' ability to attend to essential details in a complex environment during a field excursion, as well as how they teach this ability to their students. In applying a cognitive dual-memory system model for learning, we also suggest a rationale for their behaviour. The model implies two separate memory systems: the implicit, non-conscious, non-declarative system and the explicit, conscious, declarative system. This model provided the starting point for the research design. However, it was revised from the empirical findings supported by new theoretical insights. The teachers were video and audio recorded during their excursion and interviewed in a stimulated recall setting afterwards. The data were qualitatively analysed using the dual-memory system model. The results show that the teachers used holistic pattern recognition in their own identification of natural objects. However, teachers' main strategy to teach this ability is to give the students explicit rules or specific characteristics. According to the dual-memory system model the holistic pattern recognition is processed in the implicit memory system as a non-conscious match with earlier experienced situations. We suggest that this implicit pattern matching serves as an explanation for teachers' ecological and teaching observational skills. Another function of the implicit memory system is its ability to control automatic behaviour and non-conscious decision-making. The teachers offer the students firsthand sensory experiences which provide a prerequisite for the formation of implicit memories that provides a foundation for expertise.

  11. A Latent Variable Analysis of Working Memory Capacity, Short-Term Memory Capacity, Processing Speed, and General Fluid Intelligence.

    ERIC Educational Resources Information Center

    Conway, Andrew R. A.; Cowan, Nelsin; Bunting, Michael F.; Therriault, David J.; Minkoff, Scott R. B.

    2002-01-01

    Studied the interrelationships among general fluid intelligence, short-term memory capacity, working memory capacity, and processing speed in 120 young adults and used structural equation modeling to determine the best predictor of general fluid intelligence. Results suggest that working memory capacity, but not short-term memory capacity or…

  12. Multiple Memory Systems Are Unnecessary to Account for Infant Memory Development: An Ecological Model

    ERIC Educational Resources Information Center

    Rovee-Collier, Carolyn; Cuevas, Kimberly

    2009-01-01

    How the memory of adults evolves from the memory abilities of infants is a central problem in cognitive development. The popular solution holds that the multiple memory systems of adults mature at different rates during infancy. The "early-maturing system" (implicit or nondeclarative memory) functions automatically from birth, whereas the…

  13. Human short-term spatial memory: precision predicts capacity.

    PubMed

    Banta Lavenex, Pamela; Boujon, Valérie; Ndarugendamwo, Angélique; Lavenex, Pierre

    2015-03-01

    Here, we aimed to determine the capacity of human short-term memory for allocentric spatial information in a real-world setting. Young adults were tested on their ability to learn, on a trial-unique basis, and remember over a 1-min interval the location(s) of 1, 3, 5, or 7 illuminating pads, among 23 pads distributed in a 4m×4m arena surrounded by curtains on three sides. Participants had to walk to and touch the pads with their foot to illuminate the goal locations. In contrast to the predictions from classical slot models of working memory capacity limited to a fixed number of items, i.e., Miller's magical number 7 or Cowan's magical number 4, we found that the number of visited locations to find the goals was consistently about 1.6 times the number of goals, whereas the number of correct choices before erring and the number of errorless trials varied with memory load even when memory load was below the hypothetical memory capacity. In contrast to resource models of visual working memory, we found no evidence that memory resources were evenly distributed among unlimited numbers of items to be remembered. Instead, we found that memory for even one individual location was imprecise, and that memory performance for one location could be used to predict memory performance for multiple locations. Our findings are consistent with a theoretical model suggesting that the precision of the memory for individual locations might determine the capacity of human short-term memory for spatial information. Copyright © 2015 Elsevier Inc. All rights reserved.

  14. Recurrent Neural Networks With Auxiliary Memory Units.

    PubMed

    Wang, Jianyong; Zhang, Lei; Guo, Quan; Yi, Zhang

    2018-05-01

    Memory is one of the most important mechanisms in recurrent neural networks (RNNs) learning. It plays a crucial role in practical applications, such as sequence learning. With a good memory mechanism, long term history can be fused with current information, and can thus improve RNNs learning. Developing a suitable memory mechanism is always desirable in the field of RNNs. This paper proposes a novel memory mechanism for RNNs. The main contributions of this paper are: 1) an auxiliary memory unit (AMU) is proposed, which results in a new special RNN model (AMU-RNN), separating the memory and output explicitly and 2) an efficient learning algorithm is developed by employing the technique of error flow truncation. The proposed AMU-RNN model, together with the developed learning algorithm, can learn and maintain stable memory over a long time range. This method overcomes both the learning conflict problem and gradient vanishing problem. Unlike the traditional method, which mixes the memory and output with a single neuron in a recurrent unit, the AMU provides an auxiliary memory neuron to maintain memory in particular. By separating the memory and output in a recurrent unit, the problem of learning conflicts can be eliminated easily. Moreover, by using the technique of error flow truncation, each auxiliary memory neuron ensures constant error flow during the learning process. The experiments demonstrate good performance of the proposed AMU-RNNs and the developed learning algorithm. The method exhibits quite efficient learning performance with stable convergence in the AMU-RNN learning and outperforms the state-of-the-art RNN models in sequence generation and sequence classification tasks.

  15. The relationships between memory systems and sleep stages.

    PubMed

    Rauchs, Géraldine; Desgranges, Béatrice; Foret, Jean; Eustache, Francis

    2005-06-01

    Sleep function remains elusive despite our rapidly increasing comprehension of the processes generating and maintaining the different sleep stages. Several lines of evidence support the hypothesis that sleep is involved in the off-line reprocessing of recently-acquired memories. In this review, we summarize the main results obtained in the field of sleep and memory consolidation in both animals and humans, and try to connect sleep stages with the different memory systems. To this end, we have collated data obtained using several methodological approaches, including electrophysiological recordings of neuronal ensembles, post-training modifications of sleep architecture, sleep deprivation and functional neuroimaging studies. Broadly speaking, all the various studies emphasize the fact that the four long-term memory systems (procedural memory, perceptual representation system, semantic and episodic memory, according to Tulving's SPI model; Tulving, 1995) benefit either from non-rapid eye movement (NREM) (not just SWS) or rapid eye movement (REM) sleep, or from both sleep stages. Tulving's classification of memory systems appears more pertinent than the declarative/non-declarative dichotomy when it comes to understanding the role of sleep in memory. Indeed, this model allows us to resolve several contradictions, notably the fact that episodic and semantic memory (the two memory systems encompassed in declarative memory) appear to rely on different sleep stages. Likewise, this model provides an explanation for why the acquisition of various types of skills (perceptual-motor, sensory-perceptual and cognitive skills) and priming effects, subserved by different brain structures but all designated by the generic term of implicit or non-declarative memory, may not benefit from the same sleep stages.

  16. Examining Object Location and Object Recognition Memory in Mice

    PubMed Central

    Vogel-Ciernia, Annie; Wood, Marcelo A.

    2014-01-01

    Unit Introduction The ability to store and recall our life experiences defines a person's identity. Consequently, the loss of long-term memory is a particularly devastating part of a variety of cognitive disorders, diseases and injuries. There is a great need to develop therapeutics to treat memory disorders, and thus a variety of animal models and memory paradigms have been developed. Mouse models have been widely used both to study basic disease mechanisms and to evaluate potential drug targets for therapeutic development. The relative ease of genetic manipulation of Mus musculus has led to a wide variety of genetically altered mice that model cognitive disorders ranging from Alzheimer's disease to autism. Rodents, including mice, are particularly adept at encoding and remembering spatial relationships, and these long-term spatial memories are dependent on the medial temporal lobe of the brain. These brain regions are also some of the first and most heavily impacted in disorders of human memory including Alzheimer's disease. Consequently, some of the simplest and most commonly used tests of long-term memory in mice are those that examine memory for objects and spatial relationships. However, many of these tasks, such as Morris water maze and contextual fear conditioning, are dependent upon the encoding and retrieval of emotionally aversive and inherently stressful training events. While these types of memories are important, they do not reflect the typical day-to-day experiences or memories most commonly affected in human disease. In addition, stress hormone release alone can modulate memory and thus obscure or artificially enhance these types of tasks. To avoid these sorts of confounds, we and many others have utilized tasks testing animals’ memory for object location and novel object recognition. These tasks involve exploiting rodents’ innate preference for novelty, and are inherently not stressful. In this protocol we detail how memory for object location and object identity can be used to evaluate a wide variety of mouse models and treatments. PMID:25297693

  17. Effects of children's working memory capacity and processing speed on their sentence imitation performance.

    PubMed

    Poll, Gerard H; Miller, Carol A; Mainela-Arnold, Elina; Adams, Katharine Donnelly; Misra, Maya; Park, Ji Sook

    2013-01-01

    More limited working memory capacity and slower processing for language and cognitive tasks are characteristics of many children with language difficulties. Individual differences in processing speed have not consistently been found to predict language ability or severity of language impairment. There are conflicting views on whether working memory and processing speed are integrated or separable abilities. To evaluate four models for the relations of individual differences in children's processing speed and working memory capacity in sentence imitation. The models considered whether working memory and processing speed are integrated or separable, as well as the effect of the number of operations required per sentence. The role of working memory as a mediator of the effect of processing speed on sentence imitation was also evaluated. Forty-six children with varied language and reading abilities imitated sentences. Working memory was measured with the Competing Language Processing Task (CLPT), and processing speed was measured with a composite of truth-value judgment and rapid automatized naming tasks. Mixed-effects ordinal regression models evaluated the CLPT and processing speed as predictors of sentence imitation item scores. A single mediator model evaluated working memory as a mediator of the effect of processing speed on sentence imitation total scores. Working memory was a reliable predictor of sentence imitation accuracy, but processing speed predicted sentence imitation only as a component of a processing speed by number of operations interaction. Processing speed predicted working memory capacity, and there was evidence that working memory acted as a mediator of the effect of processing speed on sentence imitation accuracy. The findings support a refined view of working memory and processing speed as separable factors in children's sentence imitation performance. Processing speed does not independently explain sentence imitation accuracy for all sentence types, but contributes when the task requires more mental operations. Processing speed also has an indirect effect on sentence imitation by contributing to working memory capacity. © 2013 Royal College of Speech and Language Therapists.

  18. Effects of ketamine, dexmedetomidine and propofol anesthesia on emotional memory consolidation in rats: Consequences for the development of post-traumatic stress disorder.

    PubMed

    Morena, Maria; Berardi, Andrea; Peloso, Andrea; Valeri, Daniela; Palmery, Maura; Trezza, Viviana; Schelling, Gustav; Campolongo, Patrizia

    2017-06-30

    Intensive Care Unit (ICU) or emergency care patients, exposed to traumatic events, are at increased risk for Post-Traumatic Stress Disorder (PTSD) development. Commonly used sedative/anesthetic agents can interfere with the mechanisms of memory formation, exacerbating or attenuating the memory for the traumatic event, and subsequently promote or reduce the risk of PTSD development. Here, we evaluated the effects of ketamine, dexmedetomidine and propofol on fear memory consolidation and subsequent cognitive and emotional alterations related to traumatic stress exposure. Immediately following an inhibitory avoidance training, rats were intraperitoneally injected with ketamine (100-125mg/kg), dexmedetomidine (0.3-0.4mg/kg) or their vehicle and tested for 48h memory retention. Furthermore, the effects of ketamine (125mg/kg), dexmedetomidine (0.4mg/kg), propofol (300mg/kg) or their vehicle on long-term memory and social interaction were evaluated two weeks after drug injection in a rat PTSD model. Ketamine anesthesia increased memory retention without altering the traumatic memory strength in the PTSD model. However, ketamine induced a long-term reduction of social behavior. Conversely, dexmedetomidine markedly impaired memory retention, without affecting long-lasting cognitive or emotional behaviors in the PTSD model. We have previously shown that propofol anesthesia enhanced 48h memory retention. Here, we found that propofol induced an enduring traumatic memory enhancement and anxiogenic effects in the PTSD model. These findings provide new evidence for clinical studies showing that the use of ketamine or propofol anesthesia in emergency care and ICU might be more likely to promote the development of PTSD, while dexmedetomidine might have prophylactic effects. Copyright © 2017 Elsevier B.V. All rights reserved.

  19. A Self-Organizing Incremental Spatiotemporal Associative Memory Networks Model for Problems with Hidden State

    PubMed Central

    2016-01-01

    Identifying the hidden state is important for solving problems with hidden state. We prove any deterministic partially observable Markov decision processes (POMDP) can be represented by a minimal, looping hidden state transition model and propose a heuristic state transition model constructing algorithm. A new spatiotemporal associative memory network (STAMN) is proposed to realize the minimal, looping hidden state transition model. STAMN utilizes the neuroactivity decay to realize the short-term memory, connection weights between different nodes to represent long-term memory, presynaptic potentials, and synchronized activation mechanism to complete identifying and recalling simultaneously. Finally, we give the empirical illustrations of the STAMN and compare the performance of the STAMN model with that of other methods. PMID:27891146

  20. The MNESIS model: Memory systems and processes, identity and future thinking.

    PubMed

    Eustache, Francis; Viard, Armelle; Desgranges, Béatrice

    2016-07-01

    The Memory NEo-Structural Inter-Systemic model (MNESIS; Eustache and Desgranges, Neuropsychology Review, 2008) is a macromodel based on neuropsychological data which presents an interactive construction of memory systems and processes. Largely inspired by Tulving's SPI model, MNESIS puts the emphasis on the existence of different memory systems in humans and their reciprocal relations, adding new aspects, such as the episodic buffer proposed by Baddeley. The more integrative comprehension of brain dynamics offered by neuroimaging has contributed to rethinking the existence of memory systems. In the present article, we will argue that understanding the concept of memory by dividing it into systems at the functional level is still valid, but needs to be considered in the light of brain imaging. Here, we reinstate the importance of this division in different memory systems and illustrate, with neuroimaging findings, the links that operate between memory systems in response to task demands that constrain the brain dynamics. During a cognitive task, these memory systems interact transiently to rapidly assemble representations and mobilize functions to propose a flexible and adaptative response. We will concentrate on two memory systems, episodic and semantic memory, and their links with autobiographical memory. More precisely, we will focus on interactions between episodic and semantic memory systems in support of 1) self-identity in healthy aging and in brain pathologies and 2) the concept of the prospective brain during future projection. In conclusion, this MNESIS global framework may help to get a general representation of human memory and its brain implementation with its specific components which are in constant interaction during cognitive processes. Copyright © 2016 Elsevier Ltd. All rights reserved.

  1. Inferring Soil Moisture Memory from Streamflow Observations Using a Simple Water Balance Model

    NASA Technical Reports Server (NTRS)

    Orth, Rene; Koster, Randal Dean; Seneviratne, Sonia I.

    2013-01-01

    Soil moisture is known for its integrative behavior and resulting memory characteristics. Soil moisture anomalies can persist for weeks or even months into the future, making initial soil moisture a potentially important contributor to skill in weather forecasting. A major difficulty when investigating soil moisture and its memory using observations is the sparse availability of long-term measurements and their limited spatial representativeness. In contrast, there is an abundance of long-term streamflow measurements for catchments of various sizes across the world. We investigate in this study whether such streamflow measurements can be used to infer and characterize soil moisture memory in respective catchments. Our approach uses a simple water balance model in which evapotranspiration and runoff ratios are expressed as simple functions of soil moisture; optimized functions for the model are determined using streamflow observations, and the optimized model in turn provides information on soil moisture memory on the catchment scale. The validity of the approach is demonstrated with data from three heavily monitored catchments. The approach is then applied to streamflow data in several small catchments across Switzerland to obtain a spatially distributed description of soil moisture memory and to show how memory varies, for example, with altitude and topography.

  2. Interaction between emotion and memory: importance of mammillary bodies damage in a mouse model of the alcoholic Korsakoff syndrome.

    PubMed

    Béracochéa, Daniel

    2005-01-01

    Chronic alcohol consumption (CAC) can lead to the Korsakoff syndrome (KS), a memory deficiency attributed to diencephalic damage and/or to medial temporal or cortical related dysfunction. The etiology of KS remains unclear. Most animal models of KS involve thiamine-deficient diets associated with pyrithiamine treatment. Here we present a mouse model of CAC-induced KS. We demonstrate that CAC-generated retrieval memory deficits in working/ episodic memory tasks, together with a reduction of fear reactivity, result from damage to the mammillary bodies (MB). Experimental lesions of MB in non-alcoholic mice produced the same memory and emotional impairments. Drugs having anxiogenic-like properties counteract such impairments produced by CAC or by MB lesions. We suggest (a) that MB are the essential components of a brain network underlying emotional processes, which would be critically important in the retrieval processes involved in working/ episodic memory tasks, and (b) that failure to maintain emotional arousal due to MB damage can be a main factor of CAC-induced memory deficits. Overall, our animal model fits well with general neuropsychological and anatomic impairments observed in KS.

  3. Interaction Between Emotion and Memory: Importance of Mammillary Bodies Damage in a Mouse Model of the Alcoholic Korsakoff Syndrome

    PubMed Central

    Béracochéa, Daniel

    2005-01-01

    Chronic alcohol consumption (CAC) can lead to the Korsakoff syndrome (KS), a memory deficiency attributed to diencephalie damage and/or to medial temporal or cortical related dysfunction. The etiology of KS remains unclear. Most animal models of KS involve thiaminedeficient diets associated with pyrithiamine treatment. Here we present a mouse model of CAC-induced KS. We demonstrate that CAC-generated retrieval memory deficits in working/ episodic memory tasks, together with a reduction of fear reactivity, result from damage to the mammillary bodies (MB). Experimental lesions of MB in non-alcoholic mice produced the same memory and emotional impairments. Drugs having anxiogenic-like properties counteract such impairments produced by CAC or by MB lesions. We suggest (a) that MB are the essential components of a brain network underlying emotional processes, which would be critically important in the retrieval processes involved in working/ episodic memory tasks, and (b) that failure to maintain emotional arousal due to MB damage can be a main factor of CAC-induced memory deficits. Overall, our animal model fits well with general neuropsychological and anatomic impairments observed in KS. PMID:16444899

  4. Balanced Cortical Microcircuitry for Spatial Working Memory Based on Corrective Feedback Control

    PubMed Central

    2014-01-01

    A hallmark of working memory is the ability to maintain graded representations of both the spatial location and amplitude of a memorized stimulus. Previous work has identified a neural correlate of spatial working memory in the persistent maintenance of spatially specific patterns of neural activity. How such activity is maintained by neocortical circuits remains unknown. Traditional models of working memory maintain analog representations of either the spatial location or the amplitude of a stimulus, but not both. Furthermore, although most previous models require local excitation and lateral inhibition to maintain spatially localized persistent activity stably, the substrate for lateral inhibitory feedback pathways is unclear. Here, we suggest an alternative model for spatial working memory that is capable of maintaining analog representations of both the spatial location and amplitude of a stimulus, and that does not rely on long-range feedback inhibition. The model consists of a functionally columnar network of recurrently connected excitatory and inhibitory neural populations. When excitation and inhibition are balanced in strength but offset in time, drifts in activity trigger spatially specific negative feedback that corrects memory decay. The resulting networks can temporally integrate inputs at any spatial location, are robust against many commonly considered perturbations in network parameters, and, when implemented in a spiking model, generate irregular neural firing characteristic of that observed experimentally during persistent activity. This work suggests balanced excitatory–inhibitory memory circuits implementing corrective negative feedback as a substrate for spatial working memory. PMID:24828633

  5. Incorporation of memory effects in coarse-grained modeling via the Mori-Zwanzig formalism

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Li, Zhen; Bian, Xin; Karniadakis, George Em, E-mail: george-karniadakis@brown.edu

    2015-12-28

    The Mori-Zwanzig formalism for coarse-graining a complex dynamical system typically introduces memory effects. The Markovian assumption of delta-correlated fluctuating forces is often employed to simplify the formulation of coarse-grained (CG) models and numerical implementations. However, when the time scales of a system are not clearly separated, the memory effects become strong and the Markovian assumption becomes inaccurate. To this end, we incorporate memory effects into CG modeling by preserving non-Markovian interactions between CG variables, and the memory kernel is evaluated directly from microscopic dynamics. For a specific example, molecular dynamics (MD) simulations of star polymer melts are performed while themore » corresponding CG system is defined by grouping many bonded atoms into single clusters. Then, the effective interactions between CG clusters as well as the memory kernel are obtained from the MD simulations. The constructed CG force field with a memory kernel leads to a non-Markovian dissipative particle dynamics (NM-DPD). Quantitative comparisons between the CG models with Markovian and non-Markovian approximations indicate that including the memory effects using NM-DPD yields similar results as the Markovian-based DPD if the system has clear time scale separation. However, for systems with small separation of time scales, NM-DPD can reproduce correct short-time properties that are related to how the system responds to high-frequency disturbances, which cannot be captured by the Markovian-based DPD model.« less

  6. Modeling Age-Related Differences in Immediate Memory Using SIMPLE

    ERIC Educational Resources Information Center

    Surprenant, Aimee M.; Neath, Ian; Brown, Gordon D. A.

    2006-01-01

    In the SIMPLE model (Scale Invariant Memory and Perceptual Learning), performance on memory tasks is determined by the locations of items in multidimensional space, and better performance is associated with having fewer close neighbors. Unlike most previous simulations with SIMPLE, the ones reported here used measured, rather than assumed,…

  7. Path Analysis Tests of Theoretical Models of Children's Memory Performance

    ERIC Educational Resources Information Center

    DeMarie, Darlene; Miller, Patricia H.; Ferron, John; Cunningham, Walter R.

    2004-01-01

    Path analysis was used to test theoretical models of relations among variables known to predict differences in children's memory--strategies, capacity, and metamemory. Children in kindergarten to fourth grade (chronological ages 5 to 11) performed different memory tasks. Several strategies (i.e., sorting, clustering, rehearsal, and self-testing)…

  8. Autobiographical Memory and Suicide Attempts in Schizophrenia

    ERIC Educational Resources Information Center

    Pettersen, Kenneth; Rydningen, Nora Nord; Christensen, Tore Buer; Walby, Fredrik A.

    2010-01-01

    According to the cry of pain model of suicidal behavior, an over-general autobiographical memory function is often found in suicide attempters. The model has received empirical support in several studies, mainly of depressed patients. The present study investigated whether deficits in autobiographical memory may be associated with an increased…

  9. An Approach toward the Development of a Functional Encoding Model of Short Term Memory during Reading.

    ERIC Educational Resources Information Center

    Herndon, Mary Anne

    1978-01-01

    In a model of the functioning of short term memory, the encoding of information for subsequent storage in long term memory is simulated. In the encoding process, semantically equivalent paragraphs are detected for recombination into a macro information unit. (HOD)

  10. Nonlinear analysis of an improved continuum model considering headway change with memory

    NASA Astrophysics Data System (ADS)

    Cheng, Rongjun; Wang, Jufeng; Ge, Hongxia; Li, Zhipeng

    2018-01-01

    Considering the effect of headway changes with memory, an improved continuum model of traffic flow is proposed in this paper. By means of linear stability theory, the new model’s linear stability with the effect of headway changes with memory is obtained. Through nonlinear analysis, the KdV-Burgers equation is derived to describe the propagating behavior of traffic density wave near the neutral stability line. Numerical simulation is carried out to study the improved traffic flow model, which explores how the headway changes with memory affected each car’s velocity, density and energy consumption. Numerical results show that when considering the effects of headway changes with memory, the traffic jams can be suppressed efficiently. Furthermore, research results demonstrate that the effect of headway changes with memory can avoid the disadvantage of historical information, which will improve the stability of traffic flow and minimize car energy consumption.

  11. Sparse distributed memory and related models

    NASA Technical Reports Server (NTRS)

    Kanerva, Pentti

    1992-01-01

    Described here is sparse distributed memory (SDM) as a neural-net associative memory. It is characterized by two weight matrices and by a large internal dimension - the number of hidden units is much larger than the number of input or output units. The first matrix, A, is fixed and possibly random, and the second matrix, C, is modifiable. The SDM is compared and contrasted to (1) computer memory, (2) correlation-matrix memory, (3) feet-forward artificial neural network, (4) cortex of the cerebellum, (5) Marr and Albus models of the cerebellum, and (6) Albus' cerebellar model arithmetic computer (CMAC). Several variations of the basic SDM design are discussed: the selected-coordinate and hyperplane designs of Jaeckel, the pseudorandom associative neural memory of Hassoun, and SDM with real-valued input variables by Prager and Fallside. SDM research conducted mainly at the Research Institute for Advanced Computer Science (RIACS) in 1986-1991 is highlighted.

  12. Positive and negative generation effects in source monitoring.

    PubMed

    Riefer, David M; Chien, Yuchin; Reimer, Jason F

    2007-10-01

    Research is mixed as to whether self-generation improves memory for the source of information. We propose the hypothesis that positive generation effects (better source memory for self-generated information) occur in reality-monitoring paradigms, while negative generation effects (better source memory for externally presented information) tend to occur in external source-monitoring paradigms. This hypothesis was tested in an experiment in which participants read or generated words, followed by a memory test for the source of each word (read or generated) and the word's colour. Meiser and Bröder's (2002) multinomial model for crossed source dimensions was used to analyse the data, showing that source memory for generation (reality monitoring) was superior for the generated words, while source memory for word colour (external source monitoring) was superior for the read words. The model also revealed the influence of strong response biases in the data, demonstrating the usefulness of formal modelling when examining generation effects in source monitoring.

  13. Competitive advantage for multiple-memory strategies in an artificial market

    NASA Astrophysics Data System (ADS)

    Mitman, Kurt E.; Choe, Sehyo C.; Johnson, Neil F.

    2005-05-01

    We consider a simple binary market model containing N competitive agents. The novel feature of our model is that it incorporates the tendency shown by traders to look for patterns in past price movements over multiple time scales, i.e. multiple memory-lengths. In the regime where these memory-lengths are all small, the average winnings per agent exceed those obtained for either (1) a pure population where all agents have equal memory-length, or (2) a mixed population comprising sub-populations of equal-memory agents with each sub-population having a different memory-length. Agents who consistently play strategies of a given memory-length, are found to win more on average -- switching between strategies with different memory lengths incurs an effective penalty, while switching between strategies of equal memory does not. Agents employing short-memory strategies can outperform agents using long-memory strategies, even in the regime where an equal-memory system would have favored the use of long-memory strategies. Using the many-body 'Crowd-Anticrowd' theory, we obtain analytic expressions which are in good agreement with the observed numerical results. In the context of financial markets, our results suggest that multiple-memory agents have a better chance of identifying price patterns of unknown length and hence will typically have higher winnings.

  14. Hemispheric encoding/retrieval asymmetry in episodic memory: positron emission tomography findings.

    PubMed Central

    Tulving, E; Kapur, S; Craik, F I; Moscovitch, M; Houle, S

    1994-01-01

    Data are reviewed from positron emission tomography studies of encoding and retrieval processes in episodic memory. These data suggest a hemispheric encoding/retrieval asymmetry model of prefrontal involvement in encoding and retrieval of episodic memory. According to this model, the left and right prefrontal lobes are part of an extensive neuronal network that subserves episodic remembering, but the two prefrontal hemispheres play different roles. Left prefrontal cortical regions are differentially more involved in retrieval of information from semantic memory and in simultaneously encoding novel aspects of the retrieved information into episodic memory. Right prefrontal cortical regions, on the other hand, are differentially more involved in episodic memory retrieval. PMID:8134342

  15. Recognition of simple visual images using a sparse distributed memory: Some implementations and experiments

    NASA Technical Reports Server (NTRS)

    Jaeckel, Louis A.

    1990-01-01

    Previously, a method was described of representing a class of simple visual images so that they could be used with a Sparse Distributed Memory (SDM). Herein, two possible implementations are described of a SDM, for which these images, suitably encoded, will serve both as addresses to the memory and as data to be stored in the memory. A key feature of both implementations is that a pattern that is represented as an unordered set with a variable number of members can be used as an address to the memory. In the 1st model, an image is encoded as a 9072 bit string to be used as a read or write address; the bit string may also be used as data to be stored in the memory. Another representation, in which an image is encoded as a 256 bit string, may be used with either model as data to be stored in the memory, but not as an address. In the 2nd model, an image is not represented as a vector of fixed length to be used as an address. Instead, a rule is given for determining which memory locations are to be activated in response to an encoded image. This activation rule treats the pieces of an image as an unordered set. With this model, the memory can be simulated, based on a method of computing the approximate result of a read operation.

  16. A process-model based approach to prospective memory impairment in Parkinson's disease.

    PubMed

    Kliegel, Matthias; Altgassen, Mareike; Hering, Alexandra; Rose, Nathan S

    2011-07-01

    The present review discusses the current state of research on the clinical neuropsychology of prospective memory in Parkinson's disease. To do so the paper is divided in two sections. In the first section, we briefly outline key features of the (partly implicit) rationale underlying the available literature on the clinical neuropsychology of prospective memory. Here, we present a conceptual model that guides our approach to the clinical neuropsychology of prospective memory in general and to the effects of Parkinson's disease on prospective memory in particular. In the second section, we use this model to guide our review of the available literature and suggest some open issues and future directions motivated by previous findings and the proposed conceptual model. The review suggests that certain phases of the prospective memory process (intention formation und initiation) are particularly impaired by Parkinson's disease. In addition, it is argued that prospective memory may be preserved when tasks involve specific features (e.g., focal cues) that reduce the need for strategic monitoring processes. In terms of suggestions for future directions, it is noted that intervention studies are needed which target the specific phases of the prospective memory process that are impaired in Parkinson's disease, such as planning interventions. Moreover, it is proposed that prospective memory deficits in Parkinson's disease should be explored in the context of a general impairment in the ability to form an intention and plan or coordinate an appropriate series of actions. Copyright © 2011 Elsevier Ltd. All rights reserved.

  17. Linking working memory and long-term memory: a computational model of the learning of new words.

    PubMed

    Jones, Gary; Gobet, Fernand; Pine, Julian M

    2007-11-01

    The nonword repetition (NWR) test has been shown to be a good predictor of children's vocabulary size. NWR performance has been explained using phonological working memory, which is seen as a critical component in the learning of new words. However, no detailed specification of the link between phonological working memory and long-term memory (LTM) has been proposed. In this paper, we present a computational model of children's vocabulary acquisition (EPAM-VOC) that specifies how phonological working memory and LTM interact. The model learns phoneme sequences, which are stored in LTM and mediate how much information can be held in working memory. The model's behaviour is compared with that of children in a new study of NWR, conducted in order to ensure the same nonword stimuli and methodology across ages. EPAM-VOC shows a pattern of results similar to that of children: performance is better for shorter nonwords and for wordlike nonwords, and performance improves with age. EPAM-VOC also simulates the superior performance for single consonant nonwords over clustered consonant nonwords found in previous NWR studies. EPAM-VOC provides a simple and elegant computational account of some of the key processes involved in the learning of new words: it specifies how phonological working memory and LTM interact; makes testable predictions; and suggests that developmental changes in NWR performance may reflect differences in the amount of information that has been encoded in LTM rather than developmental changes in working memory capacity.

  18. A model of attention-guided visual perception and recognition.

    PubMed

    Rybak, I A; Gusakova, V I; Golovan, A V; Podladchikova, L N; Shevtsova, N A

    1998-08-01

    A model of visual perception and recognition is described. The model contains: (i) a low-level subsystem which performs both a fovea-like transformation and detection of primary features (edges), and (ii) a high-level subsystem which includes separated 'what' (sensory memory) and 'where' (motor memory) structures. Image recognition occurs during the execution of a 'behavioral recognition program' formed during the primary viewing of the image. The recognition program contains both programmed attention window movements (stored in the motor memory) and predicted image fragments (stored in the sensory memory) for each consecutive fixation. The model shows the ability to recognize complex images (e.g. faces) invariantly with respect to shift, rotation and scale.

  19. Flexible Kernel Memory

    PubMed Central

    Nowicki, Dimitri; Siegelmann, Hava

    2010-01-01

    This paper introduces a new model of associative memory, capable of both binary and continuous-valued inputs. Based on kernel theory, the memory model is on one hand a generalization of Radial Basis Function networks and, on the other, is in feature space, analogous to a Hopfield network. Attractors can be added, deleted, and updated on-line simply, without harming existing memories, and the number of attractors is independent of input dimension. Input vectors do not have to adhere to a fixed or bounded dimensionality; they can increase and decrease it without relearning previous memories. A memory consolidation process enables the network to generalize concepts and form clusters of input data, which outperforms many unsupervised clustering techniques; this process is demonstrated on handwritten digits from MNIST. Another process, reminiscent of memory reconsolidation is introduced, in which existing memories are refreshed and tuned with new inputs; this process is demonstrated on series of morphed faces. PMID:20552013

  20. Not All Order Memory Is Equal: Test Demands Reveal Dissociations in Memory for Sequence Information

    ERIC Educational Resources Information Center

    Jonker, Tanya R.; MacLeod, Colin M.

    2017-01-01

    Remembering the order of a sequence of events is a fundamental feature of episodic memory. Indeed, a number of formal models represent temporal context as part of the memory system, and memory for order has been researched extensively. Yet, the nature of the code(s) underlying sequence memory is still relatively unknown. Across 4 experiments that…

Top